Occasionally, that data isn’t available, says Priyam Pandey, a Ph.D. student in data science and engineering at the University of South Dakota. “In some scenarios, we can’t wait long enough to collect a data set that is big enough.”

Consider the COVID-19 pandemic. At first, health care professionals had gathered relatively little medical information on the patients with the disease. With enough data to train the machine learning models, “AI would have helped in mass screening and assisting doctors,” Pandey says.

Pandey uses lung X-ray images from patients with COVID-19 in her research on data sets and machine learning. A talk on her research earned her top prize in the USD Graduate School’s Three-Minute Thesis Competition.

Caused by a novel coronavirus that results in serious respiratory illness, COVID-19 presented a challenge to health care workers trying to detect the disease in patients.

Pandey developed an AI system that used only a small number of images to classify X-rays of healthy lungs versus those of patients with the virus. This involved coding an algorithm that created several synthetic X-ray images identical to one original healthy image. These new images were then used to train the AI model to recognize an anomalous unhealthy image.
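The article does not say how the synthetic images were produced. As a rough illustration only, the Python sketch below shows one common way to turn a single image into several slightly different training copies, using small shifts, brightness changes and mild noise. The function name `make_synthetic_variants`, its parameters and the random stand-in image are all hypothetical, and this should not be read as Pandey’s actual algorithm.

```python
import random


def make_synthetic_variants(image, n_variants=8, seed=0):
    """Return perturbed copies of a grayscale image given as rows of floats in [0, 1].

    Assumption: simple shift, brightness and noise perturbations stand in for
    the (unspecified) synthetic-image method described in the article.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    variants = []
    for _ in range(n_variants):
        # Shift by up to 2 pixels in each direction (wrapping at the border).
        dy = rng.randint(-2, 2)
        dx = rng.randint(-2, 2)
        # Scale brightness by a factor between 0.9 and 1.1.
        gain = rng.uniform(0.9, 1.1)
        variant = []
        for y in range(height):
            row = []
            for x in range(width):
                pixel = image[(y - dy) % height][(x - dx) % width]
                # Add mild Gaussian noise, then clamp back into [0, 1].
                noisy = pixel * gain + rng.gauss(0.0, 0.01)
                row.append(min(1.0, max(0.0, noisy)))
            variant.append(row)
        variants.append(variant)
    return variants


# Hypothetical stand-in for one healthy X-ray: a random 16x16 grayscale image.
pixel_rng = random.Random(1)
original = [[pixel_rng.random() for _ in range(16)] for _ in range(16)]
synthetic = make_synthetic_variants(original, n_variants=8)
print(len(synthetic), len(synthetic[0]), len(synthetic[0][0]))
```

In a setup like this, the variants would serve as the "healthy" training examples, and an anomaly detector trained on them would flag images that differ too much from that set.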

“Creating these synthetic images saves us from having to collect large amounts of data,” Pandey says.

With the early days of the COVID-19 pandemic behind us, Pandey can now compare the results of her newly designed machine learning system with those from published studies of AI systems trained on larger amounts of data. Those models used many thousands of images to reach accuracy rates between 97% and 98%. Pandey’s model achieved nearly identical results with far less data.

“This model gave 97.2% accuracy by using only 160 images,” she says.

The next step in Pandey’s research is to extend these findings to other diseases affecting the lungs, such as bronchitis, pneumonia, tuberculosis and lung cancer. “I’ve already got good results on tuberculosis,” she says.

Pandey works closely with KC Santosh, Ph.D., associate professor and chair of the Department of Computer Science at USD. When she joined the program in the fall of 2023, Pandey approached Santosh with the idea of studying data set size. “He helped me find the whole pathway to make this research happen,” she says.

“The concept of using limited samples without compromising performance brings machine learning models to the point of continuous cycles of exploration, experimentation and improvement,” says Santosh. “I’ve been engaged in active learning, also known as human-in-the-loop machine learning, since 2011, and Priyam’s work is a key element of this big project.”

The data science and engineering graduate program at USD offers a Ph.D. and is run in collaboration with the South Dakota School of Mines. USD began offering the degree to address the critical need across disciplines and industries for trained data engineers and data analysts who can process and analyze a wide range of information.

Pandey says she finds herself challenged and supported in the data science and engineering program. “We have so many students doing research,” she says. “If we are stuck, we can always ask for help. And our professors are always available. I love the work environment.”

Press Contact
Hanna DeLange
Contact Email usdnews@usd.edu