Could artificial intelligence (AI) software replace the second reader utilized in many breast cancer screening programs in Europe? Although more research is needed to validate the results, studies presented at ECR 2021 from German, Dutch, and French patient cohorts showed that AI may be up to the job.
"Replacing a human reader with AI in a double-reading setting in a breast cancer screening program may in the near future become an effective strategy to halve radiologists' workload," said Dr. Sylvia Heywang-Köbrunner of Brustdiagnostik München in Germany. "It appears possible to maintain sensitivity, at the cost of more cases sent to consensus."
Along with co-authors from AI software developer Screenpoint Medical, she performed a retrospective analysis of a sample of four-view screening digital mammography exams acquired at the German institution. The group found that AI and a first reader could provide the same sensitivity as double reading, but with a slight -- but not statistically significant -- decrease in specificity.
To assess the impact of replacing a human reader with an AI system in a double-reading breast cancer screening program, the researchers first gathered a cohort of 18,036 consecutive four-view digital mammograms that were acquired between January and November 2018. The final study cohort included all 114 screening-detected cancers, as well as 200 randomly selected benign cases and 2,000 randomly selected normal mammograms.
All of these cases were analyzed by the Transpara AI software (Screenpoint). The researchers then retrospectively compared the performance of AI and the first reader with the traditional double-reading regimen. The software was set to operate at the same sensitivity as the other human reader.
Performance on sample of screening mammograms in Germany
| | Double reading | AI + first reader |
|---|---|---|
| Sensitivity | 98.2% | 98.2% |
| Specificity | 82.1% | 80.1% |
The 2-percentage-point difference in specificity was not statistically significant (p = 0.08).
Heywang-Köbrunner acknowledged the limitations of the study, including its retrospective nature. In the future, the researchers would like to include more cases and different readers in the analysis, as well as interval and next-round screen-detected cancers, she said.
Detecting more cancers
The combination of AI and a first reader may increase sensitivity by detecting more interval cancers and next-round screening-detected cancers, according to another presentation by Dr. Ritse Mann, PhD, of Radboud University Medical Center in Nijmegen, the Netherlands.
Mann and co-authors from Radboud and Screenpoint retrospectively assessed the use of AI in a consecutive cohort of 23,035 screening digital mammography exams performed between September 2016 and 2017. These cases included 159 screening-detected cancers, 48 interval cancers, and 62 cancers that were subsequently detected in the next screening round.
The researchers compared the traditional approach of independent double reading with the use of AI to replace a human reader, using the human recall rate as the operating point for the software.
Performance in retrospective analysis of Dutch screening mammography cohort
| | Reader 1 | AI alone | Double reading after consensus | Reader 1 + AI before consensus |
|---|---|---|---|---|
| Sensitivity | 52.4% | 56.1% | 59.1% | 66.5% |
| Recall rate | 3% | 3% | 3.1% | 5.2% |
| Screening-detected cancers | 138 | 127 | 159 | 154 |
| Interval cancers detected | 2 | 14 | 0 | 15 |
| Cancers detected on next screening round | 1 | 10 | 0 | 10 |
The increased sensitivity from AI was achieved primarily due to increased detections of interval cancers and cancers that were detected on the next screening round, Mann said.
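The sensitivity figures in the table can be reproduced from the reported counts. The short Python sketch below is only an illustrative check, not the researchers' method. It assumes the denominator is the sum of all three cancer groups in the cohort (159 + 48 + 62 = 269), which the presentation summary does not state explicitly but which matches the published percentages. The names `TOTAL_CANCERS`, `DETECTED`, and `sensitivity` are made up for this example.

```python
# Cohort totals from the article: screen-detected + interval + next-round cancers.
# Assumption: this sum (269) is the denominator behind the reported sensitivities.
TOTAL_CANCERS = 159 + 48 + 62

# (screen-detected, interval, next-round) cancers flagged by each strategy,
# copied from the table above.
DETECTED = {
    "Reader 1": (138, 2, 1),
    "AI alone": (127, 14, 10),
    "Double reading after consensus": (159, 0, 0),
    "Reader 1 + AI before consensus": (154, 15, 10),
}


def sensitivity(counts, total=TOTAL_CANCERS):
    """Return detected cancers as a percentage of all cancers, to one decimal."""
    return round(100 * sum(counts) / total, 1)


if __name__ == "__main__":
    for strategy, counts in DETECTED.items():
        print(f"{strategy}: {sensitivity(counts)}%")
```

Under that assumption, nearly all of the gap between double reading and reader 1 plus AI comes from the 25 interval and next-round cancers in the last column, not from screening-detected cases.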
Using AI as an independent reader in a double-reading regimen could halve the radiologists' workload and potentially increase sensitivity and lower recall, according to the researchers.
"In essence, AI can serve as an independent reader because it's better than a radiologist," he said. "But an effective arbitration process is necessary, because if you just imply or apply human arbitration, you might simply erase the entire benefit of the AI system. We'll have to look into that in the future."
Identifying normals
In another presentation at ECR 2021, researchers led by Dr. Hajer Jarraya of Imagerie Médicale des Hauts-de-France in Arras, France, reported that AI could be used as an aid to the first reader to identify nearly 30% of screening mammograms as highly likely to be normal. Even better results were achieved when AI was paired with breast density analysis.
As a result, using AI would lead to "less anxiety for patients who do not need to wait two to three weeks to receive their final assessment, thus reducing costs and [enabling] a faster process," Jarraya said.
The researchers retrospectively assessed the performance of the Transpara software on 998 screening mammograms acquired in the French Breast Cancer Screening Program (FBCSP) from February to July 2016 and then again on subsequent mammograms in these patients two years later. The case set was enriched by 43 detected cancers, resulting in a final data set of 1,041 digital mammograms and 1,012 prior mammograms.
Of the 52 cancers in the study, 49 had an AI risk score of ≥ 8 (out of 10). There were no cancers found in exams deemed by the AI to be likely normal (AI risk score of ≤ 4). These exams made up 29% of the total screening mammography volume.
Retrospective AI analysis of sample cohort from French Breast Cancer Screening Program
| | AI risk score of ≤ 4 | AI risk score of ≤ 7 and low breast density |
|---|---|---|
| Cancer cases | 0 | 0 |
| Percentage of total screening volume | 29% | 47% |
How would this work in the FBCSP? If a screening mammogram is interpreted as negative by the radiologist and has an AI risk score of ≤ 4, the patient could be discharged with a final assessment as negative, Jarraya said.
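The discharge rule described above can be written as a single condition. The following Python sketch is a hypothetical illustration, not software used in the FBCSP or part of Transpara. The function name, the parameter names, and the assumption that risk scores are integers from 1 to 10 are all invented for this example; only the cutoff of 4 and the requirement of a negative radiologist reading come from the presentation.

```python
def can_discharge_as_negative(reader_negative: bool,
                              ai_risk_score: int,
                              normal_cutoff: int = 4) -> bool:
    """Return True when an exam meets both conditions of the proposed rule.

    reader_negative: the radiologist interpreted the mammogram as negative.
    ai_risk_score:   AI risk score (assumed here to be an integer from 1 to 10).
    normal_cutoff:   highest score still treated as "likely normal" (4 in the study).
    """
    if not 1 <= ai_risk_score <= 10:
        raise ValueError("AI risk score is assumed to lie between 1 and 10")
    # Both conditions must hold; any other exam stays in the screening
    # pathway for further assessment, which this sketch does not model.
    return reader_negative and ai_risk_score <= normal_cutoff


if __name__ == "__main__":
    print(can_discharge_as_negative(True, 3))   # negative read, low score
    print(can_discharge_as_negative(True, 8))   # negative read, high score
```

Keeping the cutoff as a parameter makes it easy to explore other thresholds, such as the ≤ 7 score combined with low breast density reported in the table, though that variant would need a density input that is not modeled here.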
"If the [AI] is positive, reader 1 would perform a second reading, a second assessment with maybe further views [on] ultrasound," she said. "If [still] negative, then the centralized second reading could be performed."
In other study findings, 43% of cancers had an AI score of 10 on the prior exams from two years earlier, according to Jarraya.
She acknowledged the study's limitations, including the quality parameters in the screening cohort not being strictly representative of the standards of the FBCSP.
"We need real-time, real-life prospective study and more mammograms and more diversity of data to understand the generalizability of our results," she said.