In a nationwide German screening program, AI-supported mammogram reading helped radiologists detect more breast cancers without increasing the rate of patient recalls.
Study: Nationwide real-world implementation of AI for cancer detection in population-based mammography screening.
In a recent study published in the journal Nature Medicine, researchers examined the impact of artificial intelligence (AI) on cancer detection and recall rates.
Mammography screening contributes to reducing breast cancer-related mortality. Improving the sensitivity and specificity of screening could lower interval cancer rates and recall rates and allow more effective treatment of breast cancer patients. Screening programs generate considerable volumes of mammograms, which, in most programs, require interpretation by two radiologists.
Additionally, a consensus conference may be required to achieve high specificity and sensitivity. As a result, radiologists spend much of their time on the repetitive task of interpreting a vast number of images each week. This workload will likely increase as recent guidelines recommend mammography screening for additional age groups. Incorporating AI into cancer screening programs could mitigate some of these problems.
Studies suggest that the accuracy of AI is similar to, and sometimes higher than, that of radiologists. Several studies observed increases in cancer detection for workflows integrating AI, although results regarding recall rates were inconsistent. Nonetheless, the authors of this study emphasized that small samples and limited heterogeneity in radiologists, screening sites, and equipment vendors restrict the generalizability of these earlier studies.
The Study and Findings
In the present study, researchers assessed the impact of AI on cancer detection and recall rates. The study was conducted within a breast cancer screening program in Germany targeting asymptomatic individuals aged 50–69. Data were collected from multiple screening sites implementing the AI system between July 2021 and February 2023.
In the screening program, four mammograms were acquired for each participant, initially read by two independent radiologists. If one of the radiologists deemed the case suspicious, a consensus conference was held. If suspicious findings persisted in the conference, the participant would be recalled for additional assessments.
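The standard reading workflow described above can be sketched as a small decision function. This is an illustrative sketch only: the function and parameter names are hypothetical and do not come from the study.

```python
from typing import Optional


def screening_outcome(reader_a_suspicious: bool,
                      reader_b_suspicious: bool,
                      consensus_suspicious: Optional[bool] = None) -> str:
    """Return 'recall' or 'no recall' for one examination (hypothetical helper)."""
    # If neither independent reader flags the case, no consensus conference is held.
    if not (reader_a_suspicious or reader_b_suspicious):
        return "no recall"
    # At least one reader flagged the case, so a consensus decision is needed.
    if consensus_suspicious is None:
        raise ValueError("a consensus decision is required once a reader flags the case")
    # The participant is recalled only if the suspicion persists in the conference.
    return "recall" if consensus_suspicious else "no recall"
```

For example, `screening_outcome(True, False, consensus_suspicious=True)` returns `"recall"`, while a case flagged by both readers but cleared in the conference returns `"no recall"`.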
Radiologists could use either their existing (non-AI-based) software or the AI-supported viewer. Examinations were included in the AI group when at least one radiologist read and submitted the report using the AI-supported viewer. Examinations for which no report was submitted using the AI-supported viewer were included in the control group.
The AI system, Vara MG, had two key features: normal triaging, which flagged highly unsuspicious examinations as normal, and a safety net, which highlighted highly suspicious cases and localized the suspicious regions. The safety net aimed to reduce missed diagnoses by prompting radiologists to take a second look at examinations that the AI had flagged as highly suspicious.
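The two features can be pictured as thresholds on a per-examination suspicion score. The sketch below is hypothetical throughout: the score scale, both threshold values, and the rule that the prompt fires only when the radiologist's own read was unsuspicious are assumptions made for illustration, not details of Vara MG.

```python
# Hypothetical thresholds on a 0-to-1 suspicion score (assumed, not from the study).
NORMAL_THRESHOLD = 0.05
SAFETY_NET_THRESHOLD = 0.95


def ai_label(score: float) -> str:
    """Map an AI suspicion score to one of three labels."""
    if score <= NORMAL_THRESHOLD:
        return "normal"             # normal triaging: highly unsuspicious
    if score >= SAFETY_NET_THRESHOLD:
        return "highly suspicious"  # candidate for the safety net
    return "unclassified"           # no confident AI prediction


def safety_net_prompt(score: float, radiologist_suspicious: bool) -> bool:
    """Assumed rule: prompt a second look only if the reader saw nothing suspicious."""
    return ai_label(score) == "highly suspicious" and not radiologist_suspicious
```

Under these assumptions, `safety_net_prompt(0.99, radiologist_suspicious=False)` is `True`, whereas the same score produces no prompt if the radiologist had already marked the case as suspicious.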
In total, 461,818 females who underwent mammography screening were included, and 119 radiologists interpreted the examinations. Of the participants, 260,739 were included in the AI group and 201,079 in the control group. Around 42 per 1,000 females had suspicious findings and were recalled for additional assessments. Around one-fourth of those recalled underwent biopsies, and over six females per 1,000 were diagnosed with breast cancer.
The AI system classified 59.4% of examinations as normal, which could substantially reduce radiologists’ workload. The safety net was triggered for 1.5% of examinations in the AI group, leading to 541 recalls and 208 cancer diagnoses. Additionally, 3.1% of the AI-group examinations that the AI had flagged as normal underwent further evaluation by the consensus group, which resulted in 20 additional cancer diagnoses. The breast cancer detection rate (BCDR) was 6.7 and 5.7 per 1,000 females for the AI and control groups, respectively.
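A few of these figures can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers quoted in this article, so its results are approximate; the variable names are illustrative.

```python
# Group sizes quoted in the article.
ai_group = 260_739
control_group = 201_079
total = ai_group + control_group  # should equal the 461,818 participants reported

# Approximate number of AI-group examinations where the safety net triggered
# (1.5% is a rounded share, so this count is only a rough estimate).
safety_net_exams = round(0.015 * ai_group)

# Share of safety-net recalls that ended in a cancer diagnosis.
safety_net_recalls = 541
safety_net_cancers = 208
safety_net_yield = safety_net_cancers / safety_net_recalls

# Relative difference in breast cancer detection rate (per 1,000 screened).
bcdr_ai, bcdr_control = 6.7, 5.7
relative_increase = (bcdr_ai - bcdr_control) / bcdr_control

print(f"total participants: {total}")
print(f"safety-net examinations (approx.): {safety_net_exams}")
print(f"cancers per safety-net recall: {safety_net_yield:.1%}")
print(f"relative BCDR increase from rounded rates: {relative_increase:.1%}")
```

From the rounded rates, the relative increase comes out at roughly 17.5%, slightly below the 17.6% the article reports, a gap that is presumably due to rounding of the published rates.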
The AI group had a statistically significantly higher BCDR and a slightly lower recall rate than the control group. The positive predictive value (PPV) of recall was 17.9% in the AI group and 14.9% in the control group. The biopsy rate was 8.2% higher in the AI group than in the control group. Nevertheless, the AI group also had a higher PPV of biopsy (64.5%) than the control group (59.2%).
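The biopsy figures can be related to the detection rates with a back-of-the-envelope calculation. The sketch assumes that PPV of biopsy means cancers detected divided by biopsies performed and that every detected cancer was confirmed by biopsy; both are assumptions, and the inputs are the rounded values quoted in this article.

```python
def implied_rate_per_1000(bcdr_per_1000: float, ppv: float) -> float:
    """Biopsies per 1,000 screened implied by a detection rate and a PPV.

    Assumes PPV = cancers detected / biopsies performed (hypothetical definition).
    """
    return bcdr_per_1000 / ppv


# Rounded values from the article.
biopsies_ai = implied_rate_per_1000(6.7, 0.645)
biopsies_control = implied_rate_per_1000(5.7, 0.592)

# Relative difference in the implied biopsy rates.
relative_biopsy_increase = biopsies_ai / biopsies_control - 1

print(f"implied biopsies per 1,000 (AI): {biopsies_ai:.1f}")
print(f"implied biopsies per 1,000 (control): {biopsies_control:.1f}")
print(f"implied relative increase: {relative_biopsy_increase:.1%}")
```

Under these assumptions the implied biopsy rate is about 8% higher in the AI group, close to the reported 8.2%, while the higher PPV shows that a larger share of those biopsies confirmed a cancer.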
Broader Implications and Future Considerations
The study highlighted that integrating AI into screening workflows could increase the detection of ductal carcinoma in situ (DCIS) cases. While this may represent earlier detection, concerns about overdiagnosis and overtreatment of DCIS were noted, as these cases may not always progress to invasive cancer. The long-term impact on interval cancer rates and stage distribution requires further follow-up over two to three years.
Additionally, the researchers emphasized that rejected safety net cases represent a crucial area for further analysis, as they may include missed opportunities to detect cancers early or demonstrate the value of reducing unnecessary recalls.
Taken together, the AI approach for mammography screening provided confident predictions for both suspicious and normal examinations. The BCDR in the AI group was 17.6% higher than in the control group. AI use was also associated with a slightly lower recall rate, although this difference was not statistically significant. These findings add to the evidence that AI-assisted mammography screening is safe and feasible and can reduce workload.
Journal reference:
- Eisemann N, Bunk S, Mukama T, et al. Nationwide real-world implementation of AI for cancer detection in population-based mammography screening. Nature Medicine, 2025, DOI: 10.1038/s41591-024-03408-6, https://www.nature.com/articles/s41591-024-03408-6