A Swedish study finds AI-supported screening detects 29% more cancers without increasing false positives, while cutting radiologists’ workload by 44%.
Study: Screening performance and characteristics of breast cancer detected in the Mammography Screening with Artificial Intelligence trial (MASAI): a randomised, controlled, parallel-group, non-inferiority, single-blinded, screening accuracy study.
In a recent study published in The Lancet Digital Health, researchers examined the performance of cancer screening in the Mammography Screening with Artificial Intelligence (MASAI) trial.
Background
Breast cancer is a heterogeneous disease that ranges from indolent to aggressive forms. Cancer characterization based on size, morphology, molecular subtype, immunohistochemical biomarkers, metastases, and lymph node involvement yields predictive and prognostic information that is useful in treatment planning and patient follow-up.
Artificial intelligence (AI) can potentially decrease the screen-reading workload in mammography screening and improve cancer detection.
Double reading of screening examinations is the standard of care in European screening programs. Reducing the double-reading workload by substituting part of the human reading with AI could ease staffing pressures on breast radiology services.
A few prospective studies have suggested that AI use in mammography screening increases cancer detection. However, AI-supported detection should not predominantly identify indolent cancers or come at the expense of more false positives; instead, AI use should increase the detection of clinically relevant cancers.
About the study
In the present study, researchers assessed the performance of cancer screening measures in the MASAI trial. The trial was designed to compare AI-supported mammography screening with standard double reading.
Participants were women eligible for population-based mammography screening in Sweden. After mammogram acquisition, examinations were randomized to standard double reading (control) or AI-supported screening (intervention).
In the intervention group, examinations were analyzed using the “Transpara” AI system. Transpara provided a malignancy risk score on a 10-point scale, categorized as high risk (score 10), intermediate risk (scores 8–9), or low risk (scores 1–7). High-risk examinations underwent double reading, while intermediate- and low-risk examinations underwent single reading.
Examinations with the highest 1% risk were flagged as extra-high risk. Control examinations underwent standard double reading and were not analyzed by AI.
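To make the triage concrete, the short Python sketch below illustrates the routing logic as described above; it is an illustration of the published design, not Transpara's actual implementation, and the function name and inputs are assumptions for the example.

```python
# Illustrative sketch of the triage described above (not Transpara's actual code).
# Scores run from 1 to 10; examinations in the top 1% of risk are additionally
# flagged as extra-high risk.

def triage(score: int, in_top_one_percent: bool) -> dict:
    """Route a screening examination based on its AI malignancy risk score."""
    if score == 10:
        category, readers = "high", 2          # double reading
    elif score >= 8:
        category, readers = "intermediate", 1  # single reading
    else:
        category, readers = "low", 1           # single reading
    return {
        "risk_category": category,
        "readers": readers,
        "extra_high_flag": in_top_one_percent,
    }

# Example: a score-10 examination in the top 1% is double read and flagged.
print(triage(10, in_top_one_percent=True))
```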
Cancers were characterized by in situ or invasive status, histological type, Nottingham histological grade, and nuclear grade. The molecular subtype was determined using immunohistochemical biomarkers.
Tumor, node, metastasis (TNM) staging was ascertained based on lymph node involvement and pathological size. The main outcome measures were early screening performance, tumor stage and type detection, and screen-reading workload.
Performance measures included detection rate, recall rate, positive predictive value (PPV) of recall, and false-positive rate.
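These measures follow standard definitions: the detection rate is cancers detected per examinations screened, the recall rate is recalls per examinations screened, the PPV of recall is cancers detected per recall, and the false-positive rate is non-cancer recalls per examinations screened. The Python sketch below simply encodes these definitions with hypothetical counts; it is not code or data from the trial.

```python
# Standard screening performance measures computed from counts.
# This encodes the textbook definitions only; the numbers below are hypothetical.

def screening_metrics(n_screened: int, n_recalled: int, n_cancers_detected: int) -> dict:
    false_positives = n_recalled - n_cancers_detected  # recalled but no cancer found
    return {
        "detection_rate_per_1000": 1000 * n_cancers_detected / n_screened,
        "recall_rate_pct": 100 * n_recalled / n_screened,
        "ppv_of_recall_pct": 100 * n_cancers_detected / n_recalled,
        "false_positive_rate_pct": 100 * false_positives / n_screened,
    }

# Hypothetical example purely to show the arithmetic:
print(screening_metrics(n_screened=50_000, n_recalled=1_000, n_cancers_detected=300))
```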
Findings
The analytic sample included 53,043 participants in the intervention group and 52,872 in the control group. In the intervention arm, 3,800 examinations were classified as high risk, 655 of which were flagged as extra-high risk. Transpara did not provide a risk score for 368 examinations. About 0.1% of examinations in either group were technical recalls.
Cancer detection significantly increased by 29% with AI-supported screening relative to standard double reading. Recall and false-positive rates showed non-significant increases with AI-supported screening, while the PPV of recall increased significantly. The intervention group had 48,444 fewer readings but 65 more consensus meetings than the control arm, corresponding to a 44% reduction in screen-reading workload.
In the intervention group, 941 participants were recalled due to mammographic findings and 169 due to reported symptoms, compared with 847 and 180 participants in the control group, respectively.
In addition, the intervention group had more cancers detected across 10-year age groups and, from age 60 onward, a higher false-positive rate than the control group.
AI use resulted in 76 more cancers detected, comprising 23 additional in situ cancers and 53 additional invasive cancers. Detection also increased across histological types, with the largest increase in invasive cancers of no special type.
Increased detection with AI-supported screening was also observed across histological grades; grade I cancers had the highest increase.
Furthermore, 29 additional luminal A and 21 more non-luminal A invasive cancers were detected with AI-supported screening. The intervention group had 46 more lymph node-negative and five more lymph node-positive invasive cancers than the control group.
In the intervention group, most non-luminal A cancers were lymph node-negative. A similar number of participants with TNM stage II or higher cancers was observed in both groups.
Conclusions
In sum, the findings illustrate that AI-supported screening resulted in a significant increase in cancer detection relative to standard double reading.
AI use mainly increased the detection of small, lymph node-negative invasive cancers. It substantially reduced the screen-reading workload compared with standard double reading, while the false-positive rate remained similar.
AI use did not negatively affect the rates of consensus meetings, recalls, or false positives. These results underscore the potential of AI to increase early detection of clinically relevant breast cancer without increasing false positives.