Scientists use machine learning to unveil new predictors of post-menopausal breast cancer


By Dr. Priyom Bose, Ph.D.
Reviewed by Benedette Cuffari, M.Sc.
Jun 11 2023

Breast cancer is one of the most common types of cancer affecting women worldwide. Multiple predictors of the disease have been identified, including inherited genetic factors, reproductive factors, and lifestyle factors.

Previous studies have emphasized the etiological differences between pre- and post-menopausal breast cancers. Recently, scientists combined machine learning with classical statistical models to identify predictors of post-menopausal breast cancer.

Study: Combining machine learning with Cox models to identify predictors for incident post-menopausal breast cancer in the UK Biobank.

Background

Machine learning (ML) methods can analyze large datasets with many candidate predictors and capture complex non-linear relationships. Although previous studies have applied ML to breast cancer risk prediction, they did not use it to identify predictors.

The United Kingdom Biobank (UKB), an extensive and detailed cohort, offers the opportunity to use hypothesis-free approaches to identify novel predictors of breast cancer. Polygenic risk scores (PRS), a more recent development, summarize the combined effect of hundreds to thousands of genetic variants that genome-wide association studies (GWAS) have associated with specific diseases or traits.
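A PRS is usually computed as a weighted sum. For each variant, the number of copies of the effect allele a person carries (0, 1 or 2) is multiplied by the effect size estimated in a GWAS, and the products are summed. The Python sketch below illustrates that idea and then standardizes the scores so effects can be reported per standard deviation. The three variants, their weights and the genotypes are hypothetical. Real scores such as PRS313 and PRS120k involve many more variants plus quality-control steps not shown here.

```python
from statistics import mean, pstdev

def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(scores):
    """Convert raw scores to z-scores (mean 0, SD 1) across the cohort."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]

# Hypothetical per-allele weights (e.g. log odds ratios from a GWAS) for three variants.
weights = [0.10, -0.05, 0.20]
# Hypothetical genotypes for three people: effect-allele counts at each variant.
cohort = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]

raw = [polygenic_risk_score(g, weights) for g in cohort]
z = standardize(raw)
print(raw, z)
```

In practice the weights come from an external GWAS rather than the cohort being scored, so the score is not fitted to the outcome it is later used to predict.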

PRS can identify people at high risk of disease. In coronary artery disease, for example, they can be used to target high-risk individuals for early statin prescription, and PRS improved the accuracy of existing risk predictors such as the Framingham risk score.

Breast cancer PRS have previously been combined with risk prediction models such as the Tyrer-Cuzick model and the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA). Although interactions between PRS and phenotypic features (gene-environment interactions) have been analyzed for breast cancer, the findings have been contradictory.
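One common way to test a gene-environment interaction in a survival setting is to add a product term to a Cox proportional hazards model. The following is a generic formulation, not necessarily the specification used in the studies mentioned above. Here E stands for a hypothetical phenotypic or environmental feature and h_0(t) for the baseline hazard:

```latex
h(t \mid \mathrm{PRS}, E) = h_0(t)\,\exp\!\left(\beta_1\,\mathrm{PRS} + \beta_2\,E + \beta_3\,\mathrm{PRS}\cdot E\right)

\mathrm{HR}_{\mathrm{PRS}}(E) = \frac{h(t \mid \mathrm{PRS}+1,\, E)}{h(t \mid \mathrm{PRS},\, E)} = \exp\!\left(\beta_1 + \beta_3\,E\right)
```

If \beta_3 = 0, the hazard ratio per unit of PRS is the same at every level of E (no multiplicative interaction). A non-zero \beta_3 means the effect of the PRS depends on E.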

About the study

A recent Scientific Reports study used ML methods for feature selection, followed by Cox models for risk prediction. The main aim was to demonstrate how ML-based feature selection can effectively complement classical statistical methods.
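In the second stage, a Cox proportional hazards model estimates coefficients by maximizing the partial likelihood. This likelihood compares each person who develops the disease with everyone still at risk at that time. The Python sketch below fits a single-covariate Cox model by Newton-Raphson, using Breslow's handling of tied event times. It is a teaching illustration on made-up data, not the study's code. Real analyses would use an established survival-analysis package and many covariates.

```python
import math

def fit_cox_single(times, events, x, max_iter=50, tol=1e-10):
    """Estimate beta in h(t|x) = h0(t) * exp(beta * x) by maximizing the
    Cox partial likelihood with Newton-Raphson (Breslow approximation for ties)."""
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0        # first derivative of the log partial likelihood
        information = 0.0  # minus its second derivative
        for i in range(n):
            if not events[i]:
                continue  # censored people contribute only through risk sets
            risk_set = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk_set]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk_set))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk_set))
            weighted_mean = s1 / s0
            score += x[i] - weighted_mean
            information += s2 / s0 - weighted_mean ** 2
        if information == 0.0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta

# Made-up data: follow-up time, event indicator (1 = diagnosed), binary exposure.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
exposure = [1, 1, 0, 1, 0, 0]

beta_hat = fit_cox_single(times, events, exposure)
print(round(beta_hat, 2), round(math.exp(beta_hat), 2))  # log hazard ratio, hazard ratio
```

Exposed people in this toy data tend to be diagnosed earlier, so the fitted coefficient is positive and the hazard ratio exceeds 1.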

SHapley Additive exPlanations (SHAP) feature dependence plots were used to explore potential interactions between phenotypic features and PRS. The study used data from the UKB, which includes over half a million participants from England, Wales, and Scotland. Baseline data were collected through verbal interviews with a trained nurse, questionnaires, biological samples, and physical examinations.
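SHAP values are based on Shapley values from cooperative game theory. A feature's contribution is its weighted average marginal effect on the model output across all subsets of the other features. The brute-force Python sketch below computes exact Shapley values for a tiny hypothetical risk model. Features that are "absent" from a subset are filled in from a baseline vector, which is one common convention and an assumption here. The SHAP library itself uses faster algorithms, including tree-specific methods for models such as XGBoost. The model and feature values below are invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values: weighted average marginal contribution of each
    feature over all subsets of the other features (cost grows as 2**len(x))."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their observed value; others use the baseline.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        contribution = 0.0
        for size in range(n):  # subsets of the other n - 1 features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi

# Hypothetical risk model with an interaction between a standardized PRS and a phenotype.
def toy_risk(z):
    prs, phenotype = z
    return 0.5 * prs + 0.2 * phenotype + 0.3 * prs * phenotype

x = [1.0, 2.0]         # one hypothetical participant
baseline = [0.0, 0.0]  # reference values for "absent" features
phi = shapley_values(toy_risk, x, baseline)
print(phi, sum(phi), toy_risk(x) - toy_risk(baseline))
```

The interaction term (0.3 × 1.0 × 2.0 = 0.6) is split equally between the two features, and the values sum to the difference between the participant's prediction and the baseline prediction. A SHAP dependence plot shows one feature's SHAP values against its observed values across many participants. Vertical spread that tracks a second feature is how such interactions become visible.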

Because breast cancer etiology differs by menopausal status, the analysis included only women who were post-menopausal and aged 40 to 69 years at baseline. Incident breast cancer was identified using International Classification of Diseases (ICD) codes. Two polygenic risk scores, PRS313 and PRS120k, were considered as potential genetic features.
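The inclusion criteria and outcome definition can be expressed as simple filters. In the hypothetical Python sketch below, each participant record carries age at baseline, menopausal status, and a list of (ICD-10 code, years since baseline) diagnoses. Incident breast cancer is flagged by ICD-10 codes beginning with C50 (malignant neoplasm of breast) that were recorded after baseline. The record layout and this coding rule are assumptions for illustration. The study's exact code lists and exclusion criteria may differ.

```python
from __future__ import annotations
from dataclasses import dataclass, field

BREAST_CANCER_ICD10_PREFIX = "C50"  # ICD-10 category: malignant neoplasm of breast

@dataclass
class Participant:
    participant_id: str  # hypothetical identifier
    age_at_baseline: int
    postmenopausal: bool
    # (ICD-10 code, years after baseline); negative values mean before baseline
    diagnoses: list[tuple[str, float]] = field(default_factory=list)

def is_eligible(p: Participant) -> bool:
    """Post-menopausal at baseline and aged 40 to 69 inclusive."""
    return p.postmenopausal and 40 <= p.age_at_baseline <= 69

def has_incident_breast_cancer(p: Participant) -> bool:
    """True if a breast cancer code is recorded after the baseline visit."""
    return any(code.startswith(BREAST_CANCER_ICD10_PREFIX) and years > 0
               for code, years in p.diagnoses)

cohort = [
    Participant("A", 58, True, [("C50.4", 6.2)]),
    Participant("B", 52, False, []),               # pre-menopausal: excluded
    Participant("C", 66, True, [("I10", 2.0)]),    # no breast cancer code
    Participant("D", 71, True, [("C50.9", 3.0)]),  # outside the age range
]
analysis_set = [p for p in cohort if is_eligible(p)]
cases = [p.participant_id for p in analysis_set if has_incident_breast_cancer(p)]
print(len(analysis_set), cases)
```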

Study findings

A total of 104,313 participants were included in this study, 4,010 of whom developed breast cancer over a follow-up period of 11.9 years. By combining ML with traditional epidemiological statistical approaches, the researchers identified both known and novel risk factors for incident post-menopausal breast cancer.
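As a quick check on these figures, the crude proportion of participants diagnosed during follow-up is:

```latex
\frac{4{,}010}{104{,}313} \approx 0.038 \approx 3.8\%
```

This simple proportion ignores differences in follow-up time and censoring. Survival methods such as Cox regression account for both.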

The known risk factors identified included age, age at menopause, and testosterone. Five novel predictors were also identified, including measures of blood biochemistry, blood counts, and urine biomarkers.

The newly identified predictors were strongly associated with incident post-menopausal breast cancer. Further research is needed to determine whether they are potentially modifiable risk factors for the disease.

The XGBoost model selected a detailed body composition measure rather than body mass index (BMI), suggesting that precise body composition measures are important predictors of breast cancer. Basal metabolic rate was also a significant predictor. This contradicts a previous study that found no association between basal metabolic rate and breast cancer.
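The following example shows why BMI can be a coarse measure. BMI depends only on weight and height, so two people with identical BMI can carry very different amounts of body fat. The Python sketch below uses invented values for two hypothetical women. Fat mass of this kind is typically estimated with methods such as bioimpedance. The sketch does not reproduce the specific body composition measure the study's XGBoost model selected.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def body_fat_percent(fat_mass_kg: float, weight_kg: float) -> float:
    """Share of total body weight that is fat mass, in percent."""
    return 100.0 * fat_mass_kg / weight_kg

# Two hypothetical women with the same weight and height but different fat mass.
women = {"woman_1": (70.0, 1.65, 20.0), "woman_2": (70.0, 1.65, 30.0)}
for name, (weight, height, fat_mass) in women.items():
    print(name, round(bmi(weight, height), 1), round(body_fat_percent(fat_mass, weight), 1))
```

Both women have a BMI of about 25.7 kg/m². Their body fat is about 28.6% and 42.9%, respectively. BMI alone cannot show this difference.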

Plasma urea, a blood biomarker related to kidney function, was also associated with breast cancer. This is the first report of an association between breast cancer and plasma phosphate, sodium, or creatinine in urine.

The hypothesis-free (agnostic) ML models ranked the two polygenic risk scores as the strongest risk factors, and Cox regression confirmed that PRS are significant predictors of post-menopausal breast cancer.
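Cox regression results are usually reported as hazard ratios. The hazard ratio is the exponentiated coefficient, and its 95% confidence interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. By convention, a predictor is statistically significant at the 5% level when that interval excludes 1. The Python sketch below shows the conversion. The coefficient and standard error for a per-standard-deviation PRS effect are hypothetical, not the study's estimates.

```python
import math

def hazard_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    """Convert a Cox log-hazard coefficient and its standard error into a
    hazard ratio with a Wald confidence interval (z = 1.96 gives 95%)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical estimate for a standardized PRS (per 1 SD increase).
hr, lower, upper = hazard_ratio_with_ci(beta=0.45, se=0.02)
significant = lower > 1.0 or upper < 1.0  # interval excludes 1
print(f"HR {hr:.2f} (95% CI {lower:.2f} to {upper:.2f}), significant: {significant}")
```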

Conclusions

The current study identified five novel predictors with statistically significant associations with post-menopausal breast cancer, including urine biomarkers, blood counts, and blood biochemistry measures. Adding these five features to the baseline Cox model maintained its discrimination performance. Furthermore, SHAP values ranked the two pre-specified PRS as the most important features.
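Discrimination of survival models is commonly summarized with Harrell's concordance index (C-index). It is the share of comparable pairs in which the person diagnosed earlier had the higher predicted risk. The discrimination metric is not named here, so treating it as the C-index is an assumption. The Python sketch below is a simplified version that skips pairs with tied follow-up times. The toy data and the two hypothetical models' risk scores are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk score.
    A pair (i, j) is comparable when i had an event before j's follow-up ended."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored person cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: follow-up years, event indicator, predicted risks from two hypothetical models.
times = [2.0, 4.0, 6.0, 8.0, 10.0]
events = [1, 1, 0, 1, 0]
model_a_risk = [0.9, 0.5, 0.6, 0.3, 0.1]
model_b_risk = [0.9, 0.6, 0.5, 0.3, 0.1]
print(harrell_c_index(times, events, model_a_risk), harrell_c_index(times, events, model_b_risk))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. "Maintained" discrimination means the extended model's C-index stayed close to that of the baseline model.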

These findings motivate further research into using more precise anthropometric measures to improve breast cancer prediction. External validation of the results is the next important step before implementation in clinical practice.

Journal reference:

Liu, X., Morelli, D., Littlejohns, T. J., et al. (2023) Combining machine learning with Cox models to identify predictors for incident post-menopausal breast cancer in the UK Biobank. Scientific Reports 13. doi:10.1038/s41598-023-36214-0


Written by

Dr. Priyom Bose

Priyom holds a Ph.D. in Plant Biology and Biotechnology from the University of Madras, India. She is an active researcher and an experienced science writer. Priyom has also co-authored several original research articles that have been published in reputed peer-reviewed journals. She is also an avid reader and an amateur photographer.


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Bose, Priyom. (2023, June 11). Scientists use machine learning to unveil new predictors of post-menopausal breast cancer. News-Medical. Retrieved on February 10, 2026 from https://www.news-medical.net/news/20230611/Scientists-use-machine-learning-to-unveil-new-predictors-of-post-menopausal-breast-cancer.aspx.
MLA
Bose, Priyom. "Scientists use machine learning to unveil new predictors of post-menopausal breast cancer". News-Medical. 10 February 2026. <https://www.news-medical.net/news/20230611/Scientists-use-machine-learning-to-unveil-new-predictors-of-post-menopausal-breast-cancer.aspx>.
Chicago
Bose, Priyom. "Scientists use machine learning to unveil new predictors of post-menopausal breast cancer". News-Medical. https://www.news-medical.net/news/20230611/Scientists-use-machine-learning-to-unveil-new-predictors-of-post-menopausal-breast-cancer.aspx. (accessed February 10, 2026).
Harvard
Bose, Priyom. 2023. Scientists use machine learning to unveil new predictors of post-menopausal breast cancer. News-Medical, viewed 10 February 2026, https://www.news-medical.net/news/20230611/Scientists-use-machine-learning-to-unveil-new-predictors-of-post-menopausal-breast-cancer.aspx.
