In a recent study published in JAMA Network Open, researchers developed and compared four machine learning (ML) models on a dataset of more than 30,000 participants, yielding a novel ML model (named ‘AutMedAI’) capable of early autism spectrum disorder (ASD) detection from minimal background and medical information.
Study: Machine Learning Prediction of Autism Spectrum Disorder From a Minimal Set of Medical and Background Information.
Introduction
The findings identify the eXtreme Gradient Boosting (XGBoost) algorithm as the best-performing of the ML models tested.
Notably, the model significantly outperformed conventional questionnaires and previous artificial intelligence (AI) applications while requiring only a minimal set of 28 routinely collected background and medical variables for its predictions.
This study represents a promising first step toward early, routine ASD detection, which could spare patients and their families substantial socioeconomic stress and improve their future quality of life.
Background
Autism spectrum disorder (ASD, formerly ‘autism’) is an umbrella term for a diverse group of neurological and developmental conditions that affect communication, learning, and behavior and can significantly impair social interaction.
Despite decades of research, the diagnosis and treatment of ASD remain an ongoing clinical and psychiatric challenge. Reports estimate that 1% of people worldwide are affected, with prevalence approaching 3% in developed nations such as the United States (US).
ASD represents an overwhelming socioeconomic and mental health burden for both patients and their families.
While the mechanisms underpinning ASD are multifaceted and beyond the scope of this article, early detection followed by timely intervention remains the most effective route to improved population-wide outcomes.
Conventional gold-standard ASD screening relies on behavioral questionnaires, such as the Modified Checklist for Autism in Toddlers (M-CHAT), that are completed by children's caregivers.
While these questionnaires have substantially lowered the age at ASD detection, they are often highly detailed and require professional expertise and specialized testing.
Modern screening approaches seek to apply ML and other AI models to automate the process, thereby bypassing the need for professional guidance.
Unfortunately, these models have only been validated in research settings and appear to require extensive raw data for robust predictions.
About the study
In the present study, researchers aim to develop and validate a novel ML model capable of early ASD detection using only easy-to-acquire background medical and family history data.
Model development and training data were obtained from the Simons Foundation Powering Autism Research for Knowledge (SPARK) study (version 8 – June 2022), comprising 30,660 participants from 26 US states.
Inter-model performance comparisons and consensus model validation were carried out using data from SPARK version 10 (July 2023) and the Simons Simplex Collection (SSC) (n = 14,790).
Study data comprised (1) basic medical screening information, (2) background and family history, (3) demographic information, including participants’ race and ethnicity, and (4) questionnaire and assessment scores, namely the Child Behavior Checklist (CBCL), the Social Communication Questionnaire (SCQ), and the Full-Scale Intelligence Quotient (FSIQ). From these, 28 variables were chosen for their ease of acquisition and their applicability to children younger than 24 months.
“…selection was based on identifying easily obtainable, noninvasive, parent-reported information in the medical and background questionnaires. The selection of measures used a consensus-based approach prior to the development of the ML model. Twenty-eight variables were selected, of which 11 were present in the basic medical screening and 17 in the background history data.”
Researchers trained and tested four ML algorithms for performance and reliability: (1) logistic regression, (2) random forest, (3) decision tree, and (4) eXtreme Gradient Boosting (XGBoost). All models were built in Python with the scikit-learn library (XGBoost itself is a separate package that exposes a scikit-learn-compatible interface).
Model performance was evaluated primarily with the area under the receiver operating characteristic curve (AUROC), using the DeLong method, which estimates the variance of the AUROC, for statistical inference. Positive predictive value (PPV), F1 score, sensitivity, and specificity were also calculated.
Study findings
The training and validation cohort comprised 30,660 participants (15,330 each in the ASD and non-ASD subcohorts). Participants were 63.5% male (n = 19,477) and predominantly White (59.7%), with a mean age of 113 months.
A comparison of the four models showed that XGBoost performed best (AUROC = 0.895). A tuned version of this model was then used for prediction validation under the name ‘AutMedAI.’
In validation testing, AutMedAI correctly classified 78.9% (n = 9,417) of participants as having or not having ASD (AUROC = 0.790).
Researchers then computed SHapley Additive exPlanations (SHAP) values to identify the features contributing most to the model’s predictive power.
“…features like problems with eating foods, age at first use of short phrases or sentences including an action word, age at first construction of longer sentences, age at achieving bowel training, and age at first smile emerge as the most significant predictors, as evidenced by their high SHAP values.”
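SHAP values are Shapley values from cooperative game theory: each feature is credited with its average marginal contribution to a prediction across all possible subsets of the other features. The `shap` library computes them efficiently (for tree ensembles such as XGBoost, via TreeSHAP). The brute-force sketch below instead enumerates every feature subset, which is feasible only for a few features. The three binary features and the coefficients of `toy_risk_score` are hypothetical, loosely named after the predictors quoted above. Absent features take a baseline value, which is one of several possible conventions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a baseline input.

    Features outside a coalition take their baseline value. Runtime grows as
    2**len(x), so this brute-force version only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their observed values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical binary, parent-reported features (illustrative only).
FEATURES = ["eating_problems", "late_first_phrases", "late_first_smile"]


def toy_risk_score(z):
    # Made-up coefficients, including an interaction between the first two features.
    return 0.1 + 0.3 * z[0] + 0.2 * z[1] + 0.1 * z[2] + 0.2 * z[0] * z[1]


x = [1, 1, 0]         # hypothetical child being explained
baseline = [0, 0, 0]  # reference input
phi = shapley_values(toy_risk_score, x, baseline)
for name, v in zip(FEATURES, phi):
    print(f"{name}: {v:+.3f}")
# prints:
# eating_problems: +0.400
# late_first_phrases: +0.300
# late_first_smile: +0.000
```

Because `late_first_smile` equals its baseline for this child, it receives no credit. The interaction term is split evenly between the first two features. The three values sum to `toy_risk_score(x) - toy_risk_score(baseline)` (0.7), the "efficiency" property that makes SHAP values additive explanations of a single prediction.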
Conclusions
Herein, researchers develop, test, and validate AutMedAI, a novel ML model for the early detection and screening of children at high risk of ASD, using US-based development and validation cohorts comprising more than 45,000 participants in total.
Current ASD screening approaches rely on specialized behavioral tests administered by trained professionals.
AutMedAI aims to remove these requirements by achieving high predictive power using only routinely collected childcare and family background medical data.
The findings show that the model succeeded: its predictive power was comparable to that of current gold-standard ASD questionnaires, without the need for specialized behavioral testing.
Together, AutMedAI and similar next-generation ML platforms represent a first step toward significantly reducing the mental health and socioeconomic impacts of ASD on patients and their families.