In a recent study published in npj Antimicrobials and Resistance, researchers developed and validated interpretable machine learning (ML) algorithms that predict antibiotic resistance in complicated urinary tract infections (UTIs), enhancing clinical decision-making and promoting personalized treatments.
Study: Interpretable machine learning-based decision support for prediction of antibiotic resistance for complicated urinary tract infections. Image Credit: LALAKA/Shutterstock.com
Background
The rise in antimicrobial resistance (AMR) is endangering the effectiveness of antibiotic treatments, leading to possible therapy failures. While new antibiotics are crucial, their development is hampered by high costs, regulatory constraints, and reduced investments from major pharmaceutical firms.
These obstacles limit antibiotic research, and excessive reliance on broad-spectrum therapies in response to AMR can fuel further resistance.
Given the growing threat of AMR and reduced antibiotic research and development (R&D) by major pharma firms, further research is needed into tools such as interpretable ML that can predict resistance and steer effective treatments for ailments like UTIs.
About the study
The models were developed using the AMR-UTI dataset, a public resource containing information from over 80,000 UTI patients from Massachusetts General Hospital (MGH) and Brigham & Women’s Hospital (BWH) between 2007 and 2016.
This dataset primarily focused on patients with potentially complicated UTIs, totaling 101,096 samples, and included those who did not meet the criteria of a previous study that concentrated on uncomplicated UTIs.
The samples in this study represented a diverse patient group with various infection complexities requiring different antibiotic treatments.
For consistency with prior research, the researchers adopted a similar data structure and filtering technique. Each data point consisted of a urine specimen analyzed for AMR, with details such as the antimicrobial susceptibility profile, features of past specimens for AMR prediction, and basic patient information.
The raw susceptibility results fell into three categories: susceptible (S), intermediate (I), and resistant (R), with both the intermediate and resistant categories treated as resistant. Electronic health records (EHR) provided further patient details, including previous antibiotic usage, infections, procedures, and other relevant clinical data. However, details about dosage, treatment duration, and patient encounters outside the two hospitals were not included.
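The labeling rule described above, where intermediate and resistant results are both counted as resistant, can be illustrated with a minimal sketch. The function name and the 0/1 encoding are assumptions made for illustration and are not taken from the study's code.

```python
def binarize_susceptibility(result):
    """Map a raw S/I/R result to 0 (susceptible) or 1 (resistant).

    Hypothetical helper: the 0/1 encoding is an assumption for illustration.
    """
    code = result.strip().upper()
    if code == "S":
        return 0
    if code in ("I", "R"):
        # Intermediate results are grouped with resistant ones.
        return 1
    raise ValueError(f"unexpected susceptibility code: {result!r}")


labels = [binarize_susceptibility(r) for r in ["S", "I", "R", "s"]]
```

Grouping intermediate with resistant is the conservative choice: a borderline result is treated as a reason not to rely on that antibiotic.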
All categorical variables in the dataset were converted into a format suitable for computational processing, resulting in 787 features. Most features were binary, and missing data were represented by a zero.
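One common way to turn categorical variables into binary features is one-hot encoding, sketched below. This is an illustration of the general idea, not the dataset's actual preprocessing; the column names and example records are hypothetical.

```python
def one_hot_encode(records, columns):
    """One-hot encode categorical columns; a missing value yields all zeros."""
    # Collect the categories observed in each column, ignoring missing values.
    categories = {
        col: sorted({rec.get(col) for rec in records if rec.get(col) is not None})
        for col in columns
    }
    feature_names = [f"{col}={cat}" for col in columns for cat in categories[col]]
    rows = [
        [1 if rec.get(col) == cat else 0 for col in columns for cat in categories[col]]
        for rec in records
    ]
    return feature_names, rows


# Hypothetical records; None marks a missing value.
records = [
    {"prior_organism": "E. coli", "care_setting": "ER"},
    {"prior_organism": None, "care_setting": "inpatient"},
]
feature_names, rows = one_hot_encode(records, ["prior_organism", "care_setting"])
```

Note that encoding missing data as zero makes "not recorded" indistinguishable from "absent", which is a known limitation of this representation.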
The dataset was divided based on time for model training and testing: data from 2007-2013 for training and 2014-2016 for testing. Additionally, the dataset employed a binary classification for race, either “white” or “non-white,” though this approach was recognized as potentially perpetuating biases.
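A time-based split of this kind can be sketched as follows. The year boundaries come from the description above; the `year` field name and the sample records are hypothetical.

```python
def temporal_split(samples, last_training_year=2013):
    """Split samples by collection year instead of at random."""
    train = [s for s in samples if s["year"] <= last_training_year]
    test = [s for s in samples if s["year"] > last_training_year]
    return train, test


# Hypothetical specimens carrying only an id and a collection year.
samples = [
    {"id": 1, "year": 2007},
    {"id": 2, "year": 2013},
    {"id": 3, "year": 2014},
    {"id": 4, "year": 2016},
]
train, test = temporal_split(samples)
```

Splitting by time rather than at random mimics real deployment, where a model trained on past specimens must predict resistance for future ones.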
The dataset did not indicate whether patients had conditions such as asymptomatic bacteriuria (ASB), which could influence the study's outcomes.
The team used various ML models to predict resistance to specific antibiotics, comparing their performance using metrics like sensitivity, specificity, and area under the curve. To ensure optimal performance, the models underwent hyperparameter optimization and threshold adjustment.
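The metrics named above can be computed from predicted resistance scores as in the sketch below. The threshold rule shown (highest specificity subject to a minimum sensitivity) is an assumption for illustration; the article does not state how the study adjusted its thresholds, and the scores are made up.

```python
def sensitivity_specificity(y_true, y_score, threshold):
    """Return (sensitivity, specificity) when scores >= threshold mean resistant."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pick_threshold(y_true, y_score, min_sensitivity):
    """Assumed rule: best specificity among thresholds meeting the sensitivity target."""
    best = None
    for t in sorted(set(y_score)):
        sens, spec = sensitivity_specificity(y_true, y_score, t)
        if sens >= min_sensitivity and (best is None or spec > best[2]):
            best = (t, sens, spec)
    return best


# Made-up labels (1 = resistant) and model scores.
y_true = [0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
auc = roc_auc(y_true, y_score)
best = pick_threshold(y_true, y_score, min_sensitivity=1.0)
```

Requiring high sensitivity reflects the clinical cost of missing a resistant infection; the price is lower specificity, meaning some effective antibiotics are flagged unnecessarily.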
Study results
In the present study, patient cohorts, including training, validation, and test sets, had a median age of 64, with about 72.9% identifying as white. This contrasted with an uncomplicated UTI cohort, consisting solely of females with a median age of 32.
The gender data for complicated UTI patients was missing, and more patients from the complicated UTI test group visited the emergency room.
Resistance to antibiotics like fluoroquinolones matched 2012 United States (U.S.) estimates for patients with no recent history of drug-resistant infections. Predictive models like TabNet and XGBoost were trained on 2007-2013 data and tested on 2014-2016 diagnoses.
The models showed better predictive accuracy for second-line antibiotics than for first-line ones. XGBoost excelled in resistance prediction among all models, while TabNet's performance was notably enhanced when it was pre-trained with self-supervised techniques. The models' effectiveness was further affirmed by consistent results from validation on an independent cohort.
Due to uncertainties around the documentation of race and ethnicity, the researchers ran an additional experiment that omitted this feature and used the XGBoost model. The outcomes of this experiment were consistent with those of the original models, which included race and ethnicity.
Encouragingly, these findings highlight the potential of using models to discern antibiotic resistance in complicated UTI specimens at the individual patient level. The models also displayed adaptability when applied to uncomplicated UTI specimens.
Furthermore, the models could provide insights into factors crucial for determining resistance. All models consistently highlighted prior antibiotic resistance and exposure as critical determinants. Factors like previous UTIs, especially if pathogens like E. coli were detected, also indicated resistance.
Additionally, comorbid conditions such as paralysis and renal issues were prominent across all antibiotics and models. The research also revealed that certain features significantly impacted predictive accuracy, with prior antibiotic resistance emerging as the most influential.