Using machine learning, researchers have developed a way to predict the total number of microbes in our gut from sequencing data, revealing that microbial density, influenced by factors like age and diet, is a major contributor to gut microbiome variation and could reshape how we study disease connections.
In a recent study published in Cell, a team of researchers investigated the relationship between microbial load in fecal samples and variations in the gut microbiome.
Using a machine-learning approach, they predicted microbial loads in fecal samples from relative abundance data alone. The study found that microbial load significantly affected microbiome diversity and was a major confounding factor in studies examining microbiome-disease associations.
Background
The gut microbiome has a major influence on human health, as its composition is linked to various physiological processes and diseases. Researchers have widely used metagenomics to study microbial communities by examining the relative abundances of species within the microbiome. However, this relative data lacks information on microbial load or the total microbial count, which can impact microbiome diversity and function.
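A small worked example may help show why relative abundances alone cannot reveal microbial load. The sketch below uses made-up cell counts and hypothetical sample names (none of these numbers come from the study): two samples differ tenfold in total load yet yield identical relative profiles.

```python
# Hypothetical cell counts per gram of stool; illustrative values only.
sample_a = {
    "Bacteroides": 60_000_000_000,
    "Prevotella": 30_000_000_000,
    "Escherichia": 10_000_000_000,
}
# Sample B has one tenth as many cells of every species.
sample_b = {species: count // 10 for species, count in sample_a.items()}


def relative_abundance(counts):
    """Convert absolute counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {species: count / total for species, count in counts.items()}


def microbial_load(counts):
    """Total cell count across all species."""
    return sum(counts.values())


rel_a = relative_abundance(sample_a)
rel_b = relative_abundance(sample_b)

# The relative profiles match even though the loads differ tenfold.
same_profile = all(abs(rel_a[s] - rel_b[s]) < 1e-12 for s in rel_a)
load_ratio = microbial_load(sample_a) / microbial_load(sample_b)

print("Relative profiles identical:", same_profile)
print("Load ratio (A / B):", load_ratio)
```

A sequencing run that reports only the fractions would treat these two samples as equivalent, which is the information gap the study set out to close.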
Traditional approaches, such as cell counting and quantitative polymerase chain reaction (qPCR), can quantify microbial load but are often labor-intensive and not feasible for large studies. Without microbial load data, metagenomic studies risk producing biased or incomplete interpretations, as the microbial load can influence observed species ratios and impact the correlations with disease or other health conditions.
Furthermore, although previous studies have identified microbial shifts in diseases such as inflammatory bowel disease and obesity, the confounding influence of microbial load is rarely considered and could potentially skew these associations.
About the study
In the present study, researchers employed a machine-learning approach to predict microbial load from gut microbiome data, utilizing large metagenomic datasets from two primary cohorts: one consisting of a heterogeneous study population that included healthy individuals as well as patients with end-stage liver disease, and the other comprising healthy individuals and patients with cardiometabolic diseases.
Fecal samples from these two cohorts were analyzed using flow cytometry to obtain microbial load data. To develop a predictive model, the relative abundances of microbial species were transformed, and the minor species were filtered out. The researchers also performed hyperparameter tuning using grid search to minimize root-mean-square error, ensuring robust model performance.
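The preprocessing and tuning steps described above can be sketched in a few lines. This is not the study's pipeline: the article does not name the model, the transformation, or the filtering threshold, so the sketch assumes a log10 transform with a pseudocount, a mean-abundance cutoff, and a simple k-nearest-neighbors regressor chosen only to keep the example dependency-free. The data are synthetic and all function names and parameter values are hypothetical.

```python
import itertools
import math
import random


def filter_minor_species(samples, min_mean=0.01):
    """Drop species (columns) whose mean relative abundance is below min_mean."""
    n = len(samples)
    keep = [j for j in range(len(samples[0]))
            if sum(row[j] for row in samples) / n >= min_mean]
    return [[row[j] for j in keep] for row in samples], keep


def log_transform(samples, pseudocount):
    """log10-transform abundances; the pseudocount avoids log(0)."""
    return [[math.log10(v + pseudocount) for v in row] for row in samples]


def knn_predict(train_x, train_y, query, k):
    """Predict the mean target of the k nearest training samples."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k


def cv_rmse(x, y, k, n_folds=5):
    """Root-mean-square error from n-fold cross-validation."""
    sq_err = 0.0
    for i, query in enumerate(x):
        # Train on every sample that is not in the query's fold.
        train = [j for j in range(len(x)) if j % n_folds != i % n_folds]
        pred = knn_predict([x[j] for j in train], [y[j] for j in train], query, k)
        sq_err += (pred - y[i]) ** 2
    return math.sqrt(sq_err / len(x))


def grid_search(rel_abund, log_load, grid):
    """Try every hyperparameter combination and keep the lowest RMSE."""
    best = None
    for k, pseudocount in itertools.product(grid["k"], grid["pseudocount"]):
        score = cv_rmse(log_transform(rel_abund, pseudocount), log_load, k)
        if best is None or score < best[1]:
            best = ({"k": k, "pseudocount": pseudocount}, score)
    return best


# Synthetic data: 60 samples, 4 common species and 2 very rare ones.
rng = random.Random(0)
raw = [[rng.random() for _ in range(4)] + [rng.random() * 1e-3 for _ in range(2)]
       for _ in range(60)]
rel_abund = [[v / sum(row) for v in row] for row in raw]
# Assumed relationship: log10 load rises with the first species' share.
log_load = [10.0 + 1.5 * row[0] + rng.gauss(0, 0.05) for row in rel_abund]

filtered, kept = filter_minor_species(rel_abund)
grid = {"k": [1, 3, 5, 9], "pseudocount": [1e-4, 1e-2]}
best_params, best_rmse = grid_search(filtered, log_load, grid)
print("Species kept:", kept)
print("Best hyperparameters:", best_params, "CV RMSE:", round(best_rmse, 3))
```

In practice a library implementation of grid search with a stronger regressor would replace the hand-rolled loop; the point here is only the order of operations: filter, transform, then select hyperparameters by cross-validated RMSE.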
To validate the model, the researchers applied it across both datasets and examined the correlations between the predicted and actual microbial loads. Additional validation involved testing the model on external datasets with paired 16S ribosomal ribonucleic acid (rRNA) gene sequencing data to verify that the predictions remained consistent across different microbiome profiling techniques.
In parallel, the study also explored the technical impact of deoxyribonucleic acid (DNA) extraction and sequencing methods on microbial load predictions by comparing paired samples processed through different protocols. Statistical analysis assessed the influence of predicted microbial load on disease associations and microbial diversity, adjusting for confounding factors such as antibiotic use and demographic variables.
Results
The study found that microbial load plays a substantial role in shaping the gut microbiome and significantly influences disease associations. The predicted microbial loads were shown to vary considerably across individuals and were driven by factors such as age, diet, and health conditions. Furthermore, higher microbial loads were associated with slower gut transit times, which also impacted the microbial diversity and composition.
The machine-learning model accurately predicted microbial load across datasets and remained robust on both cohorts as well as on the external validation datasets.
Additionally, the analyses revealed that several diseases are associated with distinct microbial load patterns. For example, conditions such as Crohn's disease and liver cirrhosis showed lower microbial loads, while diseases such as multiple sclerosis and colorectal cancer exhibited higher loads. These differences implied that microbial load may also be the underlying cause of some of the microbial community shifts observed in these diseases, independent of specific microbial species associations.
Furthermore, by adjusting for microbial load, the study revealed that many previously reported disease-microbe associations lose significance, suggesting that microbial load acts as a confounding factor in microbiome-disease studies.
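To illustrate how an association can disappear once load is accounted for, the sketch below uses partial correlation, which is one simple way to adjust for a covariate and is not necessarily the method used in the study. The data are synthetic: a hypothetical species tracks microbial load, and a hypothetical disease is more likely at low load, but the species has no direct link to the disease.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic cohort (illustrative only, not data from the study).
rng = random.Random(42)
n = 1000
log_load = [rng.gauss(0, 1) for _ in range(n)]            # standardized load
species = [z + rng.gauss(0, 0.5) for z in log_load]       # driven by load only
disease = [1.0 if -z + rng.gauss(0, 0.5) > 0 else 0.0     # likelier at low load
           for z in log_load]

raw_r = pearson(species, disease)
adjusted_r = partial_corr(species, disease, log_load)
print("Unadjusted species-disease correlation:", round(raw_r, 3))
print("Load-adjusted partial correlation:", round(adjusted_r, 3))
```

The unadjusted correlation is strongly negative, while the load-adjusted value sits near zero, mirroring the pattern in which a reported disease-microbe association loses significance after adjustment.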
The researchers also found that the microbial species consistently associated with diseases were themselves associated with high or low microbial loads. This suggested that microbial load adjustments are vital for accurate disease biomarker development, and that ignoring load-related effects could lead to misleading conclusions about disease-specific microbiome changes.
Conclusions
To conclude, the study highlighted the role of microbial load as a critical determinant of microbiome structure and a confounder in disease association studies. Furthermore, the findings suggested that accounting for microbial load could improve research accuracy, provide more nuanced insights into microbiome-disease relationships, and help develop better gut health treatments.
Journal reference:
- Nishijima, S., Stankevic, E., Aasmets, O., Schmidt, T.S.B., Nagata, N., Keller, M.I., Ferretti, P., Juel, H.B... et al. (2024). Fecal microbial load is a major determinant of gut microbiome variation and a confounder for disease associations. Cell. doi:10.1016/j.cell.2024.10.022.