In a recent study published in npj Science of Food, researchers developed VirtuousMultiTaste, a machine-learning tool that distinguishes between sweet, umami, and bitter tastes based on a compound's molecular structure and underlying physicochemical characteristics.
Background
Taste and smell are crucial to the chemosensory perception of food, influencing meal selection and consumption. Taste perception relies on five fundamental sensations to regulate nutrient intake and avoid the ingestion of toxic substances: sweet, bitter, umami, salty, and sour. Understanding the physicochemical properties of food components is therefore critical to understanding flavor and intake.
Machine learning algorithms have already been used to classify the tastes of chemical compounds; however, there is still room for improvement in constructing multi-class models that can predict the complete spectrum of fundamental tastes, a gap that limits food science and technology.
About the study
In the present study, researchers used machine learning and heuristic optimization approaches to predict the taste of chemical compounds.
The study dataset was a collection of publicly available compounds with validated tastes, grouped into nine taste groups. The initial database had 5,290 chemicals for sweet and bitter tastes and 2,549 for umami. The final dataset comprised 4,717 chemicals, a random selection of which was used for training. The researchers oversampled the umami class with 133 samples using the Adaptive Boosting (AdaBoost) method as an extra pre-processing step.
The researchers used Principal Component Analysis (PCA) to assess the molecular descriptors and identified 1,306 that differed significantly, which were carried forward to dimensionality reduction. Autocorrelation of a Topological Structure (ATS) was the most common descriptor class among the 15 features finally selected.
The researchers used an ensemble dimensionality reduction approach driven by Pareto-based optimization algorithms to improve prediction accuracy, reduce the number of selected features, and simplify the classifier. The optimization targets included accuracy (ACC), F1 score, F2 score, precision (PRC), recall (REC), the area under the receiver operating characteristic curve (AUC), Manhattan distance, minimization of the number of selected features, and minimization of model size (number of trees, or number of support vectors).
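To make the idea of Pareto-based selection concrete, here is a minimal sketch. It is not the study's optimizer: it assumes each candidate model is summarized by just two of the objectives listed above, accuracy (to maximize) and number of selected features (to minimize), and the candidate names and scores are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str          # hypothetical model label
    accuracy: float    # objective to maximize
    n_features: int    # objective to minimize


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_features <= b.n_features
    strictly_better = a.accuracy > b.accuracy or a.n_features < b.n_features
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores, for illustration only.
models = [
    Candidate("A", 0.78, 40),
    Candidate("B", 0.77, 15),
    Candidate("C", 0.74, 15),  # dominated by B: same size, lower accuracy
    Candidate("D", 0.70, 8),
    Candidate("E", 0.76, 45),  # dominated by A: larger and less accurate
]
front = pareto_front(models)
print([c.name for c in front])  # ['A', 'B', 'D']
```

None of the surviving candidates is better than another on both objectives at once, so a final choice among them still requires a preference, for example favoring the model with fewer features.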
The researchers used random forest (RF) classifiers, which outperformed support vector machines (SVM) across the different objectives. They compared 20 RF models and chose the best one based on its performance and minimal number of features. They used 10-fold cross-validation (CV) on the training dataset. The autocorrelation descriptors were computed as Moreau-Broto autocorrelations weighted by Allred-Rochow, Pauling, and Sanderson electronegativity, mass, Gasteiger charge, atomic number, ionization potential, polarizability, and intrinsic state.
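For readers unfamiliar with Moreau-Broto autocorrelation, the sketch below shows one common form of the calculation: for a given lag, sum the product of an atomic property over all atom pairs separated by exactly that many bonds. This is an illustration under stated assumptions, not the descriptor software used in the study. Conventions differ between packages (for example, whether pairs are counted once or twice, or whether a log transform is applied), and the toy molecule, the function names, and the choice of atomic number as the weight are all assumptions made here.

```python
from collections import deque


def topological_distances(adjacency: dict[int, list[int]], start: int) -> dict[int, int]:
    """Breadth-first search: bonds on the shortest path from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def moreau_broto(adjacency: dict[int, list[int]], weights: dict[int, float], lag: int) -> float:
    """Sum w_i * w_j over unordered atom pairs at topological distance `lag`.

    For lag 0 this reduces to the sum of squared weights.
    """
    atoms = sorted(adjacency)
    total = 0.0
    for i in atoms:
        dist = topological_distances(adjacency, i)
        for j in atoms:
            if j >= i and dist.get(j) == lag:
                total += weights[i] * weights[j]
    return total


# Toy example: heavy-atom skeleton of ethanol, C-C-O, weighted by atomic number.
ethanol_bonds = {0: [1], 1: [0, 2], 2: [1]}
atomic_number = {0: 6, 1: 6, 2: 8}
ats = [moreau_broto(ethanol_bonds, atomic_number, lag) for lag in range(3)]
print(ats)  # [136.0, 84.0, 48.0]
```

Swapping the weight dictionary for another atomic property, such as mass or electronegativity, yields the other weighted variants of the same descriptor.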
The researchers evaluated model performance against external food and natural product databases such as FooDB, FlavorDB, PhenolExplorer, Natural Product Atlas, and PhytoHub. They compared coffee and chocolate based on their proportional content in FooDB. They also evaluated the model against frequently used machine learning algorithms and pipelines.
The researchers assessed model applicability by comparing the similarity between tested substances and chemicals used during training. They used Morgan Fingerprints and the Tanimoto Similarity Index to obtain average similarity scores between test and training compounds and compared VirtuousMultiTaste to previously developed VirtuousBitterSweet and VirtuousUmami taste predictors.
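The similarity check described above can be sketched as follows. This is a simplified illustration, not the study's code: it assumes each fingerprint is already available as a set of on-bit indices (in practice these would come from a cheminformatics toolkit), and it assumes the "average similarity" is the mean Tanimoto score of a test compound against every training compound. The toy bit sets and function names are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits present in either set."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def mean_similarity_to_training(test_fp: set[int], training_fps: list[set[int]]) -> float:
    """Average Tanimoto similarity between one test compound and all training compounds."""
    return sum(tanimoto(test_fp, fp) for fp in training_fps) / len(training_fps)


# Toy fingerprints as sets of on-bit indices (made up for illustration).
training = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3, 5}]
query = {1, 4, 9}

scores = [tanimoto(query, fp) for fp in training]
print(scores)  # [0.75, 0.5, 0.0]
print(mean_similarity_to_training(query, training))
```

A low average score flags a test compound that sits far from the training chemistry, which is where predictions are usually least trustworthy.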
Results
In cross-validation, the chosen RF model scored an AUC value of 0.92, 77% accuracy, and 77% recall. The test set showed an AUC of 0.87, with 79% accuracy and 72% recall. The umami flavor had the highest AUC values (0.98), followed by the bitter taste (0.92) and the 'other' taste group (0.86). The VirtuousMultiTaste model performed somewhat better in predicting bitter taste, with accuracy, precision, recall, and F1 and F2 values of approximately 83%. Coffee had a predominantly bitter taste profile, with 130 projected bitter chemicals, whereas chocolate had 96 bitter compounds, 33 sweet compounds, four umami compounds, and 13 other taste compounds.
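The F1 and F2 scores reported above are both instances of the F-beta score, which combines precision and recall and weights recall beta times as heavily as precision. The sketch below uses the standard formula; the precision and recall values in the example are hypothetical, not figures from the study.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical classifier: precise but missing half of the true positives.
precision, recall = 0.8, 0.5
f1 = f_beta(precision, recall, beta=1)  # balances precision and recall
f2 = f_beta(precision, recall, beta=2)  # penalizes the low recall more
print(round(f1, 3), round(f2, 3))

# When precision equals recall, every F-beta score equals that shared value.
balanced = f_beta(0.83, 0.83, beta=2)
```

The last line shows why near-identical precision, recall, F1, and F2 values tend to appear together: if precision and recall are equal, the F-beta score equals them for any beta.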
VirtuousMultiTaste outperformed the other classifiers on the performance metrics. Compared with VirtuousUmami, VirtuousMultiTaste had comparable accuracy and AUC but somewhat lower precision, recall, F1, and F2 values. Both methods achieved over 99% accuracy in evaluations using non-umami chemicals other than those used in training. VirtuousMultiTaste can predict umami taste for general compounds rather than only peptides, allowing for a broader chemical study. Model performance remained consistent across the similarity quartiles, indicating wide applicability.
Based on the study findings, the VirtuousMultiTaste machine-learning tool can swiftly screen chemical databases for candidate compounds with predicted taste qualities. It has demonstrated an excellent ability to predict several taste sensations at once, suggesting that it could be integrated into broader studies of multisensory perception. The tool predicts four taste classes (sweet, bitter, umami, and other) and supports the analysis of diverse chemicals and of the physicochemical properties that shape overall taste perception.
However, intuitively understanding the chemical and physical properties of the tastants from the 15 selected descriptors is difficult. Future research should focus on simpler descriptors or develop dedicated approaches for correlating molecular descriptors with structural characteristics or functional groups.
Journal reference:
- Androutsos, L., Pallante, L., Bompotas, A. et al. Predicting multiple taste sensations with a multiobjective machine learning method. npj Sci Food 8, 47 (2024). DOI: 10.1038/s41538-024-00287-6