In a recent study published in Nature Medicine, researchers developed a method for rapidly gathering and integrating clinical diagnosis (CD) and neuropathological diagnosis (ND) data by analyzing medical record summaries from Netherlands Brain Bank (NBB) donors to identify disease trajectories.
Study: Identification of clinical disease trajectories in neurodegenerative disorders with natural language processing.
Background
Neurodegenerative disorders, such as Alzheimer's disease (AD), Parkinson's disease (PD), and dementia with Lewy bodies, are a worldwide health issue due to their wide range of clinical symptoms and complicated comorbidities.
Current research struggles to acquire complete clinical data, which limits statistical designs. Innovative data-driven strategies that use large autopsy cohorts are required to improve diagnosis.
Brain banks provide vital information on neurodegenerative illnesses, but drawbacks such as limited clinical data and binary case-control designs impede progress.
About the study
In the present study, researchers created a computational pipeline to translate medical record summaries from NBB donors into clinical illness trajectories, which included 84 neuropsychiatric symptoms and signs identified using natural language processing.
They scanned NBB donor files, defined and predicted clinical features in the recorded history, translated predicted symptoms and signs into clinical illness trajectories, and applied them for downstream analysis.
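As a rough illustration of the translation step, sentence-level predictions can be grouped by donor and by year to form a trajectory. The Python sketch below is not the authors' pipeline; the function name `build_trajectories`, the `(donor_id, year, labels)` tuple layout, and the example labels are all assumptions made for illustration.

```python
from collections import defaultdict


def build_trajectories(sentence_predictions):
    """Group sentence-level predictions into per-donor, per-year symptom sets.

    sentence_predictions: iterable of (donor_id, year, labels) tuples, where
    labels is a set of predicted symptom names (hypothetical layout).
    """
    trajectories = defaultdict(lambda: defaultdict(set))
    for donor_id, year, labels in sentence_predictions:
        trajectories[donor_id][year].update(labels)
    # Return plain dicts with the years in chronological order.
    return {
        donor: dict(sorted(years.items()))
        for donor, years in trajectories.items()
    }


# Toy input (hypothetical donors and labels, not NBB data).
predictions = [
    ("donor_1", 2001, {"fatigue"}),
    ("donor_1", 1999, {"memory_impairment"}),
    ("donor_1", 2001, {"paranoia"}),
    ("donor_2", 2010, set()),
]
trajectories = build_trajectories(predictions)
```

Each donor then maps to an ordered sequence of years, each with the set of symptoms mentioned that year, which is one simple shape for downstream temporal analysis.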
The researchers developed a novel cross-disorder clinical classification system including 90 neuropsychiatric symptoms and signs related to brain illnesses and general well-being. One scorer labeled 18,917 sentences from a randomly selected group of 293 donors to build a dataset for refining, validating, and testing various natural language processing (NLP) models.
The researchers optimized five model architectures [bag of words (BOW), support vector machine (SVM), T5, PubMedBERT, and Bio_ClinicalBERT] and chose the best one based on micro-precision.
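To make the selection criterion concrete: micro-averaged precision pools true and false positives across all labels before dividing, so frequently predicted symptoms weigh more than rare ones. The Python sketch below assumes multi-label predictions stored as one set of label names per sentence; the function name and the toy labels are illustrative and not taken from the study.

```python
def micro_precision(true_labels, predicted_labels):
    """Micro-averaged precision for multi-label predictions.

    Both arguments are lists of sets, one set of label names per sentence.
    """
    true_positives = 0
    false_positives = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        true_positives += len(predicted & truth)   # predicted and correct
        false_positives += len(predicted - truth)  # predicted but wrong
    total_predicted = true_positives + false_positives
    # Convention assumed here: return 0.0 when nothing was predicted.
    return true_positives / total_predicted if total_predicted else 0.0


# Toy data (hypothetical labels, not from the NBB dataset).
truth = [{"memory_impairment"}, {"fatigue", "muscle_weakness"}, set()]
preds = [{"memory_impairment", "paranoia"}, {"fatigue"}, {"fatigue"}]
score = micro_precision(truth, preds)  # 2 correct out of 4 predicted -> 0.5
```

A model that misses `muscle_weakness` in the second sentence is not penalized here, because precision only counts labels the model actually predicted.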
The team built clinical illness trajectories that captured numerous neuropsychiatric symptoms and signs along with their timing, and that covered more donors than previously published datasets. They then ran an enrichment analysis to test whether each predicted clinical feature was more prevalent in a given disease than expected.
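The article does not say which statistical test the enrichment assessment used. As one plausible way to ask "is this symptom more common in this disease than expected?", the Python sketch below assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_with_symptom, n_in_disease, n_overlap):
    """One-sided hypergeometric p-value.

    Probability of seeing at least n_overlap donors with the symptom among
    n_in_disease donors, if symptoms were distributed independently of disease.
    """
    max_overlap = min(n_with_symptom, n_in_disease)
    denominator = comb(n_total, n_in_disease)
    tail = sum(
        comb(n_with_symptom, k)
        * comb(n_total - n_with_symptom, n_in_disease - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / denominator


# Toy counts (hypothetical): 10 donors, 4 with the symptom, 5 with the
# disease, and all 4 symptomatic donors have the disease.
p_value = enrichment_p_value(
    n_total=10, n_with_symptom=4, n_in_disease=5, n_overlap=4
)
```

With many symptom and diagnosis combinations tested, such p-values would normally also need a multiple-testing correction, which this sketch leaves out.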
To assess the diagnostic accuracy of this brain autopsy cohort, the researchers cleaned the CD descriptions, matched them to the Human Disease Ontology, and compared the resulting clinical diagnosis labels with the neuropathological diagnoses.
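The comparison between clinical and neuropathological labels can be pictured as a per-diagnosis concordance rate. The Python sketch below assumes both labels have already been mapped to a shared vocabulary, so that an exact string match counts as agreement; the function name and the toy pairs are illustrative only.

```python
from collections import Counter


def concordance_by_nd(pairs):
    """Fraction of donors whose clinical diagnosis matches, per ND.

    pairs: iterable of (clinical_dx, neuropath_dx) label pairs.
    """
    totals = Counter()
    matches = Counter()
    for clinical_dx, neuropath_dx in pairs:
        totals[neuropath_dx] += 1
        if clinical_dx == neuropath_dx:
            matches[neuropath_dx] += 1
    return {nd: matches[nd] / totals[nd] for nd in totals}


# Toy pairs (hypothetical, not NBB data): (clinical, neuropathological).
pairs = [
    ("AD", "AD"), ("AD", "AD"), ("FTD", "AD"), ("AD", "AD"),
    ("PD", "MSA"), ("MSA", "MSA"),
]
rates = concordance_by_nd(pairs)
```

Grouping by the neuropathological label treats autopsy as the reference standard, so each rate reads as "of donors confirmed to have this disease, how many were given that diagnosis in life".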
The researchers also applied machine-learning models to test whether neuropathological diagnoses could be predicted reliably from clinical illness trajectories.
They included 3,042 donors whose clinical histories comprised 199,901 sentences and who were diagnosed with various neuropathologically defined brain disorders.
The team chose symptoms and signs based on their medical-scientific importance, existence in the clinical history, and definition clarity.
The team used a gated recurrent unit with decay (GRU-D) model to assess how accurately ND could be predicted from clinical illness trajectories, with attention to the apolipoprotein E4 genotype, which is associated with early AD and severe neurodegeneration.
The team used clinical illness trajectories to conduct temporal profiling of specific neuropsychiatric signs and symptoms across various disorders.
They also performed a survival analysis to determine whether there were differences in the overall survival rate after the first observation of a sign or symptom between donors with different neuropathological diagnoses.
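The article does not name the survival estimator used. A common choice for this kind of question is the Kaplan-Meier estimator, sketched below in plain Python as an assumption; the function name and the toy durations are hypothetical. In an autopsy cohort the date of death is presumably known for every donor, so `observed` would be `True` throughout, but censoring is supported for generality.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: years from first observation of a symptom to death or censoring.
    observed: True where death was observed, False where the donor was censored.
    Returns a list of (time, survival_probability) at each time with a death.
    """
    events = sorted(zip(durations, observed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        time = events[i][0]
        deaths = 0
        leaving = 0
        # Process all donors sharing the same time together.
        while i < len(events) and events[i][0] == time:
            deaths += events[i][1]  # True counts as 1, False as 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve


# Toy data (hypothetical): one curve per neuropathological diagnosis.
curve_a = kaplan_meier([2, 3, 3, 5], [True, True, False, True])
curve_b = kaplan_meier([6, 8, 9], [True, True, True])
```

Computing one curve per diagnosis gives curves that can be compared, for example with a log-rank test, to check whether survival after a symptom's first appearance differs between disorders.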
Results
The researchers identified signs and symptoms that differ between frequently misdiagnosed disorders and between clinical subgroups of various brain disorders, suggesting that neuronal substructures are affected differently.
Inter-annotator agreement, used as a check on reliability, was high. Across diagnoses, 269 sign and symptom associations were significantly enriched, 148 of which were pre-defined to be of diagnostic value.
All neuropsychiatric features showed significant enrichment in one or more brain disorders, indicating that each was associated with a subset of diseases.
As expected, dementia and memory impairment were much more prevalent in dementias such as AD, frontotemporal dementia (FTD), vascular dementia (VD), dementia with Lewy bodies (DLB), and Parkinson’s disease dementia (PDD), a finding not observed in Parkinson’s disease without dementia.
Likewise, multiple sclerosis (MS) showed significant enrichment for mobility impairment, muscle weakness, and fatigue, consistent with a debilitating disease of the central nervous system.
Progressive supranuclear palsy (PSP), multiple system atrophy (MSA), PD, MS, PDD, and ataxia showed increased enrichment for reduced mobility.
In contrast, motor neuron disease (MND), VD, PSP, MS, and MSA showed higher enrichment for muscle weakness, suggesting that the approach can identify distinct sets of disease-specific symptoms.
The researchers found specific signs and symptoms enriched in specific dementia subtypes, such as paranoia and façade behavior in Alzheimer’s disease and hearing problems and muscle weakness in vascular dementia.
Eighty-four percent of neuropathologically identified Alzheimer’s disease donors and 83% of neuropathologically defined FTD donors were clinically diagnosed with Alzheimer’s disease or frontotemporal dementia, respectively.
MSA was commonly clinically diagnosed as Parkinson's disease, whereas vascular dementia and PSP were clinically classified as several different conditions, indicating that donors with these disorders are frequently misdiagnosed.
Conclusion
The study findings highlight the use of NLP to identify the clinical trajectories of neurodegenerative diseases. They indicate that many brain disorders have largely overlapping symptoms, which might reflect disturbed neuronal substructures.
The findings can help epidemiologists, molecular biologists, and computational researchers investigate the clinical symptoms of neurodegenerative disorders and build prediction models to identify new data-driven clinical subgroups for diseases such as dementia, Parkinson's disease, and multiple sclerosis.