In a recent study published in the journal npj Digital Medicine, researchers evaluated a novel artificial intelligence (AI) tool for assessing the motor performance of individuals who may have Parkinson's disease (PD). Their results show that the machine learning (ML) model outperformed Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) certified raters and closely matched neurological experts in rating standardized finger-tapping tests. These findings suggest that a webcam, phone, or digital camera may be all that is needed to remotely assess PD and other neurological conditions in areas traditionally deprived of neurological care.
Study: Using AI to measure Parkinson’s disease severity at home. Image Credit: meeboonstudio / Shutterstock
How can AI help PD patients?
Parkinson's disease (PD) is a neurological disorder characterized by unintended or uncontrollable movements, such as shaking, stiffness, and difficulty with balance and coordination. Symptoms usually begin gradually and worsen over time. As the disease progresses, people may have trouble walking and talking. To date, no cure for the condition exists, but regular medication adjustments and clinical assessments can help manage symptoms and improve the patient's quality of life.
Parkinson's is the second-most common neurodegenerative disease after Alzheimer's disease. It is also the fastest-growing neurological condition, with more than 10 million patients worldwide. Unfortunately, given a dearth of neurological experts, especially in remote and underdeveloped regions, many PD patients receive inadequate neurological care and, consequently, improper diagnosis and treatment. Reports estimate that 40% of PD patients in the United States (US), a developed nation, do not receive expert care. This trend is even more alarming in developing or underdeveloped regions, which may have only a single neurologist for millions of people.
Most PD patients are adults above the age of 65. Even in areas with sufficient neurological support, arranging regular clinical visits for elderly, motor-function-impaired individuals is challenging. Artificial intelligence (AI) and machine learning (ML) models have recently been proposed to solve this challenge via automated PD diagnosis and clinical assessment. Videos of motor tasks, most commonly finger-tapping exercises, have been used to train models to evaluate bradykinesia, the slowing of movement frequently accompanying neurodegenerative disorders, including PD.
"Imagine anyone from anywhere in the world could perform a motor task (i.e., finger-tapping) using a computer webcam and get an automated assessment of their motor performance severity."
While this statement encapsulates the authors' vision, realizing it presents three fundamental challenges. The first is collecting sufficient, accurate data from home environments, which are typically noisy and heterogeneous. The second is identifying and developing 'digital biomarkers' to assess PD presence and severity. The third is creating an online platform where elderly, potentially motor-impaired individuals can securely and privately complete the required assessment tasks without professional supervision.
Prior work in the field has only included training cohorts of 20 individuals or fewer. Studies have focused only on binary classification (PD or no PD) and have not assessed disease severity in patients with the condition. Models used in video analyses have computed variables that are not clinically interpretable. Most importantly, videos used for ML model training have been recorded in controlled, noise-free clinical settings with trained professional guidance. Because ML models are sensitive to the quality of their input, models trained on clean data are likely to underperform in noisy home settings. This necessitates an AI tool that can accurately score PD presence and severity from potentially noisy home recordings.
About the study
The present study aimed to develop, train, and evaluate ML models capable of using webcam-captured motor function videos to assess PD severity remotely and automatically. Data was collected from 250 global participants using the online, publicly available ParkTest tool. Of these, 202 participants recorded themselves carrying out the finger-tapping task at home, while 48 participants were recorded using identical methodology in a clinic. ParkTest additionally collected sociodemographic and clinical information for enrolled participants.
Three specialist neurologists and two non-specialists provided clinical ratings for the finger-tapping video recordings. The specialist neurologists were associate or full professors at reputable US neurological institutes. The non-specialists were a doctor (MBBS) with experience in PD clinical studies and an early-career neurology resident with extensive (10 years) experience in movement disorder research. Rating involved watching a video and assigning a score between zero (normal) and four (severe) following the MDS-UPDRS guideline. The guideline stipulates that each hand is treated as a separate sample, so the 250 participants yielded a sample size of 500.
Ground-truth severity scores were computed exclusively from the specialist neurologists' ratings. Where at least two experts agreed, their shared score was taken as ground truth; where all three scores differed, their average was used. Ratings from both specialists and non-specialists served as benchmarks against which to compare final ML model performance.
Digital biomarker features were derived from the movements of key points on the hand. Researchers identified 21 key points per hand using MediaPipe, "an open-source project developed by GoogleAI that provides a public API of a highly accurate state-of-the-art model for hand pose estimation." These points yielded 47 finger-tapping features, including speed, amplitude, slowing, hesitation, and rhythm. Eighteen additional features quantified wrist movement. For each feature, the correlation coefficient (r) with the ground-truth score was computed.
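The consensus rule described above can be sketched in a few lines of Python. This is an illustrative reading of the article's description rather than the authors' code: the function name `ground_truth` and the example ratings are hypothetical, and the sketch assumes exactly three expert scores per video, so that any agreement between two raters is unambiguous.

```python
from collections import Counter
from statistics import mean


def ground_truth(expert_scores):
    """Consensus score if at least two experts agree, otherwise the average."""
    score, votes = Counter(expert_scores).most_common(1)[0]
    if votes >= 2:
        return float(score)
    # No two experts agree: fall back to the mean of their scores.
    return float(mean(expert_scores))


# Hypothetical ratings from three experts on the 0-4 MDS-UPDRS scale.
print(ground_truth([2, 2, 3]))  # two experts agree
print(ground_truth([1, 2, 3]))  # no agreement, so the scores are averaged
```

Note that the fallback average can produce a non-integer ground truth, which is one reason a regression model rather than a classifier is a natural fit for this target.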
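To make features such as amplitude, speed, and rhythm more concrete, the sketch below computes three simple quantities from a per-frame thumb-to-index fingertip distance. It is not the study's feature set: the function `tapping_features`, its three outputs, and the synthetic cosine signal are assumptions for illustration, and in practice the distance signal would come from hand key points such as those MediaPipe provides.

```python
import math
from statistics import mean, pstdev


def tapping_features(distances, fps):
    """Summarize a per-frame thumb-to-index fingertip distance signal."""
    # A frame is a peak (fingers fully open) if it exceeds the previous
    # frame and is at least as large as the next one.
    peaks = [
        i for i in range(1, len(distances) - 1)
        if distances[i - 1] < distances[i] >= distances[i + 1]
    ]
    if len(peaks) < 2:
        raise ValueError("need at least two taps to compute features")
    # Time between consecutive peaks, in seconds.
    intervals = [(b - a) / fps for a, b in zip(peaks, peaks[1:])]
    return {
        "amplitude": mean(distances[i] for i in peaks),
        "frequency_hz": 1.0 / mean(intervals),
        "rhythm_irregularity": pstdev(intervals),
    }


# Synthetic signal: 2 taps per second for 2.5 seconds at 40 frames per second.
fps = 40
signal = [0.5 * (1 - math.cos(2 * math.pi * 2 * i / fps)) for i in range(100)]
features = tapping_features(signal, fps)
print(features)
```

Real home recordings would need noise reduction before peak detection, since jitter in the estimated key points creates spurious local maxima.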
"The feature extraction process is comprised of five stages: (i) distinguishing left and right-hand finger-tapping from the recorded video, (ii) locating the target hand for continuous tracking, (iii) quantifying finger-tapping movements by extracting key points on the hand, (iv) reducing noise, and (v) computing features that align with established clinical guidelines, such as MDS-UPDRS."
This study used the Light Gradient Boosting Machine (LightGBM) regressor model as the AI tool. Model evaluation was carried out using the leave-one-patient-out cross-validation approach. As the name suggests, this approach uses one patient as the test cohort, while all the remaining patients are used for model training. Model performance estimation was undertaken using mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), Kendall rank correlation coefficient (Kendall's τ), Spearman's rank correlation coefficient (Spearman's ρ), and Pearson's correlation coefficient (PCC).
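The leave-one-patient-out scheme can be sketched as follows. This is a minimal illustration, not the study's pipeline: the helper names and toy data are hypothetical, and a mean-predicting baseline stands in for the LightGBM regressor so that the example needs only the standard library. The point it shows is that both hands of a held-out patient leave the training set together, so the model is never tested on a patient it has seen.

```python
from statistics import mean


def leave_one_patient_out(patient_ids):
    """Yield (train_indices, test_indices), holding out one patient at a time."""
    for patient in sorted(set(patient_ids)):
        train = [i for i, p in enumerate(patient_ids) if p != patient]
        test = [i for i, p in enumerate(patient_ids) if p == patient]
        yield train, test


def mean_baseline(train_x, train_y, test_x):
    """Stand-in model: predicts the mean training score for every test sample."""
    return [mean(train_y)] * len(test_x)


def cross_validated_mae(features, scores, patient_ids, fit_predict):
    """Pool absolute errors over all held-out patients and average them."""
    errors = []
    for train, test in leave_one_patient_out(patient_ids):
        predictions = fit_predict(
            [features[i] for i in train],
            [scores[i] for i in train],
            [features[i] for i in test],
        )
        errors.extend(abs(p - scores[i]) for p, i in zip(predictions, test))
    return mean(errors)


# Hypothetical data: three patients, two hands (samples) each.
patient_ids = ["a", "a", "b", "b", "c", "c"]
features = [[0.1], [0.2], [0.5], [0.5], [0.8], [0.9]]
scores = [0, 1, 2, 2, 3, 4]
print(cross_validated_mae(features, scores, patient_ids, mean_baseline))
```

Any regressor with a fit-then-predict interface could replace `mean_baseline` by wrapping it in a function with the same three arguments.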
Bias correction was done using sociodemographic and clinical information, specifically sex, age, ethnicity, and PD diagnostic status. Finally, to account for the heterogeneity in lighting conditions across households, raters were asked to rate each video's lighting quality, after which the impact of poor-quality lighting on rater and model performance was estimated.
Study findings
The present study made three significant contributions. The first is that the finger-tapping task can be evaluated successfully, accurately, and reliably by specialist neurologists working remotely from streamed or recorded video. The specialist neurologists were in strong agreement in their ratings of the videos, with Krippendorff's alpha and intra-class correlation coefficient (ICC) scores of 0.69 and 0.88, respectively.
"The three raters showed a difference of no more than 1 point from the ground truth in 99.2%, 99.5%, and 98.2% of the cases, respectively. These metrics suggest that the experts can reliably rate our videos recorded from home environments."
Secondly, the AI tool outperformed non-specialists and almost matched specialist neurologists in rating the severity of PD symptoms from patient videos. The MAE scores for specialists, non-specialists, and the trained LightGBM model were 0.53, 0.83, and 0.58, respectively (where lower is better).
Finally, bias or confounds due to sociodemographic or clinical variables do not affect model accuracy or sensitivity, allowing the proposed model to potentially aid more than just the 250 participants included in this study.
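Two measures recur in these findings: the share of ratings that fall within one point of the ground truth, and the mean absolute error (MAE). Both are straightforward to compute, as the sketch below shows. The function names and the five example ratings are hypothetical and are not taken from the study's data.

```python
def within_one_point(ratings, truth):
    """Fraction of ratings that differ from the ground truth by at most 1."""
    return sum(abs(r - t) <= 1 for r, t in zip(ratings, truth)) / len(truth)


def mean_absolute_error(ratings, truth):
    """Average absolute difference between ratings and the ground truth."""
    return sum(abs(r - t) for r, t in zip(ratings, truth)) / len(truth)


# Hypothetical ground-truth scores and one rater's scores for five videos.
truth = [0, 2, 2, 2, 3]
ratings = [0, 1, 2, 4, 3]
print(within_one_point(ratings, truth))
print(mean_absolute_error(ratings, truth))
```

The two measures are complementary: a rater can stay within one point on nearly every video and still have a noticeable MAE if many ratings are off by exactly one.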
Conclusions
In the present study, researchers evaluated a proof-of-concept AI tool to automatically and remotely evaluate PD occurrence and severity in patients who have difficulty accessing conventional neurological care. Their results indicate that both the ML model and, with slightly greater accuracy, specialist neurologists can reliably assess PD and similar motor function disorders by evaluating remotely recorded videos of patients' finger-tapping tests.
"…our tool is not intended to replace clinical visits for individuals who have access to them. Instead, the tool can be used frequently between clinical visits to track the progression of PD, augment the neurologists' capability to analyze the recorded videos with digital biomarkers and finetune the medications. In healthcare settings with an extreme scarcity of neurologists, the tool can take a more active role by automatically assessing the symptoms frequently and referring the patient to a neurologist if necessary."
Machine learning models generally improve as their training datasets grow. As the authors prepare to release their AI tool into the public domain, each additional patient will become an additional data point with which to fine-tune the model's accuracy. In the future, additional neurological biomarkers might be discovered that enhance the tool's functionality further.