In a recent study published in The Lancet Digital Health, researchers used a deep learning-based model to estimate forced vital capacity (FVC), the total volume of air exhaled after the deepest possible breath, and forced expiratory volume in 1 second (FEV1), the volume of air exhaled in the first second of a forced breath, from chest x-rays.
Study: A deep learning-based model to estimate pulmonary function from chest x-rays: multi-institutional model development and validation study in Japan.
Background
Pulmonary function testing, primarily the measurement of FVC and FEV1 with spirometry, is essential for diagnosing and managing respiratory impairments such as asthma and chronic obstructive pulmonary disease (COPD), a chronic lung disease that obstructs airflow.
Since its clinical introduction in 1846, spirometry has been crucial, but it can be challenging for older adults and young children, and its use was limited during the coronavirus disease 2019 (COVID-19) pandemic.
Chest x-rays, widely used and correlated with pulmonary function, offer an alternative approach. Further research is needed to improve methods for estimating pulmonary function in diverse clinical settings and patient populations.
About the study
The retrospective study collected chest x-rays and spirometry data from five Japanese institutions between July 1, 2003, and December 31, 2021. The ethics board of Osaka Metropolitan University approved the study and waived the requirement for informed consent because the data were obtained during routine clinical practice.
Spirometry data were labeled with FVC and FEV1 values, and chest x-rays taken within 14 days of spirometry were used. The data were divided into training, validation, and internal test datasets from three institutions (A-C), and external test datasets from the remaining two institutions (D and E).
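The 14-day matching rule described above can be illustrated with a minimal sketch. The record layout, the field names, and the choice to keep the nearest spirometry test when several fall inside the window are all assumptions made for illustration; the article does not describe how the authors implemented the matching.

```python
from datetime import date

MAX_GAP_DAYS = 14  # matching window stated in the article


def pair_within_window(xrays, spirometry, max_gap_days=MAX_GAP_DAYS):
    """Pair each x-ray with a spirometry record from the same patient.

    Assumption (not from the article): when several spirometry tests fall
    inside the window, the one closest in time is kept.
    """
    pairs = []
    for xray in xrays:
        candidates = [
            s for s in spirometry
            if s["patient_id"] == xray["patient_id"]
            and abs((s["date"] - xray["date"]).days) <= max_gap_days
        ]
        if candidates:
            nearest = min(
                candidates,
                key=lambda s: abs((s["date"] - xray["date"]).days),
            )
            pairs.append((xray["xray_id"], nearest))
    return pairs


# Hypothetical records, invented purely for illustration.
xrays = [
    {"xray_id": "x1", "patient_id": "p1", "date": date(2020, 1, 10)},
    {"xray_id": "x2", "patient_id": "p1", "date": date(2020, 3, 1)},
]
spirometry = [
    {"patient_id": "p1", "date": date(2020, 1, 20), "fvc": 3.2, "fev1": 2.5},
]

pairs = pair_within_window(xrays, spirometry)
```

In this example only the first x-ray is paired, because the second was taken more than 14 days after the patient's only spirometry test.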
The artificial intelligence (AI) model, built on the ConvNeXt convolutional neural network architecture with two classifiers, was implemented in the PyTorch framework and trained with various loss functions and image resolutions. The best-performing configuration was then selected.
Performance was evaluated by calculating the Pearson correlation coefficient (r), intraclass correlation coefficient (ICC), root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE) between predicted and measured spirometry values.
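For readers unfamiliar with these metrics, the sketch below shows how r, MSE, RMSE, and MAE are computed from paired predicted and measured values. It uses only the Python standard library, the numbers are invented, and the ICC is omitted because the article does not say which ICC form the authors used.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def error_metrics(y_true, y_pred):
    """Return MSE (L^2), RMSE (L), and MAE (L) for volumes given in litres."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Invented FVC values in litres, for illustration only.
measured_fvc = [2.0, 3.0, 4.0, 5.0]
predicted_fvc = [2.5, 2.5, 4.5, 4.5]

r = pearson_r(measured_fvc, predicted_fvc)
metrics = error_metrics(measured_fvc, predicted_fvc)
```

Because RMSE is the square root of MSE, the two always move together; MAE is less sensitive to a few large errors.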
Saliency maps generated using SHapley Additive exPlanations (SHAP) highlighted regions important for predictions, which were reviewed by independent radiologists.
Statistical analyses were performed using SciPy in Python, with 99% confidence intervals estimated through bootstrapping. The study focused on the AI model's performance rather than p-value comparisons.
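A bootstrapped 99% confidence interval for r can be sketched as follows. This is a plain percentile bootstrap written with the standard library rather than SciPy, and the resample count, the percentile method, and the synthetic data are assumptions; the article does not specify these details.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    if denom == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / denom


def bootstrap_ci(x, y, n_resamples=2000, confidence=0.99, seed=0):
    """Percentile bootstrap confidence interval for Pearson r."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_resamples:
        # Resample (x, y) pairs with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
        except ValueError:
            continue  # skip the rare degenerate resample
    estimates.sort()
    alpha = 1.0 - confidence
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Synthetic "measured" and "predicted" volumes in litres, for illustration only.
data_rng = random.Random(42)
measured = [data_rng.uniform(1.5, 5.0) for _ in range(200)]
predicted = [m + data_rng.gauss(0, 0.4) for m in measured]

point_estimate = pearson_r(measured, predicted)
ci_low, ci_high = bootstrap_ci(measured, predicted)
```

A narrow interval, such as those reported in the study results, indicates that the estimate of r would change little if the test set were resampled.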
Study results
A total of 141,734 matched x-ray and spirometry pairs from 81,902 patients were included in the analysis. The training, validation, and internal test datasets together comprised 134,307 x-rays from 75,768 patients (50% female, mean age 56 years, SD 18).
The training dataset included 108,366 x-rays from 61,009 patients (50% female, mean age 54 years, SD 17), while the validation dataset included 13,180 x-rays from 7,381 patients (50% female, mean age 54 years, SD 17). The internal test dataset had 12,761 x-rays from 7,378 patients (50% female, mean age 54 years, SD 17).
External test datasets included 2,137 x-rays from 1,861 patients at institution D (40% female, mean age 65 years, SD 17) and 5,290 x-rays from 4,273 patients at institution E (46% female, mean age 63 years, SD 17).
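As a quick consistency check, the dataset sizes reported above can be summed to confirm that they match the stated totals. The short sketch below uses only figures quoted in this article; the dictionary layout is an illustrative choice.

```python
# (x-rays, patients) per dataset, as reported in the article.
datasets = {
    "training": (108_366, 61_009),
    "validation": (13_180, 7_381),
    "internal_test": (12_761, 7_378),
    "external_D": (2_137, 1_861),
    "external_E": (5_290, 4_273),
}

internal_names = ("training", "validation", "internal_test")

internal_xrays = sum(datasets[name][0] for name in internal_names)
internal_patients = sum(datasets[name][1] for name in internal_names)
total_xrays = sum(xrays for xrays, _ in datasets.values())
total_patients = sum(patients for _, patients in datasets.values())

print(internal_xrays, internal_patients)  # 134307 75768
print(total_xrays, total_patients)        # 141734 81902
```

The sums agree with the 134,307 x-rays from 75,768 patients reported for institutions A to C and with the overall totals of 141,734 pairs from 81,902 patients.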
Race and ethnicity data were not available. The best-performing model was trained with an RMSE loss function (loss value 0.39) and an image size of 1024 pixels, and was obtained at 182 epochs.
For FVC estimation on the external test datasets, institution D had an r value of 0.91 (99% CI 0.90–0.92) and institution E had an r value of 0.90 (99% CI 0.89–0.91). ICC values were 0.91 and 0.89, respectively; at both institutions, MSE was 0.17 L², RMSE was 0.41 L, and MAE was 0.31 L.
For FEV1 determination, institution D had an r value of 0.91 (99% CI 0.90–0.92) and institution E also had an r value of 0.91. ICC values were 0.90 for both institutions, MSE values were 0.13 L² and 0.11 L², RMSE values were 0.37 L and 0.33 L, and MAE values were 0.28 L and 0.25 L, respectively.
Patients with COPD had r values of 0.81 for FVC and 0.83 for FEV1 at institutions D and E. Patients with asthma had r values of 0.89 for FVC and 0.90 for FEV1.
The area under the receiver operating characteristic curve for classifying FVC less than 80% predicted was 0.88 for institution D and 0.85 for institution E; for FEV1 less than 80% predicted, it was 0.87 for both institutions; and for FEV1/FVC ratio less than 70%, it was 0.83 for institution D and 0.87 for institution E.
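The area under the receiver operating characteristic curve (AUC) for a threshold such as "FVC less than 80% predicted" can be computed as the probability that a randomly chosen impaired patient receives a lower model estimate than a randomly chosen unimpaired one. The sketch below uses that pairwise definition. The percent-predicted values are invented, and using the negated model estimate as the ranking score is an assumption, since the article does not describe how the authors derived the classification scores.

```python
def auc(labels, scores):
    """AUC via pairwise comparison: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label]
    neg = [s for label, s in zip(labels, scores) if not label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


THRESHOLD = 80.0  # percent of predicted value, as in the article

# Invented percent-predicted FVC values, for illustration only.
measured_pct = [65.0, 72.0, 85.0, 90.0, 78.0, 95.0]
estimated_pct = [70.0, 82.0, 88.0, 92.0, 75.0, 79.0]

# Positive class: measured value below the threshold.
labels = [m < THRESHOLD for m in measured_pct]
# Lower estimates should indicate impairment, so negate them to form a score.
scores = [-e for e in estimated_pct]

auc_value = auc(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported values of 0.83 to 0.88 indicate that the model usually ranks impaired patients below unimpaired ones.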
Averaged saliency maps showed the AI model focused primarily on lung regions, giving lower weight to peripheral lung fields and higher weight to central lung fields.
Radiologists identified features associated with decreased FEV1, such as lung hyperinflation and bronchial wall thickening, and features linked to decreased FVC, including lung volume loss and reticular shadows at the periphery.
Conclusions
To summarize, this model, which estimates pulmonary function without active patient participation, showed strong correlations with spirometry (r values of 0.90 to 0.91 in the external test datasets), comparable to those reported in studies using chest computed tomography (CT).
Radiologists identified lung hyperinflation and bronchial wall thickening as features associated with decreased FEV1, while lung volume loss and reticular shadows were linked to decreased FVC.
The model can complement spirometry, particularly for patients unable to perform the test, and may improve diagnostic accuracy by providing pulmonary function estimates from routine chest x-rays.