TrialTranslator uncovers the survival gap for high-risk patients and offers a path to better cancer research.
Study: Evaluating generalizability of oncology trial results to real-world patients using machine learning-based trial emulations.
Many cancer trial results don’t generalize well to real-world patients. A research team explored this issue with TrialTranslator, a machine-learning framework that systematically tests cancer RCT findings for generalizability. The findings were published in Nature Medicine.
Poor generalizability of RCT results
Randomized controlled trials (RCTs) are considered the gold standard for evaluating cancer therapies. However, their findings often fail to translate to real-world settings, leaving patients, physicians, and drug regulators concerned about the limited generalizability of these results.
In oncology, real-world survival times and treatment benefits are often significantly lower than those reported in RCTs, with median overall survival (mOS) sometimes reduced by as much as six months. Newer anti-cancer agents, such as checkpoint inhibitors, also underperform when applied to the diverse patient populations seen outside clinical trials.
Reasons for the difference
A key reason for this gap is the restrictive eligibility criteria often used in RCTs, which create study populations that do not reflect the diversity of real-world patients. Trial participants are often younger, healthier, and less likely to have comorbidities.
Informal biases, such as preferential selection based on race or socioeconomic status, may also influence recruitment. As a result, trial populations fail to capture the heterogeneity of real-world patients, whose outcomes can vary widely even with identical treatment protocols.
The current study sought to address this issue by improving the prediction of real-world outcomes for cancer treatments evaluated in phase 3 RCTs. To do this, researchers developed TrialTranslator, a machine-learning (ML) framework designed to assess the generalizability of RCT results systematically.
By leveraging electronic health records (EHRs) and advanced ML algorithms, the framework identifies patterns and phenotypes that may influence treatment outcomes, allowing for a more nuanced evaluation of survival benefits across diverse patient groups.
About the study
Using a comprehensive nationwide EHR database from Flatiron Health, researchers applied TrialTranslator to evaluate 11 landmark RCTs. These trials covered four of the most common advanced solid cancers—metastatic breast cancer (mBC), metastatic prostate cancer (mPC), metastatic colorectal cancer (mCRC), and advanced non-small-cell lung cancer (aNSCLC).
Each RCT was emulated by identifying real-world patients with matching cancer types, biomarker profiles, and treatment regimens.
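The matching step can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `Patient` and `TrialSpec` records, their field names, and the `emulate_cohort` helper are hypothetical and do not reflect the Flatiron Health schema or the authors' code. The sketch keeps patients whose cancer type, biomarkers, and regimen match a trial, and groups them by treatment arm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout, not the actual EHR schema.
    patient_id: str
    cancer_type: str
    biomarkers: frozenset
    regimen: str


@dataclass
class TrialSpec:
    # Hypothetical description of the trial being emulated.
    cancer_type: str
    required_biomarkers: frozenset
    arms: dict  # arm name -> regimen given in that arm


def emulate_cohort(patients, spec):
    """Group matching real-world patients by the trial arm they resemble."""
    cohort = {arm: [] for arm in spec.arms}
    for p in patients:
        if p.cancer_type != spec.cancer_type:
            continue  # wrong cancer type
        if not spec.required_biomarkers <= p.biomarkers:
            continue  # missing a required biomarker
        for arm, regimen in spec.arms.items():
            if p.regimen == regimen:
                cohort[arm].append(p)
    return cohort


# Toy data, invented for illustration only.
spec = TrialSpec(
    cancer_type="aNSCLC",
    required_biomarkers=frozenset({"marker_A"}),
    arms={"treatment": "drug_X", "control": "chemo"},
)
patients = [
    Patient("p1", "aNSCLC", frozenset({"marker_A"}), "drug_X"),
    Patient("p2", "aNSCLC", frozenset({"marker_A"}), "chemo"),
    Patient("p3", "mBC", frozenset({"marker_A"}), "drug_X"),
    Patient("p4", "aNSCLC", frozenset(), "drug_X"),
]
cohort = emulate_cohort(patients, spec)
print({arm: [p.patient_id for p in members] for arm, members in cohort.items()})
```

A real emulation would also need to handle line of therapy, treatment dates, and confounding between arms, none of which this sketch attempts.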
Patients were stratified into three prognostic phenotypes (low-risk, medium-risk, and high-risk) based on their mortality risk scores derived from ML models. The framework then assessed survival outcomes, including mOS and restricted mean survival time (RMST), to compare treatment effects across these phenotypes with the results reported in the original RCTs.
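To make the stratification and the two survival metrics concrete, here is a minimal Python sketch. It rests on several assumptions that the article does not confirm: the three phenotypes are cut at rank-based tertiles of the risk score, survival is estimated with a plain Kaplan-Meier curve without any adjustment the study may have applied, and all function names and data are hypothetical. Median overall survival is read off as the first time the curve reaches 50%, and RMST is the area under the curve up to a chosen horizon `tau`.

```python
def assign_phenotypes(risk_scores):
    """Label each patient low/medium/high by rank-based tertiles (assumed cut)."""
    n = len(risk_scores)
    order = sorted(range(n), key=lambda i: risk_scores[i])
    names = ("low", "medium", "high")
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = names[rank * 3 // n]
    return labels


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    events[i] is 1 if the patient died at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time the survival estimate drops to 50% or below, else None."""
    for t, s in curve:
        if s <= 0.5 + 1e-12:  # small tolerance for floating-point rounding
            return t
    return None


def rmst(curve, tau):
    """Restricted mean survival time: area under the step curve on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Toy follow-up times in months, invented for illustration only.
curve = kaplan_meier([2, 4, 6, 8], [1, 1, 1, 1])
print(median_survival(curve), rmst(curve, tau=8))
```

In this setup, the curve would be computed separately for each phenotype and treatment arm, and the difference in mOS or RMST between arms would be compared against the benefit reported in the original trial.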
Key findings: a risk-dependent gap in outcomes
The study revealed a striking disparity between RCT findings and real-world outcomes:
- Low- and Medium-Risk Patients: These phenotypes demonstrated survival times and treatment benefits that closely aligned with the RCT results. For instance, low-risk patients often experienced survival benefits similar to those reported in clinical trials, with only a minor reduction in mOS (approximately two months).
- High-Risk Patients: In contrast, high-risk phenotypes showed significantly worse outcomes. Survival benefits were markedly reduced—62% lower than RCT estimates—and often fell outside the 95% confidence intervals reported in the original trials. Seven of the eleven emulated trials failed to show a clinically meaningful survival improvement (greater than three months) for high-risk patients.
Overall, emulated trials consistently estimated survival outcomes that were, on average, 35% lower than those reported in the RCTs. This disparity highlights the challenges of translating trial findings to more heterogeneous real-world populations.
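The percentages and the three-month threshold quoted above come down to simple arithmetic, sketched below in Python. The helper names are hypothetical and the numbers in the example are invented for illustration; they are not values from the study.

```python
def relative_shortfall(rct_benefit_months, emulated_benefit_months):
    """Fraction by which the emulated benefit falls short of the RCT benefit."""
    if rct_benefit_months <= 0:
        raise ValueError("RCT benefit must be positive to compute a shortfall")
    return (rct_benefit_months - emulated_benefit_months) / rct_benefit_months


def clinically_meaningful(benefit_months, threshold_months=3.0):
    """True if the survival benefit exceeds the threshold (three months here)."""
    return benefit_months > threshold_months


# Invented example: a trial reports a 5.0-month gain in median overall
# survival, while the emulation estimates 1.9 months for one subgroup.
shortfall = relative_shortfall(5.0, 1.9)
print(f"shortfall: {shortfall:.0%}")
print("meaningful:", clinically_meaningful(1.9))
```

A shortfall computed this way describes the treatment benefit, meaning the difference between arms, rather than absolute survival time, so the two kinds of percentage should not be mixed.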
Robust validation of results
The robustness of these findings was confirmed through extensive validation. Subgroup analyses, semi-synthetic data simulations, and alternative eligibility criteria demonstrated consistent results, reinforcing the reliability of TrialTranslator. Sensitivity analyses also showed that stricter eligibility criteria had little impact on the observed disparities, suggesting that patient prognosis, rather than inclusion criteria, plays a more critical role in determining treatment outcomes.
Implications for oncology
These findings underscore the need for a paradigm shift in clinical trial design and interpretation. Current RCTs often overlook the prognostic heterogeneity of real-world patients, which contributes to their limited generalizability. High-risk patients, in particular, are underserved by existing trials, as their outcomes deviate most significantly from RCT results.
Tools like TrialTranslator offer a promising solution. By integrating EHR-derived data with ML-based phenotyping, they can predict treatment benefit for individual patients. This supports more informed clinical decision-making and helps patients and clinicians set realistic expectations for treatment outcomes.
Additionally, these tools could revolutionize trial design by prioritizing patient prognosis over traditional eligibility criteria. By stratifying patients based on risk phenotypes, future trials could better represent the full spectrum of cancer patients and provide more accurate estimates of treatment efficacy.
Conclusion
“This study highlights the substantial role that prognostic heterogeneity plays in the limited generalizability of RCT results,” the authors conclude. While low- and medium-risk patients may benefit as expected from cancer therapies, high-risk patients often experience diminished survival gains.
ML-based frameworks like TrialTranslator could help bridge this gap, enabling more inclusive trials and better real-world outcomes. With tools like this, oncology can move closer to truly personalized treatment approaches that account for the diverse needs of real-world patients.