In a recent study published in the European Journal of Human Genetics, researchers developed two multi-polygenic scores (multi-PGS) to predict coronary heart disease (CHD) by weighting and linearly combining polygenic scores (PGS) for CHD and numerous other traits.
Study: A linear weighted combination of polygenic scores for a broad range of traits improves prediction of coronary heart disease. Image Credit: Theerani lerdsri/Shutterstock.com
Background
PGS for CHD are typically calculated using summary statistics from CHD-specific genome-wide association studies (GWAS).
Pleiotropy, however, is prevalent in biological mechanisms, and disease-associated variants frequently share causal pathways with numerous other traits. As a result, integrating GWAS data for additional traits may enhance PGS performance for CHD.
About the study
The present study investigated whether a score combining PGS for multiple CHD-related traits would be more strongly associated with coronary heart disease than a CHD-specific polygenic score.
The researchers used two similar approaches to investigate whether a linearly weighted sum of polygenic scores for different traits could improve CHD prediction, establishing two multi-polygenic scores:
- i) multiPGSCHD, which combines 16 PGS for CHD and contributing factors, developed using GWAS summary statistics for coronary heart disease and other atherosclerotic cardiovascular diseases (ASCVD) as the training dataset and the United Kingdom Biobank (UKBB) as the tuning dataset, and
- ii) extendedPGSCHD, using 3,170 PGS in the Polygenic Score (PGS) Catalog as the training dataset and the Atherosclerosis Risk in Communities Study (ARIC) data for tuning.
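The idea of a linearly weighted sum of scores can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names, the trait labels, and the choice to standardize each score before weighting are assumptions made for the example.

```python
def standardize(values):
    """Rescale a list of scores to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    sd = variance ** 0.5
    return [(v - mean) / sd for v in values]


def multi_pgs(pgs_by_trait, weights):
    """Combine per-trait polygenic scores into one weighted sum per individual.

    pgs_by_trait: dict mapping a trait label to a list of per-individual scores
    weights:      dict mapping a trait label to its weight (hypothetical values)
    """
    n_individuals = len(next(iter(pgs_by_trait.values())))
    combined = [0.0] * n_individuals
    for trait, weight in weights.items():
        if weight == 0.0:
            continue  # a zero weight means the trait does not contribute
        z_scores = standardize(pgs_by_trait[trait])
        for i in range(n_individuals):
            combined[i] += weight * z_scores[i]
    return combined


# Toy data: three individuals, two hypothetical trait scores.
toy_scores = {"chd": [1.0, 2.0, 3.0], "ldl": [3.0, 2.0, 1.0]}
toy_weights = {"chd": 1.0, "ldl": 0.0}
print(multi_pgs(toy_scores, toy_weights))
```

In practice the weights would come from a fitted model rather than being set by hand.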
The researchers assessed the performance of extendedPGSCHD and multiPGSCHD among Mayo Clinic individuals, including 43,578 adults of European ancestry with available electronic health record (EHR) data, of whom 4,479 were CHD cases and 39,099 were controls.
Lasso regression was used to train the multi-polygenic score models. Genotype data in the UKBB cohort had been imputed by the Wellcome Trust Centre for Human Genetics.
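To show why lasso suits this task, the sketch below fits an L1-penalized logistic regression by proximal gradient descent on toy data. The article does not state which software, loss function, or penalty the authors used, so the function names, the step size, the penalty values, and the data are all assumptions chosen for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(rows, labels, penalty, step=0.5, n_iter=500):
    """Fit an L1-penalized logistic regression (illustrative, not optimized).

    rows:    list of feature vectors, e.g. one standardized PGS per column
    labels:  list of 0/1 case status values
    penalty: L1 penalty strength; larger values zero out more weights
    """
    n, k = len(rows), len(rows[0])
    weights = [0.0] * k
    intercept = 0.0
    for _ in range(n_iter):
        residuals = [
            sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - y
            for row, y in zip(rows, labels)
        ]
        # The intercept is not penalized: plain gradient step.
        intercept -= step * sum(residuals) / n
        # Each weight takes a gradient step followed by soft-thresholding.
        for j in range(k):
            grad_j = sum(r * row[j] for r, row in zip(residuals, rows)) / n
            weights[j] = soft_threshold(weights[j] - step * grad_j, step * penalty)
    return intercept, weights


# Toy data: column 0 determines case status, column 1 is unrelated to it.
toy_rows = [[x1, x2] for x1 in (-2.0, -1.0, 1.0, 2.0) for x2 in (-1.0, 1.0)]
toy_labels = [1 if row[0] > 0 else 0 for row in toy_rows]
print(fit_l1_logistic(toy_rows, toy_labels, penalty=0.1))
```

The penalty drives the weights of uninformative scores to exactly zero, which is what makes the approach practical when thousands of candidate PGS are available.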
ARIC genotype data were imputed using the Haplotype Reference Consortium (HRC) panel. Regeneron Genetics genotyped 1.4 million variants in the community-dwelling Mayo Biobank population using the Twist Diversity single-nucleotide polymorphism (SNP) panel.
Variants with minor allele frequency (MAF) less than 1.0%, imputation quality score less than 0.30 in the UKBB, call rate less than 0.990, or Hardy-Weinberg equilibrium (HWE) p-values less than 10⁻⁶ for the Mayo Clinic and ARIC cohorts were excluded from the analysis.
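The exclusion criteria above amount to a simple per-variant filter, sketched below. The `Variant` record and `passes_qc` function are hypothetical names, the HWE threshold is read as 10⁻⁶ (the conventional value), and a filter is skipped when the corresponding statistic is missing, which is one possible reading of which filters applied to which cohort.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical per-variant summary used only for this illustration."""
    variant_id: str
    maf: float                          # minor allele frequency
    info: Optional[float] = None        # imputation quality (imputed data only)
    call_rate: Optional[float] = None   # genotyping call rate
    hwe_p: Optional[float] = None       # Hardy-Weinberg equilibrium p-value


def passes_qc(variant, min_maf=0.01, min_info=0.30,
              min_call_rate=0.990, min_hwe_p=1e-6):
    """Return True if the variant survives all filters that apply to it."""
    if variant.maf < min_maf:
        return False
    if variant.info is not None and variant.info < min_info:
        return False
    if variant.call_rate is not None and variant.call_rate < min_call_rate:
        return False
    if variant.hwe_p is not None and variant.hwe_p < min_hwe_p:
        return False
    return True


# Toy variants with made-up identifiers and statistics.
toy_variants = [
    Variant("var_common", maf=0.20, call_rate=0.999, hwe_p=0.5),
    Variant("var_rare", maf=0.005, call_rate=0.999, hwe_p=0.5),
    Variant("var_poorly_imputed", maf=0.20, info=0.10),
]
kept = [v.variant_id for v in toy_variants if passes_qc(v)]
print(kept)
```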
Analyses in the UKBB were limited to unrelated individuals. CHD status among Mayo Biobank individuals was determined using Current Procedural Terminology (CPT) codes and International Classification of Diseases (ICD) diagnostic codes.
The researchers did not use GWAS summary statistics from meta-analyses, including multi-ancestry meta-analyses, in which the UKBB was included as a cohort.
Results
In models adjusted for 10 principal components (PCs), age, and sex, a one-standard-deviation increase in extendedPGSCHD and in multiPGSCHD was each associated with 1.7-fold higher odds of coronary heart disease among Mayo Biobank individuals.
By comparison, CHD_PRSCS, a previously published polygenic score for coronary heart disease, was associated with 1.5-fold higher odds. CHD was present in 18%, 18%, and 16% of individuals in the top deciles for multiPGSCHD, extendedPGSCHD, and CHD_PRSCS, respectively.
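The two kinds of figures reported here, odds per standard deviation and prevalence in the top decile, can be illustrated with a short sketch. The function names and toy data are assumptions, and the scaling in `relative_odds` holds only under the usual logistic-regression assumption that log-odds change linearly with the score.

```python
def relative_odds(odds_ratio_per_sd, z_score):
    """Odds relative to an average individual for someone z_score SDs above the mean.

    Assumes log-odds are linear in the score, as in a logistic model.
    """
    return odds_ratio_per_sd ** z_score


def top_decile_prevalence(scores, is_case):
    """Fraction of cases among individuals in the highest 10% of scores."""
    ranked = sorted(zip(scores, is_case), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top = ranked[:n_top]
    return sum(1 for _, case in top if case) / n_top


# With an odds ratio of 1.7 per SD, a score 2 SDs above the mean
# corresponds to 1.7 squared times the odds of an average individual.
print(relative_odds(1.7, 2))

# Toy cohort of 20 individuals; the top decile holds the 2 highest scores.
toy_scores = list(range(20))
toy_cases = [s in (3, 19) for s in toy_scores]
print(top_decile_prevalence(toy_scores, toy_cases))
```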
The relationship between different PGS and CHD status was also investigated in UKBB participants of non-European ancestry. A one-standard-deviation rise in PGSCHD was associated with a 1.8-fold increase in CHD risk.
The lasso model that included PGSCHD and multiPGSCHD improved CHD prediction compared with a model containing only the base variables, but only marginally compared with a model that also included PGSCHD.
PGSCHD and multiPGSCHD were also evaluated in Mayo Biobank individuals of non-European ancestry. When tested on the same training set, the lasso model had an area under the curve (AUC) of 0.8.
PGS for coronary artery disease, cardiovascular disease, coronary atherosclerosis, ischemic stroke, hypertension, and type 2 diabetes were the most significantly associated with CHD.
The findings indicated that extendedPGSCHD might be more effective for risk stratification than the other two PGS. Creating extendedPGSCHD, however, was computationally intensive because thousands of PGS had to be computed and tuned.
The improvements might be attributed to several factors, including the inclusion of PGS developed in different ancestries and with different methodologies, and the additional traits included in extendedPGSCHD.
Conclusions
Overall, the study findings showed that combining several PGS for cardiovascular diseases (such as peripheral artery disease), risk factors (including hypertension and type 2 diabetes), and biomarkers [including high-density lipoprotein cholesterol (HDL-C) and non-HDL-C levels] improved CHD prediction. Both multiPGSCHD and extendedPGSCHD outperformed CHD_PRSCS.
Both multi-polygenic scores showed a considerable improvement in odds per standard deviation; however, the difference between their odds ratios was not significant.
This approach is expected to become more effective as more PGS are developed from larger GWAS and made available in repositories such as the PGS Catalog, with implications for clinical settings.