In a recent study published in Nature, researchers formed the coronavirus disease 2019 (COVID-19) Host Genetics Initiative to compile a genome-wide association meta-analysis of 60 studies from 25 countries. The study encompassed 125,584 COVID-19 cases and more than 2.5 million controls.
Background
Expanding genomic research to include participants from across the globe could enable testing to determine whether the effect of COVID-19-related genetic variants is markedly different across ancestry groups. Increasing sample size and diversity are key to understanding the human genetic architecture of COVID-19.
About the study
The meta-analysis covered three COVID-19 phenotypes:
(1) critically ill individuals who died or required respiratory support during hospitalization;
(2) individuals hospitalized due to symptoms associated with the infection; and
(3) all reported COVID-19 cases, regardless of symptoms.
The analysis of the first phenotype (critical illness) comprised 9,376 cases, of which 3,197 were new, and 1,776,645 controls. The analyses of the second (hospitalization) and third (reported infection) phenotypes comprised 25,027 and 125,584 cases, of which 11,386 and 76,022 were new, respectively. The corresponding control groups included 2,836,272 and 2,575,347 individuals.
The team developed a Bayesian model to categorize genetic loci based on their association patterns across the three COVID-19 phenotypes examined in the study. They also performed a phenome-wide association study to understand the potential biological mechanisms underlying the 23 genome-wide significant loci, and examined candidate causal genes at several of these loci. Finally, they applied Mendelian randomization (MR) to infer potential causal relationships between COVID-19-related phenotypes and their genetically correlated traits.
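To make the MR step more concrete, the sketch below shows the inverse-variance weighted (IVW) estimator, one common way to combine per-variant effects into a single causal estimate. It is only an illustration: the article does not say which MR estimator the authors used, the function name `ivw_estimate` is hypothetical, and the numbers are made up rather than taken from the study.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate (illustrative helper).

    beta_exposure: per-variant effects on the exposure (e.g. body mass index)
    beta_outcome:  per-variant effects on the outcome (e.g. log-odds of hospitalization)
    se_outcome:    standard errors of the outcome effects
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    std_error = sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summary statistics for three variants (not from the study).
beta_exposure = [0.10, 0.20, 0.15]
beta_outcome = [0.05, 0.10, 0.075]
se_outcome = [0.01, 0.02, 0.015]

estimate, std_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(f"IVW causal estimate: {estimate:.3f} (SE {std_error:.3f})")
```

In this toy input every outcome effect is exactly half the exposure effect, so the estimator recovers a slope of 0.5. Real analyses also need checks for pleiotropy and weak instruments, which this sketch omits.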
Study findings
The study identified 23 genome-wide significant loci, of which 20 remained significant after correction for multiple testing to account for the number of phenotypes tested. While the previously reported loci generally showed the expected increase in statistical significance, one locus (rs72711165) did not replicate between the previous and current analyses.
Sixteen loci were assigned, with >99% posterior probability, to the group influencing the risk of COVID-19 hospitalization, while seven loci influenced susceptibility to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Intriguingly, six of the 23 loci showed significantly heterogeneous effects across studies, with a P value for heterogeneity of <2.2 × 10⁻³. However, only the Forkhead Box P4 (FOXP4) locus showed significantly different effects across continental ancestry groups. Even at the FOXP4 locus, all ancestry groups showed a positive effect size estimate. Factors such as variable definitions of COVID-19 severity, arising from differing thresholds for testing and hospitalization, rather than differences across ancestries, likely explain the observed between-study heterogeneity in effect sizes.
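The heterogeneity threshold quoted above is numerically consistent with a Bonferroni correction of 0.05 across 23 loci, although the article does not state how it was derived. The sketch below checks that arithmetic and shows how Cochran's Q and the I² statistic, standard measures of between-study heterogeneity, are computed. The helper names and the per-study effect sizes are hypothetical and are not taken from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def cochran_q(effects, std_errors):
    """Fixed-effect pooled estimate, Cochran's Q, and I^2 for per-study effects."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Assumption: 0.05 divided by the 23 loci reported in the article.
threshold = bonferroni_threshold(0.05, 23)
print(f"Bonferroni threshold: {threshold:.2e}")

# Hypothetical log-odds ratios and standard errors from three studies.
pooled, q, i_squared = cochran_q([0.10, 0.25, 0.05], [0.05, 0.05, 0.05])
print(f"pooled = {pooled:.3f}, Q = {q:.2f}, I^2 = {i_squared:.2f}")
```

Under the null hypothesis of no heterogeneity, Q follows a chi-square distribution with (number of studies − 1) degrees of freedom. The resulting P value is what would be compared against the threshold.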
Multivariable MR analysis indicated that body mass index mediated the causal association between liability to type 2 diabetes and COVID-19 phenotypes.
Conclusions
By doubling the number of cases, the study added 11 new genome-wide significant loci, including surfactant protein D (SFTPD), mucin 5B (MUC5B), and angiotensin-converting enzyme 2 (ACE2), which offer compelling insights into SARS-CoV-2 infection and COVID-19 severity.
SFTPD binds to the S1 subunit of the SARS-CoV-2 spike protein and inhibits its binding to the ACE2 receptor, thus protecting the lungs against SARS-CoV-2 infection. The study found that the SFTPD missense variant rs721917:A>G (p.Met31Thr) increases the odds of hospitalization (OR = 1.06) and of chronic obstructive pulmonary disease (OR = 1.08). Conversely, the MUC5B promoter variant rs35705950:G>T was protective against hospitalization (OR = 0.83). This variant is also associated with lower mortality in patients with idiopathic pulmonary fibrosis (IPF).
The authors found that the ACE2 variant rs190509934:T>C was associated with reduced susceptibility to COVID-19 (OR = 0.69). This variant is ten times more common in South Asian populations than in European populations, demonstrating the importance of diversity in variant discovery.
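For readers less familiar with odds ratios, the short sketch below converts the reported ORs into a percentage change in odds per copy of the effect allele. The odds ratios come from the text above. The helper name and dictionary labels are illustrative, and the per-allele interpretation assumes the additive model typical of genome-wide association studies.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio (OR = 1 means no change)."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios as reported in the article.
reported_odds_ratios = {
    "SFTPD rs721917, hospitalization": 1.06,
    "MUC5B rs35705950, hospitalization": 0.83,
    "ACE2 rs190509934, susceptibility": 0.69,
}

changes = {
    label: round(percent_change_in_odds(odds_ratio), 1)
    for label, odds_ratio in reported_odds_ratios.items()
}

for label, change in changes.items():
    direction = "higher" if change > 0 else "lower"
    print(f"{label}: {abs(change):.0f}% {direction} odds")
```

A change in odds is not the same as a change in probability. The two are close only when the outcome is rare in the population being studied.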