The ongoing coronavirus disease 2019 (COVID-19) pandemic has been caused by a novel coronavirus, namely, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). To date, this virus has claimed more than five million lives worldwide. Although the majority of the infected individuals suffer mild flu-like symptoms, others suffer severe pneumonia with acute respiratory distress. In the case of severe infection, SARS-CoV-2 can cause multiple organ failure through cytokine release, endothelial damage, acute kidney injury, myocarditis, and microvascular and macrovascular thrombosis.
Study: Identification of LZTFL1 as a candidate effector gene at a COVID-19 risk locus. Image Credit: watchara/ Shutterstock
Genetic analysis of COVID-19 risk
Scientists have highlighted the importance of genome-wide association studies (GWAS) for identifying specific genes and pathways that influence complex diseases. Such identification aids the development of effective drugs. Several GWAS have recently identified a region of the human genome, i.e., chromosome 3p21.31, that is strongly linked with severe infection. Another study has identified a locus that is strongly linked to susceptibility to COVID-19 infection.
Although previous studies showed that the 3p21.31 risk haplotype was inherited from Neanderthals, the causal gene(s) and causal variant(s) related to the increased risk of respiratory distress and mortality in people under sixty years of age are not clear. One of the difficulties in determining the causal genes and mechanisms behind GWAS signals is that the associated variants are typically in linkage disequilibrium (LD) with many others, which can include structural polymorphisms, deletions, and insertions. Another difficulty is that variants can exert their effects through multiple mechanisms. Interpreting variants that affect cis-regulatory elements (e.g., enhancers) is challenging because many enhancers are only active in specific cell types or at specific times.
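To illustrate why linkage disequilibrium makes the causal variant hard to pin down, the following minimal Python sketch computes the standard r² statistic between two biallelic variants from phased haplotypes. The function name `ld_r2` and the toy haplotypes are hypothetical and are not taken from the study; the sketch only shows that two variants in perfect LD are statistically indistinguishable in an association test.

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Return r^2 between two biallelic sites.

    Each input lists one allele per phased haplotype (0 = reference, 1 = alternate).
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), where D = pAB - pA * pB.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype lists must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # alternate-allele frequency at site A
    p_b = sum(hap_b) / n                      # alternate-allele frequency at site B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


# Toy data (hypothetical): eight haplotypes, three variants.
snp_1 = [1, 1, 1, 0, 0, 0, 0, 0]
snp_2 = [1, 1, 1, 0, 0, 0, 0, 0]  # always co-inherited with snp_1
snp_3 = [1, 0, 1, 0, 1, 0, 0, 0]  # only partly correlated with snp_1

r2_perfect = ld_r2(snp_1, snp_2)
r2_partial = ld_r2(snp_1, snp_3)
print(f"r2(snp_1, snp_2) = {r2_perfect:.3f}")
print(f"r2(snp_1, snp_3) = {r2_partial:.3f}")
```

When r² is close to 1, as for the first pair, association statistics alone cannot separate the two variants, which is why functional evidence is needed to nominate a causal one.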
A new study
A new study, published in Nature Genetics, focuses on developing a comprehensive platform to decode the effects of sequence variation identified by GWAS. This study combined computational and wet-lab experimental approaches to identify the effector genes, the causative variants, and the associated cell types. The authors first screened the variants for potential effects on protein-coding sequence. Subsequently, the potential effects of the variants on splicing were assessed using a combination of machine learning and RNA sequencing (RNA-seq) analysis.
In this study, researchers combined conventional genomic methods with machine learning to determine whether variants lay within, and affected, cis-regulatory sequences in a series of disease-relevant cell types. This helped the researchers to determine the key cell types and identify the probable causative variant.
This study also used chromosome conformation capture (3C) analysis to identify the gene promoters in physical contact with the candidate enhancer sequence in the appropriate cell type. All the data were then integrated with gene-expression analyses. The regulatory effects of specific variants were studied using genome editing.
LZTFL1 and COVID-19
In this study, the authors identified rs17713054 as a probable causative variant. They also identified LZTFL1 as a candidate effector gene in pulmonary epithelial cells, strongly associated with COVID-19 infection at the 3p21.31 locus. They further stated that epithelial-mesenchymal transition (EMT) is the relevant infection response pathway regulated by LZTFL1. Scientists have indicated that LZTFL1 is a candidate causal gene at a locus that increases the risk of respiratory failure twofold.
This study showed that the risk allele of the single nucleotide polymorphism (SNP), rs17713054 A, enhances transcription by increasing the activity of an epithelial–endothelial–fibroblast enhancer through the addition of a second CCAAT/enhancer-binding protein beta (CEBPB) motif. The scientists performed Micro Capture-C (MCC), an advanced 3C method, which identified LZTFL1 as the only gene to interact significantly with the rs17713054 enhancer. However, this might not be the only causal gene at 3p21.31.
A couple of transcriptome-wide association study (TWAS) analyses identified eleven candidate genes at 3p21.31. However, only two genes, i.e., LZTFL1 and SLC6A20, showed strong 3C contacts with the rs17713054 enhancer and lung expression quantitative trait loci. Scientists found that both LZTFL1 and SLC6A20 genes have higher expression in the presence of the rs17713054 risk allele.
The biological role of SLC6A20 in SARS-CoV-2 infection is not yet clear. Previous studies have indicated that SLC6A20 is primarily expressed in the kidneys and gastrointestinal tract. It is also linked with a Mendelian disease that causes renal calculi because of a failure of glycine reuptake in the nephron. In contrast, LZTFL1 is highly expressed in pulmonary epithelial cells, including ciliated epithelial cells, which are the primary cellular target of SARS-CoV-2 infection. An earlier study also showed that the LZTFL1 gene encodes a cytosolic leucine zipper protein associated with the epithelial marker E-cadherin. This protein is involved in the trafficking of numerous signaling molecules.
Conclusion
Researchers have characterized 3p21.31 as a COVID-19 risk locus and have shown that a higher risk of severe infection is associated with increased expression of LZTFL1, a known EMT inhibitor. The authors recommended more studies on the potential role of LZTFL1 and EMT in pulmonary pathogenesis.