In a recent study posted to the bioRxiv* preprint server, researchers demonstrated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution at the transcriptional level.
*Important notice: bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.
Background
Nucleotide (nt)-level mutations in the SARS-CoV-2 genome have led to the emergence of novel subgenomic messenger ribonucleic acids (sg mRNAs). The phenomenon has several important implications, especially for understanding the emergence of past and future SARS-CoV-2 variants.
The N-terminal domain (NTD), serine-arginine (SR)-rich linker, and C-terminal domain (CTD) of the SARS-CoV-2 nucleocapsid (N) protein bind to SARS-CoV-2 RNA, while the N3 domain nested inside the C-terminal region interacts with the viral membrane (M) protein, facilitating genome packaging by anchoring the viral RNA to nascent virions.
During evolution, two mutations in the SARS-CoV-2 N protein, R203K and G204R, stimulated the formation of a new transcription regulatory sequence (TRS) (AGGGGAAC→AAACGAAC), which defined the B.1.1 lineage and its descendants, including three SARS-CoV-2 variants of concern (VOCs) - Alpha (B.1.1.7), Gamma (P.1), and Omicron (B.1.1.529).
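The sequence change quoted above can be checked by hand, and a minimal Python sketch makes the reading frame explicit. It rests on two assumptions that this article does not state: that the eight quoted nucleotides start at the first base of codon 203 of N (so the first two codons are residues 203 and 204), and that the TRS core motif is ACGAAC. The codon table is deliberately cut down to the four codons needed here.

```python
# Partial standard codon table: only the four codons needed for this example.
CODON_TABLE = {"AGG": "R", "GGA": "G", "AAA": "K", "CGA": "R"}

# Assumption: TRS core motif (not given in the article itself).
TRS_CORE = "ACGAAC"

ANCESTRAL = "AGGGGAAC"  # sequence before the mutations, as quoted above
DERIVED = "AAACGAAC"    # sequence after the mutations, as quoted above


def first_two_codons(seq):
    """Translate the first two codons, assumed to be N residues 203 and 204."""
    return CODON_TABLE[seq[0:3]] + CODON_TABLE[seq[3:6]]


def changed_positions(a, b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(first_two_codons(ANCESTRAL), "->", first_two_codons(DERIVED))
    print("changed positions:", changed_positions(ANCESTRAL, DERIVED))
    print("core motif in ancestral:", TRS_CORE in ANCESTRAL)
    print("core motif in derived:", TRS_CORE in DERIVED)
```

Under these assumptions, three adjacent nucleotide substitutions change both amino acids (R to K at 203, G to R at 204) and at the same time create a copy of the assumed core motif that the ancestral sequence lacks.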
TRSes are required for the expression of the spike (S), envelope (E), M, and N proteins and of the accessory genes, all of which lie at the 3′ end of coronavirus (CoV) genomes, downstream of open reading frames (ORFs) 1a and 1b.
During transcription, the SARS-CoV-2 polymerase copies a TRS in the body of the genome (TRS-B). The nascent RNA strand then dissociates from the genomic template and re-anneals to the homologous TRS in the 5′ Leader (TRS-L) to re-initiate transcription, giving rise to sg RNAs, which are copied back to the positive sense, yielding sg mRNAs. This discontinuous transcription is a hallmark of the order Nidovirales.
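The template switch described above can be illustrated with a toy model. The sketch below works entirely in the positive sense and skips the negative-strand intermediate: it treats the first TRS in a sequence as TRS-L, every later copy as a TRS-B, and builds each sg mRNA by joining the leader (up to and including TRS-L) to whatever lies downstream of the TRS-B. The genome string, the ACGAAC motif, and the function names are all hypothetical choices for illustration, not data or methods from the study.

```python
TRS = "ACGAAC"  # assumed TRS core motif, used here only for the toy model

# Hypothetical toy genome: leader + TRS-L + ORF1 stand-in + two TRS-B/gene units.
GENOME = (
    "TTAACC" + TRS      # 5' leader followed by TRS-L
    + "GGGTTTGGG"       # stand-in for ORF1a/ORF1b
    + TRS + "ATGCCC"    # TRS-B followed by a first downstream gene
    + TRS + "ATGTTT"    # TRS-B followed by a second downstream gene
)


def find_trs_sites(genome, trs=TRS):
    """Return the 0-based start positions of every occurrence of the TRS."""
    sites = []
    start = genome.find(trs)
    while start != -1:
        sites.append(start)
        start = genome.find(trs, start + 1)
    return sites


def subgenomic_mrnas(genome, trs=TRS):
    """Join the leader (through TRS-L) to the sequence after each TRS-B."""
    sites = find_trs_sites(genome, trs)
    if not sites:
        return []
    leader = genome[: sites[0] + len(trs)]  # first site is treated as TRS-L
    return [leader + genome[site + len(trs):] for site in sites[1:]]


if __name__ == "__main__":
    for mrna in subgenomic_mrnas(GENOME):
        print(mrna)
```

In this model, a mutation that creates a new copy of the motif inside a gene adds one more entry to the list returned by `subgenomic_mrnas`, which is the sense in which a new TRS-B yields a new sg mRNA. Real TRS usage also depends on flanking sequence and RNA structure, which the sketch ignores.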
About the study
In the present study, researchers infected VeroE6 cells at a high multiplicity of infection with SARS-CoV-2 B lineage viruses, including B.1.1.7, B.1.351, BA.1, and B.1.1.529. They harvested the infected VeroE6 cells 24 hours post-infection (hpi) and extracted their RNA for analysis by reverse transcription-polymerase chain reaction (RT-PCR). The RT-PCR assay used a forward primer against the 5′ Leader and a reverse primer against the 3′ end of the N gene.
Next, they confirmed the presence of the N internal ORF 3 (N.iORF3) sg mRNA in humans infected with SARS-CoV-2 by analyzing 12 clinical swab samples from the coronavirus disease 2019 (COVID-19) waves that occurred between late 2020 and early 2022.
They used a quantitative RT-PCR assay to quantify and compare N.iORF3 sg mRNA in human samples with other viral RNA species. Probes spanned the TRS-L sg mRNA junction for quantifying N.iORF3, N, and E sg mRNA copy numbers.
To determine whether N.iORF3 synthesized a protein in infected cells, the team infected VeroE6 cells expressing angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2) with lineage B, B.1.1, Alpha, Beta, and Delta viruses. They harvested these cells at 20 hpi and analyzed them by immunoblotting.
Further, they tested whether the N.iORF3 protein, which encompasses the CTD of N, acts as an interferon (IFN) antagonist. To this end, the researchers transfected HEK293T cells with plasmids encoding N.iORF3. After 24 hours, they transfected the cells with the immunostimulant polyinosinic:polycytidylic acid (poly(I:C)).
They also searched public SARS-CoV-2 genomes within the complete Ultrafast Sample placement on Existing tRee (UShER) phylogenetic tree for sequences that carry the R203K and G204R mutations in the N protein but fall outside the B.1.1 lineage. Such sequences could show whether N.iORF3 sg mRNA has evolved independently outside of the B.1.1 lineage.
Lastly, the researchers searched sequence resources, such as Nextstrain, the Global Initiative on Sharing All Influenza Data (GISAID), and CoV-GLUE, for novel TRS-B sites.
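A search for novel TRS-B sites can be pictured as a motif scan that compares each sample with a reference. The sketch below is a simplified stand-in, not the authors' pipeline: it assumes the two sequences are already aligned and of equal length (substitutions only), assumes ACGAAC as the core motif, and uses short made-up sequences in place of real genomes.

```python
TRS_CORE = "ACGAAC"  # assumed TRS core motif


def motif_positions(seq, motif=TRS_CORE):
    """Return the 0-based start positions of every occurrence of the motif."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def novel_sites(reference, sample, motif=TRS_CORE):
    """Positions where the motif is present in the sample but not the reference.

    Assumes both sequences are aligned and of equal length (no indels).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    known = set(motif_positions(reference, motif))
    return [pos for pos in motif_positions(sample, motif) if pos not in known]


# Hypothetical toy sequences: one shared motif, plus one created by substitution.
REFERENCE = "CTACGAACTTAGGGGAACTT"
SAMPLE = "CTACGAACTTAAACGAACTT"

if __name__ == "__main__":
    print("reference sites:", motif_positions(REFERENCE))
    print("novel sites in sample:", novel_sites(REFERENCE, SAMPLE))
```

A real analysis would work from aligned genomes with insertions and deletions handled properly, and would place each hit on the phylogenetic tree to tell independent (convergent) origins apart from shared ancestry.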
Study findings
All SARS-CoV-2-infected VeroE6 cells had the canonical full-length N sg mRNA; however, an additional shorter product corresponding to the N.iORF3 sg mRNA was detected in Alpha- and Omicron-infected cells, but not in lineage B- or Beta-infected cells. RT-PCR results revealed that N.iORF3 sg mRNA was present in human clinical samples from Alpha and Omicron infections but not in those from the B.1 lineage (Delta).
In both clinical swabs and infected cells, the authors observed that N.iORF3 sg mRNA and E sg mRNA were consistently expressed at equivalent levels (30-150%), around 100-fold lower than N sg mRNA.
Nanopore sequencing of endpoint PCR products also detected a distinct ORF9b-specific sg mRNA in the Alpha-infected cells, although it was below the limit of detection (10² copies). N protein expression mostly remained consistent among the cells infected with different variants; additionally, there was a band at ~25 kDa in B.1.1- and Alpha-infected cells.
Cells transfected with N.iORF3 expressed lower levels of IFNβ and interferon-induced protein with tetratricopeptide repeats 1 (IFIT1) mRNA, indicating that N.iORF3 antagonized IFN signaling downstream of double-stranded (ds)RNA sensing in the cytoplasm.
Sequences deposited by multiple laboratories showed convergent evolution of the N.iORF3 sg mRNA within the SARS-CoV-2 Iota variant (B.1.526). Interestingly, the authors also detected a silent double-nt substitution at S202 in the entire Gamma lineage, at least six times within the Alpha lineage, and even within Omicron, indicating further evolution of the N.iORF3 TRS region.
Surprisingly, after N.iORF3, the most frequent novel TRS-B site was located at the end of ORF1ab, within the region coding for SARS-CoV-2 non-structural protein 16 (nsp16).
The authors observed at least 21 occurrences of convergent evolution of the nsp16.iORF TRS-B, with 82% of sequences coming from samples collected during the B.1.1.44 lineage outbreak in Scotland. Further investigation also confirmed that the nsp16.iORF sg mRNA was expressed during human SARS-CoV-2 infection.
Conclusions
The study findings demonstrated that the emergence of novel ORFs and convergent TRS-B evolution are representative of SARS-CoV-2 evolution and its adaptation to new hosts.
This evolution has primarily taken two routes: horizontal gene transfer between CoVs and non-homologous recombination with unrelated viruses. Interestingly, coronaviruses also duplicate parts of their genome; e.g., in SARS-CoV-2 and related viruses, ORF3a diverged from a copy of the gene encoding the M protein.
Moreover, the 3′ regions of ORF6 and ORF8 in SARS-CoV-2 have significant sequence homology to the 5′UTR, suggesting that TRS and its flanking sequences can provide new genetic material. In fact, these processes are fundamental to the evolution of RNA viruses.
The authors could not identify the complete set of functions of the proteins encoded by newly identified sg mRNAs in SARS-CoV-2. Nevertheless, the study results highlighted features of SARS-CoV-2 evolution at the functional RNA level, demonstrating how convergent evolution of TRSes makes it challenging to trace the early evolution of SARS-CoV-2 in humans.
In the future, identifying functionally important nt-level changes alongside amino-acid mutations will be crucial for understanding how newer SARS-CoV-2 variants emerge.