Identification of transcription regulatory sequences and genes in previously unannotated coronaviruses

The ongoing coronavirus disease 2019 (COVID-19) pandemic is caused by a novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Other coronaviruses that infect humans and spread readily include those responsible for Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS).

Study: CORSID enables de novo identification of transcription regulatory sequences and genes in coronaviruses. Image Credit: vchal/ Shutterstock

This news article was a review of a preliminary scientific report that had not undergone peer-review at the time of publication. Since its initial publication, the scientific report has now been peer reviewed and accepted for publication in a Scientific Journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.

Translation of coronavirus genome

Coronaviruses have single-stranded, positive-sense RNA genomes that are translated by the host ribosome. The coronavirus genome consists of multiple genes, which are expressed via two different mechanisms. The first is direct translation: after the virus enters the host cell, the host's machinery translates part of the viral genome into polypeptides, which correspond to one or two overlapping open reading frames (ORFs). These polypeptides auto-cleave to yield several non-structural proteins, including RNA-dependent RNA polymerase (RdRP). The second is discontinuous transcription, through which RdRP mediates the expression of the remaining viral genes.

Previous studies have revealed that RdRP tends to switch templates when it encounters transcription regulatory sequences (TRSs). One TRS lies in the 5’ untranslated region (UTR) of the genome and is known as TRS-L (L stands for leader); the others lie upstream of each viral gene and are called TRS-B (B stands for body). This template switching produces many subgenomic mRNAs that are translated into the structural and accessory viral proteins essential for the viral life cycle. Hence, identifying and characterizing the TRS sites is essential to elucidate the regulation and expression of the viral proteins.

Scientists have hypothesized that the presence of these regulatory sequences could be used to simultaneously and accurately identify TRS sites and the associated viral genes in unannotated coronavirus genomes. This study is available on the bioRxiv* preprint server.

Although previous studies have formulated methods to identify either TRS sites or viral genes, to date, researchers have not developed a method to identify both simultaneously. Earlier studies have revealed that TRSs contain conserved sequences 6-7 nt long (core sequences), and that both TRS-L and TRS-Bs can be identified in coronaviruses using general-purpose motif-finding methods.
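
To make the idea of a shared core sequence concrete, the following is a minimal sketch that ranks short words (k-mers) from the start of a genome by how often they recur further downstream. It is not MEME, SuPER, or CORSID: it uses plain exact-match counting, so it would miss TRS-Bs that carry mismatches. The function name `candidate_core_sequences`, the `leader_len` parameter, and the toy genome are all assumptions made for illustration.

```python
from collections import Counter


def candidate_core_sequences(genome: str, leader_len: int, k: int = 6):
    """Rank k-mers found in the leader region by their count downstream.

    Hypothetical helper: treats the first `leader_len` bases as the leader
    region and everything after it as the body of the genome.
    """
    leader, body = genome[:leader_len], genome[leader_len:]
    # Count every k-mer that occurs in the body (overlapping windows).
    body_counts = Counter(body[i:i + k] for i in range(len(body) - k + 1))
    # Collect the distinct k-mers that occur in the leader region.
    leader_kmers = {leader[i:i + k] for i in range(len(leader) - k + 1)}
    ranked = [(kmer, body_counts[kmer])
              for kmer in leader_kmers if body_counts[kmer] > 0]
    # Most frequent first; ties broken alphabetically for determinism.
    return sorted(ranked, key=lambda kc: (-kc[1], kc[0]))


# Toy sequence (not a real genome): one motif in the leader, two copies downstream.
toy = "GGACGAACTT" + "CCCCACGAACGGGG" + "TTTTACGAACAAAA"
print(candidate_core_sequences(toy, leader_len=10))
```

On the toy input, only the 6-mer that appears in the leader and again before each downstream segment is reported, which is the pattern a TRS-L/TRS-B search looks for.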

MEME is a commonly used method based on expectation maximization that locates occurrences of multiple motifs simultaneously. According to the scientists, the only method available to date that specifically identifies TRS sites in coronaviruses is SuPER. It takes as input a coronavirus genome sequence with specified gene locations, along with taxonomic and secondary structure information. Another gap highlighted by the researchers is the lack of methods to identify viral genes in unannotated coronavirus genome sequences.

Gene identification

Two commonly used gene prediction tools are Glimmer3 and Prodigal. Glimmer3 uses a Markov model to score ORFs, after which it resolves overlapping genes to generate the list of predicted genes. In contrast, Prodigal is based on a heuristic approach with fine-tuned parameters, optimized to identify genes in prokaryotes. However, neither tool takes into account the regulatory sequences, that is, the TRS sites located upstream of the genes in the genome.
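
As a rough illustration of what "candidate ORFs" means here, the sketch below enumerates forward-strand ORFs in the three reading frames. It is not how Glimmer3 or Prodigal work internally; it only lists the kind of intervals that a gene predictor would then score. The function name `find_orfs`, the `min_codons` threshold, and the toy sequence are assumptions, and the sequence is assumed to be written with DNA letters (T rather than U).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Return sorted (start, end) half-open coordinates of forward-strand ORFs.

    Each ORF runs from the first ATG in a frame through the next in-frame
    stop codon (inclusive). Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open start codon, if any
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it is long enough (stop codon counted).
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


toy = "CCATGAAATAGCC"
print(find_orfs(toy))                      # coordinates of the single toy ORF
print([toy[s:e] for s, e in find_orfs(toy)])
```

A real coronavirus genome yields many overlapping candidates of this kind, which is why a further selection step is needed.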

In this study, the researchers formulated two problems. The TRS Identification (TRS-ID) problem is to locate TRS sites in a coronavirus genome with specified gene annotations. The TRS and Gene Identification (TRS-GENE-ID) problem is to identify both TRS sites and genes simultaneously in an unannotated coronavirus genome. To solve TRS-ID, they introduced CORSID-A (CORe Sequence IDentifier), a dynamic programming (DP) algorithm that extends the classical Smith-Waterman recurrence.
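
For readers unfamiliar with it, the classical Smith-Waterman recurrence finds the best-scoring local alignment between two sequences. The sketch below shows only that classical recurrence, not the extension used in CORSID-A, whose details are not given in this article. The scoring values (`match`, `mismatch`, `gap`) and the toy strings are arbitrary assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Classical Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy strings: a shared 6-nt word flanked by unrelated bases.
print(smith_waterman("TTTACGAACTTT", "GGGACGAACGGG"))
```

Because scores are clamped at zero, the alignment covers only the shared region and ignores the unrelated flanks, which is the property that makes local alignment suited to finding a short conserved word inside longer sequences.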

CORSID, in turn, addresses the TRS-GENE-ID problem. It incorporates a maximum-weight independent set formulation on an interval graph to locate TRS sites and genes. The researchers assessed the performance of the newly developed methods on coronavirus genomes obtained from GenBank. They found that CORSID-A outperforms MEME and SuPER in identifying TRS sites. Furthermore, CORSID showed better results than the two aforementioned gene prediction tools, Glimmer3 and Prodigal. The method can also identify recombination events in a genome. Additionally, the scientists showed that CORSID allows de novo identification of TRS sites and genes in previously unannotated coronaviruses.
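
A maximum-weight independent set on an interval graph can be found with a standard dynamic program, often called weighted interval scheduling. The sketch below shows that generic technique only; how CORSID defines its intervals and weights is not described in this article, so the tuples here (candidate regions as half-open `[start, end)` intervals with made-up weights) and the function name are assumptions.

```python
from bisect import bisect_right


def max_weight_independent_set(intervals):
    """Pick non-overlapping intervals with maximum total weight.

    `intervals` is a list of (start, end, weight) with half-open [start, end).
    Returns (total_weight, chosen_intervals_in_order).
    """
    items = sorted(intervals, key=lambda iv: iv[1])  # sort by end coordinate
    ends = [iv[1] for iv in items]
    n = len(items)
    best = [0] * (n + 1)   # best[i] = optimum using the first i intervals
    take = [False] * n     # whether interval i is taken in that optimum
    prev = [0] * n         # number of earlier intervals ending at or before start
    for i, (start, _end, weight) in enumerate(items):
        prev[i] = bisect_right(ends, start, 0, i)
        with_i = best[prev[i]] + weight
        if with_i > best[i]:
            best[i + 1], take[i] = with_i, True
        else:
            best[i + 1] = best[i]
    # Walk back through the table to recover the chosen intervals.
    chosen, i = [], n
    while i > 0:
        if take[i - 1]:
            chosen.append(items[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    return best[n], chosen[::-1]


# Hypothetical candidate regions with made-up scores.
candidates = [(0, 5, 3), (4, 9, 4), (5, 12, 2), (10, 15, 3)]
print(max_weight_independent_set(candidates))
```

Sorting by end coordinate lets each interval look up its compatible predecessors with a binary search, so the whole selection runs in O(n log n) time.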

Conclusion and future research

The authors stated that CORSID is the first method that can simultaneously and accurately identify TRS sites and genes in coronavirus genomes without requiring taxonomic or secondary structure information.

The authors recommended several avenues for future research. For instance, CORSID presently requires the complete genome as input to identify the TRS sites and the genes. The researchers aim to modify their method so that it can perform gene identification using partial genomes. This could be achieved by leveraging information from other coronaviruses that have complete genomes with similar TRS sites. At present, the method is focused on gene identification in coronaviruses; however, it could be extended to other viruses as well.

Journal references:

Article Revisions

  • May 15 2023 - The preprint preliminary research paper that this article was based upon was accepted for publication in a peer-reviewed Scientific Journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.

Written by

Dr. Priyom Bose

Priyom holds a Ph.D. in Plant Biology and Biotechnology from the University of Madras, India. She is an active researcher and an experienced science writer. Priyom has also co-authored several original research articles that have been published in reputed peer-reviewed journals. She is also an avid reader and an amateur photographer.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Bose, Priyom. (2023, May 15). Identification of transcription regulatory sequences and genes in previously unannotated coronaviruses. News-Medical. Retrieved on December 21, 2024 from https://www.news-medical.net/news/20211116/Identification-of-transcription-regulatory-sequences-and-genes-in-previously-unannotated-coronaviruses.aspx.

  • MLA

    Bose, Priyom. "Identification of transcription regulatory sequences and genes in previously unannotated coronaviruses". News-Medical. 21 December 2024. <https://www.news-medical.net/news/20211116/Identification-of-transcription-regulatory-sequences-and-genes-in-previously-unannotated-coronaviruses.aspx>.

  • Chicago

    Bose, Priyom. "Identification of transcription regulatory sequences and genes in previously unannotated coronaviruses". News-Medical. https://www.news-medical.net/news/20211116/Identification-of-transcription-regulatory-sequences-and-genes-in-previously-unannotated-coronaviruses.aspx. (accessed December 21, 2024).

  • Harvard

    Bose, Priyom. 2023. Identification of transcription regulatory sequences and genes in previously unannotated coronaviruses. News-Medical, viewed 21 December 2024, https://www.news-medical.net/news/20211116/Identification-of-transcription-regulatory-sequences-and-genes-in-previously-unannotated-coronaviruses.aspx.
