To date, coronavirus disease 2019 (COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused the deaths of over 4.8 million people worldwide. Since SARS-CoV-2 first emerged in Wuhan, China, its origin has been contentiously debated.
Study: Evidence Against the Veracity of SARS-CoV-2 Genomes Intermediate between Lineages A and B. Image Credit: peterschreiber.media / Shutterstock.com
*Important notice: Virological publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice/health-related behavior, or treated as established information.
Although scientists have since confirmed that the closest known relatives of SARS-CoV-2 are found in bats, there remains limited information as to how the virus was eventually able to reach humans. In an effort to better understand whether SARS-CoV-2 resulted from a natural spillover into humans, researchers recently discussed their findings on the veracity of genomes intermediate between two early SARS-CoV-2 lineages in a study published on the preprint server Virological*.
About the study
Two primary lineages are often used to describe the early SARS-CoV-2 genomes. Lineage B, which includes the reference genome Hu-1, is defined by nucleotides C8782 and T28144. Lineage A, by comparison, is defined by the substitutions C8782T and T28144C relative to the reference genome.
Since early 2020, numerous intermediate SARS-CoV-2 sequences containing either C8782T or T28144C, but not both, have been reported. In the current study, these genomes are referred to as C/C or T/T because they carry the same nucleotide at both key sites: C/C genomes carry only T28144C, whereas T/T genomes carry only C8782T.
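The two-site scheme described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: it assumes each genome is a string already aligned to the Hu-1 reference so that string index 8781 corresponds to position 8782, and the names `classify_lineage` and `toy_genome` are hypothetical helpers introduced here.

```python
# Illustrative sketch only; assumes genomes are aligned to the Hu-1 reference.
SITE_8782 = 8782    # 1-based reference coordinate
SITE_28144 = 28144  # 1-based reference coordinate

# Map the (8782, 28144) base pair to the labels used in the article.
LABELS = {
    ("C", "T"): "B",    # reference state at both sites
    ("T", "C"): "A",    # C8782T and T28144C
    ("C", "C"): "C/C",  # T28144C only (putative intermediate)
    ("T", "T"): "T/T",  # C8782T only (putative intermediate)
}

def classify_lineage(genome: str) -> str:
    """Return 'A', 'B', 'C/C', 'T/T', or 'unclassified' for an aligned genome."""
    pair = (genome[SITE_8782 - 1].upper(), genome[SITE_28144 - 1].upper())
    return LABELS.get(pair, "unclassified")

def toy_genome(base_8782: str, base_28144: str, length: int = 29903) -> str:
    """Build a placeholder sequence of N's with only the two key sites set."""
    seq = ["N"] * length
    seq[SITE_8782 - 1] = base_8782
    seq[SITE_28144 - 1] = base_28144
    return "".join(seq)

if __name__ == "__main__":
    for bases in [("C", "T"), ("T", "C"), ("C", "C"), ("T", "T")]:
        print(bases, classify_lineage(toy_genome(*bases)))
```

Any base other than C or T at either site, including an ambiguous N, falls through to "unclassified" rather than being forced into a lineage.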
In the current study, the researchers initially collected SARS-CoV-2 consensus genomes from GISAID with collection dates between February 2020 and December 2020, as the diversity of the genome was not clear at the start of the pandemic. All animal samples, as well as any sequences with incomplete collection dates, were excluded, which left a total of 1,716 sequences for the current study.
Among the selected genomes, the researchers then identified any intermediate C/C or T/T genomes that shared additional mutations with one of the two major lineages.
The presence of such shared mutations indicates either that they arose independently in both the putative intermediate and its pure counterpart, or that the putative intermediate is not actually intermediate.
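The comparison described above amounts to a set intersection once the two lineage-defining sites are set aside. The sketch below is a hedged illustration rather than the study's method: mutations are assumed to be written as strings such as "C3037T", the function name `shared_derived_mutations` is hypothetical, and the example mutation lists are toy inputs, not data from the preprint.

```python
# Illustrative sketch only; mutation lists below are toy inputs.
LINEAGE_DEFINING = frozenset({"C8782T", "T28144C"})

def shared_derived_mutations(intermediate, lineage_genome):
    """Return mutations shared by both genomes, ignoring the two defining sites."""
    shared = set(intermediate) & set(lineage_genome)
    return sorted(shared - LINEAGE_DEFINING)

if __name__ == "__main__":
    # Toy T/T genome: C8782T plus two further substitutions.
    putative_intermediate = ["C8782T", "C3037T", "A23403G"]
    # Toy lineage B genome carrying the same two further substitutions.
    lineage_b_genome = ["C3037T", "A23403G"]
    print(shared_derived_mutations(putative_intermediate, lineage_b_genome))
```

A non-empty result flags a putative intermediate whose extra mutations also appear in a major lineage, which is the pattern that would require either repeated independent mutation or a sequencing artifact to explain.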
Study findings
In total, the researchers identified 28 C/C genomes, six of which had no additional mutations aside from the mutation at position 28144. Notably, 16 of the C/C genomes shared nucleotide substitutions also found in lineage A, whereas 11 of the C/C genomes shared substitutions found in lineage B.
Comparatively, 10 T/T genomes were collected, of which two genomes had additional C3037T and A23403G mutations aside from those present in lineage B. An additional two T/T genomes had G11410A and G26211T mutations.
Conclusion
The recurring appearance of various derived mutations on either side of a given mutation is difficult to explain through homoplasy events alone. The study identified apparent homoplasies in 41.6% of the C/C intermediate genomes and 58.3% of the T/T genomes. These apparent homoplasies could instead arise from contamination, errors in sample preparation, sequencing technology, or consensus-calling approaches.
Taken together, the findings of the current study cast considerable doubt on the authenticity of the C/C and T/T intermediate genomes from early 2020.
"We suggest that these early C/C and T/T genomes are erroneous and should be excluded from phylogenetic analyses".