In a recent study published in PLoS ONE, researchers uncovered distinct genomic features of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Delta and Omicron variants.
Understanding SARS-CoV-2, the causal pathogen of the coronavirus disease 2019 (COVID-19) pandemic, is still challenging. It has been suggested that the SARS-CoV-2 genome might have formed due to the recombination of genomes close to those of bat and pangolin coronaviruses (CoVs). It is critical to investigate the origin of SARS-CoV-2 to prevent the occurrence of pandemics in the future.
SARS-CoV-2 Delta and Omicron variants feature common and unique mutations in the spike protein. Previously, the authors described GenomeBits [a statistical algorithm that maps nucleotide bases into a finite alternating sum series of distributed terms of binary values (0, 1)] and revealed distinct genomic patterns for SARS-CoV-2 Alpha, Beta, Gamma, Epsilon, and Eta variants.
The study and findings
In the present study, researchers applied the GenomeBits method to uncover distinctive patterns in SARS-CoV-2 Delta and Omicron genomic sequences. Genomic sequence data were obtained from the Global Initiative on Sharing All Influenza Data (GISAID) repository. In similarity plots generated using the Waterman-Eggert algorithm with the lalign36 alignment software, the authors observed a greater deviation of the Omicron variant (B.1.1.529) than of the Delta variant (AY.4.2) from the ancestral SARS-CoV-2 (Wuhan-Hu-1) sequence.
Delta variant sequences from Spain exhibited greater deviations when queried against Omicron sequences from Spain. Similar variations were noted when Delta sequences from the United States (US) were queried against Omicron sequences from the US. Conventional similarity methods provide limited information on the nucleotide bases adenine (A), cytosine (C), thymine (T), and guanine (G), and determining the parameters needed to achieve optimal alignment can be difficult. Moreover, the computational resources required increase substantially with the number and length of sequences.
In contrast, the GenomeBits method runs efficiently, with less processing time, on massive genomic data. The technique considers an alternating sum series whose terms are nucleotide variables converted to binary values (0, 1). The key difference between GenomeBits and other binary representation techniques is the alternating signs (±) of the terms in the GenomeBits sums. That is, if a term at a given nucleotide position is negative, then the successive term is positive, and vice versa.
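The alternating sum can be illustrated with a minimal sketch. It assumes, as a reading of the description above rather than the authors' published code, that each position contributes 1 if the chosen base occurs there and 0 otherwise, and that the sign starts positive and flips at every successive position. The function name `alternating_sums` and the toy sequence are hypothetical.

```python
from itertools import accumulate


def alternating_sums(sequence, base):
    """Running alternating sum for one base along a sequence.

    Assumption: the term at position k is +1 or -1 (sign alternating with k,
    starting positive) when the nucleotide equals `base`, and 0 otherwise.
    """
    terms = (
        (1 if k % 2 == 0 else -1) * (1 if nucleotide == base else 0)
        for k, nucleotide in enumerate(sequence.upper())
    )
    return list(accumulate(terms))


# Toy sequence (not real SARS-CoV-2 data): one curve per nucleotide base.
toy = "ATGCAAT"
for base in "ACTG":
    print(base, alternating_sums(toy, base))
```

Each base yields its own curve of partial sums, so a genome maps to four numerical series that can be plotted against base position.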
In the GenomeBits representation, the authors observed that the curves of Delta sequences mirrored those of Omicron sequences. This became more prominent when both curves were averaged. Regions of null (low noise) or constant average values were indicative of perfect mirroring. The technique illustrated an ordered (constant) to disordered (peak) transition near the non-structural protein (NSP)-5 polymerase within the open reading frame (ORF)-1a region up to part of the spike protein.
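Why a constant average indicates mirroring can be seen in a small sketch: if one curve rises wherever the other falls by the same amount, their point-wise average stays flat. The curves below are made-up numbers, and the helper names `pointwise_average` and `is_flat` are hypothetical; the study's exact averaging procedure is not described here.

```python
def pointwise_average(curve_a, curve_b):
    """Average two equally long curves position by position."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must have the same length")
    return [(a + b) / 2 for a, b in zip(curve_a, curve_b)]


def is_flat(values, tolerance=0.0):
    """True when all values lie within `tolerance` of one another."""
    return max(values) - min(values) <= tolerance


# Made-up curves: mirrored over the first five positions, diverging at the last.
curve_a = [0, 1, 2, 1, 0, 3]
curve_b = [0, -1, -2, -1, 0, 1]

average = pointwise_average(curve_a, curve_b)
print(average)               # flat where the curves mirror each other
print(is_flat(average[:5]))  # mirrored stretch
print(is_flat(average))      # includes the diverging position
```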
Distinct patterns were also observed around the spike region. The disordered (peak) curves diverged rapidly, denoting dissimilarities that grew with increasing base position. The positive and negative terms partly canceled out, converging at some non-zero values. Furthermore, noise in the data could be reduced by applying sliding windows of different sizes, up to 500 bases.
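The effect of a sliding window on a noisy curve can be sketched with a simple moving average. This is an assumption for illustration: the article does not state how the windows were applied, and the function name `sliding_window_mean` is hypothetical.

```python
def sliding_window_mean(values, window):
    """Mean of each run of `window` consecutive values (a moving average)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    # Slide one position at a time: add the entering value, drop the leaving one.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


# Made-up oscillating series: a window of 2 removes the alternation entirely.
noisy = [0, 3, 0, 3, 0, 3]
print(sliding_window_mean(noisy, 2))
```

Larger windows smooth more aggressively but also blur short features, which is one reason to compare several window sizes.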
Conclusions
The researchers observed constant and peaked transitions around the spike protein region of SARS-CoV-2 Delta and Omicron variants using the GenomeBits method. Numerical representations of genomic sequences have been instrumental in bioinformatics and could help handle enormous sequence data. GenomeBits might help with future bioinformatics surveillance of infectious diseases, and sequence-to-numeral mapping methods would likely prevail for characterizing new sequences.