An international team of almost one hundred scientists has produced the first complete, gap-free sequence of a human genome by deciphering the remaining, previously unknown regions, opening the door to novel approaches for treating various diseases. This historic study is published in the renowned journal Science.
The complete sequence of a human genome. Image Credit: Gio.tto / Shutterstock
Back in 2003, the historic Human Genome Project sequenced 92% of the human genome. That portion was essentially the euchromatin, which contains many loosely packaged genes that code for essential proteins with pivotal roles in our physiology.
However, for almost two decades, researchers struggled to decipher the remaining 8%, a smaller, tightly packaged portion of the genome known as heterochromatin. Its salient characteristic is that it is largely not responsible for producing proteins.
This was one reason why scientists initially prioritized euchromatin; another was that sequencing heterochromatin is extremely demanding. In other words, much more advanced genomic tools were needed to take a deep dive into this part of the genome.
This means that for a long time, we had a massive gap in our knowledge of certain basic cellular functions. The previous reference genome contains many long runs of unknown bases, and not even all of the euchromatic genome was adequately sequenced, as many errors (such as duplications) have been noticed.
That has now changed with this flagship study, conducted by the Telomere-to-Telomere (T2T) Consortium, which brought together researchers from different academic institutions and the National Institutes of Health (NIH) in the United States.
Using Merfin and long-read methods
With state-of-the-art techniques and renewed determination, this group of researchers has finished what the Human Genome Project started, correcting errors found in euchromatic regions and providing a full view of the heterochromatic regions.
One of the most important tools they used in that quest is Merfin, which cleans up some of the most difficult sequences found in the human genome. More specifically, the tool tests sequence accuracy, finds potentially erroneous stretches of sequence, and helps correct those mistakes.
Furthermore, the researchers leveraged the complementary strengths of PacBio HiFi and Oxford Nanopore ultralong-read sequencing, both of which are used to resolve large and complex genomes with almost 100 percent precision. Both are known as long-read methods.
A gapless human DNA blueprint
In short, the study presents gapless telomere-to-telomere assemblies (i.e., from one end of the chromosome to the other) for all 22 human autosomes and chromosome X, totaling 3,054,815,472 base pairs of nuclear DNA, alongside a 16,569-bp mitochondrial genome.
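To put these figures in perspective, the short calculation below adds the nuclear and mitochondrial lengths reported above and estimates how many base pairs the article's round 8% figure would correspond to. Treating 8% as exact is an assumption made purely for illustration, and the variable names are hypothetical.

```python
# Lengths quoted in the article, in base pairs.
NUCLEAR_BP = 3_054_815_472
MITO_BP = 16_569

# Assumption: the article's round "8%" is taken at face value.
PREVIOUSLY_MISSING_PERCENT = 8

total_bp = NUCLEAR_BP + MITO_BP
approx_missing_bp = NUCLEAR_BP * PREVIOUSLY_MISSING_PERCENT // 100

print(f"Nuclear + mitochondrial: {total_bp:,} bp")
print(f"Roughly {approx_missing_bp / 1e6:.0f} million bp at 8% of the nuclear genome")
```

Under that assumption, the previously unresolved portion amounts to a few hundred million base pairs, which gives a sense of how much sequence the long-read methods had to resolve.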
The newly completed regions include all centromeric satellite arrays, the short arms of acrocentric chromosomes and recent segmental duplications, opening these previously unknown regions to functional and variation studies.
In a way, this is the first meticulous view of our human DNA blueprint. The aforementioned long-read methods opened the door to understanding the most cumbersome, repeat-rich segments of the human genome.
Towards personalized medicine
We are still a long way from complete genome sequencing on an individual level, but this work will now inform studies on diseases linked to the heterochromatic genome, primarily cancers associated with centromere abnormalities (the centromere being a constricted region of the chromosome that separates it into a short and a long arm).
“This 8% of the genome has not been overlooked because of a lack of importance but rather because of technological limitations,” the research group states in their groundbreaking Science paper.
“High-accuracy long-read sequencing has finally removed this technological barrier, enabling comprehensive studies of genomic variation across the entire human genome, which we expect to drive future discovery in human genomic health and disease,” they add.
In any case, this study (and accompanying research endeavors) will substantially impact genome analysis and is a salient step toward assembly models that represent the genetic code of humanity. Benefiting all of us, it will also open the door to personalized medicine and genome editing in the future.