A recent study published in Cell presented a genome-wide map of dosage sensitivity across human disorders.
Background
Duplications and deletions of genomic segments, commonly called copy-number variants (CNVs), have been recognized as important mechanisms of evolution for over five decades. Nevertheless, only a few CNVs in humans are known to confer adaptive advantages.
On the other hand, rare CNVs (rCNVs), that is, duplications and deletions that are uncommon in the general human population, can significantly increase the risk of disease. These rCNVs have been widely associated with both complex and Mendelian diseases. Moreover, genomic disorders (GDs), a subgroup of disease-linked rCNVs, have been extensively discussed in the literature for many years.
Historically, it has been challenging to identify dosage-sensitive (DS) driver genes within rCNVs. Additionally, genome-wide annotations of DS segments and genes are still lacking, and there is no commonly accepted framework for assessing triplosensitivity and haploinsufficiency for every human gene.
Furthermore, it is generally unclear whether the reciprocal GD phenotypes linked to deletion and duplication of the same locus are caused by two or more separate haploinsufficient (HI) and triplosensitive (TS) genes or by a single bidirectionally DS gene. There is also an urgent need for detailed maps of bidirectional dosage sensitivity across disorders, both for clinical interpretation and for the study of human disease.
About the study
The present study sought to measure the properties of haploinsufficiency (deletion intolerance) and triplosensitivity (duplication intolerance) across the whole human genome. The team harmonized and meta-analyzed rCNVs from 950,278 people to create a genome-wide catalog of rCNV associations for 54 disease phenotypes.
In addition, they combined these rCNVs with 145 genome annotations to predict the probabilities of haploinsufficiency (pHaplo) and triplosensitivity (pTriplo) for all protein-coding genes.
In detail, the investigators collected rCNVs identified by microarrays from 17 sources, ranging from diagnostic labs to national biobanks. Building on years of influential research on CNVs in disease, they took advantage of this sample size to systematically search for rCNV associations with each phenotype.
The researchers then aimed to discover specific genes enriched for coding rCNVs in patients relative to controls using exome-wide rCNV association testing. They hypothesized that a catalog of dosage sensitivity measurements for every gene, even if incomplete, would be a useful tool for clinical genetics and genomics research. Thus, the team created a two-step procedure to computationally predict pHaplo and pTriplo for 18,641 autosomal protein-coding genes.
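To illustrate how association evidence from several cohorts can be pooled, the following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis. It is a generic textbook approach, not necessarily the model used in the study, and the per-cohort log odds ratios and standard errors are made-up values.

```python
import math

def inverse_variance_meta(estimates):
    """Pool (log_odds_ratio, standard_error) pairs with fixed-effect weights.

    Each cohort is weighted by 1 / SE^2, so more precise cohorts count more.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical per-cohort results for one locus (assumed values, not study data).
cohorts = [(0.9, 0.4), (1.2, 0.3), (0.7, 0.5)]
pooled_log_or, pooled_se = inverse_variance_meta(cohorts)
z_score = pooled_log_or / pooled_se
print(f"pooled log OR = {pooled_log_or:.3f}, SE = {pooled_se:.3f}, z = {z_score:.2f}")
```

The pooled standard error is always smaller than that of the most precise single cohort, which is why combining many sources increases power to detect rare-variant associations.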
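As a rough illustration of what testing a gene for rCNV enrichment in patients versus controls can look like, the sketch below computes a continuity-corrected odds ratio and a one-sided Fisher's exact p-value for a single gene. This is a simple stand-in under assumed counts, not the study's actual association model, and the function names are hypothetical.

```python
from math import comb

def odds_ratio(case_hits, case_total, control_hits, control_total):
    """Odds ratio with a 0.5 (Haldane-Anscombe) correction to tolerate zero cells."""
    a = case_hits + 0.5
    b = case_total - case_hits + 0.5
    c = control_hits + 0.5
    d = control_total - control_hits + 0.5
    return (a * d) / (b * c)

def fisher_one_sided(case_hits, case_total, control_hits, control_total):
    """P(at least `case_hits` carriers among cases) under the hypergeometric null."""
    total_hits = case_hits + control_hits
    total = case_total + control_total
    upper = min(total_hits, case_total)
    numerator = sum(
        comb(total_hits, k) * comb(total - total_hits, case_total - k)
        for k in range(case_hits, upper + 1)
    )
    return numerator / comb(total, case_total)

# Hypothetical counts for one gene: deletion carriers among cases and controls.
print(odds_ratio(12, 5000, 3, 10000))
print(fisher_one_sided(12, 5000, 3, 10000))
```

In an exome-wide scan, a test like this would be repeated for every gene and for deletions and duplications separately, so the resulting p-values need a multiple-testing correction.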
Results and discussion
Collectively, the investigators produced a genome-wide catalog of standardized rCNV association statistics by meta-analyzing a sizable collection of biomedical datasets to evaluate the impact of rCNVs on 54 human diseases. This resource comprises a consensus catalog of 178 DS genomic segments associated with human disease, including a high-confidence subset of 88 DS genomic segments that reached strict genome-wide significance.
The researchers also demonstrated that a sizeable portion of these segments likely contains at least one DS driver gene, based on enrichments of constrained disease genes and non-uniform concentrations of damaging de novo mutations (DNMs) within rCNV segments.
The increased density of constrained genes that the team observed for pleiotropic rCNVs was consistent with a basic model of roughly one causative gene per phenotype per segment. It also agreed with existing knowledge about a few well-studied GDs, such as the attribution of the heart and kidney abnormalities seen with 22q11.2 GD deletions to T-box transcription factor 1 (TBX1) and CT10 regulator of kinase-like proto-oncogene, adaptor protein (CRKL), respectively.
However, the full genetic consequences of most rCNVs are likely to be more intricate, given known cis-regulatory effects, gene-gene interactions, and variable penetrance or expressivity attributable to polygenic background and secondary variants.
The authors repurposed fine-mapping tools from genome-wide association studies (GWASs) to statistically prioritize specific genes within large rCNVs across a spectrum of genetic architectures and effect sizes. By integrating short variant datasets, the team found that rCNVs and short variants commonly converge on the same causal genes at disease-linked loci. This convergence might indicate a mechanism, as shown by the CNV direction-specific enrichments of protein-truncating variants (PTVs) and missense DNMs.
Finally, the team used the study data to predict the dosage sensitivity of each autosomal protein-coding gene. The triplosensitivity scores in particular may offer a unique perspective for analyzing rare duplications and even some disease-related missense variants, for which loss-of-function (LoF) and gain-of-function effects are difficult to distinguish in silico.
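One common GWAS-style fine-mapping recipe, shown here only as a sketch, converts per-gene effect estimates into Wakefield approximate Bayes factors, normalizes them into posterior probabilities under the assumption of a single causal gene per segment, and reports a credible set. The article does not specify which tools the authors adapted, so the prior standard deviation, gene names, and effect sizes below are all assumptions.

```python
import math

def log_abf(beta, se, prior_sd=0.5):
    """Wakefield log approximate Bayes factor in favor of association.

    prior_sd is an assumed prior standard deviation on the log odds ratio.
    """
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r

def posterior_probs(stats):
    """Per-gene posterior probabilities, assuming exactly one causal gene."""
    logs = {gene: log_abf(beta, se) for gene, (beta, se) in stats.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {gene: math.exp(value - top) for gene, value in logs.items()}
    total = sum(weights.values())
    return {gene: weight / total for gene, weight in weights.items()}

def credible_set(pips, coverage=0.95):
    """Smallest set of top-ranked genes whose probabilities reach `coverage`."""
    chosen, cumulative = [], 0.0
    for gene, pip in sorted(pips.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(gene)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical (log odds ratio, standard error) per gene inside one rCNV segment.
segment = {"GENE_A": (1.5, 0.3), "GENE_B": (0.2, 0.3), "GENE_C": (0.1, 0.3)}
pips = posterior_probs(segment)
print(pips)
print(credible_set(pips))
```

The single-causal-gene assumption mirrors the "roughly one causative gene per phenotype per segment" model; segments with several drivers would need a method that allows multiple causal genes.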
Conclusion
Overall, the harmonization and meta-analysis of rCNVs from nearly one million people allowed the authors to create a genome-wide catalog of dosage sensitivity spanning 54 diseases. This process defined 163 DS segments associated with at least one disease. These segments were generally gene dense and frequently contained dominant DS driver genes, which the team prioritized using statistical fine-mapping.
Finally, the scientists created an ensemble machine-learning framework to estimate dosage sensitivity probabilities (pHaplo and pTriplo) for all autosomal genes. This model identified 2,987 HI and 1,559 TS genes, including 648 genes that were uniquely TS.
Notably, the researchers made all metrics and maps from the current study available to the public as an open resource. They anticipated that the study findings on dosage sensitivity would be very beneficial for studying human diseases and clinical genetics.
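To make the HI, TS, and "uniquely TS" categories concrete, the sketch below labels genes by thresholding their pHaplo and pTriplo scores. The cutoffs, gene names, and scores are placeholders chosen for illustration; they are not the values used in the study.

```python
def classify_genes(scores, haplo_cutoff, triplo_cutoff):
    """Split genes into HI, TS, and uniquely-TS sets from (pHaplo, pTriplo) scores."""
    hi = {gene for gene, (p_haplo, _) in scores.items() if p_haplo >= haplo_cutoff}
    ts = {gene for gene, (_, p_triplo) in scores.items() if p_triplo >= triplo_cutoff}
    # Uniquely TS: intolerant of duplication but not flagged as haploinsufficient.
    return {"HI": hi, "TS": ts, "uniquely_TS": ts - hi}

# Hypothetical scores: gene -> (pHaplo, pTriplo).
example_scores = {
    "GENE_A": (0.97, 0.99),  # sensitive in both directions
    "GENE_B": (0.95, 0.20),  # HI only
    "GENE_C": (0.10, 0.98),  # uniquely TS
    "GENE_D": (0.05, 0.03),  # neither
}
groups = classify_genes(example_scores, haplo_cutoff=0.9, triplo_cutoff=0.9)
print({name: sorted(genes) for name, genes in groups.items()})
```

Under this reading, the 648 uniquely TS genes are the part of the TS set that does not overlap the HI set.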