A comprehensive analysis of over 500,000 human protein variants reveals that 60% of disease-causing missense mutations reduce protein stability
In a recent study published in Nature, researchers used massively parallel synthesis (MPS) and abundance protein fragment complementation assays (aPCA) to construct a database of human missense mutations and their effects on genetic disease.
The aptly named "Human Domainome 1" library comprises 563,534 mutant variants across 522 protein domains, a roughly five-fold increase over previously available experimental measurements of this kind.
The study further measured how mutant variants affect protein folding stability and, in turn, fitness and pathogenicity. Finally, it examined how folding stability shapes dominant and recessive disorders and described the energetics of conserved mutational effects.
Background
Missense mutations are genetic alterations in which a single nucleotide in a protein-coding DNA sequence is substituted with a different one, changing one amino acid (aa) in the encoded protein. While most are harmless and go unnoticed in day-to-day life, researchers estimate that missense mutations are responsible for ~33% of human genetic diseases.
Missense mutations are surprisingly common: the human population harbors tens of millions of these genetic variants. Unfortunately, their functional consequences, and whether they contribute to disease, remain largely unknown.
About the study
The present study uses a large-scale experimental dataset comprising 563,534 genetic variants to elucidate their effects on 522 human protein domains, thereby providing unprecedented insights into how single amino acid changes contribute to human disease.
The study utilized DNA synthesis in tandem with cellular selection experiments to create the first 'Human Domainome', a catalog of how each domain is affected when an aa is replaced by any of the other possible aa (n = 19) at the same position.
The variant library was created using microchip-based massively parallel synthesis (mMPS) technology and was designed to contain 1,230,584 aa variants corresponding to 1,248 protein domains. The final dataset covers 563,534 variants across 522 of these domains.
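To make the substitution scheme concrete, the following minimal Python sketch enumerates every single-aa substitution for a short sequence. The function name `single_substitutions` and the 8-residue sequence are hypothetical and chosen only for illustration; this is not the study's library design code.

```python
# Illustrative sketch only: enumerate all single-aa substitutions of a sequence.
# The toy sequence below is hypothetical and is not one of the study's domains.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitutions(domain_seq):
    """Yield (label, mutant_sequence) for every single-aa substitution."""
    for pos, wild_type in enumerate(domain_seq):
        for alt in AMINO_ACIDS:
            if alt == wild_type:
                continue  # skip the unchanged residue, leaving 19 alternatives
            label = f"{wild_type}{pos + 1}{alt}"  # e.g. "M1A", 1-based position
            yield label, domain_seq[:pos] + alt + domain_seq[pos + 1:]


toy_domain = "MKTAYIAK"  # hypothetical 8-residue sequence
variants = dict(single_substitutions(toy_domain))
print(len(variants))  # 8 positions x 19 substitutions = 152
```

A domain of length L therefore contributes 19 × L single-aa variants, which is why a library spanning many domains quickly reaches hundreds of thousands of variants.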
The prevalent scientific view on the effects of missense mutations on health posits that changes in aa sequences alter the folding stability of the resulting proteins, thereby triggering genetic diseases. To test this view, abundance protein fragment complementation assays (aPCA) were carried out. In these assays, hundreds of thousands of pooled variants are expressed in yeast cells, and high-throughput sequencing quantifies how each variant affects cellular growth rate, which serves as a readout of the abundance of the folded protein and helps reveal the underlying molecular mechanisms.
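As an illustration of how sequencing counts from a pooled growth selection can be turned into a per-variant score, the sketch below computes a log enrichment relative to wild type. This is a generic deep mutational scanning calculation, not the study's own analysis pipeline; the function name, the pseudocount, and all counts are assumptions made for the example.

```python
import math

# Illustrative sketch only: a generic enrichment score for pooled selections.
# The counts and the pseudocount are hypothetical, not values from the study.


def relative_fitness(in_var, out_var, in_wt, out_wt, pseudocount=0.5):
    """Natural-log enrichment of a variant, normalized to wild type.

    in_*  : read counts before selection
    out_* : read counts after selection
    A score of 0 means the variant grew like wild type; negative scores
    mean it was depleted during selection.
    """
    var_ratio = (out_var + pseudocount) / (in_var + pseudocount)
    wt_ratio = (out_wt + pseudocount) / (in_wt + pseudocount)
    return math.log(var_ratio / wt_ratio)


# Hypothetical counts: wild type holds steady, one variant drops five-fold.
neutral = relative_fitness(in_var=100, out_var=100, in_wt=1000, out_wt=1000)
depleted = relative_fitness(in_var=100, out_var=20, in_wt=1000, out_wt=1000)
print(neutral, depleted)
```

In an abundance-based assay, a strongly negative score of this kind would be read as reduced protein abundance, consistent with a destabilizing mutation.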
These assays enabled the study to evaluate the relative contributions of different genetic variants to fitness, identify functional sites (potential targets of future clinical research), and assess the relationship between protein stability and pathogenicity.
Additionally, the study evaluated the accuracy of computational variant effect predictors (VEPs), the current gold standard in clinical variant classification. Notably, the evaluation dataset provided by this study is roughly five-fold larger than those previously available.
Finally, the study used a Boltzmann partition function-derived thermodynamic model to elucidate the conservation of mutational effects and the energetic underpinnings of protein folding interactions.
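The intuition behind a Boltzmann-based folding model can be sketched with the simplest case, a two-state (folded or unfolded) equilibrium. The two-state assumption, the 30 °C temperature, and the free-energy values below are illustrative choices and are not taken from the study.

```python
import math

# Illustrative sketch only: two-state folding equilibrium.
# Assumptions: a protein is either folded or unfolded, energies are in
# kcal/mol, and the temperature is 303 K (30 degrees C). None of these
# values are taken from the study.

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_folded(dg_fold, temp_k=303.0):
    """Boltzmann-weighted fraction of molecules in the folded state.

    dg_fold is the folding free energy (negative = stable fold).
    """
    return 1.0 / (1.0 + math.exp(dg_fold / (R_KCAL * temp_k)))


wild_type_dg = -2.0  # hypothetical stable domain
ddg_mutation = 2.0   # hypothetical destabilizing mutation

print(fraction_folded(wild_type_dg))                 # mostly folded
print(fraction_folded(wild_type_dg + ddg_mutation))  # exactly half folded
```

Because the relationship is sigmoidal, the same energetic change has a small effect on a very stable domain and a large effect on a marginally stable one, which is why mutational effects are more naturally compared on the free-energy scale.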
Contributions
The study elucidates the role of protein folding in genetic disease, highlighting that ~60% of pathogenic missense mutations trigger diseases via reductions in protein stability.
Protein instability was found to be a major driver of heritable cataract formation, other eye diseases, muscle wasting, and neurological illness. For example, the study revealed that 72% (13 out of 18 evaluated mutations) of cataract-associated genetic variants destabilized beta-gamma crystallins, the proteins responsible for maintaining lens clarity.
Of the 621 well-known disease-causing mutations evaluated, 61% were found to act through similar destabilization. Mutations underlying recessive disorders were especially likely to cause disease by reducing stability. All-beta protein families were found to be impacted more strongly than other protein families, which was observable as reductions in evolutionary fitness.
The Human Domainome 1 database serves as the most extensive current source of such data for clinicians, researchers, and computational/predictive machine learning models, supporting the early detection and management of genetic diseases.
Notably, despite being roughly five-fold larger than the prior collective body of data on missense mutations, it is estimated to cover only 2.5% of known human proteins, so substantial expansion is still needed for it to reach its full potential.