Researchers from the University of Illinois at Urbana-Champaign and the University of California, Davis (UC Davis) are combining in vivo experimentation with computation to predict, with high accuracy, the genome-wide binding pattern of a key protein involved in brain disorders.
"The MeCP2 gene is critical for proper brain development and expressed at near-histone levels in neurons, but the mechanism of its genomic localization remains poorly understood," explained Jun Song, a professor of bioengineering and of physics at the University of Illinois at Urbana-Champaign. "Using high-resolution MeCP2 binding data, we show that DNA sequence features alone can predict binding with 88% accuracy."
Even though every cell in a person's body contains the same DNA sequence, the body can have hundreds of different cell types with distinct shapes and functions, because access to the genetic information encoded in DNA is regulated in a cell type-specific manner. One way to regulate this access is to chemically modify DNA by methylation. The methylated DNA is in turn recognized by various proteins that physically interact with other factors to control transcriptional activity.
According to Tomas Rube, a postdoctoral researcher in Song's research group, MeCP2 is one of the proteins previously shown to bind methylated CG dinucleotides.
"Mutations in the MeCP2 gene are directly linked to a severe brain disorder known as the Rett Syndrome, but the genome-wide binding pattern and function of MeCP2 remain poorly understood," said Rube, the lead author of the paper, "Sequence Features Accurately Predict Genome-wide MeCP2 Binding in vivo," appearing in Nature Communications.
In neurons, MeCP2 is approximately as abundant as histone octamers in the nucleus and is believed to be broadly distributed throughout chromatin. This high abundance has posed a major technical challenge in mapping the genome-wide binding sites of MeCP2 and characterizing the precise DNA sequence features that help recruit MeCP2.
The researchers showed that MeCP2 densely covers the genome in a manner that can be accurately predicted using DNA sequence features alone and that local MeCP2 binding activities can help explain the pattern of gene expression in neurons.
"These findings provide key insights into this important epigenetic regulator and highlight the complexity of understanding the relation between DNA sequence and gene regulation," stated Wooje Lee, a co-first author and a postdoctoral fellow in the laboratory of Qizhi Gong, professor of cell biology and human anatomy at the UC Davis School of Medicine and co-senior author of this study. Dr. Gong's laboratory and her colleagues at UC Davis did the experimental work, while Dr. Song's group handled the complex computation and modeling.
The research team, representing several universities in the U.S. and Korea, used new high-resolution MeCP2 ChIP-seq data from olfactory epithelium to develop a predictive model of the genome-wide MeCP2 binding pattern. Although there is strong in vitro evidence that MeCP2 can bind methyl-CpG (mCpG), MeCP2 may actually bind diverse sequences in vivo, as reflected in its multifaceted roles. The functional impact of MeCP2 has previously been examined by attempting to identify MeCP2 target genes in neurons. In addition to a number of genes found to be suppressed by MeCP2, multiple studies have also identified a global reduction of transcription in neurons lacking functional MeCP2, suggesting a novel activating role for MeCP2.
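To give a rough sense of what "DNA sequence features" can mean in practice, the short Python sketch below computes two simple features over sliding windows of a sequence: the fraction of G and C bases, and the number of CG dinucleotides. This is an illustration only, not the model used in the study; the function names, window size, and toy sequence are all assumptions made for this example.

```python
# Illustrative sketch only: simple sequence features per window.
# Function names and parameters are hypothetical, not from the study.

def gc_fraction(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def cpg_count(seq):
    """Count CG dinucleotides (a C immediately followed by a G)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def window_features(seq, window=10, step=5):
    """Return (start, GC fraction, CG count) for each window of seq."""
    features = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        features.append((start, gc_fraction(chunk), cpg_count(chunk)))
    return features


if __name__ == "__main__":
    toy_sequence = "ATATATATATGCGCGCGCGC"  # made-up sequence for illustration
    for start, gc, cpg in window_features(toy_sequence, window=10, step=10):
        print(start, gc, cpg)
```

A predictive model would take per-window features like these as inputs and compare them against measured binding signal; how the study's own model combines its features is not described in this article.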
"Contrary to the common belief that MeCP2 can bind only methylated CG, our study shows that MeCP2 in fact has diverse modes of binding, largely attributable to the GC sequence content and often independent of the methylation status of DNA," Song said, adding that the lack of a fine-resolution genome-wide binding map has been a major bottleneck in understanding the mechanism of MeCP2 function to this point. "This study shows that MeCP2 binds distinct but numerous sites throughout the genome in a manner that can be accurately predicted using DNA sequence features alone."