ChIP-seq, or chromatin immunoprecipitation sequencing, is a technique that combines ChIP with next-generation sequencing (NGS) to investigate the interactions between proteins and DNA.
These interactions strongly influence the regulation of gene expression and are crucial to understanding how a cell functions. ChIP-seq is a powerful tool for identifying the genomic sites with which DNA-binding proteins, such as transcription factors, associate, and for learning more about chromatin structure.
What is ChIP?
ChIP involves immunoprecipitation of a DNA-bound protein using a specific antibody. The DNA found in the immunoprecipitate is then purified and examined, allowing researchers to determine the genomic location of the protein’s binding site. The DNA fragments can be analysed using polymerase chain reaction (PCR), microarrays or, in the case of ChIP-seq, NGS.
ChIP-seq was one of the first techniques to harness the power of NGS, and it offers a significant improvement over analysis by real-time PCR. In the ChIP-seq assay, millions of short DNA fragments are aligned to the genome, and current systems generate up to 1.5 billion of these fragments per run.
Such advances have enabled initiatives such as the Encyclopedia of DNA Elements (ENCODE) to produce more than 1000 ChIP-seq datasets.
The ChIP-seq procedure
The ChIP-seq procedure begins by cross-linking DNA and proteins in cells or tissues using formaldehyde fixation. After cross-linking, the chromatin is fragmented into pieces of around 150 to 500 base pairs (bp). Fragmentation must be sufficient and reproducible, since the fragment library used for sequencing requires fragments of 200 to 300 bp.
Following fragmentation, an antibody specific to the protein of interest is used to isolate the DNA fragments associated with that protein (immunoprecipitation). The material is then amplified, and fragments of 200 to 300 bp are selected for sequencing.
The short fragments, referred to as ‘tags’, are then mapped against a reference genome, and a process called ‘peak-calling’ uses algorithms to identify regions where the tags are enriched. Workflows often follow this with further analyses such as differential binding or motif analysis.
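The size-selection step can be illustrated with a minimal Python sketch. The fragment lengths below are made-up example values, and the function name `select_fragments` is hypothetical; in the laboratory this selection is carried out physically, not in software.

```python
def select_fragments(lengths, low=200, high=300):
    """Keep fragment lengths (in bp) inside the inclusive window low..high."""
    return [n for n in lengths if low <= n <= high]


# Hypothetical fragment lengths (bp) spanning the 150 to 500 bp range
# produced by chromatin fragmentation.
fragment_lengths = [150, 180, 200, 240, 275, 300, 320, 410, 500]

# Only fragments of 200 to 300 bp are retained for sequencing.
selected = select_fragments(fragment_lengths)
print(selected)
```

Fragments outside the 200 to 300 bp window are simply discarded here, which is why reproducible fragmentation matters: if too few fragments fall in the window, little material remains for the library.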
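The idea of tag enrichment can be sketched with a toy example in Python. This is not how any particular peak-calling tool works: real algorithms use statistical models and usually a control sample. The function name `call_peaks`, the fixed bin size, the fold threshold and the tag positions are all assumptions chosen for illustration.

```python
from collections import Counter


def call_peaks(tag_positions, genome_length, bin_size=100, fold=3.0):
    """Return (start, end, count) for each bin whose tag count is at least
    `fold` times the mean count per bin. Toy illustration only."""
    n_bins = (genome_length + bin_size - 1) // bin_size  # ceiling division
    counts = Counter(pos // bin_size for pos in tag_positions)
    mean_per_bin = len(tag_positions) / n_bins
    peaks = []
    for b in sorted(counts):
        if counts[b] >= fold * mean_per_bin:
            start = b * bin_size
            end = min(start + bin_size, genome_length)
            peaks.append((start, end, counts[b]))
    return peaks


# Hypothetical mapped tag start positions on a 1,000 bp toy "genome".
tags = [5, 120, 410, 415, 420, 430, 445, 460, 470, 480, 730, 950]
peaks = call_peaks(tags, genome_length=1000)
print(peaks)
```

In this toy data set, eight of the twelve tags fall between positions 400 and 499, so that bin is the only one reported as enriched.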
Types of analytes
ChIP-seq analysis can be categorized into different groups depending on the size of the peaks identified. Typically, analysis of transcription factors generates clearly defined peaks comprising 100 to 200 bp.
Study of the histone modification H3K27me3 generates less well defined peaks of up to several hundred kilobases, while analysis of RNA polymerase II generates a mixture of clearly defined and less well defined peaks.
Most peak-calling algorithms are designed for experiments that generate clearly defined peaks, because such peaks allow motif analysis and differential binding analysis to be performed at nucleotide resolution.
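The distinction between clearly defined (narrow) and less well defined (broad) peaks can be sketched as a simple width check. The 200 bp cut-off is taken from the transcription factor peak size quoted above, but using it as a hard threshold is an assumption made for this example, as are the function name `classify_peak` and the coordinates.

```python
def classify_peak(start, end, narrow_max=200):
    """Label a peak 'narrow' if it spans at most narrow_max bp, else 'broad'."""
    width = end - start
    return "narrow" if width <= narrow_max else "broad"


# Hypothetical peak coordinates (bp) for two kinds of analyte.
example_peaks = {
    "transcription_factor_site": (10_000, 10_150),  # 150 bp wide
    "H3K27me3_domain": (500_000, 750_000),          # 250 kb wide
}

labels = {name: classify_peak(s, e) for name, (s, e) in example_peaks.items()}
print(labels)
```

A mixed analyte such as RNA polymerase II would be expected to yield peaks in both categories.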
Unlike other techniques used to interrogate the genome, ChIP-seq does not require prior knowledge of DNA binding sites or the use of probes developed from known sequences. The technique has improved the understanding of genetic regulation across a wide range of diseases and biological pathways, and has allowed detailed investigation of DNA-protein interactions across the entire genome.