A team led by biomedical engineer Justin Zook of the National Institute of Standards and Technology, together with scientists from Harvard University and the Virginia Bioinformatics Institute at Virginia Tech, has presented new methods for integrating data from different sequencing platforms to produce a reliable set of genotypes for benchmarking human genome sequencing.
"Understanding the human genome is an immensely complex task and we need great methods to guide this research," Zook says. "By establishing reference materials and gold standard data sets, scientists are one step closer to bringing genome sequencing into clinical practice."
The researchers' methods bring closer the goal of using an individual's genetic profile to guide medical decisions to prevent, diagnose, and treat disease, a priority of the National Institutes of Health. Their work was published this week in Nature Biotechnology.
"We minimize biases toward any sequencing platform or data set by comparing and integrating 11 whole human genome and three exome data sets from five sequencing platforms," says Zook.
The National Institute of Standards and Technology organized the Genome in a Bottle Consortium to make well-characterized, whole-genome reference materials available to research, commercial, and clinical laboratories.
Although the 1000 Genomes Project, an international research effort to establish a detailed catalogue of human genetic variation, has put forth several methods for integrating genomic information, no one had previously arbitrated between data sets generated by different sequencing methods on the same genome.
To address this challenge, the team drew on the expertise of David Mittelman, an associate professor of biological sciences at the Virginia Bioinformatics Institute, who develops tools for analyzing vast amounts of genomic information.
The researchers created a metric to assess the accuracy of identified genetic variants and to characterize biases and sources of error in sequencing and bioinformatics methods.
Their findings are available to the public on the Genome Comparison and Analytic Testing website, known as GCAT, to enable real-time benchmarking of any DNA-sequencing method. The collaborative, free online resource compares multiple analysis tools across a variety of crowd-sourced metrics and data sets.