Affymetrix, Inc. (NASDAQ: AFFX) today announced the release of a complete data set of more than 5 million variants on its website. The genotyping data set, part of the Axiom™ Genomic Database, is based on extensive validation of genomic variants from the Single Nucleotide Polymorphism Database (dbSNP), the 1000 Genomes Project, the NHGRI Database of Published Associations, and collaborations that have led to the discovery of novel SNPs and insertions/deletions (indels). The data set includes genotyping data for more than 2 million validated rare and common genomic variants that Affymetrix recently contributed to the 1000 Genomes Project, many of which were not previously available from any source. The data will be incorporated into the 1000 Genomes Project's public data repository.
"With the availability of millions of novel markers in more diverse populations, along with the increasing capacity of our microarray technology, we are providing researchers with more resources to accelerate their disease association studies," said Kevin King, President and CEO of Affymetrix. "By releasing this unparalleled data set, Affymetrix is giving clinical researchers better access to valuable content and making it easier to optimize their discovery and validation efforts."
Affymetrix has ensured that all variants in the Axiom Genomic Database are highly informative and reliable through a comprehensive SNP screening and validation effort. This program is critical because many SNP discovery sources contain only putative SNPs, for which the minor allele frequency (MAF) is unknown or the minor allele has been observed in only two chromosomes. As a result, many of these sources contain monomorphic SNPs that can yield a false positive rate greater than 20 percent, or SNPs that may not generate high-quality data in an assay. Competing arrays are often designed in silico against putative SNPs, which requires the research community to spend time and money validating the genomic content themselves. By pre-screening all content against stringent performance metrics, Affymetrix alleviates this burden for the researcher and ensures each marker can be reliably genotyped for the rare allele in the Axiom assay.
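To illustrate the distinction between polymorphic and monomorphic markers, the following is a minimal sketch of how a minor allele frequency could be computed from genotype calls for a single biallelic SNP. It is not the Affymetrix screening pipeline; the function names, the two-letter genotype encoding, and the use of `None` for a no-call are assumptions made for this example only.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the MAF for one biallelic SNP, or None if no calls exist.

    `genotypes` is a list of two-letter strings such as "AG"
    (hypothetical encoding); None marks a sample with no call.
    """
    counts = Counter()
    for call in genotypes:
        if call is None:
            continue  # skip no-calls
        counts.update(call)  # count each of the two alleles

    total = sum(counts.values())
    if total == 0:
        return None  # nothing was called for this marker
    if len(counts) < 2:
        return 0.0  # only one allele observed
    # Second-largest allele count is the minor allele for a biallelic SNP.
    return sorted(counts.values(), reverse=True)[1] / total


def is_monomorphic(genotypes):
    """True when only one allele is observed across all called samples."""
    return minor_allele_frequency(genotypes) == 0.0


if __name__ == "__main__":
    calls = ["AA", "AG", "AA", None, "GG"]
    print(minor_allele_frequency(calls))
    print(is_monomorphic(["CC", "CC", "CC"]))
```

A marker that shows a single allele in every sample tested is uninformative for association studies, which is why empirical screening in diverse sample sets matters for putative SNPs.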
All variants were tested against a large, diverse sample set, including 1,300 samples across 11 populations from the International HapMap Project. As a result, the validated markers offer a broader view of alleles across diverse populations. Additionally, the data set offers a conversion rate of greater than 98 percent for all validated markers, which allows researchers to quickly design and implement fully customizable Axiom™ myDesign™ Arrays without any gaps in genomic coverage or the need to optimize the assay in their laboratories.
The Axiom myDesign Arrays enable researchers to create their own customized genotyping arrays with up to 2.6 million markers for candidate gene and genome-wide association studies by disease, pathway, and population. Scientists can submit their own sequences to design their arrays and leverage the Axiom Genomic Database, which includes an unprecedented number of genetic variants in key pathways, such as cardiovascular, cancer, drug metabolism, human leukocyte antigen (HLA), and immunity/inflammation, as well as other SNP classifications. The database also provides greater support for meta-analysis through imputation.
The files, located at www.affymetrix.com/Axiomdatabase, contain data on 5.4 million SNPs, including approximately 1.8 million from HapMap and dbSNP, approximately 3 million from the 1000 Genomes Project 2009 release, and an additional 0.6 million from collaborative discovery projects. Almost all of the 1.8 million SNPs from HapMap and dbSNP were genotyped on 11 populations comprising more than 1,000 individuals, while the remaining 3.6 million SNPs were genotyped on four HapMap populations comprising roughly 270 individuals: Utah residents with European ancestry (CEU), Han Chinese in Beijing, China (CHB), Japanese in Tokyo, Japan (JPT), and Yoruba in Ibadan, Nigeria (YRI).
The Axiom Genomic Database and its genotyping data set are the result of an ongoing screening pipeline that leverages the company's production-scale infrastructure and its capacity to analyze billions of genotypes. Affymetrix continues to screen novel putative SNPs to expand the Axiom Genomic Database and will publicly release additional data sets throughout 2011 and beyond.