Building on data from The Cancer Genome Atlas (TCGA) project, a multi-institutional team of scientists has completed the first large-scale "proteogenomic" study of breast cancer, linking DNA mutations to protein signaling and helping pinpoint the genes that drive cancer. Conducted by members of the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC), including Baylor College of Medicine, Broad Institute of MIT and Harvard, Fred Hutchinson Cancer Research Center, New York University Langone Medical Center and Washington University School of Medicine, the study takes aim at proteins, the workhorses of the cell, and their modifications to better understand cancer. Appearing in Nature online May 25, the study illustrates the power of integrating genomic and proteomic data to yield a more complete picture of cancer biology than either type of analysis could provide alone.
"We don't fully understand how complex cancer genomes translate into the driving biology that causes relapse and mortality," said Dr. Matthew Ellis, professor and director of the Lester and Sue Smith Breast Center at Baylor College of Medicine and a senior author of the paper. "These findings show that proteogenomic integration could one day prove to be a powerful clinical tool, allowing us to traverse the large knowledge gap between cancer genomics and clinical action."
The effort produced a broad overview of the landscape of the proteome (all the proteins found in a cell) and the phosphoproteome (the sites at which proteins are tagged by phosphorylation, a chemical modification that drives communication in the cell) across 77 breast tumors that had been genomically characterized in the TCGA project. Although TCGA produced an extensive catalog of somatic mutations found in cancer, the effects of many of those mutations on cellular function or patient outcomes are unknown. In addition, not all mutated genes are true "drivers" of cancer; some carry merely "passenger" mutations that have little functional consequence. Some mutations also lie within very large DNA regions that are deleted or present in extra copies, and because such regions contain many genes, it is hard to tell from DNA alone which of them matter. Winnowing the list of candidate genes by studying the activity of their protein products can therefore help identify therapeutic targets.
In this study, the researchers analyzed breast tumors using accurate-mass, high-resolution mass spectrometry, a technology that extends coverage of the proteome far beyond what traditional antibody-based methods can achieve. This allowed them to scale up their efforts and quantify more than 12,000 proteins and 33,000 phosphosites, an extremely deep level of coverage.
"Advances in sample handling and instrumentation have brought on a revolution in mass spectrometry-based proteomics," said senior author Dr. Steven Carr, director of the Broad Institute's Proteomics Platform and a CPTAC principal investigator. "We can now apply that to the phosphoproteome, which is of central importance to understanding signaling in cancer and other diseases. Our approach produces robust and reproducible data, at a scale unachievable before."
As with other cancers, breast tumors are known to harbor many mutations, so discerning the effects of their various combinations one by one in a model system would require an impractically large number of experiments. With this approach, however, the team can study the tumor cells in which those mutations actually evolved and analyze the integrated output of the cells' proteins.
"There is great potential for new insights to come from the combined analysis of cancer proteomic and genomic data, as proteomic data can now reproducibly provide information about protein levels and activities that are difficult or impossible to infer from genomic data alone," said Dr. Douglas Lowy, acting director of the National Cancer Institute, part of the National Institutes of Health.
This analysis uncovered new protein markers and signaling pathways for breast cancer subtypes and for tumors carrying frequent mutations, such as those in PIK3CA and TP53. The team also correlated copy number alterations (extra or missing DNA) in some genes with protein levels, allowing them to identify 10 new candidate regulator genes. Two of these candidates, SKP1 and CETN3, could be linked to the oncogene EGFR, a marker for a particularly aggressive subtype known as basal-like breast cancer.
Using transcriptional (mRNA) profiling, scientists have divided breast cancer into four major subtypes: luminal A, luminal B, basal-like and HER2-enriched. In this work, the researchers used proteomic and phosphoproteomic data to recapitulate the basal and luminal subtypes. They also identified a stromal-enriched cluster and, by clustering tumors based on phosphorylation pathways, highlighted a G-protein-coupled receptor subgroup not seen with mRNA-based approaches.
In the study and treatment of breast cancer, scientists and physicians hope to identify more druggable kinases beyond HER2, which can be targeted with trastuzumab (Herceptin), but only in the roughly 20 percent of breast cancers that overexpress the HER2 protein. In this study, the researchers conducted an outlier analysis of kinase phosphorylation states, looking for kinases whose phosphorylation was unusually high in individual tumors. This highlighted aberrantly activated kinases in breast cancer samples, such as HER2, CDK12, PAK1, PTK2, RIPK2 and TLK2.
"It's always been important to get through to the molecules at work in the cell -- the proteins -- and this integrative exercise really gives us a whole new understanding of the landscape," said Dr. Li Ding, assistant director of the McDonnell Genome Institute at Washington University in St. Louis. "The proteogenomic approach shows potential for funneling down to a much smaller set of proteins and modifications that are the interesting drivers that we should think about from a therapeutic standpoint."