In this interview, News-Medical speaks to Maxwell Sherman, an MIT graduate student and one of the lead authors of a study that used a new method to investigate cancer genomes.
Please can you introduce yourself, tell us about your scientific background, and what inspired your latest research?
We are a multi-disciplinary team of computer scientists, mathematicians, and biologists fortunate enough to be working in the MIT and Harvard ecosystems. Most past work trying to identify mutations that drive the emergence and progression of cancer has focused on the 2% of the genome that codes for proteins. We wanted to empower the cancer research community to search 100% of the genome for mutations that may cause cancer.
Cancer cells can have thousands of mutations in their DNA. What is the difference between a mutation that drives the progression of cancer and a relatively neutral mutation?
Cancer can be understood through the lens of Darwinian evolution. Driver mutations enable a cell to grow and divide faster, thus producing more cells as progeny. Cancer results from this cellular race: once a cell accumulates enough of these driver mutations, it can divide without limit, escape the immune system, and eventually spread to other tissues, all hallmarks of cancer. On the other hand, neutral “passenger” mutations are mutations that do not affect the ability of a cell to grow or reproduce and thus do not play a role in the Darwinian evolution of cells. The vast majority of somatic mutations in our cells appear to be neutral.
What do we currently know and not know about mutations that drive cancer?
This is a big question that is difficult to answer both succinctly and accurately. Suffice it to say that decades of research have revealed major drivers of numerous types of cancer, leading to many breakthroughs in medicine's ability to treat patients in the clinic. Yet there is still an immense amount we do not know. We don't yet know the full spectrum of driver mutations in the non-coding genome, we have not unraveled all the complexities of copy number variation (shout-out to the recent Nature papers making huge progress on this), and the role of repeat expansions remains unclear. Beyond these known gaps, there is undoubtedly much more still waiting to be discovered.
A new model allowed you to scan the genome of cancer cells. Could you describe the model and what new insights it provided?
Our model, Dig, uses a deep-learning procedure to map somatic mutation rates across the entire genome for a cancer of interest. It then uses a tailored probabilistic model to query those maps nearly instantaneously, estimating the number of passenger mutations expected in any region of the genome.
Our approach has several key features: 1) a mutation rate map needs to be trained only once for a given cancer type (we have already trained, and made publicly available, maps for 37 cancer types) and can then be applied to any cohort of patients with that tumor type; 2) users have the flexibility to specify regions anywhere in the genome, down to the resolution of a single base pair; and 3) the model is fast and efficient enough that users can complete a genome-wide analysis in minutes on a personal computer.
One type of non-coding mutation you focused on was cryptic splice mutations. What are cryptic splice mutations, and how do they drive cancer?
Cryptic splice mutations are mutations that occur far from the boundaries of a gene's exons but nonetheless confuse the cellular machinery responsible for splicing out introns and stitching the exons back together. These mutations thus lead to incorrect splicing of the gene. This most often results either in nonsense mRNA transcripts, which the cell simply recycles, or in a nonfunctional protein. Either way, the gene's correct protein product is not being made. Tumor suppressor genes generally put the brakes on cell division, keeping a cell from dividing uncontrollably. Cryptic splice mutations can render these genes nonfunctional, removing the cell's own defenses against cancer.
This novel model also allowed you to look at known cancer-driving mutations. What did you learn about these mutations within the 37 different cancer types you studied?
We found that genes that often drive one type of cancer may occasionally drive other types of cancer as well. The construction of Dig was key to this insight. Because our model can be trained on one set of patients and applied to another, we were able to pool together thousands of patient samples from heterogeneous sequencing studies, providing the statistical power necessary to examine these rare events.
Your model uses a deep neural network, a type of deep learning. How do you see machine learning influencing cancer research in the future?
As the datasets the field generates grow in sheer size and complexity, the need for tools that can automatically parse and extract meaning from them will only grow. Machine learning algorithms offer powerful approaches to this challenge. They are particularly useful for generating and prioritizing data-driven hypotheses about molecular mechanisms, which can then be explored experimentally, an approach that the field (including our lab) is increasingly embracing.
How may the findings of your study and the model itself influence the future development of cancer therapeutics?
We hope the cancer community will make important discoveries about cancer biology by exploring the non-coding genome. Each discovery has the potential to open up new avenues for therapeutics. Our model is a tool to help the cancer community do just that.
What is next for yourself and your research?
We’re working on some exciting new things that we look forward to sharing soon.
Where can readers find more information?
About Maxwell Sherman
I am currently a fourth-year Ph.D. candidate jointly supervised by Professor Bonnie Berger and Professor Po-Ru Loh. My research focuses on developing algorithms to uncover the role of somatic mutations in human health and disease.