The National Institutes of Health wants to make the process of finding new drugs faster and better, an effort that will help all 27 of its research institutes and centers. To that end, the nation's medical research agency awarded Tudor Oprea, MD, PhD, a two-year, $4.9 million grant to develop a tool scientists can use to link information about drugs, diseases and genes.
The effort is so large that the NIH divided it into different parts. Dr. Oprea, at the University of New Mexico Cancer Center, will oversee the entire project. Larry Sklar, PhD, will develop the project's Administrative Core and Anton Simeonov, PhD, will develop the User Interface Portal. Dr. Oprea says, "The 27 [NIH] institutes support this initiative because they recognize that everyone needs to work on new drug targets. If this is the bottleneck, how do we prioritize it?"
Dr. Oprea's new project, called the "Illuminating the Druggable Genome Knowledge Management Center," or IDG-KMC, will improve the way scientists manage and share what they know. The IDG-KMC will link known facts about drug molecules, the genes and cellular pathways they influence and the diseases on which they have been tested. And part of the work requires connecting genes to the proteins they produce in a cell. Genes provide the blueprints for many, many different kinds of proteins and each protein has a unique shape and function. Dr. Oprea explains, "We're trying to map diseases to small molecules to [protein] targets. And making those associations is not trivial." Initially, Dr. Oprea and the IDG-KMC team will focus on four large families of proteins.
Organizing this large set of information will make the links between drugs, diseases and genes easier to see. But Dr. Oprea and his team will go further: they will develop tools to suggest genes for drug targeting that researchers might otherwise overlook. To explain, Dr. Oprea uses the old joke about looking for a lost key under the lamppost. "The key might be in the dark," he says, "but we tend to look under the lamppost because that's where the light is. With the IDG-KMC, we want to illuminate the genome and prioritize proteins for more studies."
The effort to build such an advanced database and keep it current will require painstaking work. Dr. Oprea and his team will need to collect and verify each piece of information. They will work with Danish scientists Søren Brunak, PhD, and Lars Juhl Jensen, PhD, who have developed award-winning technologies for searching through large amounts of text. Drs. Brunak and Jensen have used these automated searches, called "text mining," to discover connections between genes and their effects on the cell. Dr. Oprea's team will search papers in published journals, and where possible, they will use the automated tools to find how drugs, diseases and genes relate to each other.
The IDG-KMC team will work with English scientist John Overington, PhD, whose team will search patents. They will also work with Stephan Schürer, PhD, in Miami who creates dictionaries to help computers distinguish between several meanings in English text. But the team will still need to handle information that automated searches can't. "You want the computer to take sentences and parse them in an automated way so that the computer can 'reason' if given a structured syntax," Dr. Oprea says. "We're not there yet."
Dr. Oprea's team will also use additional sources of information. Private or pilot studies, for example, can verify other pieces of information even though the scientists conducting these studies may not yet be able to publish their data. And the team plans to use tissue samples and full-genome sequencing from people who have given permission for their information to be used in scientific work. Clinical trials, too, may offer insight. "Why do some drugs work better than others?" asks Dr. Oprea. "In target profiling, they look identical. And yet, some of them work and some of them don't. That's why we still conduct clinical trials," he says. Dr. Oprea's team will work with teams led by Avi Ma'ayan, PhD, and Joel Dudley, PhD, at Mount Sinai Medical School. Their teams have developed unique tools to extract knowledge from large volumes of genomic data.
Dr. Oprea's team has already developed many of the tools the IDG-KMC will require. "At UNM, we've developed the technologies to map small molecule chemicals to diseases," Dr. Oprea says. More than 8,000 diseases have been indexed. Physicians can treat only about 2,000 of these diseases with drugs. "Even accounting for surgery and other [medical] specialties, this number still implies that more than half of the known diseases have no cure. So we're still a long way to go."
When complete, the pilot phase of the IDG-KMC will offer scientists a public, prioritized list of drug targets in the genome. It will also provide a Web site, or "portal," to connect drugs, diseases and genes. Scientists can expand the information as they discover new data and can create new ways to search and organize that information. "So if pharmaceutical companies, private sector or academics want to take our prioritized lists and work on them, they will do that," says Dr. Oprea. "It will be open."