Researchers from the Biomedical Informatics Group at the Universidad Politécnica de Madrid's Facultad de Informática have created a tool called PubDNA Finder, the first search engine specialized in linking biomedical articles to nucleic acid sequences.
PubDNA Finder is an online repository created to link documents archived at PubMed Central with the nucleic acid sequences that they contain. PubMed Central is a free digital archive maintained by the United States National Institutes of Health. Developed and administered by the National Center for Biotechnology Information (NCBI), it contains the principal documentation related to biomedicine and the life sciences published in scientific journals all over the world.
PubDNA Finder extends the capabilities of the search engine provided by PubMed Central, enabling biomedical researchers to run advanced searches on nucleic acid sequences. Its features include searching for documents that cite one or more specific nucleic acid sequences and retrieving the genetic sequences that appear in different articles.
These additional query facilities are provided by a search index built by archiving all 176,672 documents available at PubMed Central together with the nucleic acid sequences that they contain.
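The two kinds of query described above (articles that cite given sequences, and sequences found in a given article) can be illustrated with a small two-way index. This is only a minimal sketch, not PubDNA Finder's actual index: the `SequenceIndex` class, its method names and the article identifiers `PMC-A` and `PMC-B` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


class SequenceIndex:
    """Toy two-way index between article IDs and nucleic acid sequences."""

    def __init__(self):
        self.docs_by_seq = defaultdict(set)  # sequence -> article IDs
        self.seqs_by_doc = defaultdict(set)  # article ID -> sequences

    def add(self, doc_id, sequences):
        """Register the sequences found in one article (case-insensitive)."""
        for seq in sequences:
            seq = seq.upper()
            self.docs_by_seq[seq].add(doc_id)
            self.seqs_by_doc[doc_id].add(seq)

    def documents_citing(self, *sequences):
        """Return the articles that contain every one of the given sequences."""
        sets = [self.docs_by_seq.get(s.upper(), set()) for s in sequences]
        return set.intersection(*sets) if sets else set()

    def sequences_in(self, doc_id):
        """Return a copy of the sequences recorded for one article."""
        return set(self.seqs_by_doc.get(doc_id, set()))


# Hypothetical articles and sequences, for illustration only.
index = SequenceIndex()
index.add("PMC-A", ["ATGGCCATTGTAATGGGCCGC", "GGATCCGAATTCAAGCTT"])
index.add("PMC-B", ["atggccattgtaatgggccgc"])
```

Keeping both mappings costs extra memory but makes each kind of query a single dictionary lookup, which is the usual trade-off for an index that is rebuilt rarely and queried often.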
The researchers used an original method to automatically extract the genetic sequences returned by each search: an innovative system combining natural language processing, text mining and knowledge engineering runs unsupervised searches to retrieve genetic sequences.
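To give a sense of what sequence extraction from article text involves, the sketch below uses a plain regular expression. It is a deliberately simplified stand-in and not the researchers' method, which combines natural language processing, text mining and knowledge engineering. The function name `extract_sequences`, the 15-base minimum length and the sample sentence are all assumptions made for illustration.

```python
import re

# Heuristic: an uppercase run of bases (A, C, G, T, U), optionally broken up by
# single spaces or hyphens, as sequences are sometimes printed in groups.
_CANDIDATE = re.compile(r"\b[ACGTU](?:[ \-]?[ACGTU])+\b")


def extract_sequences(text, min_len=15):
    """Return candidate nucleic acid sequences of at least min_len bases."""
    found = []
    for match in _CANDIDATE.finditer(text):
        seq = re.sub(r"[ \-]", "", match.group())  # drop printed separators
        if len(seq) >= min_len:  # short runs are usually ordinary words
            found.append(seq)
    return found


sample = (
    "The forward primer 5'-ATGGCCATTGTAATGGGCCGC-3' and the probe "
    "GGATCC GAATTC AAGCTT were used. A total of 12 samples were run."
)
print(extract_sequences(sample))
```

A pattern like this misses lowercase or ambiguity-coded sequences and can be fooled by long uppercase acronyms, which is one reason a real system would need richer linguistic and domain knowledge.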
The database is automatically updated every month by means of an FTP connection to the PubMed Central site, which retrieves the new manuscripts and indexes. Users can query the database over the Web.
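A periodic update of this kind boils down to listing what the remote site offers and fetching only what has not been indexed yet. The sketch below shows that idea with Python's standard `ftplib`. It is an assumption about how such a job could be structured, not a description of the actual update process: the helper names are hypothetical, and the host and directory are left as parameters to be supplied by the caller.

```python
from ftplib import FTP


def list_remote(host, directory):
    """List file names in a directory on an FTP server (anonymous login)."""
    with FTP(host) as ftp:
        ftp.login()          # no arguments means anonymous access
        ftp.cwd(directory)
        return ftp.nlst()


def new_entries(remote_names, already_indexed):
    """Return the remote file names not yet indexed, in sorted order."""
    return sorted(set(remote_names) - set(already_indexed))


# Offline illustration with made-up file names; a scheduled job would pass
# the result of list_remote(...) as the first argument instead.
pending = new_entries(
    ["batch_01.tar.gz", "batch_02.tar.gz", "batch_03.tar.gz"],
    ["batch_01.tar.gz"],
)
print(pending)
```

Separating the network call from the comparison keeps the selection logic easy to check without connecting to any server.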