Jul 23 2008
Newly developed software will help to allay patients' fears about who has access to their confidential data. Research published today in the open access journal BMC Medical Informatics and Decision Making describes a computer program that deletes details that could identify patients from their medical records, while leaving important medical information intact.
Patient records that are to be shared within the research community must first have all identifying information removed. Removing this information by hand is prohibitively expensive and time-consuming, so many investigators have focussed on developing automated techniques for "de-identifying" medical records. A team from the Massachusetts Institute of Technology (MIT), funded by the National Institutes of Health (NIH), set out to solve this problem. As they point out, "Text-based patient medical records are a vital resource in research. The expense of manual de-identification, coupled with the fact that it is time-consuming and prone to error, necessitates automatic methods for large-scale de-identification."
The MIT team tested their software on a meticulously hand-annotated database of 1,836 nursing notes (296,400 words in total). According to the authors, "The software successfully deleted more than 94% of the confidential information, while wrongly deleting only 0.2% of the useful content. This is significantly better than one expert working alone, at least as good as two trained medical professionals checking each other's work and many, many times faster than either."
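The two figures quoted above measure different things: the share of confidential items the software caught, and the share of ordinary text it removed by mistake. The sketch below shows one way such figures could be computed against hand annotations. It assumes both rates are counted per word, which the article does not state; the function name `deid_metrics` and the toy note are hypothetical and are not taken from the study or its software.

```python
from typing import Sequence


def deid_metrics(gold_phi: Sequence[bool], removed: Sequence[bool]) -> tuple[float, float]:
    """Return (share of confidential words removed, share of useful words wrongly removed).

    Hypothetical helper: gold_phi[i] is True if a human annotator marked word i
    as identifying; removed[i] is True if the software deleted word i.
    """
    if len(gold_phi) != len(removed):
        raise ValueError("annotation and output must cover the same words")
    phi_total = sum(gold_phi)
    useful_total = len(gold_phi) - phi_total
    if phi_total == 0 or useful_total == 0:
        raise ValueError("need both confidential and useful words to score")
    # Confidential words the software correctly deleted.
    phi_removed = sum(1 for g, r in zip(gold_phi, removed) if g and r)
    # Useful words the software deleted by mistake.
    useful_removed = sum(1 for g, r in zip(gold_phi, removed) if not g and r)
    return phi_removed / phi_total, useful_removed / useful_total


# Toy note (invented for illustration): "Pt John Smith seen on 3/14 BP stable"
gold = [False, True, True, False, False, True, False, False]
out = [False, True, True, False, False, False, False, True]
caught, wrongly_deleted = deid_metrics(gold, out)
print(f"{caught:.1%} caught, {wrongly_deleted:.1%} wrongly deleted")
```

In this toy case the software misses the date and wrongly deletes "stable", so it catches two of three confidential words and removes one of five useful ones. A real evaluation would also have to decide whether a multi-word name counts as one item or several.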
The MIT team is also releasing the fully scrubbed annotated data together with the software, so that others can improve their own systems and adapt the software to other types of data with different characteristics.