Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.

Publication Type: Journal Article
Year of Publication: 2014
Authors: Hanauer, DA, Saeed, M, Zheng, K, Mei, Q, Shedden, K, Aronson, AR, Ramakrishnan, N
Journal: J Am Med Inform Assoc
Volume: 21
Issue: 5
Pagination: 925-37
Date Published: 2014 Sep-Oct
ISSN: 1527-974X
Keywords: Data Mining, Feasibility Studies, Humans, International Classification of Diseases, MEDLINE, Natural Language Processing, Unified Medical Language System
Abstract

OBJECTIVE: We describe experiments designed to determine the feasibility of distinguishing known from novel associations based on a clinical dataset composed of International Classification of Disease, V.9 (ICD-9) codes from 1.6 million patients by comparing them to associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations appearing only in the clinical dataset, but not in Medline citations, are potentially novel.

METHODS: Pairwise associations of ICD-9 codes were independently identified in both the clinical and Medline datasets, which were then compared to quantify their degree of overlap. We also performed a manual review of a subset of the associations to validate how well MetaMap performed in identifying diagnoses mentioned in Medline citations that formed the basis of the Medline associations.

RESULTS: The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses from Medline citations do not always represent clinically meaningful associations.

DISCUSSION: Identifying novel associations derived from large clinical datasets remains challenging. Medline as a sole data source for existing knowledge may not be adequate to filter out widely known associations.

CONCLUSIONS: In this study, novel associations were not readily identified. Further improvements in accuracy and relevance for tools such as MetaMap are needed to realize their expected utility.
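The overlap comparison described in the abstract can be illustrated with a minimal sketch. It assumes that an "association" is simply an unordered pair of distinct ICD-9 codes co-occurring in the same patient record or Medline citation; the study itself may have applied statistical criteria that this sketch does not reproduce. The function names and the toy ICD-9 code lists are hypothetical and serve only to show the set logic: the overlap is the share of clinical pairs also found in Medline, and the remaining clinical pairs are the "potentially novel" candidates.

```python
from itertools import combinations


def pairwise_associations(records):
    """Collect unordered pairs of distinct codes that co-occur in at least one record.

    Assumption (hypothetical): any co-occurrence counts as an association;
    no statistical threshold is applied here.
    """
    pairs = set()
    for codes in records:
        # Deduplicate and sort so ("A", "B") and ("B", "A") map to the same pair.
        for a, b in combinations(sorted(set(codes)), 2):
            pairs.add((a, b))
    return pairs


def overlap_fraction(clinical_pairs, literature_pairs):
    """Share of clinical associations that also appear in the literature set."""
    if not clinical_pairs:
        return 0.0
    return len(clinical_pairs & literature_pairs) / len(clinical_pairs)


# Hypothetical toy data: each inner list is the set of ICD-9 codes for one
# patient (clinical) or one MetaMap-processed citation (Medline).
clinical_records = [["250.00", "401.9"], ["250.00", "272.4", "401.9"]]
medline_records = [["250.00", "401.9"]]

clinical = pairwise_associations(clinical_records)
medline = pairwise_associations(medline_records)

overlap = overlap_fraction(clinical, medline)
# Pairs seen only in the clinical data are the "potentially novel" candidates.
candidates = clinical - medline
print(f"overlap: {overlap:.3f}, candidates: {sorted(candidates)}")
```

Using sets keeps the comparison symmetric and order-independent; at the scale reported (3.1 million clinical associations), an in-memory set of code pairs is still feasible, although a real pipeline would likely also track counts per pair for statistical testing.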

DOI: 10.1136/amiajnl-2014-002767
Alternate Journal: J Am Med Inform Assoc
PubMed ID: 24928177
PubMed Central ID: PMC4147617
Grant List: UL1TR000433 / TR / NCATS NIH HHS / United States
People: 
David Hanauer
University of Michigan Comprehensive Cancer Center at North Campus Research Complex
1600 Huron Parkway, Bldg 100, Rm 100
Mailing Address: 2800 Plymouth Rd, NCRC 100-1004
Ann Arbor, MI 48109-2800 
Ph. (734) 764-8848 Fax. (734) 615-0517
Please acknowledge the Cancer Center Support Grant (P30 CA046592) when publishing manuscripts or abstracts that utilized the services of the University of Michigan's Comprehensive Cancer Center's Shared Resource: Cancer Informatics.
Suggested language: "Research reported in this [publication/press release] was supported by the National Cancer Institute of the National Institutes of Health under award number P30CA046592."

Copyright © Cancer Center Informatics-2011 Regents of the University of Michigan