Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.

Title: Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.
Publication Type: Journal Article
Year of Publication: 2014
Authors: Hanauer DA, Saeed M, Zheng K, Mei Q, Shedden K, Aronson AR, Ramakrishnan N
Journal: J Am Med Inform Assoc
Volume: 21
Issue: 5
Pagination: 925-37
Date Published: 2014 Sep-Oct
ISSN: 1527-974X
Keywords: Data Mining, Feasibility Studies, Humans, International Classification of Diseases, MEDLINE, Natural Language Processing, Unified Medical Language System
Abstract

OBJECTIVE: We describe experiments designed to determine the feasibility of distinguishing known from novel associations based on a clinical dataset comprised of International Classification of Disease, V.9 (ICD-9) codes from 1.6 million patients by comparing them to associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations appearing only in the clinical dataset, but not in Medline citations, are potentially novel.

METHODS: Pairwise associations of ICD-9 codes were independently identified in both the clinical and Medline datasets, which were then compared to quantify their degree of overlap. We also performed a manual review of a subset of the associations to validate how well MetaMap performed in identifying diagnoses mentioned in Medline citations that formed the basis of the Medline associations.

RESULTS: The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses from Medline citations do not always represent clinically meaningful associations.

DISCUSSION: Identifying novel associations derived from large clinical datasets remains challenging. Medline as a sole data source for existing knowledge may not be adequate to filter out widely known associations.

CONCLUSIONS: In this study, novel associations were not readily identified. Further improvements in accuracy and relevance for tools such as MetaMap are needed to realize their expected utility.
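As a rough illustration of the overlap comparison described under METHODS, the sketch below collects unordered ICD-9 code pairs from two sets of records and reports what fraction of the clinical pairs also appear in the literature-derived set. It is a minimal sketch, not the authors' pipeline: the abstract does not state how an association was defined, so simple co-occurrence within one record is an assumption here, and the function names (`pairwise_associations`, `overlap_fraction`) and the toy records are hypothetical.

```python
from itertools import combinations


def pairwise_associations(records):
    """Return the set of unordered code pairs co-occurring in at least one record.

    Assumption: an "association" is plain co-occurrence within a record.
    The study may have used a different (for example, statistical) criterion.
    """
    pairs = set()
    for codes in records:
        # Deduplicate and sort so each pair has a single canonical ordering.
        for a, b in combinations(sorted(set(codes)), 2):
            pairs.add((a, b))
    return pairs


def overlap_fraction(clinical_pairs, medline_pairs):
    """Fraction of clinical pairs that are also present in the Medline pairs."""
    if not clinical_pairs:
        return 0.0
    return len(clinical_pairs & medline_pairs) / len(clinical_pairs)


# Toy, made-up records: each inner list holds the ICD-9 codes for one
# patient (clinical) or one citation (Medline). Illustrative only.
clinical = [
    ["250.00", "401.9"],
    ["250.00", "272.4", "401.9"],
    ["493.90", "477.9"],
]
medline = [
    ["401.9", "250.00"],
    ["585.9", "401.9"],
]

clinical_pairs = pairwise_associations(clinical)
medline_pairs = pairwise_associations(medline)

shared = overlap_fraction(clinical_pairs, medline_pairs)
# Pairs seen only in the clinical data are the "potentially novel" candidates.
novel_candidates = clinical_pairs - medline_pairs

print(f"overlap: {shared:.1%}, candidate novel pairs: {len(novel_candidates)}")
```

In this framing, the 6.6% figure reported under RESULTS corresponds to the overlap fraction, and the set difference corresponds to the candidate novel associations that would still need manual review.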

DOI: 10.1136/amiajnl-2014-002767
Alternate Journal: J Am Med Inform Assoc
PubMed ID: 24928177
PubMed Central ID: PMC4147617
Grant List: UL1TR000433 / TR / NCATS NIH HHS / United States
People: 
David Hanauer
University of Michigan Rogel Cancer Center at North Campus Research Complex
1600 Huron Parkway, Bldg 100, Rm 1004 
Mailing Address: 2800 Plymouth Rd, NCRC 100-1004
Ann Arbor, MI 48109-2800 

Research reported in this publication was supported by the National Cancer Institute of
the National Institutes of Health under Award Number P30CA046592. The content is solely
the responsibility of the authors and does not necessarily represent the official views
of the National Institutes of Health.

Research reported in this publication was supported by the National Cancer Institute of
the National Institutes of Health under Award Number P30CA046592 through the use of the
following Cancer Center Shared Resource(s): Biostatistics, Analytics & Bioinformatics;
Flow Cytometry; Transgenic Animal Models; Tissue and Molecular Pathology; Structure &
Drug Screening; Cell & Tissue Imaging; Experimental Irradiation; Preclinical
Imaging & Computational Analysis; Health Communications; Immune Monitoring;
Pharmacokinetics.
