The MITRE Identification Scrubber Toolkit: design, training, and assessment.

published by adorack on Fri, 01/02/2015 - 17:05

Title	The MITRE Identification Scrubber Toolkit: design, training, and assessment.
Publication Type	Journal Article
Year of Publication	2010
Authors	Aberdeen, J, Bayer, S, Yeniterzi, R, Wellner, B, Clark, C, Hanauer, DA, Malin, B, Hirschman, L
Journal	Int J Med Inform
Volume	79
Issue	12
Pagination	849-59
Date Published	2010 Dec
ISSN	1872-8243
Keywords	Algorithms, Confidentiality, Data Collection, Electronic Health Records, Humans, Medical Record Linkage, Patient Identification Systems, Software
Abstract	PURPOSE: Medical records must often be stripped of patient identifiers, or de-identified, before being shared. De-identification by humans is time-consuming, and existing software is limited in its generality. The open source MITRE Identification Scrubber Toolkit (MIST) provides an environment to support rapid tailoring of automated de-identification to different document types, using automatically learned classifiers to de-identify and protect sensitive information. METHODS: MIST was evaluated with four classes of patient records from the Vanderbilt University Medical Center: discharge summaries, laboratory reports, letters, and order summaries. We trained and tested MIST on each class of record separately, as well as on pooled sets of records. We measured precision, recall, F-measure and accuracy at the word level for the detection of patient identifiers as designated by the HIPAA Safe Harbor Rule. RESULTS: MIST was applied to medical records that differed in the amounts and types of protected health information (PHI): lab reports contained only two types of PHI (dates, names) compared to discharge summaries, which were much richer. Performance of the de-identification tool depended on record class; F-measure results were 0.996 for order summaries, 0.996 for discharge summaries, 0.943 for letters and 0.934 for laboratory reports. Experiments suggest the tool requires several hundred training exemplars to reach an F-measure of at least 0.9. CONCLUSIONS: The MIST toolkit makes possible the rapid tailoring of automated de-identification to particular document types and supports the transition of the de-identification software to medical end users, avoiding the need for developers to have access to original medical records. We are making the MIST toolkit available under an open source license to encourage its application to diverse data sets at multiple institutions.
DOI	10.1016/j.ijmedinf.2010.09.007
Alternate Journal	Int J Med Inform
PubMed ID	20951082

People:

David Hanauer

University of Michigan Rogel Cancer Center at North Campus Research Complex
1600 Huron Parkway, Bldg 100, Rm 1004
Mailing Address: 2800 Plymouth Rd, NCRC 100-1004
Ann Arbor, MI 48109-2800

Research reported in this publication was supported by the National Cancer Institute of the
National Institutes of Health under Award Number P30CA046592. The content is solely the responsibility
of the authors and does not necessarily represent the official views of the
National Institutes of Health.

Research reported in this publication was supported by the National Cancer Institute of the
National Institutes of Health under Award Number P30CA046592 by the use of the following Cancer Center
Shared Resource(s): Biostatistics, Analytics & Bioinformatics; Flow Cytometry;
Transgenic Animal Models; Tissue and Molecular Pathology; Structure & Drug
Screening; Cell & Tissue Imaging; Experimental Irradiation; Preclinical
Imaging & Computational Analysis; Health Communications; Immune Monitoring;
Pharmacokinetics.

Copyright © Cancer Center Informatics-2011 Regents of the University of Michigan