Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts

Upload: ritu-khare

Post on 24-May-2015

Category:

Technology


TRANSCRIPT

  • 1. Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts. Ritu Khare (1,2), Yuan An (1), Jiexun Li (1), Il-Yeol Song (1), Xiaohua Hu (1). (1) The iSchool at Drexel, (2) College of Medicine, Drexel University, Philadelphia, PA, USA

2. Presentation Order: 1. Motivation 2. Problems 3. Solutions 4. Evaluation 5. Final Remarks
3. General Motivation
  Database integration and interoperability: semantic heterogeneity across clinical data sources (Halevy, 2005; Henry et al., 1993; Hernandez et al., 2005; Wright et al., 1999).
  Example term variants across sources: MRN, Med Rec #, Medical Record Number; Blood Pressure (Diastolic, Systolic), BP; Physical Status, Constitutional, Vital Signs.
  Recommendation: controlled medical vocabularies should be involved in the design artifacts of healthcare systems (Jean et al., 2007; Sugumaran and Storey, 2002).
4. Specific Motivation
  Clinical Encounter Form; Electronic Health Records (EHR).
  The terms on the clinical forms are mapped to, or annotated by, a standard terminology.
  Domain experts may perform the annotation manually, which is costly and tedious.
  Research objective: design an automatic tool for mapping form terms to standard terminologies.
5. 1. Motivation 2. Problem 3. Solutions 4. Evaluation 5. Final Remarks
6. The Mapping Problem: Clinical Encounter Form to SNOMED CT
  SNOMED CT: the Systematized Nomenclature of Medicine - Clinical Terms (Intl. Health Terminology Stds. Dev. Org).
  Most comprehensive clinical vocabulary (SNOMED CT User Guide, 2009).
  >360,000 logically-defined clinical concepts (Hina et al., 2010; Stenzhorn et al., 2009).
  Form Term    SNOMED CT Concept
  Patient      11615400: Patient (person)
  MRN          398225001: Medical record number (observable entity)
7. SNOMED CT Concepts
  Concept id: 0231832. Fully-specified name: Respiratory Rate (Observable Entity). Preferred term: Respiratory Rate. Synonym: Respiration Frequency.
  Concept id: 362508001. Fully-specified name: Both eyes, entire (Body Structure). Preferred term: Both eyes, entire. Synonym: OU- Both eyes.
  Semantic categories: Attribute, Body Structure, Disorder, Finding, Observable Entity, Occupation, Person, Physical Object, Procedure, Racial Group, Situation.
8. SNOMED CT Browsers (Rogers and Bodenreider, 2008)
  Existing mapping services: General Mapping; Category-Specific Mapping.
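
The concept layout on slide 7 (an id, a fully-specified name ending in a parenthesized semantic category, a preferred term, and synonyms) can be sketched as a small record. This is only an illustration: the class name `Concept` and the `semantic_category` helper are hypothetical and are not taken from the authors' implementation; the field values are copied from the slide as printed.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """Hypothetical record mirroring the fields shown on slide 7."""
    concept_id: str
    fully_specified_name: str
    preferred_term: str
    synonyms: List[str] = field(default_factory=list)

    @property
    def semantic_category(self) -> Optional[str]:
        # The category is the trailing parenthesized tag of the
        # fully-specified name, e.g. "... (Body Structure)".
        match = re.search(r"\(([^()]+)\)\s*$", self.fully_specified_name)
        return match.group(1).lower() if match else None


# Values as printed on the slide.
both_eyes = Concept(
    concept_id="362508001",
    fully_specified_name="Both eyes, entire (Body Structure)",
    preferred_term="Both eyes, entire",
    synonyms=["OU- Both eyes"],
)
print(both_eyes.semantic_category)
```

A category-specific mapping service would then restrict its search to concepts whose `semantic_category` equals the requested category.
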
9. Challenges: Mapping Form Terms to SNOMED CT Concepts
  Diversity challenge: different clinicians use different terms, e.g., MRN vs. Med. Rec. #; Vital signs vs. Constitutional vs. Physical status.
  Context challenge: the same form term can map to different concepts.
10. 1. Motivation 2. Problem 3. Solution 4. Evaluation 5. Final Remarks
11. Premises
  The first, i.e., the most string-similar, result retrieved by the category-specific mapping is usually the desired concept.
  The key is to identify the SNOMED CT semantic category appropriate for a given term.
  How can the SNOMED CT semantic category appropriate for a given form term be determined automatically?
12. (1) The term context can be derived from the SEMANTIC STRUCTURE of the form.
  The FORM TREE accurately captures the semantic intentions of the designer.
  Inspired by hierarchical modeling of forms (Dragut et al., 2009; Wu et al., 2009).
13. (2) The implicit relationship between the term context (i.e., the semantic structure) and the desired semantic category can be formally captured in a STATISTICAL MODEL.
  Naive Bayes classifier: based on Bayes' theorem (Han and Kamber, 2006).
  Class labels (SNOMED CT semantic categories): attribute, body structure, disorder, ...
  Data attributes (local structure): node type, parent node type, child node type, parent semantic category, grandparent semantic category.
  [Figure: example form tree (root; Patient; Examination; Name; Gender with options M, F; Respiratory with options nl, perc.) whose nodes are labeled with semantic categories (Person, Procedure, Observable Entity, Finding, Qualifier Value).]
14. Overall Mapping Approach
  [Figure: pipeline with components Form Term, Form Tree, Semantic Structure Analyzer, Node Attributes, Training Data, Classification Model, Category Membership Probabilities, Category Picker, SNOMED CT Category, Category-Specific Mapping, SNOMED CT Concept, shown next to the example form tree.]
  Novelty: hybrid approach (leverages semantic structure as well as term linguistics).
15. 1. Motivation 2. Problem 3. Solution 4. Evaluation 5. Final Remarks
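
The classification step on slides 13 and 14 (node attributes in, category membership probabilities out, then a category picker) can be sketched with a small categorical Naive Bayes model. This is a minimal sketch under stated assumptions, not the authors' Java implementation: the training rows, the attribute values, the function names, and the use of Laplace (add-one) smoothing are all hypothetical choices made for illustration; only the attribute names and category labels come from the slides.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training rows: (node attributes, gold semantic category).
# Attribute names follow slide 13; the values are illustrative only.
TRAIN = [
    ({"node_type": "group", "parent_type": "root", "parent_category": "none"}, "person"),
    ({"node_type": "field", "parent_type": "group", "parent_category": "person"}, "observable entity"),
    ({"node_type": "field", "parent_type": "group", "parent_category": "person"}, "observable entity"),
    ({"node_type": "group", "parent_type": "root", "parent_category": "none"}, "procedure"),
    ({"node_type": "field", "parent_type": "group", "parent_category": "procedure"}, "observable entity"),
    ({"node_type": "option", "parent_type": "field", "parent_category": "observable entity"}, "qualifier value"),
]


def train(rows):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (label, attribute) -> value counts
    vocab = defaultdict(set)             # attribute -> distinct values seen
    for attrs, label in rows:
        for name, value in attrs.items():
            value_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return class_counts, value_counts, vocab


def category_probabilities(model, attrs):
    """Return P(category | attrs) under class-conditional independence."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for name, value in attrs.items():
            # Laplace smoothing is an assumption, not stated in the slides.
            numerator = value_counts[(label, name)][value] + 1
            denominator = n + len(vocab[name])
            score += math.log(numerator / denominator)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


def pick_category(model, attrs):
    """Category picker: choose the most probable semantic category."""
    probs = category_probabilities(model, attrs)
    return max(probs, key=probs.get)


MODEL = train(TRAIN)
node = {"node_type": "option", "parent_type": "field", "parent_category": "observable entity"}
print(pick_category(MODEL, node))
```

The picked category would then be passed, together with the form term, to the category-specific mapping, whose first result is taken as the concept (the premise on slide 11).
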
16. Data
  Dataset  Forms                                             Total Terms
  1        Walk-in clinic encounter forms (3 forms)          161
  2        Nursing patient admission forms (6 forms)         261
  3        Labor & delivery DB data-entry forms (7 forms)    294
  4        Adult visit encounter forms (5 forms)             388
  5        Child visit encounter forms (5 forms)             397
  Total    26 forms                                          1501
  Manual (gold) annotations: 954 (63.55%) terms.
  Term     Concept ID
  Patient  11615400: Patient (person)
  MRN      398225001: Medical record number (observable entity)
  Some unmapped terms: no scleral icterus; chronic back pain; Follow up with PCP; Sent to ER.
17. Implementation (JAVA) and Settings
  Form Design Interface; Gold Annotations; API, provided by the Dataline Software Limited.
  [Figure: the mapping pipeline from slide 14, with Form Tree and Training Data as inputs.]
  Cross validation (leave 1 out) for each dataset.
18. Experiment Design
  Goal: to study whether semantic structure can improve mapping performance.
  Measures: Precision = # correct annotations / # annotations; Recall = # correct annotations / # gold annotations.
  Baseline (linguistics only): Form Term, SNOMED CT General Mapping, SNOMED CT Concept.
  Hybrid (linguistics + semantic structure): Form Term, Semantic Structure Analyzer, Node Attributes, Classification Model, Category Membership Probabilities, Category Picker, SNOMED CT Category, Category-Specific Mapping, SNOMED CT Concept.
  Hybrid++: Hybrid with the Category Picker extended by candidate category set expansion.
19. Results
  Mapping duration per form = 1-11 s.
  Baseline: Precision 0.63, Recall 0.45. Baseline recall is low: the SNOMED CT API uses exact string matching and couldn't handle the variation of terms, i.e., the diversity challenge.
  Baseline to Hybrid: precision up by 18%.
  Hybrid to Hybrid++: precision up by 16%, recall up by 23%.
  Hybrid++: Precision 0.86, Recall 0.55.
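
The two measures defined on slide 18 can be computed as in the sketch below. The function name and the sample dictionaries are hypothetical; the concept ids are copied from the slides as printed, and the id `"000000"` is a deliberately wrong placeholder, not a real concept.

```python
def precision_recall(predicted, gold):
    """Precision = correct / produced annotations; recall = correct / gold annotations.

    Both arguments map a form term to a concept id.
    """
    correct = sum(1 for term, cid in predicted.items() if gold.get(term) == cid)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold annotations (ids as printed on the slides).
gold = {
    "Patient": "11615400",
    "MRN": "398225001",
    "Respiratory Rate": "0231832",
    "Both eyes": "362508001",
}
# Hypothetical system output: two correct, one wrong, one term left unmapped.
predicted = {
    "Patient": "11615400",
    "MRN": "398225001",
    "Respiratory Rate": "000000",  # placeholder for an incorrect mapping
}

precision, recall = precision_recall(predicted, gold)
print(round(precision, 2), round(recall, 2))
```

Read as relative gains, the reported steps are consistent with the Hybrid++ figures: 0.63 × 1.18 × 1.16 ≈ 0.86 for precision and 0.45 × 1.23 ≈ 0.55 for recall.
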
20. More Results
  Term processing component: remove special characters (-, #, /, etc.); acronym expansion dictionary, e.g., T (Temperature), BTL (Bilateral Tubal Ligation), VTE (Venous Thromboembolism).
  Precision improved only slightly (3-5%); recall improved substantially (25%).
  Final: Precision = 0.89, Recall = 0.76.
21. Implications
  Impact of semantic structure: on overall mapping performance; more correct predictions (context challenge).
  Impact of linguistics: mainly on recall; reaches more relevant terms (diversity challenge).
  Overall: promising performance, even with limited training data. Recall is low because of the simplicity of the linguistic techniques; it can be further improved using more sophisticated techniques.
22. 1. Motivation 2. Problem 3. Solution 4. Evaluation 5. Final Remarks
23. Contributions
  PROBLEM: new problem of standardizing the terms on clinical encounter forms using SNOMED CT. Existing works (Henry et al., 1993; Barrows Jr. et al., 1994; Patrick et al., 2007): standardization of clinical notes (diagnosis, medication information, patient complaints, etc.).
  SOLUTION: context-based method that leverages the SEMANTIC STRUCTURE of forms along with term linguistics. Existing works: linguistic techniques (synonyms, morphemes, lexical variants).
24. Contributions
  EVALUATION: 26 healthcare forms containing 950+ mappable terms specified by multiple clinicians. Improvement over existing services: 23% precision, 38% recall. Promising performance: precision 0.89, recall 0.76.
  FINDINGS: Linguistics helps overcome the diversity challenge and improves recall. Semantic structure helps overcome the context challenge and improves precision and recall. Design synergistic hybrid approaches to address all mapping challenges and achieve superior performance.
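
The term-processing component from the More Results slide (special-character removal plus an acronym expansion dictionary) can be sketched as follows. This is an assumption-laden illustration: the function name `process_term`, the choice to strip every non-alphanumeric character, and the exact-case token lookup are hypothetical; only the three dictionary entries come from the slide.

```python
import re

# Acronym entries listed on the slide; a real dictionary would be larger.
ACRONYMS = {
    "T": "Temperature",
    "BTL": "Bilateral Tubal Ligation",
    "VTE": "Venous Thromboembolism",
}


def process_term(term):
    """Strip special characters, then expand whole-token acronyms."""
    # Assumption: every non-alphanumeric character counts as "special".
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", term)
    tokens = cleaned.split()
    # Exact-case lookup so that ordinary lowercase words are left alone.
    return " ".join(ACRONYMS.get(token, token) for token in tokens)


for raw in ["Med. Rec.#", "BTL", "H/O VTE", "T:"]:
    print(repr(raw), "->", repr(process_term(raw)))
```

The processed string, rather than the raw form term, would then be sent to the mapping service, which is how this step can raise recall when the service relies on exact string matching.
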
25. Limitations
  TECHNIQUE: post-coordinated mapping; handle missing and inapplicable values in training data.
  STUDY: domain expert annotator.
  TECHNICAL EVALUATION: compare with other models (Bayesian networks, k, Neural Networks, Classification Association Rules); test the validity of assumptions (class conditional independence, correctness of the most linguistically matching concept, classification attributes); compare/combine with other UMLS terminology.
26. Future Directions
  Fully explore SNOMED CT: defining relationships.
  Customize for form categories: Encounter, Regular Visit.
  In larger frameworks, does annotation help improve: Data/Database Integration? Data Quality? Patient Diagnosis? User Interventions?
  Work in progress: larger knowledge base for training datasets; integrate with flexible Electronic Health Record system (IHI 2010); integration of new forms in EHR; improve database integration process.
27. Thank you