Technical Issues in Aggregating and Analyzing Data from Heterogeneous EHR Systems. Josh Denny, MD, MS (josh.denny@vanderbilt.edu), Vanderbilt University, Nashville, Tennessee, USA. 2/12/2015


Page 1: Technical Issues in Aggregating and Analyzing Data from Heterogeneous EHR Systems

Josh Denny, MD, MS
josh.denny@vanderbilt.edu
Vanderbilt University, Nashville, Tennessee, USA

2/12/2015

Page 2: EHR data are dense

196,693 individuals in an EHR DNA Biobank (BioVU)
• Mean follow-up: 5.7 yrs
• Distinct ICD9 codes: 19 million
• Labs: 121 million
  – Distinct labs: 5,948
  – Avg labs/patient: 662
• Drugs: 122 million
• Notes: 26 million (average 132 notes/individual)
• Radiology tests: 2 million

Page 3: Approach to EHR phenotyping

[Flowchart boxes]
• Identify phenotype of interest
• Case & control algorithm development and refinement
• Manual review; assess precision
• Deploy at site 1
• Validate at other sites
• Genetic association tests; replicate

[Branch labels after manual review] PPV ≥ 95%, PPV < 95%
[Input] Extant genotypes
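The precision check in this workflow can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names and the sample review data are hypothetical, and it assumes the PPV ≥ 95% branch leads on to deployment while the PPV < 95% branch returns to algorithm refinement, which is how the branch labels on the flowchart read.

```python
PPV_THRESHOLD = 0.95  # threshold taken from the slide's branch labels


def positive_predictive_value(review_labels):
    """Fraction of algorithm-identified cases confirmed by manual chart review.

    review_labels: list of booleans, True when the reviewer confirmed the case.
    """
    if not review_labels:
        raise ValueError("need at least one reviewed chart")
    return sum(review_labels) / len(review_labels)


def next_step(review_labels, threshold=PPV_THRESHOLD):
    """Return 'deploy' when precision meets the threshold, else 'refine'."""
    ppv = positive_predictive_value(review_labels)
    return "deploy" if ppv >= threshold else "refine"


# Hypothetical review sample: 48 of 50 sampled charts confirmed as true cases.
reviewed = [True] * 48 + [False] * 2
print(positive_predictive_value(reviewed), next_step(reviewed))
```

In practice the reviewed sample is small, so a confidence interval around the PPV estimate would be worth reporting alongside the point value.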

Page 4: What we've learned - Finding phenotypes in the EMR

[Figure labels]
• Clinical Notes (NLP - natural language processing)
• Billing codes: ICD9 & CPT
• Medications: ePrescribing & NLP
• Labs & test results: NLP
• True cases

Page 5: Finding cases: Rheumatoid Arthritis

[Figure categories]
• Definite Cases (algorithm-defined)
• Possible Cases (require manual review)
• Controls (algorithm-defined)
• Excluded (algorithm-defined)

[Counts shown in the figure] 255, 507, 1184, 7121
[Other labels] Analysis; Optional Manual Review

Page 6: Replicating known studies in the EHR

[Forest plot: observed vs. published odds ratios; x-axis Odds Ratio with ticks at 0.5, 1.0, 2.0, 5.0; columns: disease, marker, gene / region]

Diseases: Atrial fibrillation, Crohn's disease, Multiple sclerosis, Rheumatoid arthritis, Type 2 diabetes

marker        gene / region
rs2200733     Chr. 4q25
rs10033464    Chr. 4q25
rs11805303    IL23R
rs17234657    Chr. 5
rs1000113     Chr. 5
rs17221417    NOD2
rs2542151     PTPN22
rs3135388     DRB1*1501
rs2104286     IL2RA
rs6897932     IL7RA
rs6457617     Chr. 6
rs6679677     RSBN1
rs2476601     PTPN22
rs4506565     TCF7L2
rs12255372    TCF7L2
rs12243326    TCF7L2
rs10811661    CDKN2B
rs8050136     FTO
rs5219        KCNJ11
rs5215        KCNJ11
rs4402960     IGF2BP2

Am J Hum Genet. 2010;86:560-72.

Page 7: Discovery science in eMERGE

Am J Hum Genet. 2011;89:529-42

• Algorithms can be deployed across multiple EMRs
• Analyses can be performed using extant data

Page 8: Completed eMERGE GWAS

Diseases
• Dementia
• Cataracts
• Autoimmune Hypothyroidism
• Diverticulosis/diverticulitis
• Type 2 Diabetes
• Diabetic retinopathy
• Herpes zoster
• PheWAS
• Peripheral Arterial Disease
• Venous Thromboembolism
• Glaucoma
• Ocular hypertension
• Abdominal Aortic Aneurysm
• Colon polyps

Endophenotypes
• PR Duration
• QRS Duration
• HDL/LDL
• height
• white blood cell counts
• red blood cell counts
• Cardiorespiratory Fitness
• ESR levels
• Platelet levels

Pharmacogenomic phenotypes
• ACE inhibitor cough
• Heparin induced thrombocytopenia
• Resistant hypertension
• Drug Induced Liver Injury
• C. difficile colitis

bold = GWAS completed with significant results

Selected consortia contributions
• Height
• QTc
• Rheum. Arthritis
• Myocardial Infarction Genetics Consortium
• Intl. Mult Sclerosis Genet. Consort.
• Genomic Investigation of Statin Therapy

Page 9

85 phenotypes from eMERGE, PGRN, PCORnet
47 have validation data
118 total implementations

Page 10: Hypothyroidism algorithm

Page 11: Performance of 88 Phenotype Algorithms in PheKB

[Figure: Positive Predictive Value (y-axis, 0% to 100%) by Site Implementations (x-axis), grouped into Primary site and Secondary sites. Annotations: Median; Drug-induced liver injury.]

Page 12: The phenome-wide association study

[Figure, two panels]
• The genome-wide association study: one target phenotype; x-axis chromosomal location, y-axis association P value
• The phenome-wide association study: one target genotype; x-axis diagnosis code, y-axis association P value

PheWAS requirement: A large cohort of patients with genotype data and many diagnoses

Example new PheWAS associations for IRF4
Known: hair, skin, eye color
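The PheWAS idea (hold one genotype fixed and scan every diagnosis code for association) can be sketched as follows. This is a simplified illustration under stated assumptions, not the method used in the talk: it uses an unadjusted Pearson chi-square test on a 2x2 table for each code, whereas a real analysis would typically use regression with covariates and correct for multiple testing. The function names, the patient IDs, and the codes `code_A` and `code_B` are hypothetical.

```python
import math


def chi2_p_value_2x2(a, b, c, d):
    """P value of Pearson's chi-square test (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def phewas(carriers, diagnoses):
    """Scan every diagnosis code for association with one genotype.

    carriers:  dict patient_id -> bool (carries the target genotype)
    diagnoses: dict patient_id -> set of diagnosis codes
    Returns a list of (code, p_value), most significant first.
    """
    codes = set().union(*diagnoses.values())
    results = []
    for code in codes:
        a = b = c = d = 0
        for pid, is_carrier in carriers.items():
            has_code = code in diagnoses.get(pid, set())
            if is_carrier and has_code:
                a += 1
            elif is_carrier:
                b += 1
            elif has_code:
                c += 1
            else:
                d += 1
        results.append((code, chi2_p_value_2x2(a, b, c, d)))
    return sorted(results, key=lambda r: (r[1], r[0]))


# Hypothetical toy cohort: patients 0-9 carry the genotype, 10-19 do not.
carriers = {pid: pid < 10 for pid in range(20)}
diagnoses = {pid: set() for pid in carriers}
for pid in list(range(9)) + [10]:
    diagnoses[pid].add("code_A")  # enriched among carriers
for pid in list(range(5)) + list(range(10, 15)):
    diagnoses[pid].add("code_B")  # evenly split
results = phewas(carriers, diagnoses)
for code, p in results:
    print(code, p)
```

The toy cohort shows why the slide's requirement matters: each code needs enough carriers and non-carriers with and without the diagnosis for the test to have any power.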

Page 13: Studying drug responses with GWAS

Phenotype                                                 Cases    Controls
Clopidogrel in CV disease                                 225      468
Warfarin stable dose                                      1,167    N/A
Early Repolarization                                      544      2,609
Vancomycin stable dose                                    1,067    N/A
C. difficile colitis                                      941      1,710
Anthracycline cardiomyopathy                              528      N/A
Guillain-Barre Syndrome                                   97       6,536
Heart Transplant                                          181      N/A
Kidney transplant                                         1,078    N/A
Clopidogrel in strokes/TIAs                               6        123
Statin-related myopathy                                   11       4,342
Heparin-induced thrombocytopenia                          73       2,300
CV events with COX2 therapy                               85       395
Serious bleeding during warfarin                          259      276
Amiodarone toxicity (lung, thyroid)                       97       343
Chronic inflammatory polyneuropathy                       12       14,000*
Rheumatic Heart Disease                                   108      3,464
ACEi cough                                                1,174    978
Fluoroquinolones and tendinopathy                         87       537
Warfarin stable dose in children                          92       N/A
Metformin efficacy                                        80       N/A
Metformin and cancer                                      619      421
Bisphosphonates and Atypical Fracture/Jaw Osteonecrosis   16       1,454
Wolff-Parkinson-White                                     197      5,551
Steroid-induced Osteonecrosis                             83       352
Shellfish Anaphylaxis                                     157      14,000*
Aspirin Anaphylaxis                                       101      4,334
Bell's Palsy#                                             577      14,000*

Bowton et al., Sci Trans Med. 2014

“Only” about 120,000 samples at time of study: underpowered for many rare outcomes

90% participated in >1 study

Page 14: Strengths

• Rich, longitudinal data stores
• Ability to go back to the chart to find out more
• Research-quality phenotypes available via algorithms
• Potential for closed-loop discovery and implementation
• Expensive testing available “for free”
• Ability to explore rare, detailed, drug-response, and mortal phenotypes
• Samples easily reused for many studies

Page 15: Challenges

• Developing algorithms takes time and people, and implementation then requires local expertise
• EHR data can be inaccurate, heterogeneous, unavailable, or poorly organized, and storage structures differ between systems
• Fragmentation between healthcare systems
• Mining of EHR data is not trivial (though improving): text data, duration and temporality

Page 16: How do you share genetic data?

[Diagram 1: Sites 1 to 5, every site linked directly to every other site]
Edges (unique DUAs): n(n-1)/2 = 10

[Diagram 2: Sites 1 to 5, each linked only to a Coordinating Center]
Edges: n = 5

10 sites = 45 vs. 10
20 sites = 190 vs. 20
30 sites = 435 vs. 30
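The edge counts on this slide follow directly from the two formulas. The short sketch below reproduces them; the function names are hypothetical, and it assumes one data use agreement (DUA) per pair of sites in the fully connected model and one per site in the coordinating-center model, as the diagrams suggest.

```python
def pairwise_agreements(n_sites):
    """Unique agreements when every site signs with every other: n(n-1)/2."""
    return n_sites * (n_sites - 1) // 2


def hub_agreements(n_sites):
    """Agreements when every site signs only with a coordinating center: n."""
    return n_sites


for n in (5, 10, 20, 30):
    print(f"{n} sites = {pairwise_agreements(n)} vs. {hub_agreements(n)}")
```

The pairwise count grows quadratically while the hub count grows linearly, which is the argument for a coordinating center as networks get larger.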

Page 17

[Figure labels: Coordinating Center; Kaiser Permanente; legend marker for pediatric sites]

Network                    DNA samples   GWAS
eMERGE                     361k          51k (100k)
Million Veterans Program   350k          200k
Kaiser Permanente          300k          100k
Total                      >1 million    >351k
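The totals in the table can be checked with a few lines of arithmetic. This sketch takes the counts in thousands as read from the slide; the dictionary layout is hypothetical, and the parenthetical 100k for eMERGE is left out because the slide does not explain it.

```python
# Counts in thousands, as read from the table on the slide.
networks = {
    "eMERGE": {"dna": 361, "gwas": 51},
    "Million Veterans Program": {"dna": 350, "gwas": 200},
    "Kaiser Permanente": {"dna": 300, "gwas": 100},
}

total_dna = sum(row["dna"] for row in networks.values())
total_gwas = sum(row["gwas"] for row in networks.values())
print(f"DNA samples: {total_dna}k, GWAS: {total_gwas}k")
```

The sums come to 1,011k DNA samples and 351k GWAS samples, in line with the slide's ">1 million" and ">351k".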