question answering and semantic relations · …clinical data warehouse …clinical data network an...

Information Extraction from Large Scale Clinical Trial Text. Tony Hao (郝天永), School of Information, Guangdong University of Foreign Studies / Guangdong Provincial Key Laboratory of Language Engineering and Computing


Page 1: Question answering and semantic relations · …Clinical Data Warehouse …Clinical Data Network An Opportunity ClinicalTrials.gov is a registry and results database of publicly and

Information Extraction from Large Scale Clinical Trial Text

Tony Hao (郝天永)

School of Information, Guangdong University of Foreign Studies / Guangdong Provincial Key Laboratory of Language Engineering and Computing


The Problems


The Importance of Clinical Trials

Clinical trials are an important step in discovering new treatments for diseases, as well as new ways to detect, diagnose, and reduce the risk of disease.

Clinical trials show researchers what does and doesn’t work in people.

Clinical trials help researchers and doctors decide whether the side effects of a new treatment are acceptable when weighed against the benefits it offers.

Clinical trials help the economy

Francis S. Collins, M.D., Ph.D., Director of the NIH: “United States is very wise to invest in clinical trials”. “Not only do such investments save lives and improve health, they can have a powerful effect on our economy.”


The Difficulties

1) Costly

According to estimates released by the Tufts Center for the Study of Drug Development, the average cost of developing a new medicine was $2.6 billion in 2013, with another $312 million for post-approval research.

2) Time-consuming

6 years on average for breast cancer


The Difficulties

3) Complex procedure

4) Hard to recruit


About 90% of clinical trials fail!


Eligibility Criteria: Central to Translational Research

[Figure labels: author, participate, summarize, apply, perceive, screen, analyze, interpret, understand, translate, execute, electronic screening]


The Research Questions

Question 1

Can we develop a method to automatically extract the necessary meta-information from free-text eligibility criteria for electronic screening, to speed up patient recruitment?

More effective and efficient

Lower cost


Study Populations: Planned vs. Real

[Figure labels: Investigator, Write, Recruitment, Publish, Target Population, Study Participants, Outcome, Patient Population]


Related Literature

Ha, C., T.A. Ullman, C.A. Siegel, and A. Kornbluth, Patients Enrolled in Randomized Controlled Trials Do Not Represent the Inflammatory Bowel Disease Patient Population. Clinical Gastroenterology and Hepatology, 2012. 10(9): p. 1002-1007.

Sokka, T. and T. Pincus, Eligibility of patients in routine care for major clinical trials of anti–tumor necrosis factor α agents in rheumatoid arthritis. Arthritis & Rheumatism, 2003. 48(2): p. 313-318.

Schoenmaker, N. and W.A. Van Gool, The age gap between patients in clinical studies and in the general population: a pitfall for dementia research. The Lancet Neurology, 2004. 3(10): p. 627-630.

Masoudi, F.A., E.P. Havranek, P. Wolfe, C.P. Gross, S.S. Rathore, J.F. Steiner, D.L. Ordin, and H.M. Krumholz, Most hospitalized older persons do not meet the enrollment criteria for clinical trials in heart failure. American Heart Journal, 2003. 146(2): p. 250-257.

Davis, K.L., et al., A Double-Blind, Placebo-Controlled Multicenter Study of Tacrine for Alzheimer's Disease. New England Journal of Medicine, 1992. 327(18): p. 1253-1259.


The Research Questions

Question 2

How can we reduce the gap between the study populations represented in clinical trials and the actual patient population?

Automated

High quality

Large scale


The Collaboration


The Collaboration Project

Collaboration Project:

EliXR - Eligibility Criteria Extraction and Representation

DBMI, Columbia University Medical Center

MedNLP, Guangdong University of Foreign Studies

Grants: NIH R01LM009886, NLM, NSFC

Active Projects:

Information extraction from free-text eligibility criteria

Bridging the gap between eligibility criteria and clinical data

Coordinating clinical trial research and patient care workflows


Systems developed in the project

IMPACT (Integrated Model for Patient Care and Clinical Trials)


Our Research Framework


The Data: Large-Scale Clinical Trial Text


An Opportunity

…Electronic Health Records

…Clinical Data Warehouse

…Clinical Data Network

ClinicalTrials.gov is a registry and results database of publicly and privately supported clinical studies of human participants conducted around the world.

ClinicalTrials.gov currently lists 217,479 studies with locations in all 50 states and in 193 countries.


Clinical Trial Summaries


https://www.clinicaltrial.gov/ct2/show/NCT02675257

https://www.clinicaltrial.gov/ct2/show/results/NCT01009138


Eligibility Criteria Examples

Known HbA1c (patient report or available records at time of enrollment) <7.5% within prior 6 months

Advanced visual acuity loss in both eyes which prohibits ability to read study materials (tested as needed with reading test using materials in appropriate size script)

Subjects with impaired fasting glucose: blood glucose ≥ 1.10 g/L and < 1.26 g/L

History of diabetes-related medical complications clinical diagnosis of diabetes

Complex!

Extracting needed information remains a research problem!


Our Research Topics


Topic 1 - Semantic Tag Mining

Background

Machine-powered semantic tag extraction has been shown to reduce the workload of human experts, who no longer need to parse large numbers of clinical documents manually.

Objectives

This study aims to develop a reusable and extensible automated semantic tag mining method that adapts to various clinical texts and provides more reliable extraction performance, with the goal of reducing both development effort and deployment cost.


Topic 2 - TEXer: Temporal Expression Extraction and Normalization

Background:

Automatic translation of clinical researchers' data requests into executable database queries is instrumental to an effective interface between clinical researchers and “Big Clinical Data”.

A necessary step towards this goal is to parse the many temporal expressions in free-text researcher requests.

Objectives:

Develop an effective method for temporal expression extraction and normalization.

TEXer uses a pattern learning algorithm and heuristic rules to extract and normalize temporal expressions in clinical data requests.
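To make "extraction and normalization" concrete, here is a minimal sketch of the rule-based half of such a pipeline. It is not TEXer's actual algorithm: the single hand-written regular expression stands in for a learned pattern, and the names `UNIT_DAYS`, `PATTERN`, and `extract_temporal`, as well as the approximate day counts per unit, are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical unit table: approximate day counts, used only in this sketch.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}

# Hypothetical hand-written pattern standing in for a learned one:
# "<cue> [prior|previous|last] <number> <unit>", e.g. "within prior 6 months".
PATTERN = re.compile(
    r"\b(within|in the last|in the past|for at least)\s+"
    r"(?:the\s+)?(?:prior\s+|previous\s+|last\s+)?"
    r"(\d+)\s+(day|week|month|year)s?\b",
    re.IGNORECASE,
)


def extract_temporal(text: str) -> List[Tuple[str, str, int]]:
    """Return (matched text, cue word, duration normalized to days) triples."""
    results = []
    for m in PATTERN.finditer(text):
        cue = m.group(1).lower()
        number = int(m.group(2))
        unit = m.group(3).lower()
        results.append((m.group(0), cue, number * UNIT_DAYS[unit]))
    return results


if __name__ == "__main__":
    print(extract_temporal("Known HbA1c <7.5% within prior 6 months"))
```

Normalizing every duration to a single unit (days here) is one simple way to make expressions comparable once they are turned into query constraints; a real system would need many more patterns and calendar-aware arithmetic.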


Publication


Topic 3- Similar Clinical Trial Clustering

Objectives

To automatically identify and cluster clinical trials with similar eligibility features

Highlights

A feature-based approach identifies clinical trials with similar eligibility criteria.

A crowdsourced evaluation found the identified clusters to be effective.

Trial clusters can facilitate similarity-based trial search and systematic review.
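A feature-based approach of this kind can be sketched as follows. This is an illustration under stated assumptions, not the published method: each trial is reduced to a set of eligibility features, similarity is measured with the Jaccard index, and trials are grouped by single-link clustering at a fixed threshold. The trial IDs, feature names, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_trials(features: Dict[str, Set[str]], threshold: float = 0.5) -> List[List[str]]:
    """Single-link clustering: merge trials whenever a pair reaches the threshold."""
    parent = {trial: trial for trial in features}

    def find(trial: str) -> str:
        # Union-find lookup with path halving.
        while parent[trial] != trial:
            parent[trial] = parent[parent[trial]]
            trial = parent[trial]
        return trial

    ids = sorted(features)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(features[a], features[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for trial in ids:
        clusters.setdefault(find(trial), []).append(trial)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Hypothetical trials and eligibility features.
    trials = {
        "T1": {"hba1c", "type 2 diabetes", "age"},
        "T2": {"hba1c", "type 2 diabetes", "bmi"},
        "T3": {"breast cancer", "chemotherapy"},
    }
    print(cluster_trials(trials))
```

With these inputs T1 and T2 share two of four distinct features, so they fall into one cluster and T3 stays alone.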


Clinical trial clustering by similar semantic phenotypes

Identifying similar semantic phenotypes for 5488 diseases

Hospitals/researchers: view trial-phenotype associations

Patients: a convenient way to retrieve clinical trials similar to those they currently attend or are familiar with.


Publication

Tianyong Hao, Alexander Rusanov, Mary Regina Boland, Chunhua Weng. Clustering Clinical Trials with Similar Eligibility Criteria Features. Journal of Biomedical Informatics, Volume 52, pp. 112–120, 2014. (SCI, 5-year IF 3.442, ISI-JCR Q1)

Topic 4 - TrialSmart: Towards Personalized Search of Clinical Trials through Preference Learning

Finding an appropriate clinical trial often poses difficulties for potential volunteers.

This study proposed a model for personalized search of clinical trials based on users’ profiles and search behaviors, leveraging the widespread availability of mobile devices.

The model uses a user-preference matrix for trial filtering and a behavior-learning module for preference prediction.

We developed a search tool, TrialSmart, that lets users search clinical trials on mobile devices according to their personalized preferences.
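The interplay between preference-based filtering and behavior learning can be sketched roughly as below. This is not TrialSmart's implementation: the preference matrix is simplified to a dictionary of weights keyed by (feature, value), and the additive update rule, the learning rate, and all names (`score`, `rank_trials`, `learn_from_click`) are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]                 # feature name -> value, e.g. {"phase": "2"}
Preferences = Dict[Tuple[str, str], float]  # (feature, value) -> weight


def score(features: Features, prefs: Preferences) -> float:
    """Sum the user's weights over the trial's (feature, value) pairs."""
    return sum(prefs.get(pair, 0.0) for pair in features.items())


def rank_trials(trials: Dict[str, Features], prefs: Preferences,
                min_score: float = 0.0) -> List[str]:
    """Filter out trials scoring at or below min_score; rank the rest by score."""
    scored = [(score(f, prefs), trial_id) for trial_id, f in trials.items()]
    ordered = sorted(scored, key=lambda p: (-p[0], p[1]))
    return [trial_id for s, trial_id in ordered if s > min_score]


def learn_from_click(prefs: Preferences, clicked: Features, rate: float = 0.1) -> None:
    """Hypothetical behavior-learning step: reinforce features of a clicked trial."""
    for pair in clicked.items():
        prefs[pair] = prefs.get(pair, 0.0) + rate


if __name__ == "__main__":
    # Hypothetical trials and an initial profile-derived preference.
    trials = {
        "A": {"condition": "diabetes", "phase": "2"},
        "B": {"condition": "asthma", "phase": "3"},
    }
    prefs: Preferences = {("condition", "diabetes"): 1.0}
    print(rank_trials(trials, prefs))
    learn_from_click(prefs, trials["B"])
    print(rank_trials(trials, prefs))
```

In this sketch the profile supplies the initial weights and each click nudges the weights, so trials the user has shown interest in start to pass the filter on later searches.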


Topic 5 - Gior: A Virtual Population Gender Architecture for Fast Enhancing Clinical Trial Recruitment

Gender is fundamental and essential information for electronic prescreening against eligibility criteria, which aims to recruit appropriate participants for human studies. The transgender population, however, is not identified.

Objectives: to design a flexible and extensible virtual population gender architecture for rapidly enhancing trial recruitment, together with an automated approach for highly accurate gender identification and validation.

Significance:

A unique approach for recruiting the transgender population for clinical studies, which also reduces the trial-filtering effort for other population types.

A low-cost and extensible virtual gender architecture that adapts to current trial recruitment systems.

An automated approach for identifying population gender from free text for the purpose of gender validation.


Relation type | Description | Example
Implication relation | x has an implication of y | x = Transgender Female; y = Biological Male
Reverse relation | x is NOT y | x = Transgender Male; y = Not Transgender Male
Constrained equivalence relation | x is equivalent to y in the condition z | x = Transgender Male; y = Not Transgender Female; z = Transgender
Topological relation | y is the topology of x | x = Transgender Female; y = Transgender Both

Trial ID | Original | Gold standard | Annotator 1 | Annotator 2 | Annotator 3
NCT00111501 | both | 'Biological Both', 'Transgender Both' | 'Biological Both', 'Transgender Both' | 'Biological Both', 'Transgender Both' | 'Biological Both', 'Transgender Both'
NCT00135148 | female | 'Biological Female', 'Transgender Female' | 'Biological Female', 'Transgender Female' | 'Biological Female', 'Transgender Female' | 'Biological Female', 'Transgender Female'
NCT00146146 | female | 'Biological Female', 'Transgender Male' | 'Transgender Male' | 'Biological Female', 'Transgender Male' | 'Transgender Male'
NCT00241202 | both | 'Biological Both', 'Transgender Both' | 'Biological Both', 'Transgender Both' | 'Transgender Both' | 'Biological Both', 'Transgender Both'
NCT00435513 | both | 'Transgender Both' | 'Transgender Both' | 'Transgender Both' | 'Biological Both', 'Transgender Both'
NCT00634218 | both | 'Biological Both', 'Transgender Both' | 'Biological Both', 'Transgender Both' | 'Biological Both', 'Transgender Both' | 'Biological Both', 'Transgender Both'
NCT00865566 | male | 'Biological Male', 'Transgender Female' | 'Biological Male', 'Transgender Female' | 'Biological Male', 'Transgender Both' | 'Biological Male', 'Transgender Female'
NCT01065220 | both | 'Transgender Both' | 'Transgender Both' | 'Transgender Both' | 'Transgender Both'
NCT01072825 | both | 'Transgender Both' | 'Transgender Both' | 'Transgender Both' | 'Transgender Both'
NCT01283360 | male | 'Biological Male', 'Transgender Female' | 'Biological Male', 'Transgender Female' | 'Biological Male', 'Transgender Female' | 'Biological Male', 'Transgender Female'

Gior achieved a true positive rate of 0.917 and a false negative rate of 0.083. All non-transgender trials were correctly identified.

All transgender trials were correctly identified as transgender type, and 91.7% of them were annotated in agreement with the human annotations.
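The two reported rates are complements: the false negative rate equals one minus the true positive rate. A minimal sketch of that arithmetic follows. The counts 11 and 1 are hypothetical, chosen only because they reproduce the reported 0.917 and 0.083; they are not figures taken from the study, and the function name `rates` is likewise an assumption.

```python
from typing import Tuple


def rates(true_positives: int, false_negatives: int) -> Tuple[float, float]:
    """Return (true positive rate, false negative rate) for the positive class."""
    total = true_positives + false_negatives
    tpr = true_positives / total
    return tpr, 1.0 - tpr


if __name__ == "__main__":
    # Hypothetical counts: 11 correctly annotated, 1 missed.
    tpr, fnr = rates(11, 1)
    print(round(tpr, 3), round(fnr, 3))
```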


Topic 6 - Valx: A system for extracting and structuring numeric lab test comparison statements

Motivations

For patients: the opportunity to use their known test results to search accurately for eligible trials, rather than reading every criterion to determine fit.

For hospitals/recruiters: the potential to speed up patient recruitment; clinical designer training.

For researchers: drug use analysis for specific diseases; analysis of how criteria are set for disease-specific trials.

Objectives:

To develop an automated method to accurately extract and normalize quantifiable clinical variables from free-text eligibility criteria of clinical trial summaries.

“BMI value must between 20-40kg/m” is annotated into “<VL Label=BMI Source=DK>BMI value</VL> must <VML Logic=greater_equal Unit=kg/m>20</VML> - <VML Logic=lower_equal Unit=kg/m>40</VML>”
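The annotation above pairs a normalized variable label with two comparison constraints. The sketch below shows one way such a range statement could be turned into that structure. It is not Valx's actual method: the small lexicon `VARIABLES`, the regular expression `RANGE`, and the function `extract_range` are hypothetical, and only the "low-high unit" range form is handled.

```python
import re
from typing import Optional

# Hypothetical variable lexicon: lowercase surface form -> normalized label.
VARIABLES = {"bmi": "BMI", "hba1c": "HBA1C", "glucose": "Glucose"}

# Matches a numeric range followed by a unit, e.g. "20-40kg/m".
RANGE = re.compile(
    r"(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z%/]+)"
)


def extract_range(criterion: str) -> Optional[dict]:
    """Return the variable, unit, and bound constraints, or None if not found."""
    lowered = criterion.lower()
    label = next((norm for key, norm in VARIABLES.items() if key in lowered), None)
    match = RANGE.search(criterion)
    if label is None or match is None:
        return None
    return {
        "variable": label,
        "unit": match.group("unit"),
        "constraints": [
            ("greater_equal", float(match.group("low"))),
            ("lower_equal", float(match.group("high"))),
        ],
    }


if __name__ == "__main__":
    print(extract_range("BMI value must between 20-40kg/m"))
```

Splitting a range into a lower and an upper constraint mirrors the two `VML` tags in the annotation, so each bound can later be compared against a patient's known test result independently.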


Method


Evaluation Results


Error analysis


Variable specification analysis

Specification statistics for all Type 2 diabetes (blue lines) and Type 1 diabetes (red lines) trials containing “HBA1C” (left) and “Glucose” (right) and their associated expressions


Thanks !

[email protected]