Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-Based...

Post on 18-May-2015

Category:

Education

DESCRIPTION

Literature-Based Discovery (LBD) refers to the process of uncovering hidden connections that are implicit in scientific literature. Numerous hypotheses have been generated from scientific literature, and these have influenced innovations in diagnosis, treatment, prevention and overall public health. However, much of the existing research on discovering hidden connections among concepts has used distributional statistics and graph-theoretic measures to capture implicit associations. Such metrics do not explicitly capture the semantics of hidden connections. ... While effective in some situations, the practice of relying on domain expertise, structured background knowledge and heuristics to complement distributional and graph-theoretic approaches has serious limitations. ...

This dissertation proposes an innovative context-driven, automatic subgraph creation method for finding hidden and complex associations among concepts, along multiple thematic dimensions. It outlines definitions for context and shared context, based on implicit and explicit (or formal) semantics, which compensate for deficiencies in statistical and graph-based metrics. It also eliminates the need for heuristics a priori. An evidence-based evaluation of the proposed framework showed that 8 out of 9 existing scientific discoveries could be recovered using this approach. Additionally, insights into the meaning of associations could be obtained using provenance provided by the system. In a statistical evaluation to determine the interestingness of the generated subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE, on average. These results suggest that leveraging implicit and explicit context, as defined in this dissertation, is an advancement of the state-of-the-art in LBD research.

Ph.D. Committee: Drs. Amit Sheth (Advisor), TK Prasad, Michael Raymer, Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NLM) and Varun Bhagwan (Yahoo! Labs)

Relevant Publications (more at: http://knoesis.wright.edu/students/delroy/)

D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Leveraging Distributional Semantics for Domain Agnostic Literature-Based Discovery (under preparation)

D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13), 46(2): 238–251, 2013

D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11), pp. 512–519, 2011 (acceptance rate=19.4%)

D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10), 14, 2010

TRANSCRIPT

A CONTEXT-DRIVEN SUBGRAPH MODEL FOR LITERATURE-BASED DISCOVERY

PH.D. DISSERTATION DEFENSE

DELROY CAMERON

AUGUST 18, 2014

PH.D. COMMITTEE

AMIT P. SHETH (ADVISOR)

KRISHNAPRASAD THIRUNARAYAN

MICHAEL RAYMER

RAMAKANTH KAVULURU (UKY)

THOMAS C. RINDFLESCH (NIH)

VARUN BHAGWAN (YAHOO! LABS)

All truths are easy to understand once they are discovered; the point is to discover them. (Galileo Galilei, 1564–1642)

2

Historical Perspectives

Walter Sutton (1877 – 1916)

Theodor Boveri (1862 – 1915)

Gregor Johann Mendel (1822 – 1884)

Mendelian Laws of Inheritance (1866)

Boveri-Sutton Chromosome Theory (1903)

3

Science of Making Discoveries

Discovery

Information Processing System

×What is promising?

4

Thesis Statement

An information processing system that leverages rich representations of textual content from scientific literature based on implicit and explicit context can provide effective means for literature-based discovery.

5

Motivation

[Timeline figure: Rofecoxib (Vioxx, Merck & Co.) TREATS Osteoarthritis, 1999; increased risk of heart attack; confirmed by clinical trial; Vioxx withdrawn; settlements of $254.3 million, $4.85 billion, $950 million and $23 million. Years marked: 1999, 2002, 2004, 2005, 2007, 2011, 2013.]

6

Motivation

Literature-Based Discovery (LBD)

7

Literature-Based Discovery (LBD)

[Figure: timeline of LBD models and systems. Models: ABC Model (A, B, C), AnC Model (A, B1, B2, ..., Bi, C), Context-Driven Subgraph Model. Approaches: Keyword-based, Concept-based, Relations-based, Graph-based, Hybrid. Years marked: 1986, 1996, 1999, 2001, 2003, 2004, 2005, 2006, 2007, 2010, 2011, 2013.]

Source: Wikipedia - http://en.wikipedia.org/wiki/Don_R._Swanson

Systems and techniques shown on the timeline:

ARROWSMITH v1: Term Frequency

IRIDESCENT: Term Co-occurrence

DAD: MetaMap, UMLS

Litlinker: MeSH, UMLS, Rules; Level of Support

SemBT: Semantic Predications; Level of Support

Discovery Browsing: Degree Centrality, Cooperative Reciprocity, Manual

Manjal: UMLS, MeSH, Topic Profiles, TF-IDF

Rajolink: MeSH, Rarity

BioSbKDS: UMLS Relations, MeSH

BITOLA: UMLS, MeSH, Assoc. Rules, Confidence

ACS (2004): MeSH, Hebbian Learning

ARROWSMITH v2: 8 Features (2007)

Semantic MEDLINE: Summarization, Discovery Browsing

Epiphanet: Predications-based Semantic Indexing

CoPub: Keywords, Mutual Information

Discovery Patterns

Predicates shown in the diagrams: CAUSES, INHIBITS, DISRUPTS, PRODUCES, STIMULATES, ISA, TREATS

Contribution #1: Context-Driven Subgraph Model for LBD

Literature-based discovery refers to the use of papers and other academic publications (the “literature”) to find new relationships between existing knowledge (the “discovery”).

Definition courtesy of Wikipedia: http://en.wikipedia.org/wiki/Literature-based_discovery

8

Application: Raynaud Syndrome – Fish Oil

[Figure: subgraph linking Dietary Fish Oils to Raynaud Syndrome through Prostaglandin I3, Epoprostenol, Prostaglandin and Platelet Aggregation, using the predicates ISA, CONVERTS_TO, DISRUPTS, STIMULATES, TREATS and CAUSES.]

D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.

[Figure: the same association at three levels. Keyword/Concept-based: Dietary Fish Oils, Platelet Aggregation, Raynaud Syndrome. Relations-based: Dietary Fish Oils DISRUPTS Platelet Aggregation CAUSES Raynaud Syndrome. Subgraph-based: with inferred predicates.]

9

Comparison

Columns: Scenario, Intermediate, Cameron [19], Srinivasan [88, 89], Weeber [101, 102], Gordon [36,37,38], Hristovski [40], Ramakrishnan [72]*

Scenario: Raynaud Syndrome – Dietary Fish Oils

Blood Viscosity × × × × × ?

Platelet Aggregation × × × × × ?

Vascular Reactivity × × × × ?

Table 1: Comparison of intermediates rediscovered for Raynaud Syndrome – Dietary Fish Oil

[Figure: subgraphs connecting Dietary Fish Oils and Raynaud Syndrome. Nodes: Platelet Aggregation, Prostaglandins, Prostacyclin (PGI2), Prostaglandin I3 (PGI3), Fatty Acid, Essential Fatty Acid, Triglyceride, Lipid, Blood Viscosity, Cellular Activity, Blood Physiology, Tissue Function, Vasoconstriction. Predicates: ISA, DISRUPTS, CAUSES, CONVERTS_TO, TREATS, STIMULATES, INHIBITS, AFFECTS.]

Problem

How to automate this?

Literature-Based Discovery

Context-Driven Subgraph Model

Foundations

Automatic Subgraph Creation

Experimental Results

Dissertation Contributions

Knowledge Exploration

Limitations & Future Work

PREDICATIONS GRAPH

13

. . .

Subgraph Model

Predications Graph (G)

Candidate Graph (RG)

Subgraphs (SG)
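The first two stages on this slide can be sketched as follows: a predications graph G built from (subject, predicate, object) triples, and a candidate graph made of the paths between a source s and a target t. This is a minimal illustration only. The triples, the `max_hops` limit and the function names are assumptions, not the dissertation's implementation.

```python
from collections import defaultdict

# Hypothetical toy predications (subject, predicate, object). The concept
# names come from the Raynaud Syndrome example; the selection is illustrative.
PREDICATIONS = [
    ("Dietary Fish Oils", "DISRUPTS", "Platelet Aggregation"),
    ("Platelet Aggregation", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "INHIBITS", "Blood Viscosity"),
    ("Blood Viscosity", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "ISA", "Lipid"),
]


def build_graph(predications):
    """Predications graph G: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in predications:
        graph[subject].append((predicate, obj))
    return graph


def candidate_paths(graph, source, target, max_hops=3):
    """Enumerate simple paths from source to target with at most max_hops edges."""
    paths = []

    def walk(node, visited, path):
        if node == target:
            paths.append(list(path))
            return
        if len(path) == max_hops:
            return
        for predicate, nxt in graph.get(node, []):
            if nxt not in visited:
                path.append((node, predicate, nxt))
                walk(nxt, visited | {nxt}, path)
                path.pop()

    walk(source, {source}, [])
    return paths


G = build_graph(PREDICATIONS)
R = candidate_paths(G, "Dietary Fish Oils", "Raynaud Syndrome")
for p in R:
    print(" -> ".join(f"{s} {pred} {o}" for s, pred, o in p))
```

Each returned path is a list of predications, which is the unit the later slides attach a context to and group into subgraphs.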

No two contexts are the same

R(s,t)(c1) R(s,t)(c2) R(s,t)(ck)

R(s,t)

. . .

. . .

What is context?


15

• Path Relatedness

• Semantic Predication Context

Context Distribution Assumption: The context of a semantic predication can be expressed as the distribution of all MeSH descriptors associated with all articles that contain it.
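A minimal sketch of the Context Distribution Assumption: the context of a predication is the count distribution of MeSH descriptors over the articles that contain it. The article identifiers, descriptor assignments and provenance table below are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical article -> MeSH descriptor annotations (identifiers are made up).
ARTICLE_MESH = {
    "pmid:1": ["Platelet Aggregation", "Fish Oils", "Epoprostenol"],
    "pmid:2": ["Platelet Aggregation", "Fish Oils"],
    "pmid:3": ["Blood Viscosity", "Fish Oils"],
}

# Hypothetical provenance: the articles in which each predication was found.
PREDICATION_ARTICLES = {
    ("Dietary Fish Oils", "DISRUPTS", "Platelet Aggregation"): ["pmid:1", "pmid:2"],
}


def predication_context(predication, predication_articles, article_mesh):
    """Distribution of MeSH descriptors over all articles containing the predication."""
    counts = Counter()
    for pmid in predication_articles.get(predication, []):
        counts.update(article_mesh.get(pmid, []))
    return counts


context = predication_context(
    ("Dietary Fish Oils", "DISRUPTS", "Platelet Aggregation"),
    PREDICATION_ARTICLES,
    ARTICLE_MESH,
)
print(context.most_common())
```

Descriptors from articles that do not contain the predication (here, Blood Viscosity from the third article) never enter its context.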

Semantic Underpinnings

Relational Semantic Summary

Textual Semantic Summary

Concept-Level Semantic Summary

Interchangeability Assumption: The concept-level and relational semantic summary of a MEDLINE article are interchangeable.

16

Linguistic Underpinnings

Linguistic items with similar distributions have similar meanings

“You shall know a word by the company it keeps”

– J. R. Firth 1957

Semantic Predications with shared contexts in their distributions are related

Distributional Semantics

Context-sensitive nature of meaning


18

Automatic Subgraph Creation

[Figure: two paths pi and pj whose MeSH descriptors (m1, m2, m5, m7, m8, m9) are located in the MeSH Hierarchy; the paths are compared through the semantic relatedness of their MeSH context vectors.]

Contribution #2: Context of a path as a vector of MeSH Descriptors

19

Path Relatedness

Objective #1: Maximize weights of In-Context Descriptors

Objective #2: Minimize weights of Out-Of-Context Descriptors

p – path, t – semantic predication

[Figure: weighted MeSH context vectors. C(pi) covers descriptors m1 to m5 and C(pj) covers m1, m2 and m6 to m13. Both are aligned over the combined descriptor set (m1, m2, m6 to m13, m3, m4, m5), with zero weights for descriptors a path does not have.]
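The alignment of two path contexts over a combined descriptor set can be sketched as below. Only the descriptor membership (m1 to m5 for pi; m1, m2 and m6 to m13 for pj) follows the slide. The weights and helper names are hypothetical, and the sketch covers exact overlap only, not the hierarchy-based shared context introduced on the next slide.

```python
def descriptor_index(name):
    """Sort key so that 'm2' comes before 'm10'."""
    return int(name[1:])


def align(context_i, context_j):
    """Return (descriptors, vector_i, vector_j) over the union of descriptors.

    Descriptors missing from a path's context receive weight 0.
    """
    descriptors = sorted(set(context_i) | set(context_j), key=descriptor_index)
    vec_i = [context_i.get(d, 0) for d in descriptors]
    vec_j = [context_j.get(d, 0) for d in descriptors]
    return descriptors, vec_i, vec_j


# Hypothetical weights; only the descriptor membership mirrors the slide.
C_pi = {"m1": 3, "m2": 2, "m3": 3, "m4": 2, "m5": 2}
C_pj = {f"m{k}": 1 for k in (1, 2, 6, 7, 8, 9, 10, 11, 12, 13)}

descriptors, vec_i, vec_j = align(C_pi, C_pj)
shared = [d for d, a, b in zip(descriptors, vec_i, vec_j) if a and b]
print(descriptors)
print(vec_i)
print(vec_j)
print(shared)
```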

20

Path Relatedness: Shared Context

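[Figure: binary MeSH context vectors C(pi) and C(pj) over the combined descriptor set. The descriptors that are not shared exactly (m3, m4, m5 and m9 to m13) include Platelet aggregation, Platelet activation, Epoprostenol, Platelet adhesiveness and Prostaglandins.]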

G-Tree

platelet aggregation

hemostasis

Blood physiological process

Blood physiological phenomena

Circulatory and respiratory physiological phenomena

platelet adhesiveness

platelet activation

Epoprostenol

D-Tree

Prostaglandins I

Arachidonic Acids

Fatty Acids, Unsaturated

Fatty Acids

Lipids

Prostaglandins

Eicosanoids

Contribution #3: Structured Background Knowledge for computing shared context of paths
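One way to read "shared context through the MeSH hierarchy" is to compare two descriptors by the overlap of their lineages. The deck later names Dice similarity as the MeSH semantic similarity metric, but how it is applied is not shown, so the ancestor-set formulation below is an assumption. The parent links are hypothetical and only loosely follow the tree fragments on the slide; the real MeSH tree structure may differ.

```python
# Hypothetical child -> parent links (an assumption, not the actual MeSH tree).
PARENT = {
    "Platelet Aggregation": "Hemostasis",
    "Platelet Adhesiveness": "Hemostasis",
    "Platelet Activation": "Hemostasis",
    "Hemostasis": "Blood Physiological Processes",
    "Blood Physiological Processes": "Blood Physiological Phenomena",
    "Prostaglandins": "Eicosanoids",
    "Eicosanoids": "Fatty Acids, Unsaturated",
}


def lineage(descriptor, parent=PARENT):
    """The descriptor together with all of its ancestors in the hierarchy."""
    nodes = {descriptor}
    while descriptor in parent:
        descriptor = parent[descriptor]
        nodes.add(descriptor)
    return nodes


def dice(a, b):
    """Dice coefficient of the two lineages: 2|A ∩ B| / (|A| + |B|)."""
    la, lb = lineage(a), lineage(b)
    return 2 * len(la & lb) / (len(la) + len(lb))


print(dice("Platelet Aggregation", "Platelet Activation"))
print(dice("Platelet Aggregation", "Prostaglandins"))
```

Under this reading, two descriptors that never co-occur exactly can still contribute to the shared context of two paths when their similarity exceeds the manually chosen threshold.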

21

Path Relatedness Score

*Dictionary of Distances, Elena Deza, Michel-Marie Deza, Elsevier, 2006

22

Hierarchical Agglomerative Clustering

[Figure: paths between A and C are grouped into buckets over iterations 1 to n, alternating Bucket Population and Bucket Merging, subject to the Path Relatedness Threshold.]

1. Bucket Population

2. Bucket Merging

3. Subgraph Ranking
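The first two steps can be sketched roughly as follows. The slides do not spell out the exact procedure, so this is an assumed greedy variant: a path joins the first bucket in which it is related to every member, and buckets are merged when their average pairwise relatedness reaches a threshold. The `related` function, the score table and the thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical pairwise path relatedness scores; unlisted pairs default to 1.0.
SCORES = {
    frozenset(("p1", "p2")): 3.5,
    frozenset(("p3", "p4")): 3.2,
}


def related(p, q):
    return SCORES.get(frozenset((p, q)), 1.0)


def populate(paths, related, threshold):
    """Bucket population: put each path in the first bucket it fully matches."""
    buckets = []
    for p in paths:
        for bucket in buckets:
            if all(related(p, q) >= threshold for q in bucket):
                bucket.append(p)
                break
        else:
            buckets.append([p])
    return buckets


def inter_cluster(a, b, related):
    """Average pairwise relatedness between two buckets (assumed linkage)."""
    return sum(related(p, q) for p in a for q in b) / (len(a) * len(b))


def merge(buckets, related, threshold):
    """Bucket merging: repeatedly merge any two sufficiently related buckets."""
    buckets = [list(b) for b in buckets]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(buckets)), 2):
            if inter_cluster(buckets[i], buckets[j], related) >= threshold:
                buckets[i].extend(buckets[j])
                del buckets[j]
                merged = True
                break
    return buckets


buckets = populate(["p1", "p2", "p3", "p4"], related, threshold=3.0)
print(buckets)
print(merge(buckets, related, threshold=3.0))
```

Each surviving bucket corresponds to one candidate subgraph; ranking the subgraphs is a separate step and is not covered by this sketch.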

23

Summary of Metrics

• Path Relatedness
– Model: MeSH Context Vectors
– Metrics: Semantics-enhanced shared context, Log Reduction
– Threshold: ??

• MeSH Semantic Similarity
– Model: MeSH Hierarchy
– Metrics: Dice Similarity
– Threshold: Manually

24

Automatic Threshold Selection

RS-DFO Experiment

Manual Threshold = 3.0

Gaussian Distribution

[Plot: Number of Path Pairs versus Path Relatedness Score]

25

Automatic Threshold Selection

Gaussian Function

[Plot: Expected Value versus Path Relatedness Score]

26

Automatic Threshold Selection

• Gaussian Distribution

Diagram courtesy of Wikipedia*

Points of Inflection

27

Threshold Comparisons

Path Relatedness Score

Scenario | 2 Std Dev. | Manual | 3 Std Dev. | Max
RS-DFO | 2.68 | 3.0 | 3.04 | 3.38
Testosterone-Sleep | 3.35 | 3.5 | 3.8262 | 6.22
DEHP-Sepsis | 3.94 | 4.0 | 4.53 | 4.84

Table 2: Path Relatedness Threshold Comparisons
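A minimal sketch of automatic threshold selection under the approximately Gaussian assumption: the threshold is placed k standard deviations above the mean of the observed path relatedness scores. How the Gaussian is fitted is not stated on the slides, so using plain sample moments here is an assumption, as are the function name and the sample scores.

```python
import statistics


def gaussian_threshold(scores, k=2.0):
    """Threshold at mean + k * standard deviation of the observed scores."""
    mean = statistics.fmean(scores)
    std_dev = statistics.pstdev(scores)
    return mean + k * std_dev


# Hypothetical path relatedness scores for all path pairs in one experiment.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]

threshold = gaussian_threshold(scores, k=2.0)
above = [s for s in scores if s >= threshold]
print(threshold, above)
```

Setting `k=3.0` gives the stricter three-standard-deviation variant that Table 2 lists next to the manual thresholds.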

28

Bucket Merging

Ba

Bb

Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to information retrieval. Cambridge University Press 2008, ISBN 978-0-521-86571-5, pp. I-XXI, 1-482

Straggly Clusters Compact Clusters

Broad Clusters

29

Subgraph Ranking

Intra-Cluster Rank

30

Singleton Ranking

Association Rarity

31

Summary of Metrics

• Path Relatedness
– Model: MeSH Context Vectors
– Metrics: Semantics-enhanced shared context, Log Reduction
– Manual Threshold for Semantic Similarity, Dice Similarity
– Threshold: 2nd Standard Deviation from Mean of Gaussian

• Bucket Relatedness
– Model: Set of Paths
– Metric: Inter-Cluster Similarity
– Threshold: 2nd Standard Deviation from Mean of Gaussian

• Subgraph Ranking
– Metrics: Intra-Cluster Similarity, Singleton Rank (Association Rarity)

32

Algorithm

Time Complexity: Θ(N² log N)


34

Raynaud Syndrome – Dietary Fish Oil

Inferred predicates

Path Relatedness Threshold = 3σ

Scenario 1: Raynaud Syndrome – Dietary Fish Oil

Cut-off date: Nov. 1985. By D. R. Swanson (Article)

Intermediate | Association | Status
Blood Viscosity | Dietary Fish Oils INHIBITS Blood Viscosity; Blood Viscosity CAUSES Raynaud Syndrome | ZR-15
Platelet Aggregation | Dietary Fish Oils INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome | S1
Vasoconstriction | Dietary Fish Oils INHIBITS Vasoconstriction; Vasoconstriction CAUSES Raynaud Syndrome |

Legend: ZR: zero rarity singleton; S: Subgraph; Not Found

Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation

Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/

Scenario 2: Magnesium – Migraine

Cut-off date: Apr. 1987. By D. R. Swanson (Article)

Intermediate | Association | Status
Calcium Channel Blockers | Magnesium ISA Calcium Channel Blocker; Calcium Channel Blockers TREATS Migraine | S22
Epilepsy | Magnesium AFFECTS Epilepsy; Epilepsy CO_EXISTS_WITH Migraine | S9
Hypoxia | Magnesium INHIBITS Hypoxia; Hypoxia ASSOCIATED_WITH Migraine |
Inflammation | Magnesium INHIBITS Inflammation; Inflammation CAUSES Migraine | ZR-3
Platelet Activity | Magnesium INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Migraine | S1
Prostaglandins | Magnesium STIMULATES Prostaglandins; Prostaglandins DISRUPTS Migraine | S4
Stress/Type A Personality | Stress INHIBITS Magnesium; Stress ASSOCIATED_WITH Migraine |
Serotonin | Magnesium INHIBITS Serotonin; Serotonin CAUSES Migraine | S1
Cortical Depression | Magnesium INHIBITS Spreading Cortical Depression; Spreading Cortical Depression CAUSES Migraine |
Substance P | Magnesium INHIBITS Substance P; Substance P CAUSES Migraine |
Vascular Mechanisms | Magnesium INHIBITS Vasoconstriction; Vasoconstriction CAUSES Migraine | S9


Scenario 3: Somatomedin C – Arginine

Cut-off date: Apr. 1989. By D. R. Swanson (Article)

Intermediate | Association | Status
Growth Hormone | Arginine STIMULATES Growth Hormone; Growth Hormone STIMULATES Somatomedins (IGF1) | S5
Body Weight (body mass) | Somatomedins (IGF1) STIMULATES Growth; Arginine STIMULATES Growth | S7
Malnutrition | Somatomedins TREATS Malnutrition; Arginine TREATS Malnutrition | S7
Wound Healing (NK activity) | Somatomedins STIMULATES Wound Healing; Arginine STIMULATES Wound Healing |


Scenario 4: Indomethacin – Alzheimer’s Disease

Cut-off date: Jul. 1995. By Swanson/Smalheiser (Article)

Intermediate | Association | Status
Acetylcholine | Indomethacin INHIBITS Acetylcholine; Acetylcholine CAUSES Alzheimers | S4
Lipid Peroxidation | Indomethacin INHIBITS Lipid Peroxidation; Lipid Peroxidation CAUSES Alzheimers | S2
M2-Muscarinic | Indomethacin INHIBITS M2-Muscarinic; M2-Muscarinic CAUSES Alzheimers |
Membrane Fluidity | Indomethacin INHIBITS Membrane Fluidity; Membrane Fluidity CAUSES Alzheimers |
Lymphocytes | Indomethacin STIMULATES Natural Killer T-Cell Activity; T-Cell Activity INHIBITS Alzheimers | S14
Thyrotropin | Indomethacin STIMULATES Thyrotropin; Thyrotropin AFFECTS Alzheimers | ZR-20
T-lymphocytes (T-Cells) | Indomethacin STIMULATES T-lymphocytes; T-lymphocyte Activity INHIBITS Alzheimers | S3


Scenario 5: Estrogen – Alzheimer’s Disease

Cut-off date: Jul. 1995. By Swanson/Smalheiser (Article)

Intermediate | Association | Status
Antioxidant Activity | Estrogen INHIBITS Antioxidant Activity; Antioxidant Activity CAUSES Alzheimers | S4
Apolipoprotein E (ApoE) | Estrogen INHIBITS ApoE; ApoE CAUSES Alzheimers | S3
Calbindin D28k | Estrogen REGULATES Calbindin D28k; Calbindin D28k AFFECTS Alzheimers | S4
Cathepsin D | Estrogen STIMULATES Cathepsin D; Cathepsin D PREVENTS Alzheimers |
Cytochrome C Oxidase Subunit III | Estrogen STIMULATES Cytochrome C Oxidase Subunit III; Cytochrome C Oxidase Subunit III AFFECTS Alzheimers |
Glutamate | Estrogen STIMULATES Glutamate; Glutamate AFFECTS Alzheimers |
Receptor Polymorphism | Estrogen EXHIBITS Receptor Polymorphism; Receptor Polymorphism AFFECTS Alzheimers |


Scenario 6: Calcium Independent PLA2 – Schizophrenia

Cut-off date: 1997. By Swanson/Smalheiser (Article)

Intermediate | Association | Status
Oxidative Stress | Oxidative Stress INHIBITS Calcium-Independent PLA2; Oxidative Stress CAUSES Schizophrenia | ZR-2
Selenium | Selenium INHIBITS Calcium-Independent PLA2; Selenium PREVENTS Schizophrenia | ZR-2
Vitamin E | Vitamin E INHIBITS Calcium-Independent PLA2; Vitamin E PREVENTS Schizophrenia | ZR-2


Scenario 7: Chlorpromazine – Cardiac Hypertrophy

Cut-off date: 01/01/2002. By J. D. Wren (Article)

Intermediate | Association | Status
Calcineurin | Chlorpromazine INHIBITS Calcineurin; Calcineurin CAUSES Cardiac Hypertrophy | S5
Isoproterenol | Chlorpromazine INHIBITS Isoproterenol; Isoproterenol CAUSES Cardiomegaly | S12


Scenario 8: Testosterone – Sleep

Cut-off date: 01/01/2012. By Miller/Rindflesch (Article)

Intermediate | Association | Status
Cortisol/Hydrocortisone | Testosterone INHIBITS Cortisol; Cortisol DISRUPTS Sleep | S7


Scenario 9: Diethylhexyl Phthalate (DEHP) – Sepsis

Cut-off date: 01/01/2013. By Cairelli/Rindflesch (Article)

Intermediate | Association | Status
PParGamma | DEHP STIMULATES PParGamma; PParGamma INHIBITS Sepsis |


44

Statistical Evaluation

Association Rarity Interestingness

45

Statistical Evaluation

Experiment | # Unique Associations | Total MEDLINE Frequency | Rarity r(E) | Interestingness I(E)
Raynaud-Fish Oil | 10 | 0 | 0.00 | 1.00
Magnesium-Migraine | 48 | 27 | 0.56 | 0.64
SomaC-Arginine | 18 | 306 | 17.00 | 0.06
Indomethacin-Alzheimers | 21 | 9 | 0.43 | 0.70
Estrogen-Alzheimers | 42 | 36 | 0.86 | 0.54
PLA2-Schizophrenia | 10 | 0 | 0.00 | 1.00
CPZ-Cardiac Hypertrophy | 21 | 2 | 0.10 | 0.91
Testosterone-Sleep | 61 | 654 | 10.72 | 0.09
Average | 29 | 129 | 3.71 | 0.62

Table 3: Rarity and Interestingness score of the subgraphs in the rediscoveries
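The transcript does not preserve the formulas for r(E) and I(E). The sketch below uses two formulas inferred from the table itself, since they reproduce every row to two decimals: rarity as total MEDLINE frequency divided by the number of unique associations, and interestingness as 1 / (1 + rarity). Treat both as assumptions rather than the dissertation's stated definitions; the function names are hypothetical.

```python
def rarity(unique_associations, total_frequency):
    """Assumed r(E): average MEDLINE frequency per unique association."""
    return total_frequency / unique_associations


def interestingness(r):
    """Assumed I(E): decreases from 1 toward 0 as rarity grows."""
    return 1.0 / (1.0 + r)


# (experiment, # unique associations, total MEDLINE frequency) from Table 3.
ROWS = [
    ("Raynaud-Fish Oil", 10, 0),
    ("Magnesium-Migraine", 48, 27),
    ("SomaC-Arginine", 18, 306),
    ("Indomethacin-Alzheimers", 21, 9),
    ("Estrogen-Alzheimers", 42, 36),
    ("PLA2-Schizophrenia", 10, 0),
    ("CPZ-Cardiac Hypertrophy", 21, 2),
    ("Testosterone-Sleep", 61, 654),
]

for name, unique, frequency in ROWS:
    r = rarity(unique, frequency)
    print(f"{name}: r(E)={r:.2f} I(E)={interestingness(r):.2f}")
```

The average rarity of 3.71 in the last row is the basis for the statement that an arbitrary association is mentioned in approximately 4 MEDLINE articles on average.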


47

Predications-based Knowledge Exploration

Corpus

Predications Graph

Definitional Knowledge (UMLS + MeSH)

Provenance

Knowledge Abstraction

D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature International Bioinformatics and Biomedical Conference (BIBM11). 512–519 , 2011.

Contribution #4: Combining Assertional and Definitional Knowledge for Knowledge Exploration

48

Levels of Contexts

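[Figure: levels of context. Predication Context (A, B, C); Path Context (A, B1, B2, ..., Bi, C); Shared Context (paths from A to C through B1, B2, B3); Subgraph Context (predicates CAUSES, DISRUPTS, PRODUCES, INHIBITS, STIMULATES, ISA, TREATS between A and C); Dimensions (multiple subgraphs between A and C).]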


50

Dissertation Contributions

1. Context-Driven Subgraph Model
– Knowledge Rediscovery & Decomposition

2. Predication/Path Context
– Vector of MeSH Descriptors

3. Shared Context
– Background Knowledge (MeSH Hierarchy)

4. Semantic Predications-based Text Exploration
– Obvio Web Application

51

Innovation

Columns: System/Technique, Technique Type, Automatic, Relational, Evidence-based, Thematic, Results (#Discoveries, #Rediscoveries)

IRIDESCENT [108] Keyword 1 0

ARROWSMITH [84] Keyword/Concept 5 0

DAD [101,102] Concept 0 2

BITOLA [46] Concept 0 1

Litlinker [110] Concept 0 2

Manjal [87,88] Concept × 0 5

SemBT [40,41,42] Relations × × 0 1

BioSbKDS [47] Relations × × 0 1

Wilkowski [107] Graph × × 0 0

Ramakrishnan [72] Graph × × 0 1*

Zhang [114] Graph × × × 0 0

Obvio [19, 21] Graph × × × × 0 8

ARROWSMITH v2 [86,98] Hybrid × 0 6*

Semantic MEDLINE [18,63] Hybrid × × 2 0

Note: References are from the PhD Dissertation manuscript entitled: A Context Driven Subgraph Model for Literature-Based Discovery

Table 4: Comparison of capabilities and accomplishments of LBD techniques


53

Limitations

1. Manual Threshold
– MeSH Semantic Similarity

2. Path Relatedness Threshold
– Only Approximate Gaussian

3. Definition of Context

54

Levels of Semantic Representation

Keywords

Concepts

MeSH Descriptors

Semantic Predications

Ensemble of Features

Relationships

[Figure: a Semantic Predication links two concepts A and B through a PREDICATE.]

55

Limitations

1. Manual Threshold
– MeSH Semantic Similarity

2. Path Relatedness Threshold
– Only Approximate Gaussian

3. Definition of Context

4. MEDLINE Querying
– Deep integration of Assertional/Definitional

5. Contradiction Detection

6. Statistical Evaluation

7. Scalability of Clustering Algorithm

8. Subgraph Labeling

56

Take Away

• Future of Information Processing
– Rich Knowledge Representations
o Implicit, Formal, Powerful semantics
– Application to Literature-Based Discovery

57

Conclusion

• Context-Driven Subgraph Model
– Manually create Complex Associations
– Automatic Subgraph Creation
o Novel definitions for Context and Shared Context
o Multiple Thematic Dimensions
– Predications-based Knowledge Exploration
o Predicates
o Highlighted MEDLINE sentences
– Knowledge Rediscovery
o 8 out of 9 existing scientific discoveries

58

Publications

1. D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Context-Driven Automatic Subgraph Creation for Literature-Based Discovery (under preparation)

2. D. Cameron, A. P. Sheth, N. Jaykumar, G. Anand, K. Thirunarayan, G. A. Smith. A Hybrid Approach to Finding Relevant Social Media Content for Domain Specific Information Needs. (submitted to the Journal of Web Semantics)

3. D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.

4. D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics (JBI13). 46(6): 985–997, 2013.

5. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. “I just wanted to tell you that Loperamide WILL WORK”: A Web-Based Study of Extra-medical use of Loperamide. Journal of Drug and Alcohol Dependence (DAD13) 130(1–3): 241–244, 2013.

6. D. Cameron, V. Bhagwan, A. P. Sheth. Towards Comprehensive Longitudinal Healthcare Data Capture. International Workshop on Semantic Web in Literature-Based Discovery (SWLBD12). 241–247, 2012.

7. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. A Web-Based Study of Extra-medical use of Loperamide. The College on Problems of Drug Dependence (CPDD12), 2012.

8. D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011.

9. D. Cameron, B. Aleman-Meza, I. B. Arpinar, S. L. Decker, A. P. Sheth. A Taxonomy-based Model for Expertise Extrapolation. International Conference on Semantic Computing (ICSC10). 333–240, 2010.

10. D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10). 14, 2010.

11. C. Thomas, W. Wang, P. Mehra, D. Cameron, P. N. Mendes, A. P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. Web Science Conference (WebSci10), 2010.

12. P. N. Mendes, P. Kapanipathi, D. Cameron, A. P. Sheth. Dynamic Associative Relationships on the Linked Data Web. Web Science Conference (WebSci10), 2010.

59

Research Expertise

[Figure: publications [1], [2], [3], [4], [5], [6], [7], [8] and [10] placed across the research areas Literature-Based Discovery, Text Mining, Question Answering and Information Retrieval.]

60

Parting Words

“...some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality,...that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.”

– H. P. Lovecraft (The Call of Cthulhu, The Horror in Clay).

H. P. Lovecraft. The Call of Cthulhu. In S. T. Joshi, editor. The Call of Cthulhu and Other Weird Stories. Penguin Books Ltd., London, 1999

61

Acknowledgements

• Olivier Bodenreider
• Marcelo Fiszman
• Mike Cairelli
• Swapna Abhyankar
• Drashti Dave
• Dongwook Shin

• Special Thanks
o Pavan
o Shreyansh
o Swapnil
o Nishita

• PREDOSE Team
o Nishita
o Gaurish
o Alan
o Revathy

62

Ph.D. Committee Members

Amit P. Sheth (Advisor)

T.K. Prasad Michael Raymer

Ramakanth Kavuluru Thomas C. Rindflesch Varun Bhagwan
