Causal-Association Network Extraction

Posted on 06-Jan-2016


Page 1: Causal-Association Network Extraction

Causal-Association Network Extraction

Brett Bojduj (ボイドイ・ブレツ)

Christmas, Heisei 20 (2008)

Page 2: Causal-Association Network Extraction

Main Contributions

Automatically create a causal-association network from unstructured text data

Method for filtering out non-causal sentences

Method for determining the polarity of a causal relation

Page 3: Causal-Association Network Extraction

Causal-Association Network

Graph of domain terms
◦ Directed
◦ Polarity: positive, negative, or neutral

Purpose is to aid decision-making
◦ Tools such as the “simulation” mode help to promote creative interaction
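A causal-association network can be held in memory as a directed graph whose edges carry a polarity. The Python sketch below shows one possible representation. It is not the system's actual data model. The names `CausalNetwork`, `add_relation`, `effects_of`, and the `Polarity` encoding are hypothetical. The two example relations come from the fuel-price and earthquake examples later in this talk.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Polarity(Enum):
    POSITIVE = 1
    NEGATIVE = -1
    NEUTRAL = 0


@dataclass
class CausalNetwork:
    """Directed graph: edges[cause][effect] holds the polarity of that relation."""
    edges: Dict[str, Dict[str, Polarity]] = field(default_factory=dict)

    def add_relation(self, cause: str, effect: str, polarity: Polarity) -> None:
        # Direction matters: cause -> effect is stored separately from effect -> cause.
        self.edges.setdefault(cause, {})[effect] = polarity

    def effects_of(self, cause: str) -> Dict[str, Polarity]:
        return dict(self.edges.get(cause, {}))

    def terms(self) -> Set[str]:
        nodes = set(self.edges)
        for targets in self.edges.values():
            nodes.update(targets)
        return nodes


net = CausalNetwork()
# Illustrative relations based on examples elsewhere in the talk.
net.add_relation("fuel prices", "inflation", Polarity.POSITIVE)
net.add_relation("earthquakes", "earth", Polarity.NEUTRAL)
print(net.effects_of("fuel prices"))  # {'inflation': <Polarity.POSITIVE: 1>}
```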

Page 4: Causal-Association Network Extraction

Tuple Extraction

Query (Yahoo!) → Extract Sentences → Filter Sentences → Extract Tuples → Score Terms in Tuples → Generate New Queries
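Scored terms feed the next round of queries, so the six stages form a bootstrapping loop. The Python sketch below wires them together under several assumptions:

- The search, filter, and extraction stages are passed in as functions. The names `search`, `is_causal`, and `extract_tuple` are hypothetical.
- No real Yahoo! API is called. The toy corpus and the stand-in functions are invented for illustration.
- `min_score` is an illustrative threshold that decides which scored terms become new query seeds.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # <cause, verb, effect>


def term_score(term: str, frequency: int) -> int:
    # Frequency^2 + NumWords, the scoring rule stated for "Score Terms in Tuples".
    return frequency ** 2 + len(term.split())


def run_pipeline(
    seed_terms: Iterable[str],
    verbs: Iterable[str],
    search: Callable[[str], Iterable[str]],
    is_causal: Callable[[str], bool],
    extract_tuple: Callable[[str], Optional[Triple]],
    rounds: int = 2,
    min_score: int = 2,
) -> Tuple[List[Triple], Dict[str, int]]:
    verbs = list(verbs)
    terms = {t.lower() for t in seed_terms}
    queried: set = set()
    seen: set = set()
    tuples: List[Triple] = []
    freq: Dict[str, int] = {}
    for _ in range(rounds):
        new_terms = sorted(terms - queried)
        queried |= terms
        # Query: "TERM * VERB" and "VERB * TERM" for terms not queried yet.
        queries = [q for t in new_terms for v in verbs for q in (f"{t} * {v}", f"{v} * {t}")]
        for query in queries:
            for sentence in search(query):            # Extract Sentences
                if sentence in seen:
                    continue
                seen.add(sentence)
                if not is_causal(sentence):           # Filter Sentences
                    continue
                triple = extract_tuple(sentence)      # Extract Tuples
                if triple is None:
                    continue
                tuples.append(triple)
                for term in (triple[0], triple[2]):
                    freq[term] = freq.get(term, 0) + 1
        # Score Terms in Tuples, then Generate New Queries from terms that score high enough.
        terms |= {t for t, f in freq.items() if term_score(t, f) >= min_score}
    return tuples, freq


# Toy stand-ins for the web search, the Bayesian filter, and the parser.
CORPUS = [
    "Citigroup causes depression.",
    "Citigroup supports causes.",
    "Depression causes unemployment.",
]


def toy_search(query: str) -> List[str]:
    parts = query.lower().split(" * ")
    return [s for s in CORPUS if all(p in s.lower() for p in parts)]


def toy_is_causal(sentence: str) -> bool:
    return "supports" not in sentence


def toy_extract(sentence: str) -> Optional[Triple]:
    halves = sentence.rstrip(".").lower().split(" causes ")
    return (halves[0], "causes", halves[1]) if len(halves) == 2 else None


tuples, freq = run_pipeline(["Citigroup"], ["causes"], toy_search, toy_is_causal, toy_extract)
print(tuples)  # [('citigroup', 'causes', 'depression'), ('depression', 'causes', 'unemployment')]
```

Starting from the single seed "Citigroup", the first round finds <citigroup, causes, depression>. "depression" then clears the score threshold and seeds the second round, which finds <depression, causes, unemployment>. The sentence "Citigroup supports causes." is filtered out and never reaches extraction.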

Page 5: Causal-Association Network Extraction

Query (Yahoo!)

Term DB, Verb DB

Example query: “Citigroup causes”

Page 6: Causal-Association Network Extraction

Extract Sentences

Query: “Citigroup causes” (built from the Term DB and Verb DB)

Extracted sentences:
◦ “Citigroup causes depression.”
◦ “Citigroup supports causes.”

Page 7: Causal-Association Network Extraction

Filter Sentences

Input:
◦ “Citigroup causes depression.”
◦ “Citigroup supports causes.”

Bayesian Filter

Output:
◦ “Citigroup causes depression.”

Page 8: Causal-Association Network Extraction

Extract Tuples

“Citigroup causes depression.” → Sentence Parser → <citigroup, causes, depression>

Page 9: Causal-Association Network Extraction

Score Terms in Tuples

Term        Frequency  Score
Citigroup   10         101
Depression  3          10
Picture     1          2

Page 10: Causal-Association Network Extraction

Generate New Queries

Scored terms (Citigroup: 101, Depression: 10, Picture: 2) → Term DB

Page 11: Causal-Association Network Extraction

Query (Yahoo!)

Queries generated from terms and verbs
◦ Terms: “bankruptcy,” “oil prices,” “recession”
◦ Causal verbs: troponyms of the verb “cause” from WordNet, with pluralizations (94 verbs)

Query structure:
◦ TERM * VERB
◦ VERB * TERM
◦ e.g., “oil prices * cause”

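The query structure above amounts to a cross product of terms and verbs, taken in both orders. Below is a minimal Python sketch. It assumes queries are plain strings and that `*` is the search engine's wildcard. `build_queries` is a hypothetical helper. The two verbs stand in for the 94-verb list, which is not reproduced here.

```python
from itertools import product
from typing import Iterable, List


def build_queries(terms: Iterable[str], verbs: Iterable[str]) -> List[str]:
    """Return 'TERM * VERB' and 'VERB * TERM' wildcard queries for every pair."""
    queries = []
    for term, verb in product(terms, verbs):
        queries.append(f"{term} * {verb}")
        queries.append(f"{verb} * {term}")
    return queries


terms = ["bankruptcy", "oil prices", "recession"]
verbs = ["cause", "causes"]  # illustrative subset of the 94-verb list
queries = build_queries(terms, verbs)
print(len(queries))  # 12 (3 terms x 2 verbs x 2 orders)
print(queries[:2])   # ['bankruptcy * cause', 'cause * bankruptcy']
```

With the three example terms and all 94 verbs, the same function would produce 3 × 94 × 2 = 564 queries.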

Page 12: Causal-Association Network Extraction

Query (Yahoo!) – Verb List


Page 13: Causal-Association Network Extraction

Extract Sentences

E.g., “Citigroup causes a global economic crisis.”

Sentence has a TERM + a VERB → save sentence to DB

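The TERM + VERB check can be sketched in Python as shown below. This is an illustration under stated assumptions:

- Sentence splitting uses a naive regular expression rather than a real tokenizer.
- Matching is case-insensitive and on whole words.
- `extract_sentences` and the sample text are hypothetical.

```python
import re
from typing import Iterable, List


def split_sentences(text: str) -> List[str]:
    # Naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def contains_phrase(sentence: str, phrase: str) -> bool:
    # Case-insensitive whole-word match, so "cause" does not match inside "because".
    return re.search(rf"\b{re.escape(phrase)}\b", sentence, re.IGNORECASE) is not None


def extract_sentences(text: str, terms: Iterable[str], verbs: Iterable[str]) -> List[str]:
    """Keep only sentences that contain at least one TERM and at least one causal VERB."""
    terms, verbs = list(terms), list(verbs)
    return [s for s in split_sentences(text)
            if any(contains_phrase(s, t) for t in terms)
            and any(contains_phrase(s, v) for v in verbs)]


snippet = ("Citigroup causes a global economic crisis. "
           "Citigroup opened a new branch. Oil prices rose because of demand.")
print(extract_sentences(snippet, ["citigroup", "oil prices"], ["cause", "causes"]))
# ['Citigroup causes a global economic crisis.']
```

The second sentence has a term but no causal verb. The third is rejected because whole-word matching keeps "because" from counting as "cause".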

Page 14: Causal-Association Network Extraction

Filter Sentences

Many errors in cause-and-effect extraction stem from trying to extract from sentences that do not contain a causal relation
◦ E.g., “Scientists predict that effects of global warming will take many decades.”

Our remedy:
◦ Bayesian classifier
◦ Binary classification: causal or not
◦ Only process causal sentences
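The talk does not say which Bayesian classifier variant is used. As an assumption, the Python sketch below implements a multinomial naive Bayes classifier with add-one smoothing for the binary causal/non-causal decision. The four training sentences are toy examples, not the annotated data behind the reported results.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over token lists."""

    def fit(self, docs: Sequence[List[str]], labels: Sequence[str]) -> "NaiveBayes":
        labels = list(labels)
        self.classes = sorted(set(labels))
        # Log prior: fraction of training sentences carrying each label.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, tokens: List[str], c: str) -> float:
        # Unnormalized log P(c | tokens); tokens never seen in training are skipped.
        denom = self.totals[c] + len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.counts[c][t] + 1) / denom) for t in tokens if t in self.vocab
        )

    def predict(self, tokens: List[str]) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))


# Toy, hand-labelled examples (not the talk's annotated data).
train = [
    ("citigroup causes depression", "causal"),
    ("fuel price hike leads to inflation", "causal"),
    ("scientists predict effects of global warming will take decades", "non-causal"),
    ("citigroup supports causes", "non-causal"),
]
clf = NaiveBayes().fit([s.split() for s, _ in train], [label for _, label in train])
print(clf.predict("rising oil prices lead to inflation".split()))  # causal
print(clf.predict("scientists predict a recession".split()))       # non-causal
```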

Page 15: Causal-Association Network Extraction

Bayesian Causal Classifier

Features:
1. Bag of words without common words
2. Decreasing words marked with a “_dec” tag; the original word is also kept
3. Causal verbs marked with a “_verb” tag; the original word is also kept
4. Verb patterns plus the phrase “verbPatt”
5. Word patterns plus the phrase “wordPatt”

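The five feature types can be sketched as a function that turns a sentence into a list of feature tokens for the classifier. The word lists and regular-expression patterns below are hypothetical placeholders, because the actual lists are not given. Emitting a single `verbPatt` or `wordPatt` marker token when a pattern matches is one reading of features 4 and 5, not a confirmed detail.

```python
import re
from typing import List

# Hypothetical word lists and patterns; the talk does not publish its actual lists.
COMMON_WORDS = {"a", "an", "the", "of", "to", "in", "that", "will", "and"}
DECREASING_WORDS = {"decrease", "decreases", "drop", "falling", "lower", "reduce", "reduces"}
CAUSAL_VERBS = {"cause", "causes", "determines", "leads", "makes"}
VERB_PATTERNS = [re.compile(r"\bleads? to\b")]
WORD_PATTERNS = [re.compile(r"\bdue to\b"), re.compile(r"\bresult(?:s|ed)? in\b")]


def causal_features(sentence: str) -> List[str]:
    text = sentence.lower()
    features = []
    for tok in re.findall(r"[a-z']+", text):
        if tok in COMMON_WORDS:
            continue                          # 1. bag of words without common words
        features.append(tok)                  # original word is always kept
        if tok in DECREASING_WORDS:
            features.append(tok + "_dec")     # 2. tagged copy of a decreasing word
        if tok in CAUSAL_VERBS:
            features.append(tok + "_verb")    # 3. tagged copy of a causal verb
    if any(p.search(text) for p in VERB_PATTERNS):
        features.append("verbPatt")           # 4. marker for a matched verb pattern
    if any(p.search(text) for p in WORD_PATTERNS):
        features.append("wordPatt")           # 5. marker for a matched word pattern
    return features


print(causal_features("Any fuel price hike leads to consumer inflation."))
# ['any', 'fuel', 'price', 'hike', 'leads', 'leads_verb', 'consumer', 'inflation', 'verbPatt']
```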

Page 16: Causal-Association Network Extraction

Bayesian Causal Classifier Results

Precision: 71.7%
Recall: 94.3%
F-Score: 81.5%

Results from 15-fold cross-validation on 1,500 annotated sentences

Possible features: CIDVSPW


Results by feature set (%; Yes = causal, No = non-causal):

Features  PrecisionYes  RecallYes  F-ScoreYes  PrecisionNo  RecallNo  F-ScoreNo
Baseline  65.38         59.13      62.10       82.66        86.15     84.37
C         62.58         63.26      62.92       83.67        83.27     83.47
CD        62.42         63.91      63.16       83.87        82.98     83.42
CDP       69.18         87.83      77.39       93.89        82.69     87.93
CDPW      70.41         94.13      80.56       96.95        82.50     89.14
CDS       61.62         66.30      63.87       84.58        81.73     83.13
CDSP      67.72         88.91      76.88       94.31        81.25     87.29
CDSPW     68.78         94.35      79.56       97.01        81.06     88.32
CDSW      68.24         91.09      78.03       95.37        81.25     87.75
CDV       63.56         63.70      63.63       83.93        83.85     83.89
CDVP      70.38         87.83      78.14       93.95        83.65     88.50
CDVPW     71.74         94.35      81.50       97.09        83.56     89.82
CDVS      62.45         65.43      63.91       84.38        82.60     83.48
CDVSP     68.76         88.04      77.22       93.96        82.31     87.75
CDVSPW    70.02         93.91      80.22       96.83        82.21     88.92
CDVSW     69.23         90.00      78.26       94.90        82.31     88.16
CDVW      70.89         90.00      79.31       94.98        83.65     88.96
CDW       69.85         90.65      78.90       95.24        82.69     88.52
CI        62.55         65.00      63.75       84.25        82.79     83.51
CID       62.32         65.43      63.84       84.37        82.50     83.42
CIDP      68.64         88.04      77.14       93.96        82.21     87.69
CIDPW     69.89         94.35      80.30       97.04        82.02     88.90
CIDS      61.31         67.17      64.11       84.84        81.25     83.01
CIDSP     67.21         89.13      76.64       94.38        80.77     87.05
CIDSPW    68.24         94.35      79.20       96.99        80.58     88.03
CIDSW     67.74         91.30      77.78       95.45        80.77     87.50
CIDV      63.48         65.00      64.23       84.35        83.46     83.91
CIDVP     69.95         88.04      77.96       94.03        83.27     88.32
CIDVPW    71.31         94.57      81.31       97.19        83.17     89.64
CIDVS     62.45         66.52      64.42       84.75        82.31     83.51
CIDVSP    68.52         88.48      77.23       94.15        82.02     87.67
CIDVSPW   69.73         94.13      80.11       96.93        81.92     88.80
CIDVSW    68.99         90.43      78.27       95.09        82.02     88.07
CIDVW     70.51         90.43      79.24       95.16        83.27     88.82
CIDW      69.37         91.09      78.76       95.42        82.21     88.33
CIP       68.76         88.04      77.22       93.96        82.31     87.75
CIPW      70.11         94.35      80.44       97.05        82.21     89.02
CIS       61.48         66.96      64.10       84.78        81.44     83.08
CISP      67.32         89.13      76.71       94.39        80.87     87.11
CISPW     68.35         94.35      79.27       96.99        80.67     88.08
CISW      67.86         90.87      77.70       95.25        80.96     87.53
CIV       63.23         63.91      63.57       83.96        83.56     83.76
CIVP      70.02         87.83      77.92       93.93        83.37     88.33
CIVPW     71.26         94.35      81.20       97.08        83.17     89.59
CIVS      62.53         65.65      64.05       84.46        82.60     83.52
CIVSP     68.76         88.04      77.22       93.96        82.31     87.75
CIVSPW    70.02         93.91      80.22       96.83        82.21     88.92
CIVSW     69.23         90.00      78.26       94.90        82.31     88.16
CIVW      70.48         89.78      78.97       94.86        83.37     88.74
CIW       69.45         90.43      78.56       95.12        82.40     88.30
CP        69.24         87.61      77.35       93.79        82.79     87.95
CPW       70.64         94.13      80.71       96.96        82.69     89.26
CS        61.59         65.87      63.66       84.42        81.83     83.11
CSP       67.66         88.70      76.76       94.20        81.25     87.25
CSPW      68.73         94.13      79.45       96.90        81.06     88.27
CSW       68.20         90.43      77.76       95.06        81.35     87.67
CV        63.46         63.04      63.25       83.70        83.94     83.82
CVP       70.45         87.61      78.10       93.86        83.75     88.52
CVPW      71.74         94.35      81.50       97.09        83.56     89.82
CVS       62.89         65.22      64.03       84.36        82.98     83.66
CVSP      69.18         87.83      77.39       93.89        82.69     87.93
CVSPW     70.47         93.91      80.52       96.84        82.60     89.15
CVSW      69.70         90.00      78.56       94.92        82.69     88.39
CVW       70.96         89.78      79.27       94.88        83.75     88.97
CW        69.93         90.00      78.71       94.93        82.88     88.50
D         65.09         60.00      62.44       82.90        85.77     84.31
DP        71.70         81.52      76.30       91.30        85.77     88.45
DPW       73.09         87.39      79.60       93.89        85.77     89.65
DS        64.51         62.83      63.66       83.75        84.71     84.23
DSP       70.45         82.39      75.95       91.58        84.71     88.01
DSPW      71.61         87.17      78.63       93.72        84.71     88.99
DSW       70.66         83.26      76.45       91.96        84.71     88.19
DV        65.96         61.09      63.43       83.33        86.06     84.67
DVP       72.12         81.52      76.53       91.33        86.06     88.61
DVPW      73.35         86.74      79.48       93.62        86.06     89.68
DVS       64.41         62.17      63.27       83.52        84.81     84.16
DVSP      70.24         81.09      75.28       91.02        84.81     87.80
DVSPW     71.48         86.09      78.11       93.23        84.81     88.82
DVSW      70.47         81.96      75.78       91.40        84.81     87.98
DVW       72.33         82.39      77.03       91.70        86.06     88.79
DW        72.02         82.83      77.05       91.86        85.77     88.71
I         65.21         61.52      63.31       83.40        85.48     84.43
ID        64.77         61.96      63.33       83.49        85.10     84.29
IDP       70.86         81.96      76.01       91.43        85.10     88.15
IDPW      72.22         87.61      79.17       93.95        85.10     89.30
IDS       64.33         63.91      64.12       84.08        84.33     84.21
IDSP      69.98         82.61      75.77       91.64        84.33     87.83
IDSPW     71.10         87.17      78.32       93.70        84.33     88.77
IDSW      70.31         83.91      76.51       92.22        84.33     88.10
IDV       66.21         62.61      64.36       83.85        85.87     84.85
IDVP      71.89         81.74      76.50       91.40        85.87     88.55
IDVPW     73.08         86.74      79.32       93.61        85.87     89.57
IDVS      64.32         63.48      63.89       83.94        84.42     84.18
IDVSP     69.89         81.74      75.35       91.27        84.42     87.71
IDVSPW    70.97         86.09      77.80       93.21        84.42     88.60
IDVSW     70.06         82.39      75.72       91.55        84.42     87.84
IDVW      72.16         82.83      77.13       91.87        85.87     88.77
IDW       71.30         83.70      77.00       92.19        85.10     88.50
IP        71.40         81.96      76.32       91.46        85.48     88.37
IPW       72.60         86.96      79.13       93.68        85.48     89.39
IS        64.60         63.48      64.04       83.97        84.62     84.29
ISP       70.37         82.61      76.00       91.67        84.62     88.00
ISPW      71.38         86.74      78.31       93.52        84.62     88.84
ISW       70.48         83.04      76.25       91.86        84.62     88.09
IV        66.43         61.96      64.12       83.66        86.15     84.89
IVP       72.20         81.30      76.48       91.24        86.15     88.63
IVPW      73.33         86.09      79.20       93.33        86.15     89.60
IVS       64.65         62.83      63.73       83.76        84.81     84.28
IVSP      70.36         81.52      75.53       91.21        84.81     87.89
IVSPW     71.43         85.87      77.99       93.14        84.81     88.78
IVSW      70.41         81.74      75.65       91.30        84.81     87.94
IVW       72.31         81.74      76.73       91.43        86.15     88.71
IW        71.56         82.61      76.69       91.74        85.48     88.50
P         72.20         81.30      76.48       91.24        86.15     88.63
PW        73.48         86.74      79.56       93.63        86.15     89.73
S         64.71         62.17      63.41       83.55        85.00     84.27
SP        70.79         82.17      76.06       91.51        85.00     88.14
SPW       71.84         86.52      78.50       93.45        85.00     89.02
SW        70.84         82.39      76.18       91.61        85.00     88.18
V         65.95         59.78      62.71       82.92        86.35     84.60
VP        72.21         80.22      76.00       90.80        86.35     88.52
VPW       73.56         85.87      79.24       93.25        86.35     89.67
VS        64.68         61.30      62.95       83.27        85.19     84.22
VSP       70.61         80.43      75.20       90.78        85.19     87.90
VSPW      71.90         85.65      78.17       93.07        85.19     88.96
VSW       70.83         81.30      75.71       91.15        85.19     88.07
VW        72.48         81.30      76.64       91.26        86.35     88.74
W         72.36         81.96      76.86       91.52        86.15     88.76

Top five feature sets by F-ScoreYes:

Features  PrecisionYes  RecallYes  F-ScoreYes
CDVPW     71.735        94.347     81.502
CVPW      71.735        94.347     81.502
CIDVPW    71.311        94.565     81.308
CIVPW     71.264        94.347     81.197
CPW       70.636        94.130     80.708
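The reported F-scores are consistent with the standard F1 measure, the harmonic mean of precision and recall: F = 2PR / (P + R). Below is a short Python check against the unrounded CDVPW and Baseline values for the causal ("Yes") class. `f_score` is a hypothetical helper name.

```python
def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (inputs and output in the same units, e.g. %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded precision/recall for the causal ("Yes") class from the feature-set results.
print(round(f_score(71.73553719, 94.34782609), 2))  # CDVPW -> 81.5
print(round(f_score(65.38461538, 59.13043478), 2))  # Baseline -> 62.1
```

Because F1 is a harmonic mean, the high recall of CDVPW (94.3%) cannot compensate fully for its lower precision (71.7%). The resulting 81.5% sits closer to the smaller of the two.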

Page 17: Causal-Association Network Extraction

Extract Tuples

Input = Sentence → Stanford Parser → Search Parse Tree → Extract

Parse Tree: [figure: example tree with ROOT, ADJ, NP, VP, and NP nodes]

Page 18: Causal-Association Network Extraction

Tuple Format

Tuple: <Cause, Verb, Effect>
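The parse-tree search can be sketched in Python, with several assumptions:

- The Stanford Parser itself is not called. The tree is written by hand in the Penn Treebank bracket style that such a parser produces, and it is encoded as nested tuples.
- The matching rule is an assumed stand-in for the unspecified search step. It looks for an S node with an NP followed by a VP, where the VP starts with a verb and contains an object NP.
- `find_tuples` and `leaves` are hypothetical helpers.

```python
from typing import Iterator, Tuple, Union

# A parse-tree node is (label, child, child, ...); a leaf child is a plain word string.
Node = Union[str, tuple]


def leaves(node: Node) -> list:
    """Words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def find_tuples(node: Node) -> Iterator[Tuple[str, str, str]]:
    """Yield <cause, verb, effect> for each S node whose children include NP followed by VP,
    where the VP starts with a verb (VB* tag) and contains an object NP."""
    if isinstance(node, str):
        return
    label, children = node[0], node[1:]
    if label == "S":
        for left, right in zip(children, children[1:]):
            if isinstance(left, str) or isinstance(right, str):
                continue
            if left[0] == "NP" and right[0] == "VP":
                verb = right[1]
                objects = [c for c in right[2:] if not isinstance(c, str) and c[0] == "NP"]
                if not isinstance(verb, str) and verb[0].startswith("VB") and objects:
                    yield (" ".join(leaves(left)).lower(),
                           leaves(verb)[0].lower(),
                           " ".join(leaves(objects[0])).lower())
    for child in children:
        yield from find_tuples(child)


# Hand-written tree for "Citigroup causes depression." in Penn Treebank style.
tree = ("ROOT",
        ("S",
         ("NP", ("NNP", "Citigroup")),
         ("VP", ("VBZ", "causes"), ("NP", ("NN", "depression"))),
         (".", ".")))
print(list(find_tuples(tree)))  # [('citigroup', 'causes', 'depression')]
```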

Page 19: Causal-Association Network Extraction

Extraction Strategy

Strategy:
◦ Find terms in the cause or effect
◦ Build n-grams where no term is found

Page 20: Causal-Association Network Extraction

Extraction Strategy

Sentence: “Citigroup Corp. determines common stock price.”

Cause:
◦ citigroup

Effects:
◦ common
◦ common stock
◦ common stock price
◦ stock
◦ stock price
◦ price

→ Term DB
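The effect candidates above are every contiguous n-gram of "common stock price". The small Python sketch below generates them in the same order, first by start position and then by length. `contiguous_ngrams` is a hypothetical helper.

```python
from typing import List


def contiguous_ngrams(phrase: str) -> List[str]:
    """All contiguous word n-grams, ordered by start position and then by length."""
    words = phrase.split()
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]


print(contiguous_ngrams("common stock price"))
# ['common', 'common stock', 'common stock price', 'stock', 'stock price', 'price']
```

A phrase of n words yields n(n + 1)/2 candidates, so the number of stored n-grams grows quickly with phrase length.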

Page 21: Causal-Association Network Extraction

Extraction Strategy (cont.)

Sentence: “Citigroup Corp. determines common stock price.”

Cause:
◦ citigroup

Effect:
◦ stock price

Term DB

Page 22: Causal-Association Network Extraction

Score Terms in Tuples

Storing n-grams also stores many bad tuples in addition to the correct ones

Our remedy: score each record
◦ Score = Frequency² + NumWords

This works fairly well. Future work could consider TF-IDF and/or term clustering

Example scores:
◦ Inflation = 21,825
◦ War = 18,734
◦ Central banks = 120
◦ Framing = 3

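The scoring rule, Score = Frequency² + NumWords, can be sketched in Python. Here NumWords is taken to be the number of whitespace-separated words in the term, and `score_terms` is a hypothetical helper. The input frequencies (10, 3, and 1) reproduce the Citigroup/Depression/Picture scores from the earlier scoring table.

```python
from collections import Counter
from typing import Dict, Iterable


def score_terms(extracted_terms: Iterable[str]) -> Dict[str, int]:
    """Score each distinct term as Frequency**2 + NumWords."""
    freq = Counter(extracted_terms)
    return {term: f ** 2 + len(term.split()) for term, f in freq.items()}


observed = ["citigroup"] * 10 + ["depression"] * 3 + ["picture"]
print(score_terms(observed))  # {'citigroup': 101, 'depression': 10, 'picture': 2}
```

Squaring the frequency lets frequently extracted terms dominate. The word count adds a small bonus for multi-word terms such as "stock price".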

Page 23: Causal-Association Network Extraction

Cause and Effect Extraction Results

Accuracy: 86%
◦ Random sample: 50
◦ Correct: 43

Evaluation on a larger sample is in progress

Page 24: Causal-Association Network Extraction

Network Generation

POLARITY CLASSIFICATION → COMPUTE CO-OCCURRENCE OF TUPLES → CAUSAL NETWORK VISUALIZATION

Page 25: Causal-Association Network Extraction

Polarity Classification

Bayesian classifier determines polarity
◦ Classify polarity as increasing or decreasing
◦ “Any fuel price hike leads to consumer inflation.”
◦ “Rising food prices makes inflation control difficult.”

Neutral polarity:
◦ “Earthquakes affect the earth”

Only 12 of 500 sentences were neutral


Page 26: Causal-Association Network Extraction

Bayesian Polarity Classifier

Features:
1. Bag of words
2. Decreasing words marked with a “_dec” tag; the original word is also kept
3. Causal verbs marked with a “_verb” tag; the original word is also kept
4. Words plus stems of words are added
◦ Porter stemming algorithm


Page 27: Causal-Association Network Extraction

Bayesian Polarity Classifier Results

Precision: 72.1%
Recall: 80.2%
F-Score: 75.97%

Results from 5-fold cross-validation on 500 annotated sentences

Possible features: CIDVSPW


Features  PrecisionP  RecallP  F-ScoreP
IDVS      72.131      80.243   75.971
IDVSP     72.252      79.939   75.901
IDVSPW    72.252      79.939   75.901

Page 28: Causal-Association Network Extraction

Compute Co-Occurrence of Tuples

Determine the net effect of one term on another based on frequency
◦ Strength of connection is based on net co-occurrence frequency


Page 29: Causal-Association Network Extraction

Compute Co-Occurrence of Tuples

Example:
◦ <fuel prices, causes, inflation> Polarity = p
◦ <fuel prices, makes, inflation> Polarity = p
◦ <fuel prices, determines, inflation> Polarity = n
◦ <inflation, has, fuel prices> Polarity = n

Net strength = 3 – 1 = +2 for fuel prices causing inflation

Net polarity = 2 – 1 = +1 (positive)

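The arithmetic of this example can be sketched in Python. The function follows the slide's counting:

- Net strength is the number of a→b tuples minus the number of b→a tuples.
- Net polarity adds +1 for each positive a→b tuple and −1 for each negative one.

Two details are assumptions: the name `net_effect`, and treating any other polarity label as 0.

```python
from typing import List, Tuple

Record = Tuple[str, str, str, str]  # (cause, verb, effect, polarity), polarity "p" or "n"
SIGN = {"p": 1, "n": -1}           # any other label (e.g. neutral) counts as 0


def net_effect(records: List[Record], a: str, b: str) -> Tuple[int, int]:
    forward = [r for r in records if r[0] == a and r[2] == b]
    backward = [r for r in records if r[0] == b and r[2] == a]
    strength = len(forward) - len(backward)               # a->b tuples minus b->a tuples
    polarity = sum(SIGN.get(r[3], 0) for r in forward)     # positives minus negatives, a->b only
    return strength, polarity


records = [
    ("fuel prices", "causes", "inflation", "p"),
    ("fuel prices", "makes", "inflation", "p"),
    ("fuel prices", "determines", "inflation", "n"),
    ("inflation", "has", "fuel prices", "n"),
]
print(net_effect(records, "fuel prices", "inflation"))  # (2, 1)
```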

Page 30: Causal-Association Network Extraction

Visualization Demo

Page 31: Causal-Association Network Extraction

Conclusions

Our method:
◦ Automated extraction of causal-association networks
◦ Directed graph of causal relations with polarity

New contributions:
◦ Filtering out non-causal sentences
◦ Extraction of new terms based on tuple co-occurrence
◦ Polarity information

Page 32: Causal-Association Network Extraction

Future Work

Make the system run in real-time by converting it to a multi-agent system where different agents perform different tasks

Improve classification and extraction results

Use the causal-association network to create a computational model of a complex adaptive system

Draw on theoretical work on causality to extract implicit relations

Page 33: Causal-Association Network Extraction

Special Thanks

Shimada-san (島田さん), Kawai-san (河合さん)

Suzuki-san (鈴木さん), Dr. Yamada (山田博士)

Everyone (皆さん)

Page 34: Causal-Association Network Extraction

Causal relationships? (因果関係?)

Page 35: Causal-Association Network Extraction

Extracurricular Activities

Page 36: Causal-Association Network Extraction

Extracurricular Activities