Causal-Association Network Extraction
Brett Bojduj
Christmas, Heisei 20 (2008)
Main Contributions
Automatically create Causal-Association Network from unstructured text data
Method for filtering out non-causal sentences
Method for determining polarity of causal relation
Causal-Association Network
Graph of domain terms
◦ Directed
◦ Polarity: positive, negative, or neutral
Purpose is to aid decision-making
◦ Tools such as the “simulation” mode help to promote creative interaction
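As a rough illustration of the structure described above, here is a minimal Python sketch of a directed graph whose edges carry a polarity. The class and method names (`CausalEdge`, `CausalNetwork`, `add_edge`, `effects_of`) are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass

POLARITIES = ("positive", "negative", "neutral")

@dataclass(frozen=True)
class CausalEdge:
    """One directed edge: cause -> effect, labeled with a polarity."""
    cause: str
    effect: str
    polarity: str

class CausalNetwork:
    """Hypothetical container: adjacency lists keyed by the cause term."""
    def __init__(self):
        self.edges = {}  # cause term -> list of CausalEdge

    def add_edge(self, cause, effect, polarity):
        if polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {polarity}")
        self.edges.setdefault(cause, []).append(CausalEdge(cause, effect, polarity))

    def effects_of(self, cause):
        # Direction matters: only outgoing edges of `cause` are returned.
        return [edge.effect for edge in self.edges.get(cause, [])]

net = CausalNetwork()
net.add_edge("fuel prices", "inflation", "positive")
print(net.effects_of("fuel prices"))
```

Because the graph is directed, asking for the effects of "inflation" in this example returns an empty list.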
Tuple Extraction
Pipeline stages, in order:
◦ Query (Yahoo!)
◦ Extract Sentences
◦ Filter Sentences
◦ Extract Tuples
◦ Score Terms in Tuples
◦ Generate New Queries
Query (Yahoo!)
Example query built from the Term DB and Verb DB: “Citigroup causes”
Extract Sentences
Query: “Citigroup causes”
Sentences checked against the Term DB and Verb DB:
◦ “Citigroup causes depression.”
◦ “Citigroup supports causes.”
Filter Sentences
Input sentences:
◦ “Citigroup causes depression.”
◦ “Citigroup supports causes.”
Bayesian Filter output:
◦ “Citigroup causes depression.”
Extract Tuples
Input: “Citigroup causes depression.”
Sentence Parser output: <citigroup, causes, depression>
Score Terms in Tuples
Term        Frequency  Score
Citigroup   10         101
Depression  3          10
Picture     1          2
Generate New Queries
Scored terms (Citigroup, Depression, Picture) go to the Term DB, which supplies terms for new queries.
Query (Yahoo!)
Queries are generated from terms and verbs
◦ Terms: “bankruptcy,” “oil prices,” “recession”
◦ Causal verbs: troponyms of the verb “cause” from WordNet, with pluralizations (94 verbs)
Query structure:
◦ TERM * VERB
◦ VERB * TERM
◦ e.g. “oil prices * cause”
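The query structure above can be sketched in a few lines of Python. This is only an illustration: the function name `build_queries` is hypothetical, the two-verb list stands in for the 94 WordNet-derived verbs, and the call to the search engine is left out.

```python
from itertools import product

def build_queries(terms, verbs):
    """Return wildcard queries in both orders for every term/verb pair."""
    queries = []
    for term, verb in product(terms, verbs):
        queries.append(f"{term} * {verb}")  # TERM * VERB
        queries.append(f"{verb} * {term}")  # VERB * TERM
    return queries

terms = ["bankruptcy", "oil prices", "recession"]
verbs = ["cause", "causes"]  # stand-in for the full causal verb list
queries = build_queries(terms, verbs)
print(len(queries), "queries, e.g.", queries[:2])
```

With 3 terms and 2 verbs this produces 12 query strings, including “oil prices * cause” from the slide.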
Query (Yahoo!) – Verb List
Extract Sentences
E.g.: “Citigroup causes a global economic crisis.”
If a sentence contains TERM + VERB, save the sentence to the DB
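A minimal sketch of this check in Python, assuming simple whole-word matching and a naive sentence splitter. The helper names are hypothetical, and the "DB" is reduced to a returned list.

```python
import re

def has_term_and_verb(sentence, terms, verbs):
    """True if the sentence contains at least one known term and one causal verb."""
    text = sentence.lower()

    def contains(phrase):
        # Whole-word (or whole-phrase) match, case-insensitive.
        return re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text) is not None

    return any(contains(t) for t in terms) and any(contains(v) for v in verbs)

def extract_sentences(page_text, terms, verbs):
    # Naive split after sentence-ending punctuation; a real system would need more care.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    return [s for s in sentences if has_term_and_verb(s, terms, verbs)]

page = "Citigroup causes a global economic crisis. The weather was mild."
saved = extract_sentences(page, ["citigroup"], ["causes"])
print(saved)
```

Only the first sentence is kept, because the second contains neither a term nor a causal verb.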
Filter Sentences
Many errors in cause-and-effect extraction come from trying to extract from sentences that do not contain a causal relation.
◦ E.g. “Scientists predict that effects of global warming will take many decades.”
Our remedy:
◦ Bayesian classifier
◦ Binary classification: causal or not
◦ Only causal sentences are processed further
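The slides do not say which Bayesian classifier was used. As one possible reading, the sketch below implements a small multinomial naive Bayes with add-one smoothing over word tokens. The class name, the labels, and the four toy training sentences are assumptions made for illustration only.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over bag-of-words tokens."""
    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens seen with that label
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def train(self, tokens, label):
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)       # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
# Toy training data, invented for this sketch.
nb.train("citigroup causes depression".split(), "causal")
nb.train("fuel prices cause inflation".split(), "causal")
nb.train("citigroup supports causes".split(), "not_causal")
nb.train("scientists predict effects will take decades".split(), "not_causal")

label = nb.classify("oil prices cause recession".split())
print(label)
```

Sentences labeled "not_causal" would simply be dropped before tuple extraction.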
Bayesian Causal Classifier
Features:
1. Bag of words without common words
2. Decreasing words marked with “_dec” tag; the original word is also kept
3. Causal verbs marked with “_verb” tag; the original word is also kept
4. Verb patterns plus the phrase “verbPatt”
5. Word patterns plus the phrase “wordPatt”
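Features 1 to 3 can be sketched as follows. The three word lists are small hypothetical stand-ins, and appending the tag directly to the word (for example `causes_verb`) is an assumption about how the tags were applied. The pattern features 4 and 5 are not specified in the slides and are omitted.

```python
# Hypothetical word lists, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "to", "that", "in", "and"}
DECREASING_WORDS = {"decrease", "decreases", "reduce", "reduces", "lower", "lowers"}
CAUSAL_VERBS = {"cause", "causes", "determine", "determines", "make", "makes"}

def extract_features(sentence):
    """Bag-of-words features with extra tagged copies for special words."""
    features = []
    for raw in sentence.lower().split():
        word = raw.strip(".,;:!?\"'")
        if not word or word in STOP_WORDS:
            continue                             # feature 1: drop common words
        features.append(word)                    # the original word is always kept
        if word in DECREASING_WORDS:
            features.append(word + "_dec")       # feature 2: tagged copy
        if word in CAUSAL_VERBS:
            features.append(word + "_verb")      # feature 3: tagged copy
    return features

feats = extract_features("Citigroup causes a global economic crisis.")
print(feats)
```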
Bayesian Causal Classifier Results
Precision: 71.7%
Recall: 94.3%
F-Score: 81.5%
Results from 15-fold cross-validation on 1,500 annotated sentences
Possible features: CIDVSPW
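The reported F-score is consistent with the usual harmonic mean of precision and recall. The short check below uses the unrounded precision and recall listed for the CDVPW feature set in the results table. The function name is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)

f = f_score(71.73553719, 94.34782609)
print(round(f, 1))
```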
Features  PrecisionYes RecallYes    F-ScoreYes   PrecisionNo  RecallNo     F-ScoreNo
Baseline  65.38461538  59.13043478  62.10045662  82.65682657  86.15384615  84.36911488
C         62.58064516  63.26086957  62.91891892  83.67149758  83.26923077  83.46987952
CD        62.42038217  63.91304348  63.15789474  83.86783285  82.98076923  83.42194297
CDP       69.17808219  87.82608696  77.39463602  93.88646288  82.69230769  87.93456033
CDPW      70.40650407  94.13043478  80.55813953  96.94915254  82.5         89.14285714
CDS       61.61616162  66.30434783  63.87434555  84.57711443  81.73076923  83.12958435
CDSP      67.71523179  88.91304348  76.87969925  94.30803571  81.25        87.29338843
CDSPW     68.77971474  94.34782609  79.56003666  97.00805524  81.05769231  88.31849136
CDSW      68.24104235  91.08695652  78.02607076  95.3724605   81.25        87.74662513
CDV       63.55748373  63.69565217  63.62649294  83.92685274  83.84615385  83.88648389
CDVP      70.38327526  87.82608696  78.14313346  93.9524838   83.65384615  88.50457782
CDVPW     71.73553719  94.34782609  81.50234742  97.09497207  83.55769231  89.81912145
CDVS      62.44813278  65.43478261  63.90658174  84.38113949  82.59615385  83.47910593
CDVSP     68.76061121  88.04347826  77.21639657  93.96267838  82.30769231  87.74987186
CDVSPW    70.01620746  93.91304348  80.22284123  96.82899207  82.21153846  88.92355694
CDVSW     69.23076923  90           78.26086957  94.90022173  82.30769231  88.15653965
CDVW      70.89041096  90           79.31034483  94.97816594  83.65384615  88.95705521
CDW       69.84924623  90.65217391  78.9025544   95.23809524  82.69230769  88.52290273
CI        62.55230126  65           63.75266525  84.24657534  82.78846154  83.51115422
CID       62.31884058  65.43478261  63.8388123   84.36578171  82.5         83.42245989
CIDP      68.6440678   88.04347826  77.14285714  93.95604396  82.21153846  87.69230769
CIDPW     69.88727858  94.34782609  80.2960222   97.04209329  82.01923077  88.90046899
CIDS      61.30952381  67.17391304  64.10788382  84.83935743  81.25        83.00589391
CIDSP     67.21311475  89.13043478  76.63551402  94.38202247  80.76923077  87.04663212
CIDSPW    68.23899371  94.34782609  79.19708029  96.99074074  80.57692308  88.02521008
CIDSW     67.74193548  91.30434783  77.77777778  95.45454545  80.76923077  87.5
CIDV      63.48195329  65           64.23200859  84.3537415   83.46153846  83.90526825
CIDVP     69.94818653  88.04347826  77.95957652  94.02823018  83.26923077  88.32228455
CIDVPW    71.31147541  94.56521739  81.30841121  97.19101124  83.17307692  89.6373057
CIDVS     62.44897959  66.52173913  64.42105263  84.75247525  82.30769231  83.51219512
CIDVSP    68.51851852  88.47826087  77.22960152  94.15011038  82.01923077  87.66700925
CIDVSPW   69.72624799  94.13043478  80.11100833  96.92832765  81.92307692  88.79624805
CIDVSW    68.98839138  90.43478261  78.26904986  95.09476031  82.01923077  88.07434177
CIDVW     70.50847458  90.43478261  79.23809524  95.16483516  83.26923077  88.82051282
CIDW      69.37086093  91.08695652  78.7593985   95.42410714  82.21153846  88.32644628
CIP       68.76061121  88.04347826  77.21639657  93.96267838  82.30769231  87.74987186
CIPW      70.11308562  94.34782609  80.44485635  97.04880817  82.21153846  89.01613743
CIS       61.47704591  66.95652174  64.09989594  84.78478478  81.44230769  83.07994115
CISP      67.32348112  89.13043478  76.70720299  94.38832772  80.86538462  87.10512688
CISPW     68.34645669  94.34782609  79.26940639  96.99421965  80.67307692  88.0839895
CISW      67.85714286  90.86956522  77.69516729  95.24886878  80.96153846  87.52598753
CIV       63.22580645  63.91304348  63.56756757  83.96135266  83.55769231  83.75903614
CIVP      70.01733102  87.82608696  77.91706847  93.93282774  83.36538462  88.33418237
CIVPW     71.26436782  94.34782609  81.19738073  97.08193042  83.17307692  89.59088555
CIVS      62.52587992  65.65217391  64.05090138  84.46411013  82.59615385  83.51968887
CIVSP     68.76061121  88.04347826  77.21639657  93.96267838  82.30769231  87.74987186
CIVSPW    70.01620746  93.91304348  80.22284123  96.82899207  82.21153846  88.92355694
CIVSW     69.23076923  90           78.26086957  94.90022173  82.30769231  88.15653965
CIVW      70.4778157   89.7826087   78.96749522  94.85776805  83.36538462  88.74104401
CIW       69.4490818   90.43478261  78.56468366  95.11653718  82.40384615  88.30499742
CP        69.24398625  87.60869565  77.3512476   93.79084967  82.78846154  87.94688458
CPW       70.63621533  94.13043478  80.7082945   96.95603157  82.69230769  89.25791386
CS        61.58536585  65.86956522  63.65546218  84.42460317  81.82692308  83.10546875
CSP       67.66169154  88.69565217  76.76387582  94.20289855  81.25        87.24832215
CSPW      68.73015873  94.13043478  79.44954128  96.89655172  81.05769231  88.27225131
CSW       68.19672131  90.43478261  77.75700935  95.05617978  81.34615385  87.66839378
CV        63.45733042  63.04347826  63.24972737  83.7008629   83.94230769  83.82141143
CVP       70.45454545  87.60869565  78.10077519  93.85775862  83.75        88.51626016
CVPW      71.73553719  94.34782609  81.50234742  97.09497207  83.55769231  89.81912145
CVS       62.89308176  65.2173913   64.03415155  84.3597263   82.98076923  83.66456617
CVSP      69.17808219  87.82608696  77.39463602  93.88646288  82.69230769  87.93456033
CVSPW     70.4730832   93.91304348  80.52190121  96.843292    82.59615385  89.15412558
CVSW      69.6969697   90           78.55787476  94.92273731  82.69230769  88.38643371
CVW       70.96219931  89.7826087   79.2706334   94.88017429  83.75        88.96833504
CW        69.93243243  90           78.70722433  94.9339207   82.88461538  88.50102669
D         65.09433962  60           62.44343891  82.89962825  85.76923077  84.3100189
DP        71.70172084  81.52173913  76.29704985  91.29989765  85.76923077  88.44819038
DPW       73.09090909  87.39130435  79.6039604   93.89473684  85.76923077  89.64824121
DS        64.50892857  62.82608696  63.65638767  83.74524715  84.71153846  84.22562141
DSP       70.44609665  82.39130435  75.95190381  91.58004158  84.71153846  88.01198801
DSPW      71.60714286  87.17391304  78.62745098  93.72340426  84.71153846  88.98989899
DSW       70.66420664  83.26086957  76.44710579  91.96242171  84.71153846  88.18818819
DV        65.96244131  61.08695652  63.43115124  83.33333333  86.05769231  84.67360454
DVP       72.11538462  81.52173913  76.53061224  91.32653061  86.05769231  88.61386139
DVPW      73.34558824  86.73913043  79.48207171  93.61924686  86.05769231  89.67935872
DVS       64.41441441  62.17391304  63.27433628  83.52272727  84.80769231  84.16030534
DVSP      70.24482109  81.08695652  75.27749748  91.02167183  84.80769231  87.80487805
DVSPW     71.4801444   86.08695652  78.10650888  93.2346723   84.80769231  88.82175227
DVSW      70.46728972  81.95652174  75.77889447  91.39896373  84.80769231  87.98004988
DVW       72.32824427  82.39130435  77.03252033  91.70081967  86.05769231  88.78968254
DW        72.02268431  82.82608696  77.04752275  91.86405767  85.76923077  88.71208354
I         65.20737327  61.52173913  63.31096197  83.39587242  85.48076923  84.42545109
ID        64.77272727  61.95652174  63.33333333  83.49056604  85.09615385  84.28571429
IDP       70.86466165  81.95652174  76.00806452  91.42561983  85.09615385  88.14741036
IDPW      72.22222222  87.60869565  79.17485265  93.94904459  85.09615385  89.3037336
IDS       64.33260394  63.91304348  64.1221374   84.084372    84.32692308  84.20547288
IDSP      69.98158379  82.60869565  75.77268195  91.64054336  84.32692308  87.83174762
IDSPW     71.09929078  87.17391304  78.3203125   93.6965812   84.32692308  88.76518219
IDSW      70.30965392  83.91304348  76.51139742  92.21871714  84.32692308  88.09643395
IDV       66.20689655  62.60869565  64.3575419   83.84976526  85.86538462  84.8456057
IDVP      71.89292543  81.73913043  76.50050865  91.40225179  85.86538462  88.54734755
IDVPW     73.07692308  86.73913043  79.32405567  93.60587002  85.86538462  89.56870612
IDVS      64.31718062  63.47826087  63.89496718  83.93881453  84.42307692  84.18024928
IDVSP     69.88847584  81.73913043  75.3507014   91.26819127  84.42307692  87.71228771
IDVSPW    70.96774194  86.08695652  77.79960707  93.2059448   84.42307692  88.59737639
IDVSW     70.05545287  82.39130435  75.72427572  91.55370177  84.42307692  87.84392196
IDVW      72.15909091  82.82608696  77.12550607  91.87242798  85.86538462  88.76739563
IDW       71.2962963   83.69565217  77           92.1875      85.09615385  88.5
IP        71.40151515  81.95652174  76.31578947  91.46090535  85.48076923  88.36978131
IPW       72.59528131  86.95652174  79.12957468  93.67755532  85.48076923  89.3916541
IS        64.60176991  63.47826087  64.03508772  83.96946565  84.61538462  84.29118774
ISP       70.37037037  82.60869565  76           91.66666667  84.61538462  88
ISPW      71.37745975  86.73913043  78.31207066  93.51753454  84.61538462  88.84401817
ISW       70.4797048   83.04347826  76.24750499  91.85803758  84.61538462  88.08808809
IV        66.43356643  61.95652174  64.11698538  83.66013072  86.15384615  84.88867835
IVP       72.2007722   81.30434783  76.48261759  91.24236253  86.15384615  88.62512364
IVPW      73.33333333  86.08695652  79.2         93.33333333  86.15384615  89.6
IVS       64.65324385  62.82608696  63.72657111  83.76068376  84.80769231  84.28093645
IVSP      70.3564728   81.52173913  75.52870091  91.20992761  84.80769231  87.89237668
IVSPW     71.42857143  85.86956522  77.98617966  93.13621964  84.80769231  88.77705083
IVSW      70.41198502  81.73913043  75.65392354  91.30434783  84.80769231  87.93619143
IVW       72.30769231  81.73913043  76.73469388  91.42857143  86.15384615  88.71287129
IW        71.56308851  82.60869565  76.69021191  91.74406605  85.48076923  88.50174216
P         72.2007722   81.30434783  76.48261759  91.24236253  86.15384615  88.62512364
PW        73.48066298  86.73913043  79.56131605  93.62591432  86.15384615  89.7346019
S         64.70588235  62.17391304  63.41463415  83.55387524  85           84.27073403
SP        70.78651685  82.17391304  76.05633803  91.51138716  85           88.13559322
SPW       71.84115523  86.52173913  78.50098619  93.44608879  85           89.02316213
SW        70.8411215   82.39130435  76.18090452  91.60621762  85           88.17955112
V         65.94724221  59.7826087   62.71379704  82.91782087  86.34615385  84.59726802
VP        72.21135029  80.2173913   76.00411946  90.79878665  86.34615385  88.5165106
VPW       73.55679702  85.86956522  79.23771314  93.25025961  86.34615385  89.66550175
VS        64.67889908  61.30434783  62.94642857  83.27067669  85.19230769  84.22053232
VSP       70.61068702  80.43478261  75.20325203  90.77868852  85.19230769  87.8968254
VSPW      71.89781022  85.65217391  78.17460317  93.06722689  85.19230769  88.95582329
VSW       70.83333333  81.30434783  75.70850202  91.15226337  85.19230769  88.07157058
VW        72.48062016  81.30434783  76.63934426  91.2601626   86.34615385  88.73517787
W         72.36084453  81.95652174  76.86034659  91.52196118  86.15384615  88.7568103
Features  PrecisionYes  RecallYes  F-ScoreYes
CDVPW     71.735        94.347     81.502
CVPW      71.735        94.347     81.502
CIDVPW    71.311        94.565     81.308
CIVPW     71.264        94.347     81.197
CPW       70.636        94.130     80.708
Extract Tuples
Steps:
◦ Input = Sentence
◦ Stanford Parser
◦ Search Parse Tree
◦ Extract
Parse Tree (diagram nodes): ROOT; ADJ, NP, VP; NP
Tuple Format
Tuple: <Cause, Verb, Effect>
Extraction Strategy
Strategy:
◦ Find Terms in Cause or Effect
◦ Build n-grams where no Term is found
Extraction Strategy
Sentence: “Citigroup Corp. determines common stock price.”
Cause:
◦ citigroup
Effects:
◦ common
◦ common stock
◦ common stock price
◦ stock
◦ stock price
◦ price
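The list of candidate effects above is the set of all contiguous n-grams of “common stock price”. A minimal Python sketch of that enumeration follows. The function name is hypothetical, and lowercasing and tokenization are assumed to have happened already.

```python
def contiguous_ngrams(words):
    """All contiguous word sequences, ordered by start position, then length."""
    return [
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    ]

effects = contiguous_ngrams(["common", "stock", "price"])
print(effects)
```

A phrase of n words yields n(n+1)/2 candidates, which is why a scoring step is needed afterwards to separate good terms from bad ones.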
Term DB
Extraction Strategy Cont…
Sentence: “Citigroup Corp. determines common stock price.”
Cause:
◦ citigroup
Effect:
◦ stock price
Term DB
Score Terms in Tuples
Storing n-grams stores many bad tuples in addition to the correct ones
Our remedy: score each record
◦ Frequency² + NumWords
This works fairly well. Future work could consider TF-IDF and/or term clustering
Examples of scores:
◦ Inflation = 21,825
◦ War = 18,734
◦ Central banks = 120
◦ Framing = 3
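Read as frequency squared plus the number of words in the term, the scoring rule reproduces the Term / Frequency / Score table from the pipeline walkthrough (Citigroup 10 → 101, Depression 3 → 10, Picture 1 → 2). The sketch below encodes that reading. The function name is illustrative, and counting words by splitting on whitespace is an assumption.

```python
def score_term(term, frequency):
    """Score = frequency squared + number of words in the term."""
    num_words = len(term.split())
    return frequency ** 2 + num_words

for term, freq in [("Citigroup", 10), ("Depression", 3), ("Picture", 1)]:
    print(term, freq, score_term(term, freq))
```

Squaring the frequency makes repeated terms dominate quickly, while the word-count bonus slightly favors longer phrases among terms seen equally often.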
Cause and Effect Extraction Results
Accuracy: 86%
◦ Random sample: 50
◦ Correct: 43
Larger sample size is in progress
Network Generation
POLARITY CLASSIFICATION
COMPUTE CO-OCCURRENCE OF TUPLES
CAUSAL NETWORK VISUALIZATION
Polarity Classification
Bayesian classifier determines polarity
◦ Classify polarity as increasing or decreasing
◦ “Any fuel price hike leads to consumer inflation.”
◦ “Rising food prices makes inflation control difficult.”
Neutral polarity:
◦ “Earthquakes affect the earth”
Only 12/500 sentences were neutral
Bayesian Polarity Classifier
Features:
1. Bag of words
2. Decreasing words marked with “_dec” tag; the original word is also kept
3. Causal verbs marked with “_verb” tag; the original word is also kept
4. Words plus stems of words are added
◦ Porter stemming algorithm
Bayesian Polarity Classifier Results
Precision: 72.1%
Recall: 80.2%
F-Score: 75.97%
Results from 5-fold cross-validation on 500 annotated sentences
Possible features: CIDVSPW
Features  PrecisionP  RecallP  F-ScoreP
IDVS      72.131      80.243   75.971
IDVSP     72.252      79.939   75.901
IDVSPW    72.252      79.939   75.901
Compute Co-Occurrence of Tuples
Determine net effect of a term on another term based on frequency
◦ Strength of connection is based on net co-occurrence frequency
Compute Co-Occurrence of Tuples
Example:
◦ <fuel prices, causes, inflation> Polarity = p
◦ <fuel prices, makes, inflation> Polarity = p
◦ <fuel prices, determines, inflation> Polarity = n
◦ <inflation, has, fuel prices> Polarity = n
Net strength = 3 - 1 = +2 for fuel prices causing inflation
Net polarity = 2 - 1 = +1 (positive)
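The arithmetic in this example can be reproduced with a short Python sketch. It assumes that net strength is the count of tuples in one direction minus the count in the reverse direction, and that net polarity is counted only over the tuples in the forward direction. Both assumptions match the numbers on the slide but are not spelled out there. The function name `net_edge` is hypothetical.

```python
from collections import Counter

# (cause, verb, effect, polarity) with polarity "p" or "n", as on the slide.
tuples = [
    ("fuel prices", "causes", "inflation", "p"),
    ("fuel prices", "makes", "inflation", "p"),
    ("fuel prices", "determines", "inflation", "n"),
    ("inflation", "has", "fuel prices", "n"),
]

def net_edge(tuples, a, b):
    """Return (net strength, net polarity) for the directed edge a -> b."""
    forward = [t for t in tuples if t[0] == a and t[2] == b]
    backward = [t for t in tuples if t[0] == b and t[2] == a]
    strength = len(forward) - len(backward)
    signs = Counter(t[3] for t in forward)
    polarity = signs["p"] - signs["n"]
    return strength, polarity

strength, polarity = net_edge(tuples, "fuel prices", "inflation")
print(strength, polarity)
```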
Visualization Demo
Conclusions
Our method:
◦ Automated extraction of causal-association networks
◦ Directed graph of causal relations with polarity
New contributions:
◦ Filter out non-causal sentences
◦ Extraction of new terms based on tuple co-occurrence
◦ Polarity information
Future Work
Make the system run in real-time by converting it to a multi-agent system where different agents perform different tasks
Improve classification and extraction results
Use causal-association network to create a computational model of a complex-adaptive system
Combine theoretical work on causality to extract implicit relations
Special Thanks
Shimada-san, Kawai-san
Suzuki-san, Dr. Yamada
Everyone
Causal relationships?
Extracurricular Activities