Graph-based WSD (continued)
DMLA 2008-12-10
Mamoru Komachi (小町守)
Word sense disambiguation task of Senseval-3 English Lexical Sample
Predict the sense of “bank”
… the financial benefits of the bank (finance)'s employee package (cheap mortgages and pensions, etc.), bring this up to …
In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden
Possibly aligned to water a sort of bank (???) by a rushing river.
Training instances are annotated with their sense
Predict the sense of the target word in the test set
WSD with an adjacency matrix
Assumption
Similar examples tend to have the same label
A (dis-)similarity between examples can be defined (prior knowledge, kNN)
Idea
Perform clustering on an adjacency matrix (a sketch of building such a matrix follows below)
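A minimal sketch of building such an adjacency matrix (bag-of-words feature vectors and cosine similarity are illustrative assumptions, not necessarily the choices made in these slides):

```python
import numpy as np

def knn_adjacency(X, k=3):
    """Build a symmetric kNN adjacency matrix from feature vectors.

    X: (n_instances, n_features) array, e.g. bag-of-words counts.
    Nonzero entries are cosine similarities between an instance
    and its k nearest neighbors.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    S = (X / norms) @ (X / norms).T   # all-pairs cosine similarity
    np.fill_diagonal(S, 0.0)          # no self-loops

    n = X.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(S[i])[-k:]    # k most similar instances
        A[i, nn] = S[i, nn]
    return np.maximum(A, A.T)         # symmetrize: undirected graph
```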
Intuition behind using a similarity graph
Known labels can be propagated to unlabeled data without any overlap
(Pictures taken from Zhu 2007)
Using unlabeled data via a similarity graph
Pros and cons
- Pros
  - Mathematically well-founded
  - Can achieve high performance if the graph is well-constructed
- Cons
  - Hard to determine an appropriate graph structure (and its edge weights)
  - Relatively large computational complexity
  - Mostly transductive
    - Transductive learning: (unlabeled) test instances are given when building the classification model
    - Inductive learning: test instances are not known during training
Word sense disambiguation by kNN
Seed instance = the instance whose sense is to be predicted
System output = the sense suggested by the k nearest neighbors (k = 3); a sketch follows below
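A minimal sketch of this procedure (the majority-vote decision rule and cosine similarity are assumptions; the slide does not spell out the details):

```python
import numpy as np

def knn_wsd(X_train, senses, x_seed, k=3):
    """Predict the sense of a seed instance by majority vote
    among its k nearest sense-annotated neighbors.

    X_train: (n, d) feature vectors of labeled training instances
    senses:  list of n sense labels
    x_seed:  (d,) feature vector of the instance to disambiguate
    """
    sims = (X_train @ x_seed) / (
        np.linalg.norm(X_train, axis=1) * np.linalg.norm(x_seed) + 1e-12
    )
    neighbors = np.argsort(sims)[-k:]        # k most similar instances
    votes = [senses[i] for i in neighbors]
    return max(set(votes), key=votes.count)  # majority sense
```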
Simplified Espresso is HITS
Simplified Espresso = HITS on a bipartite graph whose adjacency matrix is A
Problem
No matter which seed you start with, the same instance is always ranked topmost
Semantic drift (also called topic drift in HITS)
The instance ranking vector i tends to the principal eigenvector of $A^T A$ as the iteration proceeds, regardless of the seed instances!
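A toy illustration of the drift (random matrix for demonstration only): whatever one-hot seed the iteration starts from, the normalized instance vector converges to the principal eigenvector of $A^T A$.

```python
import numpy as np

def simplified_espresso(A, seed, iters=50):
    """HITS-style mutual reinforcement on a bipartite pattern-instance
    graph. A[p, i] > 0 iff pattern p extracts instance i.
    Returns the instance ranking vector after `iters` iterations."""
    i = seed / np.linalg.norm(seed)
    for _ in range(iters):
        p = A @ i                # pattern scores from instance scores
        i = A.T @ p              # instance scores from pattern scores
        i /= np.linalg.norm(i)   # normalize to keep values bounded
    return i

rng = np.random.default_rng(0)
A = rng.random((4, 6))           # 4 patterns x 6 instances
# Two different seeds yield (almost) the same ranking: semantic drift.
print(simplified_espresso(A, np.eye(6)[0]))
print(simplified_espresso(A, np.eye(6)[3]))
```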
Convergence process of Espresso
The heuristics in Espresso help reduce semantic drift
(However, early stopping is required for optimal performance)
[Figure: accuracy of Original Espresso, Simplified Espresso, and the most-frequent-sense baseline over iterations; Simplified Espresso drifts until it outputs the most frequent sense regardless of input]
Learning curve of Original Espresso: per-sense breakdown
The number of most-frequent-sense predictions increases
Recall for infrequent senses worsens even with Original Espresso
[Figure: learning curves plotted separately for the most frequent sense and for the other senses]
Q. What caused semantic drift in Espresso?
A. Espresso's resemblance to HITS
HITS is an importance computation method (it gives a single ranking list for any seeds)
Why not use another type of link analysis measure, one that takes seeds into account: a “relatedness” measure (it gives different rankings for different seeds)?
The regularized Laplacian kernel
A relatedness measure
Takes higher-order relations into account
Has only one parameter

Graph Laplacian:

$$L = D - A$$

Regularized Laplacian matrix:

$$R_\beta = \sum_{n=0}^{\infty} \beta^n (-L)^n = (I + \beta L)^{-1}$$

A: adjacency matrix of the graph
D: (diagonal) degree matrix
β: parameter
Each column of $R_\beta$ gives the rankings relative to a node
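A direct computation via the closed form (a sketch; β = 10⁻² is the value used in the experiment reported below):

```python
import numpy as np

def regularized_laplacian(A, beta=1e-2):
    """Regularized Laplacian kernel R_beta = (I + beta * L)^{-1},
    where L = D - A is the graph Laplacian. Column j gives the
    relatedness of every node to node j."""
    D = np.diag(A.sum(axis=1))   # (diagonal) degree matrix
    L = D - A                    # graph Laplacian
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) + beta * L)

# Usage: rank all nodes by relatedness to a seed node.
# R = regularized_laplacian(A)
# ranking = np.argsort(-R[:, seed_node])
```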
WSD on all nouns in Senseval-3

| Algorithm | F measure |
|---|---|
| Most frequent sense (baseline) | 54.5 |
| HyperLex | 64.6 |
| PageRank | 64.6 |
| Simplified Espresso | 44.1 |
| Espresso (after convergence) | 46.9 |
| Espresso (optimal stopping) | 66.5 |
| Regularized Laplacian (β = 10⁻²) | 67.1 |

Outperforms other graph-based methods
Espresso needs optimal stopping to achieve equivalent performance
More experiments on WSD datasets
Niu et al. “Word Sense Disambiguation using LP-based Semi-Supervised Learning” (ACL 2005)
Pham et al. “Word Sense Disambiguation with Semi-Supervised Learning” (AAAI 2005)
Dataset
Pedersen (2000) “line” and “interest” data
Line: six senses (a physical line, a product, …)
Interest: four senses (interest on money, concern, …)
Features
Bag-of-words features
Local collocation features
Part-of-speech features
Result

| Dataset | MFS | Niu et al. | Pham et al. | BB | proposed |
|---|---|---|---|---|---|
| interest | 54.6% | 79.8% | 76.4% | 75.5% | 75.6% |
| line | 53.5% | 59.4% | 68.0% | 62.7% | 61.3% |
| S3LS (1%) | 54.5% | 30.8% | | | 42.1% |
| S3LS (10%) | 54.5% | 56.5% | | | 56.0% |
| S3LS (25%) | 54.5% | 64.9% | | | 63.2% |
| S3LS (50%) | 54.5% | 68.6% | | | 66.3% |
| S3LS (75%) | 54.5% | 70.3% | | | 68.8% |
| S3LS (100%) | 54.5% | 71.8% | | | 69.8% |

(S3LS percentages indicate the fraction of labeled training data used.)
Discussion
The proposed method (simple kNN) achieved performance comparable to previous semi-supervised WSD systems
Does additional data help?
[Figure: results on the “line” data with 90 labeled instances]
[Figure: results on the “line” data with 150 labeled instances]
[Figure: results on the “interest” data with 60 labeled instances]
[Figure: results on the “interest” data with 300 labeled instances]
Discussion (cont.)
Additional data doesn't always help
Sometimes it is worse than using no additional data at all!
We haven't yet succeeded in using large-scale data for this task (BNC data could be used)
All systems suffer from the data sparseness problem
Robust feature selection (smoothing) is needed
Multiple clusters in similarity graphs
Generative model of co-occurrence:

$$P(i_i, p_j) = \sum_{z=1}^{N} P(z)\,P(i_i \mid z)\,P(p_j \mid z)$$
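Given fitted parameters, the joint distribution is a single tensor contraction; fitting the parameters themselves (typically by EM, as in pLSI) is omitted in this sketch:

```python
import numpy as np

def cooccurrence_matrix(Pz, Pi_z, Pp_z):
    """Joint co-occurrence probabilities under the model
    P(i_i, p_j) = sum_z P(z) P(i_i|z) P(p_j|z).

    Pz:   (N,)    topic prior P(z)
    Pi_z: (N, I)  P(i|z), one row per hidden topic
    Pp_z: (N, P)  P(p|z), one row per hidden topic
    """
    # Weight each topic's outer product P(i|z) P(p|z)^T by P(z), then sum
    return np.einsum('z,zi,zp->ip', Pz, Pi_z, Pp_z)
```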
Construction of the similarity matrix
Let $G_z$ be a hidden-topic graph
The edge between instance $i_i$ and pattern $p_j$ has weight $P(z \mid i_i, p_j)$
The adjacency matrix $A_z = A(G_z)$ holds $P(z \mid i_i, p_j)$ in the element for each co-occurring pair, with all other elements set to 0
A similarity matrix is computed as $A_z^T A_z$
Its (i, j)-th element holds the co-occurrence value between instances $i_i$ and $i_j$ with respect to topic z (a sketch follows below)
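A sketch of the construction (orienting $A_z$ with patterns as rows so that $A_z^T A_z$ is instance-by-instance, consistent with $R = A^T A$ on the earlier slides; this orientation is an assumption):

```python
import numpy as np

def topic_adjacency(posterior_z, cooccur):
    """Adjacency matrix A_z for one hidden topic z.

    posterior_z: (P, I) array of P(z | i_i, p_j)
    cooccur:     (P, I) boolean array, True where pattern p_j and
                 instance i_i actually co-occur
    Entries for non-co-occurring pairs are set to 0.
    """
    return np.where(cooccur, posterior_z, 0.0)

def topic_similarity(A_z):
    """Instance-instance co-occurrence values with respect to topic z."""
    return A_z.T @ A_z
```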
Combination of von Neumann kernels
The von Neumann kernel matrix is defined as follows:

$$K_\beta = R(I - \beta R)^{-1} = R\sum_{n=0}^{\infty} \beta^n R^n, \qquad R = A^T A$$

The final kernel matrix is computed by summing the kernel matrices over all hidden topics:

$$M_\beta = \sum_{z=1}^{N} K_\beta^{(z)}$$
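A sketch of the combination (hypothetical helper names; each per-topic kernel uses $R_z = A_z^T A_z$):

```python
import numpy as np

def von_neumann_kernel(A_z, beta):
    """K_beta = R (I - beta R)^{-1} with R = A_z^T A_z.
    The power series only converges for beta < 1 / lambda_max(R)."""
    R = A_z.T @ A_z
    n = R.shape[0]
    return R @ np.linalg.inv(np.eye(n) - beta * R)

def combined_kernel(A_topics, beta):
    """Final kernel M_beta: sum of the von Neumann kernels
    of all hidden topics."""
    return sum(von_neumann_kernel(A_z, beta) for A_z in A_topics)
```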
Result
| Dataset | MFS | Niu et al. | kNN | pLSI |
|---|---|---|---|---|
| S3LS | 54.5% | 71.8% | 69.8% | 51.7% |
Discussion
Poor result with the proposed method
Likely caused by a faulty implementation or a bug
The number of clusters (hidden variable z) does not seem to strongly affect performance (tested |z| = 5 and 20; increasing |z| to 20 gave a 3-point improvement, but results were still below the most-frequent-sense baseline)