Japanese Dependency Analysis using Cascaded Chunking
Taku Kudo 工藤 拓
Yuji Matsumoto 松本 裕治
Nara Institute of Science and Technology, Japan
Motivation
Kudo, Matsumoto 2000 (VLC) presented a state-of-the-art Japanese dependency parser using SVMs (89.09% on the standard dataset), demonstrating the high generalization performance and feature-selection abilities of SVMs.
Problems
Not scalable
• 2 weeks of training on 7,958 sentences
• Hard to train with larger data
Slow parsing
• 2-3 sec./sentence
• Too slow for actual NL applications
Goal
Improve the scalability and the parsing efficiency without losing accuracy!
How?
• Apply the cascaded chunking model to dependency parsing and to the selection of training examples
• Reduce the number of times the SVMs are consulted in parsing
• Reduce the number of negative examples learned
Outline
• Japanese dependency analysis
• Two models: the probabilistic model (previous) and the cascaded chunking model (new!)
• Features used for training and classification
• Experiments and results
• Conclusion and future work
Japanese Dependency Analysis (1/2)
Analysis of the relationships between phrasal units called bunsetsu (segments), roughly base phrases in English.
Two constraints:
• Each segment modifies one of the segments to its right (Japanese is a head-final language)
• Dependencies do not cross each other
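These two constraints make it cheap to check whether a candidate analysis is well formed. A minimal sketch (mine, not the authors'), where heads[i] holds the index of the segment that segment i modifies:

```python
# Check the two constraints on a candidate analysis of n segments.
# heads[i] = index of the segment that segment i modifies; the last
# segment heads the sentence and carries no entry.
def is_valid_analysis(heads, n):
    # Constraint 1: every segment modifies some segment to its right.
    if any(not (i < h < n) for i, h in enumerate(heads)):
        return False
    # Constraint 2: no crossing arcs. Two rightward arcs cross exactly
    # when i < j < heads[i] < heads[j].
    return not any(heads[j] > heads[i]
                   for i in range(len(heads))
                   for j in range(i + 1, heads[i]))

# 私は/彼女と/京都に/行きます: segments 1-3 all modify segment 4 (index 3).
print(is_valid_analysis([3, 3, 3], 4))  # True
print(is_valid_analysis([2, 3, 3], 4))  # False: arcs 1->3 and 2->4 cross
```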
Japanese Dependency Analysis (2/2)
Raw text: 私は彼女と京都に行きます (I go to Kyoto with her.)
↓ Morphological analysis and bunsetsu identification
私は / 彼女と / 京都に / 行きます (I-top / with her / to Kyoto-loc / go)
↓ Dependency analysis
[Figure: the four segments connected by dependency arcs]
Probabilistic Model
Input: 私は 1 / 彼女と 2 / 京都に 3 / 行きます 4 (I-top / with her / to Kyoto-loc / go)
1. Build a dependency matrix with ME, DT, or SVMs (how probable it is that one segment modifies another)
[Figure: dependency matrix, modifier rows x modifiee columns, filled with modification probabilities]
2. Search for the optimal dependencies that maximize the sentence probability, using CYK or chart parsing
Output: 私は 1 / 彼女と 2 / 京都に 3 / 行きます 4 with its dependency structure
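As a toy illustration of the two steps, the sketch below fills a small dependency matrix with made-up probabilities and searches all valid analyses exhaustively; the real parser uses CYK/chart parsing instead of brute force, and the numbers are assumptions, not from the slides.

```python
from itertools import product
from math import prod

# Hypothetical dependency matrix for 私は/彼女と/京都に/行きます:
# P[(i, j)] = probability that segment i modifies segment j (0-indexed).
P = {(0, 1): 0.2, (0, 2): 0.1, (0, 3): 0.7,
     (1, 2): 0.2, (1, 3): 0.8,
     (2, 3): 1.0}

def crosses(heads):
    # Two rightward arcs cross exactly when i < j < heads[i] < heads[j].
    return any(heads[j] > heads[i]
               for i in range(len(heads))
               for j in range(i + 1, heads[i]))

def best_analysis(P, n):
    # Brute force over all valid analyses; the real parser does this
    # search with CYK/chart parsing instead.
    best, best_p = None, 0.0
    for heads in product(*[range(i + 1, n) for i in range(n - 1)]):
        if crosses(heads):
            continue
        p = prod(P.get((i, h), 0.0) for i, h in enumerate(heads))
        if p > best_p:
            best, best_p = heads, p
    return best, best_p

print(best_analysis(P, 4))  # (3, 3, 3) with probability ~0.56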
Problems of the Probabilistic Model (1/2)
Selection of training examples: all candidate pairs of two segments are used;
• dependency relation → positive
• no dependency relation → negative
This straightforward selection requires a total of n(n-1)/2 training examples per sentence (where n is the number of segments in the sentence)
Difficult to combine the probabilistic model with SVMs, which require polynomial computational cost
Problems of the Probabilistic Model (2/2)
O(n^3) parsing time is necessary with CYK or chart parsing
Even if beam search is applied, O(n^2) parsing time is always necessary
The classification cost of SVMs is much higher than that of other ML algorithms such as ME and DT
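For scale (an illustration, not from the slides), the number of candidate pairs grows quadratically with sentence length:

```python
# Candidate modifier-modifiee pairs per sentence: n(n-1)/2.
for n in (5, 10, 20, 30):
    print(f"n={n:2d} segments -> {n * (n - 1) // 2:3d} examples")
# n= 5 -> 10, n=10 -> 45, n=20 -> 190, n=30 -> 435
```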
Cascaded Chunking Model
Based on cascaded chunking for English parsing [Abney 1991]
• Parses a sentence deterministically, deciding only whether the current segment modifies the segment on its immediate right-hand side
• Training examples are extracted using this algorithm itself
Example: Training Phase
Annotated sentence: 彼は 1 彼女の 2 温かい 3 真心に 4 感動した。5
(he / her / warm / heart / be moved: He was moved by her warm heart.)
Pairs of a tag (D or O) and a context (features) are stored as training data for the SVMs; each tag is decided by the annotated corpus.

彼は 1 彼女の 2 温かい 3 真心に 4 感動した。5 → tags: O O D D O (温かい 3 is chunked off)
彼は 1 彼女の 2 真心に 4 感動した。5 → tags: O D D O (彼女の 2 is chunked off)
彼は 1 真心に 4 感動した。5 → tags: O D O (真心に 4 is chunked off)
彼は 1 感動した。5 → tags: D O (彼は 1 is chunked off)
感動した。5 → finish: 彼は 1 彼女の 2 温かい 3 真心に 4 感動した。5 is fully parsed
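In code, the extraction loop might look like the sketch below. This is my reading of the walkthrough, not the authors' implementation: features() is a stub for the real feature set described later, and the condition for dropping a D-tagged segment (only once no remaining segment still has it as its gold modifiee) is an assumption chosen to reproduce the sequence above.

```python
# Sketch of training-example extraction (a reading of the slides, not the
# authors' code). segments: bunsetsu strings; heads[i]: gold head index of
# segment i (None for the last segment).
def features(segs, pos):
    return {"modifier": segs[pos], "modifiee": segs[pos + 1]}  # stand-in

def extract_examples(segments, heads):
    alive = list(range(len(segments)))        # segments still in the cascade
    examples = []
    while len(alive) > 1:
        tags = []
        for pos in range(len(alive) - 1):
            i, j = alive[pos], alive[pos + 1]
            tag = "D" if heads[i] == j else "O"      # tag from the treebank
            examples.append((tag, features([segments[k] for k in alive], pos)))
            tags.append(tag)
        # Assumption: drop a D-tagged segment only once no remaining segment
        # still has it as its gold modifiee (reproduces the walkthrough).
        alive = [i for pos, i in enumerate(alive)
                 if pos >= len(tags) or tags[pos] == "O"
                 or any(heads[k] == i for k in alive if k < i)]
    return examples

segs = ["彼は", "彼女の", "温かい", "真心に", "感動した。"]
gold = [4, 3, 3, 4, None]                     # 0-indexed gold heads
for tag, ctx in extract_examples(segs, gold):
    print(tag, ctx)
```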
Example: Test Phase
Test sentence: 彼は 1 彼女の 2 温かい 3 真心に 4 感動した。5
(he / her / warm / heart / be moved: He was moved by her warm heart.)
Each tag is now decided by the SVMs built in the training phase; the parsing process is otherwise identical:

彼は 1 彼女の 2 温かい 3 真心に 4 感動した。5 → tags: O O D D O
彼は 1 彼女の 2 真心に 4 感動した。5 → tags: O D D O
彼は 1 真心に 4 感動した。5 → tags: O D O
彼は 1 感動した。5 → tags: D O
感動した。5 → finish: the dependency structure of 彼は 1 彼女の 2 温かい 3 真心に 4 感動した。5 is complete
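The test-time loop mirrors the training loop, with the SVM supplying the tags. Below is a sketch under the same caveats; in particular, the rule for when a D-tagged segment survives into the next round (here: when its left neighbour is also tagged D, so it may still be needed as a modifiee) is an assumption chosen to reproduce the walkthrough, and the all-O fallback is mine, not the slides'.

```python
# Sketch of deterministic test-time parsing (a reading of the slides).
# classify(feats) stands in for the trained SVM and returns "D" or "O";
# features(segs, pos) is the same stand-in as in the training sketch.
def parse(segments, classify, features):
    alive = list(range(len(segments)))
    head = {}                                  # modifier index -> head index
    while len(alive) > 1:
        tags = [classify(features([segments[k] for k in alive], pos))
                for pos in range(len(alive) - 1)]
        if all(t == "O" for t in tags):
            tags[-1] = "D"    # fallback so the cascade always terminates
        survivors = []
        for pos, i in enumerate(alive):
            if pos < len(tags) and tags[pos] == "D":
                head[i] = alive[pos + 1]       # register the dependency
                # Assumption: keep a D-tagged segment for another round when
                # its left neighbour is also tagged D (it may still be a
                # modifiee); this reproduces the walkthrough above.
                if pos > 0 and tags[pos - 1] == "D":
                    survivors.append(i)
            else:
                survivors.append(i)
        alive = survivors
    return head

# Toy run: a "classifier" that always answers D yields the chain analysis.
print(parse(["A", "B", "C", "D"], lambda f: "D", lambda s, p: None))
# -> {0: 1, 1: 2, 2: 3}
```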
Advantages of the Cascaded Chunking Model
Simple and efficient
• Probabilistic model: O(n^3) vs. cascaded chunking: O(n^2)
• In practice lower than O(n^2), since most segments modify the segment on their immediate right-hand side
• The number of training examples is much smaller
Independent of the ML algorithm
• Can be combined with any ML algorithm that works as a binary classifier
• Probabilities of dependency are not necessary
Features
Example: 彼の 1 友人は 2 この本を 3 持っている 4 女性を 5 探している 6
(his / friend-top / this book-acc / have / lady-acc / be looking for: His friend is looking for a lady who has this book.)
Static features (for the modifier and the modifiee):
• Head/functional word: surface, POS, POS subcategory, inflection type, inflection form; brackets, quotations, punctuation, position
• Between the two segments: distance, case particles, brackets, quotations, punctuation
Dynamic features [Kudo, Matsumoto 2000]:
• A, B: static features of the functional word
• C: static features of the head word
[Figure: the modifier and modifiee of the "modify or not?" decision, with the surrounding contexts A, B, C that supply the dynamic features]
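A sketch of what encoding one (modifier, modifiee) pair could look like; every field name, the distance buckets, and the segment records are illustrative assumptions, not the paper's actual feature templates.

```python
# Hypothetical encoding of static features for one candidate pair.
def pair_features(sent, i, j):
    mod, hd = sent[i], sent[j]
    return {
        # Static features of the modifier and modifiee segments.
        "mod_head": mod["head_word"], "mod_pos": mod["pos"],
        "mod_func": mod["func_word"], "mod_infl": mod["infl"],
        "hd_head": hd["head_word"],   "hd_pos": hd["pos"],
        "hd_func": hd["func_word"],
        # Features between the two segments (buckets are assumptions).
        "distance": "1" if j - i == 1 else ("2-5" if j - i <= 5 else "6+"),
        "between_particles": "|".join(s["func_word"] for s in sent[i + 1:j]),
    }

# Toy records for 私は/彼女と/京都に/行きます (fields illustrative).
sent = [
    {"head_word": "私",   "func_word": "は",   "pos": "noun", "infl": ""},
    {"head_word": "彼女", "func_word": "と",   "pos": "noun", "infl": ""},
    {"head_word": "京都", "func_word": "に",   "pos": "noun", "infl": ""},
    {"head_word": "行き", "func_word": "ます", "pos": "verb", "infl": "masu"},
]
print(pair_features(sent, 0, 3))
```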
Experimental Setting
Kyoto University Corpus 2.0/3.0
• Standard data set: training: 7,958 sentences / test: 1,246 sentences (same data as [Uchimoto et al. 98; Kudo, Matsumoto 00])
• Large data set: 2-fold cross-validation using all 38,383 sentences
Kernel function: 3rd-degree polynomial
Evaluation: dependency accuracy and sentence accuracy
Results

Data set                  | Standard            | Large
Model                     | Cascaded | Prob.    | Cascaded | Prob.
Dependency acc. (%)       | 89.29    | 89.09    | 90.04    | N/A
Sentence acc. (%)         | 47.53    | 46.17    | 53.16    | N/A
# of training sentences   | 7,956    | 7,956    | 19,191   | 19,191
# of training examples    | 110,355  | 459,105  | 251,254  | 1,074,316
Training time (hours)     | 8        | 336      | 48       | N/A
Parsing time (sec./sent.) | 0.5      | 2.1      | 0.7      | N/A
Effect of Dynamic Features (1/2)
Effect of Dynamic Features (2/2)
Deleted dynamic feature type | Δ Dependency acc. | Δ Sentence acc.
A                            | -0.28 %           | -0.89 %
B                            | -0.10 %           | -0.89 %
C                            | -0.28 %           | -0.56 %
A, B                         | -0.33 %           | -1.21 %
A, C                         | -0.55 %           | -0.97 %
B, C                         | -0.54 %           | -1.61 %
A, B, C                      | -0.58 %           | -2.34 %
(differences from the model trained with all dynamic features)
[Figure: the same modifier/modifiee example as in the Features slide, with the dynamic-feature contexts A, B, and C around the "modify or not?" decision]
Probabilistic vs. Cascaded Chunking (1/2)
彼は 1 この本を 2 持っている 3 女性を 4 探している 5
(he-top / this book-acc / have / lady-acc / be looking for: He is looking for a lady who has this book.)
The probabilistic model uses all candidate dependency pairs as training data, for example:
• Positive: この本を 2 → 持っている 3
• Negative: この本を 2 → 探している 5 (unnecessary)
The probabilistic model thus commits to a number of unnecessary training examples.
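To make the contrast concrete, the toy snippet below (mine, not from the slides) enumerates every candidate pair for this sentence; the probabilistic model trains on all ten of them, while the cascade only ever examines adjacent pairs of the shrinking sentence and never generates pairs such as この本を 2 → 探している 5.

```python
# All candidate pairs the probabilistic model trains on for this sentence.
segs = ["彼は", "この本を", "持っている", "女性を", "探している"]
gold = [4, 2, 3, 4, None]                 # 0-indexed gold heads
n = len(segs)
for i in range(n):
    for j in range(i + 1, n):
        label = "positive" if gold[i] == j else "negative"
        print(f"{segs[i]} -> {segs[j]}: {label}")
# 10 pairs in total, only 4 of them positive; この本を -> 探している is a
# negative example the cascaded chunking model never generates.
```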
Probabilistic vs. Cascaded Chunking (2/2)

         | Probabilistic                    | Cascaded Chunking
Strategy | Maximize the sentence probability | Deterministic (shift-reduce-like)
Merit    | Can see all candidates of a dependency | Simple, efficient, and scalable; as accurate as the probabilistic model
Demerit  | Not efficient; commits to unnecessary training examples | Cannot see all (posterior) candidates of a dependency
Conclusion
• A new Japanese dependency parser using a cascaded chunking model
• It outperforms the previous probabilistic model in accuracy, efficiency, and scalability
• Dynamic features contribute significantly to the performance
Future Work
• Coordinate structure analysis: coordinate structures appear frequently in long Japanese sentences and make analysis hard
• Use of posterior context: sentences like the following are hard to parse using only the cascaded chunking model:
僕の 母の ダイヤの 指輪 (my / mother's / diamond / ring: my mother's diamond ring)
Comparison with Related Work

Model                                    | Training corpus (# sentences) | Acc. (%)
Our model (Cascaded Chunking + SVMs)     | Kyoto Univ. (19,191)          | 90.46
Our model (Cascaded Chunking + SVMs)     | Kyoto Univ. (7,956)           | 89.29
Kudo et al. 00 (Prob. + SVMs)            | Kyoto Univ. (7,956)           | 89.09
Uchimoto et al. 00 (Prob. + ME)          | Kyoto Univ. (7,956)           | 87.93
Kanayama et al. 00 (Prob. + ME + HPSG)   | EDR (192,778)                 | 88.55
Haruno et al. 98 (Prob. + DT + Boosting) | EDR (50,000)                  | 85.03
Fujio et al. 98 (Prob. + ML)             | EDR (190,000)                 | 86.67
Support Vector Machines [Vapnik]

[Figure: two classes of training examples (y_i = +1 and y_i = -1) separated by the hyperplane w·x + b = 0, with the margin hyperplanes w·x + b = +1 and w·x + b = -1 at distance d on either side]

Maximize the margin 2d = 2 / ||w||:

Min.: L(w) = ||w||^2 / 2
s.t.: y_i [(w·x_i) + b] ≥ 1

Extensions: soft margin, kernel functions
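For illustration only, here is how a binary D/O classifier with the 3rd-degree polynomial kernel from the experimental setting could be trained today with scikit-learn; this is a stand-in, not the SVM tool the authors used, and the toy features are assumptions.

```python
# Toy D/O classifier with a 3rd-degree polynomial kernel (illustrative).
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

X = [{"mod_pos": "noun", "hd_pos": "verb", "distance": "1"},
     {"mod_pos": "noun", "hd_pos": "noun", "distance": "2-5"}]
y = ["D", "O"]

vec = DictVectorizer()
clf = SVC(kernel="poly", degree=3)        # kernel from the experiments
clf.fit(vec.fit_transform(X), y)
print(clf.predict(vec.transform([{"mod_pos": "noun", "hd_pos": "verb",
                                  "distance": "1"}])))  # -> ['D']
```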