Chapter 11: Structural SVM
-
SVM Chapter 11. Structural SVM
Waseda Univ. Hamada Lab. Taikai Takeda
Twitter: @bigsea_t
-
Structural SVM (SSVM)
An extension of SVM to structured outputs
Applications: NLP [Yue et al. 2007], Bioinformatics [Yu et al. 2006]
Training: cutting plane training
-
Implementation of SSVM [John Yu et al. 2009]
http://www.cs.cornell.edu/~cnyu/latentssvm/
Written in C; from the group of Prof. Thorsten Joachims, Cornell Univ.
http://www.cs.cornell.edu/People/tj/
This is an implementation of latent structural SVM accompanying the ICML '09 paper "Learning Structural SVMs with Latent Variables". It was developed under Linux and compiles under gcc, built upon the SVM^light software by Thorsten Joachims. There are two versions available. The standalone version using the SVM^light QP solver is available below. Another version using the Mosek quadratic program solver is also available. It has been developed and tested for a longer period of time but requires the separate installation of the solver.
-
Formulate SSVM
An ordinary SVM predicts a simple label y for an input x; in SSVM the output y has structure
-
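Formulate SSVM
Notations
Decision function: $g(x, y) = w^\top \psi(x, y)$
  $\mathcal{X}$: space of inputs, $x \in \mathcal{X}$
  $\psi(x, y)$: feature vector
  $w$: parameter vector
Classifier: $f(x) = \operatorname{argmax}_{y \in \mathcal{Y}} g(x, y)$
  $\mathcal{Y}$: space of (structural) outputs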
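The decision function and classifier can be sketched in a few lines for a tiny output space. This is a minimal illustration, assuming a multiclass problem where the output space is just {0, 1, 2} and the feature vector copies x into the block belonging to y; the names `joint_feature`, `decision`, and `classify` are hypothetical and not from the slides.

```python
from typing import List, Sequence


def joint_feature(x: Sequence[float], y: int, num_classes: int) -> List[float]:
    """Hypothetical feature vector psi(x, y): x copied into the block for class y."""
    psi = [0.0] * (len(x) * num_classes)
    start = y * len(x)
    psi[start:start + len(x)] = list(x)
    return psi


def decision(w: Sequence[float], x: Sequence[float], y: int, num_classes: int) -> float:
    """Decision function: inner product of the parameter vector with psi(x, y)."""
    return sum(wi * pi for wi, pi in zip(w, joint_feature(x, y, num_classes)))


def classify(w: Sequence[float], x: Sequence[float], num_classes: int) -> int:
    """Classifier: the output with the largest decision value (brute-force argmax)."""
    return max(range(num_classes), key=lambda y: decision(w, x, y, num_classes))


# Toy parameter vector (assumed values): three blocks of size 2, one per class.
w = [1.0, 0.0, 0.0, 1.0, -1.0, -1.0]
predictions = [classify(w, x, 3) for x in ([2.0, 1.0], [0.0, 3.0], [-1.0, -1.0])]
```

For real structured outputs (sequences, trees, rankings) the output space is too large to enumerate, so the argmax is computed with a problem-specific search such as dynamic programming.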
-
Formulate SSVM
Hard-margin problem
Constraint
Max-Margin
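A sketch of the hard-margin problem in its standard form [Tsochantaridis et al. 2005]. The symbols are assumptions: $(x_i, y_i)$, $i = 1, \dots, n$ are the training pairs, $w$ is the parameter vector, $\psi$ the feature vector, and $\mathcal{Y}$ the output space.

```latex
\[
\text{Constraint:}\quad
w^\top \psi(x_i, y_i) - w^\top \psi(x_i, y) \ge 1
\qquad \forall i,\ \forall y \in \mathcal{Y} \setminus \{y_i\}
\]
\[
\text{Max-margin:}\quad
\min_{w}\ \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to the constraint above}
\]
```

The constraint says the correct output must score at least 1 higher than every other output, which gives one constraint per training example and incorrect output.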
-
Formulate SSVM
Soft-Margin Problem
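A sketch of the soft-margin problem in the form used by [Tsochantaridis et al. 2005] and [Joachims et al. 2009]. The symbols are assumptions: $(x_i, y_i)$, $i = 1, \dots, n$ are the training pairs, $\xi_i$ is the slack variable of example $i$, and $C > 0$ is the regularization constant.

```latex
\[
\min_{w,\ \xi}\ \frac{1}{2}\lVert w \rVert^2 + \frac{C}{n} \sum_{i=1}^{n} \xi_i
\]
\[
\text{s.t.}\quad
w^\top \left[ \psi(x_i, y_i) - \psi(x_i, y) \right] \ge 1 - \xi_i
\quad \forall i,\ \forall y \in \mathcal{Y} \setminus \{y_i\},
\qquad \xi_i \ge 0 \quad \forall i
\]
```

Each example has a single slack variable shared by all of its constraints, so $\xi_i$ measures the worst margin violation over the incorrect outputs of example $i$.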
-
Formulate SSVM
Lagrange Function
Dual Problem
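A sketch of the derivation, assuming the soft-margin objective $\frac{1}{2}\lVert w\rVert^2 + \frac{C}{n}\sum_i \xi_i$ with constraints $w^\top \delta\psi_i(y) \ge 1 - \xi_i$ and $\xi_i \ge 0$. The shorthand $\delta\psi_i(y) = \psi(x_i, y_i) - \psi(x_i, y)$ and the multipliers $\alpha_{iy}$, $\mu_i$ are notation introduced here, not taken from the slide.

```latex
\[
L(w, \xi, \alpha, \mu)
= \frac{1}{2}\lVert w \rVert^2 + \frac{C}{n}\sum_{i} \xi_i
- \sum_{i} \sum_{y \ne y_i} \alpha_{iy}
  \left[ w^\top \delta\psi_i(y) - 1 + \xi_i \right]
- \sum_{i} \mu_i \xi_i
\]
\[
\frac{\partial L}{\partial w} = 0
\ \Rightarrow\
w = \sum_{i} \sum_{y \ne y_i} \alpha_{iy}\, \delta\psi_i(y),
\qquad
\frac{\partial L}{\partial \xi_i} = 0
\ \Rightarrow\
\sum_{y \ne y_i} \alpha_{iy} = \frac{C}{n} - \mu_i \le \frac{C}{n}
\]
\[
\max_{\alpha \ge 0}\
\sum_{i} \sum_{y \ne y_i} \alpha_{iy}
- \frac{1}{2} \sum_{i, y} \sum_{j, y'}
  \alpha_{iy}\, \alpha_{jy'}\, \delta\psi_i(y)^\top \delta\psi_j(y')
\quad \text{s.t.}\quad
\sum_{y \ne y_i} \alpha_{iy} \le \frac{C}{n} \quad \forall i
\]
```

The dual has one variable per pair of training example and incorrect output, which is why it cannot be solved directly when the output space is large.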
-
Formulate SSVM
Kernel Function
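A sketch of how a kernel enters, assuming the dual solution has the form $w = \sum_i \sum_{\bar{y} \ne y_i} \alpha_{i\bar{y}} [\psi(x_i, y_i) - \psi(x_i, \bar{y})]$. The joint kernel $K$ and the multipliers $\alpha_{i\bar{y}}$ are notation introduced here.

```latex
\[
K\big((x, y), (x', y')\big) = \psi(x, y)^\top \psi(x', y')
\]
\[
g(x, y) = w^\top \psi(x, y)
= \sum_{i} \sum_{\bar{y} \ne y_i} \alpha_{i\bar{y}}
  \left[ K\big((x_i, y_i), (x, y)\big) - K\big((x_i, \bar{y}), (x, y)\big) \right]
\]
```

Because both the dual objective and the decision function use the feature vectors only through inner products, $\psi$ never has to be computed explicitly.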
-
Optimize SSVM
cutting plane training
[Joachims et al. 2009]
-
1-Slack Formulation
1-slack OP
N-slack OP (previous one)
1-slack OP and N-slack OP are equivalent
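Written out in the margin-rescaling form of [Joachims et al. 2009], the two problems can be sketched as follows. The symbols are assumptions: $(x_i, y_i)$, $i = 1, \dots, n$ are the training pairs, $\psi$ is the feature vector, $\Delta$ is the loss (the 0/1 loss gives the margin-1 soft-margin problem), and $C$ is the regularization constant.

```latex
% N-slack OP: one slack variable per training example
\[
\min_{w,\ \xi_i \ge 0}\
\frac{1}{2}\lVert w \rVert^2 + \frac{C}{n} \sum_{i=1}^{n} \xi_i
\quad \text{s.t.}\quad
w^\top \left[ \psi(x_i, y_i) - \psi(x_i, \bar{y}) \right]
\ge \Delta(y_i, \bar{y}) - \xi_i
\quad \forall i,\ \forall \bar{y} \in \mathcal{Y}
\]
% 1-slack OP: a single slack variable shared by all constraints
\[
\min_{w,\ \xi \ge 0}\
\frac{1}{2}\lVert w \rVert^2 + C \xi
\quad \text{s.t.}\quad
\frac{1}{n}\, w^\top \sum_{i=1}^{n}
  \left[ \psi(x_i, y_i) - \psi(x_i, \bar{y}_i) \right]
\ge \frac{1}{n} \sum_{i=1}^{n} \Delta(y_i, \bar{y}_i) - \xi
\quad \forall (\bar{y}_1, \dots, \bar{y}_n) \in \mathcal{Y}^n
\]
```

The 1-slack OP has far more constraints ($|\mathcal{Y}|^n$ instead of $n|\mathcal{Y}|$) but only one slack variable, which is what makes cutting plane training efficient on it.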
-
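1-Slack Formulation
Theorem 1. Any solution $w^*$ of the 1-slack OP is also a solution of the N-slack OP (and vice versa), with $\xi^* = \frac{1}{n} \sum_{i=1}^{n} \xi_i^*$. (proved below)
Proof sketch. For a fixed $w$, the optimal slacks of the N-slack OP are $\xi_i = \max_{\bar{y} \in \mathcal{Y}} \{ \Delta(y_i, \bar{y}) - w^\top [\psi(x_i, y_i) - \psi(x_i, \bar{y})] \}$, where $\Delta$ is the loss.
The optimal slack of the 1-slack OP is $\xi = \max_{(\bar{y}_1, \dots, \bar{y}_n)} \frac{1}{n} \sum_{i=1}^{n} \{ \Delta(y_i, \bar{y}_i) - w^\top [\psi(x_i, y_i) - \psi(x_i, \bar{y}_i)] \} = \frac{1}{n} \sum_{i=1}^{n} \xi_i$, because the maximum decomposes over $i$.
Therefore, the objective functions are equal for any $w$.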
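The equivalence can be checked numerically on a toy problem. This is a minimal sketch, assuming a three-class output space, a block feature map, the 0/1 loss, and arbitrary values for `w` and `data`; all names are hypothetical. It computes the optimal slacks of both problems for a fixed parameter vector by brute force.

```python
from itertools import product

K = 3  # number of classes in the toy output space (assumption)


def score(w, x, y):
    """Inner product of w with a block feature map: only block y of w touches x."""
    d = len(x)
    return sum(w[y * d + j] * x[j] for j in range(d))


def loss(y, y_bar):
    """0/1 loss between the correct output y and a candidate y_bar."""
    return 0.0 if y == y_bar else 1.0


def violation(w, x, y, y_bar):
    """Loss minus the score margin of the correct output over y_bar."""
    return loss(y, y_bar) - (score(w, x, y) - score(w, x, y_bar))


def n_slack_values(w, data):
    """Optimal slack of each example in the N-slack problem for fixed w."""
    return [max(violation(w, x, y, yb) for yb in range(K)) for x, y in data]


def one_slack_value(w, data):
    """Optimal slack of the 1-slack problem: maximize over all joint labelings."""
    n = len(data)
    return max(
        sum(violation(w, x, y, yb) for (x, y), yb in zip(data, y_bars)) / n
        for y_bars in product(range(K), repeat=n)
    )


data = [([2.0, 1.0], 0), ([0.0, 3.0], 1), ([-1.0, -1.0], 2), ([1.0, 1.0], 1)]
w = [0.5, -0.25, 0.0, 0.5, -0.5, 0.25]
xi_n = n_slack_values(w, data)
xi_1 = one_slack_value(w, data)
```

Here `xi_1` equals the mean of `xi_n`, so the two objectives agree at this `w`. The brute-force maximum over all joint labelings is only feasible because the example is tiny.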
-
1-Slack Formulation: Dual Problem
Lagrange Function
Dual Problem
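A sketch of the derivation for the 1-slack problem $\min \frac{1}{2}\lVert w\rVert^2 + C\xi$. The shorthand is introduced here as an assumption: for a joint labeling $\bar{\mathbf{y}} = (\bar{y}_1, \dots, \bar{y}_n)$, let $g_{\bar{\mathbf{y}}} = \frac{1}{n}\sum_i [\psi(x_i, y_i) - \psi(x_i, \bar{y}_i)]$ and $\Delta_{\bar{\mathbf{y}}} = \frac{1}{n}\sum_i \Delta(y_i, \bar{y}_i)$, so each constraint reads $w^\top g_{\bar{\mathbf{y}}} \ge \Delta_{\bar{\mathbf{y}}} - \xi$.

```latex
\[
L(w, \xi, \alpha, \mu)
= \frac{1}{2}\lVert w \rVert^2 + C \xi
- \sum_{\bar{\mathbf{y}}} \alpha_{\bar{\mathbf{y}}}
  \left[ w^\top g_{\bar{\mathbf{y}}} - \Delta_{\bar{\mathbf{y}}} + \xi \right]
- \mu \xi
\]
\[
\frac{\partial L}{\partial w} = 0
\ \Rightarrow\
w = \sum_{\bar{\mathbf{y}}} \alpha_{\bar{\mathbf{y}}}\, g_{\bar{\mathbf{y}}},
\qquad
\frac{\partial L}{\partial \xi} = 0
\ \Rightarrow\
\sum_{\bar{\mathbf{y}}} \alpha_{\bar{\mathbf{y}}} = C - \mu \le C
\]
\[
\max_{\alpha \ge 0}\
\sum_{\bar{\mathbf{y}}} \alpha_{\bar{\mathbf{y}}}\, \Delta_{\bar{\mathbf{y}}}
- \frac{1}{2} \sum_{\bar{\mathbf{y}}} \sum_{\bar{\mathbf{y}}'}
  \alpha_{\bar{\mathbf{y}}}\, \alpha_{\bar{\mathbf{y}}'}\,
  g_{\bar{\mathbf{y}}}^\top g_{\bar{\mathbf{y}}'}
\quad \text{s.t.}\quad
\sum_{\bar{\mathbf{y}}} \alpha_{\bar{\mathbf{y}}} \le C
\]
```

In cutting plane training only the labelings in the working set have nonzero $\alpha$, so the dual that is actually solved stays small.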
-
Cutting Plane Training
[Joachims et al. 2009]
-
Cutting Plane Training
Algorithm
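The separation step and stopping test of the 1-slack algorithm in [Joachims et al. 2009] (margin rescaling) can be sketched as below. This is not the full algorithm: the step that re-solves the QP over the working set is omitted. The toy setting is an assumption (three classes, block feature map, 0/1 loss), and all function names are hypothetical.

```python
def score(w, x, y):
    """Inner product of w with a block feature map: only block y of w touches x."""
    d = len(x)
    return sum(w[y * d + j] * x[j] for j in range(d))


def loss(y, y_bar):
    """0/1 loss between the correct output y and a candidate y_bar."""
    return 0.0 if y == y_bar else 1.0


def most_violated_outputs(w, data, num_classes):
    """For each example, the output maximizing loss plus score (loss-augmented argmax)."""
    return [
        max(range(num_classes), key=lambda yb: loss(y, yb) + score(w, x, yb))
        for x, y in data
    ]


def constraint_violation(w, data, y_bars):
    """Average loss minus average score margin for one joint labeling."""
    n = len(data)
    total_loss = sum(loss(y, yb) for (_, y), yb in zip(data, y_bars))
    total_margin = sum(
        score(w, x, y) - score(w, x, yb) for (x, y), yb in zip(data, y_bars)
    )
    return (total_loss - total_margin) / n


def should_stop(w, xi, data, num_classes, eps=1e-3):
    """Stop when the most violated constraint is violated by at most eps."""
    y_bars = most_violated_outputs(w, data, num_classes)
    return constraint_violation(w, data, y_bars) <= xi + eps


data = [([2.0, 1.0], 0), ([0.0, 3.0], 1), ([-1.0, -1.0], 2)]
w_zero = [0.0] * 6                            # starting point before any constraint is added
w_good = [5.0, 0.0, 0.0, 5.0, -5.0, -5.0]     # separates the toy data with a large margin
```

In the full algorithm the labeling returned by `most_violated_outputs` is added to the working set, the QP is re-solved over that set to update the parameter vector and slack, and the loop repeats until `should_stop` holds.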
-
Cutting Plane Training
; = ;M = 0
-
Loss functions
The SSVM formulated so far uses the hinge loss, which penalizes every incorrect output equally
For example, in natural language parsing, a parse tree that is almost correct and differs from the correct parse in only one or a few nodes should be treated differently from a parse tree that is completely different. [Tsochantaridis et al. 2005]
Margin rescaling and slack rescaling incorporate a loss function into SSVM
[Tsochantaridis et al. 2005]
-
Loss functions
n-slack formulation
Margin rescaling
Slack rescaling
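In the n-slack formulation of [Tsochantaridis et al. 2005], margin rescaling requires the score margin of the correct output over y to be at least Delta(y_i, y) - xi_i, while slack rescaling requires it to be at least 1 - xi_i / Delta(y_i, y). The sketch below computes the slack each variant needs for a fixed parameter vector; the function names and the numeric values are illustrative assumptions.

```python
def margin_rescaling_slack(candidates):
    """Smallest slack satisfying margin >= loss - slack for every candidate output."""
    return max(0.0, max(loss - margin for loss, margin in candidates))


def slack_rescaling_slack(candidates):
    """Smallest slack satisfying margin >= 1 - slack / loss for every candidate output."""
    return max(0.0, max(loss * (1.0 - margin) for loss, margin in candidates))


# Each candidate is a pair (loss of the output, score margin of the correct output over it).
almost_correct = (0.25, 0.5)    # e.g. a parse tree that differs in one node
completely_wrong = (1.0, 0.5)   # e.g. a completely different parse tree

margin_small = margin_rescaling_slack([almost_correct])
margin_large = margin_rescaling_slack([completely_wrong])
slack_small = slack_rescaling_slack([almost_correct])
slack_large = slack_rescaling_slack([completely_wrong])
```

With the same score margin of 0.5, both variants charge less for the almost correct output than for the completely wrong one, which the plain hinge loss does not do.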
-
Application: learning to rank
IR (Information Retrieval): for a query, return a ranking of documents
that places the relevant documents at the top
-
Application: learning to rank
Evaluation Measure
Average Precision (AP)
Loss Function
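Average precision for one ranked list can be computed as below, with 1 - AP as the loss, following the idea in [Yue et al. 2007]. This is a minimal sketch, assuming binary relevance labels listed in ranked order; the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Mean of precision@k over the ranks k that hold a relevant document."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ap_loss(ranked_relevance):
    """Loss of a ranking, defined as 1 - AP."""
    return 1.0 - average_precision(ranked_relevance)


perfect = [1, 1, 0, 0]   # both relevant documents ranked first
mixed = [1, 0, 1, 0]     # relevant documents at ranks 1 and 3
worst = [0, 0, 1, 1]     # relevant documents ranked last
```

AP depends on the whole ranking and does not decompose over individual documents, which is why a structured output formulation is used for this loss.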
-
Application: learning to rank
Notations
= Q, , |T|: :
:
;< = _1 ; >