Harnessing Deep Neural Networks with Logic Rules



TRANSCRIPT

  • Harnessing Deep Neural Networks with Logic Rules

    Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric P. Xing

    ACL 2016

    Read at the 8th State-of-the-Art NLP Study Group (最先端NLP勉強会), 2016/09/12; readers: 東学, 瀬翔

    Figures and tables in these slides are taken from [Hu+ 16].

  • Examples of logic rules

    Sentiment: if a sentence has the structure "A but B", the sentiment of the whole sentence follows that of B.

    NER: the tag I-ORG cannot immediately follow B-PER (a toy evaluation of this constraint is sketched below):

    equal(y_{i-1}, B-PER) ⇒ ¬ equal(y_i, I-ORG)
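
    As a concrete illustration (not from the slides or the paper's code; the function name and label strings are made up), the grounded NER rule above can be evaluated as a 0/1 truth value r(X, Y) roughly as follows:

        # Illustrative sketch: truth value of the grounded NER rule
        # "I-ORG must not immediately follow B-PER" on a tag sequence.
        def ner_rule_truth(tags):
            """Return 1.0 if the rule holds for the whole sequence, else 0.0."""
            for prev, curr in zip(tags, tags[1:]):
                if prev == "B-PER" and curr == "I-ORG":
                    return 0.0  # rule violated at this position
            return 1.0          # rule satisfied everywhere

        print(ner_rule_truth(["B-PER", "I-PER", "O"]))   # 1.0
        print(ner_rule_truth(["B-PER", "I-ORG", "O"]))   # 0.0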

  • Framework: distilling the rule knowledge into the network

    From the student network p(y|x) (e.g., a CNN), construct a teacher network q(y|x) that encodes the logic rules, then train p(y|x) to imitate q(y|x); the teacher is rebuilt from the current student at every iteration (a toy sketch of this loop follows below).

    Teacher construction follows posterior regularization [Ganchev+ 10].
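
    A minimal runnable sketch of this loop, under toy assumptions (a linear softmax "student", one made-up rule, random data; none of this is the paper's implementation):

        import numpy as np

        # Toy sketch of the rule-distillation loop (all values illustrative).
        rng = np.random.default_rng(0)
        num_classes, dim, N = 2, 5, 64
        W = rng.normal(scale=0.1, size=(dim, num_classes))   # "student" parameters
        X = rng.normal(size=(N, dim))                        # toy inputs
        Y = rng.integers(num_classes, size=N)                # toy labels
        pi, C, lam, lr = 0.5, 1.0, 1.0, 0.1                  # imitation weight, regularization, rule confidence, step size

        def softmax(z):
            z = z - z.max(axis=1, keepdims=True)
            e = np.exp(z)
            return e / e.sum(axis=1, keepdims=True)

        def rule_truth(X):
            # made-up grounded rule r(x, y): "predict class 1 whenever x[0] > 0"
            r = np.ones((X.shape[0], num_classes))
            r[X[:, 0] > 0, 0] = 0.0                           # label 0 violates the rule there
            return r

        for step in range(100):
            p = softmax(X @ W)                                # student p(y|x)
            q = p * np.exp(-C * lam * (1.0 - rule_truth(X)))  # teacher: project p toward the rule
            q /= q.sum(axis=1, keepdims=True)
            target = (1 - pi) * np.eye(num_classes)[Y] + pi * q  # mix true labels and teacher soft labels
            grad = X.T @ (p - target) / N                     # gradient of the combined cross-entropy
            W -= lr * grad

        print(softmax(X @ W)[:3])                             # student predictions after training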

  • Training the student network with the teacher q(y|x)

    The student parameters θ are learned from a weighted combination of the usual supervised loss and an imitation loss toward the teacher's soft predictions (a numeric example follows below):

    min_θ (1/N) Σ_{n=1}^{N} [ (1 − π) loss(y_n, σ_θ(x_n)) + π loss(q(y|x_n), σ_θ(x_n)) ]

    where x_n is a training example with true label y_n, σ_θ(x_n) is the student's prediction for x_n, and π balances the two terms.
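
    A quick numeric check of the objective for a single example, with made-up values, π = 0.5, and cross-entropy as the loss:

        import numpy as np

        pi = 0.5
        sigma = np.array([0.3, 0.7])   # student prediction sigma_theta(x_n)
        q     = np.array([0.1, 0.9])   # teacher soft prediction q(y|x_n)
        y     = np.array([0.0, 1.0])   # one-hot true label y_n

        cross_entropy = lambda target, pred: -np.sum(target * np.log(pred))
        objective = (1 - pi) * cross_entropy(y, sigma) + pi * cross_entropy(q, sigma)
        print(objective)   # weighted sum of supervised loss and imitation loss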

  • Advantages of distilling the rules into p(y|x) (excerpt from the paper):

    ...weights instead of relying on explicit rule representations, we can use p for predicting new examples at test time when the rule assessment is expensive or even unavailable (i.e., the privileged information setting (Lopez-Paz et al., 2016)) while still enjoying the benefit of integration. Besides, the second loss term in Eq.(2) can be augmented with rich unlabeled data in addition to the labeled examples, which enables semi-supervised learning for better absorbing the rule knowledge.

    3.3 Teacher Network Construction

    We now proceed to construct the teacher network q(y|x) at each iteration from p(y|x). The iteration index t is omitted for clarity. We adapt the posterior regularization principle in our logic constraint setting. Our formulation ensures a closed-form solution for q and thus avoids any significant increases in computational overhead.

    Recall the set of FOL rules R = {(R_l, λ_l)}_{l=1}^{L}. Our goal is to find the optimal q that fits the rules while at the same time staying close to p. For the first property, we apply a commonly-used strategy that imposes the rule constraints on q through an expectation operator. That is, for each rule (indexed by l) and each of its groundings (indexed by g) on (X, Y), we expect E_{q(Y|X)}[r_{l,g}(X, Y)] = 1, with confidence λ_l. The constraints define a rule-regularized space of all valid distributions. For the second property, we measure the closeness between q and p with KL-divergence, and wish to minimize it. Combining the two factors together and further allowing slackness for the constraints, we finally get the following optimization problem:

    min_{q, ξ ≥ 0}  KL( q(Y|X) ‖ p(Y|X) ) + C Σ_{l,g_l} ξ_{l,g_l}

    s.t.  λ_l ( 1 − E_q[ r_{l,g_l}(X, Y) ] ) ≤ ξ_{l,g_l},
          g_l = 1, ..., G_l,   l = 1, ..., L,                    (3)

    where ξ_{l,g_l} ≥ 0 is the slack variable for the respective logic constraint; and C is the regularization parameter. The problem can be seen as projecting p into the constrained subspace. The problem is convex and can be efficiently solved in its dual form with closed-form solutions. We provide the detailed derivation in the supplementary materials and directly give the solution here:

    q*(Y|X) ∝ p(Y|X) exp{ − Σ_{l,g_l} C λ_l ( 1 − r_{l,g_l}(X, Y) ) }
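
    A small Python sketch of this closed-form projection (the function name, array shapes, and example rule are illustrative assumptions, not the paper's code):

        import numpy as np

        def build_teacher(p, rule_truths, lambdas, C=1.0):
            """Closed-form teacher for one input x.
            p: (num_classes,) student distribution p(y|x).
            rule_truths: (L, num_classes) truth values r_l(x, y) of each grounded rule.
            lambdas: (L,) rule confidences lambda_l."""
            penalty = np.sum(lambdas[:, None] * (1.0 - rule_truths), axis=0)
            q = p * np.exp(-C * penalty)
            return q / q.sum()   # q(y|x) ∝ p(y|x) exp(-C Σ_l λ_l (1 - r_l(x, y)))

        # Example: one rule that is only satisfied by class 1.
        p = np.array([0.6, 0.4])
        r = np.array([[0.0, 1.0]])
        print(build_teacher(p, r, np.array([1.0]), C=2.0))   # mass shifts toward class 1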