Harnessing Deep Neural Networks with Logic Rules



TRANSCRIPT

  • Harnessing Deep Neural Networks with Logic Rules

    Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric P. Xing

    ACL 2016

    Read at the 8th State-of-the-Art NLP Study Group (最先端NLP勉強会), 2016/09/12; readers: 東学, 瀬翔

    Figures and tables in these slides are taken from [Hu+ 16].

  • Examples of logic rules

    Sentiment: if a sentence has the structure "A but B", the sentiment of the whole sentence follows that of B.

    NER: the tag I-ORG cannot immediately follow B-PER (a toy evaluation of this constraint is sketched below):

    equal(y_{i-1}, B-PER) ⇒ ¬ equal(y_i, I-ORG)
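
    As a concrete illustration (not from the slides or the paper's code; the function name and label strings are made up), the grounded NER rule above can be evaluated as a 0/1 truth value r(X, Y) roughly as follows:

        # Illustrative sketch: truth value of the grounded NER rule
        # "I-ORG must not immediately follow B-PER" on a tag sequence.
        def ner_rule_truth(tags):
            """Return 1.0 if the rule holds for the whole sequence, else 0.0."""
            for prev, curr in zip(tags, tags[1:]):
                if prev == "B-PER" and curr == "I-ORG":
                    return 0.0  # rule violated at this position
            return 1.0          # rule satisfied everywhere

        print(ner_rule_truth(["B-PER", "I-PER", "O"]))   # 1.0
        print(ner_rule_truth(["B-PER", "I-ORG", "O"]))   # 0.0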

  • Framework: distilling the rule knowledge into the network

    From the student network p(y|x) (e.g., a CNN), construct a teacher network q(y|x) that encodes the logic rules, then train p(y|x) to imitate q(y|x); the teacher is rebuilt from the current student at every iteration (a toy sketch of this loop follows below).

    Teacher construction follows posterior regularization [Ganchev+ 10].
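
    A minimal runnable sketch of this loop, under toy assumptions (a linear softmax "student", one made-up rule, random data; none of this is the paper's implementation):

        import numpy as np

        # Toy sketch of the rule-distillation loop (all values illustrative).
        rng = np.random.default_rng(0)
        num_classes, dim, N = 2, 5, 64
        W = rng.normal(scale=0.1, size=(dim, num_classes))   # "student" parameters
        X = rng.normal(size=(N, dim))                        # toy inputs
        Y = rng.integers(num_classes, size=N)                # toy labels
        pi, C, lam, lr = 0.5, 1.0, 1.0, 0.1                  # imitation weight, regularization, rule confidence, step size

        def softmax(z):
            z = z - z.max(axis=1, keepdims=True)
            e = np.exp(z)
            return e / e.sum(axis=1, keepdims=True)

        def rule_truth(X):
            # made-up grounded rule r(x, y): "predict class 1 whenever x[0] > 0"
            r = np.ones((X.shape[0], num_classes))
            r[X[:, 0] > 0, 0] = 0.0                           # label 0 violates the rule there
            return r

        for step in range(100):
            p = softmax(X @ W)                                # student p(y|x)
            q = p * np.exp(-C * lam * (1.0 - rule_truth(X)))  # teacher: project p toward the rule
            q /= q.sum(axis=1, keepdims=True)
            target = (1 - pi) * np.eye(num_classes)[Y] + pi * q  # mix true labels and teacher soft labels
            grad = X.T @ (p - target) / N                     # gradient of the combined cross-entropy
            W -= lr * grad

        print(softmax(X @ W)[:3])                             # student predictions after training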

  • Training the student network with the teacher q(y|x)

    The student parameters θ are learned from a weighted combination of the usual supervised loss and an imitation loss toward the teacher's soft predictions (a numeric example follows below):

    min_θ (1/N) Σ_{n=1}^{N} [ (1 − π) loss(y_n, σ_θ(x_n)) + π loss(q(y|x_n), σ_θ(x_n)) ]

    where x_n is a training example with true label y_n, σ_θ(x_n) is the student's prediction for x_n, and π balances the two terms.
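
    A quick numeric check of the objective for a single example, with made-up values, π = 0.5, and cross-entropy as the loss:

        import numpy as np

        pi = 0.5
        sigma = np.array([0.3, 0.7])   # student prediction sigma_theta(x_n)
        q     = np.array([0.1, 0.9])   # teacher soft prediction q(y|x_n)
        y     = np.array([0.0, 1.0])   # one-hot true label y_n

        cross_entropy = lambda target, pred: -np.sum(target * np.log(pred))
        objective = (1 - pi) * cross_entropy(y, sigma) + pi * cross_entropy(q, sigma)
        print(objective)   # weighted sum of supervised loss and imitation loss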

  • Advantages of distilling the rules into p(y|x) (excerpt from the paper):

    ...weights instead of relying on explicit rule representations, we can use p for predicting new examples at test time when the rule assessment is expensive or even unavailable (i.e., the privileged information setting (Lopez-Paz et al., 2016)) while still enjoying the benefit of integration. Besides, the second loss term in Eq.(2) can be augmented with rich unlabeled data in addition to the labeled examples, which enables semi-supervised learning for better absorbing the rule knowledge.

    3.3 Teacher Network Construction

    We now proceed to construct the teacher network q(y|x) at each iteration from p(y|x). The iteration index t is omitted for clarity. We adapt the posterior regularization principle in our logic constraint setting. Our formulation ensures a closed-form solution for q and thus avoids any significant increases in computational overhead.

    Recall the set of FOL rules R = {(R_l, λ_l)}_{l=1}^{L}. Our goal is to find the optimal q that fits the rules while at the same time staying close to p. For the first property, we apply a commonly-used strategy that imposes the rule constraints on q through an expectation operator. That is, for each rule (indexed by l) and each of its groundings (indexed by g) on (X, Y), we expect E_{q(Y|X)}[r_{l,g}(X, Y)] = 1, with confidence λ_l. The constraints define a rule-regularized space of all valid distributions. For the second property, we measure the closeness between q and p with KL-divergence, and wish to minimize it. Combining the two factors together and further allowing slackness for the constraints, we finally get the following optimization problem:

    min_{q, ξ ≥ 0}  KL( q(Y|X) ‖ p(Y|X) ) + C Σ_{l,g_l} ξ_{l,g_l}

    s.t.  λ_l ( 1 − E_q[ r_{l,g_l}(X, Y) ] ) ≤ ξ_{l,g_l},
          g_l = 1, ..., G_l,   l = 1, ..., L,                    (3)

    where ξ_{l,g_l} ≥ 0 is the slack variable for the respective logic constraint; and C is the regularization parameter. The problem can be seen as projecting p into the constrained subspace. The problem is convex and can be efficiently solved in its dual form with closed-form solutions. We provide the detailed derivation in the supplementary materials and directly give the solution here:

    q*(Y|X) ∝ p(Y|X) exp{ − Σ_{l,g_l} C λ_l ( 1 − r_{l,g_l}(X, Y) ) }
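
    A small Python sketch of this closed-form projection (the function name, array shapes, and example rule are illustrative assumptions, not the paper's code):

        import numpy as np

        def build_teacher(p, rule_truths, lambdas, C=1.0):
            """Closed-form teacher for one input x.
            p: (num_classes,) student distribution p(y|x).
            rule_truths: (L, num_classes) truth values r_l(x, y) of each grounded rule.
            lambdas: (L,) rule confidences lambda_l."""
            penalty = np.sum(lambdas[:, None] * (1.0 - rule_truths), axis=0)
            q = p * np.exp(-C * penalty)
            return q / q.sum()   # q(y|x) ∝ p(y|x) exp(-C Σ_l λ_l (1 - r_l(x, y)))

        # Example: one rule that is only satisfied by class 1.
        p = np.array([0.6, 0.4])
        r = np.array([[0.0, 1.0]])
        print(build_teacher(p, r, np.array([1.0]), C=2.0))   # mass shifts toward class 1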