Post on 17-Feb-2017
-
Harnessing Deep Neural Networks with Logic Rules
Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric P. Xing
ACL2016
16/09/12 8NLP
[Hu+ 16]
-
Example rules:
- Sentiment: a sentence of the form "A but B" takes the sentiment of clause B.
- NER: the tag I-ORG cannot follow B-PER:
  equal(y_{i-1}, B-PER) ⇒ ¬equal(y_i, I-ORG)
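Such a rule grounding can be scored as a soft truth value in [0, 1]; a minimal sketch (the function name and return convention are illustrative, not from the paper):

```python
def transition_rule(prev_label, cur_label):
    """Grounding of: equal(y_{i-1}, B-PER) => not equal(y_i, I-ORG).

    Returns a soft truth value in [0, 1]; 1.0 means the rule is satisfied.
    """
    if prev_label == "B-PER" and cur_label == "I-ORG":
        return 0.0
    return 1.0
```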
-
The student network p(y|x) (a CNN) is distilled toward a teacher network q(y|x) that encodes the logic rules; q(y|x) is built by projecting p(y|x) into the rule-constrained subspace.
Posterior regularization [Ganchev+ 10]
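The whole teacher-student loop alternates building q from p and moving the student toward a mix of gold labels and teacher outputs; a toy end-to-end sketch for a linear softmax student (the gradient-descent student, the single rule per example, and all names are illustrative assumptions, not the authors' code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def distill(X, y_onehot, r, C=6.0, lam=1.0, pi=0.5, lr=1.0, steps=200):
    """Toy rule-distillation loop for a linear softmax student.

    X:        (N, D) input features
    y_onehot: (N, K) gold labels
    r:        (N, K) soft rule truth value for each candidate label
    """
    W = np.zeros((X.shape[1], y_onehot.shape[1]))
    for _ in range(steps):
        p = softmax(X @ W)                     # student predictions
        q = p * np.exp(-C * lam * (1.0 - r))   # project onto the rule
        q /= q.sum(axis=1, keepdims=True)      # teacher q(y|x)
        target = (1 - pi) * y_onehot + pi * q  # mixed training signal
        W -= lr * X.T @ (p - target) / len(X)  # cross-entropy gradient
    return W
```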
-
Student update: fit both the true labels and the teacher's soft predictions q(y|x):

min_θ  (1/N) Σ_{n=1}^{N} [ (1 − π) loss(y_n, σ_θ(x_n)) + π loss(q(y|x_n), σ_θ(x_n)) ]

where σ_θ(x_n) is the student's prediction on input x_n and π balances the two loss terms.
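The objective above can be written directly in code; a minimal NumPy sketch (shapes and names are assumptions, not the authors' implementation):

```python
import numpy as np

def student_loss(p_student, y_true, q_teacher, pi):
    """(1 - pi) * CE against hard labels + pi * CE against teacher soft labels.

    p_student: (N, K) student predictive probabilities sigma_theta(x_n)
    y_true:    (N,)   gold label indices y_n
    q_teacher: (N, K) teacher soft predictions q(y | x_n)
    pi:        imitation weight in [0, 1]
    """
    n = len(y_true)
    ce_hard = -np.log(p_student[np.arange(n), y_true])
    ce_soft = -np.sum(q_teacher * np.log(p_student), axis=1)
    return float(np.mean((1 - pi) * ce_hard + pi * ce_soft))
```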
-
p(y|x)
weights instead of relying on explicit rule representations, we can use p for predicting new examples at test time when the rule assessment is expensive or even unavailable (i.e., the privileged information setting (Lopez-Paz et al., 2016)) while still enjoying the benefit of integration. Besides, the second loss term in Eq.(2) can be augmented with rich unlabeled data in addition to the labeled examples, which enables semi-supervised learning for better absorbing the rule knowledge.
3.3 Teacher Network Construction

We now proceed to construct the teacher network q(y|x) at each iteration from p(y|x). The iteration index t is omitted for clarity. We adapt the posterior regularization principle in our logic constraint setting. Our formulation ensures a closed-form solution for q and thus avoids any significant increases in computational overhead.
Recall the set of FOL rules R = {(R_l, λ_l)}_{l=1}^{L}. Our goal is to find the optimal q that fits the rules while at the same time staying close to p. For the first property, we apply a commonly-used strategy that imposes the rule constraints on q through an expectation operator. That is, for each rule (indexed by l) and each of its groundings (indexed by g) on (X, Y), we expect E_{q(Y|X)}[r_{l,g}(X, Y)] = 1, with confidence λ_l. The constraints define a rule-regularized space of all valid distributions. For the second property, we measure the closeness between q and p with KL-divergence, and wish to minimize it. Combining the two factors together and further allowing slackness for the constraints, we finally get the following optimization problem:
min_{q, ξ ≥ 0}  KL(q(Y|X) || p(Y|X)) + C Σ_{l,g_l} ξ_{l,g_l}
s.t.  λ_l (1 − E_q[r_{l,g_l}(X, Y)]) ≤ ξ_{l,g_l},
      g_l = 1, ..., G_l,  l = 1, ..., L,
(3)
where ξ_{l,g_l} ≥ 0 is the slack variable for the respective logic constraint; and C is the regularization parameter. The problem can be seen as projecting p into the constrained subspace. The problem is convex and can be efficiently solved in its dual form with closed-form solutions. We provide the detailed derivation in the supplementary materials and directly give the solution here:
q(Y |X) / p(Y |X) exp
8
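The closed-form projection amounts to exponentiating the rule penalties and renormalizing; a toy sketch for one rule and one grounding per candidate label (names are illustrative, not from the paper):

```python
import numpy as np

def project_teacher(p, rule_values, C, lam):
    """q(y|x) proportional to p(y|x) * exp(-C * lam * (1 - r(x, y))).

    p:           (K,) student distribution over K candidate labels
    rule_values: (K,) soft truth value r(x, y) for each candidate label
    C:           regularization strength
    lam:         rule confidence lambda_l
    """
    logq = np.log(p) - C * lam * (1.0 - rule_values)
    q = np.exp(logq - logq.max())  # stabilize before normalizing
    return q / q.sum()
```

With C = 0 the constraint is switched off and q reduces to p; larger C pushes mass away from labels that violate the rule.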