Learning Sets of Rules
2004.12, edited by Wang Yinglin, Shanghai Jiaotong University


Page 1

Learning Sets of Rules

2004.12

Edited by Wang Yinglin

Shanghai Jiaotong University

Page 2

Note: library resources

www.lib.sjtu.edu.cn

Electronic databases in the library collection:

Kluwer

Elsevier

IEEE

….

China Journal Net, Wanfang Data, …

Page 3

Introduction

Sets of "if-then" rules: the hypothesis is easy to interpret.

Goal: look at new methods for learning rules.

Rules:

Propositional rules (rules without variables)

First-order predicate rules (rules with variables)

Page 4

Introduction

So far:
Method 1: learn a decision tree, then convert it to rules
Method 2: genetic algorithm, encoding the rule set as a bit string

From now on, a new method: learning first-order rules using sequential covering.

First-order rules are difficult to represent using a decision tree or other propositional representation, e.g.:

IF Parent(x, y) THEN Ancestor(x, y)

IF Parent(x, z) AND Ancestor(z, y) THEN Ancestor(x, y)

Page 5

Introduction

Contents

Introduction

Sequential Covering Algorithms

First-Order Rules

First-Order Inductive Learning (FOIL)

Induction as Inverted Deduction

Summary

Page 6

Sequential Covering Algorithms

Algorithm:

1. Learn one rule that covers a certain number of positive examples.

2. Remove the examples covered by that rule.

3. Repeat until no positive examples are left.

Page 7

Sequential Covering Algorithms

Each rule is required to have high accuracy, but not necessarily high coverage.

High accuracy: whatever the rule predicts should be correct.

Accepting low coverage: the rule need not make a prediction for every training example.

Page 8

Sequential Covering Algorithms

SEQUENTIAL-COVERING(Target_attribute, Attributes, Examples, Threshold)

Learned_rules ← {}

Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)

WHILE PERFORMANCE(Rule, Examples) > Threshold DO

  Learned_rules ← Learned_rules + Rule    // add the new rule to the set

  Examples ← Examples - {examples correctly classified by Rule}

  Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)

Learned_rules ← SORT-BY-PERFORMANCE(Learned_rules, Target_attribute, Examples)

RETURN Learned_rules
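
As a concrete illustration, here is a minimal Python sketch of the loop above. It is a sketch under assumptions: the learn_one_rule and performance callables, and the rule's correctly_classifies method, are hypothetical interfaces supplied by the caller, not something defined in these slides.

def sequential_covering(target_attr, attributes, examples,
                        learn_one_rule, performance, threshold):
    # Greedy covering loop: learn a rule, drop what it covers, repeat.
    learned_rules = []
    rule = learn_one_rule(target_attr, attributes, examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # Keep only the examples the rule does NOT correctly classify.
        examples = [ex for ex in examples
                    if not rule.correctly_classifies(ex)]
        rule = learn_one_rule(target_attr, attributes, examples)
    # Consult the most accurate rules first at classification time.
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules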

Page 9

Sequential Covering Algorithms

One of the most widespread approaches to learning disjunctive sets of rules.

It reduces the problem of learning a disjunctive set of rules to a sequence of single-rule problems (each rule: a conjunction of attribute-value tests).

It performs a greedy search (no backtracking); as such, it may not find an optimal rule set.

It sequentially covers the set of positive examples until the performance of a rule falls below a threshold.

Page 10

Sequential Covering Algorithms : General to Specific Beam Search

How do we learn each individual rule?

Requirements for LEARN-ONE-RULE: high accuracy, but not necessarily high coverage.

One approach is to proceed as in decision tree learning (ID3), BUT following only the single branch with the best performance at each search step.

Page 11

Sequential Covering Algorithms : General to Specific Beam Search

Search tree (the most general rule at the top; each step adds one attribute test):

IF {} THEN Play-Tennis = Yes

First-level specializations:
IF {Humidity = Normal} THEN Play-Tennis = Yes
IF {Humidity = High} THEN Play-Tennis = No
IF {Wind = Strong} THEN Play-Tennis = No
IF {Wind = Light} THEN Play-Tennis = Yes

Specializations of IF {Humidity = Normal}:
IF {Humidity = Normal, Outlook = Sunny} THEN Play-Tennis = Yes
IF {Humidity = Normal, Outlook = Rain} THEN Play-Tennis = Yes
IF {Humidity = Normal, Wind = Strong} THEN Play-Tennis = Yes
IF {Humidity = Normal, Wind = Light} THEN Play-Tennis = Yes

Page 12

• Idea: organize the hypothesis-space search in a general-to-specific fashion.

• Start with the most general rule precondition, then greedily add the attribute test that most improves performance measured over the training examples.

Sequential Covering Algorithms : General to Specific Beam Search

Page 13

Sequential Covering Algorithms : General to Specific Beam Search

Greedy search without backtracking: there is a danger of a suboptimal choice at any step.

The algorithm can be extended to a beam search: keep a list of the k best candidates at each step.

On each search step, descendants are generated for each of these k best candidates, and the resulting set is again reduced to the k best candidates.

Page 14

Sequential Covering Algorithms : General to Specific Beam Search

LEARN-ONE-RULE(Target_attribute, Attributes, Examples, k)

Best_hypothesis ← Ø (the most general hypothesis)
Candidate_hypotheses ← {Best_hypothesis}
WHILE Candidate_hypotheses is not empty DO
  1. Generate the next, more specific candidate hypotheses.
  2. Update Best_hypothesis:
     FOR ALL h in New_candidate_hypotheses:
       IF Performance(h) > Performance(Best_hypothesis) THEN Best_hypothesis ← h
  3. Update Candidate_hypotheses ← the best k members of New_candidate_hypotheses.
RETURN the rule "IF Best_hypothesis THEN prediction"
  (prediction: the most frequent value of Target_attribute among the examples covered by Best_hypothesis)

PERFORMANCE(h, Examples, Target_attribute)
  h_examples ← the subset of Examples covered by h
  RETURN -Entropy(h_examples)
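
To make the beam search concrete, here is a minimal Python sketch. The representation is an assumption made for illustration: a hypothesis is a dict of attribute = value constraints, an example is a dict of attribute values, and duplicate candidates are not pruned.

import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of target values.
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def covers(h, example):
    # An example is covered if it satisfies every (attribute = value) constraint.
    return all(example[a] == v for a, v in h.items())

def performance(h, examples, target_attr):
    covered = [ex[target_attr] for ex in examples if covers(h, ex)]
    return -entropy(covered) if covered else float("-inf")

def learn_one_rule(target_attr, attributes, examples, k):
    best = {}                      # most general hypothesis: empty precondition
    candidates = [best]
    while candidates:
        # 1. Generate the next, more specific candidate hypotheses.
        new_candidates = [{**h, a: v}
                          for h in candidates
                          for a in attributes if a not in h
                          for v in {ex[a] for ex in examples}]
        # 2. Update the best hypothesis.
        for h in new_candidates:
            if performance(h, examples, target_attr) > \
               performance(best, examples, target_attr):
                best = h
        # 3. Keep only the k best candidates.
        new_candidates.sort(key=lambda h: performance(h, examples, target_attr),
                            reverse=True)
        candidates = new_candidates[:k]
    return best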

Page 15

Generate the next, more specific candidate hypotheses:

• All_constraints ← the set of all constraints of the form (a = v), where a is a member of Attributes and v is a value of a that occurs in the current Examples set.

• New_candidate_hypotheses ←
    for each h in Candidate_hypotheses, and
    for each c in All_constraints:
      create a specialization of h by adding the constraint c.

• Remove from New_candidate_hypotheses any hypotheses that are duplicates, inconsistent, or not maximally specific.

Sequential Covering Algorithms : General to Specific Beam Search

Page 16

Day Outlook Temp Humid Wind PlayTennis

D1 Sunny Hot High Weak No

D2 Sunny Hot High Strong No

D3 Overcast Hot High Weak Yes

D4 Rain Mild High Weak Yes

D5 Rain Cool Low Weak Yes

D6 Rain Cool Low Strong No

D7 Overcast Cool Low Strong Yes

D8 Sunny Mild High Weak No

D9 Sunny Cool Low Weak Yes

D10 Rain Mild Low Weak Yes

D11 Sunny Mild Low Strong Yes

D12 Overcast Mild High Strong Yes

D13 Overcast Hot Low Weak Yes

D14 Rain Mild High Strong No

1. best-hypothesis = IF T THEN PlayTennis(x) = Yes

Learn-One-Rule Example

Page 17

(Training data: the same PlayTennis table as on page 16.)

1. best-hypothesis = IF T THEN PlayTennis(x) = Yes

2. candidate-hypotheses = {best-hypothesis}

Learn-One-Rule Example

Page 18

(Training data: the same PlayTennis table as on page 16.)

1. best-hypothesis = IF T THEN PlayTennis(x) = Yes

2. candidate-hypotheses = {best-hypothesis}

3. all-constraints = {Outlook(x)=Sunny, Outlook(x)=Overcast, Temp(x)=Hot, ......}

Learn-One-Rule Example

Page 19

(Training data: the same PlayTennis table as on page 16.)

1. best-hypothesis = IF T THEN PlayTennis(x) = Yes

2. candidate-hypotheses = {best-hypothesis}

3. all-constraints = {Outlook(x)=Sunny, Outlook(x)=Overcast, Temp(x)=Hot, ......}

4. new-candidate-hypotheses = {IF Outlook=Sunny THEN PlayTennis=YES, IF Outlook=Overcast THEN PlayTennis=YES, ...}

Learn-One-Rule Example

Page 20

(Training data: the same PlayTennis table as on page 16.)

1. best-hypothesis = IF T THEN PlayTennis(x) = Yes

2. candidate-hypotheses = {best-hypothesis}

3. all-constraints = {Outlook(x)=Sunny, Outlook(x)=Overcast, Temp(x)=Hot, ......}

4. new-candidate-hypotheses = {IF Outlook=Sunny THEN PlayTennis=YES, IF Outlook=Overcast THEN PlayTennis=YES, ...}

5. best-hypothesis = IF Outlook=Sunny THEN PlayTennis=YES

Learn-One-Rule Example

Page 21

Sequential Covering Algorithms : Variations

Learn rules that cover only positive examples: when the fraction of positive examples is small, the algorithm can be modified to learn only from those rare examples. Instead of entropy, use a measure that evaluates the fraction of positive examples covered by the hypothesis.

AQ algorithm:

A different covering algorithm: it searches rule sets for one particular target value at a time.

A different single-rule algorithm: the search is guided by single uncovered positive examples, and only attribute values satisfied by that example are used.

Page 22

Summary : Points for Consideration

Key design issues for learning sets of rules:

Sequential or simultaneous? Sequential: isolate components of the hypothesis, learning one rule at a time. Simultaneous: learn the whole hypothesis at once.

General-to-specific or specific-to-general? GS: learn-one-rule. SG: Find-S.

Generate-and-test or example-driven? G&T: search through syntactically legal hypotheses. E-D: Find-S, Candidate-Elimination.

Post-pruning of rules? A popular method for recovering from overfitting.

Page 23

Summary : Points for Consideration

What statistical evaluation method?

Relative frequency:

nc / n (n: examples matched by the rule; nc: examples classified correctly by the rule)

m-estimate of accuracy:

(nc + m·p) / (n + m)

p: the prior probability that a randomly drawn example has the classification assigned by the rule

m: the weight (the equivalent number of examples for weighting this prior)

Entropy:

Entropy(S) = -Σ(i=1..c) pi log2 pi
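
For reference, here are the three evaluation functions written out in Python, a direct transcription of the formulas above:

import math

def relative_frequency(n_c, n):
    # n: examples matched by the rule; n_c: those classified correctly.
    return n_c / n

def m_estimate(n_c, n, p, m):
    # m-estimate of accuracy with prior p and equivalent sample size m.
    return (n_c + m * p) / (n + m)

def entropy(proportions):
    # Entropy given the class proportions p_1 .. p_c of the covered examples.
    return -sum(p * math.log2(p) for p in proportions if p > 0)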

Page 24

Learning first-order rules

From now on, we consider learning rules that contain variables (first-order rules).

Inductive learning of first-order rules is known as inductive logic programming (ILP); it can be viewed as automatically inferring Prolog programs.

Two methods are considered:

FOIL

Induction as inverted deduction

Page 25

Learning first-order rules

First-order rules: rules that contain variables.

Example:

Ancestor(x, y) ← Parent(x, y)

Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)   (recursive)

First-order rules are more expressive than propositional rules. Compare:

IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = True) THEN Daughter1,2 = True

IF Father(y, x) ∧ Female(y) THEN Daughter(x, y)

Page 26

Learning first-order rules : Terminology

Constants: e.g., John, Kansas, 42

Variables: e.g., Name, State, x

Predicates: e.g., Father-Of, Greater-Than

Functions: e.g., age, cosine

Term: a constant, a variable, or function(term)

Literals (atoms): Predicate(term) or its negation, e.g., Greater-Than(age(John), 42), Female(Mary), ¬Female(x)

Clause: a disjunction of literals with implicit universal quantification, e.g., ∀x: Female(x) ∨ Male(x)

Horn clause: a clause with at most one positive literal:

H ∨ ¬L1 ∨ ¬L2 ∨ … ∨ ¬Ln

Page 27

Learning first-order rules :Terminology

First-order Horn clauses: rules that have one or more preconditions and a single consequent; the predicates may contain variables.

The following forms of a Horn clause are equivalent:

H ∨ ¬L1 ∨ … ∨ ¬Ln

H ← (L1 ∧ … ∧ Ln)

IF L1 ∧ … ∧ Ln THEN H

Page 28

Learning first-order rules :First-Order Inductive Learning (FOIL)

A natural extension of "sequential covering + learn-one-rule".

FOIL rules are similar to Horn clauses, with two exceptions:

Syntactic restriction: no function symbols are allowed.

More expressive than Horn clauses: negated literals are allowed in rule bodies.

Page 29

Learning first-order rules : First-Order Inductive Learning (FOIL)

FOIL(Target_predicate, Predicates, Examples)

Pos ← those Examples for which Target_predicate is True
Neg ← those Examples for which Target_predicate is False

Learned_rules ← {}

WHILE Pos is not empty DO

  NewRule ← the rule that predicts Target_predicate with no preconditions
  NewRuleNeg ← Neg

  WHILE NewRuleNeg is not empty DO

    Candidate_literals ← candidate new literals for NewRule
    Best_literal ← argmax over L of Foil_Gain(L, NewRule)
    Add Best_literal to the preconditions of NewRule
    NewRuleNeg ← the subset of NewRuleNeg that satisfies NewRule's preconditions

  Learned_rules ← Learned_rules + NewRule
  Pos ← Pos - {members of Pos covered by NewRule}

RETURN Learned_rules

Page 30

Learning first-order rules :First-Order Inductive Learning (FOIL)

FOIL learns rules only for the case where the target literal is True (cf. the propositional sequential covering, which learns rules for both True and False values).

Outer loop: add a new rule to the disjunctive hypothesis; each new rule generalizes the current hypothesis (a specific-to-general search).

Inner loop: find the best conjunction of preconditions; a general-to-specific search on each rule, starting with an empty precondition and adding one literal at a time (hill climbing).

Cf. the earlier learn-one-rule, which performs a beam search.

Page 31

FOIL: Generating Candidate Specializations in FOIL

Generate new literals, each of which may be added to the rule's preconditions.

Current rule: P(x1, x2, …, xk) ← L1 ∧ … ∧ Ln

Add a new literal Ln+1 to get a more specific Horn clause.

Forms of the new literal:

Q(v1, v2, …, vr), where Q is a predicate in Predicates and at least one vi is a variable already present in the rule

Equal(xj, xk), where xj and xk are variables already present in the rule

The negation of either of the above

Page 32

FOIL : Guiding the Search in FOIL

Consider all possible bindings (substitutions of variables): prefer rules that possess more positive bindings.

Foil_Gain(L, R), where:

L: the candidate literal to add to rule R

p0: number of positive bindings of R

n0: number of negative bindings of R

p1: number of positive bindings of R + L

n1: number of negative bindings of R + L

t: number of positive bindings of R that are still covered by R + L

Foil_Gain(L, R) = t · ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )

It is based on the numbers of positive and negative bindings covered before and after adding the new literal.
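
As a sanity check, the gain formula above transcribes directly into Python; the binding counts are assumed to be supplied by the caller:

import math

def foil_gain(t, p0, n0, p1, n1):
    # t: positive bindings of R still covered after adding L
    # p0, n0: positive / negative bindings of R
    # p1, n1: positive / negative bindings of R + L
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))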

Page 33

FOIL : Examples

Example

Target literal: GrandDaughter(x, y)

Training examples:
GrandDaughter(Victor, Sharon)
Father(Sharon, Bob)   Father(Tom, Bob)
Female(Sharon)        Father(Bob, Victor)

Initial step: GrandDaughter(x, y) ←

Positive binding: {x/Victor, y/Sharon}

Negative bindings: all others

Page 34

FOIL : Examples

Candidate additions to the rule preconditions: Equal(x, y), Female(x), Female(y), Father(x, y), Father(y, x), Father(x, z), Father(z, x), Father(y, z), Father(z, y), and their negations.

For each candidate, calculate Foil_Gain.

If Father(y, z) has the maximum value of Foil_Gain, select it and add it to the preconditions of the rule:
GrandDaughter(x, y) ← Father(y, z)

Iterating, we add the best candidate literal and continue adding literals until we generate a rule like the following:

GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)

At this point we remove all positive examples covered by the rule and begin the search for a new rule.

Page 35

FOIL : Learning recursive rules sets

Predicate occurs in rule head.

Example

Ancestor (x, y) Parent (x, z) Ancestor (z, y).

Rule: IF Parent (x, z) Ancestor (z, y) THEN Ancestor (x, y)

Learning recursive rule from relation

Given: appropriate set of training examples

Can learn using FOIL-based search

Requirement: Ancestor Predicates

Recursive rules still have to outscore competing candidates at FOIL-Gain

How to ensure termination? (i.e. no infinite recursion)

[Quinlan, 1990; Cameron-Jones and Quinlan, 1993]

Page 36

Induction: inference from the specific to the general.

Deduction: inference from the general to the specific.

Induction can be cast as a deduction problem:

(∀ <xi, f(xi)> ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)

D: a set of training data

B: background knowledge

xi: the i-th training instance

f(xi): the target value

X ⊢ Y: "Y follows deductively from X", or "X entails Y"

For every training instance xi, the target value f(xi) must follow deductively from B, h, and xi.

Page 37

Induction as inverted deduction

Learn target : Child(u,v) : child of u is v

Positive example : Child(Bob, Sharon)

Given instance: Male(Bob), Female(Sharon), Father(Sharon,Bob)

Background knowledge : Parent(u,v) Father(u,v)

Hypothesis satisfying the (B h x∧ ∧ i)├ f(xi)

h1 : Child(u, v) ←Father(v, u) : no need of B

h2 : Child(u, v) ←Parent(v, u) : need B

The role of Background KnowledgeExpanding the set of hypotheses

New predicates (Parent) can be introduced into hypotheses(h2)

Page 38

Induction as inverted deduction

In view of induction as the inverse of deduction

Inverse entailment operators is required

O(B, D) = h such that ( < x∀ i, f(xi) > D) (B h x∈ ∧ ∧ i)├ f(xi)

Input : training data D = {< xi, f(xi)>}

background knowledge B

Output : a hypothesis h

Page 39

Induction as inverted deduction

Attractive features of this formulation of the learning task:

1. This formulation subsumes the common definition of learning.

2. By incorporating the notion of background knowledge B, it allows a richer definition of when a hypothesis fits the data.

3. By incorporating B, it invites learning methods that use B to guide the search for h.

Page 40

Induction as inverted deduction

Practical difficulties of this formulation:

1. Its requirements do not naturally accommodate noisy training data.

2. The language of first-order logic is so expressive that the number of hypotheses satisfying the formulation is very large.

3. In most ILP systems, the complexity of the hypothesis-space search increases as B is increased.

Page 41

Inverting Resolution

Resolution rule:

P ∨ L
¬L ∨ R
──────
P ∨ R     (L: a literal; P, R: clauses)

Resolution operator (propositional form):

Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2.

Form the resolvent C by including all literals from C1 and C2, except for L and ¬L. More precisely, the set of literals occurring in the conclusion C is:

C = (C1 - {L}) ∪ (C2 - {¬L})
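
A minimal Python sketch of this operator, under an assumed representation: a clause is a frozenset of string literals, with a leading "~" marking negation.

def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(c1, c2):
    # All resolvents of clauses c1 and c2: drop L and ~L, union the rest.
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

c1 = frozenset({"PassExam", "~KnowMaterial"})
c2 = frozenset({"KnowMaterial", "~Study"})
print(resolve(c1, c2))   # [frozenset({'PassExam', '~Study'})] (order may vary)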

Page 42

Inverting Resolution

Example 1:

C1: PassExam ∨ ¬KnowMaterial
C2: KnowMaterial ∨ ¬Study
C:  PassExam ∨ ¬Study

Example 2:

C1: A ∨ B ∨ C ∨ ¬D
C2: ¬B ∨ E ∨ F
C:  A ∨ C ∨ ¬D ∨ E ∨ F

Page 43

Inverting Resolution

O(C, C1): performs inductive inference.

Inverse resolution operator (propositional form):

Given initial clauses C1 and C, find a literal L that occurs in clause C1, but not in clause C.

Form the second clause C2 by including the following literals:

C2 = (C - (C1 - {L})) ∪ {¬L}
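
A matching sketch of the inverse operator, reusing the clause representation and the negate helper from the resolution sketch above; it returns one candidate per choice of L, reflecting the nondeterminism discussed on the next page.

def inverse_resolve(c, c1):
    # Given resolvent c and clause c1, propose candidate clauses c2.
    candidates = []
    for lit in c1:
        if lit not in c:
            candidates.append((c - (c1 - {lit})) | {negate(lit)})
    return candidates

c  = frozenset({"PassExam", "~Study"})
c1 = frozenset({"PassExam", "~KnowMaterial"})
print(inverse_resolve(c, c1))   # [frozenset({'KnowMaterial', '~Study'})]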

Page 44

Inverting Resolution

Example 1:

C1: PassExam ∨ ¬KnowMaterial
C2: KnowMaterial ∨ ¬Study
C:  PassExam ∨ ¬Study

Example 2:

C1: B ∨ D,  C: A ∨ B
C2: A ∨ ¬D   (but why not C2: A ∨ ¬D ∨ B ?)

Inverse resolution is nondeterministic.

Page 45

Inverting Resolution : First-Order Resolution

First-order resolution:

Substitution: a mapping of variables to terms.
Example: θ = {x/Bob, z/y}

Unifying substitution: for two literals L1 and L2, a substitution θ such that L1θ = L2θ.
Example: θ = {x/Bill, z/y}
L1 = Father(x, y), L2 = Father(Bill, z)
L1θ = L2θ = Father(Bill, y)
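
A toy illustration of applying a substitution, under an assumed representation: a literal is a predicate name plus a tuple of argument strings, and lowercase names are variables.

def apply_subst(literal, theta):
    # Apply substitution theta (a dict mapping variable -> term) to a literal.
    name, args = literal
    return (name, tuple(theta.get(a, a) for a in args))

L1 = ("Father", ("x", "y"))
L2 = ("Father", ("Bill", "z"))
theta = {"x": "Bill", "z": "y"}
print(apply_subst(L1, theta))   # ('Father', ('Bill', 'y'))
print(apply_subst(L2, theta))   # ('Father', ('Bill', 'y'))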

Page 46

Inverting Resolution : First-Order Resolution

Resolution operator (first-order form):

Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ.

Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and ¬L2θ. More precisely, the set of literals occurring in the conclusion C is:

C = (C1 - {L1})θ ∪ (C2 - {L2})θ

Page 47

Inverting Resolution : First-Order Resolution

Example: C1 = White(x) ← Swan(x), C2 = Swan(Fred)

C1 = White(x) ∨ ¬Swan(x)

L1 = ¬Swan(x), L2 = Swan(Fred)

Unifying substitution: θ = {x/Fred},

then L1θ = ¬L2θ = ¬Swan(Fred)

(C1 - {L1})θ = White(Fred)

(C2 - {L2})θ = Ø

∴ C = White(Fred)

Page 48

Inverting Resolution : First-order case

Inverse resolution, first-order case:

C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2

(where θ = θ1θ2, a factorization)

C - (C1 - {L1})θ1 = (C2 - {L2})θ2

(where L2 = ¬L1θ1θ2⁻¹)

∴ C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}

Page 49

Inverting Resolution : First-order case

Multistep inverse resolution (the forward resolution steps, which inverse resolution runs backwards):

Father(Tom, Bob) resolves with GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y) under {Bob/y, Tom/z}, giving:

GrandChild(Bob, x) ∨ ¬Father(x, Tom)

Father(Shannon, Tom) resolves with GrandChild(Bob, x) ∨ ¬Father(x, Tom) under {Shannon/x}, giving:

GrandChild(Bob, Shannon)

Page 50

Inverting Resolution

C = GrandChild(Bob, Shannon)

C1 = Father(Shannon, Tom)

L1 = Father(Shannon, Tom)

Suppose we choose the inverse substitutions θ1⁻¹ = {} and θ2⁻¹ = {Shannon/x}.

(C - (C1 - {L1})θ1)θ2⁻¹ = (Cθ1)θ2⁻¹ = GrandChild(Bob, x)

{¬L1θ1θ2⁻¹} = {¬Father(x, Tom)}

∴ C2 = GrandChild(Bob, x) ∨ ¬Father(x, Tom),
or equivalently GrandChild(Bob, x) ← Father(x, Tom)

Page 51

Inverse resolution: summary

Inverse resolution provides a general way to automatically generate hypotheses h that satisfy (∀ <xi, f(xi)> ∈ D) (B ∧ h ∧ xi) ⊢ f(xi).

Differences from FOIL:

Inverse resolution is example-driven; FOIL is generate-and-test.

Inverse resolution considers only a small fraction of the available data at each step; FOIL considers all of the available data.

A search based on inverse resolution appears more focused and therefore more efficient, but in practice this is not necessarily so.

Page 52

Summary

Learning rules from data.

Sequential covering algorithms learn a disjunctive set of rules by first learning a single accurate rule, then removing the positive examples covered by this rule, and repeating the process over the remaining examples. They provide an efficient, greedy algorithm for learning rule sets, and an alternative to top-down decision tree algorithms; decision tree learning can be viewed as simultaneous (parallel) covering, in contrast to sequential covering.

Learning single rules by search: beam search.

Alternative covering methods: these methods differ in the strategy by which they examine the space of rule preconditions.

Learning rule sets.

First-order rules:

Learning single first-order rules; representation: first-order Horn clauses.

Extending Sequential-Covering and Learn-One-Rule to allow variables in rule preconditions: one approach extends the sequential covering algorithm of CN2 from propositional to first-order representations.

Page 53

Summary

FOIL: learning first-order rule sets, including simple recursive rule sets.

Idea: induce logical rules from observed relations.

Guiding the search in FOIL.

Learning recursive rule sets.

Induction as inverted deduction: another approach to learning first-order rule sets, which searches for hypotheses by applying inverse operators of well-known deductive inference rules.

Idea: induce logical rules as inverted deduction:

O(B, D) = h such that (∀ <xi, f(xi)> ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)

Inverse resolution inverts the resolution operator; resolution is an inference rule widely used in automated theorem proving.

Page 54

End-of-chapter exercises