learningandcomputing - mahito · data law that generalizes 4/32. frameworkoflearning (mlvsdm) user...

47
November 15, 2016 Learning and Computing Data Mining Theory (データマイニング工学) Mahito Sugiyama (杉山麿人)

Upload: others

Post on 19-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

November 15, 2016

Learning and ComputingData Mining Theory (データマイニング工学)

Mahito Sugiyama (杉山麿人)

Page 2: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Objective of This Lecture

• Learn a fundamental mechanism ofmachine learning– Machine learning is a core process

in many applications in data mining

• Computational aspects of machine learning aremainly discussed

• Key issues:– Computing (single) vs Learning (double)

◦ Finite/infinite– Learning targets (mathematical objects)

vs Representations (programs)

1/32

Page 3: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example of Learning from Data(frommlss.tuebingen.mpg.de/2013/schoelkopf_whatisML_slides.pdf)

• 1, 2, 4, 7, . . .– What are succeeding numbers?

2/32

Page 4: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example of Learning from Data(frommlss.tuebingen.mpg.de/2013/schoelkopf_whatisML_slides.pdf)

• 1, 2, 4, 7, . . .– What are succeeding numbers?

1, 2, 4, 7, 11, 16, . . . (an = an−1 + n − 1)

2/32

Page 5: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example of Learning from Data(frommlss.tuebingen.mpg.de/2013/schoelkopf_whatisML_slides.pdf)

• 1, 2, 4, 7, . . .– What are succeeding numbers?

1, 2, 4, 7, 11, 16, . . . (an = an−1 + n − 1)1, 2, 4, 7, 12, 20, . . . (an = an−1 + an−2 + 1)

2/32

Page 6: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example of Learning from Data(frommlss.tuebingen.mpg.de/2013/schoelkopf_whatisML_slides.pdf)

• 1, 2, 4, 7, . . .– What are succeeding numbers?

1, 2, 4, 7, 11, 16, . . . (an = an−1 + n − 1)1, 2, 4, 7, 12, 20, . . . (an = an−1 + an−2 + 1)1, 2, 4, 7, 13, 24, . . . (an = an−1 + an−2 + an−3)

2/32

Page 7: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example of Learning from Data(frommlss.tuebingen.mpg.de/2013/schoelkopf_whatisML_slides.pdf)

• 1, 2, 4, 7, . . .– What are succeeding numbers?

1, 2, 4, 7, 11, 16, . . . (an = an−1 + n − 1)1, 2, 4, 7, 12, 20, . . . (an = an−1 + an−2 + 1)1, 2, 4, 7, 13, 24, . . . (an = an−1 + an−2 + an−3)1, 2, 4, 7, 14, 28 (divisors of 28)

2/32

Page 8: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example of Learning from Data(frommlss.tuebingen.mpg.de/2013/schoelkopf_whatisML_slides.pdf)

• 1, 2, 4, 7, . . .– What are succeeding numbers?

1, 2, 4, 7, 11, 16, . . . (an = an−1 + n − 1)1, 2, 4, 7, 12, 20, . . . (an = an−1 + an−2 + 1)1, 2, 4, 7, 13, 24, . . . (an = an−1 + an−2 + an−3)1, 2, 4, 7, 14, 28 (divisors of 28)1, 2, 4, 7, 1, 1, 5, . . . (decimals of π = 3.1415 . . ., e = 2.718 . . .)

2/32

Page 9: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example of Learning from Data(frommlss.tuebingen.mpg.de/2013/schoelkopf_whatisML_slides.pdf)

• 1, 2, 4, 7, . . .– What are succeeding numbers?

1, 2, 4, 7, 11, 16, . . . (an = an−1 + n − 1)1, 2, 4, 7, 12, 20, . . . (an = an−1 + an−2 + 1)1, 2, 4, 7, 13, 24, . . . (an = an−1 + an−2 + an−3)1, 2, 4, 7, 14, 28 (divisors of 28)1, 2, 4, 7, 1, 1, 5, . . . (decimals of π = 3.1415 . . ., e = 2.718 . . .)

• 993 results (!) in the on-line encyclopedia of integersequences (https://oeis.org/) 2/32

Page 10: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Analyze Learning asScientific/Engineering Problem

• Which is the correct answer (or generalization;汎化)for succeeding numbers of 1, 2, 4, 7, . . . ?– Any answer is possible!

3/32

Page 11: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Analyze Learning asScientific/Engineering Problem

• Which is the correct answer (or generalization;汎化)for succeeding numbers of 1, 2, 4, 7, . . . ?– Any answer is possible!

• We should take two points into consideration:(i) We need to formalize the problem of “learning”

(学習の定式化)◦ There are two agents (teacher and learner) in learning,

which are different from “computation”(ii) Learning is an infinite process (無限に続く過程)

◦ A learner usually never knows thatthe current hypothesis is perfectly correct 3/32

Page 12: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Framework of Learning (ML vs DM)

User(Teacher) (Learner)

Computer

Machine Learning

Data

Law thatgeneralizes

4/32

Page 13: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Framework of Learning (ML vs DM)

User(Teacher) (Learner)

Computer

Machine Learning

Data

Law thatgeneralizes

Data Mining (Knowledge Discovery)

Objects(Natural phenome-

non, market, ...)

Data

Law thatgeneralizes

(Teacher) (Learner)

4/32

Page 14: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Computation—Core Engine of Learning/Mining

• Machine learning/data mining is usually achievedusing a computer (計算機)

• Computing behavior is mathematically formulatedby Alan Turing in 1936– A. M. Turing,On Computable Numbers, with the Application to

the Entscheidungsproblem, Proceedings of the LondonMathematical Society, 42(1), 230–265, 1937

• The model of computation, known as a Turing machine(チューリング機械), is developed for simulatingcomputation by human beings

5/32

Page 15: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

6/32

Page 16: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

7/32

Page 17: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

8/32

Page 18: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Turing Machine

Tape

MachineState

• The machine repeats the following:– Read a symbol a of a cell– Do the following from a and the current state s

according to a set of rules in its memory◦ Replace the symbol a at the square◦ Move the head◦ Change the state s

9/32

Page 19: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Computing vs Learning

• In computation, the process is completed on its own– No interaction

◦ The Turing machine automatically worksaccording to programmed rules

– A finite process

10/32

Page 20: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Computing vs Learning

• In computation, the process is completed on its own– No interaction

◦ The Turing machine automatically worksaccording to programmed rules

– A finite process

• In learning, there are two agents (teacher and learner)– Interaction between agents should be considered

◦ A learning protocol (学習プロトコル) betweena teacher and a learner is essentially needed

– An infinite process

10/32

Page 21: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Formalization of Learning inComputational Manner

1. What are targets of learning? (学習対象)

2. How to represent targets and hypotheses? (表現言語)

3. How are data provided to a learner? (データ)

4. How does the learner work? (学習手順,アルゴリズム)

5. When can we say that the learner correctly learnsthe target? (学習の正当性)

11/32

Page 22: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Learning of Binary Classifier

12/32

Page 23: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Learning of Binary Classifier

wx + b = 0

12/32

Page 24: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example: Perceptron (by F. Rosenblatt, 1958)

• Learning target: two subsets F ,G ⊆ Rd s.t. F ∩ G = ∅

– Assumption: F and G are linearly separable◦ There exists a function (classifier) f∗(x) = w∗x + b s.t.f∗(x) > 0 ∀x ∈ F ,f∗(x) < 0 ∀x ∈ G

13/32

Page 25: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example: Perceptron (by F. Rosenblatt, 1958)

• Learning target: two subsets F ,G ⊆ Rd s.t. F ∩ G = ∅

– Assumption: F and G are linearly separable◦ There exists a function (classifier) f∗(x) = w∗x + b s.t.f∗(x) > 0 ∀x ∈ F ,f∗(x) < 0 ∀x ∈ G

• Hypotheses: hyperplanes on Rd

– If we consider a linear equation f (x) = wx + b, each line canbe uniquely specified by a pair of two parameters (w , b)

– Each hypothesis is represented by a pair (w , b)13/32

Page 26: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example: Perceptron (by F. Rosenblatt, 1958)

• Learning target: two subsets F ,G ⊆ Rd s.t. F ∩ G = ∅

– Assumption: F and G are linearly separable◦ There exists a function (classifier) f∗(x) = w∗x + b s.t.f∗(x) > 0 ∀x ∈ F ,f∗(x) < 0 ∀x ∈ G

• Hypotheses: hyperplanes on Rd

– If we consider a linear equation f (x) = wx + b, each line canbe uniquely specified by a pair of two parameters (w , b)

– Each hypothesis is represented by a pair (w , b)• Data: a sequence of pairs (x1 , y1), (x2 , y2), . . .

– (x i , y i ): (a real-valued vector in Rd , a label)– x i ∈ F ∪ G, y i ∈ {1,−1}, and y i = 1 (y i = −1) if x i ∈ F (x i ∈ G)13/32

Page 27: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Learning Model for Perceptron

FG A hypothesis, a hyperplane

in general, is uniquely speci�edby a pair (w, b)

(xi, 1)(xj, –1) Data

f(x) = wx + b = 0

14/32

Page 28: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Learning Procedure of Perceptron

1. w ← 0, b ← 0 (or a small random value) // initialization

2. for i = 1, 2, 3, . . . do

3. Receive i-th pair (x i , y i )4. Compute a = ∑d

j=1 wjx j

i + b

5. if y i ⋅ a < 0 then // x i is misclassified

6. w ← w + y i x i // update the weight

7. b ← b + y i // update the bias

8. end if

9. end for

15/32

Page 29: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Correctness of Perceptron

• It is guaranteed that a perceptron always convergesto a correct classifier– A correct classifier is a function f s.t.

f (x) > 0 ∀x ∈ F ,f (x) < 0 ∀x ∈ G

– The convergence theorem (パーセプトロンの収束性定理)

• Note: there are (infinitely) many functions that correctlyclassify F and G– A perceptron converges to one of them

16/32

Page 30: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Summary: Perceptron

Target Two disjoint subsets of Rd

Representation Two parameters (w , b) of linearequation f (x) = wx + b

Data Real vectors from target subsets

Algorithm Perceptron

Correctness Convergence theorem

17/32

Page 31: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Example 2: Maximum LikelihoodEstimation

• Estimate the probability of a coin being a head in a toss

Target Bernoulli distribution

Representation Parameter (probability) p

Data Sampling

Algorithm Maximum Likelihood Estimation

p̂ = k/nCorrectness Consistency

18/32

Page 32: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Basic Definitions of Learning

• Target (学習対象): a classifier (分類器) f∗ ∶ X → {0, 1}– A class C of classifiers is usually pre-determined– Each target can be viewed as the set F∗ = {a ∈ X ∣ f∗(a) = 1}

◦ F∗ is called a concept (概念)

19/32

Page 33: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Basic Definitions of Learning

• Target (学習対象): a classifier (分類器) f∗ ∶ X → {0, 1}– A class C of classifiers is usually pre-determined– Each target can be viewed as the set F∗ = {a ∈ X ∣ f∗(a) = 1}

◦ F∗ is called a concept (概念)

• Hypothesis space (仮説空間): R– Each hypothesis H ∈ R represents a classifier– R ⊆ Σ∗ usually holds (Σ∗ is the set of finite strings)

19/32

Page 34: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Basic Definitions of Learning

• Target (学習対象): a classifier (分類器) f∗ ∶ X → {0, 1}– A class C of classifiers is usually pre-determined– Each target can be viewed as the set F∗ = {a ∈ X ∣ f∗(a) = 1}

◦ F∗ is called a concept (概念)

• Hypothesis space (仮説空間): R– Each hypothesis H ∈ R represents a classifier– R ⊆ Σ∗ usually holds (Σ∗ is the set of finite strings)

• Data: Example (例) (a, f∗(a))– a ∈ X– An example (a, 1) is called positive (正例)– An example (a, 0) is called negative (負例)

19/32

Page 35: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Learning Model

X

A class ofclassi�ers

A target classi�er(concept) f*

Hypothesis

(x1, 0), (x2, 1),(x3, 1), (x4, 0), ...

Learner

Data

20/32

Page 36: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Learning Model (e.g. Perceptron)

ℝd

A class oflinearlyseparable sets

A target of linearequation f*

Perceptron(w, b)

Data(x1, 0), (x2, 1),(x3, 1), (x4, 0), ...

(in f(x) = wx + b)

21/32

Page 37: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Gold’s Learning Model(Identification in the Limit)

• Gold gave the first basic learning model, called“Identification in the limit”– E. M. Gold, Language identification in the limit, Information and

Control, 10(5), 447–474, 1967

• This model was originally introduced to analyzethe learnability of formal languages– His motivation was to model infant’s learning process of

natural languages

22/32

Page 38: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

23/32

Page 39: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Formal Languages

• Alphabet (アルファベット) Σ: a nonempty finite set– Each element a ∈ Σ is called a symbol (記号)

• Word (語) w = a1a2 . . . an : a finite sequence of symbols– Null word (空語) ε, whose length is 0

• The set of words Σ∗ (with ε) and Σ+ (without ε)Σ∗ = { a1a2 . . . an ∣ a i ∈ Σ, n ≥ 0 }Σ+ = { a1a2 . . . an ∣ a i ∈ Σ, n ≥ 1 } = Σ∗ \ {ε}

• Formal language (形式言語): a subset of Σ∗

24/32

Page 40: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Representation of Languages

• We connect syntax and semantics using a mapping f

• Given a hypothesis H ∈ R, f (H,w) is a functionthat returns 0 or 1 for w ∈ Σ∗

– H is a program of a classifier– w is a (binary) code of the input to H

• L(H) = {w ∈ Σ∗ ∣ f (H,w) = 1 }• R is usually a recursively enumerable set (帰納的可算集合)

– There is an algorithm that enumerates all elements ofR– R is often identified with N

◦ Each natural number encodes a classifier (hypothesis)25/32

Page 41: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Setting of Gold’s Learning Model

• A class of languages C ⊆ { A ∣ A ⊆ Σ∗ } is chosen• For a language L ∈ C, an infinite sequenceσ = (x1 , y1), (x2 , y2), . . . is a complete presentation(完全提示) of L if(i) {x1 , x2 , . . . } = Σ∗

(ii) y i = 1 ⟺ x i ∈ L for all i– σ[i] = (x1 , y1), . . . , (x i , y i ) (a prefix of σ)

• A learner (学習者) is a procedure M that receives σ andgenerates an infinite sequence of hypothesesγ = H1 , H2 , . . .– M outputs H i if it gets σ[i]

26/32

Page 42: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Identification in the Limit

• If γ converges to some hypothesis H and H represents L,we say that M identifies L in the limit (極限学習する)

• If M identifies any L ∈ C in the limit,we say that M identifies C in the limit

27/32

Page 43: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Basic Strategy: Generate and Test

• Input: a complete presentation σ of a language L

• Output: γ = H1 , H2 , . . . ,

1. i ← 1, S ← ∅

2. Repeat

3. S ← S ∪ {(x i , y i )}4. k ← min { j ∈ N ∣ L(H( j)) consistent with S }5. // H( j) is a hypothesis encoded by a natural number j

6. H i ← H(k) and output H i

7. i ← i + 1

8. until forever28/32

Page 44: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Power of Generate and Test Strategy

• For any class C of languages, Generate and Test strategyidentifies C in the limit– That is, Generate and Test strategy identifies

every language L ∈ C in the limit

• Unfortunately, this strategy is very inefficient– More intelligent strategy can be designed

for each learning target– One of themost important tasks in studies ofmachine learning!

29/32

Page 45: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Learning from Positive Data

• In many cases, in particular in data mining,we obtain only positive data– Imagine supervised vs unsupervised learning

• A positive presentation (正提示) of a language L ∈ Cis an infinite sequence x1 , x2 , . . . s.t. L = {x1 , x2 , . . . }

• If γ (an infinite sequence of hypotheses of a learner M)converges to a hypothesis H s.t. L(H) = L,we say that M identifies L in the limit from positive data(正例から極限学習する)

30/32

Page 46: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

Limitation ofLearning from Positive Data

• Consider the following class C(i) All finite languages are included in C(ii) At least one infinite language is included in C– C is called superfinite (超有限)

• Gold proved that a superfinite class cannot be learnedfrom positive data– e.g. Σ = {a}, C contains all finite languages and {an ∣ n ≥ 1}

• Although this fact shows a limitation, there still exist richclasses of interesting languages– For example, pattern language (パターン言語) 31/32

Page 47: LearningandComputing - Mahito · Data Law that generalizes 4/32. FrameworkofLearning (MLvsDM) User (Teacher) (Learner) Computer Machine Learning Data Law that generalizes Data Mining

References

• If you are interested in computational learning theory,the following books might be interesting:– 榊原康文,横森貴,小林聡,計算論的学習,培風館, 2001– S. Jain, D. N. Osherson, J. S. Royer, A. Sharma, Systems That

Learn, A Bradford Book, 1999

• These books are not necessarily for this lecture

32/32