TRANSCRIPT
6.867 Machine learning: lecture 1
Tommi S. Jaakkola
MIT CSAIL
6.867 Machine learning: administrivia
• Course staff ([email protected])
  – Prof. Tommi Jaakkola ([email protected])
  – Adrian Corduneanu ([email protected])
  – Biswajit (Biz) Bose ([email protected])
• General info
  – lectures MW 2.30–4pm in 32-141
  – tutorials/recitations, initially F11–12.30 (4-145) / F2.30–4 (4-159)
  – website http://www.ai.mit.edu/courses/6.867/
• Grading
  – midterm (15%), final (25%)
  – 5 (≈ bi-weekly) problem sets (30%)
  – final project (30%)
Tommi Jaakkola, MIT CSAIL 2
Why learning?
• Example problem: face recognition
  [Images of four faces, labeled:] Jaakkola, Indyk, Barzilay, Collins
  Training data: a collection of images and labels (names)
  Evaluation criterion: correct labeling of new images
Why learning?
• Example problem: text/document classification
  [Example webpages, labeled:] faculty, faculty, faculty, course
  – a few labeled training documents (webpages)
  – goal to label yet unseen documents
Why learning?
• There are already a number of applications of this type
  – face, speech, handwritten character recognition
  – fraud detection (e.g., credit card)
  – recommender problems (e.g., which movies/products/etc you'd like)
  – annotation of biological sequences, molecules, or assays
  – market prediction (e.g., stock/house prices)
  – finding errors in computer programs, computer security
  – defense applications
  – etc
Learning
  [Recap of the two examples: face images labeled Jaakkola, Indyk, Barzilay, Collins; webpages labeled faculty, faculty, faculty, course]
• Steps
  – entertain a (biased) set of possibilities (hypothesis class)
  – adjust predictions based on available examples (estimation)
  – rethink the set of possibilities (model selection)
• Principles of learning are "universal"
  – society (e.g., scientific community)
  – animal (e.g., human)
  – machine
Learning, biases, representation
  [Figures omitted: two sequences of example images with answers "yes", "yes", "no"; in the second sequence the final answer "no" turns out to be wrong ("oops")]
Representation
• There are many ways of presenting the same information
  0111111001110010000000100000001001111110111011111001110111110001
• The choice of representation may determine whether the learning task is very easy or very difficult
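The same 64-bit string can also be laid out as a small binary image, which makes the hidden structure visible; a minimal sketch assuming NumPy and a row-major reading of the bits (the slide does not specify the ordering):

```python
import numpy as np

# The slide's 64-bit string, read as an 8x8 binary image
# (row-major order is an assumption; the slide does not say).
bits = "0111111001110010000000100000001001111110111011111001110111110001"
image = np.array([int(b) for b in bits]).reshape(8, 8)

for row in image:
    print("".join(".#"[v] for v in row))   # '#' marks the 1-pixels
```

The two representations carry identical information, but a learner that sees the 8×8 layout can exploit spatial structure that is invisible in the flat bit string.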
Representation
  0111111001110010000000100000001001111110111011111001110111110001 "yes"
  0001111100000011000001110000011001111110111111001111111101111111 "yes"
  1111111000000110000011000111111000000111100000111110001101110001 "no"
  [The same three examples shown as images, labeled "yes", "yes", "no"]
Hypothesis class
• Representation: examples are binary vectors of length d = 64
  x = [111 . . . 0001]T
  and labels y ∈ {−1, 1} ("no", "yes")
• The mapping from examples to labels is a "linear classifier"
  y = sign( θ · x ) = sign( θ1x1 + . . . + θdxd )
  where θ is a vector of parameters we have to learn from examples.
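The classifier just defined is a one-liner in code; a minimal sketch assuming NumPy (the function name and the sign(0) = +1 convention are my own choices):

```python
import numpy as np

def predict(theta, x):
    """Linear classifier: y = sign(theta . x).

    theta, x: length-d NumPy arrays; returns +1 or -1
    (sign(0) is mapped to +1 by convention).
    """
    return 1 if np.dot(theta, x) >= 0 else -1

# A toy d = 4 example (not the 64-bit image vectors from the slide):
theta = np.array([0.5, -1.0, 0.25, 0.0])
x = np.array([1, 0, 1, 1])
print(predict(theta, x))   # dot product is 0.75, so the label is +1
```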
Linear classifier/experts
• We can understand the simple linear classifier
  y = sign( θ · x ) = sign( θ1x1 + . . . + θdxd )
  as a way of combining expert opinion (in this case simple binary features)
  [Diagram omitted: binary "experts" x1, x2, . . . , xd cast "votes" weighted by θ1, θ2, . . . , θd; the combined votes y = sign( θ1x1 + . . . + θdxd ) implement a majority rule]
Estimation
  x                                                                  y
  0111111001110010000000100000001001111110111011111001110111110001  +1
  0001111100000011000001110000011001111110111111001111111100000011  +1
  1111111000000110000011000111111000000111100000111110001101111111  −1
  . . .                                                              . . .
• How do we adjust the parameters θ based on the labeled examples?
  y = sign( θ · x )
  For example, we can simply refine/update the parameters whenever we make a mistake:
  θi ← θi + y xi,  i = 1, . . . , d   if prediction was wrong
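The mistake-driven update above becomes a short training loop; a sketch assuming NumPy, with the number of passes and all names my own choices:

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Mistake-driven updates: theta_i <- theta_i + y * x_i on every error.

    X: (n, d) array of examples; y: length-n array of +1/-1 labels.
    (The epoch loop is my own addition; the slide only gives the update.)
    """
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if np.sign(theta @ xi) != yi:   # wrong (or undecided) prediction
                theta += yi * xi            # the update from the slide
    return theta

# Toy separable data: label is +1 iff the first feature is on.
X = np.array([[1., 0.], [1., 1.], [0., 1.], [0., 2.]])
y = np.array([1, 1, -1, -1])
theta = perceptron_train(X, y)
print(np.sign(X @ theta))   # recovers y: [ 1.  1. -1. -1.]
```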
Evaluation
• Does the simple mistake-driven algorithm work?
  [Plot omitted: average classification error as a function of the number of examples and labels seen so far]
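The shape of such a curve can be reproduced on synthetic data; a sketch assuming NumPy, where an invented hidden linear rule stands in for the true labels (this is not the course's actual data):

```python
import numpy as np

# Synthetic stand-in for the slide's experiment (assumption: a hidden
# linear rule labels a stream of random 64-bit vectors).
rng = np.random.default_rng(0)
d = 64
theta_true = rng.standard_normal(d)

theta = np.zeros(d)
mistakes = 0
n_stream = 1400
for t in range(1, n_stream + 1):
    x = rng.integers(0, 2, size=d).astype(float)
    y = 1 if theta_true @ x >= 0 else -1
    if (1 if theta @ x >= 0 else -1) != y:   # prediction was wrong
        mistakes += 1
        theta += y * x                        # mistake-driven update
    if t % 350 == 0:
        print(f"after {t:4d} examples, average error = {mistakes / t:.3f}")
```

Because the synthetic data is linearly separable, the mistake-driven learner keeps improving and the running average error falls as more examples stream by.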
Model selection
• The simple linear classifier cannot solve all the problems (e.g., XOR)
  [Figure omitted: four points at the corners of the unit square, (0,0), (0,1), (1,0), (1,1), labeled +1/−1 in the XOR pattern so that no single line separates the classes]
• Can we rethink the approach to do even better?
  We can, for example, add "polynomial experts"
  y = sign( θ1x1 + . . . + θdxd + θ12x1x2 + . . . )
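Adding the polynomial expert x1x2 (together with a constant expert, which is my own addition) makes XOR-type data separable; a minimal sketch assuming NumPy, with a hand-picked θ rather than a learned one:

```python
import numpy as np

# XOR-style data: (0,1) and (1,0) are "+1", (0,0) and (1,1) are "-1"
# (this label assignment is an assumption about the slide's figure).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])

def expand(x):
    """Add a constant expert and the polynomial expert x1*x2
    (the constant term is my own; the slide only shows x1*x2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2])

# One choice of parameters that separates XOR in the expanded space:
theta = np.array([-1.0, 2.0, 2.0, -4.0])
preds = np.array([1 if theta @ expand(x) >= 0 else -1 for x in X])
print(preds)   # [-1  1  1 -1], matching y
```

Since the expanded data is separable, the mistake-driven update from the Estimation slide would also converge to such a θ if run on the expanded features.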
Model selection cont'd
  [Figures omitted: decision boundaries on the same 2-d data for a linear classifier and for 2nd, 4th, and 8th order polynomial classifiers]
Types of learning problems (not exhaustive)
• Supervised learning: explicit feedback in the form of examples and target labels
  – goal to make predictions based on examples (classify them, predict prices, etc)
• Unsupervised learning: only examples, no explicit feedback
  – goal to reveal structure in the observed data
• Semi-supervised learning: limited explicit feedback, mostly only examples
  – tries to improve predictions based on examples by making use of the additional "unlabeled" examples
• Reinforcement learning: delayed and partial feedback, no explicit guidance
  – goal to minimize the cost of a sequence of actions (policy)