MNCS 16-09 Week 4 - Seung-gyu Byeon - Introduction to Machine Learning


Upload: seung-gyu-byeon

Post on 14-Apr-2017

TRANSCRIPT

Page 1: MNCS 16-09 Week 4 - Seung-gyu Byeon - Introduction to Machine Learning

Seung-gyu BYEON, [email protected]

Intelligence Networking & Computing Lab., Dept. of Electrical & Computer Eng.

Pusan National University

Introduction to Machine Learning

Page 2:

Intelligence Networking and Computing Lab.

Contents

Spam Filtering
- Hand Coded Pattern & Scoring
- SpamAssassin
- Machine Learning for Spam Filtering
- Linear Classification
- Bayesian Text Classification

What is Machine Learning
- Main Ingredients
- Examples of Models
- Tasks: the problems that can be solved with ML
- Tasks: Looking for Structure
- Evaluating Performance on a Task
- Models: Output of Machine Learning

Conclusion

Page 3:

Spam Filtering

I. Hand Coded Pattern & Scoring
II. SpamAssassin
III. Machine Learning for Spam Filtering
IV. Linear Classification
V. Bayesian Text Classification

Page 4:

Spam Filtering: Hand Coded Pattern & Scoring

Spam e-mail filtering: hand-coded pattern matching, such as regular expressions, is not flexible

SpamAssassin calculates a score for an e-mail based on a number of tests

The e-mail is reported as spam if the score is 5 or more; the weights for each of the tests are learned from a training set of e-mails labeled spam or ham

Page 5:

Spam Filtering: SpamAssassin

x1 and x2 indicate the results of two tests; e.g., does the e-mail contain the word 'Viagra'? Yes → 1. The training set should contain both spams and hams with various x1 and x2 values

In this example we can separate spam from ham by thresholding at 5

Page 6:

Spam Filtering: Machine Learning for Spam Filtering

The weights (and a threshold) are learned from training data

The text of each e-mail is converted into a data point by means of SpamAssassin's built-in tests

A linear classifier is applied to obtain a 'spam or ham' decision

[Diagram: e-mails → SpamAssassin tests → data; training data → learn weights → weights; data + weights → linear classifier → spam?]
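The pipeline above can be sketched in code. This is a minimal illustration, not SpamAssassin itself: the test names, weights, and threshold below are hypothetical stand-ins for the real built-in tests and learned parameters.

```python
# Hypothetical sketch of the spam-filtering pipeline:
# e-mail text -> test results (data point) -> linear classifier -> decision.

def run_tests(email: str) -> list[int]:
    """Convert e-mail text into a data point of Boolean test results."""
    tests = ["viagra", "free ipod"]  # made-up stand-ins for built-in tests
    return [1 if phrase in email.lower() else 0 for phrase in tests]

def classify(x: list[int], weights: list[float], threshold: float) -> str:
    """Linear classifier: weighted sum of test results vs. a threshold."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return "spam" if score >= threshold else "ham"

# Weights and threshold are assumed to have been learned from training data.
weights, threshold = [4.0, 3.0], 5.0
print(classify(run_tests("Buy Viagra and a free iPod!"), weights, threshold))
```

Only the weights and threshold come from learning; the tests themselves are fixed, which motivates the Bayesian approach later in the deck.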

Page 7:

Spam Filtering: Linear Classification

Spam and ham are separated by the straight line w·x = t, where w is a vector perpendicular to the decision boundary, t is the decision threshold, and x denotes points on the decision boundary (w·x = t if x lies on the boundary)

The vector equation of a line: the decision boundary can equivalently be represented by w·(x − x0) = 0, where x0 is any point on the boundary

It is the orientation, but not the length, of w that determines the location of the decision boundary

[Figure: decision boundary with normal vector w and boundary points x0, x1, x2]

Page 8:

Spam Filtering: Linear Classification

Generalization is the most fundamental concept in machine learning: too good a performance on training data can lead to overfitting (data can have noise); good performance on new data is what we want

Expressive power: what if e-mail 2 in the training set for SpamAssassin was spam? Do we introduce a second decision rule? Switching to a more expressive classifier is an option if there is enough training data available to reliably learn the additional parameters

Page 9:

Spam Filtering: Bayesian Text Classification

How do we learn not only the weights for the tests but also the tests themselves?

We need to maintain potential indicators and collect statistics from a training set; e.g., 'Viagra' or 'free iPod' are good spam indicators

The Basics of Probability

Definition of conditional probability: P(A|B) = P(A∧B) / P(B), if P(B) ≠ 0

The product rule gives an alternative formulation: P(A∧B) = P(A|B) P(B) = P(B|A) P(A)

Chain rule: P(A,B,C) = P(A,B) P(C|A,B) = P(A) P(B|A) P(C|A,B)

A and B are independent if and only if P(A∧B) = P(A) P(B), or equivalently P(A|B) = P(A)

Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)

The odds of an event are the ratio of the probability that the event happens to the probability that it does not: if p is the probability, the odds are o = p/(1−p), and conversely p = o/(1+o)
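The probability-odds conversion at the end of the list can be written out as a pair of trivial helpers (a sketch added for illustration, not from the slides):

```python
# Convert between a probability p and its odds o = p / (1 - p),
# and back via p = o / (1 + o).

def odds(p: float) -> float:
    """Odds of an event with probability p (requires p < 1)."""
    return p / (1 - p)

def prob(o: float) -> float:
    """Probability of an event with odds o."""
    return o / (1 + o)

print(odds(0.2))   # odds of a 20% event are 1:4
print(prob(odds(0.6)))  # round-trips back to 0.6
```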

Page 10:

Spam Filtering: Bayesian Text Classification

Suppose we observe 'Viagra' four times more often in spams than in hams on average: the likelihood ratio associated with 'Viagra' is P(Viagra|Spam) / P(Viagra|Ham) = 4/1

One spam is received for every six hams on average, so the prior odds are P(Spam) / P(Ham) = 1/6

By Bayes' rule the posterior odds become

P(Spam|Viagra) / P(Ham|Viagra) = [P(Viagra|Spam) P(Spam) / P(Viagra)] / [P(Viagra|Ham) P(Ham) / P(Viagra)] = [P(Viagra|Spam) / P(Viagra|Ham)] · [P(Spam) / P(Ham)] = (4/1) · (1/6) = 4/6

'Viagra' makes the probability of ham drop from 6/7 ≈ 0.86 to 6/10 = 0.6

Page 11:

Spam Filtering: Bayesian Text Classification

Suppose the likelihood ratio associated with an additional (independent) piece of evidence, 'blue pill', is 3/1

The combined likelihood ratio is

[P(Viagra|Spam) / P(Viagra|Ham)] · [P(blue pill|Spam) / P(blue pill|Ham)] = P(Viagra, blue pill|Spam) / P(Viagra, blue pill|Ham) = (4/1) · (3/1) = 12/1

The posterior odds become

P(Spam|Viagra, blue pill) / P(Ham|Viagra, blue pill) = (4/1) · (3/1) · (1/6) = 2

giving a spam probability of 2/3 ≈ 0.67, up from 0.4

Discussion: the independence assumption allows simple multiplication of odds without having to manipulate joint probabilities. We can include a large set of 'features' and let the classifier figure out which features are important, and in what combinations.
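The odds calculation above reduces to a few multiplications. A minimal sketch of the naive Bayes step: multiply the prior odds by each (assumed independent) likelihood ratio, then convert the odds back to a probability.

```python
# Naive Bayes with odds: posterior odds = prior odds x product of the
# likelihood ratios of the observed, assumed-independent pieces of evidence.

def posterior_odds(prior_odds: float, likelihood_ratios: list[float]) -> float:
    o = prior_odds
    for lr in likelihood_ratios:
        o *= lr
    return o

# 'Viagra' (4/1) and 'blue pill' (3/1) with prior odds 1/6, as in the slides.
o = posterior_odds(1 / 6, [4.0, 3.0])
print(o)            # posterior odds, approximately 2
print(o / (1 + o))  # spam probability, approximately 2/3
```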

Page 12:

What is Machine Learning

I. Main Ingredients
II. Examples of Models
III. Tasks: the problems that can be solved with ML
IV. Tasks: Looking for Structure
V. Evaluating Performance on a Task
VI. Models: Output of Machine Learning

Page 13:


What is ML: Main Ingredients

Features: relevant objects in our domain, represented as data points

Task: an abstract representation of a problem relating domain objects to an output, e.g., classifying them into two or more classes

Model: a mapping from data points to outputs, produced as the output of a machine learning algorithm applied to training data

[Diagram: domain objects → features → data; training data → learning algorithm → model; data → model → output]

Page 14:

What is ML: Examples of Models

SpamAssassin: a linear equation of the form Σi wi xi > t, where the xi are Boolean features indicating whether the i-th test succeeded, the wi are feature weights learned from the training set, and t is a classification threshold learned from the training set

Bayesian classifier: a decision rule of the form [P(Spam)/P(Ham)] · Πi [P(word i|Spam)/P(word i|Ham)] > 1, where the likelihood ratio associated with each word and the prior odds are estimated from the training set

Page 15:

What is ML: Tasks, the problems that can be solved with ML

Supervised learning vs. unsupervised learning: given a training set of N example input-output pairs (x1, y1), ..., (xN, yN), where each yj was generated by an unknown function y = f(x), search for a hypothesis h such that h ≈ f

Tasks of supervised learning: estimation of P(y|x) when f is stochastic; classification when y is one of a finite set of values; regression when y is a number

Example: curve fitting (regression). Construct/adjust h to agree with f on the training set; h is consistent if it agrees with f on all examples; h generalizes well if it correctly predicts y for the test set

Occam's razor: prefer the simplest hypothesis consistent with the data; maximize a combination of consistency and simplicity (generality)
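Curve fitting can be illustrated with the simplest hypothesis class of all, a straight line fit by least squares. The toy data below are hypothetical and noiseless, so the fitted line is both consistent with the training examples and generalizes to a held-out point.

```python
# Least-squares fit of a straight line h(x) = a*x + b, in pure Python.

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) minimizing squared error on the data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]   # noiseless samples of y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)        # recovers slope 2.0 and intercept 1.0
print(a * 10 + b)  # prediction at the unseen point x = 10 -> 21.0
```

With noisy data, a higher-degree polynomial could fit the training points exactly yet predict worse, which is exactly the overfitting/Occam trade-off the slide describes.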

Page 16:

What is ML: Tasks, the problems that can be solved with ML

Binary and multi-class classification: categorical target; learn a model representing class boundaries

Regression: numerical target; learn a real-valued function that maps data to numeric values

Clustering: hidden target; group data without prior information on the groups, only by assessing the similarity between instances; learning from unlabeled data is unsupervised learning

Page 17:

What is ML: Tasks, Looking for Structure

Predictive vs. descriptive models: a descriptive model does not involve the target variable

Subgroup discovery identifies subsets of the data that exhibit a class distribution significantly different from the overall population

Predictive clustering clusters the data in order to assign classes to new data

Page 18:

What is ML: Tasks, Looking for Structure

Example: predictive clustering. Three bivariate Gaussians centered at three points; the centroids can be given as a 3-by-2 matrix, and a new data point can be assigned to one of the three clusters depending on its distances to the three centroids

Example: descriptive clustering. Matrices with one row per example and one column per cluster represent a descriptive clustering (left) and a soft clustering (right); given a new data point it is not easy to tell which cluster it should belong to
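The predictive-clustering assignment step can be sketched directly: store the centroids as a 3-by-2 matrix and assign a new point to the nearest one. The centroid coordinates below are hypothetical, standing in for the unnamed Gaussian centers on the slide.

```python
# Assign a new 2-D point to the nearest of three cluster centroids.

def nearest_centroid(x: tuple[float, float],
                     centroids: list[tuple[float, float]]) -> int:
    """Index of the centroid closest to x (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(x, centroids[i]))

# Hypothetical centers of three bivariate Gaussians, as a 3-by-2 matrix.
centroids = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
print(nearest_centroid((4.2, 1.0), centroids))  # closest to the second center
```

A descriptive clustering, by contrast, only records the membership matrix for the training examples, so no such assignment rule for new points falls out of it.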

Page 19:

What is ML: Evaluating Performance on a Task

Test error (or test accuracy): performance on the training data is misleading, so a separate test set is needed to avoid being misled by overfitting

Dilemma: a larger test set leaves a smaller training set

K-fold cross-validation: partition the data into K equal folds; each fold in turn is used for testing and the remainder for training; the error rates are averaged (a better estimate than a single score)

Unsupervised learning methods need to be evaluated differently
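The K-fold procedure above can be sketched in a few lines. The "learner" here is a deliberately trivial majority-class predictor on made-up labels, just to make the fold rotation concrete.

```python
# K-fold cross-validation: each fold serves once as the test set, the rest
# as training data, and the per-fold error rates are averaged.

def majority_train(data):
    """Toy learner: predict the most common label in the training data."""
    labels = [y for _, y in data]
    return max(set(labels), key=labels.count)

def error_rate(model, test):
    """Fraction of test examples whose label differs from the prediction."""
    return sum(1 for _, y in test if y != model) / len(test)

def k_fold_cv(data, k, train, evaluate):
    folds = [data[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for i in range(k):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(evaluate(train(training), folds[i]))
    return sum(scores) / k

data = [(0, "ham"), (1, "spam"), (2, "ham"),
        (3, "ham"), (4, "ham"), (5, "ham")]
print(k_fold_cv(data, 3, majority_train, error_rate))  # average error rate
```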

Page 20:

What is ML: Models, the Output of Machine Learning

Distinction according to intuition: geometric models, probabilistic models, logical models

Characterization by modus operandi: a spectrum ranging from grouping models to grading models

Page 21:

What is ML: Models, Geometric Models

Basic linear classifier: let P and N be the sets of positive and negative examples, with means p and n; take w = p − n and t = (|p|² − |n|²)/2, since the midpoint (p + n)/2 is on the decision boundary

Support vector machine (SVM): the decision boundary maximizes the margin; the circled data points are the support vectors

Note: data are more likely to be linearly separable as the dimension gets higher, due to sparsity
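The basic linear classifier can be built in a few lines from the formulas above: w = p − n and t = (|p|² − |n|²)/2, so that the midpoint between the class means lies exactly on the boundary. The 2-D training points below are made up for illustration.

```python
# Basic linear classifier: w = p - n (difference of the class means),
# t = (|p|^2 - |n|^2) / 2, decision rule w.x > t.

def mean(points):
    """Componentwise mean of a list of equal-length tuples."""
    return [sum(c) / len(points) for c in zip(*points)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

pos = [(2.0, 2.0), (3.0, 3.0)]   # hypothetical positive examples
neg = [(0.0, 0.0), (1.0, -1.0)]  # hypothetical negative examples

p, n = mean(pos), mean(neg)
w = [pi - ni for pi, ni in zip(p, n)]
t = (dot(p, p) - dot(n, n)) / 2

def predict(x):
    return "+" if dot(w, x) > t else "-"

print(predict((2.5, 2.0)))   # lands on the positive side
print(predict((0.2, -0.5)))  # lands on the negative side
```

Checking the construction: for the midpoint m = (p + n)/2, w·m = (p − n)·(p + n)/2 = (|p|² − |n|²)/2 = t, confirming that m sits on the boundary.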

Page 22:

What is ML: Models, Geometric Models

K-nearest-neighbor classifier: predictions are made locally, based on the most similar instances

Popular similarity measures: Euclidean distance √(Σi (xi − yi)²) and Manhattan distance Σi |xi − yi|

A lazy method: no model is built at training time; all work happens at prediction time
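A minimal k-nearest-neighbor sketch using the Euclidean distance defined above; the example points and labels are made up. Note that nothing is learned up front, which is what "lazy" means here.

```python
# k-nearest-neighbor classification: find the k closest training examples
# to the query and return their majority label.
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, examples, k=3):
    """examples is a list of (point, label) pairs."""
    nearest = sorted(examples, key=lambda e: euclidean(query, e[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

examples = [((0, 0), "ham"), ((0, 1), "ham"), ((1, 0), "ham"),
            ((5, 5), "spam"), ((6, 5), "spam")]
print(knn_predict((0.5, 0.5), examples))  # surrounded by ham points
```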

Page 23:

What is ML: Models, Probabilistic Models

The joint probability distribution specifies the probability of every atomic event

For any proposition, sum the probabilities of the atomic events where it is true

Page 24:

What is ML: Models, Probabilistic Models

For any proposition, sum the probabilities of the atomic events where it is true; conditional probabilities can be computed the same way. With the toothache-cavity-catch joint distribution:

P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

P(cavity ∨ toothache) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28

P(¬cavity | toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4

The normalizing denominator is not necessary if we consider the odds
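The sums above can be reproduced mechanically from the full joint table. The eight entries below are taken from the standard toothache-cavity-catch textbook example that these numbers come from; the two all-false entries (0.144 and 0.576) do not appear on the slide and are assumed from that example.

```python
# Marginalization over a joint distribution: P(proposition) is the sum of
# the probabilities of the atomic events where the proposition holds.

joint = {
    # (toothache, cavity, catch): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(pred):
    """Sum the atomic events where the predicate is true."""
    return sum(p for event, p in joint.items() if pred(*event))

p_toothache = prob(lambda t, cav, cat: t)
p_either = prob(lambda t, cav, cat: cav or t)
p_no_cavity_given_t = prob(lambda t, cav, cat: t and not cav) / p_toothache
print(p_toothache, p_either, p_no_cavity_given_t)
```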

Page 25:

What is ML: Models, Probabilistic Models

Making decisions: a prediction on Y can be made on the basis of the observed evidence E and the posterior distribution P(Y | E); the decision can be made even if some values of E are missing

Example: missing values. What if we only noticed that a patient suffers from toothache? Then P(cavity | toothache) = 0.6 and P(¬cavity | toothache) = 0.4, slightly less certain than when additional evidence is also available

Page 26:

Conclusion

Why machine learning? In intelligent networking, problems that went unsolved, or were solved only incompletely, in the past may be solved with machine learning

Very high entry barrier: What is it? How can we treat and apply it? To where?

Future work: study the models and features used in machine learning to identify such issues

Page 27:

Seung-gyu BYEON, [email protected]

Intelligence Networking & Computing Lab., Dept. of Electrical & Computer Eng.

Pusan National University

Thank you for your interest