mncs 16-09, Week 4 - Seung-gyu Byeon - Introduction to Machine Learning


Seung-gyu BYEON
hrfeel@mobile.re.kr

Intelligence Networking & Computing Lab.
Dept. of Electrical & Computer Eng.

Pusan National University

Seung-gyu BYEON

Introduction to Machine Learning


Contents

Spam Filtering
- Hand Coded Pattern & Scoring
- SpamAssassin
- Machine Learning for Spam Filtering
- Linear Classification
- Bayesian Text Classification

What is Machine Learning
- Main Ingredients
- Examples of Models
- Tasks: the problems that can be solved with ML
- Tasks: Looking for Structure
- Evaluating Performance on a Task
- Models: Output of Machine Learning

Conclusion


Spam Filtering

I. Hand Coded Pattern & Scoring
II. SpamAssassin
III. Machine Learning for Spam Filtering
IV. Linear Classification
V. Bayesian Text Classification


Spam Filtering: Hand Coded Pattern & Scoring

Spam e-mail Filtering
- Hand-coded pattern matching, such as regular expressions, is not flexible

SpamAssassin
- Calculates a score for an e-mail based on a number of tests

- The e-mail is reported as spam if the score is 5 or more
- The weights for each of the tests are learned from a training set of e-mails labeled spam or ham
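As a rough sketch of this scoring scheme (the test names and weights below are invented for illustration, not SpamAssassin's actual tests):

```python
# Minimal sketch of SpamAssassin-style scoring: sum the weights of the tests
# that fired and compare against the threshold of 5.

def spam_score(test_results, weights):
    """Sum the weights of the tests that fired for this e-mail."""
    return sum(weights[name] for name, fired in test_results.items() if fired)

weights = {"contains_viagra": 4.0, "suspicious_sender": 4.0}      # hypothetical weights
email_tests = {"contains_viagra": True, "suspicious_sender": True}

score = spam_score(email_tests, weights)
label = "spam" if score >= 5 else "ham"   # report as spam if the score is 5 or more
print(score, label)                       # 8.0 spam
```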


Spam Filtering: SpamAssassin

$x_1$ and $x_2$ indicate the results of two tests
- e.g., does the e-mail contain the word 'Viagra'? If yes, the value is 1
- The training set should contain both spams and hams with various $x_1$ and $x_2$ values

In this example, we can separate spam from ham by thresholding the total score at 5


Spam Filtering: Machine Learning for Spam Filtering

Weights (and a threshold) are learned from the training data

The text of each e-mail is converted into a data point by means of SpamAssassin's built-in tests

A linear classifier is applied to obtain a 'spam or ham' decision

[Diagram: e-mails pass through the SpamAssassin tests to become data points; a linear classifier combines these with weights learned from training data to produce the "spam?" decision]


Spam Filtering: Linear Classification

Separated by the straight line $\mathbf{w} \cdot \mathbf{x} = t$
- $\mathbf{w}$: vector perpendicular to the decision boundary
- $t$: decision threshold
- $\mathbf{x}$: points on the decision boundary (classified as spam if $\mathbf{w} \cdot \mathbf{x} > t$)

The Vector Equation of a Line
- The decision boundary can equivalently be represented by $\mathbf{w} \cdot (\mathbf{x} - \mathbf{x}_0) = 0$, where $\mathbf{x}_0$ is any point on the boundary

It is the orientation, but not the length, of $\mathbf{w}$ that determines the location of the decision boundary

[Figure: geometry of the linear decision boundary, showing the weight vector w and the points x0, x1, x2]
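A minimal sketch of this decision rule, with a placeholder weight vector and threshold rather than learned values:

```python
# Sketch of the linear decision rule w·x > t described above.

def linear_classify(w, x, t):
    """Classify x as spam if the score w·x exceeds the decision threshold t."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return "spam" if score > t else "ham"

w = [4.0, 4.0]      # vector perpendicular to the decision boundary (assumed values)
t = 5.0             # decision threshold
print(linear_classify(w, [1, 1], t))   # spam
print(linear_classify(w, [1, 0], t))   # ham
```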


Spam Filtering: Linear Classification

Generalization: the most fundamental concept in machine learning
- Too good a performance on the training data can lead to overfitting (data can have noise)
- Good performance on new data is what we want

Expressive Power
- What if e-mail 2 in the training set for SpamAssassin were spam?

- Introduce a second decision rule?
- Switching to a more expressive classifier is an option if there is enough training data available to reliably learn the additional parameters



Spam Filtering: Bayesian Text Classification

How do we learn not only the weights for the tests but also the tests themselves?

- We need to maintain potential indicators and collect statistics from a training set
- e.g., 'Viagra' or 'free iPod' are good spam indicators

The Basics of Probability

Definition of Conditional Probability:
$$P(A \mid B) = \frac{P(A \wedge B)}{P(B)}, \quad \text{if } P(B) \neq 0$$

The Product Rule gives an alternative formulation:
$$P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

Chain Rule:
$$P(A, B, C) = P(A, B)\,P(C \mid A, B) = P(A)\,P(B \mid A)\,P(C \mid A, B)$$

A and B are independent if and only if
$$P(A \mid B) = P(A), \quad P(B \mid A) = P(B), \quad \text{or equivalently} \quad P(A \wedge B) = P(A)\,P(B)$$

Bayes' rule:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

The odds of an event are the ratio of the probability that the event happens to the probability that it does not:
- If $p$ is the probability, the odds are $o = p/(1-p)$, and conversely $p = o/(1+o)$
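A small sketch of the odds/probability conversions above:

```python
# Converting between probabilities and odds, as defined on this slide.

def odds(p):
    """o = p / (1 - p)"""
    return p / (1.0 - p)

def probability(o):
    """p = o / (1 + o)"""
    return o / (1.0 + o)

print(odds(0.75))          # 3.0  (3-to-1 odds)
print(probability(3.0))    # 0.75
```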


Spam Filtering: Bayesian Text Classification

Suppose we observe 'Viagra' four times more often in spams than in hams on average. The likelihood ratio associated with 'Viagra' is
$$P(\text{Viagra} \mid \text{Spam}) / P(\text{Viagra} \mid \text{Ham}) = 4/1$$

One spam is received for every six hams on average, so the prior odds are
$$P(\text{Spam}) / P(\text{Ham}) = 1/6$$

By Bayes' rule the posterior odds become
$$\frac{P(\text{Spam} \mid \text{Viagra})}{P(\text{Ham} \mid \text{Viagra})} = \frac{P(\text{Viagra} \mid \text{Spam})\,P(\text{Spam}) / P(\text{Viagra})}{P(\text{Viagra} \mid \text{Ham})\,P(\text{Ham}) / P(\text{Viagra})} = \frac{P(\text{Viagra} \mid \text{Spam})}{P(\text{Viagra} \mid \text{Ham})} \cdot \frac{P(\text{Spam})}{P(\text{Ham})} = \frac{4}{1} \cdot \frac{1}{6} = \frac{4}{6}$$

'Viagra' makes the probability of ham drop from 6/7 = 0.86 to 6/10 = 0.6
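The same calculation in code, using the numbers from this example:

```python
# Worked version of the 'Viagra' example: posterior odds = likelihood ratio * prior odds.

likelihood_ratio = 4 / 1     # P(Viagra|Spam) / P(Viagra|Ham)
prior_odds = 1 / 6           # P(Spam) / P(Ham)

posterior_odds = likelihood_ratio * prior_odds          # 4/6
p_spam = posterior_odds / (1 + posterior_odds)          # 0.4
p_ham = 1 - p_spam                                      # 0.6, down from 6/7 ≈ 0.86
print(posterior_odds, p_spam, p_ham)
```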


Spam Filtering: Bayesian Text Classification

Suppose the likelihood ratio associated with an additional (independent) piece of evidence, 'blue pill', is 3/1.

The combined likelihood ratio is
$$\frac{P(\text{Viagra} \mid \text{Spam})}{P(\text{Viagra} \mid \text{Ham})} \cdot \frac{P(\text{blue pill} \mid \text{Spam})}{P(\text{blue pill} \mid \text{Ham})} = \frac{P(\text{Viagra}, \text{blue pill} \mid \text{Spam})}{P(\text{Viagra}, \text{blue pill} \mid \text{Ham})} = \frac{4}{1} \cdot \frac{3}{1} = \frac{12}{1}$$

The posterior odds become
$$\frac{P(\text{Spam} \mid \text{Viagra}, \text{blue pill})}{P(\text{Ham} \mid \text{Viagra}, \text{blue pill})} = \frac{4}{1} \cdot \frac{3}{1} \cdot \frac{1}{6} = 2$$

The spam probability rises to 2/3 = 0.67, up from 0.4.

Discussion
- The independence assumption allows simple multiplication of odds without having to manipulate joint probabilities
- We can include a large set of 'features' and let the classifier figure out which features are important, and in what combinations
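A sketch of the naive multiplication of likelihood ratios described above, reproducing the numbers from this slide:

```python
# Combining independent evidence by multiplying likelihood ratios with the prior odds.

def posterior_odds(likelihood_ratios, prior_odds):
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds

ratios = [4 / 1, 3 / 1]                           # 'Viagra' and 'blue pill'
odds = posterior_odds(ratios, prior_odds=1 / 6)   # 2.0
p_spam = odds / (1 + odds)                        # 0.666..., up from 0.4
print(odds, p_spam)
```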


What is Machine Learning

I. Main Ingredients
II. Examples of Models
III. Tasks: the problems that can be solved with ML
IV. Tasks: Looking for Structure
V. Evaluating Performance on a Task
VI. Models: Output of Machine Learning



What is ML: Main Ingredients

Features: represent the relevant objects in our domain as data points

Task: an abstract representation of a problem relating domain objects to an output

e.g., classifying them into two or more classes

Model: a mapping from data points to outputs, produced as the output of a machine learning algorithm applied to training data

[Diagram: domain objects are mapped by features to data, and a model maps the data to the output (the task); a learning algorithm applied to training data produces the model (the learning problem)]


What is ML: Examples of Models

SpamAssassin: a linear equation of the form $\sum_i w_i x_i \geq t$
- $x_i$: Boolean features indicating whether the $i$-th test succeeded
- $w_i$: feature weights learned from the training set
- $t$: threshold for classification, learned from the training set

Bayesian classifier: a decision rule of the form $\frac{P(\text{Spam})}{P(\text{Ham})} \prod_i \frac{P(x_i \mid \text{Spam})}{P(x_i \mid \text{Ham})} > 1$
- $\frac{P(x_i \mid \text{Spam})}{P(x_i \mid \text{Ham})}$: the likelihood ratio associated with each word
- $\frac{P(\text{Spam})}{P(\text{Ham})}$: the prior odds, estimated from the training set


What is ML: Tasks: the problems that can be solved with ML

Supervised Learning vs. Unsupervised Learning
- Given a training set of N example input-output pairs $(x_1, y_1), \ldots, (x_N, y_N)$, where each $y_i$ is generated by an unknown function $f$
- Search for a hypothesis $h$ such that $h \approx f$

Tasks of supervised learning:
- Estimation of the conditional distribution $P(y \mid x)$ when $f$ is stochastic
- Classification, when $y$ is one of a finite set of values
- Regression, when $y$ is a number

Example: Curve fitting (regression)

- Construct/adjust $h$ to agree with $f$ on the training set
- $h$ is consistent if it agrees with $f$ on all examples
- $h$ generalizes well if it correctly predicts the outputs for the test set

Occam's razor: prefer the simplest hypothesis consistent with the data
- Maximize a combination of consistency and simplicity (generality)
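A small illustration of this trade-off, fitting a simple and a very flexible polynomial to synthetic noisy data (the data and degrees are assumptions chosen for the example):

```python
# A high-degree polynomial can be consistent with every training point yet
# generalize worse than a simpler hypothesis.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = 2 * x_train + 1 + rng.normal(scale=0.1, size=x_train.size)   # noisy samples of f(x) = 2x + 1
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test + 1

for degree in (1, 7):                        # simple vs. very flexible hypothesis
    h = np.poly1d(np.polyfit(x_train, y_train, degree))
    train_err = np.mean((h(x_train) - y_train) ** 2)
    test_err = np.mean((h(x_test) - y_test) ** 2)
    print(degree, round(train_err, 4), round(test_err, 4))
# The degree-7 fit has (near-)zero training error but typically a higher test error.
```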


What is ML: Tasks: the problems that can be solved with ML

Binary and multi-class classification: categorical target
- Learn a model representing the class boundaries

Regression: numerical target
- Learn a real-valued function that maps data to numeric values

Clustering: hidden target
- Group data without prior information on the groups, only by assessing the similarity between instances
- Learn from unlabeled data (unsupervised learning)


What is ML: Tasks: Looking for Structure

Predictive vs. Descriptive Models
- A descriptive model does not involve the target variable

- Subgroup discovery identifies subsets of the data that exhibit a class distribution significantly different from the overall population

- Predictive clustering clusters data to assign classes to new data


What is ML: Tasks: Looking for Structure

Example: Predictive clustering
- Three bivariate Gaussians, each centered at a given point
- The centroids can be given as a 3-by-2 matrix (one row per centroid, one column per dimension)
- A new data point can be assigned to one of the three clusters depending on its distances to the three centroids
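A minimal sketch of this assignment rule; the centroid coordinates below are hypothetical stand-ins for the Gaussian centres on the original slide:

```python
# Assign a new point to the cluster whose centroid is nearest (Euclidean distance).
import math

centroids = [(0.0, 0.0), (3.0, 3.0), (6.0, 0.0)]   # 3-by-2 "matrix", one row per centroid

def assign_cluster(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

print(assign_cluster((2.5, 2.0), centroids))   # 1
```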

Example: Descriptive clustering
- Matrices can represent a descriptive (hard) clustering and a soft clustering, with one row per example and one column per cluster
- Given a new data point, it is not easy to tell which cluster it should belong to


What is ML: Evaluating Performance on a Task

Test error (or test accuracy):
- Performance on the training data is misleading
- Need a separate test set to avoid overfitting
- Dilemma: a larger test set leaves a smaller training set

K-fold cross-validation:
- Partition the data into K equal folds
- Each fold in turn is used for testing, and the remainder for training
- The error rates are averaged (a better estimate than a single score)
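A from-scratch sketch of this procedure; `train` and `error_rate` are hypothetical stand-ins for whichever learner and metric are being evaluated:

```python
# Minimal K-fold cross-validation: each fold is held out once, the model is
# trained on the remainder, and the per-fold error rates are averaged.

def k_fold_cross_validation(data, k, train, error_rate):
    folds = [data[i::k] for i in range(k)]           # partition into k (nearly) equal folds
    errors = []
    for i in range(k):
        test_fold = folds[i]
        train_folds = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train(train_folds)                   # fit on the remainder
        errors.append(error_rate(model, test_fold))  # test on the held-out fold
    return sum(errors) / k                           # average error over the folds
```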

Unsupervised learning methods need to be evaluated differently


What is ML: Models: Output of Machine Learning

Distinction according to intuition:
- Geometric models
- Probabilistic models
- Logical models

Characterization by modus operandi:
- Grouping models
- Grading models


What is ML: Models: Geometric models

Basic Linear Classifier
- Let $P$ and $N$ be the sets of positive and negative examples, respectively, with means $\mathbf{p}$ and $\mathbf{n}$; take $\mathbf{w} = \mathbf{p} - \mathbf{n}$
- Since the midpoint $(\mathbf{p} + \mathbf{n})/2$ is on the decision boundary, the threshold is $t = \mathbf{w} \cdot (\mathbf{p} + \mathbf{n})/2$

Support Vector Machine (SVM)
- The decision boundary maximizes the margin
- The circled data points are the support vectors

Note: data are more likely to be linearly separable as the dimension gets higher, due to sparsity
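A sketch of a basic linear classifier built this way, on made-up data:

```python
# w is the difference between the class means, and the threshold places the
# decision boundary at their midpoint.
import numpy as np

pos = np.array([[2.0, 2.0], [3.0, 3.0]])      # positive examples P (made up)
neg = np.array([[0.0, 0.0], [1.0, 0.0]])      # negative examples N (made up)

p, n = pos.mean(axis=0), neg.mean(axis=0)
w = p - n                                     # perpendicular to the decision boundary
t = w @ (p + n) / 2                           # midpoint of the means lies on the boundary

def classify(x):
    return "positive" if w @ x > t else "negative"

print(classify(np.array([2.5, 2.0])), classify(np.array([0.5, 0.5])))
```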


What is ML: Models: Geometric models

K-nearest neighbor classifier
- Predictions are made locally, based on the most similar instances
- Popular similarity measures:
  - Euclidean distance: $\sqrt{\sum_i (x_i - y_i)^2}$
  - Manhattan distance: $\sum_i |x_i - y_i|$

Lazy method: no explicit model is learned; computation is deferred until prediction time
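A minimal k-nearest-neighbour sketch using the two distance measures above, on made-up data:

```python
# Predict the majority label among the k nearest training examples.
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def knn_predict(query, examples, k=3, distance=euclidean):
    """examples: list of (point, label) pairs."""
    nearest = sorted(examples, key=lambda ex: distance(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

examples = [((0, 0), "ham"), ((0, 1), "ham"),
            ((5, 5), "spam"), ((6, 5), "spam"), ((5, 6), "spam")]
print(knn_predict((4, 4), examples, k=3))                       # spam
print(knn_predict((4, 4), examples, k=3, distance=manhattan))   # spam
```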


What is ML: Models: Probabilistic models

The joint probability distribution specifies the probability of every atomic event

For any proposition $\phi$, sum the atomic events where it is true: $P(\phi) = \sum_{\omega \models \phi} P(\omega)$


What is ML: Models: Probabilistic models

For any proposition $\phi$, sum the atomic events where it is true:
$$P(\text{toothache}) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2$$
$$P(\text{cavity} \vee \text{toothache}) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28$$

Can also compute conditional probabilities:
$$P(\neg\text{cavity} \mid \text{toothache}) = \frac{0.016 + 0.064}{0.108 + 0.012 + 0.016 + 0.064} = 0.4$$

The normalization term $P(\text{toothache})$ is not necessary if we consider the odds
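A sketch reproducing these sums from the full joint distribution; the entries quoted on the slide are used as given, while the two remaining (no-toothache, no-cavity) entries, 0.144 and 0.576, are assumed from the standard textbook table so that the distribution sums to 1:

```python
# Answer queries by summing the atomic events of a full joint distribution
# over (toothache, cavity, catch).

joint = {  # (toothache, cavity, catch) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,  # assumed entries
}

def prob(pred):
    """Sum the atomic events where the proposition (a predicate) is true."""
    return sum(p for event, p in joint.items() if pred(*event))

p_toothache = prob(lambda t, c, k: t)                   # ≈ 0.2
p_cavity_or_toothache = prob(lambda t, c, k: c or t)    # ≈ 0.28
p_no_cavity_given_toothache = prob(lambda t, c, k: t and not c) / p_toothache   # ≈ 0.4
print(p_toothache, p_cavity_or_toothache, p_no_cavity_given_toothache)
```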


What is ML: Models: Probabilistic models

Making Decisions
- A prediction on the target variable can be made on the basis of the observed feature values and the posterior distribution of the target given those values
- The decision can be made even if some of the feature values are missing

Example: Missing Values
- What if we only noticed that a patient suffers from toothache?
- From the joint distribution, $P(\text{cavity} \mid \text{toothache}) = 0.6$ and $P(\neg\text{cavity} \mid \text{toothache}) = 0.4$
- This is slightly less certain than the case in which more evidence is available


Conclusion

Why Machine Learning?
- In intelligent networking, problems that were not solved, or only incompletely solved, in the past may be solved with machine learning

Very High Entry Barrier
- What is it? How can we treat and apply it? And to what?

Future Work
- Study the models and features used in machine learning
- Find such issues that machine learning can address

Seung-gyu BYEON
hrfeel@mobile.re.kr
Intelligence Networking & Computing Lab.
Dept. of Electrical & Computer Eng.
Pusan National University

Thank you for your interest.
