Naive Bayes Classifiers - Wilkes University
Post on 29-Nov-2021
kNN (k-Nearest Neighbor)
kNN (Instance-Based Classifier)
• Uses the k "closest" points (nearest neighbors)
• Requires a similarity (distance) metric
• Similarity between items a and b:
  • C, the common (shared) features
  • A, the features unique to a
  • B, the features unique to b
  S = θC - αA - βB, where θ, α, β ≥ 0
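The contrast-model similarity above can be sketched in Python over feature sets; the weights θ, α, β (and the example feature names) are illustrative choices, not values from the slides:

```python
# Sketch of the similarity S = theta*C - alpha*A - beta*B, where
# C = number of shared features, A = features unique to a,
# B = features unique to b. Weights are illustrative (all >= 0).
def similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    a, b = set(a), set(b)
    c = len(a & b)       # common (shared) features
    ua = len(a - b)      # features unique to a
    ub = len(b - a)      # features unique to b
    return theta * c - alpha * ua - beta * ub

print(similarity({"fur", "tail", "whiskers"}, {"fur", "tail", "barks"}))  # 1.0
```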
One of these things…
kNN Classifiers
• Requirements
  – The set of stored records
  – The distance metric
  – The value of k
• Classification Algorithm
  – Compute the distance from the item to the stored records
  – Identify its k nearest neighbors
  – Use their class labels to determine the unknown class label
  – May weight the votes by distance, e.g. use w = 1/d²
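The algorithm above can be sketched in plain Python with the distance-weighted vote w = 1/d²; the records, labels, and query point are made up for illustration:

```python
import math
from collections import defaultdict

# Minimal kNN sketch: compute distances to all stored records, take the
# k nearest, and let each neighbor vote with weight w = 1/d^2.
def knn_classify(records, labels, query, k=3):
    dists = sorted((math.dist(r, query), lab) for r, lab in zip(records, labels))
    votes = defaultdict(float)
    for d, lab in dists[:k]:
        votes[lab] += 1.0 / (d * d) if d > 0 else float("inf")
    return max(votes, key=votes.get)

records = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["c", "c", "c", "d", "d", "d"]
print(knn_classify(records, labels, (1, 1), k=3))  # "c"
```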
[Figure: scatter plot of points labeled "c" and "d", with an unknown point "?" to classify]
(next: video clip examining parameters on website)
Reducing Complexity
• Decrease the training set size
• Help the distance metric
  • Apply PCA to reduce features
  • Neighborhood Component Analysis
• Precompute distances
  • Nearest Neighbor Transformer
• Change the search strategy
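Two of these strategies can be sketched with scikit-learn (assumed available): PCA to shrink the feature space, then a `KNeighborsTransformer` that precomputes the neighbor distances for a precomputed-metric classifier. The data and parameter values here are illustrative:

```python
# Sketch: PCA for feature reduction, then precomputed neighbor distances.
# The transformer's n_neighbors is one more than the classifier's, per
# scikit-learn's recommendation for this pipeline pattern.
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsTransformer, KNeighborsClassifier

X = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [5, 5, 4], [5, 6, 4], [6, 5, 5]]
y = ["c", "c", "c", "d", "d", "d"]

model = make_pipeline(
    PCA(n_components=2),
    KNeighborsTransformer(n_neighbors=4, mode="distance"),
    KNeighborsClassifier(n_neighbors=3, metric="precomputed"),
)
model.fit(X, y)
print(model.predict([[1, 1, 1]]))
```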
Naive Bayes Classifiers
Naive Bayes is
• a supervised learning algorithm
• a classification algorithm
• a probabilistic classifier, based on Bayes' theorem
Bayes' Theorem
Let A and B be events. Then

P(A|B) = P(B|A) P(A) / P(B)

where
• P(A) and P(B) are the probabilities of observing events A and B, respectively
• P(A|B) is a conditional probability: the likelihood of event A occurring given that B is true
• P(B|A) is also a conditional probability: the likelihood of event B occurring given that A is true
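The theorem is easy to check numerically; the probabilities below are made up for illustration, not taken from the slides:

```python
# Tiny numeric check of Bayes' theorem with illustrative probabilities.
p_a = 0.6          # P(A)
p_b = 0.4          # P(B)
p_b_given_a = 0.5  # P(B|A)

p_a_given_b = p_b_given_a * p_a / p_b  # P(A|B) by Bayes' theorem
print(p_a_given_b)  # 0.75
```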
Naive Bayes
Let a dataset have two classes, for instance {cats, dogs}. Every data point has a set of features (variables). Then

P(class|feature set) = P(feature set|class) P(class) / P(feature set)

where
• P(class|feature set) is the posterior: the probability of classifying a cat (an image of a cat), given a set of features observed in cats.
• P(class) is the prior: the (unscaled) probability that a randomly chosen observation is a cat.
• P(feature set|class) is the likelihood: it scales the prior up or down given this specific set of features.
• P(feature set) is the normalizer (evidence): the probability of observing this set of features in our dataset.
Naive Bayes
"Naive" = the assumption that all features in the data are independent of one another! (This strong assumption rarely holds in the real world, though.)
The method is simple and computationally fast!
Example
Suppose we have 60 cats and 40 dogs in our dataset. Each data point
is a vector of n features.
Given particular values for the first two features (feature 1 and feature
2), what is the probability of a data point being a cat or a dog?
Feature Values    Cats          Dogs
Total             60            40
feature 1         50 (5/6)      5   (1/8)
feature 2         45 (3/4)      10  (1/4)
both features     40 (15/24)    5/4 (1/32)

P(Cat|both features) = 40 / (40 + 5/4) ≈ 97%
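The table's arithmetic can be checked in Python. Note that the naive-independence product gives a cat score of 60 × 15/24 = 37.5, while the slide uses the observed count 40; either value rounds to the same 97%:

```python
# Re-checking the slide's arithmetic. Counts and per-class feature
# fractions are taken from the table above.
cats, dogs = 60, 40
cat_score = cats * (5/6) * (3/4)  # 60 * 15/24 = 37.5 (slide uses the observed 40)
dog_score = dogs * (1/8) * (1/4)  # 40 * 1/32  = 1.25 = 5/4

p_cat = cat_score / (cat_score + dog_score)
print(round(p_cat * 100))  # 97
```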
Naive Bayes in Scikit-Learn
References
• "A Comparison of Event Models for Naive Bayes Text Classification" by Andrew McCallum and Kamal Nigam
• "Spam Filtering with Naive Bayes - Which Naive Bayes?" by Vangelis Metsis et al.
• "Pattern Recognition and Machine Learning" by Christopher Bishop
• "Image Classification Using Naive Bayes Classifier" by Dong-Chul Park
• Naive Bayes in Python (Scikit-Learn): https://scikit-learn.org/stable/modules/naive_bayes.html
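A minimal usage sketch with scikit-learn's `GaussianNB` (chosen here because the features are continuous; the training data is illustrative):

```python
# Fit a Gaussian Naive Bayes classifier on toy two-feature data
# and classify a new point.
from sklearn.naive_bayes import GaussianNB

X = [[6.0, 2.0], [5.5, 1.8], [6.2, 2.1], [3.0, 7.0], [2.8, 6.5], [3.2, 7.2]]
y = ["cat", "cat", "cat", "dog", "dog", "dog"]

clf = GaussianNB()
clf.fit(X, y)
print(clf.predict([[5.8, 2.0]]))  # ['cat']
```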
ANN (brief discussion)
Live Session IV (pause video here)