Natural Language Processing with Naïve Bayes
Tim Ruffles @timruffles


DESCRIPTION

A little talk I gave about NLP with Naive Bayes for classification. I used the ideas to build http://twedar.herokuapp.com and a client-side classifier for Skimlinks.

TRANSCRIPT

Page 1: Natural language processing with naive bayes

Natural Language Processing with Naïve Bayes
Tim Ruffles @timruffles

Page 2: Natural language processing with naive bayes

Overview

● Intro to Natural Language Processing
● Intro to Bayes
● Bayesian Maths
● Bayes applied to Natural Language Processing

Page 3: Natural language processing with naive bayes

NLP

(not like Derren Brown)

Page 4: Natural language processing with naive bayes

Processing text

● Named entity recognition - Skimwords
● Information retrieval - Google
● Information extraction - IBM's Watson
● Interpreting - sentiment, named entities
● Classification - spam vs not spam
● Speech to text - Siri

Page 5: Natural language processing with naive bayes

Named entity recognition

Page 6: Natural language processing with naive bayes

Classification

From: Prime Minister of Nigeria
Subject: Opportunity

Dear Sir,

My country vexes me; I wish to leave. Please give me your bank account information for instantaneous enrichment, no danger to you!

Yours in good faith and honour,
Mr P. Minister

From: [email protected]
Subject: cats

lol this cat is really fat

http://reddit.com/r/fat-cats/roflolcoptor-fat-cat-dancing.gif

spam: 99%, ham: 1% (first email)

spam: 1%, ham: 99% (second email)

Page 7: Natural language processing with naive bayes

Example Task

Page 8: Natural language processing with naive bayes

Identify Product References

Page 9: Natural language processing with naive bayes

How do humans do this?

● Algorithms are far dumber than you
● If you don't have enough info, an algorithm will not help
● Anyone can identify features required for natural language processing

Page 10: Natural language processing with naive bayes

Features

The new cameras are the Canon PowerShot S100, the Nikon J1 and the Olympus PEN.

Page 11: Natural language processing with naive bayes

Types of features

● Word shape (capitalization, numbers etc) - see the sketch after this list
● Tag context - near a product
● Dictionary/gazette - list of brands
● Part of speech - verb, noun
● n-Grams - products contain only one brand
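As a rough illustration of the first feature type, word-shape features are easy to compute. A minimal Python sketch (the function and feature names are mine, not from the talk):

```python
def word_shape_features(token):
    """Word-shape features: capitalisation, digits, acronyms."""
    return {
        "initial_capital": token[:1].isupper(),
        "capital_in_middle_of_word": any(c.isupper() for c in token[1:]),
        "contains_number": any(c.isdigit() for c in token),
        "is_acronym": token.isalpha() and token.isupper() and len(token) > 1,
    }

print(word_shape_features("PowerShot"))  # capital in middle of word
print(word_shape_features("S100"))       # contains a number
print(word_shape_features("PEN"))        # acronym
```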

Page 12: Natural language processing with naive bayes

NLP process

Page 13: Natural language processing with naive bayes

The new cameras are Canon's PowerShot S100, the Nikon J1 and the Olympus PEN.

Page 14: Natural language processing with naive bayes

The new cameras are Canon's PowerShot S100, the Nikon J1 and the Olympus PEN.

Supervision

Page 15: Natural language processing with naive bayes

The new cameras are ___, the ___ and the ___.

Canon's PowerShot S100 / Nikon J1 / Olympus PEN

Feature extraction

Page 16: Natural language processing with naive bayes

The new cameras are ___, the ___ and the ___.

Canon's PowerShot S100 / Nikon J1 / Olympus PEN

Correlate features & tags

Non-product text:
capital in middle of sentence: 0
capital in middle of word: 0
acronyms: 0
words with numbers in them: 0

Product spans:
capital in middle of sentence: 7
capital in middle of word: 1
acronyms: 1
words with numbers in them: 2

Page 17: Natural language processing with naive bayes

NLP Overview

Supervision with tagged data

Training up a model

Test model on test set

Model ready to use

Page 18: Natural language processing with naive bayes

Nuts and bolts

Supervision - create labelled training and test sets

Normalisation and clean-up (Canon's -> Canon etc)

Feature extraction and training on training set

Validate on test set

Page 19: Natural language processing with naive bayes

How to use features/tags to tag products?

● We need a method for learning from our correlated feature/tag sets and making predictions mathematically

● One such method is...

Page 20: Natural language processing with naive bayes

Naïve Bayes

Page 21: Natural language processing with naive bayes

When my information changes, I alter my conclusions.

What do you do, sir?

Keynes

Page 22: Natural language processing with naive bayes

Mathematically updating our beliefs on evidence

Page 23: Natural language processing with naive bayes

Bayes: local hero

Page 24: Natural language processing with naive bayes

Thomas Bayes

Page 25: Natural language processing with naive bayes
Page 26: Natural language processing with naive bayes

An Essay towards solving a Problem in the Doctrine of Chances, 1763

Page 27: Natural language processing with naive bayes

Example applications

● Given a drug test result, how likely is it a person has taken drugs?

● Given these words, how likely is it that this email is spam?

● Given these words, how likely is it they refer to a product?

Page 28: Natural language processing with naive bayes

Estimate

● 99% accurate drug test
● 1% of people actually take drugs

Given the above, what is the probability that someone indicated as drug positive by the test is a drug user?

Page 29: Natural language processing with naive bayes

Place your bets

Page 30: Natural language processing with naive bayes

50%

Page 31: Natural language processing with naive bayes

The Maths

Page 32: Natural language processing with naive bayes

A Little Notation

Probability runs from 0 to 1:

0 - Impossible. You'd never bet on it happening.
0.5 - Likely as not - evens. Best odds you'd get would be 1/2.
1 - Certain. You'd never bet against it.

Page 33: Natural language processing with naive bayes

More notation

P(spam) - probability of spam
P(^spam) - probability of not spam
P(spam|features) - probability of spam given some features

Page 34: Natural language processing with naive bayes

A few rules

P(6,6) = P(6)P(6) = 1/6 x 1/6 = 1/36
Probability of rolling 6 twice

P(^6) = 1 - P(6) = 1 - 1/6 = 5/6
Probability of not rolling a six is the inverse of rolling a six

Page 35: Natural language processing with naive bayes

Independence

P(A,B) = P(A)P(B)

Only applies if two events are independent.

Events are independent if one having happened has no bearing on how likely the other is to happen.

Page 36: Natural language processing with naive bayes

Dependence is informative

e.g: if someone is paler than normal, they could be sick

P(sick|pale) ≠ P(sick)

if someone fails a drug test, they could be a drug user

Page 37: Natural language processing with naive bayes
Page 38: Natural language processing with naive bayes

P(A|B)?

What is the probability of A, given that B has happened?

Page 39: Natural language processing with naive bayes

Drugs test

● 99% accurate, 1% of people take drugs
● Prior probability that someone is a drug user: 1%
● 1% chance of a false positive

The probability of something not happening is the inverse of it happening.

Page 40: Natural language processing with naive bayes

Priors (pre-information)

Prior: drug use

P(drug use) = 0.01 = 1/100 = 1%

Prior: false positive

P(false positive) = 0.01 = 1/100 = 1%

Page 41: Natural language processing with naive bayes

A drug test is asking

P(drug user | positive drug test)

Page 42: Natural language processing with naive bayes

Union of a signal and an event

P(drug user | positive drug test)

P(event | signal)

Page 43: Natural language processing with naive bayes

We can see a signal in at least 2 ways

Can see a positive in 2 ways:

P(drug user, positive drug test)
P(non user, positive drug test)

or a negative in two ways:

P(drug user, negative drug test)
P(non user, negative drug test)

Page 44: Natural language processing with naive bayes

The theorem

The chance of an event given a signal is the ratio of:

the prior probability of the event multiplied by that of seeing the signal given the event

to

all the ways you could see that signal.
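In symbols (a reconstruction; the image-only slides that follow presumably showed this formula, which also appears in the talk's plain notation later):

```latex
P(\mathrm{event} \mid \mathrm{signal})
  = \frac{P(\mathrm{event}) \times P(\mathrm{signal} \mid \mathrm{event})}
         {P(\mathrm{signal})}
```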

Page 45: Natural language processing with naive bayes
Page 46: Natural language processing with naive bayes
Page 47: Natural language processing with naive bayes

The calculation

P(drug user | positive drug test) =
P(drug user) x P(positive drug test | drug user) / P(positive drug test)

Page 48: Natural language processing with naive bayes

Estimate

● 99% accurate drug test
● 1% of people take drugs

Given the above, what is the chance someone failing the drugs test is a drug user?

Page 49: Natural language processing with naive bayes

Place your bets

Page 50: Natural language processing with naive bayes

50%

Page 51: Natural language processing with naive bayes

The calculation

P(drug user | positive drug test) =
(1/100 x 99/100) / P(positive drug test)

Page 52: Natural language processing with naive bayes

P(B)?

Page 53: Natural language processing with naive bayes

P(B)

● All the ways you could see the signal

∑ P(event) x P(signal | event)

(∑ is "sum of", i.e. add all the things together)

Page 54: Natural language processing with naive bayes

P(B)

● In our case there are two possibilities - person is either a drug user or not - we already know the result of the test

P(B) = P(user) x P(positive | user) + P(clean) x P(positive | clean)

In general: P(B) = P(A) x P(B|A) + P(^A) x P(B|^A)

Page 55: Natural language processing with naive bayes

The calculation

P(drug user | positive drug test) =
(1/100 x 99/100) / ((1/100 x 99/100) + (99/100 x 1/100))

Page 56: Natural language processing with naive bayes

The calculation

P(drug user | positive drug test) =
(1 x 99) / ((1 x 99) + (99 x 1))

Page 57: Natural language processing with naive bayes

The calculation

P(drug user | positive drug test) =
99 / 99(1 + 1)

Page 58: Natural language processing with naive bayes

The calculation

P(drug user | positive drug test) = 1/2
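The whole calculation fits in a few lines; a quick Python sketch (not from the talk) that reproduces the 50% answer:

```python
p_user = 0.01             # prior: 1% of people take drugs
p_pos_given_user = 0.99   # 99% accurate test
p_pos_given_clean = 0.01  # 1% false positive rate

# P(positive drug test): all the ways you could see a positive test
p_positive = (p_user * p_pos_given_user
              + (1 - p_user) * p_pos_given_clean)

# Bayes' theorem: P(drug user | positive drug test)
print(p_user * p_pos_given_user / p_positive)  # 0.5
```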

Page 59: Natural language processing with naive bayes

Maths applied to NLP

Page 60: Natural language processing with naive bayes

Building a spam filter

● Using what we know about Bayes, we're going to build an NLP spam filter

● We'll use n-grams as our features - the number of times we have seen each word

● 1-gram is each word, 2-grams are pairs of words: 2-grams are more accurate but more complex (see the sketch after this list)
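For concreteness, generating n-grams from a token list is a one-liner (a sketch; the names are mine):

```python
def ngrams(words, n):
    # Slide an n-word window over the token list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

tokens = "give me your bank account".split()
print(ngrams(tokens, 1))  # [('give',), ('me',), ('your',), ('bank',), ('account',)]
print(ngrams(tokens, 2))  # [('give', 'me'), ('me', 'your'), ('your', 'bank'), ('bank', 'account')]
```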

Page 61: Natural language processing with naive bayes

1-grams

Dear Sir,

Give me your bank account. I will transfer money from my bank account to your bank account.

Yours in good faith and honour,
Mr P. Minister

Hi,

Lovely to see you last night. I'll pay you back for the film - just give me your bank account details.

Cheers,

Sally x

Spam email counts:
bank 3, account 3, your 2, from 1, give 1, dear 1, sir 1, i 1, will 1, transfer 1, money 1, me 1, my 1, to 1
(30 words total)

Ham email counts:
you 2, the 1, to 1, see 1, hi 1, last 1, night 1, ill 1, pay 1, back 1, for 1, me 1, your 1, bank 1, account 1, details 1
(28 words total)
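Producing counts like these takes one Counter call (a sketch; my normalisation choices may make the totals differ slightly from the slide's):

```python
from collections import Counter
import re

def unigram_counts(text):
    # Lowercase and strip punctuation, so "I'll" becomes "ill" as above.
    return Counter(re.findall(r"[a-z]+", text.lower().replace("'", "")))

spam_counts = unigram_counts("Dear Sir, Give me your bank account. I will "
                             "transfer money from my bank account to your "
                             "bank account.")
print(spam_counts.most_common(3))  # [('bank', 3), ('account', 3), ('your', 2)]
```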

Page 62: Natural language processing with naive bayes

1-grams

bank: 3 / 30 = 1/10 words is bank for spam

Give me your bank account. I will transfer money from my bank account to your bank account.

P(bank,bank,bank|spam)
P(bank) * P(bank) * P(bank) = (1/10)^3 = 1/1000

P(bank,bank,bank|ham)
P(bank) * P(bank) * P(bank) = (1/28)^3 = 1/21,952

Page 63: Natural language processing with naive bayes

Smoothing 1-grams

24 unique words

Count each word as:

P(word) = (count(word) + smooth) / (countWords + (smooth * uniqueWords))

Laplacian smoothing - take a bit of probability away from each of our words to give to words we've not seen before.

Page 64: Natural language processing with naive bayes

Smoothing 1-grams

P(word) = (count(word) + smooth) / (countWords + (smooth * uniqueWords))

P(bank) = (1 + 0.1) / (28 + (0.1 * 24))
P(bank) = 1.1 / 30.4

P(sesquipedalian) = 0.1 / 30.4
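The same calculation as a function (a sketch; `smoothed_p` is my name for it, not the talk's):

```python
def smoothed_p(word, counts, total_words, unique_words, smooth=0.1):
    # Laplacian smoothing: every word, seen or not, gets `smooth` extra mass.
    return (counts.get(word, 0) + smooth) / (total_words + smooth * unique_words)

ham_counts = {"bank": 1, "lovely": 1, "film": 1}  # ...plus the rest of the ham email's counts
print(smoothed_p("bank", ham_counts, 28, 24))            # 1.1 / 30.4 ≈ 0.0362
print(smoothed_p("sesquipedalian", ham_counts, 28, 24))  # 0.1 / 30.4 ≈ 0.0033
```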

Page 65: Natural language processing with naive bayes

Applied smoothing

P(lovely,film|spam)
P(lovely) = (0 + 0.1) / (30 + (24 * 0.1))
P(lovely) = 0.1 / 32.4
P(lovely) * P(film) = (0.1 / 32.4)^2

P(lovely,film|ham)
P(lovely) * P(film) = (1.1 / 30.4)^2

(0.1 / 32.4)^2 < (1.1 / 30.4)^2

Page 66: Natural language processing with naive bayes

Smoothed n-grams with Bayes

P(A|B) = P(A)P(B|A) / P(B)

P(spam|words) = P(spam)P(words|spam) / P(words)

P(words) = P(spam)P(words|spam) + P(ham)P(words|ham)

We'll take the product of all the word probabilities as P(words|tag) for both spam and ham, and choose whichever tag has the highest P(tag|words).
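Putting the pieces together (a sketch reusing `smoothed_p`, `spam_counts` and `ham_counts` from the earlier sketches; working in log probabilities is my addition, a standard trick to avoid underflow when multiplying many small numbers, and because we only compare the two tags, the shared P(words) denominator never needs computing):

```python
import math

def log_likelihood(words, counts, total, unique, smooth=0.1):
    # log P(words|tag): a sum of logs instead of a product of tiny numbers.
    return sum(math.log(smoothed_p(w, counts, total, unique, smooth))
               for w in words)

def classify(words, spam_model, ham_model, p_spam=0.5):
    # Each model is (counts, total_words, unique_words).
    spam_score = math.log(p_spam) + log_likelihood(words, *spam_model)
    ham_score = math.log(1 - p_spam) + log_likelihood(words, *ham_model)
    return "spam" if spam_score > ham_score else "ham"

print(classify("lovely film".split(),
               (spam_counts, 30, 24), (ham_counts, 28, 24)))  # ham
```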

Page 67: Natural language processing with naive bayes

Applied smoothing

Priors: ham(0.5) spam(0.5)

P(spam)P(words|spam) / P(words)
P(ham)P(words|ham) / P(words)

Email one (spam):
P(bank,bank,bank|spam)  8.75e-04  0.39   39%
P(bank,bank,bank|ham)   4.73e-05  0.02   2%

Email two (ham):
P(lovely,film|spam)     9.52e-06  0.004  0.4%
P(lovely,film|ham)      1.3e-03   0.58   58%
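These figures can be reproduced directly (a sketch; the 0.1 smoothing constant, 24 unique words and word totals come from the earlier slides, and, as on the slide, the percentages normalise over all four tag/email combinations, with the equal priors cancelling out):

```python
smooth, unique = 0.1, 24

def p(count, total):
    # Smoothed 1-gram probability.
    return (count + smooth) / (total + smooth * unique)

spam1 = p(3, 30) ** 3   # P(bank,bank,bank|spam) ≈ 8.75e-04
ham1  = p(1, 28) ** 3   # P(bank,bank,bank|ham)  ≈ 4.73e-05
spam2 = p(0, 30) ** 2   # P(lovely,film|spam)    ≈ 9.52e-06
ham2  = p(1, 28) ** 2   # P(lovely,film|ham)     ≈ 1.31e-03

total = spam1 + ham1 + spam2 + ham2
print(f"{spam1/total:.0%} {ham1/total:.0%}")   # 39% 2%
print(f"{spam2/total:.1%} {ham2/total:.0%}")   # 0.4% 58%
```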

Page 68: Natural language processing with naive bayes

Summary

● NLP uses features of language to statistically classify, interpret or generate language.

● Bayes' rule is a mathematical method for updating your beliefs on evidence

● P(event|signal) = P(event)P(signal|event) / P(signal)

● Smoothed n-grams make a dumb but simple spam filter

● Naïve Bayes shouldn't work: but does