6. N-GRAMs
Pusan National University Artificial Intelligence Lab (부산대학교 인공지능연구실), 최성자


Page 1:

6. N-GRAMs

Pusan National University Artificial Intelligence Lab (부산대학교 인공지능연구실), 최성자

Page 2:

Word prediction

“I’d like to make a collect …”
Call, telephone, or person-to-person?

- Spelling error detection

- Augmentative communication

- Context-sensitive spelling error correction

Page 3:

Language Model

Language Model (LM)
– statistical model of word sequences

n-gram: use the previous n-1 words to predict the next word
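A minimal sketch of this idea in Python (illustrative code, not from the slides): train bigram counts on a toy corpus, then predict the most likely next word.

```python
from collections import defaultdict, Counter

def train_bigrams(corpus):
    """For each word, count which words follow it."""
    following = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(following, word):
    """Most frequent word observed after `word`, or None."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

model = train_bigrams(["i'd like to make a collect call",
                       "i'd like to make a local call"])
print(predict_next(model, "collect"))  # -> 'call'
```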

Page 4:

Applications

Context-sensitive spelling error detection and correction
– “He is trying to fine out.” (fine → find)
– “The design an construction will take a year.” (an → and)

Machine translation

Page 5:

Counting Words in Corpora

Corpora (on-line text collections)
Which words to count
– What we are going to count
– Where we are going to find the things to count

Page 6:

Brown Corpus

1 million words
500 texts
Varied genres (newspaper, novels, non-fiction, academic, etc.)
Assembled at Brown University in 1963-64
The first large on-line text collection used in corpus-based NLP research

Page 7:

Issues in Word Counting

Punctuation symbols (. , ? !)
Capitalization (“He” vs. “he”, “Bush” vs. “bush”)
Inflected forms (“cat” vs. “cats”)
– Wordform: cat, cats, eat, eats, ate, eating, eaten
– Lemma (stem): cat, eat

Page 8:

Types vs. Tokens

Tokens (N): total number of running words
Types (B): number of distinct words in a corpus (size of the vocabulary)

Example: “They picnicked by the pool, then lay back on the grass and looked at the stars.”
– 16 word tokens, 14 word types (not counting punctuation)

※ Here “types” means wordform types, not lemma types, and punctuation marks will generally be counted as words
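A quick illustration of the type/token distinction in Python (a sketch, not from the slides):

```python
import re

sentence = ("They picnicked by the pool, then lay back on the grass "
            "and looked at the stars.")

# Lowercase, keep only word characters, drop punctuation.
tokens = re.findall(r"[a-z']+", sentence.lower())

print(len(tokens))       # 16 word tokens
print(len(set(tokens)))  # 14 word types ('the' occurs three times)
```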

Page 9:

How Many Words in English?

Shakespeare’s complete works
– 884,647 wordform tokens
– 29,066 wordform types

Brown Corpus
– 1 million wordform tokens
– 61,805 wordform types
– 37,851 lemma types

Page 10:

Simple (Unsmoothed) N-grams

Task: estimating the probability of a word

First attempt:
– Suppose there is no corpus available
– Use a uniform distribution
– Assume: word types = V (e.g., 100,000)
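The slide’s formula did not survive extraction; under a uniform distribution each of the V word types simply gets

P(w) = 1/V

e.g., 1/100,000 = 0.00001 for every word.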

Page 11:

Simple (Unsmoothed) N-grams

Task: estimating the probability of a word

Second attempt:
– Suppose there is a corpus
– Assume: word tokens = N, # times w appears in corpus = C(w)
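The estimate itself (the equation is missing from the transcript) is the relative frequency:

P(w) = C(w) / N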

Page 12:

Simple (Unsmoothed) N-grams

Task: estimating the probability of a word

Third attempt:
– Suppose there is a corpus
– Assume a word depends on its n-1 previous words

Page 13:

Simple (Unsmoothed) N-grams
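The equations on this slide are not in the transcript; presumably they show the chain-rule decomposition that the n-gram approximation (next slide) simplifies:

P(w_1 w_2 … w_k) = P(w_1) · P(w_2 | w_1) · P(w_3 | w_1 w_2) · … · P(w_k | w_1 … w_{k-1})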

Page 14:

Simple (Unsmoothed) N-grams

n-gram approximation:
– w_k only depends on its previous n-1 words
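Written out (the slide’s equation is missing; this is the standard formulation):

P(w_k | w_1 … w_{k-1}) ≈ P(w_k | w_{k-n+1} … w_{k-1})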

Page 15:

Bigram Approximation

Example: P(I want to eat British food)

= P(I|<s>) P(want|I) P(to|want) P(eat|to) P(British|eat) P(food|British)

<s>: a special word meaning “start of sentence”
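A sketch of the bigram product in Python (the probability values below are illustrative, not the slides’ actual numbers):

```python
# Hypothetical bigram probabilities for the example sentence.
bigram_p = {
    ("<s>", "I"): 0.25, ("I", "want"): 0.32, ("want", "to"): 0.65,
    ("to", "eat"): 0.26, ("eat", "British"): 0.02, ("British", "food"): 0.60,
}

def sentence_prob(words):
    """P(sentence) under the bigram approximation."""
    p = 1.0
    for pair in zip(["<s>"] + words, words):  # (<s>,I), (I,want), ...
        p *= bigram_p[pair]
    return p

print(sentence_prob("I want to eat British food".split()))
```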

Page 16:

Note on Practical Problem

Multiplying many probabilities results in a very small number and can cause numerical underflow

Use log probabilities (logprobs) in the actual computation instead
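In log space the product becomes a sum, which avoids underflow (a sketch, reusing the hypothetical table above):

```python
import math

def sentence_logprob(words, bigram_p):
    """log2 P(sentence) as a sum of bigram log-probabilities."""
    return sum(math.log2(bigram_p[pair])
               for pair in zip(["<s>"] + words, words))
```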

Page 17:

Estimating N-gram Probability

Maximum Likelihood Estimate (MLE)
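The MLE formulas themselves are missing from the transcript; for bigrams the standard form (consistent with the worked example on page 19) is:

P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})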

Page 18:

Page 19:

Estimating Bigram Probability

Example:
– C(to eat) = 860
– C(to) = 3256
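Plugging the counts into the MLE formula:

P(eat | to) = C(to eat) / C(to) = 860 / 3256 ≈ 0.26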

Page 20:

Page 21:

Two Important Facts

The increasing accuracy of N-gram models as we increase the value of N

Their very strong dependency on the training corpus (in particular its genre and its size in words)

Page 22:

Smoothing

Any particular training corpus is finite
Sparse data problem
Deal with zero probabilities

Page 23:

Smoothing

Smoothing
– Re-evaluating zero-probability n-grams and assigning them non-zero probability

Also called Discounting
– Lowering non-zero n-gram counts in order to assign some probability mass to the zero-count n-grams

Page 24:

Add-One Smoothing for Bigram
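The formula image is missing from the transcript; add-one (Laplace) smoothing for bigrams is standardly written with V = vocabulary size:

P_add1(w_n | w_{n-1}) = (C(w_{n-1} w_n) + 1) / (C(w_{n-1}) + V)

As a one-line Python sketch:

```python
def p_add_one(c_bigram, c_prev, V):
    """Add-one smoothed bigram probability (V = vocabulary size)."""
    return (c_bigram + 1) / (c_prev + V)
```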

Page 25:

Page 26:

Page 27:

Things Seen Once

Use the count of things seen once to help estimate the count of things never seen

Page 28:

Witten-Bell Discounting
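The formulas did not survive extraction. As usually presented (e.g., in Jurafsky & Martin), Witten-Bell reserves probability mass for unseen events in proportion to the number of distinct event types T observed among N tokens:

total probability of all unseen events = T / (N + T)

which is then divided evenly among the Z types never seen.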

Page 29:

Witten-Bell Discounting for Bigram
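A sketch of the bigram version, assuming the slides follow the usual Jurafsky & Martin presentation (every quantity conditioned on the previous word):

– unseen bigram: p(w_i | w_{i-1}) = T(w_{i-1}) / ( Z(w_{i-1}) · (N(w_{i-1}) + T(w_{i-1})) )
– seen bigram: p(w_i | w_{i-1}) = C(w_{i-1} w_i) / (C(w_{i-1}) + T(w_{i-1}))

where T(w_{i-1}) is the number of distinct words observed after w_{i-1} and Z(w_{i-1}) is the number never observed after it.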

Page 30:

Witten-Bell Discounting for Bigram

Page 31:

Seen count vs. unseen count (the comparison table on this slide did not survive extraction)

Page 32:

Page 33:

Good-Turing Discounting for Bigram
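The formula image is missing; Good-Turing re-estimates a count c using N_c, the number of n-grams that occur exactly c times:

c* = (c + 1) · N_{c+1} / N_c

and the total probability mass assigned to unseen n-grams is N_1 / N.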

Page 34:

Page 35:

Backoff
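The equations are not in the transcript; the standard Katz backoff scheme for trigrams, which slides at this point usually present, is:

P_katz(w_i | w_{i-2} w_{i-1}) =
  P*(w_i | w_{i-2} w_{i-1})   if C(w_{i-2} w_{i-1} w_i) > 0
  α_1 · P*(w_i | w_{i-1})     else if C(w_{i-1} w_i) > 0
  α_2 · P*(w_i)               otherwise

where P* are discounted estimates and the α weights make the distribution sum to 1.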

Page 36:

Backoff

Page 37:

Entropy

Measure of uncertainty
Used to evaluate the quality of n-gram models (how well a language model matches a given language)
Entropy H(X) of a random variable X:
– Measured in bits
– Number of bits to encode information in the optimal coding scheme
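The defining equation (missing from the transcript):

H(X) = − Σ_x p(x) log2 p(x)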

Page 38:

Example 1
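The worked example did not survive extraction. A standard illustration at this point: for a uniform distribution over 8 outcomes,

H = − Σ_{i=1..8} (1/8) log2(1/8) = log2 8 = 3 bits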

Page 39:

Example 2
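Also missing; the usual companion example is a skewed distribution, which has lower entropy than the uniform one. For instance (illustrative numbers, not necessarily the slide’s), probabilities (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64) give H = 2 bits.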

Page 40:

Perplexity
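The definition is missing from the transcript; perplexity is standardly defined from entropy, or equivalently from the sequence probability:

PP(W) = 2^H(W) = P(w_1 w_2 … w_N)^(−1/N)

The lower the perplexity, the better the model.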

Page 41:

Entropy of a Sequence
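The equation is missing; the per-word entropy of length-n sequences of a language L is standardly:

(1/n) H(w_1 … w_n) = −(1/n) Σ_{W_1^n ∈ L} p(W_1^n) log2 p(W_1^n)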

Page 42:

Entropy of a Language
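Again the equation is missing; the entropy of a language is the limit of the per-word entropy as sequence length grows:

H(L) = lim_{n→∞} (1/n) H(w_1 … w_n)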

Page 43:

Cross Entropy

Used for comparing two language models
p: actual probability distribution that generated some data
m: a model of p (an approximation to p)
Cross entropy of m on p:
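(The equation itself is missing from the transcript; standardly:)

H(p, m) = lim_{n→∞} −(1/n) Σ_{W ∈ L} p(w_1 … w_n) log2 m(w_1 … w_n)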

Page 44:

Cross Entropy

By the Shannon-McMillan-Breiman theorem:
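(equation missing from the transcript; the standard statement is)

H(p, m) = lim_{n→∞} −(1/n) log2 m(w_1 … w_n)

i.e., the cross entropy can be estimated from a single sufficiently long sample drawn from p.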

Property of cross entropy:
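The property itself (the slide’s equation is missing):

H(p) ≤ H(p, m)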

The difference between H(p,m) and H(p) is a measure of how accurate model m is

The more accurate a model, the lower its cross-entropy