Day 4 Classic OT

Although we’ve seen most of the ingredients of OT, there’s one more big thing you need to know to be able to read OT papers and listen to OT talks

Constraints interact through strict ranking instead of through weighting

Analogy: alphabetical order

Constraints
– HaveEarly1stLetter
– HaveEarly2ndLetter
– HaveEarly3rdLetter
– HaveEarly4thLetter
– HaveEarly5thLetter
– ...

Harmonic grammar

Cabana wins because it does much better on less-important constraints

          1st    2nd    3rd    4th    5th    harm.
          w=5    w=4    w=3    w=2    w=1
banana    -1            -13           -13    -57
azalea           -25           -11    -4     -126
azote            -25    -14    -19    -4     -184
cabana    -2            -1            -13    -26
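The harmony column can be recomputed with a short Python sketch. It assumes, as the numbers in the table suggest, that each HaveEarlyNthLetter constraint assigns one violation per step that the Nth letter is away from "a" (a = 0, b = 1, ..., z = 25). The function names are mine, for illustration only.

```python
import string

# Weights of HaveEarly1stLetter ... HaveEarly5thLetter, from the table above.
WEIGHTS = [5, 4, 3, 2, 1]

def violations(word):
    """Violations per constraint: distance of each of the first 5 letters from 'a'
    (assumption read off the table: a = 0, b = 1, ..., z = 25)."""
    return [string.ascii_lowercase.index(ch) for ch in word[:5]]

def harmony(word):
    """Harmonic Grammar score: negated weighted sum of violations (closer to 0 is better)."""
    return -sum(w * v for w, v in zip(WEIGHTS, violations(word)))

words = ["banana", "azalea", "azote", "cabana"]
harmonies = {word: harmony(word) for word in words}
winner = max(harmonies, key=harmonies.get)

print(harmonies)
print("winner:", winner)
```

With weighting, many small violations on low-weighted constraints can add up, which is why the winner here is not the word that comes first alphabetically.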

Classic Optimality Theory

Strict ranking: all the candidates that aren’t the best on the top constraint are eliminated
– “!” means “eliminated here”
– Shading on rest of row indicates it doesn’t matter how well or poorly the candidate does on subsequent constraints

          1st    2nd    3rd    4th    5th
banana    1!            13            13
azalea           25            11     4
azote            25     14!    19     4
cabana    2!            1             13

Classic Optimality Theory

Repeat the elimination for subsequent constraints

Here, the two remaining candidates tie (both are the best), so we move to the next constraint

Winner(s) = the candidates that remain

          1st    2nd    3rd    4th    5th
banana    1!            13            13
azalea           25            11     4
azote            25     14!    19     4
cabana    2!            1             13
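The elimination procedure can be sketched in Python as follows. The violation scoring (distance of each letter from "a") is an assumption read off the tableau, and the function names are hypothetical.

```python
import string

def violations(word, n_constraints=5):
    """Violations of HaveEarly1stLetter ... HaveEarly5thLetter
    (assumption: a = 0, b = 1, ..., z = 25)."""
    return [string.ascii_lowercase.index(ch) for ch in word[:n_constraints]]

def ot_winners(candidates, n_constraints=5):
    """Strict ranking: go through the constraints from highest to lowest ranked,
    keeping only the candidates that do best on each one."""
    remaining = list(candidates)
    for k in range(n_constraints):
        best = min(violations(c)[k] for c in remaining)
        remaining = [c for c in remaining if violations(c)[k] == best]
        if len(remaining) == 1:
            break  # lower-ranked constraints can no longer matter
    return remaining

winners = ot_winners(["banana", "azalea", "azote", "cabana"])
print(winners)
```

Unlike in the weighted version, a candidate eliminated by a high-ranked constraint cannot be rescued by doing well on lower-ranked ones, which is exactly how alphabetical order works.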

Example tableaux: find the winner

Constraint1 C2 C3 C4

a. * *

b. * *

c. * *


C1 C2 C3 C4

a. ** *

b. * *

c. * *


C1 C2 C3 C4

a. *

b. * ***

c. * *


C1 C2 C3 C4

a. ** *

b. ** * *

c. ***

“Harmonically bounded” candidates

A fancy term for candidates that can’t win under any ranking

Simple harmonic bounding: Why can’t (c) win under any ranking?

C2 C3 C4

a. * *

b. * *

c. ** *

“Harmonically bounded” candidates

Joint harmonic bounding: Why can’t (c) win under any ranking?

C1 C2

a. **

b. **

c. * *

Why this matters for variation

“Multi-site” variation: more than one place in word that can vary

Which candidates can win under some ranking?

/akitamiso/        *i     Max-V
a. [akitamiso]     **
b. [aktamiso]      *      *
c. [akitamso]      *      *
d. [aktamso]              **

/akitamiso/        Max-V  *i
a. [akitamiso]            **
b. [aktamiso]      *      *
c. [akitamso]      *      *
d. [aktamso]       **

Why this matters for variation

Even if the ranking is allowed to vary, candidates like (b) and (c) can never occur


/akitamiso/        *i     Max-V
a. [akitamiso]     **
b. [aktamiso]      *      *
c. [akitamso]      *      *
d. [aktamso]              **

/akitamiso/        Max-V  *i
a. [akitamiso]            **
b. [aktamiso]      *      *
c. [akitamso]      *      *
d. [aktamso]       **
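One way to check this claim is to try every ranking and collect the winners. The sketch below does that for the two constraints in the tableau; the dictionary layout and function name are illustrative choices of mine.

```python
from itertools import permutations

CONSTRAINTS = ["*i", "Max-V"]

# Violation counts for each candidate of /akitamiso/, from the tableaux above.
TABLEAU = {
    "akitamiso": {"*i": 2, "Max-V": 0},
    "aktamiso":  {"*i": 1, "Max-V": 1},
    "akitamso":  {"*i": 1, "Max-V": 1},
    "aktamso":   {"*i": 0, "Max-V": 2},
}

def ot_winners(tableau, ranking):
    """Strict-ranking evaluation: filter the candidates constraint by constraint."""
    remaining = list(tableau)
    for constraint in ranking:
        best = min(tableau[cand][constraint] for cand in remaining)
        remaining = [cand for cand in remaining if tableau[cand][constraint] == best]
    return remaining

possible_winners = set()
for ranking in permutations(CONSTRAINTS):
    winners = ot_winners(TABLEAU, ranking)
    print(" >> ".join(ranking), "->", winners)
    possible_winners.update(winners)

never_win = set(TABLEAU) - possible_winners
print("never win:", sorted(never_win))
```

The candidates that delete only one of the two vowels are missing from the set of possible winners, so no choice of ranking, fixed or varying, can produce them.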

How about in MaxEnt?

Can (b) and (c) ever occur?


/akitamiso/        *i     Max-V
a. [akitamiso]     **
b. [aktamiso]      *      *
c. [akitamso]      *      *
d. [aktamso]              **

How about in Noisy Harmonic Grammar?

Suppose the two constraints have the same weight

/akitamiso/        *i     Max-V
                   w=1    w=1
a. [akitamiso]     **
b. [aktamiso]      *      *
c. [akitamso]      *      *
d. [aktamso]              **

Special case in Noisy HG

/apataka/       *aCa   Ident(lo)   harmony   wins (or ties) if
                w=a    w=b
a. [apataka]    ***                -3a       a < ½ b
b. [epataka]    **     *           -2a-b     --
c. [apetaka]    *      *           -a-b      a < b < 2a
d. [apateka]    *      *           -a-b      a < b < 2a
e. [apatake]    **     *           -2a-b     --
f. [epateka]           **          -2b       b < a
g. [epatake]    *      **          -a-2b     --
h. [apetake]           **          -2b       b < a
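The "wins (or ties) if" column can be explored by simulation. The sketch below adds Gaussian noise to the two weights on each evaluation and counts the winners. The mean weights (10 and 15), the noise standard deviation (4), and the decision to redraw non-positive noisy weights are all assumptions made for illustration, not values from the lecture.

```python
import random
from collections import Counter

# Violation counts (*aCa, Ident(lo)) copied from the tableau above.
CANDIDATES = {
    "apataka": (3, 0),
    "epataka": (2, 1),
    "apetaka": (1, 1),
    "apateka": (1, 1),
    "apatake": (2, 1),
    "epateka": (0, 2),
    "epatake": (1, 2),
    "apetake": (0, 2),
}

def positive_noisy_weight(mean, noise_sd, rng):
    """Mean plus Gaussian noise, redrawn until positive (an assumption of this sketch)."""
    while True:
        weight = mean + rng.gauss(0.0, noise_sd)
        if weight > 0:
            return weight

def noisy_hg_winners(w_aca, w_ident, noise_sd, rng):
    """One Noisy HG evaluation: perturb both weights, return the best candidate(s)."""
    a = positive_noisy_weight(w_aca, noise_sd, rng)
    b = positive_noisy_weight(w_ident, noise_sd, rng)
    harmony = {c: -(n_aca * a + n_id * b) for c, (n_aca, n_id) in CANDIDATES.items()}
    best = max(harmony.values())
    return [c for c, h in harmony.items() if h == best]

rng = random.Random(0)
wins = Counter()
for _ in range(10000):
    wins.update(noisy_hg_winners(10.0, 15.0, 4.0, rng))

for cand in CANDIDATES:
    print(cand, wins[cand])
```

Candidates with "--" in the last column never show up as winners, while the one-change candidates [apetaka] and [apateka] do win whenever the noisy weights satisfy a < b < 2a.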

Summary for harmonic bounding

In OT, harmonically bounded candidates can never win under any ranking

– means that applying a change to one part of a word but not another is impossible

In MaxEnt, all candidates have some probability of winning.

In Noisy HG, harmonically bounded candidates can win only in special cases.

See Jesney 2007 for a nice discussion of harmonic bounding in weighted models.

Is it good or bad that (b) and (c) can’t win in OT?

In my opinion, probably bad, because there are several cases where candidates like (b) and (c) do win...


/akitamiso/        *i     Max-V
a. [akitamiso]     **
b. [aktamiso]      *      *
c. [akitamso]      *      *
d. [aktamso]              **

French optional schwa deletion

There’s a long literature on this. See Riggle & Wilson 2005, Kaplan 2011, and Kimper 2011 for references.

La queue de ce renard      no deletion
La queue d’ ce renard      some deletion
La queue de c’ renard      some deletion
La queue de ce r’nard      some deletion
La queue d’ ce r’nard      as much deletion as possible, without violating *CCC

Pima plural marking

Munro & Riggle 2004, Uto-Aztecan language of Mexico, about 650 speakers [Lewis 2009].

Infixing reduplication marks plural.

In compounds, any combination of members can reduplicate, as long as at least one does:

Singular: [ʔus-kàlit-váinom], lit. tree-car-knife ‘wagon-knife’

Plural options:
– ʔuʔus-kàklit-vápainom ‘wagon-knives’
– ʔuʔus-kàklit-váinom
– ʔuʔus-kàlit-vápainom
– ʔus-kàklit-vápainom
– ʔuʔus-kàlit-váinom
– ʔus-kàklit-váinom
– ʔus-kàlit-vápainom

Simplest theory of variation in OT: Anttila’s partial ranking (Anttila 1997)

Some constraints’ rankings are fixed; others vary

I’m using the red line here to indicate varying ranking

/θɪk/        Max-C   Ident(place)   *θ   Ident(cont)   *Dental
a [θɪk]                             *                  *
b [t̪ɪk]                                  *             *
c [ɪk]       *!
d [sɪk]              *!

Anttilan partial ranking

Max-C

Ident(place)

*θ Ident(continuant)

*Dental

Linearization

In order to generate a form, the constraints have to be put into a linear order

Each linear order consistent with the grammar’s partial order is equally probable

grammar              linearization 1 (50%)   linearization 2 (50%)
Max-C                Max-C                   Max-C
Id(place)            Ident(place)            Ident(place)
*θ, Id(cont)         *θ                      Ident(cont)
                     Ident(cont)             *θ
*Dental              *Dental                 *Dental
output:              [t̪ɪk]                   [θɪk]
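Linearization can be sketched in Python by listing every total order that respects the fixed rankings and evaluating each one. The violation profiles below are my reading of the flattened /θɪk/ tableau (an assumption), and the data layout and function names are hypothetical.

```python
from itertools import permutations
from collections import Counter

CONSTRAINTS = ["Max-C", "Ident(place)", "*θ", "Ident(cont)", "*Dental"]

# Fixed pairwise rankings (higher, lower); *θ and Ident(cont) are left unranked.
FIXED = [
    ("Max-C", "Ident(place)"),
    ("Ident(place)", "*θ"),
    ("Ident(place)", "Ident(cont)"),
    ("*θ", "*Dental"),
    ("Ident(cont)", "*Dental"),
]

# Violations per candidate (assumed reading of the tableau for /θɪk/).
TABLEAU = {
    "[θɪk]": {"*θ": 1, "*Dental": 1},
    "[t̪ɪk]": {"Ident(cont)": 1, "*Dental": 1},
    "[ɪk]": {"Max-C": 1},
    "[sɪk]": {"Ident(place)": 1},
}

def consistent(order):
    """True if a total order respects every fixed pairwise ranking."""
    return all(order.index(hi) < order.index(lo) for hi, lo in FIXED)

def ot_winner(order):
    """Strict-ranking evaluation; returns the single remaining candidate."""
    remaining = list(TABLEAU)
    for constraint in order:
        best = min(TABLEAU[c].get(constraint, 0) for c in remaining)
        remaining = [c for c in remaining if TABLEAU[c].get(constraint, 0) == best]
    return remaining[0]

linearizations = [order for order in permutations(CONSTRAINTS) if consistent(order)]
outputs = Counter(ot_winner(order) for order in linearizations)
probs = {cand: n / len(linearizations) for cand, n in outputs.items()}

for order in linearizations:
    print(" >> ".join(order), "->", ot_winner(order))
print(probs)
```

Because every consistent linearization is equally probable, output frequencies are always ratios of small whole numbers, which is the source of the theory’s strong quantitative predictions.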

Properties of this theory

No learning algorithm, unfortunately

Makes strong predictions about variation numbers:
– If there are 2 constraints, what are the possible Anttilan grammars?
– What variation pattern does each one predict?

Finnish example (Anttila 1997)

The genitive suffix has two forms
– “strong”: -iden/-iten (with additional changes)
– “weak”: -(j)en

(data from p. 3)

Factors affecting variation

Anttila shows that choice is governed by...
– avoiding sequence of heavies or lights (*HH, *LL)
– avoiding high vowels in heavy syllables (*H/I) or low vowels in light syllables (*L/A)

Anttila’s grammar (p. 21)

(Without going through the whole analysis)

Sample of the results (p. 23)

Day 4 summary

We’ve seen Classic OT, and a simple way to capture variation in that theory

But there’s no learning algorithm available for this theory, so its usefulness is limited

Also, predictions may be too restrictive
– E.g. if there are 2 constraints, the candidates must be distributed 100%-0%, 50%-50%, or 0%-100%

Next time (our final day)

A theory of variation in OT that permits finer-grained predictions, and has a learning algorithm

Ways to deal with lexical variation

Day 4 references

Anttila, A. (1997). Deriving variation from grammar. In F. Hinskens, R. van Hout, & W. L. Wetzels (Eds.), Variation, Change, and Phonological Theory (pp. 35–68). Amsterdam: John Benjamins.

Jesney, K. (2007). The locus of variation in weighted constraint grammars. Presented at the Workshop on Variation, Gradience and Frequency in Phonology, Stanford University.

Kaplan, A. F. (2011). Variation Through Markedness Suppression. Phonology, 28(03), 331–370. doi:10.1017/S0952675711000200

Kimper, W. A. (2011). Locality and globality in phonological variation. Natural Language & Linguistic Theory, 29(2), 423–465. doi:10.1007/s11049-011-9129-1

Lewis, M. P. (Ed.). (2009). Ethnologue: languages of the world (16th ed.). Dallas, TX: SIL International.

Munro, P., & Riggle, J. (2004). Productivity and lexicalization in Pima compounds. In Proceedings of BLS.

Riggle, J., & Wilson, C. (2005). Local optionality. In L. Bateman & C. Ussery (Eds.), NELS 35.

Day 5: Before we start

Last time I promised to show you numbers for multi-site variation in MaxEnt

If weights are equal:

/akitamiso/        *i     Max-V   harmony   prob.
                   w=1    w=1
a. [akitamiso]     **             e^-2      0.25
b. [aktamiso]      *      *       e^-2      0.25
c. [akitamso]      *      *       e^-2      0.25
d. [aktamso]              **      e^-2      0.25

Day 5: Before we start

As weights move apart, “compromise” candidates remain more frequent than the full-deletion candidate

/akitamiso/        *i     Max-V   harmony          prob.
                   w=1    w=2
a. [akitamiso]     **             e^-2 = 0.14      0.53
b. [aktamiso]      *      *       e^-3 = 0.05      0.20
c. [akitamso]      *      *       e^-3 = 0.05      0.20
d. [aktamso]              **      e^-4 = 0.02      0.07
                                  sum = 0.25
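These probabilities can be recomputed with a short sketch. In MaxEnt, each candidate’s probability is proportional to e raised to its harmony, where harmony is the negated weighted sum of violations. Note that two Max-V violations at weight 2 give the full-deletion candidate a harmony of -4. The function name and data layout are mine.

```python
import math

# Violation counts for each candidate of /akitamiso/.
TABLEAU = {
    "akitamiso": {"*i": 2, "Max-V": 0},
    "aktamiso":  {"*i": 1, "Max-V": 1},
    "akitamso":  {"*i": 1, "Max-V": 1},
    "aktamso":   {"*i": 0, "Max-V": 2},
}

def maxent_probs(tableau, weights):
    """P(candidate) = exp(harmony) / sum of exp(harmony) over all candidates,
    with harmony = -(weighted sum of violations)."""
    scores = {
        cand: math.exp(-sum(weights[c] * n for c, n in viols.items()))
        for cand, viols in tableau.items()
    }
    total = sum(scores.values())
    return {cand: score / total for cand, score in scores.items()}

p_equal = maxent_probs(TABLEAU, {"*i": 1.0, "Max-V": 1.0})
p_unequal = maxent_probs(TABLEAU, {"*i": 1.0, "Max-V": 2.0})

for cand in TABLEAU:
    print(cand, round(p_equal[cand], 2), round(p_unequal[cand], 2))
```

Every candidate gets a nonzero probability under any finite weights, so the one-deletion candidates can always occur in MaxEnt.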

Stochastic OT

Today we’ll see a richer model of variation in Classic (strict-ranking) OT.

But first, we need to discuss the concept of a probability distribution

What is a probability distribution

It’s a function from possible outcomes (of some random variable) to probabilities.

A simple example: flipping a fair coin

which side lands up probability

heads 0.5

tails 0.5

Rolling 2 dice

sum of 2 dice probability

2 (1+1) 1/36

3 (1+2, 2+1) 2/36

4 (1+3, 2+2, 3+1) 3/36

5 (1+4, 2+3, 3+2, 4+1) 4/36

6 (1+5, 2+4, 3+3, 4+2, 5+1) 5/36

7 (1+6, 2+5, 3+4, 4+3, 5+2, 6+1) 6/36

8 (2+6, 3+5, 4+4, 5+3, 6+2) 5/36

9 (3+6, 4+5, 5+4, 6+3) 4/36

10 (4+6, 5+5, 6+4) 3/36

11 (5+6, 6+5) 2/36

12 (6+6) 1/36
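The table can be generated by listing all 36 equally likely outcomes and counting how many give each sum. This is just an illustrative sketch; the variable names are mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 (die1, die2) outcomes produce each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# A probability distribution: a function from outcomes (sums) to probabilities.
dist = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, p in dist.items():
    print(total, f"{counts[total]}/36")
```

The probabilities sum to 1, as they must for any probability distribution.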

Probability distributions over grammars

One way to think about within-speaker variation is that, at each moment, the speaker has multiple grammars to choose between.

This idea is often invoked in syntactic variation (e.g., Yang 2010)

– E.g., SVO order vs. verb-second order

Probability distributions over Classic OT grammars

We could have a theory that allows any probability distribution:

– Max-C >> *θ >> Ident(continuant): 0.10 (t̪ɪn)
– Max-C >> Ident(continuant) >> *θ: 0.50 (θɪn)
– *θ >> Max-C >> Ident(continuant): 0.05 (t̪ɪn)
– *θ >> Ident(continuant) >> Max-C: 0.20 (ɪn)
– Ident(continuant) >> Max-C >> *θ: 0.05 (θɪn)
– Ident(continuant) >> *θ >> Max-C: 0 (ɪn)

The child has to learn a number for each ranking (except one)

Probability distributions over Classic OT grammars

But I haven’t seen any proposal like that in phonology

Instead, the probability distributions are usually constrained somehow

Anttilan partial ranking as a probability distribution over Classic OT grammars

Id(place)

*θ Id(cont)

means
– Id(place) >> *θ >> Id(cont): 50%
– Id(place) >> Id(cont) >> *θ: 50%
– *θ >> Id(place) >> Id(cont): 0%
– *θ >> Id(cont) >> Id(place): 0%
– Id(cont) >> *θ >> Id(place): 0%
– Id(cont) >> Id(place) >> *θ: 0%

A less-restrictive theory: Stochastic OT

Early version of the idea from Hayes & MacEachern 1998.

– Each constraint is associated with a range, and those ranges also have fringes (margem), indicated by “?” or “??”

p. 43

Stochastic OT

Each time you want to generate an output, choose one point from each constraint’s range, then use a total ranking according to those points.

This approach defines (though without precise quantification) a probability distribution over constraint rankings.

Making it quantitative

Boersma 1997: the first theory to quantify ranking preference.

In the grammar, each constraint has a “ranking value”:
– *θ 101
– Ident(cont) 99

Every time a person speaks, they add a little noise to each of these numbers
– then rank the constraints according to the new numbers.

⇒ Go to demo [Day5_StochOT_Materials.xls]

Once again, this defines a probability distribution over constraint rankings.

An Anttilan grammar is a special case of a Stochastic OT grammar.
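For readers without the spreadsheet, here is a rough Python sketch of the same idea: add noise to each ranking value, sort, and see how often each ranking comes out. The noise is assumed to be Gaussian with standard deviation 2.0, which is my choice for illustration rather than a value given here.

```python
import random
from collections import Counter

RANKING_VALUES = {"*θ": 101.0, "Ident(cont)": 99.0}
NOISE_SD = 2.0  # assumption: standard deviation of the evaluation noise

def sample_ranking(ranking_values, noise_sd, rng):
    """Add independent Gaussian noise to each ranking value, then rank the
    constraints from highest to lowest noisy value."""
    noisy = {c: v + rng.gauss(0.0, noise_sd) for c, v in ranking_values.items()}
    return tuple(sorted(noisy, key=noisy.get, reverse=True))

rng = random.Random(1)
n_samples = 20000
counts = Counter(sample_ranking(RANKING_VALUES, NOISE_SD, rng) for _ in range(n_samples))

p_theta_top = counts[("*θ", "Ident(cont)")] / n_samples
for ranking, n in counts.items():
    print(" >> ".join(ranking), n / n_samples)
```

With these settings *θ outranks Ident(cont) on roughly three quarters of evaluations. Moving the two ranking values further apart pushes the split toward 100%-0%, and giving them the same value yields 50%-50%.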

Boersma’s Gradual Learning Algorithm for stochastic OT

1. Start out with both constraints’ ranking values at 100.
2. You hear an adult say something. Suppose it’s /θɪk/ → [θɪk].
3. You use your current ranking values to produce an output. Suppose it’s /θɪk/ → [t̪ɪk].
4. Your grammar produced the wrong result! (If the result was right, repeat from Step 2.)
5. Constraints that [θɪk] violates are ranked too high; constraints that [t̪ɪk] violates are ranked too low.
6. So, demote the former and promote the latter by some fixed amount (say 0.33 points).

/θɪk/                                  *θ                    Ident(cont)
the adult said this          [θɪk]     * demote to 99.67
your grammar produced this   [t̪ɪk]                           * promote to 100.33
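A single update step can be sketched as follows. This is a minimal illustration of the promote/demote step only, with a hypothetical function name and data layout; it leaves out the noisy evaluation that produces the learner’s output.

```python
def gla_update(ranking_values, adult_viols, learner_viols, plasticity=0.33):
    """One Gradual Learning Algorithm step after an error.
    Constraints violated more by the adult form are demoted;
    constraints violated more by the learner's own output are promoted."""
    updated = dict(ranking_values)
    for constraint in updated:
        adult = adult_viols.get(constraint, 0)
        learner = learner_viols.get(constraint, 0)
        if adult > learner:
            updated[constraint] -= plasticity  # ranked too high: demote
        elif learner > adult:
            updated[constraint] += plasticity  # ranked too low: promote
    return updated

values = {"*θ": 100.0, "Ident(cont)": 100.0}
adult_form = {"*θ": 1}             # the adult said [θɪk]
learner_form = {"Ident(cont)": 1}  # the learner's grammar produced [t̪ɪk]

new_values = gla_update(values, adult_form, learner_form)
print(new_values)
```

Repeating this step over many adult forms nudges the ranking values until the learner’s output frequencies roughly match the adult’s, in the cases where the algorithm converges.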

Gradual Learning Algorithm

demo (same Excel file, different worksheet)

Problems with the GLA for stochastic OT

Unlike with MaxEnt grammars, the space is not convex: there’s no guarantee that there isn’t a better set of ranking values far away from the current ones

And in any case, the GLA isn’t a “hill-climbing” algorithm. It doesn’t have a function it’s trying to optimize, but just a procedure for changing in response to data

Problems with GLA for stochastic OT

Pater 2008: constructed cases where some constraints never stop getting promoted (or demoted)
– This means the grammar isn’t even converging to a wrong solution: it’s not converging at all!

I’ve experienced this in applying the algorithm myself

Still, in many cases stochastic OT works well

E.g., Boersma & Hayes 2001
– Variation in Ilokano reduplication and metathesis
– Variation in English light/dark /l/
– Variation in Finnish genitives (as we saw last time)

Type variation

All the theories of variation we’ve used so far predict token variation

– In this case, every theory wrongly predicts that both words vary

/mão+s/   Ident(round)   *ãos
mãos                     *
mães      *

/pão+s/   Ident(round)   *ãos
pãos                     *
pães      *

Indexed constraints

Pater 2009, Becker 2009

Some constraints apply only to certain words

/mão+s/TypeA   Ident(round)TypeA   *ãos   Ident(round)TypeB
mãos                               *
mães           *!

/pão+s/TypeB   Ident(round)TypeA   *ãos   Ident(round)TypeB
pãos                               *!
pães                                      *

Indexed constraints

If the grammar is itself variable, we can have some words whose behavior is variable (Huback 2011 example)

/sidadão+s/TypeC   Ident(round)TypeC   *ãos
                   weight: 100         weight: 98
sidadãos                               *
sidadães           *

Where to go from here: R and regression

Download R
– www.r-project.org

Download Harald Baayen’s book Analyzing Linguistic Data: A Practical Introduction to Statistics using R
– www.ualberta.ca/~baayen/publications/baayenCUPstats.pdf

Work through the analyses in the book
– Baayen gives all the R commands and lets you download the data sets, so you can do the analyses in the book as you read about them

Where to go: Optimality Theory

Read John McCarthy’s book Doing Optimality Theory: Applying Theory to Data
– A practical guide for actually doing OT

If you enjoy that, read John McCarthy’s book Optimality Theory: A Thematic Guide
– Goes into more theoretical depth

There is a book in Portuguese, João Costa’s 2001 Gramática, conflitos e violações. Introdução à Teoria da Optimidade

Download OTSoft
– www.linguistics.ucla.edu/people/hayes/otsoft
– If you give it the candidates, constraints, and violations, it will tell you the ranking

Where to go: Stochastic OT and Gradual Learning Algorithm

Read Boersma & Hayes’s 2001 article “Empirical tests of the Gradual Learning Algorithm”

Download the data sets for the article and play with them in OTSoft
– www.fon.hum.uva.nl/paul/gla, under part 3
– Try different GLA options
– Try learning algorithms other than GLA

Where to go: Harmonic Grammar and Noisy HG

Unfortunately, I don’t know of any friendly introductions to these

Download OT-Help and try the examples
– people.umass.edu/othelp/
– The OT-Help manual might be the easiest-to-read summary of Harmonic Grammar that exists!
– Try the sample files

Where to go: MaxEnt

The original proposal to use MaxEnt for phonology was Goldwater & Johnson 2003, but it’s difficult to read

Andy Martin’s 2007 UCLA dissertation has an easier-to-read introduction (chapter 4)
– www.linguistics.ucla.edu/general/Dissertations/Martin_dissertationUCLA2007.pdf

You could try using OTSoft to fit a MaxEnt model to the Boersma/Hayes data

Where to go: MaxEnt’s Gaussian prior

To use the prior (bias against changing weights from default), download the MaxEnt Grammar Tool

– www.linguistics.ucla.edu/people/hayes/MaxentGrammarTool
– In addition to the usual OTSoft input file, you need to make a file with mu and sigma2 for each constraint (there is a sample file)

Good examples to read of using the prior
– Chapter 4 of Andy Martin’s dissertation
– Hayes & White 2013 article, “Phonological naturalness and phonotactic learning”: www.linguistics.ucla.edu/people/grads/jwhite/documents/HayesWhitePhonologicalNaturalnessAndPhonotacticLearning.pdf

Where to go: lexical variation

Becker’s 2009 UMass dissertation, “Phonological Trends in the Lexicon: The Role of Constraints”, develops the lexical-indexing approach

– www.phonologist.org/papers/becker_dissertation.pdf

Hayes & Londe’s 2006 paper “Stochastic phonological knowledge: the case of Hungarian vowel harmony” uses another approach (Zuraw’s UseListed)

– www.linguistics.ucla.edu/people/hayes/HungarianVH

Thanks for attending!

Stay in touch: [email protected]

Working on a phonology project (with or without variation)? I’d be interested to read it.

Day 5 references

Becker, M. (2009). Phonological trends in the lexicon: the role of constraints (Ph.D. dissertation). University of Massachusetts Amherst.

Boersma, P. (1997). How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam, 21, 43–58.

Boersma, P., & Hayes, B. (2001). Empirical tests of the gradual learning algorithm. Linguistic Inquiry, 32, 45–86.

Goldwater, S., & Johnson, M. (2003). Learning OT Constraint Rankings Using a Maximum Entropy Model. In J. Spenader, A. Eriksson, & Ö. Dahl (Eds.), Proceedings of the Stockholm Workshop on Variation within Optimality Theory (pp. 111–120). Stockholm: Stockholm University.

Hayes, B., & Londe, Z. C. (2006). Stochastic Phonological Knowledge: The Case of Hungarian Vowel Harmony. Phonology, 23(01), 59–104. doi:10.1017/S0952675706000765

Hayes, B., & MacEachern, M. (1998). Quatrain form in English folk verse. Language, 74, 473–507.

Hayes, B., & White, J. (2013). Phonological Naturalness and Phonotactic Learning. Linguistic Inquiry, 44(1), 45–75. doi:10.1162/LING_a_00119

Huback, A. P. (2011). Irregular plurals in Brazilian Portuguese: An exemplar model approach. Language Variation and Change, 23(02), 245–256. doi:10.1017/S0954394511000068

Martin, A. (2007). The evolving lexicon (Ph.D. Dissertation). University of California, Los Angeles.

Pater, J. (2008). Gradual Learning and Convergence. Linguistic Inquiry.

Pater, J. (2009). Morpheme-specific phonology: constraint indexation and inconsistency resolution. In S. Parker (Ed.), Phonological argumentation: essays on evidence and motivation. Equinox.

Yang, C. (2010). Three factors in language variation. Lingua, 120(5), 1160–1177. doi:10.1016/j.lingua.2008.09.015
