sppa 403 speech science1 unit 3 outline the vocal tract (vt) source-filter theory of speech...

Post on 14-Dec-2015


SPPA 403 Speech Science 1

Unit 3 outline

• The Vocal Tract (VT)
• Source-Filter Theory of Speech Production
• Capturing Speech Dynamics
• The Vowels
• The Diphthongs
• The Glides
• The Liquids

SPPA 403 Speech Science 2

Source-Filter Theory: Modeling Vowels

Vowels are
• voiced (the VT shapes the glottal source spectrum)
• oral (velopharyngeal port closed)
• produced with an “open” vocal tract (VT)
  – P_oral ≈ P_atmos (oral pressure is approximately atmospheric)
• classified according to tongue position in the VT
  – front/central/back
  – high/mid/low

SPPA 403 Speech Science 3

Vowel Quadrilateral

SPPA 403 Speech Science 4

Source-Filter Theory: Modeling Vowels

• Mobile articulators serve to change the VT area function so that it is not constant

• A non-constant area function adds complexity to determining the VT transfer function

• However, VT transfer function still based on tube acoustics

SPPA 403 Speech Science 5

[Figure: articulatory configuration → area function (glottis to lips) → transfer function (gain vs. frequency)]

SPPA 403 Speech Science 6

Source-Filter Theory: Modeling Vowels

• VT has an infinite number of resonances/formants

• Identification of vowel quality seems most dependent upon the location of F1, F2 & F3

• These observations are based on
  – studies of vowel perception
  – modeling efforts which suggest F4-F6 are relatively static
  – the observation that the glottal source spectrum rolls off with increasing frequency
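The roll-off of the glottal source can be illustrated with a short sketch. The slope of -12 dB per octave and the 100 Hz fundamental are assumptions chosen for illustration (the slides give no figure), and `source_level_db` is a hypothetical helper name.

```python
import math


def source_level_db(freq_hz, f0_hz=100.0, rolloff_db_per_octave=-12.0):
    """Level of a source harmonic in dB relative to the fundamental.

    The default slope and f0 are assumed values, not taken from the slides.
    """
    octaves_above_f0 = math.log2(freq_hz / f0_hz)
    return rolloff_db_per_octave * octaves_above_f0


# Print the relative level of a few harmonics of a 100 Hz source.
for harmonic in (1, 2, 4, 8, 16, 32):
    freq = harmonic * 100.0
    print(f"{freq:6.0f} Hz: {source_level_db(freq):6.1f} dB")
```

Under these assumptions the source has lost about 60 dB by 3200 Hz, which is one way to see why the higher formants have little energy to shape.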

SPPA 403 Speech Science 7

[Figure: gain vs. frequency for /i/, /u/ and two other vowels, compared with the mid-central vowel (F1: 500 Hz, F2: 1500 Hz, F3: 2500 Hz)]

SPPA 403 Speech Science 8

Tongue “Rules” for vowel formant values

/i/ & /u/ have a low F1

/a/ & /ae/ have a high F1

Tongue height ~ F1

Tongue height ↑ → F1 ↓; tongue height ↓ → F1 ↑

/u/ & /a/ have a low F2

/i/ & /ae/ have a high F2

Tongue anterior-posterior (A-P) position ~ F2

Tongue front → F2 ↑; tongue back → F2 ↓
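The tongue rules can be read in reverse as a rough guess at tongue position from F1 and F2. The sketch below is only an illustration: it uses the mid-central values from the slides (500 Hz and 1500 Hz) as reference points, and the function name and the hard thresholds are assumptions. The example inputs are the /i/-like and /u/-like values that the slides quote for /j/ and /w/.

```python
NEUTRAL_F1_HZ = 500.0   # mid-central F1 from the slides
NEUTRAL_F2_HZ = 1500.0  # mid-central F2 from the slides


def describe_tongue_position(f1_hz, f2_hz):
    """Guess (height, advancement) from F1 and F2 using the tongue rules.

    Low F1 -> high tongue; high F2 -> front tongue. The cut-offs at the
    mid-central values are a simplifying assumption.
    """
    if f1_hz < NEUTRAL_F1_HZ:
        height = "high"
    elif f1_hz > NEUTRAL_F1_HZ:
        height = "low"
    else:
        height = "mid"

    if f2_hz > NEUTRAL_F2_HZ:
        advancement = "front"
    elif f2_hz < NEUTRAL_F2_HZ:
        advancement = "back"
    else:
        advancement = "central"

    return height, advancement


print(describe_tongue_position(300.0, 2200.0))  # ('high', 'front')
print(describe_tongue_position(330.0, 730.0))   # ('high', 'back')
```

Lip rounding also lowers F2, so a low F2 alone does not prove that the tongue is back.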

SPPA 403 Speech Science 9

[Figure: gain vs. frequency for /i/, /u/ and two other vowels, compared with the mid-central vowel (F1: 500 Hz, F2: 1500 Hz, F3: 2500 Hz)]

SPPA 403 Speech Science 10

Lip “Rules” for vowel formant values

Lip rounding (for /u/) ~ F2: rounding lowers F2

Note: lip protrusion increases the overall length of the vocal tract, which decreases all formant values

SPPA 403 Speech Science 11

[Figure: gain vs. frequency for /i/, /u/ and two other vowels, compared with the mid-central vowel (F1: 500 Hz, F2: 1500 Hz)]

SPPA 403 Speech Science 12

IMPORTANT

• Tongue and lip rules are based on how these articulations change the VT area function (shape)

• VT area function ultimately determines the VT filter properties

SPPA 403 Speech Science 13

Tf32 example

SPPA 403 Speech Science 14

SPPA 403 Speech Science 15

F1-F2 values for English Vowels

SPPA 403 Speech Science 16

Vowels: Stylized Spectrograms

SPPA 403 Speech Science 17

Vowels: Stylized Spectrograms

SPPA 403 Speech Science 18

/a/ - low back (high F1, low F2)

SPPA 403 Speech Science 19

/i/ - high front (low F1, high F2)

SPPA 403 Speech Science 20

/u/ - high back (low F1, low F2)

SPPA 403 Speech Science 21

/ae/ - low front (high F1, high F2)

SPPA 403 Speech Science 22

How important are F1-F3 in speech production & perception?

SPPA 403 Speech Science 23

Sinewave Speech Demonstration

Sinewave speech examples (from HINT sentence intelligibility test):

SPPA 403 Speech Science 24

Problem

• Recall: F1 = c/(4l), where c is the speed of sound and l is the VT length

• VT length influences exact frequency location of formants

• Speakers vary in their vocal tract length

• men > women > children
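The dependence of formant frequencies on VT length can be sketched with the uniform-tube model behind F1 = c/(4l), whose higher resonances fall at odd multiples of F1. The speed of sound (35,000 cm/s) and both tube lengths are assumed values for illustration, not measurements from the slides, and `tube_formants` is a hypothetical helper.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, moist air


def tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Resonances of a uniform tube closed at one end and open at the other.

    Fn = (2n - 1) * c / (4 * l), so F1 = c / (4l).
    """
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


# Illustrative lengths only: a longer and a shorter vocal tract.
for label, length_cm in (("longer VT (17.5 cm)", 17.5), ("shorter VT (14.0 cm)", 14.0)):
    print(label, tube_formants(length_cm))
```

With a 17.5 cm tube the sketch returns 500, 1500 and 2500 Hz, the mid-central vowel values used elsewhere in the slides. Shortening the tube raises every formant.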

SPPA 403 Speech Science 25

Problem

[Figure: /i/ and /u/]

SPPA 403 Speech Science 26

How do we know that a child, a man and a woman all say /i/, when the acoustic values of the formants are quite different?

SPPA 403 Speech Science 27

A possible answer??

[Figure: F1-F2 plot comparing children, women and men]

SPPA 403 Speech Science 28

A possible answer??

• Relative locations of formants are similar across speakers even though absolute values differ

• Perhaps we ‘rescale’ our expectations depending upon factors such as gender and age
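One idealized way to see the first point is to scale every formant by the same factor, as a uniform change in VT length would do in a simple tube model, and compare ratios rather than absolute values. The scale factors below are hypothetical and real speakers do not differ by a single uniform factor, so this is a sketch of the idea rather than a normalization method from the slides.

```python
def scale_formants(formants_hz, factor):
    """Multiply every formant by the same factor (idealized shorter/longer VT)."""
    return [f * factor for f in formants_hz]


def ratios_to_f1(formants_hz):
    """Express each formant relative to F1."""
    f1 = formants_hz[0]
    return [f / f1 for f in formants_hz]


reference = [500.0, 1500.0, 2500.0]  # mid-central vowel values from the slides

# Hypothetical scale factors standing in for speakers with shorter vocal tracts.
for factor in (1.0, 1.25, 1.5):
    scaled = scale_formants(reference, factor)
    print(factor, scaled, ratios_to_f1(scaled))
```

The absolute values change with each factor while the ratios stay at 1 : 3 : 5, which is the sense in which relative formant locations can be preserved across speakers.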

SPPA 403 Speech Science 29

SPPA 403 Speech Science 30

Vowel articulation and vowel acoustics

• Vowel quadrilateral: articulatory plane

is similar to

• F1-F2 plot: acoustic plane

SPPA 403 Speech Science 31

SPPA 403 Speech Science 32

[Figure: vowel plot with axes labeled back-front and low-high]


SPPA 403 Speech Science 34

Unit 3 outline

• The Vocal Tract (VT)
• Source-Filter Theory of Speech Production
• Capturing Speech Dynamics
• The Vowels
• The Diphthongs
• The Glides
• The Liquids

SPPA 403 Speech Science 35

Diphthongs

• Slow gliding movement between two vowel qualities

• characterized by an articulatory transition

• articulatory transition = formant transitions

SPPA 403 Speech Science 36

Diphthongs

• /ai/ - “bye”

• /au/ - “bough”

• /oi/ - “boy”

• /ei/ - “bay”

SPPA 403 Speech Science 37

Diphthongs: /ai/

[Figure: /ai/, with segments labeled a and i]

SPPA 403 Speech Science 38

Diphthongs: /au/

[Figure: /au/, with segments labeled a and u]

SPPA 403 Speech Science 39

Unit 3 outline

• The Vocal Tract (VT)
• Source-Filter Theory of Speech Production
• Capturing Speech Dynamics
• The Vowels
• The Diphthongs
• The Glides
• The Liquids

SPPA 403 Speech Science 40

Glides (/w/, /j/) & Liquids (/l/, /r/)

• often termed sonorants

• Associated with
  – a high degree of vocal tract constriction
  – articulatory transition = formant transition

SPPA 403 Speech Science 41

Glides (/w/, /j/) & Liquids (/l/, /r/)

Degree of Constriction
• Greater than vowels
  – P_oral slightly greater than P_atmos
• Less than fricatives
  – P_oral for glides/liquids < P_oral for fricatives
• Constriction lasts ~ 100 msec
• Constriction results in a loss of energy
  – weaker formants

SPPA 403 Speech Science 42

Glides (/w/, /j/) & Liquids (/l/, /r/)

Transition rate

• faster than the diphthongs

• slower than the stops

• lasts ~ 75 msec

SPPA 403 Speech Science 43

/w/

• Place: labial

• Acoustics
  – /u/-like formant frequencies
  – Constriction lowers formant values
  – F1 ~ 330 Hz
  – F2 ~ 730 Hz
  – weak F3 (~ 2300 Hz)

SPPA 403 Speech Science 44

/w/

[Figure: “uh w ae” showing F1, F2 and F3; frequency axis marked at 1000, 2000 and 3000 Hz]

SPPA 403 Speech Science 45

/j/

• Place: palatal

• Acoustics
  – /i/-like formant frequencies
  – F1 ~ 300 Hz
  – F2 ~ 2200 Hz
  – F3 ~ 3000 Hz

SPPA 403 Speech Science 46

/j/

[Figure: “uh j ae” showing F1, F2 and F3; frequency axis marked at 1000, 2000 and 3000 Hz]

SPPA 403 Speech Science 47

/j/

[Figure: “uh j ae”]

SPPA 403 Speech Science 48

Liquids (/l/, /r/)

• lateral /l/

• Retroflex /r/

• Pickett (1999) considers these consonants glides as well

SPPA 403 Speech Science 49

/l/

• Place: alveolar

• Articulatory phonetics
  – Tongue tip contacts the alveolar ridge
  – Constriction is on each side of this obstruction, hence the term lateral
  – Vocal tract is split and so is not modeled with a single tube

SPPA 403 Speech Science 50

/l/

• Acoustics
  – F1 ~ 360 Hz
  – F2 ~ 1300 Hz
  – F3 ~ 2700 Hz
  – F2 is variable and affected by vowel environment
  – Position in word will affect acoustic features of /l/
  – Final /l/ will have a higher F1 & lower F2

SPPA 403 Speech Science 51

/l/

[Figure: “uh l ae” showing F1, F2 and F3; frequency axis marked at 1000, 2000 and 3000 Hz]

SPPA 403 Speech Science 52

/l/

[Figure: “uh l ae”]

SPPA 403 Speech Science 53

/r/

• Place: palatal

• Articulatory phonetics
  – /r/ can take on a wide variety of articulator positions
  – Tongue can be “bunched” together
  – Tongue can be “retroflexed”, tipping back toward the palate
  – Clearly illustrates that many articulatory configurations can result in the same acoustic product

SPPA 403 Speech Science 54

/r/

• Acoustics
  – Hallmark of /r/ is a low F3
  – F1 ~ 350 Hz
  – F2 ~ 1050 Hz
  – F3 ~ 1550 Hz
  – Vowels have F3 above 2200 Hz
  – Vowels around /r/ are colored by it and exhibit lowered F3 values
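The low-F3 hallmark can be checked against the approximate formant values quoted on the glide and liquid slides. The sketch below collects those values in a table and flags any sound whose F3 falls below the 2200 Hz figure given for vowels. The dictionary layout and the function name are illustrative assumptions, and treating 2200 Hz as a hard threshold is a simplification.

```python
# Approximate formant values (F1, F2, F3) in Hz, as quoted in the slides.
SONORANT_FORMANTS_HZ = {
    "w": (330, 730, 2300),
    "j": (300, 2200, 3000),
    "l": (360, 1300, 2700),
    "r": (350, 1050, 1550),
}

VOWEL_F3_FLOOR_HZ = 2200  # the slides state that vowels have F3 above 2200 Hz


def has_lowered_f3(formants_hz, floor_hz=VOWEL_F3_FLOOR_HZ):
    """Return True when F3 (the third value) is below the vowel F3 floor."""
    return formants_hz[2] < floor_hz


low_f3 = [sound for sound, formants in SONORANT_FORMANTS_HZ.items()
          if has_lowered_f3(formants)]
print(low_f3)  # ['r']
```

Of the four sonorants, only /r/ falls below the floor, which matches the description of low F3 as its hallmark.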

SPPA 403 Speech Science 55

/r/

[Figure: “uh r ae” showing F1, F2 and F3; frequency axis marked at 1000, 2000 and 3000 Hz]

SPPA 403 Speech Science 56

“Bunched” /r/

[Figure: “uh r ae” produced with a bunched /r/]

SPPA 403 Speech Science 57

“Retroflexed” /r/

[Figure: “uh r ae” produced with a retroflexed /r/]

SPPA 403 Speech Science 58

A digression…

• /r/ demonstrates that there isn’t a single way to make a speech sound

• /r/ serves to remind us that our source-filter theory allows educated guesses that may not always be right.

• For example, how would you know from acoustics (i.e. formants) if the person is bunching or retroflexing?

SPPA 403 Speech Science 59

A digression…

• /r/ is a problematic sound for many youngsters to learn

• Why might that be?
