
Technische Universität Berlin
Faculty I – Humanities and Educational Sciences
Institute of Language and Communication
Department of Communication Science

Master's thesis

submitted for the academic degree "Master of Arts" (M.A.)

at the Department of Communication Science

On the Psychoacoustics of Vocal Tremor: Identifying Severity Predictor Variables

Supervisor: Dr. Markus Brückl
Reviewers: Dr. Markus Brückl, Prof. Dr. Walter Sendlmeier
Author: Cleopatra Christina Moshona

Degree program: Sprache und Kommunikation (Language and Communication)
Submitted: 10 September 2018

Declaration of authorship

I hereby declare that I have written this thesis independently and by my own hand, without unauthorized outside help, and using only the sources and aids listed.

Signed, Cleopatra Christina Moshona, born on 22 June 1984 in Köln.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Place, date Signature

Abstract

Vocal tremor is an unintentional, quasi-cyclical, infrasonic fluctuation of the voice’s fundamental frequency or intensity that emerges as a symptom of various diseases, but also naturally accompanies the aging process. Though research on vocal tremor has progressed in recent years, the relation between its acoustic properties and their percepts is poorly understood. Knowledge of how these variables connect to perception is a necessary prerequisite for the meaningful evaluation of tremor severity and may help bridge the gap between acoustics and perception.

This thesis aims to identify perceptually significant acoustic predictors of vocal tremor severity. For this purpose, sustained phonations of the vowel /a/ were synthesized in Praat and manipulated with regard to their frequency tremor parameters. Owing to the limited scope of this thesis, amplitude tremor was disregarded entirely. Perceived severity was assessed by naive and expert listeners in a set of three experiments employing forced-choice pairwise comparisons. The preliminary experiment (n = 12) addressed the role of harmonicity in the perceived severity of vocal tremor. Experiment A (n = 34) focused on the perceptual significance of frequency tremor frequency (FTrF), frequency tremor intensity (FTrI), frequency tremor power (FTrP) and beat effects resulting from the superimposition of more than one FTrF. Experiment B (n = 34) sought to validate the adequacy of frequency tremor cyclicality (FTrC) as a measure of tremor periodicity and to explore its predictive ability.

The results indicate that frequency tremor frequency, -intensity, -power and beat effects are all significant single predictors of vocal tremor severity. Voices containing higher values of these variables are generally judged to sound more pathological, which translates to an overall increase in perceived vocal tremor severity. Of all listed predictors, the weighted FTrP measures have the highest predictive strength, explaining up to almost 80 % of the total rating variance. Harmonicity, in contrast, does not seem to play a significant perceptual role. Frequency tremor cyclicality was validated as a suitable measure of frequency tremor periodicity. Lower cyclicality values corresponded to increased levels of noise in the present thesis, which in turn contributed to these sounds being perceived as more severely impacted.

Summary (Zusammenfassung)

Vocal tremor is understood as an involuntary, quasi-cyclical fluctuation of the fundamental frequency or the intensity of the speech signal that occurs outside the audible range. It appears as a symptom of various diseases, but also naturally accompanies the aging process. Although research on vocal tremor has made considerable progress in recent years, how the acoustic properties of vocal tremor correlate with perception remains largely unresolved. A better understanding of this relationship is, however, an indispensable prerequisite for evaluating the severity of vocal tremor in a meaningful way and would further help close the gap between acoustics and perception.

The aim of the present thesis is to identify perceptually significant acoustic predictors for judging the severity of vocal tremor. To this end, synthetic /a/ vowels were generated in Praat and manipulated with regard to their frequency tremor parameters. Amplitude tremor was not considered owing to the limited scope of this thesis. Perceived vocal tremor severity was judged by lay and expert listeners in three forced-choice pairwise comparison experiments. The preliminary experiment (n = 12) analyzes the role of harmonicity in the perceived severity of vocal tremor. Experiment A (n = 34) concentrates on the perceptual significance of frequency tremor frequency (FTrF), frequency tremor intensity (FTrI) and frequency tremor power (FTrP), as well as on the role of beat effects that arise from the superimposition of several FTrFs. Experiment B (n = 34) aims to validate frequency tremor cyclicality (FTrC) as a measure of tremor periodicity and to examine its predictive ability.

The results show that frequency tremor frequency, -intensity and -power as well as beat effects are significant single predictors of vocal tremor severity. Voices with elevated tremor measures are generally perceived as more pathological, which translates perceptually into increased tremor severity. Among the listed predictors, the weighted FTrP measures provide the best prediction and explain up to 80 % of the total rating variance. Harmonicity, by contrast, does not appear to play a perceptually significant role. Frequency tremor cyclicality was validated as a suitable measure of tremor periodicity. In the present thesis, lower cyclicality values corresponded to a higher noise level, which in turn contributed to these sounds being perceived as more severely impaired.

Contents

List of Figures

List of Tables

Abbreviation directory

Symbol directory

Thesis motivation and outline

I. Theoretical part

1. Introduction to vocal tremor
    1.1. Tremor definition and classification
    1.2. Vocal tremor
        1.2.1. Vocal tremor quantification
        1.2.2. Acoustic parameters of vocal tremor
        1.2.3. Modulatory interdependences

2. Vocal tremor measurement algorithms
    2.1. Evaluation of the pitch power spectrum
    2.2. The Vocal Demodulator
    2.3. Multi-Dimensional Voice Program
    2.4. The Modulogram
    2.5. Continuous wavelet transform analysis
    2.6. TREMOR.PRAAT
    2.7. Empirical mode decomposition

3. Past findings on vocal tremor

II. Empirical part

4. Preliminary experiment: exploring the role of harmonicity on the perceived severity of vocal tremor
    4.1. Theoretical background
        4.1.1. Consonance and dissonance in music theory
        4.1.2. Perception of musical consonance and dissonance
        4.1.3. Sensory consonance and dissonance
        4.1.4. Perception of consonance and dissonance in vocal tremor
    4.2. Aims and objectives
        4.2.1. Preliminary hypothesis
    4.3. Acoustic methods
    4.4. Perceptual methods
        4.4.1. Experiment procedure
        4.4.2. Subject profile
    4.5. Evaluation of preliminary hypothesis
        4.5.1. Statistical methods
        4.5.2. Results
        4.5.3. Exploration of new hypotheses
    4.6. Evaluation of new hypotheses
        4.6.1. Statistical methods
        4.6.2. Results
    4.7. Discussion

5. Experiment A: exploring the role of frequency tremor frequency, -intensity, -power and beat on the perceived severity of vocal tremor
    5.1. Theoretical background
        5.1.1. Frequency tremor frequency and -intensity interdependence
        5.1.2. Frequency tremor measures in pathology
        5.1.3. Beat frequency
    5.2. Aims and objectives
        5.2.1. Hypotheses formalization
    5.3. Acoustic methods
    5.4. Perceptual methods
        5.4.1. Experiment procedure
        5.4.2. Subject profile
    5.5. Statistical methods
    5.6. Results
    5.7. Discussion

6. Experiment B: exploring the role of cyclicality on the perceived severity of vocal tremor
    6.1. Theoretical background
        6.1.1. Vocal tremor regularity and waveform
        6.1.2. Frequency tremor cyclicality
    6.2. Aims and objectives
        6.2.1. Research questions
            6.2.1.1. Hypotheses formalization: questions 1 and 2
            6.2.1.2. Hypotheses formalization: question 3
            6.2.1.3. Expectations: question 4
    6.3. Acoustic methods
    6.4. Perceptual methods
        6.4.1. Experiment procedure
        6.4.2. Subject profile
    6.5. Statistical methods
    6.6. Results
    6.7. Discussion

7. Conclusion

Bibliography

III. Appendix

A. Preliminary experiment
    MFC experiment
    Intraclass correlation coefficient
    Linear regressions
    Multiple stepwise regression

B. Experiment A
    Intraclass correlation coefficient
    Linear regressions
    Multiple stepwise regression

C. Experiment B
    Intraclass correlation coefficient (total)
    Intraclass correlation coefficient (noise-noise)
    Intraclass correlation coefficient (noise-no-noise)
    Linear regressions
    Multiple stepwise regression

List of Figures

1.1. Sinusoidal carrier signals, modulated in their frequency and amplitude
2.1. Exemplary tremor analysis of a synthesized sound with tremor.praat
4.1. Consonance representation of two simple tones as a function of frequency difference with critical bandwidth as a unit
4.2. Preliminary experiment: linear regression plot (H1)
4.3. Preliminary experiment: linear regression plots (H2 – H3)
5.1. Exemplary tremor analysis of a synthesized sound with tremor.praat
5.2. Pulse detection in the first 0.02 seconds during the To Manipulation... process of tremor synthesis
5.3. Experiment A: linear regression plots (H1 – H8)
5.4. Experiment A: linear regression plots (H9 – H15)
5.5. Experiment A: between variables correlations – linear regression plots
6.1. Intraclass correlation coefficients (ICC) of data subsets with 95% confidence intervals
6.2. Experiment B: linear regression plots (H1 – H6)
6.3. Experiment B: between variables correlations – linear regression plots
6.4. Comparison of means in frequency subgroups

List of Tables

4.1. Consonance order for two-tone intervals, in decreasing order of "perfection", from most consonant to most dissonant
4.3. Preliminary experiment: frequency ratios of frequency tremor frequencies
4.4. Preliminary experiment: subject profile
5.1. Experiment A: frequency ratios of frequency tremor frequencies and frequency tremor intensity parameters
5.2. Experiment A: subject profile
5.3. Experiment A: dependent and independent variables with corresponding levels of measurement
5.4. Experiment A: between variables correlation matrix
6.3. Experiment B: noise levels and frequency ratios
6.4. Experiment B: dependent and independent variables with corresponding levels of measurement
6.5. Experiment B: between variables correlation matrix

Abbreviation directory

A2    scientific pitch notation for musical pitch: A2 = 110 Hz
A3    scientific pitch notation for musical pitch: A3 = 220 Hz
AM    amplitude modulation
BOTOX    botulinum toxin
C4    scientific pitch notation for musical pitch: C4 = 261.63 Hz
CAPE-V    Consensus Auditory-Perceptual Evaluation of Voice
Cs    control subjects
CSL    Computerized Speech Lab
CV    coefficient of variation
CWT    continuous wavelet transform
DC    mean amplitude of the waveform, originating from direct current
DGN    Deutsche Gesellschaft für Neurologie (German Neurological Society)
EEG    electroencephalography
EGG    electroglottography
EMD    empirical mode decomposition
E3    scientific pitch notation for musical pitch: E3 = 164.81 Hz
ET    essential tremor
FFT    fast Fourier transform
FM    frequency modulation
H    hypothesis
Hyp    hypothesis
ICC    intraclass correlation coefficient
IMF    intrinsic mode function
interdistance    difference between the respective first or second frequency tremor measure of two sounds
intradistance    difference between the first and second frequency tremor measure of one sound
m2    minor second
M2    major second
m3    minor third
M3    major third
m6    minor sixth
M6    major sixth
m7    minor seventh
M7    major seventh
m9    minor ninth
MA    modulation amplitude
max    maximum
MDVP    Multi-Dimensional Voice Program
MF    modulation frequency
MFC    Multiple Forced Choice
min    minimum
MSP    Motor Speech Profile
NIF    noise intensity factor
off-med    off medication
on-med    on medication
P4    perfect fourth
P5    perfect fifth
P8    perfect octave
P12    perfect twelfth
PD    Parkinson’s disease
PDs    Parkinson’s disease subjects
PSOLA    Pitch Synchronous Overlap and Add
RMS    root mean square
SM2    septimal whole tone
sm3    septimal minor third
SPL    sound pressure level
ST    semitones
STRAIGHT    Speech Transformation and Representation based on Adaptive Interpolation of weiGHTed spectrogram
SUBJ    subject
TEMPO    F0 extraction method: Time-domain Excitation extractor using Minimum Perturbation Operator
TT    tritone
Vs    vibrato singers
VT    vocal tremor
VTP    Voice and Tremor Protocol
WAVE    Waveform Audio File Format
♀    female
♂    male

Symbol directory

Ā    mean amplitude in Pa
ATrF    amplitude tremor frequency in Hz
ATRI    amplitude tremor intensity index in % (MDVP)
ATRIabs    absolute amplitude tremor intensity in Pa (MDVP)
β0    unstandardized coefficient B (constant)
βi    unstandardized coefficient B of xi
df0m    frequency modulation depth in Hz
decA    linear decline of intensity in Pa/s
decF    linear decline of fundamental frequency in Hz/s
e    Euler’s number: mathematical constant
f    frequency in Hz
F0    fundamental frequency in Hz
F̄0    mean fundamental frequency in Hz
F0 deviation    peak to peak variation of fundamental frequency in Hz
F0,s    fundamental frequency at t = 0 in Hz
F0M(t)    resulting modulated pitch at time t in Hz
fbeat    beat frequency in Hz
fenv    envelope frequency in Hz
ff0m    frequency modulation rate in Hz
fres    frequency of the resulting signal in Hz
Fatr    amplitude tremor frequency in Hz (MDVP)
Fftr    frequency tremor frequency in Hz (MDVP)
FTrC    frequency tremor cyclicality
FTrF    frequency tremor frequency in Hz
FTrF1    first frequency tremor frequency in Hz
FTrF2    second frequency tremor frequency in Hz
FTrF1,1    first frequency tremor frequency of the first sound in Hz
FTrF1,2    first frequency tremor frequency of the second sound in Hz
FTrF2,1    second frequency tremor frequency of the first sound in Hz
FTrF2,2    second frequency tremor frequency of the second sound in Hz
FTrF1,1−1,2    FTrF1,1 − FTrF1,2 interdistance in Hz
FTrI    frequency tremor intensity index in %
FTRI    frequency tremor intensity index in % (MDVP)
FTRIabs    absolute frequency tremor intensity in Hz (MDVP)
FTrI1    first frequency tremor intensity index in %
FTrI2    second frequency tremor intensity index in %
FTrI1,1    first frequency tremor intensity index of the first sound in %
FTrI1,2    first frequency tremor intensity index of the second sound in %
FTrI2,1    second frequency tremor intensity index of the first sound in %
FTrI2,2    second frequency tremor intensity index of the second sound in %
FTrP    frequency tremor power index
FTrP1    first frequency tremor power index
FTrP2    second frequency tremor power index
FTrP1,1    first frequency tremor power index of the first sound
FTrP1,2    first frequency tremor power index of the second sound
FTrP2,1    second frequency tremor power index of the first sound
FTrP2,2    second frequency tremor power index of the second sound
FTrP2,1−2,2    FTrP2,1 − FTrP2,2 interdistance
FTrX    frequency tremor measure
i    consecutive index of a variable
In    noise intensity factor
j    consecutive index of a variable
µ    mean of the distribution
µfneur    neurological tremor frequency in Hz
m    upper limit of a consecutive index
Matr    magnitude of amplitude tremor in % (VTP)
Mftr    magnitude of frequency tremor in % (VTP)
n    upper limit of a consecutive index
n(t)    noise function
π    pi number: mathematical constant
p(u)    probability density function
Patr    periodicity of amplitude tremor in % (VTP)
Pftr    periodicity of frequency tremor in % (VTP)
r    Pearson correlation coefficient
R²    coefficient of determination
R² adj.    adjusted coefficient of determination
Ratr    rate of amplitude tremor in Hz (VTP)
Rftr    rate of frequency tremor in Hz (VTP)
Σ    mathematical operator: sum
σ    standard deviation
σneur    neurological tremor depth in %
t    time in s
Ttotal    total sound duration in s
u    continuous variable
vAm    amplitude coefficient of variation in % (VTP)
vF0    fundamental frequency coefficient of variation in % (VTP)
Vmax    maximum voltage measured in the amplitude envelope in mV
Vmin    minimum voltage measured in the amplitude envelope in mV
xi    independent variable
y    linear function of x
yres    signal resulting from the superimposition of frequency tremor sines

Thesis motivation and outline

Vocal tremor is a frequent symptom of age-related, neurodegenerative or demyelinating diseases and naturally characterizes the aging process. With the world’s population growing rapidly older, its prevalence is expected to increase. Past research using acoustic methodology has focused on the accurate quantification of vocal tremor, but in doing so has frequently reduced it to a mere function of the speech signal. However, as with any aspect of voice quality, a meaningful assessment of vocal tremor must take its psychoacoustic nature into consideration. Without an understanding of the relationship between acoustics and percepts, vocal tremor measures have no pragmatic context. In clinical environments, this practical dimension is particularly important when evaluating the degree of impairment or vocal tremor severity.

The present thesis aims to identify acoustic factors, hereafter called predictors, that play a significant role in the perception of vocal tremor severity, focusing exclusively on frequency tremor. These predictors are detected on the basis of synthetic signals that allow for a controlled manipulation of individual parameters, without the interfering multidimensionality of natural speech signals. Knowledge of perceptually dominant acoustic correlates may help bridge the gap between physiology and perception.

The thesis is divided into a theoretical and an empirical part. Chapter 1 begins with an introduction to the topic, addresses the quantification and parametrization of vocal tremor and establishes working definitions. Chapter 2 gives an overview of algorithms that use the acoustic signal to measure vocal tremor. The theoretical part closes with a review of relevant findings on vocal tremor (chapter 3). The empirical part comprises three experiments conducted during the course of the thesis (chapters 4 – 6). For a better overview, each experiment is addressed in a separate chapter. The empirical part ends with an overall conclusion that summarizes and critically discusses the main findings of the thesis (chapter 7). It lists possible limitations and formulates questions for future research. All statistical results are attached in the appendix. All sound files, scripts and raw data are included on the enclosed CD.

Part I.

Theoretical part

1. Introduction to vocal tremor

1.1. Tremor definition and classification

Tremor, informally described as "shakiness", "tremble" or "quiver", is defined as the rhythmical, involuntary oscillatory movement of a body part (Deuschl et al., 1998). Typically it is divided into two categories. The first, physiological tremor, is a perfectly normal bodily mechanism which remains unnoticed unless enhanced by interfering, generally reversible variables such as cold, fatigue, anxiety or withdrawal from addictive substances. The second, pathological tremor, is usually noticeable and persistent and emerges as a symptom of one or more medical conditions.

Tremor is biaxially classified by clinical features and etiology (Bhatia et al., 2018). Clinical features are specified, amongst other things, in terms of tremor characteristics. Tremor characteristics in turn encompass anatomical distribution, activation conditions (rest and action) and tremor frequency. Single features, such as tremor frequency, do not have much diagnostic value on their own, given that most pathologies exhibit similar frequency ranges of 4–8 Hz. Therefore, considerable effort is put into the meticulous documentation of clinical feature clusters. These feature combinations, called syndromes, enable the identification of etiologies that are genetic, acquired or idiopathic. A single syndrome may have heterogeneous etiologies and a single etiology may produce several syndromes. Some of the most common tremor syndromes include essential tremor (ET), enhanced physiological tremor and dystonic tremor. Frequent etiological causes include Parkinson’s disease (PD) and multiple sclerosis.

Structurally speaking, the cause of limb tremor is attributed to four potential sources: mechanical tremor of an extremity; reflex activation between agonist and antagonist muscles that leads to oscillatory activity; central oscillation as a result of rhythmic neuron activity within a nucleus, or within a loop of neuron or nuclei populations; and finally malfunction of feed-forward loops in the central nervous system (Deuschl et al., 2001). The latter is most likely caused by delayed muscle (re)activation, leading to movement overshoot and increased movement correction.

1.2. Vocal tremor

When tremor affects the vocal folds, it is called vocal tremor or voice tremor. It is important to note that the principles of limb tremor are not simply transferable to vocal tremor and that the classification described above is not particularly useful when dealing with this form of tremor. Though frequency tremor rate has received considerable attention in clinical guidelines, its value in discerning between conditions, and oftentimes even between dysphonic speakers and healthy controls, is limited (see chapter 3). This is partly because tremor rates often overlap between different subject groups and may even change over the course of a condition (Gillivan-Murphy, 2013).

Vocal tremor may appear coupled with limb tremor, i.e. tremor of the head or torso, thereby affecting the respiratory and suprasegmental structures of the speech production mechanism, but it may also exist in isolation. The exact sources of vocal tremor genesis will be of no concern in the present thesis. Instead, a functional perspective will be adopted. As such, all forms of vocal tremor can be regarded as disturbances and/or latencies in the (electro)physiological feedback processes of phonatory muscle control that cause recurrent, cyclic movement deviations. The repetitive over- and undershooting caused by these deviations can be thought to be reflected in the waveform of tremor. The potential effects of multiple sources of tremor will also not be considered here, because during phonation all individual components melt together into a single acoustic signal that is perceived in its entirety.

In contrast to limb tremor, there is no official definition for vocal tremor in place. This is partly attributed to the different approaches used for the analysis of vocal tremor. Consequently, the descriptions used in the literature tend to vary. However, some constants exist which can be used to establish working terms. Vocal tremor is an acoustic event that can be described as a modulation of the vocal fold vibration, resulting in fundamental frequency (F0) and amplitude variations that are perceived as fluctuations in pitch and loudness. One of its prime characteristics is that it is unintentional, which differentiates it from the deliberate use of modulation as an artistic effect in vocal vibrato. Secondly, it is infrasonic, operating outside the range of audible frequencies at 1.5–15 Hz. Thirdly, it is quasi-cyclical and recurring, as opposed to instantaneous, short-term F0 and amplitude variations, known as jitter and shimmer.

The cause of vocal tremor may be, but is not exclusively, an underlying pathology. To a certain extent, such irregularities in F0 and amplitude are an inherent aspect of all time-varying signals and a perfectly normal trademark of every healthy voice. In fact, these minor imperfections immensely contribute to the overall sound quality, granting the human voice its perceived naturalness.

However, as shown by Brückl (2011), vocal tremor may also be an indicator of advanced (chronological) age and is perceptually associated with an aging voice. When synthetically induced, vocal tremor noticeably elevates the estimated age (Harnsberger et al., 2010). Of course, the risk of acquiring a pathology increases with age, but this is not the only reason why elderly people exhibit vocal tremor. The aging process is accompanied by a number of physiologic and anatomic changes in the speech production apparatus, such as a decrease of tissue elasticity and mass due to ossification and decomposition, as well as a weakening of muscle force (Linville, 2000). These changes affect the regulatory processes of muscle control, impacting the voice’s pitch, loudness and stability and increasing the proportion of noise. Additionally, aging is characterized by a decline in neurotransmitters, in particular dopamine and serotonin, which in turn is associated with a decline in motor performance. The accumulation of these factors impedes the fine-tuning of the microprocesses associated with speech production and especially phonation, causing functional irregularities that affect the overall voice quality. In the case of vocal tremor, these irregularities are perceived as a shakiness of the voice.

1.2.1. Vocal tremor quantification

Considering that vocal tremor is an acoustic event, it seems almost self-evident that it should be quantified through the use of acoustic measures. However, this is not necessarily the standard used in practice. In clinical environments, perceptual rating scale protocols such as CAPE-V (Kempster et al., 2009) are frequently employed to quantify voice anomalies, including vocal tremor. As explained by Kreiman and Gerratt (2010), the problem with such scales, aside from their questionable reliability, is two-fold. Firstly, it is unclear what exactly they measure, because the construct they measure is not properly defined in terms of acoustic variables. Secondly, they provide no information whatsoever in regards to the integral quality of the voice.

Quantification of vocal tremor is traditionally also performed invasively, i. e. via nasoendoscopy, or through the use of non-invasive methods such as electroglottography (EGG) and more recently electroencephalography (EEG). While these techniques provide many useful insights, they offer no information in regards to how vocal tremor actually sounds.

A quite different, non-invasive approach is offered by instrumental measures such as the Multi-Dimensional Voice Program (MDVP) (Deliyski, 1993), which use the acoustic signal as an input. While this algorithm quantifies very specific properties of the signal, it is not at all clear how the measured properties relate to the perceived voice quality. The reason for this, as Kreiman and Gerratt (2010) put it, is that "their status as measurements of quality perception derives solely from correlations, rather than evidence derived from psychoacoustic experimentation." Thus, while such measures provide an alternative to invasive methods of quantification, their value remains limited.

Information about the relation between acoustic parameters and their perceptual correlates is of utmost importance when assessing vocal tremor, because it would not only provide a solid basis for treatment selection and adaptation, but may even enable clinicians to draw conclusions in regards to underlying physiological changes that are perceptually relevant. Currently, the majority of clinical decisions rely heavily on the auditory impression of the treating physician. For example, dosage levels of medication such as botulinum toxin (BOTOX) are based on their perceived reduction of tremor severity (Anand et al., 2012). This is problematic, because an auditory impression that lacks an objective underpinning in terms of acoustics may lead to incorrect judgement, due to interfering psychoacoustic phenomena that may falsify the overall impression. Vice versa, acoustic measures that do not consider the perceptual dimension are simple numbers, lacking pragmatic context.

Oftentimes, the assessment of voice quality is governed by factors of logistic nature. The indisputable advantage of perceptual rating scales is that they are easy to apply, easy to understand and cost-effective. This makes them practical to use in the hassle of clinical routine. In contrast, instruments such as the MDVP are not only cost-intensive, but also far more complex to administer and interpret.

In practical terms, the severity of a disorder, in this case vocal tremor, is measured by how much the voice is perceived to have deteriorated or improved. When gauging the amount of impairment in the voice, we perceive the speech signal as a whole. That is to say, we do not separate between factors that are intrinsic or extrinsic to vocal tremor, i. e. the amount of noise present in the voice. Even if we tried to concentrate on a single aspect, it remains questionable whether and, if so, how well we would be able to encapsulate it. Consequently, an algorithm that measures vocal tremor needs to be able to account for phenomena that are both acoustically and perceptually entangled and weight them in. This can only be achieved through psychometric work that focuses on the identification of factors which lead to the attribution of a particular degree or severity to vocal tremor.

1.2.2. Acoustic parameters of vocal tremor

Vocal tremor is often referred to as a modulation. The term "modulation" describes the process of a carrier signal being altered in its frequency or amplitude by a modulating signal, resulting in a variation of pitch or loudness respectively, see figures 1.1a and 1.1b. In phonation, the carrier signal is the F0, which is modulated by the mechanisms controlling its period and intensity (Winholtz and Ramig, 1992). These in turn are affected by tremor, so the tremor signal becomes the modulating signal. When the frequency of the modulator is close to the F0, the modulation is in the range of the cycle-to-cycle variations of F0 and amplitude (jitter and shimmer). In this case it is not possible to separate the modulator from the signal. However, when the modulation is in the range of tremor, the modulating signal can be extracted and analyzed separately (ibid.).

Figure 1.1.: Sinusoidal carrier signals, modulated in their frequency (left, panel (a): frequency modulation, FM) and amplitude (right, panel (b): amplitude modulation, AM), adopted from Titze (1994)
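The two modulation types shown in figure 1.1 can be sketched numerically. The following is a minimal illustration only, assuming a sinusoidal carrier and a sinusoidal modulator; the function name `modulated_tone`, its parameters and all numeric values are hypothetical and not taken from Titze (1994) or from any of the algorithms discussed in this thesis.

```python
import math

def modulated_tone(duration_s, sample_rate, carrier_hz,
                   fm_rate_hz=0.0, fm_extent=0.0,
                   am_rate_hz=0.0, am_extent=0.0):
    """Return samples of a sine carrier with sinusoidal FM and/or AM.

    fm_extent and am_extent are relative deviations (0.1 = 10 % of the mean).
    """
    samples = []
    phase = 0.0
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        # Instantaneous frequency oscillates around the carrier frequency.
        inst_freq = carrier_hz * (1.0 + fm_extent * math.sin(2 * math.pi * fm_rate_hz * t))
        # Instantaneous amplitude oscillates around a mean of 1.
        inst_amp = 1.0 + am_extent * math.sin(2 * math.pi * am_rate_hz * t)
        samples.append(inst_amp * math.sin(phase))
        # Accumulating the instantaneous frequency yields the phase of the FM signal.
        phase += 2 * math.pi * inst_freq / sample_rate
    return samples

# Hypothetical values: 200 Hz carrier, 5 Hz modulator.
fm_tone = modulated_tone(1.0, 8000, 200.0, fm_rate_hz=5.0, fm_extent=0.10)
am_tone = modulated_tone(1.0, 8000, 200.0, am_rate_hz=5.0, am_extent=0.30)
```

In this sketch the rate corresponds to `fm_rate_hz` or `am_rate_hz` and the extent to `fm_extent` or `am_extent`, both expressed relative to the carrier's mean frequency or amplitude.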

Both frequency and amplitude tremor are parametrized in terms of their rate and extent. The rate of a modulation describes how fast the signal deviates around its mean F0 or its mean amplitude. The extent expresses the modulation’s magnitude relative to the mean F0 or the mean amplitude. It is essentially the modulation’s elongation. The nomenclature used to describe these four basic parameters is heterogeneous and sometimes confusing. Most authors agree on the term "rate" or "frequency" to describe how fast the F0 deviates around its mean. However, some tend to differentiate between frequency domains, using labels such as wow or wobble for low frequencies and tremor or flutter for higher ones (Rothman and Arroyo, 1987; Aronson et al., 1992). Schoentgen (2002) additionally introduces the term "microtremor" to designate a normal, low frequency modulation of the vocal cycle lengths, as opposed to a pathologic one. These differentiations are not adopted in the current thesis, since they unnecessarily complicate matters. Besides, as Aronson et al. (1992) point out, pathologic voices do not always exhibit a clear rate of F0 modulation and no acoustic basis for formally distinguishing between tremor classes has currently been found (Kreiman et al., 2003). Various terms are also used for the modulation’s magnitude. Some examples include "extent", "depth", "level" and "amplitude", the latter often being confused with amplitude tremor, as a type of modulation affecting the intensity of the signal.

1.2.3. Modulatory interdependences

Modulation in F0 does not only occur as a result of fluctuating vocal fold adduction and abduction rates, but may also be a byproduct of amplitude modulation during subglottal pressure changes. With changing subglottal pressure, the extent of the vocal fold vibration varies as well, leading to passive changes in muscle tension as the vocal folds are displaced more or less from midline (Titze, 1989). These differences in muscle tension cause the F0 to vary, with it being higher when the tension increases and lower when the tension decreases.

The reasons for amplitude modulation are more complex and can be attributed to three factors that are outlined below:

1. Resonance-harmonics interaction:

Amplitude modulation may originate from the F0 variation itself. According to the resonance-harmonics interaction hypothesis, when the F0 varies, its integer multiples (harmonics) vary as well, interacting with the spectral maxima of the vocal tract resonances (formants). If this shift brings a harmonic closer to the first formant, that formant will become louder and the overall intensity of the sound will increase, under the premise that all other parameters remain constant. The opposite happens if the harmonic moves farther away from the first formant (Horii, 1989). Depending on whether the harmonic is slightly lower or slightly higher than the first formant, the F0 and amplitude may either undulate in phase (both increase and decrease in synchrony) or out of phase (when one increases, the other decreases). If the harmonic shifts symmetrically around the first formant, the amplitude modulation will be twice as fast as the frequency modulation, see Horwitz (2014) for a graphic representation.

2. Characteristics of the voice source:

A different cause of amplitude modulation lies in fluctuations in subglottal pressure (see above) or in the glottal adduction force. The glottal leakage involved with the latter absorbs quite a bit of the sound energy, thereby attenuating the vocal tract resonances and reducing the amplitudes of the formant peaks (Sundberg, 1994).

3. Vocal tract shape:

Amplitude modulation may also occur when the vocal tract changes its shape as it transitions between different articulatory settings. Movements of the larynx, tongue, lips or jaw cause the formants to shift up- or downwards (refer to Kienast (2002) for details), thereby modifying the amplitude of the signal. In a way, this is the inverse of the resonance-harmonics hypothesis, with the harmonics remaining fixed while the formants change (Horwitz, 2014).
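The phase and rate relations described under the resonance-harmonics interaction can be made tangible with a toy calculation. The sketch below is only an illustration under strong assumptions: the resonance is modeled as a single bell-shaped (Lorentzian) gain curve, only one harmonic is tracked, and the function names and all numeric values are hypothetical rather than taken from Horii (1989) or Horwitz (2014).

```python
import math

def lorentzian_gain(freq_hz, formant_hz, bandwidth_hz=80.0):
    """Toy resonance curve: gain is 1 at the formant and falls off on both sides."""
    return 1.0 / (1.0 + ((freq_hz - formant_hz) / (bandwidth_hz / 2.0)) ** 2)

def tracks(formant_hz, mean_f0=200.0, harmonic=3, fm_rate_hz=5.0,
           fm_extent=0.03, duration_s=1.0, frame_rate=200):
    """Return (f0_track, amplitude_track) for one harmonic near one formant."""
    f0_track, amp_track = [], []
    for n in range(int(duration_s * frame_rate)):
        t = n / frame_rate
        f0 = mean_f0 * (1.0 + fm_extent * math.sin(2 * math.pi * fm_rate_hz * t))
        f0_track.append(f0)
        # The harmonic's amplitude follows the resonance gain at its frequency.
        amp_track.append(lorentzian_gain(harmonic * f0, formant_hz))
    return f0_track, amp_track

def local_maxima(values):
    """Indices of interior local maxima."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]

# Harmonic (3 x 200 Hz) centered on the formant: it crosses the peak twice per FM cycle.
f0_sym, amp_sym = tracks(formant_hz=600.0)
# Harmonic always below the formant: amplitude rises and falls together with F0.
f0_below, amp_below = tracks(formant_hz=650.0)

f0_peak_spacing = {b - a for a, b in zip(local_maxima(f0_sym), local_maxima(f0_sym)[1:])}
amp_peak_spacing = {b - a for a, b in zip(local_maxima(amp_sym), local_maxima(amp_sym)[1:])}
```

With these assumed values the amplitude maxima in the symmetric case recur every 20 frames (0.1 s), while the F0 maxima recur every 40 frames (0.2 s), which corresponds to an amplitude modulation at twice the rate of the frequency modulation. In the second case the amplitude maxima coincide with the F0 maxima, i. e. both undulate in phase.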

Research regarding the quantification and parametrization of vocal tremor is ongoing, with a number of different methods being proposed. The next chapter lists a number of relevant algorithms and related technical findings in chronological order. Single case studies and algorithms that have not been validated on populations with vocal tremor have been excluded from this review.

2. Vocal tremor measurement algorithms

2.1. Evaluation of the pitch power spectrum

Yair and Gath (1988) introduced a way to quantify frequency modulation in vocal tremor through the evaluation of the pitch power spectrum. For this purpose, an autocorrelation function is applied to the low-passed speech signal to obtain the cycle length sequence. The signal is modeled as a point process with the glottal pulses being represented as a sequence of pulsation instances. Before approximating the spectrum of the modulating signal, the local trends are removed to achieve stationarity of the pitch contour and to counteract masking effects on the spectrum. The model’s accuracy was verified on natural data. The rate of the frequency modulation appears as a sharp peak in the pitch spectrum’s horizontal axis. Its magnitude is defined as the energy concentration in the spectral peak.
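The general idea of detrending a pitch contour and reading the modulation rate off its power spectrum can be sketched as follows. This is not the method of Yair and Gath (1988): their point-process model and spectral estimator are not reproduced here. Instead, a uniformly sampled F0 contour, a least-squares linear detrending and a plain discrete Fourier sum are assumed, and all names and values are hypothetical.

```python
import math

def remove_linear_trend(values):
    """Subtract the least-squares regression line from a uniformly sampled contour."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return [v - (mean_y + slope * (i - mean_x)) for i, v in enumerate(values)]

def power_at(values, freq_hz, sample_rate):
    """Power of the contour at one frequency (direct Fourier sum)."""
    re = sum(v * math.cos(2 * math.pi * freq_hz * i / sample_rate) for i, v in enumerate(values))
    im = sum(v * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i, v in enumerate(values))
    return (re * re + im * im) / len(values) ** 2

def dominant_modulation(values, sample_rate, f_min=1.5, f_max=15.0, step=0.25):
    """Return (frequency, power) of the strongest spectral peak in [f_min, f_max]."""
    detrended = remove_linear_trend(values)
    n_steps = int(round((f_max - f_min) / step))
    candidates = [f_min + k * step for k in range(n_steps + 1)]
    powers = [power_at(detrended, f, sample_rate) for f in candidates]
    best = max(range(len(candidates)), key=powers.__getitem__)
    return candidates[best], powers[best]

# Hypothetical contour: 150 Hz mean F0, falling by 5 Hz/s, with a 6 Hz modulation of +/- 4 Hz.
rate = 100
contour = [150.0 - 5.0 * (i / rate) + 4.0 * math.sin(2 * math.pi * 6.0 * i / rate)
           for i in range(2 * rate)]
peak_hz, peak_power = dominant_modulation(contour, rate)
```

Without the detrending step, the falling F0 would add low frequency energy to the spectrum, which is the masking effect the trend removal is meant to counteract.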

2.2. The Vocal Demodulator

The Vocal Demodulator developed by Winholtz and Ramig (1992) outputs five parameters: F0, amplitude modulation frequency and -level, as well as frequency modulation frequency and -level. F0 is detected with a zero-crossing technique. Prior to demodulation, the signal is low-pass filtered near the F0 to reduce contamination effects due to harmonic and formant energy. After the demodulation, the signal is low-pass filtered again at 25 Hz to remove F0 residual components. Pathologic signals exhibit multiple prominent modulation frequencies at various intensities which can be obtained by analyzing the demodulated outputs via a spectrum analyzer or fast Fourier transform (FFT). When this type of analysis is not available, the demodulator generates a weighted average of individual spectral peaks in the range of 2.5 - 25 Hz. Level measurement is based on a full-wave rectification of the demodulated signals, averaged over 0.5 s intervals, using the following equations:

$$\text{Amplitude modulation level (\%)} = \frac{V_{max} - V_{min}}{V_{max} + V_{min}} \cdot 100 \tag{2.1}$$

$$\text{Frequency modulation level (\%)} = \frac{F_{0\,deviation} - \overline{F_0}}{\overline{F_0}} \cdot 100 \tag{2.2}$$

where:

$V_{max}$: maximum voltage measured in the amplitude envelope in mV
$V_{min}$: minimum voltage measured in the amplitude envelope in mV
$F_{0\,deviation}$: peak to peak variation in F0 in Hz
$\overline{F_0}$: mean fundamental frequency in Hz
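As a quick plausibility check of equation 2.1, the amplitude modulation level can be computed for an envelope with known properties. The sketch below assumes an already demodulated, sinusoidally varying envelope; the rectification and averaging stages of the demodulator are not modeled, and the function name and values are hypothetical.

```python
import math

def am_level_percent(v_max, v_min):
    """Amplitude modulation level according to equation 2.1."""
    return (v_max - v_min) / (v_max + v_min) * 100.0

# Hypothetical envelope: mean of 1 (arbitrary units), modulated by +/- 15 % at 5 Hz,
# sampled at 100 Hz for one second.
envelope = [1.0 + 0.15 * math.sin(2 * math.pi * 5.0 * n / 100) for n in range(100)]
level = am_level_percent(max(envelope), min(envelope))
```

For an envelope of the form mean · (1 ± m), the expression reduces to m · 100, so the assumed modulation of ± 15 % yields a level of about 15 %.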

The demodulator was validated on synthetic data with known properties and on natural phonation. Spectral measurements produced by a waveform analyzer were compared to those produced by the demodulator based on target frequencies and levels. The correlation results indicate an accurate quantification of the modulation components.

2.3. Multi-Dimensional Voice Program

The Multi-Dimensional Voice Program (MDVP) is a commercial, proprietary tool which extracts four vocal tremor parameters, defined as follows:

1. Frequency tremor frequency (Fftr) in Hz: frequency of the strongest low frequency F0-modulating component in the specified analysis range

2. Amplitude tremor frequency (Fatr) in Hz: frequency of the strongest low frequency amplitude-modulating component in the specified analysis range

3. Frequency tremor intensity index (FTRI) in %: magnitude of the strongest low frequency F0-modulating component in the specified analysis range

4. Amplitude tremor intensity index (ATRI) in %: magnitude of the strongest low frequency amplitude-modulating component in the specified analysis range

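Both FTRI and ATRI are expressed as ratios, relative to the mean F0 ($\overline{F_0}$) and the mean amplitude ($\overline{A}$) respectively. As such they do not have a physical unit and are therefore given in %:

$$FTRI = \frac{FTRI_{abs} - \overline{F_0}}{\overline{F_0}} \cdot 100 \tag{2.3}$$

$$ATRI = \frac{ATRI_{abs} - \overline{A}}{\overline{A}} \cdot 100 \tag{2.4}$$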

As described by Deliyski (1993), an adaptive time domain pitch-synchronous method is used for the F0 extraction. The F0 estimation is based on a short term autocorrelation analysis with non-linear coding. The data resulting from the pitch extraction is divided into 2 s windows with a 1 s step overlap. The following procedures are then applied on every window: low-pass filtering of the F0 or amplitude data at 30 Hz and downsampling to 400 Hz, total energy calculation of the resulting signal, subtraction of the DC component, computation of the autocorrelation function on the residual and division by the total energy, extraction of the period of variation and calculation of the tremor intensity (global maximum of the average autocorrelation curve) and tremor frequency (corresponding position on the curve) for both F0 and amplitude. The performance of the MDVP algorithm has been evaluated several times. Though in the past it has been considered the gold standard, the results reported by Brückl et al. (2017) challenge this view, revealing some of its deficiencies.
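The core of the per-window procedure (DC subtraction, energy-normalized autocorrelation, peak search) can be sketched in a few lines. MDVP is proprietary, so the following is not its implementation: it is a simplified illustration that assumes a single 2 s window that has already been low-pass filtered and downsampled to 400 Hz, omits the averaging over windows, and uses hypothetical names and values throughout.

```python
import math

def autocorr_tremor(contour, sample_rate, f_min=1.0, f_max=30.0):
    """Return (tremor frequency in Hz, normalized autocorrelation peak) for one window."""
    n = len(contour)
    mean = sum(contour) / n
    residual = [v - mean for v in contour]        # subtraction of the DC component
    energy = sum(v * v for v in residual)         # total energy of the residual
    min_lag = max(1, int(math.ceil(sample_rate / f_max)))
    max_lag = min(n - 1, int(sample_rate / f_min))
    best_lag, best_value = None, -1.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, divided by the total energy.
        r = sum(residual[i] * residual[i + lag] for i in range(n - lag)) / energy
        if r > best_value:
            best_lag, best_value = lag, r
    # The lag of the global maximum is the period of variation in samples.
    return sample_rate / best_lag, best_value

# Hypothetical window: 2 s of an F0 contour at 400 Hz with a 5 Hz, +/- 3 Hz modulation.
window = [120.0 + 3.0 * math.sin(2 * math.pi * 5.0 * i / 400) for i in range(800)]
tremor_hz, peak_value = autocorr_tremor(window, 400)
```

For this synthetic window the global maximum lies at a lag of 80 samples, i. e. at 5 Hz. Note that the returned peak value is the raw normalized autocorrelation and is not meant to reproduce FTRI or ATRI.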

2.4. The Modulogram

The algorithm developed by Buder and Strand (2003) provides a graphic output in the form of a series of low frequency spectrograms ("modulograms"), as well as quantitative measures of the frequencies, magnitudes, durations and sinusoidal forms of both F0 and sound pressure level (SPL) modulations. These, as well as combinations thereof, are summarized as histograms that depict time-collapsed distributions of the modulations, derived by cumulating the amplitudes from successive FFT analyses over time. The quantification and representation of the modulation’s duration is worth noting, because it takes a dynamic, time-variant aspect of vocal tremor into account.

F0 extraction is based on cross-correlation analysis. SPL data is extracted by a root mean square (RMS) smoothing procedure and then converted into a logarithmic dB scale for further analysis. The data is then downsampled to prepare it for FFT analysis in the different frequency domains. The modulogram’s performance was tested on synthetic and natural data from a normophonic and two pathologic speakers (on and off medication). These tests revealed multiple irregular modulation frequencies.
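The RMS smoothing and dB conversion step can be illustrated with a short sketch. It is not the implementation of Buder and Strand (2003); the window length, hop size and the reference pressure of 20 µPa are assumptions made for this example, as are the function name and the test signal.

```python
import math

def rms_level_db(samples, window, hop, reference=2e-5):
    """Frame-wise RMS of a pressure signal (in Pa), expressed in dB re `reference`."""
    levels = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in frame) / window)
        # Guard against log10(0) for completely silent frames.
        levels.append(20.0 * math.log10(max(rms, 1e-12) / reference))
    return levels

# Hypothetical signal: 200 Hz sine with a peak pressure of 1 Pa, sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 200.0 * n / 8000) for n in range(8000)]
spl_contour = rms_level_db(tone, window=400, hop=200)
```

The resulting SPL contour is sampled once per hop, which already constitutes a form of downsampling relative to the audio signal. For the unmodulated test tone it is flat at roughly 91 dB.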

2.5. Continuous wavelet transform analysis

In the method proposed by Cnockaert et al. (2005) and Cnockaert et al. (2008) the F0 trace is obtained by means of a continuous wavelet transform (CWT), using the so-called complex Morlet wavelet. A second CWT is applied to extract the modulation frequency and amplitude. The modulation frequency is defined as the sum of all instantaneous frequencies of the CWT of the F0 trace in the frequency interval 3 - 15 Hz, weighted by the wavelet transform energy. The amplitude modulation is obtained by summing the square of the modulus of the wavelet transform over the aforementioned frequency interval, normalized by the average F0.

The evaluation of the proposed method on sustained synthetic vowels showed a correct detection of the modulation frequency, but an underestimation of the modulation amplitude. The underestimation increases when the modulation frequency increases and the average F0 decreases. This error occurs because the resulting faster F0 variations are more difficult to detect, due to the smoothing of the CWT over the wavelet’s duration. Interestingly, the comparison of the CWT-based F0 estimation to the Hilbert transform, the TEMPO method and Praat’s forward cross-correlation analysis yielded similar underestimations for all techniques, except the Hilbert transform.

2.6. TREMOR.PRAAT

The Praat-based algorithm developed by Brückl (2017) outputs a total of 14 vocal tremor measures. The technical definitions for four of these parameters are adopted from MDVP (FTrF = Fftr, ATrF = Fatr, FTrI = FTRI, ATrI = ATRI). Brückl also introduces the concept of power indices and tremor cyclicality (refer to sections 5.2 and 6.1.2 respectively for more details), as well as measures of the contour’s mean magnitudes and combinatory constructs.

The analysis methods for both frequency and amplitude tremor comprise three steps, see figure 2.1. First, the frequency contour is calculated with Praat’s built-in cross-correlation analysis function To Pitch (cc). The amplitude contour can be derived in two ways: either with Praat’s function To AmplitudeTier (period), which uses the integral instead of the maximum when determining single amplitude values, or via calculation of RMS. The amplitudes are calculated pitch-synchronously to avoid artificial modulations that would otherwise occur. For autocorrelation purposes the amplitudes must be resampled at a constant rate, due to the time-varying extraction used in the To AmplitudeTier function. Second, the linear declinations are removed and the contours are normalized, so as to express the intensity indices relative to the means. Third, the autocorrelation of the frequency and amplitude contours is performed to retrieve the strongest tremor frequency of the contour (FTrF, ATrF). Subsequently, the contour’s maxima and minima are picked via the To PointProcess (peaks) function in dependence of this frequency. The ordinates correspond to the intensity values, from which the intensity indices are derived through averaging:

$$(F, A)\mathrm{TrI} = \left( \frac{\sum_{i=1}^{m} |\mathit{max}_i|}{m} + \frac{\sum_{j=1}^{n} |\mathit{min}_j|}{n} \right) \div 2 \tag{2.5}$$
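Equation 2.5 can be illustrated with a small numerical sketch. The code below is not part of tremor.praat and does not call Praat: it assumes a uniformly sampled contour without declination, normalizes it relative to its mean and picks the extrema by simple neighbor comparison rather than in dependence of the tremor frequency. All function names and values are hypothetical.

```python
import math

def normalize(contour):
    """Express a contour as relative deviation from its mean."""
    mean = sum(contour) / len(contour)
    return [(v - mean) / mean for v in contour]

def local_extrema(values):
    """Return the ordinates of interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] >= values[i + 1]:
            maxima.append(values[i])
        elif values[i - 1] > values[i] <= values[i + 1]:
            minima.append(values[i])
    return maxima, minima

def intensity_index(maxima, minima):
    """Average of the mean absolute maximum and the mean absolute minimum (eq. 2.5)."""
    mean_max = sum(abs(v) for v in maxima) / len(maxima)
    mean_min = sum(abs(v) for v in minima) / len(minima)
    return (mean_max + mean_min) / 2.0

# Hypothetical F0 contour: 200 Hz mean, 5 Hz tremor with an extent of 10 %, 100 frames/s.
frame_rate = 100
f0 = [200.0 * (1.0 + 0.10 * math.sin(2 * math.pi * 5.0 * n / frame_rate))
      for n in range(frame_rate)]
maxima, minima = local_extrema(normalize(f0))
index_percent = 100.0 * intensity_index(maxima, minima)
```

Because every maximum and minimum of this idealized contour deviates by exactly 10 % from the mean, the resulting index is about 10 %. With irregular extrema, the averaging pulls the index below the largest deviations.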

Figure 2.1.: Exemplary tremor analysis of a synthesized sound with tremor.praat. From top to bottom: (a) oscillogram of a 1.5 s snippet with FTrF = 5 Hz, FTrI = 10 %, decF = 15 Hz/s, ATrF = 6 Hz, ATrI = 15 % and decA = 15 Pa/s, (b) broadband spectrogram, (c) extracted F0 contour, (d) normalized, de-declined F0 contour with detected maxima and minima (short, dashed lines), (e) extracted amplitude contour with the To AmplitudeTier function, (f) resampled, normalized, de-declined amplitude contour with detected maxima and minima.

The algorithm’s performance has been tested numerous times both on synthetically generated vowels with known properties and natural data (Brückl, 2012; Brückl, 2015; Brückl et al., 2017). The most recent comparison with MDVP verifies that tremor.praat’s measures are highly significantly more valid than those of its counterpart. As shown therein, though the MDVP algorithm is unable to deal with declines, this does not seem to be the underlying cause for its errors, which at this point remain unknown. In contrast, tremor.praat’s errors are both explainable and systematic. The underestimation exhibited by the intensity measures at higher tremor values is caused by the averaging of the intensities within the time windows. In the case of deficient ATrF extraction, the values are one or two octaves too low. This is attributed to interfering subharmonics present in an artificial signal and is unlikely to occur in natural signals that are far less cyclic.

2.7. Empirical mode decomposition

The algorithm proposed by Mertens et al. (2015) outputs measures of frequency tremor rate and extent, which are termed neurological tremor frequency (µfneur) and -depth (σneur) respectively. The tracking of the vocal cycle lengths is done in the temporal domain and relies on amplitude and salience analysis of the cycle peaks. The final length time series is obtained via dynamic programming and resampled at a constant rate for further processing. It is then analyzed via empirical mode decomposition (EMD), which breaks up the time series into a sum of oscillating components (intrinsic mode functions (IMFs)). The instantaneous frequency and amplitude of each IMF are determined via FM-AM decomposition. If the weighted average instantaneous frequencies of the IMFs fall within the range of ≥ 2 Hz and ≤ 15 Hz, they are assigned to neurological tremor. The neurological tremor frequency is obtained via the weighted sum of the individual mode frequencies, expressed relative to its weighted temporal average. The neurological tremor depth is estimated via the standard deviation of the empirical mode sum, divided by the average cycle length. The algorithm was validated on natural data from Parkinson patients and control speakers.

Given the accessibility of tremor.praat and its superior performance, its measures will be used as a reference point in the empirical part of the present thesis. The perceptual relevance of these measures will be explored on the basis of synthetically constructed sounds that are presented to listeners for evaluation of tremor severity. The focus of this thesis will be on frequency tremor. The next chapter presents a review of past research on vocal tremor, focusing on findings that are relevant for the thesis at hand.

3. Past findings on vocal tremor

The following review summarizes 14 relevant papers in the time period 1992 – 2017, sorted in chronological order:

Winholtz and Ramig (1992)
Data: Sustained /a/ vowels from 12 vocal tremor (VT) subjects, 12 vibrato singers (V) and 12 controls (C)
Method: F0 detection via zero crossing, FFT in the frequency domain of 2.5 - 25 Hz
Measures: FM frequency and -level, AM frequency and -level
Results: Significantly higher AM frequencies in VTs and Cs than in Vs. Significantly higher levels of AM and FM modulation in VTs and Vs than in Cs. No significant differences between groups for FM frequency

Dromey et al. (2002)
Data: Sustained /a/ vowels from 10 ET subjects at self-selected high and low pitch- and loudness levels
Method: MDVP-based contour extraction, calculation of modulation rate and extent via a peak-picking algorithm
Measures: Frequency and amplitude modulation rate and -extent, AM/FM coefficients of variation (CV) for rate and extent
Results: AM and FM rate significantly increase with increasing F0. AM rate significantly increases with increasing loudness. AM extent significantly decreases with decreasing F0. AMs show greater aperiodicity than FMs

Kreiman et al. (2003)
Data: Synthetic copies of sustained /a/ vowels from 32 dysphonic speakers
Method: F0 detection via peak-picking or zero-crossing, formant-based synthesis and VT generation with a sine wave or irregular model
Perceptual analysis: Similarity ratings of original sounds and synthetic copies examining the effect of FM rate and -extent (10 raters each) and the pattern of the modulating waveform (5 raters)
Results: Differences in tremor rate are significantly easier to detect in sinusoidal tremors. Greater tremor extents significantly reduce ability to detect rate differences

Cnockaert et al. (2008)
Data: Sustained /a/ vowels from 37 PD subjects and 35 controls
Method: CWT analysis
Measures: Average modulation frequency (MF) and -amplitude (MA)
Results: MF significantly higher in both ♀ and ♂ PDs than in controls. Amongst PDs, MA significantly higher for ♀. Statistically significant discrimination achieved between subject groups for ♂ based on MF

Shao et al. (2010)
Data: Sustained /i/ vowels from 25 VT subjects and 24 controls
Method: MDVP and non-linear dynamic analysis
Measures: Fftr, Fatr, FTRI, ATRI
Results: Significantly higher Fftr and FTRI in VT subjects than in controls. No significant differences in Fatr and ATRI between groups

Tanaka et al. (2011)
Data: Sustained /a/ vowels and standard sentences from 39 PD subjects on medication and 62 controls
Method: MDVP analysis
Measures: Fftr, Fatr, FTRI, ATRI
Results: Significantly higher Fftr and FTRI in ♂ PD subjects than in ♂ controls. Significantly higher FTRI in ♀ PD subjects than in ♀ controls

Anand et al. (2012)
Data: 288 synthetic /a/ vowels, obtained from original phonation of 4 ET subjects
Method: Synthesis of F0 contours with STRAIGHT, manipulation of F0, frequency modulation rate (ff0m) and -depth (df0m)
Perceptual analysis: Multiple, averaged severity ratings on a scale of 1 - 7 from 6 raters
Results: Voices with low F0 are perceived to have greater tremor. Perceived severity increases with higher ff0m and df0m (non-linearly for df0m). Perceived severity increases more noticeably for df0m > 8 Hz

Brückl (2012)
Data: 729 synthetic "♀" /a/ vowels; 88 natural sustained, ♀ /a/ vowels rated by 30 listeners in regards to estimated speaker age
Method: Comparison of tremor.praat version 2.01 and MDVP version 2.6.2
Measures: FTrF, FTrI, ATrF, ATrI, FTrP, ATrP; Fftr, Fatr, FTRI, ATRI
Results: tremor.praat detects FTrF and ATrF exactly but marginally underestimates FTrI and ATrI. MDVP produces errors and outputs too low values, especially for Fftr and Fatr. Age best indicated by ATrI and ATrP measures

Gillivan-Murphy (2013)
Data: Multiple recordings of sustained /a/ vowels from 32 PD subjects and 28 controls
Method: Voice and Tremor Protocol (VTP) analysis in Motor Speech Profile (MSP) from Computerized Speech Lab (CSL), KayPENTAX
Measures: Rate, magnitude and periodicity of frequency and amplitude tremor (Rftr, Ratr, Mftr, Matr, Pftr, Patr); F0 and amplitude coefficients of variation (vF0, vAm)
Perceptual analysis: CAPE-V based ratings of instability and tremor on a 100 mm visual analog scale from 3 raters
Results: Ratr significantly differentiates PDs from controls. PDs: Mftr and vF0 are associated with a greater amount of perceived instability/tremor. Controls: Matr and vF0 correlate positively and significantly with perceived instability/tremor. An increase in Mftr is associated with an increase in perceived instability. An increase in Rftr is associated with a decrease in perceived tremor

Lester-Smith and Story (2015)
Data: Simulated sustained /a/, /ə/ and /i/ vowels
Method: Manipulation of F0, vocal tract shape, degree of adduction, F0 modulation rate and extent with or without induced noise using a kinematic vocal fold model
Perceptual analysis: Shakiness ratings from 42 raters via forced-choice comparison of identical pairs or pairs differing in one characteristic (256 trials)
Results: Perceived shakiness significantly increased with greater degrees of adduction and an /i/-shaped vocal tract when noise is present. Higher F0 only affected perceived shakiness when no noise was present

Mertens et al. (2015)
Data: Sustained /a/ vowels from 661 pooled PD subjects and 197 controls
Method: EMD analysis
Measures: Neurological tremor frequency (µfneur) and -depth (σneur)
Results: µfneur and σneur differ significantly between PDs and controls, as well as between genders. Significant interaction for variables "gender" and "pathology" for σneur attributed to significant differences in σneur between ♀ PDs

Brückl (2015)
Data: Sustained /a/ vowels from 234 PD subjects (on and off medication) and 105 controls
Method: tremor.praat version 2.06 analysis
Measures: FTrF, FTrI, ATrF, ATrI, FTrP, ATrP
Results: Speaker age correlates highly significantly with FTrI, FTrP and ATrI, ATrP. Significantly higher intensity and power measures in PDs off-med than in controls. Age effects roughly twice as big as pathology effects. FTrI and FTrP significantly higher in PDs off-med than in PDs on-med. No difference for ATrI and ATrP between PD subjects due to medication.

Lester-Smith and Story (2016)
Data: Vibrato on sustained vowels /ɑ/ and /i/ on target notes A2/E3 for ♂ and A3/C4 for ♀ in two voice qualities (pressed, breathy) from 4 trained singers. F0 modulation rate: 5.0–5.4 Hz for ♀ and 5.2–5.8 Hz for ♂; extent: 3.0–5.6 % for ♀ and 1.2–3.3 % for ♂
Perceptual analysis: Shakiness ratings from 20 raters via forced-choice comparison of identical pairs or pairs differing in one characteristic (64 trials)
Results: A pressed voice significantly increases the perception of shakiness in ♀ voices.

Hemmerich et al. (2017)
Data: Sustained phonation from 20 VT subjects (mild, moderate and severe tremor groups) and 2 age- and sex-matched controls using visual, auditory, palpatory and respiratory assessment
Perceptual analysis: Averaged tremor severity ratings (across all assessment methods) on a 0-3 scale by 2 raters
Results: The number of phonatory structures affected by tremor and the severity of their tremor increase with vocal tremor severity. A strong positive correlation was found between the tremor index (% of structures affected multiplied by mean severity of tremor) and the perceived severity of vocal tremor.

As shown by this review, vocal tremor perception data are sparse. The following empirical part sought to explore this subject further, beginning with the role of harmonicity.


Part II.

Empirical part


4. Preliminary experiment: exploring the role of harmonicity on the perceived severity of vocal tremor

4.1. Theoretical background

Pertinent algorithms show that tremulous voices contain more than one frequency tremor frequency, see chapter 2. Despite this fact, past investigations have focused mainly on the most dominant (strongest) candidate, leaving the role of weaker frequencies unaccounted for. However, the relation between superimposed frequency tremor frequencies may have decisive implications for the perception of vocal tremor. This relation can be described in musical terms.

4.1.1. Consonance and dissonance in music theory

Musical intervals are perceived whenever two tones are presented simultaneously or successively. The focus of this chapter lies on simultaneously perceived intervals as parts of complex tones. In vocal tremor, a complex tone is constituted by a carrier signal and two frequency tremor frequencies of a certain ratio. The frequency ratio of one tremor frequency to the other determines which interval is formed. This interval may in turn cause a certain auditory sensation, which in Western music theory is frequently described in terms of its degree of harmonicity.¹ Intervals causing a pleasant, "stable" sensation are named consonant, whereas intervals that give rise to unpleasantness and "instability" are labeled dissonant. The harmonicity degree of two frequency tremor frequencies may in turn have an impact on the perceived severity of vocal tremor by either attenuating or amplifying it.

¹ Throughout this thesis, "harmonicity" is loosely used as an umbrella term to refer to the degree of consonance (or dissonance) of an interval. In its narrow definition, however, harmonicity characterizes the degree to which the frequencies of overtones coincide with whole multiples of the fundamental frequency.

Though it has become perfectly natural to associate consonance with agreeableness, the source and degree of this alleged interdependence are not at all self-evident. Additionally, the transcultural applicability of aesthetic responses to consonance and dissonance has been a subject of debate for many decades, producing conflicting opinions. While some authors attribute the often detected preference for consonance to human biology (Bidelman and Krishnan, 2009), others believe that it is owed to cultural factors, such as exposure to harmony (McDermott et al., 2010) or musical experience (McDermott et al., 2016), and yet others that it possibly emerges as a mixture of both (Fritz et al., 2009).

The oldest and probably most established explanation with regard to the source of consonance is of mathematical origin and treats consonance as being highly dependent on the simplicity of the interval’s underlying frequency ratio. This observation is attributed to Pythagoras, who noted that frequency ratios which can be expressed in small integer numbers are generally perceived as more consonant, whereas complex ratios are perceived as more dissonant. To what extent consonance and agreeableness coincide when rated by listeners remains to be tested empirically, see below.

Helmholtz (1877) developed Pythagoras’ theory further, relating consonance to the way the partials of two tones interact with each other. When the upper partials (overtones) coincide, as is the case in simple frequency ratios, the perception of consonance arises. The closer the resemblance of the resulting upper partials combination to a simple harmonic series, the more consonant the interval is assumed to be (Terhardt, 1974; Plack, 2010). Dissonance, on the other hand, occurs through the grating interaction of upper partials that are slightly mismatched in their frequencies, giving rise to a phenomenon known as beating, see section 5.1 for more details. In spite of Helmholtz’s major contribution in unveiling the principles behind consonance and dissonance, his theory leaves some psychoacoustic aspects unexplained, see Lots and Stone (2008) for a number of points of critique.

4.1.2. Perception of musical consonance and dissonance

Though Helmholtz’s interval ranking is theoretically well founded, there seems to be a general lack of studies that confirm its applicability in an empirical setting. It is also unclear to what degree listeners’ ratings of consonance and agreeableness are congruent. Out of the few findings available on the subject, some show deviations from the theoretical norm, while others largely conform to it:

One of the first to explore possible discrepancies between consonance, as defined in theory, and actual listener preferences was Kaestner (1909). In his extensive studies, Kaestner was able to reveal incongruities between theory and practice, noticing that listeners tended to prefer the major third over all other intervals. Another early attempt to replicate Helmholtz’s tonal hierarchy in a real-life situation was undertaken by Malmberg (1914), who compared the ratings of naive listeners to those of experts. Naive listeners showed the same pointed preference for the major third as in the preceding study by Kaestner, while experts favored the octave in terms of consonance. In a later investigation that built on Malmberg’s findings, Metz et al. (1981) observed a lack of consistent directional distinction among consonant intervals. The pleasantness of dissonant intervals varied, however, running in parallel with the theoretically established order.

In a study conducted by Maher (1980), the author sought to provide empirical support for the notion that intervals differ in the psychological effects they induce, using a rating scale battery of eleven scales that drew on a variety of dimensions. Disregarding some of the typical problems connected with adjective-pair based scales, such as the uncertainty as to whether test subjects share a common semantic apparatus, Maher’s findings largely confirmed the associations predicted by the music-theoretic writings listed therein, but were unable to reproduce others. This may be attributed to the inadequacy of some scales to capture certain psychological effects or to the fact that some intervals, such as the minor second, fulfill much clearer communicative and emotive goals than others, see Raven (2005).

4.1.3. Sensory consonance and dissonance

As Rasch and Plomp (1999) point out, one must differentiate between consonance in a strictly musical and consonance in a perceptual sense. Perceptual (psychoacoustic) consonance is often termed tonal or sensory consonance. Though musical consonance has its roots in perceptual consonance, it is governed by the rules imposed by music theory, "which to a certain extent, can operate independently from perception" (ibid.).


As shown by Plomp and Levelt (1965) in a meta-analysis, a more decisive parameter for the tonal consonance of a two-tone interval is the frequency difference between the two tones in proportion to their critical bandwidth and not solely their frequency ratio (musical interval). This aspect takes into account the neuroanatomy of the inner ear, namely the principle of tonotopy on the basilar membrane. Thus, two tones sound perceptually consonant if they are separated by less than a semitone (in which case they fuse together) or by more than a critical bandwidth, which means that they do not interfere with each other. If the tones are less than a critical bandwidth apart, dissonance occurs. The greatest dissonance is reached when the two tones are separated by roughly a quarter of the critical bandwidth, which is about 20-25 Hz in low frequency regions up to 500 Hz, see figure 4.1.

Figure 4.1.: Consonance representation of two simple tones as a function of frequency difference with critical bandwidth as a unit, from Plomp and Levelt (1965).

4.1.4. Perception of consonance and dissonance in vocal tremor

Given the infrasonic range of vocal tremor (1–15 Hz), the dissonance criterion of Plomp and Levelt (1965), which is based on the subdivision of the audible frequency range into critical bands as proposed by Zwicker (1961), is not applicable. Thus, it is unclear whether and, if so, how sensory dissonance is perceivable in this narrow frequency range and to what extent it contributes to the perceived severity of vocal tremor, if at all. As a first attempt to approach this subject, and for lack of a more suitable alternative, the contribution of musical consonance in explaining the perceived severity of vocal tremor was analyzed, under the assumption that the harmonicity factor becomes audible when affecting the modulations mounted on a carrier signal. Helmholtz’s hierarchy of two-tone intervals, derived from the frequency ratio theory, was used as a reference point, see table 4.1 for an overview.

Consonance quality       Interval name    Frequency ratio

Absolute consonances     unison           1/1
                         octave           2/1
Perfect consonances      fifth            3/2
                         fourth           4/3
Medial consonances       major sixth      5/3
                         major third      5/4
Imperfect consonances    minor third      6/5
                         minor sixth      8/5
Dissonances              major second     9/8
                         major seventh    15/8
                         minor seventh    16/9
                         minor second     16/15
                         tritone          45/32

Table 4.1.: Consonance order for two-tone intervals as established by Helmholtz (1877), in decreasing order of "perfection", from most consonant to most dissonant

4.2. Aims and objectives

Due to the lack of data on the transferability of Helmholtz’s hierarchy of consonance to human speech, a preliminary experiment was conducted to analyze the role of harmonicity on the perceived severity of vocal tremor.

4.2.1. Preliminary hypothesis

In particular, the aim of this experiment was to examine whether frequency tremor frequency ratios corresponding to intervals that are considered to be dissonant in Western musical theory would enhance the perceived severity of vocal tremor, while ratios corresponding to intervals that are considered to be consonant would be perceived as less severe. Accordingly, the following preliminary hypothesis was formulated:


Hypothesis 1 (H1): When comparing voices that each contain two superimposed frequency tremor frequencies of a certain ratio, the voice containing the more dissonant frequency ratio (= musical interval) is perceived as more pathological.

4.3. Acoustic methods

Sustained vowels with frequency tremor were generated with the software Praat (Boersma and Weenink, 2017), applying the source-filter based formant synthesis used in Brückl’s synthesis script (Brückl et al., 2017). The procedure contains the following steps:

1. The glottal source signal is modeled as proposed by Rosenberg (1971), but with adjusted settings to create a smoother glottal flow shape at the opening and closing phase of the glottis.²

2. The source signal is then filtered by a time-invariant transfer function of a "female", /a/-shaped vocal tract.

3. The resulting signal serves as a carrier and is frequency modulated with a sinusoidally shaped waveform, using the pitch synchronous overlap-and-add (PSOLA) resynthesis method (Moulines and Charpentier, 1990).

For the purpose of this experiment, Brückl’s script was modified to include two frequency tremor frequencies and two frequency tremor intensities, based on the principle of Fourier synthesis. Since the focus lay on the analysis of frequency tremor perception, the parameters used to generate amplitude tremor were removed entirely, resulting in the following formula, expressed as a function of time:

$$F_{0M}(t) = F_{0,s} + \mathit{FTrI}_1 \cdot F_0 \cdot \sin(\mathit{FTrF}_1 \cdot 2\pi \cdot t) + \mathit{FTrI}_2 \cdot F_0 \cdot \sin(\mathit{FTrF}_2 \cdot 2\pi \cdot t) - \mathit{decF} \cdot t \tag{4.1}$$

with

$$F_{0,s} = F_0 + \frac{T_{total} \cdot \mathit{decF}}{2} \tag{4.2}$$

where:

t: a certain time point in s
F0M(t): resulting modulated pitch at time t in Hz
F0,s: fundamental frequency at t = 0 in Hz
FTrI1: first frequency tremor intensity index in %
FTrI2: second frequency tremor intensity index in %
F0: mean fundamental frequency in Hz
FTrF1: first frequency tremor frequency in Hz
FTrF2: second frequency tremor frequency in Hz
decF: linear decline of fundamental frequency in Hz/s
Ttotal: total sound duration in s

² Speech signals are modeled with the glottal flow derivative. A polynomial curve and adjusted power settings are used to reflect the idea of the glottis opening like a zipper. Additionally, the opening and collision phase of the glottis, as well as the radiation factor at the lips are taken into consideration. The collision phase parameter models the glottal flow derivative with an exponential decay function, resulting in a smoother transition towards the glottis closure (see Praat’s help file: To Sound: phonation...). A slower slope in the closing phase of the glottis reduces the amplitudes of the middle and higher frequencies in the glottal spectrum (Stevens and Hanson, 1995; Weenink, 2013).

For the generation of the source signal, the sampling frequency was set to 48 kHz, the adaptation factor to 0.5, the maximum period to 0.05 s, the open phase to 0.7, the collision phase to 0.03, power1 to 3.0 and power2 to 4.0. The signal’s duration was set to 3 seconds and the mean fundamental frequency (F0) to 200 Hz.

For the tremor creation, a synthesis time step of 0.005 s was selected, resulting in a synthesis "sampling frequency" of 200 samples per second. The analysis time window used during the pitch analysis process of the PSOLA resynthesis had a pitch floor of 75 Hz and a pitch ceiling of 600 Hz.³ The fundamental frequency’s linear decline amounted to 10 Hz/s and the frequency tremor intensity for both frequency modulations to 10 %. This intensity value is quite high, but it was deliberately chosen to make the tremor easier to perceive.
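Equations 4.1 and 4.2 can be sketched in a few lines of Python with the settings listed above. This is only an illustration of the formula, not Brückl’s Praat script: the function and parameter names are hypothetical, the intensity indices are assumed to enter the formula as fractions (10 % = 0.10), and the two tremor frequencies (2.5 Hz and 3.75 Hz) are example values.

```python
import math

def modulated_f0(t, f0_mean=200.0, ftri1=0.10, ftri2=0.10,
                 ftrf1=2.5, ftrf2=3.75, dec_f=10.0, t_total=3.0):
    """Modulated F0 in Hz at time t (in s), following Eq. 4.1 and 4.2.

    ASSUMPTION: ftri1 and ftri2 are fractions (10 % -> 0.10).
    """
    # Eq. 4.2: starting F0, chosen so that the declining baseline averages f0_mean
    f0_start = f0_mean + (t_total * dec_f) / 2.0
    # Eq. 4.1: two superimposed sinusoidal modulations minus the linear decline
    tremor1 = ftri1 * f0_mean * math.sin(ftrf1 * 2.0 * math.pi * t)
    tremor2 = ftri2 * f0_mean * math.sin(ftrf2 * 2.0 * math.pi * t)
    return f0_start + tremor1 + tremor2 - dec_f * t

# Contour sampled at the synthesis time step of 0.005 s (200 points per second)
TIME_STEP = 0.005
N_STEPS = int(round(3.0 / TIME_STEP))
contour = [modulated_f0(i * TIME_STEP) for i in range(N_STEPS + 1)]
```

With both intensities set to zero, the function returns only the declining baseline, which starts at 215 Hz and passes through the 200 Hz mean at the midpoint of the 3 s sound.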

The above parameters remained the same for all sixteen sounds produced. FTrF1, the lower of the two modulation frequencies, was also kept constant at 2.5 Hz, serving as an anchor. FTrF2 was systematically varied to portray all twelve intervals of the diatonic scale. Additionally, two compound intervals spanning more than an octave (m9 and P12), as well as two frequency ratios that do not correspond to any interval of the diatonic scale (sm3, SM2) were included, see table 4.3. The compound intervals were added with the purpose of verifying whether they would be rated differently than the simple m2 and P5 intervals, and the non-diatonic ones to examine whether they would be perceived as particularly pathological, given that they are not common in Western musical culture. The lower and upper limits of the tremor frequency range were adjusted on the basis of perceived naturalness. Frequencies over 7.5 Hz sounded too artificial and were therefore excluded.

³ Please note that these values had to be changed for the following experiments, due to erroneous pitch extraction which caused sound artifacts. This was noticed after the preliminary experiment had already been conducted, see section 5.3.

The amplitude peaks were scaled to 0.9 to avoid clipping when converting the sounds to WAVE files. During the WAVE file conversion, a "plopping" artifact at the end of each sound was noted. In order to reduce this effect, all sounds were cut at the last zero-crossing as a stopgap solution.

Interval               Abbreviation   Semitones   Frequency ratio        FTrF1    FTrF2
                                                  fraction   decimal     in Hz    in Hz

Minor second           m2             1           16/15      1.067       2.500    2.667
Major second           M2             2           9/8        1.125       2.500    2.813
Septimal whole tone    SM2            ~2.5        8/7        1.143       2.500    2.857
Septimal minor third   sm3            2 2/3       7/6        1.167       2.500    2.917
Minor third            m3             3           6/5        1.200       2.500    3.000
Major third            M3             4           5/4        1.250       2.500    3.125
Perfect fourth         P4             5           4/3        1.333       2.500    3.333
Tritone                TT             6           45/32      1.406       2.500    3.516
Perfect fifth          P5             7           3/2        1.500       2.500    3.750
Minor sixth            m6             8           8/5        1.600       2.500    4.000
Major sixth            M6             9           5/3        1.667       2.500    4.167
Minor seventh          m7             10          16/9       1.778       2.500    4.444
Major seventh          M7             11          15/8       1.875       2.500    4.688
Perfect octave         P8             12          2/1        2.000       2.500    5.000
Minor ninth            m9             13          32/15      2.133       2.500    5.333
Perfect twelfth        P12            19          3/1        3.000       2.500    7.500

Table 4.3.: Preliminary experiment: frequency ratios of frequency tremor frequencies
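The FTrF2 column of table 4.3 follows from multiplying the 2.5 Hz anchor by the frequency ratio. The sketch below reproduces these values and additionally computes the size of each ratio in equal-tempered semitones as 12 · log2(ratio). The names are hypothetical, and the computed semitone values are not the nominal ones listed in the table (a just fifth of 3/2, for example, comes out slightly above 7).

```python
import math
from fractions import Fraction

ANCHOR_HZ = 2.5  # FTrF1, the constant lower tremor frequency

# Frequency ratios from table 4.3, keyed by interval abbreviation
RATIOS = {
    "m2": Fraction(16, 15), "M2": Fraction(9, 8), "SM2": Fraction(8, 7),
    "sm3": Fraction(7, 6), "m3": Fraction(6, 5), "M3": Fraction(5, 4),
    "P4": Fraction(4, 3), "TT": Fraction(45, 32), "P5": Fraction(3, 2),
    "m6": Fraction(8, 5), "M6": Fraction(5, 3), "m7": Fraction(16, 9),
    "M7": Fraction(15, 8), "P8": Fraction(2, 1), "m9": Fraction(32, 15),
    "P12": Fraction(3, 1),
}

def ftrf2_hz(ratio, anchor_hz=ANCHOR_HZ):
    """Upper tremor frequency in Hz for a given frequency ratio."""
    return float(ratio) * anchor_hz

def size_in_semitones(ratio):
    """Size of a frequency ratio in equal-tempered semitones."""
    return 12.0 * math.log2(float(ratio))

for name, ratio in RATIOS.items():
    print(f"{name:>4} {str(ratio):>6} {ftrf2_hz(ratio):6.3f} Hz "
          f"{size_in_semitones(ratio):6.2f} st")
```

Printed frequencies may differ from the table in the last digit where a value falls exactly between two roundings.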

4.4. Perceptual methods

The assessment procedure for this listening experiment was a two-alternative forced choice comparison. The use of a rating scale to gauge the perception of tremor was purposefully avoided due to the biased nature of rating scale data. For more information on this subject, see Paulhus (1991) and Bortz and Döring (2006).

The presented sounds were separated by a 0.1 s "plop" filler. This made it easier to recognize when the first sound ended and the second one began. Under the assumption of symmetry, each sound was compared to every other sound in one direction only, resulting in a total of 16 · 15/2 = 120 sound pairs. Symmetry was assumed primarily for practical reasons, in order to avoid excessive subject fatigue. The order of presented sounds (smaller interval vs. larger interval) was random for every individual subject, with the sole constraint that each one occurred in first position an equal number of times.
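The pairing scheme can be illustrated with a short Python sketch. It is an assumption-laden reading of the description above, not the Praat MFC configuration actually used: the function name is hypothetical, the sixteen sounds are assumed to be indexed by increasing interval size, and the balancing constraint is interpreted as "the smaller interval comes first in exactly half of the pairs".

```python
import itertools
import random

def build_trials(n_sounds=16, seed=None):
    """Return a shuffled list of (first, second) sound indices for one subject.

    ASSUMPTION: index 0 is the smallest interval, index n_sounds - 1 the largest.
    """
    rng = random.Random(seed)
    # Each unordered pair occurs once: 16 * 15 / 2 = 120 pairs
    pairs = list(itertools.combinations(range(n_sounds), 2))
    rng.shuffle(pairs)
    half = len(pairs) // 2
    trials = []
    for k, (smaller, larger) in enumerate(pairs):
        # First half of the shuffled pairs: smaller interval first;
        # second half: larger interval first
        trials.append((smaller, larger) if k < half else (larger, smaller))
    rng.shuffle(trials)  # randomize the presentation order
    return trials

trials = build_trials(seed=1)
```

Passing a different seed per subject yields an individual random order while keeping the 60/60 balance of the two presentation directions.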

4.4.1. Experiment procedure

The listening experiment was run in Praat, using the Multiple Forced Choice (MFC) setup. The experiment conditions were partially uncontrolled with varying environments and equipment. This was an acceptable limitation, given the preliminary nature of the experiment.

Keeping the holistic property of voice perception in mind, when assessing how severely a voice is impacted by vocal tremor, it makes little sense to ask how "shaky" or "unstable" it is, because there is no guarantee that the listener will be able to focus on a single aspect. Even if this were the case, it is uncertain that the isolated aspect will be the one that was sought. The better question would be to ask how much a voice deviates from what is considered to be a healthy voice. Hence, subjects were asked to judge which of the two presented sounds sounded more pathological. The verbatim question was: "Which voice sounds more pathological?" [Welche Stimme klingt pathologischer?], see page A1. In this context, "pathological" was understood to mean "sick, sickly, deviating from a healthy condition". Since synthetic sounds were used to imitate healthy natural vowel production, the generated vocal tremor was the sole pathological aspect in this context, thereby enabling a direct mapping between perceived pathology and severity.

Another problem with adjectives such as "shaky", "unstable" or "tremulous" which are commonly used to describe vocal tremor is that they impose a pre-determined, monodimensional sensation on subjects which may bias their judgment. Given that in natural voices vocal tremor is a complex phenomenon, a vast number of factors may contribute to the overall impression of a listener and thus influence the perceived severity of vocal tremor. The purpose of this thesis is to determine these exact factors.

Subjects were allowed to listen to the sound pairs as often as they liked, but were instructed to decide spontaneously and not to dwell on a pair in case of uncertainty. The experiment lasted approximately 30 minutes. The 120 sound pairs were divided into three blocks of 40 sound pairs with designated pauses. Subjects were encouraged to comply with these pauses and to take as many additional ones as they deemed necessary.

4.4.2. Subject profile

Twelve subjects (including the author), aged 21-62 (mean age: 34.8), participated in the experiment. Five of the subjects reported having knowledge in phonetics and ten reported having knowledge in music. Four participants stated that they had hearing disorders, see table 4.4. Since this was an accidental entry in one case and the other three cases involved intermittent tinnitus, none of them were excluded.

Subject    Age   Knowledge      Knowledge   Hearing
                 in phonetics   in music    disorders

SUBJ 01    25    yes            no          no
SUBJ 02    33    yes            yes         no
SUBJ 03    21    no             no          no
SUBJ 04    25    no             yes         no
SUBJ 05    23    no             yes         no
SUBJ 06    33    no             yes         no
SUBJ 07    61    no             yes         yes
SUBJ 08    25    yes            yes         no
SUBJ 09    42    yes            yes         no
SUBJ 10    62    no             yes         yes
SUBJ 11    27    yes            yes         yes
SUBJ 12    41    no             yes         yes

Table 4.4.: Preliminary experiment: subject profile

4.5. Evaluation of preliminary hypothesis

4.5.1. Statistical methods

Interrater reliability measures for the perceptual ratings were computed with Brückl’s irrNA package in R (2017), using a two-way random effects model (agreement definition) intraclass correlation coefficient (ICC), based on an average group measure (Brückl, 2018). The interrater reliability is interpreted throughout the thesis as proposed by Cicchetti (1994).


A simple linear regression with a significance level of α = 0.05 was calculated in SPSS (2017) to determine the proportion of variance in the ratings explained by H1. The dependent variable used for the regression analysis was the mean rating; the independent variable was derived by comparing the harmonicity of the two underlying intervals in accordance with Helmholtz’s consonance ranking. A rating of 1.5, implying a guess, was assumed for those sound pairs in which a compound interval was compared to its simple counterpart, or non-diatonic intervals were compared to each other.
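How such a predictor could be derived from the consonance ranking is sketched below in Python. The sketch rests on several assumptions that the text does not spell out: a predicted rating of 1 means "first sound more pathological" and 2 means "second sound more pathological", compound intervals inherit the rank of their simple counterpart, and the non-diatonic intervals are treated as more dissonant than any ranked interval. All names are hypothetical.

```python
# Order of table 4.1, most consonant first (unison omitted: not among the stimuli)
HELMHOLTZ_ORDER = ["P8", "P5", "P4", "M6", "M3", "m3",
                   "m6", "M2", "M7", "m7", "m2", "TT"]
SIMPLE_COUNTERPART = {"m9": "m2", "P12": "P5"}  # compound -> simple interval
NON_DIATONIC = {"SM2", "sm3"}

def dissonance_rank(interval):
    """Higher rank = more dissonant.

    ASSUMPTION: non-diatonic intervals rank above all others,
    compound intervals share the rank of their simple counterpart.
    """
    if interval in NON_DIATONIC:
        return len(HELMHOLTZ_ORDER)
    return HELMHOLTZ_ORDER.index(SIMPLE_COUNTERPART.get(interval, interval))

def h1_predicted_rating(first, second):
    """Predicted rating under H1: 1.0 = first, 2.0 = second, 1.5 = guess."""
    rank_first, rank_second = dissonance_rank(first), dissonance_rank(second)
    if rank_first == rank_second:
        return 1.5  # compound vs. simple counterpart, or two non-diatonic intervals
    return 1.0 if rank_first > rank_second else 2.0
```

For example, a pair of tritone and octave yields 1.0 when the tritone is presented first, whereas the pair m9 and m2 yields 1.5 in either order.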

4.5.2. Results

The intraclass correlation coefficient showed a highly significant, but only fair interrater reliability for the group measure (ICC(A, k) = 0.558, ICC(A, 1) = 0.096, p < 0.001), which is still high enough to justify an averaging of the ratings for further analysis.
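The two reported coefficients are linked by the Spearman-Brown step-up relation, which holds for the two-way agreement ICC. The following Python sketch, with hypothetical function names, recovers the average measure from the single measure for k = 12 raters and applies the commonly cited Cicchetti (1994) cut-offs (below 0.40 poor, 0.40 to 0.59 fair, 0.60 to 0.74 good, 0.75 and above excellent).

```python
def average_measure_icc(icc_single, n_raters):
    """Spearman-Brown step-up from ICC(A,1) to ICC(A,k)."""
    return n_raters * icc_single / (1.0 + (n_raters - 1) * icc_single)

def cicchetti_label(icc):
    """Verbal interpretation of an ICC value using Cicchetti's cut-offs."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"

# Rounded single measure from the text; twelve raters
icc_k = average_measure_icc(0.096, 12)
label = cicchetti_label(icc_k)
```

Because the single measure is entered with only three decimals, the result matches the reported 0.558 only approximately.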

The regression analysis showed that harmonicity significantly contributed to explaining the total rating variance (F(1,118) = 5.567, p < 0.05). However, the correlation was negative, pointing to the opposite of the expected direction. This means that the more consonant and not the more dissonant intervals tended to be rated as more pathological. That said, the overall proportion of variance explained by harmonicity was only 3.7 % (R² = 0.045, adj. R² = 0.037). The low goodness of fit indicates that basically no linear relationship exists between harmonicity and the mean ratings, since the regression prediction did not approximate the observed data points, see figure 4.2.
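The reported fit statistics can be cross-checked against each other with the standard formulas for the adjusted R² and the F statistic. The Python sketch below assumes that the 120 sound pairs are the observations (consistent with the 118 residual degrees of freedom) and uses the rounded R² from the text; the function names are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def f_statistic(r2, n, p):
    """F statistic of the regression with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

N_PAIRS = 120  # ASSUMPTION: one mean rating per sound pair
adj = adjusted_r2(0.045, N_PAIRS, 1)
f_val = f_statistic(0.045, N_PAIRS, 1)
```

Small deviations from the reported F value are to be expected, since R² enters with only three decimals.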

Figure 4.2.: Preliminary experiment: linear regression plot (H1), negative correlation

4.5.3. Exploration of new hypotheses

Based on these findings, the raw rating data were inspected anew, giving rise to the exploration of the following new hypotheses:


Hypothesis 2 (H2): When comparing voices that each contain two superimposed frequency tremor frequencies of a certain interval, the voice containing the larger interval (in semitones) is perceived as more pathological.

Hypothesis 3 (H3): When comparing voices that each contain two superimposed frequency tremor frequencies of a certain interval, the first voice is more often perceived as more pathological with increasing interval difference (in non-absolute metrics, given in semitones) between the two voices.

The direction of H3 is derived on the basis of the assumption that the larger interval is perceived as more pathological. A positive interval difference essentially means that the first voice contains the larger interval of the two. A negative interval difference implies the opposite.

4.6. Evaluation of new hypotheses

4.6.1. Statistical methods

Simple linear regressions were calculated in SPSS with α = 0.05 to determine the proportion of variance explained by H2 and H3. The dependent variable used for the regression analysis was the mean rating; the independent variables were derived by comparing interval sizes and calculating the interval difference.
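A possible derivation of the two predictors is sketched below in Python. It assumes the nominal semitone values of table 4.3 (with 2.5 standing in for the approximate value given for SM2), a predicted rating of 1 for "first sound more pathological" and 2 for "second sound more pathological", and a signed interval difference computed as first minus second. The names are hypothetical.

```python
# Nominal interval sizes in semitones, taken from table 4.3
SEMITONES = {
    "m2": 1, "M2": 2, "SM2": 2.5, "sm3": 2 + 2 / 3, "m3": 3, "M3": 4,
    "P4": 5, "TT": 6, "P5": 7, "m6": 8, "M6": 9, "m7": 10,
    "M7": 11, "P8": 12, "m9": 13, "P12": 19,
}

def h2_predicted_rating(first, second):
    """H2 predictor: 1.0 if the first sound holds the larger interval, else 2.0."""
    return 1.0 if SEMITONES[first] > SEMITONES[second] else 2.0

def h3_interval_difference(first, second):
    """H3 predictor: signed interval difference (first minus second) in semitones."""
    return SEMITONES[first] - SEMITONES[second]
```

The interval difference therefore ranges from -18 (m2 followed by P12) to +18 (P12 followed by m2).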

A stepwise multiple regression analysis was then conducted in SPSS with the variables harmonicity, interval size and interval difference to develop a prediction model for the perceived vocal tremor severity. An entry criterion of 0.05 (PIN) and a removal criterion of 0.095 (POUT) were set as the minimum and maximum probability for a variable to enter or exit the analysis model.

4.6.2. Results

The linear regression analysis indicated that interval size (H2) is a highly significant predictor (F(1,118) = 105.696, p < 0.001), explaining 46.8 % (R² = 0.472, adj. R² = 0.468) of the total rating variance. The positive correlation between interval size and the mean ratings confirms H2, with the ratings tending towards a mean of 2 in those cases where the hypothesis predicted the second sound being perceived as more pathological, see figure 4.3a.

The linear regression analysis showed that interval difference (H3) is a highly significant predictor (F(1,118) = 159.611, p < 0.001), explaining 57.1 % (R² = 0.575, adj. R² = 0.571) of the total rating variance. Interval difference correlates negatively with the mean ratings, see figure 4.3b. This means that with increasing interval difference the ratings tend more towards the first sound being perceived as more pathological, verifying H3.⁴

(a) H2: positive correlation (b) H3: negative correlation

Figure 4.3.: Preliminary experiment: linear regression plots (H2 – H3)

The multiple regression analysis showed that harmonicity (H1) no longer significantly contributed to explaining the rating variance when combined with interval size (H2) and interval difference (H3) and was therefore removed from the model. Interval size and interval difference were significant predictors (F(2,117) = 84.458, p < 0.001) and accounted for 58.4 % (R² = 0.591, adj. R² = 0.584) of the total rating variance. Out of the two, the influence of interval difference was roughly three times as high as that of interval size (compare standardized coefficients β on page A6). The model’s estimation of the dependent variable is given by:

$$y = \beta_0 + \beta_1 \cdot x_1 + \beta_2 \cdot x_2$$

$$y = 1.391 + (-0.017 \cdot \text{interval difference}) + (0.088 \cdot \text{interval size}) \tag{4.3}$$
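Equation 4.3 can be evaluated directly, as in the following Python sketch. The coding of the inputs is an assumption: the interval difference is taken as the signed difference in semitones (first minus second) and interval size as the rating predicted by H2 (1 or 2). The function name is hypothetical.

```python
def predicted_mean_rating(interval_difference, interval_size):
    """Mean rating estimated by the regression model of Eq. 4.3.

    ASSUMPTION: interval_difference in signed semitones,
    interval_size coded as the H2 prediction (1 or 2).
    """
    return 1.391 + (-0.017 * interval_difference) + (0.088 * interval_size)

# Second sound much larger (m2 followed by P12): estimate close to 2
second_larger = predicted_mean_rating(-18, 2)
# First sound much larger (P12 followed by m2): estimate close to 1
first_larger = predicted_mean_rating(18, 1)
```

Under this coding, the two extreme pairs yield estimates near the two ends of the rating range, in line with the direction of H2 and H3.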

⁴ Note that the interval difference is not given in absolute values. "Increasing interval difference" describes the transition from highly negative values that tend towards the second sound being perceived as more pathological to highly positive values that tend towards the first sound being perceived as more pathological.


4.7. Discussion

Contrary to the initial hypothesis, harmonicity was not a relevant impact factor for the perceived severity of vocal tremor in this experiment, since it explained very little of the rating variance. This, however, is not sufficient evidence to disqualify Helmholtz’s consonance ranking or its transferability to human speech. The present findings could indicate that musical consonance is not an adequate criterion for perceived consonance, as pointed out in section 4.1, or that the frequency range in question is too narrow for any perceivable differences. Alternatively, the findings may imply that other factors besides harmonicity are perceptually more dominant and therefore better suited to explain the rating variance in this experiment. It should also be noted that a non-linear relationship cannot be excluded.

The current data suggest that the between-sounds interval difference (in semitones), derived by comparing the ratios of the two superimposed frequency tremor frequencies per sound, plays a far more prominent perceptual role, with the mean ratings more evidently tending towards one extreme as the interval difference becomes larger. This predictor explained a total of 57.1 % of the rating variance. However, it remains unclear whether this auditory impression is attributable to musical interval differences between two sounds, or to actual frequency tremor frequencies, with the higher frequency tremor frequency being perceived as more pathological.

Participating subjects informally described the perceived frequency modulations as rhythmical events, with some modulations sounding "faster" than others. Several subjects stated that they had rated these "faster" rhythmical changes as more pathological, while others reported having based their judgments on the "regularity" of changes. Although it is not entirely clear what exactly subjects meant by these descriptions and with what acoustic properties they correlate, it is likely that the perceived "faster" modulations point to higher frequency tremor frequencies and that the mentioned rhythmic "regularity" refers to variations in the pitch contour due to the superposition of the frequency tremor frequencies. Additionally, these rhythmical variations may in part be attributed to fluctuations in loudness as a result of beat effects, see section 5.1.

Due to limitations in the experiment design, with only one of the two frequency tremor frequencies alternating, the influence of specific frequency tremor values and that of beat effects on the perceived severity of vocal tremor could not be explored within the frame of the preliminary experiment and required further analysis. Consequently, two further experiments, hereafter named experiment A and experiment B, were conducted to address this and further issues. The aims and objectives, methods, results and implications of these experiments are outlined in the respective following sections.


5. Experiment A: exploring the role of frequency tremor frequency, -intensity, -power and beat on the perceived severity of vocal tremor

5.1. Theoretical background

Due to the design of the preliminary experiment, with one frequency tremor frequency serving as a fixed, low anchor point, the interval's size (in semitones) automatically increased when the value of the second frequency tremor frequency increased. This gave rise to the proposition that the perceived severity of vocal tremor is associated with the actual frequency tremor frequencies and not with the intervals per se.

5.1.1. Frequency tremor frequency and -intensity interdependence

When considering the rate of a modulation in an acoustic analysis, one must also consider the magnitude of the modulation. Regardless of how fast the signal deviates around the F0, which is determined by the FTrF, the magnitude of the deviation, given by the FTrI, is what makes the modulation of the signal perceivable. In other words, a frequency tremor present in the voice will only be registered by the auditory system in conjunction with its magnitude.

One of the few papers to address this relation used a model of speech production to compare different, artificially induced sources of tremor. The authors noted that a certain magnitude threshold must be reached before listeners are capable of detecting a tremor in the voice and that this threshold varied across the different sources (Barkmeier-Kraemer and Story, 2010). It is precisely this discernibility which turns a systematic disorder that may otherwise not be noticeable into an actual impairment that can take a severe toll on one's communicative behavior. Consequently, it makes sense to regard these two measures jointly. Given the close interconnection of FTrF and FTrI, it is curious that clinical guidelines on tremor pay very little, if any, attention to the frequency tremor's magnitude; refer to DGN (2012) for more details.

5.1.2. Frequency tremor measures in pathology

If frequency tremor measures mirror the severity of vocal tremor, it makes sense to expect that they would be elevated in vocal tremor subjects. However, the studies which employ acoustic tremor measures to examine whether dysphonic and normophonic groups can be differentiated from each other on the basis of these measures paint no clear picture. This is attributable not only to the small number of studies using the aforementioned quantifiers to study vocal tremor, but also to the diverse algorithms, methods and samples used. While most of the reviewed papers confirm a general trend towards significantly higher frequency or amplitude tremor values in dysphonic groups, the findings regarding which of the four comparable measures are elevated are not consistent, see chapter 3. The only notable exception seems to be frequency tremor magnitude (under the premise that "magnitude", "intensity", "amplitude", "extent", "depth" and "level" all characterize the excursion of the F0 modulation), which was significantly higher in the dysphonic groups across all of the papers reviewed (Ludlow et al., 1986; Winholtz and Ramig, 1992; Jiang et al., 2000; Shao et al., 2010; Tanaka et al., 2011; Brückl, 2015) and lower in patients under medical treatment (Cnockaert et al., 2007).

However, one should keep in mind that even if pathologic conditions are characterized by higher frequency or amplitude tremor values, the reverse does not necessarily hold true. Elevated frequency or amplitude tremor values do not necessarily point to a pathology, but may arise as a natural indicator of the aging process. For more details, refer to Brückl (2015) regarding the highly significant correlation of frequency tremor magnitude measures with the covariate speaker age. Since research on the perception of vocal tremor severity is still in its infancy, it is not yet clear which acoustic parameters have an impact on perception, to what extent and under what circumstances. Some indications are given in Anand et al. (2012) and Gillivan-Murphy (2013), see chapter 3, but further analysis is required.


5.1.3. Beat frequency

If the perceived severity of vocal tremor depends on the frequency tremor values, it is conceivable that it could also depend on the difference between the two frequency tremor frequencies. The emerging interactions are then even more complex, since the interplay of the two frequency tremor frequencies within the narrow range under consideration gives rise to the following physical effect:

The simultaneous sounding of two tones of slightly different frequencies (in our case of two frequency tremor sines) results in a signal y_res with a frequency f_res (the mean of the two original frequencies) and periodic variations in its amplitude. If the frequency difference is less than 20 Hz, as is the case in vocal tremor, the variation becomes audible in the form of individually discernible beats, perceived as a fluctuation in the loudness of the signal (Rasch and Plomp, 1999). The amplitude variation is caused by phase differences that emerge over time between the two frequencies. Depending on whether the frequencies are in or out of phase, they interfere with each other either constructively or destructively.

The frequency of the envelope (f_env) of the resulting wave is half the difference of the two original frequencies. However, because the ear is unable to discern phase differences, what is actually perceived is only the magnitude of the envelope. Thus, the subjective, audible rate at which the amplitude varies is equal to the difference of the two frequencies. This rate is hereafter called the beat frequency (f_beat). The described relations are illustrated in figure 5.1.

The superposition of the two waveforms is given by:

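\[ y_{\mathrm{res}} = \sin(2\pi f_1 t) + \sin(2\pi f_2 t) \tag{5.1} \]

Using a trigonometric identity, equation 5.1 transforms to:

\[ y_{\mathrm{res}} = 2 \cdot \sin\left(2\pi \cdot \left(\frac{f_1 + f_2}{2}\right) \cdot t\right) \cdot \cos\left(2\pi \cdot \left(\frac{|f_1 - f_2|}{2}\right) \cdot t\right) \tag{5.2} \]

with:

\[ f_{\mathrm{env}} = \frac{|f_1 - f_2|}{2}, \quad f_{\mathrm{res}} = \frac{f_1 + f_2}{2}, \quad f_{\mathrm{beat}} = |f_1 - f_2| \]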


The larger the difference between the two frequency tremor frequencies, the higher the frequency of the envelope (f_env) and the faster the fluctuations.
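The relations between the two sine frequencies and the resulting, envelope and beat frequencies can be checked numerically. The following Python lines are only an illustrative sketch: the function names are hypothetical and not part of tremor.praat, and the example values (6 Hz and 4 Hz) are those of the sound shown in figure 5.1.

```python
import math


def beat_parameters(f1, f2):
    """Return (f_res, f_env, f_beat) in Hz for two superimposed sines."""
    f_res = (f1 + f2) / 2     # frequency of the resulting oscillation
    f_env = abs(f1 - f2) / 2  # frequency of the envelope
    f_beat = abs(f1 - f2)     # audible beat rate (magnitude of the envelope)
    return f_res, f_env, f_beat


def superposition(f1, f2, t):
    """Sum of two unit-amplitude sines, as in equation 5.1."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def product_form(f1, f2, t):
    """Carrier multiplied by envelope, as in equation 5.2."""
    f_res, f_env, _ = beat_parameters(f1, f2)
    return 2 * math.sin(2 * math.pi * f_res * t) * math.cos(2 * math.pi * f_env * t)


f_res, f_env, f_beat = beat_parameters(6.0, 4.0)
print(f_res, f_env, f_beat)  # 5.0 1.0 2.0

# Largest deviation between both forms over 1.5 s, sampled 500 times per second
max_error = max(
    abs(superposition(6.0, 4.0, n / 500) - product_form(6.0, 4.0, n / 500))
    for n in range(750)
)
```

For these values the envelope completes one cycle per second, while two beats per second would be heard.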

Figure 5.1.: Exemplary tremor analysis of a synthesized sound with tremor.praat. From top to bottom: (a) oscillogram of a 1.5 s snippet with FTrF_1 = 6 Hz, FTrF_2 = 4 Hz, FTrI = 10 % and decF = 10 Hz/s, (b) broadband spectrogram, (c) extracted F0 contour, (d) normalized, de-declined F0 contour (bolded black line) with envelope period (blue and red lines) and beat period (bolded blue and red lines)

The beat frequency resulting from the interaction of the two frequency tremor sine waves would theoretically not be within the audible range in isolation. However, because the modulation is mounted on a carrier signal, the fluctuations may become noticeable and fuse with the variations of the pitch contour (caused by the superimposition of the frequency tremor frequencies) to produce the overall impression of a rhythmical event. This may in turn influence the perceived severity of vocal tremor. Faster, more dynamic changes of the pitch contour and of the loudness are expected to be perceived as more severe, because they add to the overall complexity of the F0 pattern and may be associated with increased instability.

5.2. Aims and objectives

The findings of the preliminary experiment called for further clarification of whether the perceived severity of vocal tremor is better explained by between-sounds interval differences or by the actual values of the frequency tremor frequencies and by beat effects resulting from the interaction of two very similar frequency tremor frequencies within one sound. For this purpose, the experiment design was extended to include a second, higher fixed anchor point around which the respective other frequency alternated, so that the significance of both the between-sounds interval difference and the actual values of the frequency tremor frequencies could be explored simultaneously.

Due to the complexity of the between- and within-sound relations and a lack of nomenclature on this topic, the author introduced the terms interdistance and intradistance to describe the relation between two frequency tremor parameters. Interdistance refers to the difference between the respective first or second frequency tremor measure (FTrX) of two sounds. Intradistance refers to the difference between the first and second frequency tremor measure of one sound and is also called beat frequency when the difference in question is that between FTrFs. Both interdistance and intradistance are addressed with two indices, the first referring to the parameter in question, while the second points to the sound: FTrX_{parameter,sound}.

In particular, the experiment aimed to address the following questions:

1. How does the frequency tremor frequency influence the perception of vocal tremor severity? Does the perceived severity of vocal tremor increase with higher frequency tremor frequencies?

2. How does the interdistance of FTrF_{1,1} and FTrF_{1,2} as well as the interdistance of FTrF_{2,1} and FTrF_{2,2} impact the perceived severity of vocal tremor?

3. How does the intradistance of FTrF_{1,1} and FTrF_{2,1} as well as the intradistance of FTrF_{1,2} and FTrF_{2,2} impact the perceived severity of vocal tremor?
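The two distance measures defined above can be illustrated with a short Python sketch. The class and function names are hypothetical, the sign convention (first sound minus second sound) is assumed from the phrase "in non-absolute metrics", and the two example sounds use the minor third and major third values of experiment A.

```python
from typing import NamedTuple


class Sound(NamedTuple):
    ftrf1: float  # first (higher) frequency tremor frequency in Hz
    ftrf2: float  # second (lower) frequency tremor frequency in Hz


def interdistance(first, second, parameter):
    """Signed difference of the same parameter between the first and second sound."""
    return getattr(first, parameter) - getattr(second, parameter)


def intradistance(sound):
    """Difference between both FTrFs within one sound (= beat frequency)."""
    return abs(sound.ftrf1 - sound.ftrf2)


def intradistance_difference(first, second):
    """Signed difference between the intradistances of two sounds."""
    return intradistance(first) - intradistance(second)


first = Sound(ftrf1=3.000, ftrf2=2.500)   # minor third, low anchor
second = Sound(ftrf1=6.000, ftrf2=4.800)  # major third, high anchor

print(interdistance(first, second, "ftrf1"))  # -3.0
print(intradistance(first))                   # 0.5
```

Negative values indicate that the second sound carries the larger value of the measure in question.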

The experiment at hand also sought to analyze the influence of the frequency tremor intensity index (FTrI) on the perceived severity of vocal tremor. Given that frequency tremor intensity seems to play a subordinate role in the clinical classification of vocal tremor, findings on this subject are of particular interest for future evaluation purposes. The current framework also allowed for an examination of the adequacy of Brückl's frequency tremor power index (FTrP), which results from weighting the intensity indices with a factor depending on the tremor frequencies, as given by:

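\[ \mathrm{FTrP} = \mathrm{FTrI} \cdot \frac{\mathrm{FTrF}}{\mathrm{FTrF} + 1} \tag{5.3} \]

\[ \mathrm{ATrP} = \mathrm{ATrI} \cdot \frac{\mathrm{ATrF}}{\mathrm{ATrF} + 1} \tag{5.4} \]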


These measures take into account psychoacoustic factors and are potentially better suited to capture the concept of tremor severity. As such, the following questions emerged:

4. How does the frequency tremor intensity influence the perceived severity of vocal tremor? Does the perceived severity of vocal tremor increase with higher frequency tremor intensity values?

5. How does the interdistance of FTrP_{1,1} and FTrP_{1,2} as well as the interdistance of FTrP_{2,1} and FTrP_{2,2} impact the perceived severity of vocal tremor?

6. How does the intradistance of FTrP_{1,1} and FTrP_{2,1} as well as the intradistance of FTrP_{1,2} and FTrP_{2,2} impact the perceived severity of vocal tremor?
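As a worked illustration of the power index in equations 5.3 and 5.4, the following Python sketch weights an intensity index with its tremor frequency. The function name is hypothetical, and it is assumed here that each intensity index is paired with the tremor frequency of the same order (FTrI_1 with FTrF_1, FTrI_2 with FTrF_2).

```python
def tremor_power_index(intensity, frequency):
    """Intensity index (in %) weighted by frequency / (frequency + 1)."""
    return intensity * frequency / (frequency + 1)


# Intensity of 10 % at the high anchor (6 Hz) and at the low anchor (2.5 Hz)
print(round(tremor_power_index(10, 6.0), 3))  # 8.571
print(round(tremor_power_index(10, 2.5), 3))  # 7.143

# Intensity of 3 % at 6 Hz
print(round(tremor_power_index(3, 6.0), 3))   # 2.571
```

The weighting factor approaches 1 for high tremor frequencies, so that at equal intensity a faster tremor yields a somewhat higher power index than a slower one.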

5.2.1. Hypotheses formalization

The questions described above were formalized in thirteen theoretically founded hypotheses. This number is appropriate, considering the explorative nature of the present thesis and a general lack of psychoacoustic data on the subject. Additionally, the initial harmonicity hypothesis, as well as the best predictor variable of the preliminary experiment (interval difference), were retested, resulting in a total of fifteen hypotheses to be tested:

Hypothesis 1 (H1): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the voice containing the highest frequency tremor frequency is perceived as more pathological.

Hypothesis 2 (H2): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the voice containing the higher FTrF_1 of the two is perceived as more pathological.

Hypothesis 3 (H3): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the voice containing the higher FTrF_2 of the two is perceived as more pathological.

Hypothesis 4 (H4): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the first voice is more often perceived as more pathological with increasing interdistance (in non-absolute metrics) between FTrF_{1,1} and FTrF_{1,2}.


Hypothesis 5 (H5): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the first voice is more often perceived as more pathological with increasing interdistance (in non-absolute metrics) between FTrF_{2,1} and FTrF_{2,2}.

Hypothesis 6 (H6): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the voice containing the higher frequency tremor frequency intradistance (= the higher beat frequency) is perceived as more pathological.

Hypothesis 7 (H7): When comparing voices that each contain two superimposed frequency tremor frequencies of a certain intradistance, the first voice is more often perceived as more pathological with increasing intradistance difference (in non-absolute metrics).

Hypothesis 8 (H8): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the voice containing the higher frequency tremor intensity is perceived as more pathological.

Hypothesis 9 (H9): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the voice containing the higher FTrP_1 of the two is perceived as more pathological.

Hypothesis 10 (H10): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the voice containing the higher FTrP_2 of the two is perceived as more pathological.

Hypothesis 11 (H11): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the first voice is more often perceived as more pathological with increasing interdistance (in non-absolute metrics) between FTrP_{1,1} and FTrP_{1,2}.

Hypothesis 12 (H12): When comparing voices that each contain two superimposed frequency tremor frequencies of particular intensities, the first voice is more often perceived as more pathological with increasing interdistance (in non-absolute metrics) between FTrP_{2,1} and FTrP_{2,2}.

Hypothesis 13 (H13): When comparing voices that each contain two frequency power indices of a certain intradistance, the first voice is more often perceived as more pathological with increasing intradistance difference (in non-absolute metrics).


Hypothesis 14 (H14): When comparing voices that each contain two superimposed frequency tremor frequencies of a certain interval, the first voice is more often perceived as more pathological with increasing interval difference (in non-absolute metrics) between the two voices.

Hypothesis 15 (H15): When comparing voices that each contain two superimposed frequency tremor frequencies of a certain ratio, the voice containing the more dissonant ratio (= musical interval) is perceived as more pathological.

The common direction used for the inter- and intradistance related hypotheses was established on the premise that an increasing inter- or intradistance value means that the first sound has the larger value (of the measurement in question). It is in turn expected that the sound containing the higher value is considered to be more pathological.
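The direction convention described above can be made explicit in a few lines of Python. This is only a sketch under two assumptions that are not stated in this section: ratings are taken to be coded as 1 (first sound more pathological) and 2 (second sound more pathological), and ties are treated as not covered by the hypotheses. All names are hypothetical.

```python
def signed_distance(value_first, value_second):
    """Non-absolute distance: value of the first sound minus that of the second."""
    return value_first - value_second


def predicted_rating(value_first, value_second):
    """Rating expected under the hypotheses (1 = first sound, 2 = second sound).

    Returns None for ties, which are assumed not to be covered.
    """
    distance = signed_distance(value_first, value_second)
    if distance > 0:
        return 1  # first sound has the larger value
    if distance < 0:
        return 2  # second sound has the larger value
    return None


print(predicted_rating(6.0, 3.0))    # 1
print(predicted_rating(2.667, 6.0))  # 2
```

Under this coding, a growing signed distance goes along with mean ratings closer to 1, which corresponds to a negative slope in a regression of the mean rating on the distance.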

5.3. Acoustic methods

Sustained vowels with frequency tremor were built in Praat using the same synthesis script and the same source signal generation procedures described in the preliminary experiment, see section 4.3. However, the manipulation arguments for the tremor synthesis had to be adjusted, due to minor artifacts that manifested as "crackling" sounds during the experiment construction.

The source of the problem was localized after careful visual inspection of Praat's Manipulation object and was found to be pitch extraction errors. The overlap-and-add resynthesis in Praat is computed by using both the original sound (Pitch object) and the pulses generated during the pitch analysis in the To Manipulation... process. The width of the analysis window influences the detection and computation of these pulses. When the analysis is performed with a large window (in order to detect low frequencies), the pulses at the beginning and at the end of the sound are not properly detected or extracted. Due to the missing pulses, the resynthesis fails for these frames despite existing pitch values, as shown in figure 5.2a.

This problem can be circumvented by minimizing the analysis window and setting the pitch floor as high and the pitch ceiling as low as possible. For the current scenario, these values were set to ±5 Hz of the fundamental frequency. With these settings, all pulses were detected correctly, see figure 5.2b, save for a single one at the end. However, this was a negligible effect, because it was addressed when solving the persisting problem of the sounds not ending at the last zero-crossing. Instead of cutting the sounds, the last three periods were faded out. This proved to be the far better solution, since the "plopping" artifacts were no longer perceivable, due to the barely audible last period.

Figure 5.2.: Pulse detection in the first 0.02 seconds during the To Manipulation... process of tremor synthesis: (a) pulse detection failure, (b) correct pulse detection. Green points represent the existing pitch values, grey points show the extracted pulses

The final adjustment was the synthesis time step, which was set to 0.002 s (instead of 0.005 s). This creates 500 (instead of 200) pitch contour points per second that can be manipulated in the synthesis and is more precise than the original setting. As in the preliminary experiment, the sound duration was set to 3 seconds, the mean fundamental frequency (F0) to 200 Hz and the linear decline to 10 Hz/s.
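The effect of the time step on the number of manipulable pitch points can be illustrated with a small Python sketch. It does not reproduce the Praat synthesis script: the function names are hypothetical, and the declining contour is assumed to be centred on the mean F0, so that it falls from 215 Hz to 185 Hz over the 3 seconds.

```python
def contour_times(duration_s, time_step_s):
    """Equally spaced time points, one per synthesis time step."""
    n_points = round(duration_s / time_step_s)
    return [i * time_step_s for i in range(n_points)]


def declining_f0(t, mean_f0=200.0, decline_hz_per_s=10.0, duration_s=3.0):
    """Linearly declining F0, centred on mean_f0 (assumption)."""
    return mean_f0 + decline_hz_per_s * (duration_s / 2 - t)


fine = contour_times(3.0, 0.002)    # adjusted setting
coarse = contour_times(3.0, 0.005)  # original setting
print(len(fine), len(coarse))       # 1500 600

# Baseline contour (without tremor) sampled at the fine time step
contour = [declining_f0(t) for t in fine]
```

The finer time step thus provides 1500 instead of 600 contour points for a sound of 3 seconds.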

In order to address the impact of the frequency tremor value on the perceived severity of vocal tremor as described in section 5.2, the experiment design was extended to include two sets of sounds. The first set of sounds was built with a low FTrF_2 anchor at 2.5 Hz. FTrF_1 (the higher frequency tremor frequency) varied, covering the frequency ratios corresponding to six of the twelve intervals of the diatonic scale. The second set of sounds was built with a high FTrF_1 anchor at 6 Hz. FTrF_2 (the lower frequency tremor frequency) varied, covering the remaining six intervals of the diatonic scale. The frequency tremor intensity index was set to 10 % for both frequency tremor frequencies of all sounds. The two sounds from each set containing the highest frequency tremor frequencies were generated anew with much lower frequency tremor intensity indices of 3 %. This resulted in a total of 16 sounds, see table 5.1.

As in the preliminary experiment, the amplitude peaks were scaled to 0.9 to avoid clipping during WAVE file conversion.

| Interval | abbr. | ST | frequency ratio (fraction) | frequency ratio (decimal) | FTrF_1 in Hz | FTrF_2 in Hz | FTrI_1 in % | FTrI_2 in % |
|---|---|---|---|---|---|---|---|---|
| Minor second | m2 | 1 | 16/15 | 1.067 | 2.667 | 2.500 | 10 | 10 |
| Major second | M2 | 2 | 9/8 | 1.125 | 6.000 | 5.333 | 10 | 10 |
| Major second | M2 | 2 | 9/8 | 1.125 | 6.000 | 5.333 | 3 | 3 |
| Minor third | m3 | 3 | 6/5 | 1.200 | 3.000 | 2.500 | 10 | 10 |
| Major third | M3 | 4 | 5/4 | 1.250 | 6.000 | 4.800 | 10 | 10 |
| Major third | M3 | 4 | 5/4 | 1.250 | 6.000 | 4.800 | 3 | 3 |
| Perfect fourth | P4 | 5 | 4/3 | 1.333 | 3.333 | 2.500 | 10 | 10 |
| Tritone | TT | 6 | 45/32 | 1.406 | 6.000 | 4.627 | 10 | 10 |
| Perfect fifth | P5 | 7 | 3/2 | 1.500 | 3.750 | 2.500 | 10 | 10 |
| Minor sixth | m6 | 8 | 8/5 | 1.600 | 6.000 | 3.750 | 10 | 10 |
| Major sixth | M6 | 9 | 5/3 | 1.667 | 4.167 | 2.500 | 10 | 10 |
| Major sixth | M6 | 9 | 5/3 | 1.667 | 4.167 | 2.500 | 3 | 3 |
| Minor seventh | m7 | 10 | 16/9 | 1.778 | 6.000 | 3.375 | 10 | 10 |
| Major seventh | M7 | 11 | 15/8 | 1.875 | 4.688 | 2.500 | 10 | 10 |
| Major seventh | M7 | 11 | 15/8 | 1.875 | 4.688 | 2.500 | 3 | 3 |
| Perfect octave | P8 | 12 | 2/1 | 2.000 | 6.000 | 3.000 | 10 | 10 |

Table 5.1.: Experiment A: frequency ratios of frequency tremor frequencies and frequency tremor intensity parameters
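How the varying tremor frequency follows from the fixed anchor and the interval ratio can be sketched in Python as follows. The names are hypothetical; the sketch assumes that, in the low-anchor set, FTrF_1 is the anchor multiplied by the ratio and that, in the high-anchor set, FTrF_2 is the anchor divided by the ratio.

```python
from fractions import Fraction

LOW_ANCHOR_HZ = Fraction(5, 2)  # fixed FTrF_2 of the first set (2.5 Hz)
HIGH_ANCHOR_HZ = Fraction(6)    # fixed FTrF_1 of the second set (6 Hz)


def low_anchor_pair(ratio):
    """Return (FTrF_1, FTrF_2) in Hz with FTrF_2 fixed at the low anchor."""
    return float(LOW_ANCHOR_HZ * ratio), float(LOW_ANCHOR_HZ)


def high_anchor_pair(ratio):
    """Return (FTrF_1, FTrF_2) in Hz with FTrF_1 fixed at the high anchor."""
    return float(HIGH_ANCHOR_HZ), float(HIGH_ANCHOR_HZ / ratio)


print(low_anchor_pair(Fraction(3, 2)))   # perfect fifth: (3.75, 2.5)
print(high_anchor_pair(Fraction(8, 5)))  # minor sixth: (6.0, 3.75)
print(high_anchor_pair(Fraction(2, 1)))  # perfect octave: (6.0, 3.0)
```

Exact fractions are used for the ratios so that rounding only occurs when the result is converted to a decimal value in Hz.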


5.4. Perceptual methods

The perceptual methods used for this experiment were identical to those of the preliminary experiment and amounted to a pairwise forced A-B comparison of 120 sound pairs in one direction, see section 4.4.
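The number of 120 pairs follows from comparing each of the 16 sounds with every other sound exactly once. A minimal Python sketch, in which the sound labels are placeholders and the presentation order within a pair is arbitrary:

```python
from itertools import combinations

# Placeholder labels for the 16 synthesized sounds
sounds = [f"sound_{i:02d}" for i in range(1, 17)]

# Each unordered pair occurs once, i.e. in one direction only
pairs = list(combinations(sounds, 2))
print(len(pairs))  # 120
```

In general, n sounds yield n(n − 1)/2 such pairs.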

5.4.1. Experiment procedure

The listening experiment was conducted in Praat, using the MFC setup. It took place under controlled conditions with identical equipment for all subjects. This comprised a pair of closed-back Beyerdynamic DT 770 Pro headphones and a Line 6 UX1 audio interface. The participants were allowed to control the volume of the presented sounds via the audio interface. As outlined in section 4.4.1, the participating subjects were asked to spontaneously decide which of the two presented sounds sounded more pathological. Subjects were allowed to listen to the sound pairs as often as they liked and to take as many breaks as they needed, in addition to the designated ones. The experiment lasted approximately 30 minutes.

5.4.2. Subject profile

Thirty-four subjects (including the author), aged 23-66 (mean age: 35.7), completed the experiment, six of whom had also taken part in the preliminary experiment. This, however, was a negligible factor, since the time between the two experiments was long enough to rule out a persisting learning effect. Twelve subjects reported having knowledge of phonetics and fourteen reported having knowledge of music or acoustics. None of the subjects reported suffering from hearing disorders, see table 5.2. Thus, all subjects were included in the subsequent analysis.

| Subject | age | knowledge in phonetics | knowledge in music | hearing disorders |
|---|---|---|---|---|
| SUBJ 01 | 34 | no | no | no |
| SUBJ 02 | 25 | yes | yes | no |
| SUBJ 03 | 47 | no | yes | no |
| SUBJ 04 | 23 | yes | yes | no |
| SUBJ 05 | 66 | no | no | no |
| SUBJ 06 | 31 | no | no | no |
| SUBJ 07 | 38 | no | no | no |
| SUBJ 08 | 33 | yes | yes | no |
| SUBJ 09 | 28 | no | no | no |
| SUBJ 10 | 34 | no | no | no |
| SUBJ 11 | 34 | no | yes | no |
| SUBJ 12 | 32 | no | yes | no |
| SUBJ 13 | 37 | no | no | no |
| SUBJ 14 | 25 | no | no | no |
| SUBJ 15 | 27 | yes | no | no |
| SUBJ 16 | 26 | yes | no | no |
| SUBJ 17 | 66 | no | no | no |
| SUBJ 18 | 29 | no | no | no |
| SUBJ 19 | 33 | no | yes | no |
| SUBJ 20 | 66 | no | no | no |
| SUBJ 21 | 39 | yes | yes | no |
| SUBJ 22 | 25 | yes | yes | no |
| SUBJ 23 | 36 | no | yes | no |
| SUBJ 24 | 33 | no | no | no |
| SUBJ 25 | 34 | no | no | no |
| SUBJ 26 | 27 | yes | no | no |
| SUBJ 27 | 27 | yes | yes | no |
| SUBJ 28 | 63 | no | no | no |
| SUBJ 29 | 34 | yes | yes | no |
| SUBJ 30 | 36 | no | no | no |
| SUBJ 31 | 30 | yes | no | no |
| SUBJ 32 | 26 | yes | yes | no |
| SUBJ 33 | 34 | no | no | no |
| SUBJ 34 | 37 | no | yes | no |

Table 5.2.: Experiment A: subject profile

5.5. Statistical methods

The interrater reliability was calculated in R, using a two-way random-effects model (agreement definition) intraclass correlation coefficient (ICC), based on an average measure. Simple linear regressions with α = 0.05 were calculated in SPSS to determine the proportion of variance explained by H1 – H15. The dependent and independent variables, as well as the corresponding levels of measurement used for the regressions, are shown in table 5.3.

| Hypothesis | Dependent variable | Independent variable | Measurement level |
|---|---|---|---|
| H1 | mean rating | comparison of FTrF_1s and FTrF_2s | nominal |
| H2 | mean rating | comparison of FTrF_1s | nominal |
| H3 | mean rating | comparison of FTrF_2s | nominal |
| H4 | mean rating | FTrF_{1,1} – FTrF_{1,2} interdistance | interval |
| H5 | mean rating | FTrF_{2,1} – FTrF_{2,2} interdistance | interval |
| H6 | mean rating | comparison of intradistances (= beat frequencies) | nominal |
| H7 | mean rating | FTrF intradistance difference | interval |
| H8 | mean rating | comparison of FTrI_1s and FTrI_2s | nominal |
| H9 | mean rating | comparison of FTrP_1s | nominal |
| H10 | mean rating | comparison of FTrP_2s | nominal |
| H11 | mean rating | FTrP_{1,1} – FTrP_{1,2} interdistance | interval |
| H12 | mean rating | FTrP_{2,1} – FTrP_{2,2} interdistance | interval |
| H13 | mean rating | FTrP intradistance difference | interval |
| H14 | mean rating | interval difference | interval |
| H15 | mean rating | comparison of dissonance | nominal |

Table 5.3.: Experiment A: dependent and independent variables with corresponding levels of measurement


As a second step, a stepwise multiple regression analysis was performed in SPSS with all variables in order to determine the best prediction model. The entry and removal criteria for inclusion in or exclusion from the model were identical to those of the preliminary experiment, see section 4.6.1. Finally, pairwise correlations between the variables were calculated, using the Pearson correlation coefficient and applying a two-tailed test of significance with an alpha level of 0.05.
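For illustration, the Pearson correlation coefficient and the test statistic on which its significance test is based can be computed as in the following Python sketch. It is not the SPSS procedure itself: the function names and the example data are hypothetical, and the two-tailed p-value (which requires the t distribution with n − 2 degrees of freedom) is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value of r with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical example values
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
r = pearson_r(x, y)
print(round(r, 3), round(t_statistic(r, len(x)), 3))  # 0.8 1.886
```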

5.6. Results

The intraclass correlation coefficient demonstrated a highly significant, excellent interrater agreement for the group measure (ICC(A, k) = 0.948, ICC(A, 1) = 0.347, p < 0.001), which justifies the averaging of the ratings for further analysis.
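The relation between the single-measure and the average-measure coefficient can be illustrated with a Python sketch of the two-way agreement ICC computed from mean squares. This is not the R routine used for the analysis: all names are hypothetical, the small rating matrix is invented for demonstration, and k = 34 is assumed to be the number of raters.

```python
def icc_agreement(ratings):
    """Return (ICC(A,1), ICC(A,k)) for rows = rated items, columns = raters."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    numerator = ms_rows - ms_error
    single = numerator / (ms_rows + (k - 1) * ms_error + k / n * (ms_cols - ms_error))
    average = numerator / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


def spearman_brown(icc_single, k):
    """Average-measure reliability implied by a single-measure reliability."""
    return k * icc_single / (1 + (k - 1) * icc_single)


# Invented ratings of five sound pairs by three raters (1 or 2)
demo = [[1, 1, 2], [2, 2, 2], [1, 2, 1], [2, 2, 2], [1, 1, 1]]
single, average = icc_agreement(demo)

# Reported single measure stepped up to 34 raters
print(round(spearman_brown(0.347, 34), 3))  # 0.948
```

The moderate agreement of individual raters thus translates into a high reliability of the mean rating across 34 raters.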

The regression analyses showed that the "highest FTrF" (H1), "higher FTrF_1" (H2) and "higher FTrF_2" (H3) predictors were all highly significant (F(1, 118) = 16.823, F(1, 118) = 30.345, F(1, 118) = 25.148, p < 0.001), explaining 11.7 % (R² = 0.125, adj. R² = 0.117), 19.8 % (R² = 0.205, adj. R² = 0.198) and 16.9 % (R² = 0.176, adj. R² = 0.169) of the total rating variance respectively. The correlations were positive for all three predictors, see figures 5.3a – 5.3c. This means that in those cases where the hypotheses predicted the second sound being perceived as more pathological on the basis of higher FTrF values, the mean ratings tended towards 2, verifying the hypotheses.
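The reported statistics are connected by standard formulas, which can be sketched in Python as follows. The function names are hypothetical; n = 120 observations (one mean rating per sound pair) is inferred from the degrees of freedom, and small deviations from the reported values are to be expected because R² is given rounded to three decimals.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R² for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """F value with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


# H1: R² = 0.125 with one predictor and 120 sound pairs
print(round(adjusted_r_squared(0.125, 120, 1), 3))  # 0.118
print(round(f_statistic(0.125, 120, 1), 3))         # 16.857
```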

The regression results indicated that FTrF_{1,1} – FTrF_{1,2} (H4) and FTrF_{2,1} – FTrF_{2,2} interdistance (H5) are both highly significant predictors (F(1, 118) = 33.968, F(1, 118) = 17.34, p < 0.001). They accounted for 21.7 % (R² = 0.224, adj. R² = 0.217) and 12.1 % (R² = 0.128, adj. R² = 0.121) of the total rating variance respectively. The correlations were negative, verifying the predicted direction of the first sound being perceived as more pathological with increasing interdistance (in non-absolute metrics), see figures 5.3d and 5.3e.

The regression analyses revealed that both "higher beat frequency" (H6) and FTrF intradistance difference (H7) reached statistical significance (F(1, 118) = 4.796, F(1, 118) = 6.497, p < 0.05), but only explained 3.1 % (R² = 0.039, adj. R² = 0.031) and 4.4 % (R² = 0.052, adj. R² = 0.044) of the total rating variance respectively. Beat frequency correlated positively with the mean ratings, while FTrF intradistance difference correlated negatively, see figures 5.3f and 5.3g. Both predictors confirmed the trends predicted by the hypotheses.


The regression analysis showed that "higher FTrI" (H8) was a highly significant pre-dictor (F (1, 118) = 122.535, p < 0.001), accounting for 50.5 % (R2 = 0.509, adj.R2 = 0.505) of the total rating variance. As shown by figure 5.3h, it correlated positivelytowards the predicted direction, see figure 5.3h. This means that in those cases wherethe hypothesis predicted the second sound being perceived as more pathological, themean ratings tended towards 2.

The regression analyses demonstrated that both "higher FTrP1" (H9) and "higher FTrP2"(H10) were highly significant predictors (F (1, 118) = 471.583, F (1, 118) = 410.214,p < 0.001), explaining 79.8 % (R2 = 0.800, adj. R2 = 0.798) and 77.5 % (R2 = 0.777,adj. R2 = 0.775) of the total rating variance respectively. The correlations were positive,which verifies the expected mean trend towards 2, in those cases where the hypothesespredicted the second sound being perceived as more pathological, see figures 5.4i and5.4j.

The regression analyses indicated that both FTrP1,1 – FTrP1,2 (H11) and FTrP2,1

– FTrP2,2 (H12) interdistance are highly significant predictors (F (1, 118) = 186.85,F (1, 118) = 200.099, p < 0.001), accounting for 61.0 % (R2 = 0.613, adj. R2 = 0.610)and 62.6 % (R2 = 0.629, adj. R2 = 0.626) of the total rating variance respectively.The correlations were negative, confirming the expected trend of the first sound beingperceived as more pathological with increasing interdistance (in non-absolute metrics),see figures 5.4k and 5.4l.

The regression analysis showed that FTrP intradistance difference (H13) is a highlysignificant predictor (F (1, 118) = 15.231, p < 0.001) which explained 10.7 % (R2 = 0.114,adjusted R2 = 0.107) of the total rating variance. The correlation was negative, tendingtowards the predicted direction of the first sound being perceived as more pathologicalwith increasing intradistance difference (in non-absolute metrics), see figure 5.4m.

The results of the regression analyses for musical interval difference (H14) and dissonance (H15) yielded no statistically significant correlations, see figures 5.4n and 5.4o. Consequently, musical interval difference and dissonance can be dismissed as single predictors for vocal tremor severity within the current framework.
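As a plausibility check on the figures reported above, R² and adjusted R² can be recovered from each F statistic and its degrees of freedom. The following Python sketch is not part of the original analysis: the function names are illustrative, and the number of observations is inferred from the reported degrees of freedom (n = df1 + df2 + 1 = 120), which is an assumption.

```python
def r_squared_from_f(f_value, df_model, df_resid):
    """Recover R^2 from an overall regression F statistic.

    Uses F = (R^2 / df_model) / ((1 - R^2) / df_resid), solved for R^2.
    """
    return (f_value * df_model) / (f_value * df_model + df_resid)


def adjusted_r_squared(r2, df_model, df_resid):
    """Adjusted R^2; n is inferred as df_model + df_resid + 1 (assumption)."""
    n = df_model + df_resid + 1
    return 1.0 - (1.0 - r2) * (n - 1) / df_resid


# (label, F, df_model, df_resid) as reported in the text
reported = [
    ("H8  higher FTrI", 122.535, 1, 118),
    ("H9  higher FTrP1", 471.583, 1, 118),
    ("H13 FTrP intradistance difference", 15.231, 1, 118),
]

for label, f_value, df1, df2 in reported:
    r2 = r_squared_from_f(f_value, df1, df2)
    adj = adjusted_r_squared(r2, df1, df2)
    print(f"{label}: R2 = {r2:.3f}, adj. R2 = {adj:.3f}")
```

For these three predictors the recovered values agree with the reported R² and adjusted R² to three decimals.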


[Figure 5.3.: Experiment A: linear regression plots (H1 – H8). Panels: (a) H1: positive correlation; (b) H2: positive correlation; (c) H3: positive correlation; (d) H4: negative correlation; (e) H5: negative correlation; (f) H6: positive correlation; (g) H7: negative correlation; (h) H8: positive correlation]


[Figure 5.4.: Experiment A: linear regression plots (H9 – H15). Panels: (i) H9: positive correlation; (j) H10: positive correlation; (k) H11: negative correlation; (l) H12: negative correlation; (m) H13: negative correlation; (n) H14: positive correlation; (o) H15: positive correlation]


The final model of the multiple regression analysis contained four of the fifteen initial predictors and was reached in four steps. These were, in order of entry: "higher FTrP1", "higher FTrP2", the FTrP2,1 – FTrP2,2 interdistance and the FTrF1,1 – FTrF1,2 interdistance. The final model was highly significant (F(4, 115) = 256.375, p < 0.001), accounting for 89.6 % (R² = 0.899, adj. R² = 0.896) of the total rating variance. Of the four predictors, the FTrP2,1 – FTrP2,2 interdistance had the highest influence on the mean rating variable (compare the standardized coefficients β on page B19), followed by "higher FTrP1" and finally by the FTrF1,1 – FTrF1,2 interdistance and "higher FTrP2", which had almost equal impacts. The model’s estimation function is given by:

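\begin{equation}
\begin{aligned}
y &= \beta_0 + \beta_1 \cdot x_1 + \beta_2 \cdot x_2 + \beta_3 \cdot x_3 + \beta_4 \cdot x_4 \\
y &= 1.005 + (0.171 \cdot \text{higher } \mathrm{FTrP}_{1}) + (0.167 \cdot \text{higher } \mathrm{FTrP}_{2}) \\
  &\quad + (-0.035 \cdot \mathrm{FTrP}_{2,1-2,2}) + (-0.044 \cdot \mathrm{FTrF}_{1,1-1,2})
\end{aligned}
\tag{5.5}
\end{equation}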

This can be interpreted as follows: subjects find it increasingly easy to rate the perceived severity of vocal tremor when the FTrP2s of the presented sound pairs are farther apart. Rating also becomes easier when the FTrF1s of the sound pairs differ greatly from each other. Finally, the sound with the higher frequency tremor power indices is perceived as more pathological.
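The estimation function (5.5) can be evaluated directly. The sketch below is a minimal illustration, not the original analysis code. It assumes that the two nominal predictors are coded 1 (first sound has the higher value) or 2 (second sound has the higher value), as suggested by the rating scale tending towards 2, and that the interdistances are signed differences (first sound minus second sound) in the units used in the thesis. The function and argument names are hypothetical.

```python
# Coefficients of equation (5.5) as reported in the text.
INTERCEPT = 1.005
B_HIGHER_FTRP1 = 0.171
B_HIGHER_FTRP2 = 0.167
B_FTRP2_INTERDISTANCE = -0.035
B_FTRF1_INTERDISTANCE = -0.044


def predict_mean_rating(higher_ftrp1, higher_ftrp2,
                        ftrp2_interdistance, ftrf1_interdistance):
    """Predicted mean rating for one sound pair according to equation (5.5).

    higher_ftrp1, higher_ftrp2: assumed coding 1 or 2 (which sound is higher).
    ftrp2_interdistance, ftrf1_interdistance: assumed signed differences,
    first sound minus second sound.
    """
    return (INTERCEPT
            + B_HIGHER_FTRP1 * higher_ftrp1
            + B_HIGHER_FTRP2 * higher_ftrp2
            + B_FTRP2_INTERDISTANCE * ftrp2_interdistance
            + B_FTRF1_INTERDISTANCE * ftrf1_interdistance)


# Two hypothetical pairs with zero interdistances, differing only in the
# nominal predictors.
first_higher = predict_mean_rating(1, 1, 0.0, 0.0)
second_higher = predict_mean_rating(2, 2, 0.0, 0.0)
print(f"first sound higher:  {first_higher:.3f}")
print(f"second sound higher: {second_higher:.3f}")
```

Under these assumptions the prediction moves from about 1.343 to about 1.681 when both nominal predictors switch from 1 to 2, i.e. towards the second sound being rated as more pathological.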

The linear relationship of each variable to every other variable and to itself is illustrated in a correlation matrix, see table 5.4. The strongest correlations unsurprisingly exist between variables that either express the same measure on different scales (i.e. nominal, interval) or in a weighted form (i.e. FTrI measure, FTrP measure). Alternatively, they are owed to the experiment design.

1. Scale-related correlation:

The strongest correlation of this type exists between "higher FTrF1" and the FTrF1,1 – FTrF1,2 interdistance. The "higher FTrF1" variable (H2), which is nominally scaled, is highly significantly (p < 0.001) correlated with the FTrF1,1 – FTrF1,2 interdistance (H4), which is interval scaled. The correlation is negative (r = −0.907), meaning that when "higher FTrF1" tends to 2, the FTrF1,1 – FTrF1,2 interdistance decreases, indicating that the second sound has the higher FTrF1 value, see figure 5.5a. The first variable expresses which of the sounds has the higher FTrF1, while the second variable additionally expresses how much higher the FTrF1 is compared to that of the other sound.


Table 5.4.: Experiment A: between variables correlation matrix (r = Pearson correlation, p = significance, 2-tailed)

Variable  Hyp_1     Hyp_2     Hyp_3     Hyp_4     Hyp_5     Hyp_6     Hyp_7     Hyp_8     Hyp_9     Hyp_10    Hyp_11    Hyp_12    Hyp_13    Hyp_14    Hyp_15
Hyp_1  r  1         0.885**   0.875**   -0.802**  -0.787**  0.146     -0.141    -0.200*   0.366**   0.357**   0.098     0.074     0.211*    0.103     0.148
Hyp_1  p  -         0.000     0.000     0.000     0.000     0.111     0.123     0.029     0.000     0.000     0.287     0.420     0.021     0.262     0.107
Hyp_2  r  0.885**   1         0.711**   -0.907**  -0.711**  0.412**   -0.405**  -0.105    0.496**   0.331**   -0.016    -0.018    0.004     -0.113    0.098
Hyp_2  p  0.000     -         0.000     0.000     0.000     0.000     0.000     0.252     0.000     0.000     0.864     0.848     0.965     0.219     0.288
Hyp_3  r  0.875**   0.711**   1         -0.767**  -0.891**  -0.099    0.056     -0.105    0.331**   0.497**   0.020     -0.031    0.355**   0.354**   0.157
Hyp_3  p  0.000     0.000     -         0.000     0.000     0.282     0.543     0.253     0.000     0.000     0.825     0.741     0.000     0.000     0.087
Hyp_4  r  -0.802**  -0.907**  -0.767**  1         0.767**   -0.393**  0.470**   0.130     -0.445**  -0.371**  0.005     0.003     0.018     0.159     -0.101
Hyp_4  p  0.000     0.000     0.000     -         0.000     0.000     0.000     0.156     0.000     0.000     0.955     0.974     0.849     0.083     0.270
Hyp_5  r  -0.787**  -0.711**  -0.891**  0.767**   1         0.180*    -0.206*   0.174     -0.248**  -0.344**  -0.097    -0.030    -0.504**  -0.496**  -0.175
Hyp_5  p  0.000     0.000     0.000     0.000     -         0.049     0.024     0.057     0.006     0.000     0.290     0.743     0.000     0.000     0.056
Hyp_6  r  0.146     0.412**   -0.099    -0.393**  0.180*    1         -0.848**  0.041     0.329**   0.037     -0.140    -0.060    -0.617**  -0.784**  -0.103
Hyp_6  p  0.111     0.000     0.282     0.000     0.049     -         0.000     0.660     0.000     0.685     0.126     0.515     0.000     0.000     0.264
Hyp_7  r  -0.141    -0.405**  0.056     0.470**   -0.206*   -0.848**  1         -0.041    -0.337**  -0.092    0.142     0.046     0.720**   0.925**   0.086
Hyp_7  p  0.123     0.000     0.543     0.000     0.024     0.000     -         0.659     0.000     0.316     0.122     0.616     0.000     0.000     0.350
Hyp_8  r  -0.200*   -0.105    -0.105    0.130     0.174     0.041     -0.041    1         0.686**   0.686**   -0.979**  -0.977**  -0.523**  0.008     -0.013
Hyp_8  p  0.029     0.252     0.253     0.156     0.057     0.660     0.659     -         0.000     0.000     0.000     0.000     0.000     0.933     0.884
Hyp_9  r  0.366**   0.496**   0.331**   -0.445**  -0.248**  0.329**   -0.337**  0.686**   1         0.846**   -0.759**  -0.752**  -0.441**  -0.140    0.036
Hyp_9  p  0.000     0.000     0.000     0.000     0.006     0.000     0.000     0.000     -         0.000     0.000     0.000     0.000     0.127     0.695
Hyp_10 r  0.357**   0.331**   0.497**   -0.371**  -0.344**  0.037     -0.092    0.686**   0.846**   1         -0.742**  -0.769**  -0.201*   0.112     0.091
Hyp_10 p  0.000     0.000     0.000     0.000     0.000     0.685     0.316     0.000     0.000     -         0.000     0.000     0.028     0.222     0.324
Hyp_11 r  0.098     -0.016    0.020     0.005     -0.097    -0.140    0.142     -0.979**  -0.759**  -0.742**  1         0.993**   0.570**   0.059     0.021
Hyp_11 p  0.287     0.864     0.825     0.955     0.290     0.126     0.122     0.000     0.000     0.000     -         0.000     0.000     0.519     0.817
Hyp_12 r  0.074     -0.018    -0.031    0.003     -0.030    -0.060    0.046     -0.977**  -0.752**  -0.769**  0.993**   1         0.466**   -0.051    -0.009
Hyp_12 p  0.420     0.848     0.741     0.974     0.743     0.515     0.616     0.000     0.000     0.000     0.000     -         0.000     0.582     0.922
Hyp_13 r  0.211*    0.004     0.355**   0.018     -0.504**  -0.617**  0.720**   -0.523**  -0.441**  -0.201*   0.570**   0.466**   1         0.777**   0.216*
Hyp_13 p  0.021     0.965     0.000     0.849     0.000     0.000     0.000     0.000     0.000     0.028     0.000     0.000     -         0.000     0.018
Hyp_14 r  0.103     -0.113    0.354**   0.159     -0.496**  -0.784**  0.925**   0.008     -0.140    0.112     0.059     -0.051    0.777**   1         0.169
Hyp_14 p  0.262     0.219     0.000     0.083     0.000     0.000     0.000     0.933     0.127     0.222     0.519     0.582     0.000     -         0.065
Hyp_15 r  0.148     0.098     0.157     -0.101    -0.175    -0.103    0.086     -0.013    0.036     0.091     0.021     -0.009    0.216*    0.169     1
Hyp_15 p  0.107     0.288     0.087     0.270     0.056     0.264     0.350     0.884     0.695     0.324     0.817     0.922     0.018     0.065     -

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).


2. Weight-related correlation:

The strongest weight-related correlation is found between "higher FTrI" (H8) and both the FTrP1,1 – FTrP1,2 (H11) and FTrP2,1 – FTrP2,2 interdistance (H12). The latter result from weighting the former with a frequency dependent factor. The correlation is highly significant (p < 0.001) and negative in both cases (r = −0.979 and r = −0.977 respectively), see figures 5.5b and 5.5c. This translates as follows: the sound which has the higher FTrI also has the higher FTrP1 and FTrP2 values.

3. Design-related correlation:

The strongest design-attributed correlation is between the FTrP1,1 – FTrP1,2 (H11) and FTrP2,1 – FTrP2,2 interdistance (H12). The correlation is highly significant (p < 0.001) and positive (r = 0.993), which means that these variables tend to increase together, see figure 5.5d. This can be explained by the constant intensity values of both modulation frequencies within one sound, as well as by the frequency distribution of the anchor points and the frequencies around them.
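The scale-related correlation described under point 1 can be illustrated with simulated data. The following sketch is purely illustrative and uses made-up values, not the stimuli of the experiment: it codes each hypothetical sound pair once nominally (which sound has the higher value, coded 1 or 2) and once as a signed interdistance (first minus second), and then computes the Pearson correlation between the two codings. The value range and the handling of ties are assumptions.

```python
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def code_pair(first, second):
    """Return (nominal 'higher' code, signed interdistance) for one pair."""
    higher = 1 if first > second else 2  # ties coded as 2 here (assumption)
    return higher, first - second


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
# Hypothetical tremor frequencies in Hz for 1000 simulated sound pairs.
pairs = [(rng.uniform(2.0, 12.0), rng.uniform(2.0, 12.0)) for _ in range(1000)]
coded = [code_pair(first, second) for first, second in pairs]

r = pearson_r([higher for higher, _ in coded], [dist for _, dist in coded])
print(f"r = {r:.3f}")
```

Because the nominal code is 2 exactly when the signed interdistance is negative, the two codings are strongly negatively correlated, which mirrors the direction of the H2 – H4 correlation reported above.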

[Figure 5.5.: Experiment A: between variables correlations – linear regression plots. Panels: (a) H2 – H4: negative correlation; (b) H8 – H11: negative correlation; (c) H8 – H12: negative correlation; (d) H11 – H12: positive correlation]


5.7. Discussion

The present experiment did not replicate the results of the preliminary experiment with regard to interval difference. It did, however, confirm that harmonicity was not a relevant predictor within the current framework. Since interval difference did not reach statistical significance, it was consequently dismissed as a relevant predictor. This instead pointed to the frequency tremor frequencies as a possible impact factor, with higher frequencies being perceptually associated with a more severe vocal tremor.

Interestingly, although the related FTrF predictors yielded significant results, their regression models explained at most 21.7 % of the rating variance. In contrast, the intensity predictor alone explained 50.5 %, which means that intensity is the primary influencing factor for the severity judgement, as proposed in 5.1. However, the best vocal tremor severity predictors in this experiment were in fact the weighted frequency power indices "higher FTrP1" and "higher FTrP2" introduced by Brückl (2012), which explain 79.8 % and 77.5 % of the rating variance respectively. These results demonstrate that higher frequency tremor frequencies and intensities enhance the perceived severity of vocal tremor and are in line with those reported by Anand et al. (2012) and the observations of Gillivan-Murphy (2013) with regard to frequency tremor intensity.

These findings confirm the superiority of measures that take both psychological and biological aspects into account when assessing perception over evaluation methods based solely on the physical properties of the signal. They also stress the need for more psychoacoustically motivated research. The present findings are particularly important when considering that current attempts to classify vocal tremor rely exclusively on the frequency tremor frequency, neglecting both the intensity factor and consequently the psychoacoustic significance of both measures’ weighted combination. Furthermore, they have practical relevance for clinical evaluation purposes, given the heavy influence of the treating physicians’ auditory impression on some decisions.

However, caution is required when interpreting these findings due to the inherent limitations of the statistical methods used and their susceptibility to errors. The determination coefficients indicate how well each model fits the linear approximation, but they make no statement as to whether the correct model was specified. Therefore, they are unable to detect relationships of a non-linear nature. As such, it could well be the case that variables which contributed little or nothing to explaining the rating variance have undetected predictive strength.
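The limitation regarding non-linear relationships can be made concrete with a small constructed example. In the following sketch (hypothetical data, not taken from the experiment) the outcome is a perfectly deterministic but quadratic function of the predictor, and the linear determination coefficient is nevertheless practically zero.

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Predictor values symmetric around zero; the outcome depends on the
# predictor exactly, but through a parabola instead of a straight line.
xs = list(range(-10, 11))
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
r_squared = r ** 2  # for a single predictor, R^2 equals r^2
print(f"linear R^2 = {r_squared:.6f}")
```

A simple linear regression would therefore classify this predictor as uninformative, although it determines the outcome completely.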


The multiple regression analysis revealed that the predictive strength of the nominally scaled FTrP1 and FTrP2 predictors was further augmented when combined with the FTrP2,1 – FTrP2,2 and FTrF1,1 – FTrF1,2 interdistances, summing up to a variance explanation of 89.6 %. This implies that different scales are worth considering when exploring possible predictors and that the interval scaled variables do not necessarily always supply the highest information density (compare also H9 and H10 vs. H11 and H12). It also illustrates that even when individual predictors are not particularly conclusive on their own, they may add to the overall information level when combined within a model.

The correlation matrix revealed strong correlations between several independent variables. This is hardly surprising, considering that many of these relationships trace back to scale- or weight-related dependencies. This could imply a multicollinearity problem, but this is not likely to be the case, given that all variables which were included in the final model had a high predictive strength on their own. Additionally, those parameters which were excluded either did not explain much of the total rating variance to begin with or correlated very highly with those variables which entered the final model. Even if multicollinearity existed, this would be of secondary concern, given that the core aim of this thesis was to identify which of the theoretically derived parameters contribute best to the overall explained variance and how they relate to each other.
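One common way to quantify multicollinearity is the variance inflation factor (VIF), which for standardized predictors equals the corresponding diagonal element of the inverted predictor correlation matrix. This diagnostic was not part of the analysis reported here; the sketch below merely shows how it could be derived from the correlations in table 5.4 for the four predictors of the final model (H4, H9, H10, H12). The helper functions are illustrative, and since the input correlations are rounded to three decimals, the resulting values are approximate.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row_idx: abs(aug[row_idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for row_idx in range(n):
            if row_idx != col:
                factor = aug[row_idx][col]
                aug[row_idx] = [a - factor * b
                                for a, b in zip(aug[row_idx], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(corr):
    """VIFs as the diagonal of the inverse predictor correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Correlations between the final model's predictors, taken from table 5.4.
labels = ["H4", "H9", "H10", "H12"]
corr = [
    [1.000, -0.445, -0.371, 0.003],
    [-0.445, 1.000, 0.846, -0.752],
    [-0.371, 0.846, 1.000, -0.769],
    [0.003, -0.752, -0.769, 1.000],
]

vifs = variance_inflation_factors(corr)
for label, vif in zip(labels, vifs):
    print(f"{label}: VIF = {vif:.2f}")
```

A VIF of 1 indicates a predictor that is uncorrelated with the others; how large a value should count as problematic is a matter of convention and is not settled by this sketch.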

Out of the three predictors associated with beat frequency (H6, H7, H13) only the FTrP intradistance difference (H13) seemed to offer some informative gain with regard to the rating variance, but this was lower than in most other significant cases. However, the effects of these predictors may have been masked by the strong frequency tremor modulations. It is also possible that beat frequency, being a physical measure, is not suited to capture the perceived fluctuation. Fluctuation strength (in vacil), a psychoacoustic metric proposed by Zwicker and Fastl (2007) to quantify the amount of perceived modulation in the frequency range below 20 Hz, might provide more insight and is perhaps worth pursuing further in future experiments.

The interrater agreement proved to be excellent (ICC(A, k) = 0.947) and noticeably higher than the interrater agreement of the preliminary experiment (ICC(A, k) = 0.558). This may be traced back to similar rating strategies amongst raters of this group, to the fact that raters found it easier to rate this experiment as opposed to the preliminary one, or to both. Either way, one should be careful when interpreting the ICC values, at least as far as the group measures are concerned. These tend to increase not only when subjects are more homogeneous in their ratings, but also with greater sample size, as noted in Brückl (2011). The ICC single measures are independent of the sample size and therefore better suited for a direct comparison. In this case, the ICC single measure was also noticeably higher than the preliminary result.
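The dependence of the group measure on group size can be sketched with the Spearman-Brown relation, which links the single-measure ICC to the average-measure ICC of k raters. The sketch assumes that "sample size" here refers to the number of raters whose ratings are averaged; the single-measure value of 0.3 and the group sizes are hypothetical and not taken from the experiments.

```python
def average_measure_icc(single_icc, n_raters):
    """Average-measure ICC of n_raters, given the single-measure ICC.

    Spearman-Brown relation: ICC_k = k * ICC_1 / (1 + (k - 1) * ICC_1).
    """
    return n_raters * single_icc / (1.0 + (n_raters - 1) * single_icc)


# The same (hypothetical) single-measure agreement, evaluated for growing
# numbers of raters.
single_icc = 0.3
for n_raters in (2, 5, 10, 40):
    group_icc = average_measure_icc(single_icc, n_raters)
    print(f"k = {n_raters:2d}: average-measure ICC = {group_icc:.3f}")
```

With the agreement between individual raters held constant, the group value rises towards 1 as more raters are averaged, which is why single measures are the safer basis for comparing experiments with different numbers of raters.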

This experiment contributed to identifying relevant predictors for the perceived severity of vocal tremor, but it did so on the basis of many simplifications. Firstly, the FTrI values within one sound were kept at an equal, fairly high level. However, superimposed FTrFs in natural voices almost certainly vary in their intensity. Given that differences in intensities across sounds seem to play a major role for the perceived severity of vocal tremor, intensity differences within a single sound may also have a sizable influence.

Secondly, all waveforms in this experiment were sinusoidal. Despite the mathematical advantages associated with sine waves, they remain a rough approximation of the real speech signal, which is inherently unstable and as such subject to many types of perturbations, e.g. jitter and shimmer. These perturbations in turn may dominate over or augment the modulations and enhance the perceived severity. Additionally, the regularity of the modulating waveform itself may also play a perceptually significant role. As such, added irregularities to the modulations may have an interfering influence on the overall auditory impression.

Thirdly, the frequency of the carrier signal may also be an interfering factor. In this experiment, no attempts were made to systematically vary the fundamental frequency in order to explore its impact. However, as shown in Anand et al. (2012), significant interactions may arise between F0 and the frequency tremor measures.

In summary, natural speech signals are far more intricate than synthetically constructed ones and as such have many more and much more complex interdependencies than the ones artificially induced here. In the following experiment an attempt was made to address one of them, namely the regularity of the modulating waveform, by adding noise to it. The exact aspects examined, as well as the results and their implications, are outlined in the next chapter.


6. Experiment B: exploring the role of cyclicality on the perceived severity of vocal tremor

6.1. Theoretical background

The perceived naturalness of a human voice is in large part owed to its inherent imperfection, caused by the characteristic deviations of every time-varying system. The extent and regularity of such deviations are commonly (and at times too rashly) used as a dichotomizing criterion for a voice to be labeled as "normophonic" or "dysphonic". These deviations are not an isolated phenomenon. They are usually part of a complex network of irregularities that interact and interfere with each other. When keeping this dynamic property in mind, the thought that a modulation may itself be subject to modulations appears to be a logical deduction.

6.1.1. Vocal tremor regularity and waveform

Phonatory frequency modulations are usually quantified in terms of their rate (FTrF) and magnitude (FTrI). However, the regularity and waveform of a modulation may also be valuable sources of information. Although these aspects are most commonly used to parametrize vocal vibrato, they can also be applied to vocal tremor, given that the acoustic manifestation of vibrato and tremor is comparable (Sundberg, 1994).

Regularity is somewhat vaguely explained by Sundberg as the similarity between frequency excursions. "Similarity" in this case characterizes the degree of variability in the duration and extent of each modulation cycle. In other words, it describes how well the tremor (or vibrato) correlates with itself and is consequently a measure of periodicity. The waveform parameter refers to the shape of the frequency tremor and is usually assumed to be sinusoidal, but could theoretically also be triangular, sawtoothed etc. Despite being terminologically separated, the regularity and waveform of a modulation are inevitably connected to each other. Irregularities in the frequency tremor will eventually be reflected in a waveform that departs from sinusoidality and, vice versa, changes in the sine shape of a modulation imply changes in its regularity. As Kreiman et al. (2003) point out, it is therefore sensible to assume that changes in one do not occur without simultaneous changes in the other.
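To illustrate how a non-sinusoidal modulation shape relates to sinusoidal components, the following sketch approximates a triangular modulation by a partial sum of its Fourier series, which contains only odd sine harmonics with amplitudes falling off with the square of the harmonic number. The tremor frequency of 5 Hz and the numbers of terms are arbitrary choices for illustration.

```python
import math


def triangle_partial_sum(t, frequency, n_terms):
    """Partial Fourier sum of a unit-amplitude triangle wave at time t.

    x(t) = 8 / pi^2 * sum_k (-1)^k * sin(2 * pi * (2k + 1) * f * t) / (2k + 1)^2
    """
    total = 0.0
    for k in range(n_terms):
        harmonic = 2 * k + 1
        total += ((-1) ** k
                  * math.sin(2 * math.pi * harmonic * frequency * t)
                  / harmonic ** 2)
    return 8.0 / math.pi ** 2 * total


tremor_hz = 5.0                  # hypothetical modulation frequency
peak_time = 1.0 / (4.0 * tremor_hz)  # the triangle wave peaks at a quarter period

for n_terms in (1, 3, 50):
    value = triangle_partial_sum(peak_time, tremor_hz, n_terms)
    print(f"{n_terms:2d} term(s): peak value = {value:.4f}")
```

A single sine term reaches only about 81 % of the triangle's peak, while adding higher harmonics brings the sum close to 1, i.e. the triangular shape corresponds to a fundamental modulation plus weaker modulations at odd multiples of its frequency.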

Despite the methodological and theoretical advantages associated with a sine wave approximation of vocal tremor, to the author’s best knowledge no study to date has systematically examined the shape of the modulating waveform in pathologic tremulous voices.¹ With regard to the regularity of vocal tremor, some sparse observations have been made:

In an acoustic comparison of vocal tremor and vocal vibrato, Ramig and Shipp (1987) reported that tremor shows greater variability in the rate of oscillation than vibrato, but the difference did not reach statistical significance. Still, given the nature of vibrato, it is sensible to assume that the voluntary manipulation of a phonatory mechanism would produce more regular acoustic patterns than the involuntary result of muscle control disturbances.²

When studying the synchrony and interplay of muscle activation in patients with laryngeal tremor, Koda and Ludlow (1992) found significant delays in the timing of tremor peaks between muscles and therefore postulated that the mechanism responsible for tremor is not invariant (and as such prone to irregularities). A direct comparison of sustained vowel phonation in patients with perceptually salient vocal tremor and healthy age- and sex-matched controls revealed that pathologic speakers displayed significantly higher percentages of tremor variation for both amplitude and frequency (Ludlow et al., 1986).

¹ However, since every periodic oscillation can be expressed in terms of a Fourier series and as such be decomposed into sine and cosine waves (or complex exponentials), one may argue that these other shapes would amount to more or less the same.

² In fact, the regularity of a vibrato modulation is considered a sign of the singer’s skill (Sundberg, 1994), with particularly talented singers being able to maintain a regular vibrato even when their auditory feedback is masked by noise up to a level of 100 dB (Schultz-Coulon, 1978).

Dromey et al. (2002) assessed the steadiness of tremor rate and extent (operationalized by the coefficient of variation) for both frequency and amplitude tremor in patients diagnosed with vocal tremor. The vocal tremor of all included speakers was characterized by erratic modulations of varying degree, with greater instabilities in the extent than in the rate of the modulations for both amplitude and frequency tremor. However, the analysis was based on F0 and RMS amplitude contours extracted with the Multi Dimensional Voice Program (MDVP), which is prone to errors, see 2.3.
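The coefficient of variation mentioned above is the standard deviation divided by the mean. The sketch below applies it to hypothetical cycle durations and extents of a tremor; the numbers are invented for illustration, and the use of the population standard deviation is an assumption, since the cited study's exact computation is not described here.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation relative to the mean (population SD, assumption)."""
    return statistics.pstdev(values) / statistics.fmean(values)


# Hypothetical measurements of successive tremor cycles.
cycle_durations_ms = [198.0, 205.0, 190.0, 212.0, 201.0, 194.0]
cycle_extents_hz = [4.1, 7.9, 2.6, 9.3, 5.0, 3.2]

cv_rate = coefficient_of_variation(cycle_durations_ms)
cv_extent = coefficient_of_variation(cycle_extents_hz)
print(f"CV of cycle duration: {cv_rate:.3f}")
print(f"CV of cycle extent:   {cv_extent:.3f}")
```

Because the measure is dimensionless, the steadiness of rate and extent can be compared directly even though they are expressed in different units.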

In contrast, Gillivan-Murphy et al. (2018) reported higher periodicity values in the frequency and amplitude tremor of Parkinson patients as opposed to healthy controls. The periodicity of frequency tremor varied more amongst the Parkinson group. However, given the lack of statistical significance of the results and the gap of information with regard to the validity of the algorithm used (Voice and Tremor Protocol (VTP) from the Motor Speech Profile (MSP) of CSL), these observations remain questionable at best.

The different assessment methods as well as the varying experiment designs, data sets, sample profiles and sizes make it difficult to compare these results to each other. It does, however, become evident that the role of tremor periodicity is underexplored and that no definitive statements can be made with regard to this matter. Even less information exists on the subject of how tremor periodicity influences the perceived severity of vocal tremor.

To the author’s best knowledge, the sole study investigating the perceptual importance of the shape and regularity of the frequency tremor waveform was conducted by Kreiman et al. (2003). In this study subjects were asked to rate the similarity of synthetic versions of pathologic, tremulous voices to the original samples in a pairwise comparison, using a 100 mm visual analog scale that ranged from 0 mm (exact same) to 100 mm (very different). Each synthetic version was generated with two tremor models: sine wave vs. irregular tremor. Both models proved to be "excellent" matches to the natural voices, with no tremor model consistently performing better than the other. The choice of the model depended on the pattern of the F0 variability. Across voices, the rating differences were correlated with the severity of the vocal tremor. The choice of the proper model increasingly affected acceptability of the synthetic voices when vocal tremor severity was high. In mild forms of vocal tremor and in the presence of high frequency noise listeners were insensitive to the long-term F0 variabilities induced by tremor, preferring both models equally. These findings show that both models produced natural results and suggest that both irregular and periodic forms of vocal tremor may exist.


6.1.2. Frequency tremor cyclicality

Given the aforementioned uncertainty with regard to the role and potential impact of tremor regularity on the perceived severity of vocal tremor, the following experiment aimed to investigate this topic further. The periodicity of synthetically generated vowels was first manipulated and then measured with Brückl’s analysis script tremor.praat (Brückl, 2017). As explained in section 2.6, the underlying analysis algorithm of tremor.praat produces highly significantly more valid measures than its MDVP counterpart for all regarded parameters. The validity of tremor.praat’s periodicity measures has not yet been confirmed, but the superiority of the algorithm in the previously considered parameters is a promising prerequisite and justifies its use.

The measure used to assess frequency tremor periodicity in tremor.praat is frequency tremor cyclicality (FTrC), which is technically speaking the pitch contour’s autocorrelation coefficient, or to put it in Praat’s terminology "the degree of periodicity" of the contour’s strongest pitch candidate, labeled the frequency’s "strength". The algorithm currently only provides accurate cyclicality results for a single (frequency or amplitude) modulation. When more modulations are present in the signal, they interfere with the cyclicality calculation, producing errors. The algorithm is presently being extended to handle more than one modulation, but in the following experiment only sounds with a single modulation were considered for cyclicality extraction.
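The idea behind an autocorrelation-based periodicity measure can be sketched as the correlation of a contour with a copy of itself shifted by one modulation period. The following Python sketch is a deliberate simplification and not the algorithm implemented in Praat or tremor.praat; the sampling rate, the tremor frequency and the function name are assumptions chosen for illustration.

```python
import math


def lag_correlation(contour, lag):
    """Pearson correlation between a contour and itself shifted by `lag` samples."""
    first = contour[:-lag]
    second = contour[lag:]
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical pitch contour: 200 Hz mean F0, modulated by a 5 Hz tremor of
# 10 Hz extent, sampled at 100 points per second for two seconds.
sample_rate = 100
tremor_hz = 5
contour = [200.0 + 10.0 * math.sin(2 * math.pi * tremor_hz * i / sample_rate)
           for i in range(2 * sample_rate)]

period_in_samples = sample_rate // tremor_hz
cyclicality = lag_correlation(contour, period_in_samples)
print(f"correlation at a lag of one period: {cyclicality:.6f}")
```

For a perfectly regular single modulation the contour repeats itself after one period, so the coefficient is practically 1; any irregularity in cycle duration or extent lowers it.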

6.2. Aims and objectives

The core aim of this experiment was to explore how the (ir)regularity of the modulating waveform impacts the perception of vocal tremor severity. For this reason, its periodicity had to be altered. This was achieved by manipulating the contour through the addition of Gaussian noise, which essentially impacts how well the signal correlates with itself. The secondary aim of this experiment was to explore the adequacy of FTrC as a measurement of frequency tremor periodicity and to assess its predictive strength. Additionally, this experiment sought to validate the findings of the preceding experiment with regard to the predictors "higher FTrP1" and FTrP1,1 – FTrP1,2 interdistance.
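The effect of Gaussian noise on the self-similarity of a modulation can be sketched as follows. The code adds zero-mean Gaussian noise of increasing standard deviation to a sinusoidal modulation and measures the correlation of the result with itself at a lag of one modulation period. How the noise standard deviation relates to the noise intensity factor used in the experiment is not specified here, so the noise levels, like the sampling rate and the tremor frequency, are assumptions for illustration only.

```python
import math
import random


def noisy_modulation(n_samples, sample_rate, tremor_hz, noise_sd, rng):
    """Unit-amplitude sinusoidal modulation plus zero-mean Gaussian noise."""
    return [math.sin(2 * math.pi * tremor_hz * i / sample_rate)
            + rng.gauss(0.0, noise_sd)
            for i in range(n_samples)]


def lag_correlation(contour, lag):
    """Pearson correlation between a contour and itself shifted by `lag` samples."""
    first = contour[:-lag]
    second = contour[lag:]
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(1)  # fixed seed so that the sketch is repeatable
sample_rate = 100
tremor_hz = 5
lag = sample_rate // tremor_hz  # one modulation period in samples

noise_levels = [0.0, 0.25, 0.5, 1.0]  # hypothetical noise standard deviations
results = []
for noise_sd in noise_levels:
    contour = noisy_modulation(2000, sample_rate, tremor_hz, noise_sd, rng)
    results.append(lag_correlation(contour, lag))
    print(f"noise SD {noise_sd:.2f}: correlation at one period = {results[-1]:.3f}")
```

The more noise is added, the less the contour resembles its shifted copy, which is the sense in which added noise lowers periodicity.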

In particular, the following questions were examined:


1. How does added noise on the modulating waveform impact the perceived severity of vocal tremor? Does the perceived severity of vocal tremor increase with greater levels of noise?

2. How does the frequency tremor power index relate to the perceived severity of vocal tremor? Does the perceived severity of vocal tremor increase with a higher frequency tremor power index?

The experiment included noisy sounds with a single frequency tremor frequency and noiseless sounds containing two frequency tremor frequencies. This resulted in the following sound pair combinations: noise – noise, noise – no noise, no noise – no noise, see section 6.3 for more details. Due to the limitations of the analysis script explained in 6.1, cyclicality can currently only be accurately measured for sounds containing a single frequency tremor frequency. Consequently, the following question could only be analyzed for the noise – noise sound pairs:

3. How does tremor cyclicality influence the perceived severity of vocal tremor? Does the perceived severity of vocal tremor increase with decreasing cyclicality?

The experiment was designed in such a way that the noise – no noise combinations had different frequency tremor frequencies. Of the two frequency tremor frequencies of the noiseless sounds, one was either higher than, lower than or equal to the single frequency tremor frequency of the noisy sounds. This allowed for the exploration of the following additional question and subquestions:

4. When comparing sounds with two superimposed frequency tremor frequencies to sounds with a single, noise-containing frequency tremor frequency, which is considered more pathological?

a) Does the frequency tremor frequency influence the decision towards one direction or the other?

b) Is there a noise intensity threshold involved which facilitates the decision towards one direction or the other? If so, when is this threshold reached?


6.2.1. Research questions

For a better overview, the formalization of questions 1 to 4 is presented in three separate blocks, based on the fact that either different statistical methods or different subgroups of data were used.

6.2.1.1. Hypotheses formalization: questions 1 and 2

The formalization of the first two questions, as well as the aim to validate the aforementioned predictors, led to the following four hypotheses:

Hypothesis 1 (H1): When comparing voices that each contain one (or two) frequency tremor frequencies of particular intensities, the voice containing the higher noise intensity factor (NIF) is perceived as more pathological.

Hypothesis 2 (H2): When comparing voices that each contain one (or two) frequency tremor frequencies of particular intensities, the first voice is more often perceived as more pathological with increasing NIF difference (in non-absolute metrics).

Hypothesis 3 (H3): When comparing voices that each contain one (or two) frequency tremor frequencies of particular intensities, the voice containing the higher FTrP1 of the two is perceived as more pathological.

Hypothesis 4 (H4): When comparing voices that each contain one (or two) frequency tremor frequencies of particular intensities, the first voice is more often perceived as more pathological with increasing interdistance (in non-absolute metrics) between FTrP1,1 and FTrP1,2.

The directions used in H2 and H4 were established based on the premise that an increas-ing difference or interdistance value translates as the first sound having the larger value(of the measurement in question). It is expected that the sound containing the highervalue is perceived as more pathological.

6.2.1.2. Hypotheses formalization: question 3

The formalization of question 3 led to the following two hypotheses:


Hypothesis 5 (H5): When comparing voices that each contain added noise, the second voice is more often perceived as more pathological with increasing FTrC difference (in non-absolute metrics).

Hypothesis 6 (H6): When comparing voices that each contain added noise, the voice containing the lower FTrC value is perceived as more pathological.

The direction used in H5 was deduced based on the premise that an increasing cyclicality difference translates as the second sound having the lower cyclicality value. It is expected that the sound with the lower cyclicality value is perceived as more pathological.

6.2.1.3. Expectations: question 4

The expected tendency when comparing sounds that contain two superimposed frequency tremor frequencies to sounds with a single, noise-containing frequency modulation is the following, based on the individual frequencies involved:

When the noiseless sound contains a higher frequency tremor frequency, it is more often perceived as more pathological until a certain noise intensity threshold is reached. From that point on, the noise intensity dominates over the frequency tremor frequency and the noisy sound is considered more severe.

When the noiseless sound contains a lower frequency tremor frequency, it is more often perceived as less pathological. This tendency becomes more evident with increasing noise intensity of the noisy sound.

When the noiseless sound contains an equal frequency tremor frequency, it is perceived as equally pathological until a certain noise intensity threshold is reached. From that point on, the noise intensity dominates over the frequency tremor frequency and the noisy sound is considered more severe.

6.3. Acoustic methods

Sustained vowels with frequency tremor were created in Praat, applying the same source signal and tremor generation procedures and identical manipulation arguments to the ones used in Experiment A. The only difference was that the synthesis formula was adapted a second time to include the addition of Gaussian noise to the modulating waveform.

Gaussian noise is, by definition, noise whose amplitude values follow the normal distribution, which the Central Limit Theorem identifies as the limiting distribution of sums of many independent random contributions. As such, the probability density function p of a random variable u is given in Vary et al. (1998) by:

$$p(u) = \frac{1}{\sigma \cdot \sqrt{2 \cdot \pi}} \cdot e^{-\frac{(u-\mu)^2}{2 \cdot \sigma^2}} \tag{6.1}$$

where:

u: a continuous variable (−∞ < u < ∞)
µ: mean of the (amplitude) distribution
σ: standard deviation of the (amplitude) distribution
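As a quick numerical illustration of equation (6.1), the following minimal sketch evaluates the density and integrates it with a simple midpoint rule. The function names and the integration limits are assumptions made for this example only; they are not part of the synthesis procedure.

```python
import math


def gaussian_pdf(u: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Evaluate equation (6.1) for a single value u."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((u - mu) ** 2) / (2.0 * sigma ** 2))


def integrate_pdf(mu: float, sigma: float, lo: float, hi: float,
                  steps: int = 20000) -> float:
    """Midpoint-rule approximation of the area under the density on [lo, hi]."""
    width = (hi - lo) / steps
    area = sum(gaussian_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return area * width


# The density integrates to (approximately) 1 over a wide interval ...
total_area = integrate_pdf(0.0, 1.0, -8.0, 8.0)
# ... and roughly two thirds of the probability mass lies within one sigma of the mean.
within_one_sigma = integrate_pdf(0.0, 1.0, -1.0, 1.0)
```

The second quantity is the share of values that fall within one standard deviation of the mean, which is the basis for the "two thirds" statement made about the noise range further below.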

Initially, the attempt was made to add randomly generated Gaussian noise as a synthesis term directly into the synthesis formula. However, this resulted in every sound being synthesized with a different noise contour and consequently with different properties. Therefore, it was essentially impossible to compare the sounds to each other. The makeshift solution consisted in generating a single noise contour instead. This was multiplied with a constant noise intensity factor, making it scalable, and then added samplewise to the tremor contour. The addition of noise had an impact on the cyclicality of the sounds, because it reduced their autocorrelation coefficients.

The arguments for this noise contour were set to µ = 0 and σ = 0.1. It is necessary to keep the mean µ of the amplitude values at 0 to avoid DC bias. A sampling frequency of 500 Hz was selected to match the synthesis time step used for the tremor synthesis. The noise intensity factor (NIF) ranged from 10 to 100, so that even at the maximum NIF (σ · NIF = 0.1 · 100 = 10 Hz) roughly two thirds of the random values deviated by at most ±10 Hz around the fundamental frequency. High σ or NIF values are problematic because of increasing errors during the pitch extraction, which result in gross artifacts that reduce the perceived naturalness of the final signal. By giving σ such a low value and limiting the maximum NIF to 100, the probability of intensity values occurring outside the desired range was minimized. However, due to the random nature of noise, amplitude outliers cannot be completely avoided. The resulting synthesis equation is given by:


$$F_{0M}(t) = F_{0,s} + FTrI \cdot F_0 \cdot \sin(FTrF \cdot 2\pi \cdot t) + n(t) \cdot I_n - dec_F \cdot t \tag{6.2}$$

where:

t: a certain time point in s
F0M(t): resulting modulated pitch at time t in Hz
F0,s: fundamental frequency at t = 0 in Hz
FTrI: frequency tremor intensity index in %
F0: mean fundamental frequency in Hz
FTrF: frequency tremor frequency in Hz
n(t): noise function at time t
In: noise intensity factor
decF: linear decline of fundamental frequency in Hz/s
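The construction of the modulated pitch contour in equation (6.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Praat script used for the synthesis: the fixed random seed stands in for the single shared noise contour, the starting frequency of 215 Hz is an assumed value (a 200 Hz mean combined with a 10 Hz/s decline over 3 s), and all function names are hypothetical.

```python
import math
import random

RATE_HZ = 500       # one contour value every 1/500 s, as stated in the text
DURATION_S = 3.0    # sound duration in seconds


def make_noise_contour(mu=0.0, sigma=0.1, seed=1):
    """Draw one Gaussian noise contour n(t). The fixed seed is an assumption that
    mimics the use of a single noise contour shared by all sounds."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(int(RATE_HZ * DURATION_S))]


def modulated_f0(noise, nif, f0_start=215.0, f0_mean=200.0,
                 ftri=0.10, ftrf=5.0, dec=10.0):
    """Evaluate equation (6.2) sample by sample and return F0M(t) in Hz.
    f0_start = 215 Hz is an assumed value, not taken from the thesis."""
    contour = []
    for i, n_t in enumerate(noise):
        t = i / RATE_HZ
        tremor = ftri * f0_mean * math.sin(ftrf * 2.0 * math.pi * t)
        contour.append(f0_start + tremor + n_t * nif - dec * t)
    return contour


noise = make_noise_contour()
clean = modulated_f0(noise, nif=0)     # tremor and decline only
noisy = modulated_f0(noise, nif=100)   # same noise contour, scaled by NIF = 100

# Share of noise samples within +/-10 Hz at the maximum NIF (sigma * NIF = 10 Hz).
share_within_10hz = sum(abs(n * 100) <= 10.0 for n in noise) / len(noise)
```

Because the same contour is reused and only scaled by the noise intensity factor, two sounds differ exactly by the scaled noise term, which is what makes them comparable.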

As in the preceding experiments, the sound duration was set to 3 seconds, the mean fundamental frequency (F0) to 200 Hz and the linear decline to 10 Hz/s for all resulting sounds. Ten noise-containing sounds were generated with a constant FTrF set at 5 Hz and a constant FTrI set at 10 %. The noise intensity factor was increased in steps of 10 in the aforementioned range. Additionally, three sounds without noise, but with two frequency tremor frequencies were generated using the modified synthesis script of Experiment A. The ratio of the two FTrFs was kept constant at a P5 interval for all three sounds, but the FTrFs varied in such a way that the FTrF1 was once lower, once equal and once higher than the 5 Hz used for the noise-containing sounds. This resulted in a total of thirteen sounds, see table 6.3.

Finally, the amplitude peaks of all sounds were scaled to 0.9 to avoid clipping and the last three periods were faded out to counteract "plopping" artifacts at the end of each sound.

6.4. Perceptual methods

As in the preceding experiments, a pairwise forced A-B comparison in one direction was used to rate a total of 78 sound pairs, which were divided into three blocks of 26 pairs. For more details refer to section 4.4. The sound pairs included the following combinations: a) noise – noise (n = 45), b) noise – no noise (n = 30) and c) no noise – no noise (n = 3).

| FTrF1 in Hz | FTrF2 in Hz | FTrI1 in % | FTrI2 in % | Noise intensity factor | Cyclicality |
|---|---|---|---|---|---|
| 5.000 | – | 10 | – | 10 | 0.999 |
| 5.000 | – | 10 | – | 20 | 0.998 |
| 5.000 | – | 10 | – | 30 | 0.995 |
| 5.000 | – | 10 | – | 40 | 0.992 |
| 5.000 | – | 10 | – | 50 | 0.987 |
| 5.000 | – | 10 | – | 60 | 0.984 |
| 5.000 | – | 10 | – | 70 | 0.978 |
| 5.000 | – | 10 | – | 80 | 0.970 |
| 5.000 | – | 10 | – | 90 | 0.957 |
| 5.000 | – | 10 | – | 100 | 0.943 |
| 6.000 | 4.000 | 10 | 10 | – | ? |
| 5.000 | 3.333 | 10 | 10 | – | ? |
| 3.000 | 2.000 | 10 | 10 | – | ? |

Table 6.3.: Experiment B: noise levels and frequency ratios. The sounds with two frequency tremor frequencies have unknown cyclicality values, since the analysis algorithm has not yet been extended to handle more than one FTrF.
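The pair counts given above follow directly from the thirteen sounds: every unordered pair of distinct sounds is presented once. A minimal sketch that enumerates the pairs and also checks the P5 (3:2) ratio of the two-frequency sounds is shown below; the tuple layout and variable names are assumptions chosen for this illustration.

```python
from itertools import combinations

# Ten noisy sounds (NIF 10 ... 100) and three noiseless two-frequency sounds.
noisy_sounds = [("noise", nif) for nif in range(10, 101, 10)]
noiseless_sounds = [("no noise", f1, f2)
                    for f1, f2 in [(6.0, 4.0), (5.0, 3.333), (3.0, 2.0)]]

all_sounds = noisy_sounds + noiseless_sounds
pairs = list(combinations(all_sounds, 2))   # each unordered pair once

# Classify every pair by how many of its two sounds contain noise.
pair_counts = {2: 0, 1: 0, 0: 0}
for first, second in pairs:
    n_noisy = sum(sound[0] == "noise" for sound in (first, second))
    pair_counts[n_noisy] += 1

# FTrF1 / FTrF2 should be close to 3:2 (a perfect fifth) for all three sounds.
ratios = [f1 / f2 for _, f1, f2 in noiseless_sounds]
```

Here `pair_counts[2]`, `pair_counts[1]` and `pair_counts[0]` correspond to the noise – noise, noise – no noise and no noise – no noise combinations respectively.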

6.4.1. Experiment procedure

The experiment procedure was the same as described in section 5.4.1. The experiment took place after Experiment A, preceded by a 10-minute break. As in the previous experiments, the subjects were asked to judge which of the presented sounds sounded more pathological. The experiment duration was approximately 20 minutes.

6.4.2. Subject profile

The subjects who took part in this experiment were the same as in the previous one; refer to section 5.4.2 for the full subject profile. All thirty-four raters completed Experiment B and were included in the subsequent analysis.


6.5. Statistical methods

The interrater agreement was calculated in the same way as in the preceding experiments. In order to assess possible differences between the existing data subgroups, the ICC was determined for all data, as well as for the noise – noise and noise – no noise combinations separately.

Simple linear regressions were calculated to determine the proportion of variance explained by H1 – H4 using the entire data set. Regressions were also calculated on the noise – noise subset to assess the variance explained by H5 and H6. The dependent and independent variables, as well as the corresponding levels of measurement used for the regressions are shown in table 6.4.

| Hypothesis | Dependent variable | Independent variable | Measurement level |
|---|---|---|---|
| H1 | mean rating | comparison of noise intensity factor | nominal |
| H2 | mean rating | noise intensity factor difference | interval |
| H3 | mean rating | comparison of FTrP1s | nominal |
| H4 | mean rating | FTrP1,1 – FTrP1,2 interdistance | interval |
| H5 | mean rating | cyclicality difference | interval |
| H6 | mean rating | comparison of FTrCs | nominal |

Table 6.4.: Experiment B: dependent and independent variables with corresponding levels of measurement

Thereafter, a multiple regression analysis was calculated with the variables deduced from H1 – H4 to determine the final prediction model for the entire data set. The entry and exit criteria remained the same as in the previous experiments. The between variable correlations were calculated using a two-tailed test of significance with an alpha level of 0.05.

In order to address the last question described in section 6.2, the 30 noise – no noise sound pairs were divided into the following three subgroups of 10 sound pairs each: a) noise – high frequency, b) noise – mid frequency, c) noise – low frequency. Since the sound pairs were not always presented with the noisy sound occurring first, all pairs had to be switched to one side and their means reversed accordingly. These groups were then divided into two categories, based on their noise intensity factor: a) low noise (10-30) and b) high noise (40-100). The noise group classification was based on a first visual inspection of the data. This allowed for a comparison of the means, based on the defined noise and frequency subgroups. For this purpose, 95 % confidence intervals were plotted in Matlab (R2017a). The comparison of means was done descriptively, given that the available data set was too small to perform t-tests.

Figure 6.1.: Intraclass correlation coefficients (ICC) of data subsets with 95% confidence intervals
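The re-orientation and grouping of the noise – no noise pairs described above can be sketched as follows. The sketch assumes that the ratings are coded 1 (first sound more pathological) and 2 (second sound more pathological), so that a mean is reversed by mirroring it around the midpoint 1.5; this coding is inferred from the reported means and is an assumption, as are the function names and the example values, which are invented and not taken from the experiment.

```python
from collections import defaultdict
from statistics import mean


def orient_noisy_first(mean_rating, noisy_is_first):
    """Mirror a mean rating on the assumed 1-2 scale if the noisy sound came second."""
    return mean_rating if noisy_is_first else 3.0 - mean_rating


def noise_category(nif):
    """Assign the noise categories used for the comparison of means."""
    if 10 <= nif <= 30:
        return "low noise"
    if 40 <= nif <= 100:
        return "high noise"
    raise ValueError(f"NIF outside the experimental range: {nif}")


def category_means(pairs):
    """pairs: iterable of (nif, mean_rating, noisy_is_first) for one frequency subgroup."""
    grouped = defaultdict(list)
    for nif, rating, noisy_is_first in pairs:
        grouped[noise_category(nif)].append(orient_noisy_first(rating, noisy_is_first))
    return {category: mean(values) for category, values in grouped.items()}


# Invented ratings for illustration only, not data from the experiment.
example_pairs = [(10, 1.6, True), (20, 1.3, False), (50, 1.4, True), (90, 1.8, False)]
example_means = category_means(example_pairs)
```

After this step, a mean below 1.5 always indicates that the noisy sound was judged as more pathological, regardless of the original presentation order.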

6.6. Results

The ICC yielded a highly significant, excellent interrater agreement for all data (ICC(A, k) = 0.918, ICC(A, 1) = 0.247, p < 0.001). Amongst the examined data subgroups, the interrater agreement for the noise – noise combinations (ICC(A, k) = 0.948, ICC(A, 1) = 0.348, p < 0.001) was higher than for the noise – no noise combinations (ICC(A, k) = 0.798, ICC(A, 1) = 0.103, p < 0.001). As illustrated by figure 6.1, this difference was significant. The group measure was quite high in all three cases, justifying the averaging of the ratings for further analysis.
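The large gap between the single measure ICC(A, 1) and the group measure ICC(A, k) is a consequence of averaging over many raters. The two are linked by the Spearman-Brown relation, which the following minimal sketch applies with k = 34 raters. The function name is hypothetical, and the computation only re-derives the group measures from the rounded single measures reported above; it is not a re-analysis of the rating data.

```python
def average_measure_icc(single_icc: float, k: int) -> float:
    """Spearman-Brown step-up: reliability of the mean of k raters,
    given the reliability of a single rater."""
    return k * single_icc / (1.0 + (k - 1) * single_icc)


N_RATERS = 34  # all thirty-four raters completed Experiment B

icc_all = average_measure_icc(0.247, N_RATERS)          # reported ICC(A, k): 0.918
icc_noise_noise = average_measure_icc(0.348, N_RATERS)  # reported ICC(A, k): 0.948
icc_noise_clean = average_measure_icc(0.103, N_RATERS)  # reported ICC(A, k): 0.798
```

The first two values reproduce the reported group measures to three decimals. The third comes out slightly lower (about 0.796), a small deviation that may stem from the rounding of the reported single measure.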

As shown by the linear regression analysis, "higher noise intensity factor" (H1) was a highly significant predictor (F(1, 76) = 135.458, p < 0.001), explaining 63.6 % (R² = 0.641, adj. R² = 0.636) of the total rating variance. The correlation was positive, confirming the predicted trend, see figure 6.2a. This means that in those cases where the hypothesis predicted the second sound being perceived as more pathological on the basis of a higher noise intensity factor, the mean ratings tended towards 2.

The regression results further indicated that noise intensity factor difference (H2) was a highly significant predictor (F(1, 76) = 145.266, p < 0.001), accounting for 65.2 % (R² = 0.657, adj. R² = 0.652) of the total rating variance. The correlation was negative, see figure 6.2b. This verifies the expected tendency of the first sound being perceived as more pathological with increasing noise intensity factor difference (in non-absolute metrics).

The "higher FTrP1" variable (H3) yielded a positive correlation which did not prove to be statistically significant, see figure 6.2c. Consequently it can be dismissed as a single predictor for the perceived severity of vocal tremor within the current framework.

The regression analysis showed that FTrP1,1 – FTrP1,2 interdistance (H4) was a significant predictor (F(1, 76) = 6.253, p < 0.05), explaining 6.4 % (R² = 0.076, adj. R² = 0.064) of the total rating variance. The correlation was negative, confirming the expected direction of the first sound being perceived as more pathological with increasing FTrP1,1 – FTrP1,2 interdistance (in non-absolute metrics), see figure 6.2d.

The regression results demonstrated that FTrC difference (H5) was a highly significant predictor (F(1, 43) = 150.920, p < 0.001) that accounted for 77.3 % (R² = 0.778, adj. R² = 0.773) of the rating variance in the noise – noise sound pairs. The correlation was positive, see figure 6.2e. This verifies the expected tendency of the second sound being perceived as more pathological with increasing cyclicality difference (in non-absolute metrics).

Finally, the regression analysis indicated that "lower FTrC" (H6) was a highly significant predictor (F(1, 43) = 297.609, p < 0.001), explaining 87.1 % (R² = 0.874, adj. R² = 0.871) of the rating variance in the noise – noise sound pairs. The correlation was positive, confirming the predicted trend, see figure 6.2f. This means that in those cases where the hypothesis predicted the second sound being perceived as more pathological on the basis of a lower cyclicality value, the mean tended towards 2.
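The percentages of explained variance quoted above are the adjusted R² values. Their relation to the unadjusted R² can be illustrated with the standard adjustment formula, using the number of sound pairs as the number of observations (78 for the entire data set, 45 for the noise – noise subset, in agreement with the reported degrees of freedom). The sketch below is a consistency check on the reported figures only; the function and dictionary names are hypothetical.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjustment of R^2 for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# (reported R^2, number of sound pairs) for the significant single predictors.
reported = {
    "H1": (0.641, 78),
    "H2": (0.657, 78),
    "H4": (0.076, 78),
    "H5": (0.778, 45),
    "H6": (0.874, 45),
}

adjusted = {name: adjusted_r2(r2, n, 1) for name, (r2, n) in reported.items()}
```

With a single predictor the adjustment is small, which is why R² and adjusted R² differ only in the third decimal place.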


Figure 6.2.: Experiment B: linear regression plots (H1 – H6). Panels: (a) H1: positive correlation, (b) H2: negative correlation, (c) H3: positive correlation, (d) H4: negative correlation, (e) H5: positive correlation, (f) H6: positive correlation.

The two cyclicality predictors were not considered in the calculation of the multiple regression analysis, given that they only applied to a subgroup of the total data. The final model was reached in four steps and contained all four inserted predictors: "higher noise intensity factor", noise intensity factor difference, "higher FTrP1" and FTrP1,1 – FTrP1,2 interdistance. It was highly significant (F(4, 73) = 67.767, p < 0.001), accounting for 77.6 % (R² = 0.788, adj. R² = 0.776) of the total rating variance. Out of the four predictors "higher FTrP1" had the highest influence on the mean rating variable, followed by "higher noise intensity factor", FTrP1,1 – FTrP1,2 interdistance and finally noise intensity factor difference (compare standardized coefficients β on page C10). The model’s estimation function is given by:

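$$
\begin{aligned}
y &= \beta_0 + \beta_1 \cdot x_1 + \beta_2 \cdot x_2 + \beta_3 \cdot x_3 + \beta_4 \cdot x_4 \\
y &= 0.243 + (-0.003 \cdot \text{NIF difference}) + (0.232 \cdot \text{higher NIF}) \\
  &\quad + (0.618 \cdot \text{higher } FTrP_1) + (0.391 \cdot FTrP_{1,1-1,2})
\end{aligned}
\tag{6.3}
$$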

This allows for the following interpretation: sounds with a higher noise intensity factor and a higher frequency power index are perceived as more pathological. Additionally, subjects find it increasingly easy to rate the perceived severity of vocal tremor when the noise intensity factor difference of the presented sounds is large and the FTrP1s are farther apart.
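For illustration, the estimation function (6.3) can be written as a small Python function. The coefficients are the ones reported above; the numerical coding of the two nominal predictors is not restated in this section, so any values passed for them are placeholders, and the function and parameter names are hypothetical.

```python
def predicted_mean_rating(nif_difference: float, higher_nif: float,
                          higher_ftrp1: float, ftrp1_interdistance: float) -> float:
    """Evaluate the estimation function (6.3) with the reported coefficients.
    The coding of higher_nif and higher_ftrp1 is left to the caller (assumption)."""
    return (0.243
            + (-0.003 * nif_difference)
            + 0.232 * higher_nif
            + 0.618 * higher_ftrp1
            + 0.391 * ftrp1_interdistance)


# With all predictors at zero, only the intercept remains.
baseline = predicted_mean_rating(0.0, 0.0, 0.0, 0.0)

# Effect of a NIF difference of +10 with all other predictors unchanged.
shift_per_10_nif = predicted_mean_rating(10.0, 0.0, 0.0, 0.0) - baseline
```

The negative shift reflects the sign of the NIF difference coefficient: a larger (non-absolute) NIF difference moves the predicted mean rating towards the first sound.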

The between variables correlations are illustrated in a matrix, see table 6.5. The strongest correlations are found between the noise intensity factor (H1 and H2) and FTrP (H3 and H4) variables respectively. In both cases, the variables express the same measure in different scales (nominal, interval). The correlations are highly significant (p < 0.001) and negative (r = −0.847 and r = −0.874). This means that when "higher NIF" and "higher FTrP1" each tend to 2, the noise intensity factor difference and FTrP1,1 – FTrP1,2 interdistance decrease, signaling that the second sound has the higher NIF and FTrP1 value, see figures 6.3a and 6.3b.

Table 6.5.: Experiment B: between variables correlation matrix


Figure 6.3.: Experiment B: between variables correlations – linear regression plots. Panels: (a) H1 – H2: negative correlation, (b) H3 – H4: negative correlation.

The comparison of means amongst the noise – no noise sound pairs confirmed the expected tendencies described in section 6.2.1.3. For a better overview, the results are listed separately for each of the three frequency dependent subgroups:

In the noise – high frequency subgroup, the mean tended towards the second, noiseless sound being perceived as more pathological (mean: 1.66), until a noise intensity factor threshold of 30 was reached. From that point on, the first (noisy) sound tended to be perceived as more pathological on average (mean: 1.38), but the means did not linearly decline with increasing NIFs. At a noise intensity factor of 60 they even swapped towards the noiseless sound again, see figure 6.4a.

In the noise – low frequency subgroup, the mean tended towards the first (noisy) sound being perceived as more pathological for both the low and high noise category (mean: 1.32, mean: 1.27). This trend became on average more evident in the higher noise category, see figure 6.4b.

In the noise – mid frequency subgroup, subjects were indecisive in their rating for the low noise category (mean: 1.52) with the mean tending towards the rating midpoint up until a NIF threshold of 40. After this point, the means tended on average towards the first (noisy) sound being perceived as more pathological (mean: 1.30), but they did not linearly decline with increasing NIFs, see figure 6.4c.

As illustrated by the 95% confidence interval plots, the means of the two noise categories differed significantly from each other in all three frequency subgroups.


Figure 6.4.: Comparison of means in frequency subgroups. 95% confidence interval plots for all three frequency subgroups: a) noise – high frequency, b) noise – low frequency, c) noise – mid frequency. Blue boxes indicate the 10-30 noise category, red boxes the 40-100 noise category. The height of the boxes shows the upper/lower CI limits.

6.7. Discussion

This experiment demonstrated that the (a)periodicity of the modulating waveform plays a prominent perceptual role for the overall perceived severity. As expected, sounds with a high noise intensity factor were perceived as being more pathological than those with a low NIF. These findings are reflected by the results of the linear regressions, with "higher NIF" and the NIF difference explaining 63.6 % and 65.2 % of the rating variance respectively. Subjects tended to be more homogeneous in their ratings when the NIF difference between sounds was large, but struggled when the NIF values of the presented sounds were similar. On average though, subjects seemed to be able to reliably identify the noisier of two sounds, even when these differed by a mere NIF step of 10 in this experimental setup. A close inspection of the noise – noise subset revealed that the only observed errors occurred at a low noise range of 10-20 and 20-30.

As opposed to the previous experiment, the frequency power index predictors explained very little of the rating variance on their own. In fact, "higher FTrP1" did not even reach statistical significance. When taking into account that the FTrP values in this experiment were equal for the majority of the sounds, this result should not be surprising. However, the lack of significance as a single predictor should not be mistaken to mean that the variable in question does not contribute to explaining the total rating variance. This becomes evident when reviewing the multiple regression results and the standardized coefficients in particular. These show that "higher FTrP1" has the highest influence on the mean variable within the final predictive model.

The interrater agreement in this experiment was excellent, though slightly lower than in the previous experiment. Amongst the ICCs of the data subsets, the noise – no noise subgroup differed significantly from the rest, yielding the lowest agreement. This implies that subjects found it particularly difficult to rate these sound pairs as compared to the noise – noise combinations and/or that they used different rating strategies.

The detailed analysis of the noise – no noise sound pairs confirmed the expected turning points in the high and mid frequency subgroups. This indicates two things. Firstly, that a certain noise intensity threshold exists, after which the amount of noise present in the signal dominates perceptually over the frequency tremor frequencies. Secondly, that this threshold is different for the individual frequency subgroups. The comparison of the mid FTrF noisy sounds to the superimposed low FTrFs noiseless sounds also shows that the presence of a second frequency tremor frequency is perceptually less relevant in the presence of a noisy sound with a single, higher FTrF. These findings are particularly important when considering that in natural voices, a certain amount of noise is always present and that this noise may be masking other phenomena, or amplifying the overall perceived severity. They are also in line with the results reported by Kreiman et al. (2003), who observed that in the presence of noise listeners are less focused on the variability of the F0 contour.

Though further analysis is necessary, this experiment validated cyclicality as an adequate measure for tremor periodicity. An increasing noise intensity factor is correlated with a lower cyclicality value. Though in this case the decline was monotonic (see table 6.3), this may not always be the case due to the random nature of noise and the possibility of amplitude outliers. Out of the two cyclicality predictors, "lower cyclicality" had the higher predictive strength, accounting for 87.1 % of the rating variance in the noise – noise subset. This is hardly surprising, given that cyclicality was the only changing parameter in this subset. For brevity purposes, the linear regressions calculated to test the predictive strength of the NIF variables for this data subset were not listed under section 6.6, but additional analysis revealed that "lower cyclicality" proved to be a better predictor than "higher NIF". In contrast, cyclicality difference was a worse predictor than noise intensity factor difference. This is to be expected, given that NIF difference draws on the actual NIF values, whereas cyclicality difference does not.
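The relation between noise intensity factor and cyclicality can be inspected directly from the values listed in table 6.3. The following minimal sketch computes the decrease in cyclicality for each NIF step of 10, so that the shape of the decline can be examined; the variable names are chosen for this illustration only.

```python
# NIF levels and cyclicality values of the ten noisy sounds (table 6.3).
nif_levels = list(range(10, 101, 10))
cyclicality = [0.999, 0.998, 0.995, 0.992, 0.987, 0.984, 0.978, 0.970, 0.957, 0.943]

# Decrease in cyclicality per NIF step of 10, rounded to the table's precision.
drops = [round(a - b, 3) for a, b in zip(cyclicality, cyclicality[1:])]

# Every step lowers the cyclicality, but not by a constant amount.
strictly_decreasing = all(d > 0 for d in drops)
smallest_drop, largest_drop = min(drops), max(drops)
```

The step sizes grow from 0.001 at the low end to 0.014 at the high end of the NIF range, i.e. the cyclicality falls with every step, and faster at higher noise levels.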

The observations of this experiment are based on generated pseudo noise with constant properties for all sounds. This was a necessary simplification which made it possible to compare sounds. However, noise in natural voices is time-variant not only in the aspect that it may decrease and increase throughout the process of phonation, but also because it may have an intermittent character. This makes it far more complex to actually assess the perceived severity of natural voices. As such, the thresholds identified here are entirely imposed by the experiment design and only apply to the current framework.


7. Conclusion

The task of linking perceptual correlates to acoustic variables is one that is ongoing and particularly challenging for a number of reasons. Firstly, the holistic perception of natural speech signals makes it difficult to isolate certain properties, in order to determine how they are perceived. Secondly, in the absence of a reliable model of voice quality, the question of how to best measure these specific aspects in perceptual terms remains open. Lastly, the relationship between physiology and acoustics on one hand, and acoustics and perception on the other is dominated by what is known as the lack of invariance problem. This means that there are no constant relationships between either of the two and that a certain acoustic pattern may not only be produced by varying physiological mechanisms, but also be perceived differently depending on its context and the listener’s disposition.

In order to bypass some of these complications, the present thesis employed synthetically generated sounds, which allow for a controlled manipulation of specific vocal tremor parameters, while other factors are kept constant. Instead of attempting to target singular auditory aspects through the use of semantic descriptors and due to the overall uncertainty in regards to suitable metrics, the overall perceived impairment, generated through the interplay of various parameters, was assessed by means of forced-choice pairwise comparisons. Finally, quasi-stationarity was simulated through the use of sustained vowel phonation. On the basis of these simplifications several observations were made. The most important findings of the thesis are summarized here:

• Frequency tremor frequency is a highly significant predictor of vocal tremor severity with higher frequencies generally being perceived as more severe when all other factors are kept constant.

• Listeners find it increasingly easy to make their judgement when the difference between the frequency tremor frequencies of two sounds becomes larger. When sounds have two frequency tremor frequencies, the difference between the higher frequencies is perceptually more relevant than the one between lower frequencies, explaining roughly one fourth of the total rating variance.

• Frequency tremor intensity is a highly significant and even more powerful predictor than frequency tremor frequency, with higher frequency tremor intensity indices being associated with a more severe vocal tremor. This generally holds true even in those cases in which the sound with the higher tremor intensity exhibits a much lower frequency in comparison to the other sound. Frequency tremor intensity alone accounts for half of the total rating variance.

• The most informative predictors of vocal tremor severity in clean, noiseless sounds are the highly significant, weighted frequency tremor power indices. These combine both frequency tremor frequency and intensity measures. Higher values contribute to the auditory severity of vocal tremor, explaining up to 80 % of the total rating variance.

• As with frequency tremor frequency, listeners find it increasingly easy to rate the severity of vocal tremor when the difference between the frequency tremor power indices of two sounds becomes larger.

• Frequency tremor cyclicality accurately portrays the tremor’s periodicity. Lower cyclicality values correspond to higher noise levels and add to the overall perceived severity.

• Listeners reliably distinguish between sounds with varying degrees of noise and are able to detect even small differences in the overall noise level with errors occurring mostly at low levels.

• A threshold exists, after which the noise level in the sound dominates perceptually over the frequency tremor frequencies. This threshold changes depending on the frequencies compared.

• The difference between the power indices of a sound, which is essentially a weighted beat measure, is a significant predictor of vocal tremor severity, but contributes only about a tenth in explaining the overall rating variance.

• The harmonicity of two superimposed frequency tremor frequencies does not seem to play a significant role in the perception of vocal tremor severity.


• Informal descriptions reveal that superimposed frequency tremor modulations are perceived as rhythmical events, varying in their speed and degree of regularity.

Though these findings contribute to the understanding of the intricacies of vocal tremor perception, it should be noted that they are at best a rough approximation of real case scenarios. The employed abstractions are methodologically necessary, but they do the complexity of the matter little justice and are therefore a justifiable, but nevertheless considerable limitation. Consequently, it is unclear whether and, if so, how accurately the identified predictors describe the auditory sensation of vocal tremor in natural speech signals. In natural phonation, modulations are likely to be much more subtle (either due to their nature, or because they are masked by interfering factors), but also a lot more erratic. Prior to validating these predictors on real phonation data, more experimental work is needed, especially in regards to emerging interactions between single parameters.

Such interactions could appear when certain parameters are no longer kept constant. Anand et al. (2012) for example observed an effect of fundamental frequency on the perceived severity of vocal tremor, with low F0 voices being considered as more severely affected. When examining different fundamental frequencies, one should keep in mind that other psychoacoustic effects, such as the frequency dependent loudness perception depicted in equal loudness curves (Zwicker and Fastl, 2007), may also play a role in the perceived severity of vocal tremor. Additionally, the findings of Mertens et al. (2015) suggest that some vocal tremor measures may significantly differ between genders. These implications are worth analyzing further.

Due to the limited framework of the present thesis, variations in the intensity of the signal, expressed through amplitude tremor measures, could not be considered at all. However, as explained by theory and shown in past trials, frequency and amplitude modulations almost certainly coexist in vocal tremor. The question of how these amplitude modulations are perceived, to what degree they contribute to explaining the variance of severity ratings and how dominant they are when other factors are present, remains to be explored in future research.

Though this thesis was able to validate the adequacy of frequency tremor cyclicality as a measure of tremor periodicity, its informative strength as a predictor of vocal tremor severity has not yet been sufficiently addressed. Upcoming experiments will focus on assessing the influence of tremor periodicity when coupled with other vocal tremor parameters. It should also be noted that although lower cyclicality values were associated with a more severe vocal tremor in the current thesis, this type of relation is one that is artificially induced by the experiment design, because a lower cyclicality value necessarily corresponded to more noise. This noise distracted from the otherwise perfectly cyclical attributes of a synthetically generated tremor, making it sound less synthetic (at least at low levels of noise), but also causing it to be judged as more pathological, given that the added noise was the only source of change. The opposite may occur in natural voices. Since natural speech signals are far less periodic, higher cyclicality values may cause the vocal tremor to become more noticeable and therefore to be judged as more severe.

In light of the high values chosen in these experiments to simulate vocal tremor, the modest contribution of beat frequency measures to explaining the overall rating variance is understandable. These predictors may still prove to be valuable assets in different settings. It may be worth systematically analyzing their predictive strength in the presence of less prominent frequency tremor parameters.

The informal descriptions given by listeners with regard to the rhythmic sensations evoked by vocal tremor should also be analyzed in more depth. A systematic evaluation of the perception of tremor rhythm may provide further insights and explain some of the rating strategies used. As with all aspects of voice quality, the question of how to properly define and assess tremor rhythm remains a challenge.

Furthermore, the current thesis made no attempt to distinguish between expert and naive listeners. However, it is sensible to assume that the severity of vocal tremor may be perceived differently depending on the rater's experience and background. These potential differences are also worth pursuing further.

This thesis has highlighted the superiority of psychoacoustically inspired measures as predictors of vocal tremor severity. Though tremor.praat is still under development, it is currently the only open-source algorithm utilizing such measures. Consequently, it is worth developing and refining further in order to minimize errors. A potential error source, especially in noisy data, is the F0 extraction. As suggested by Jouvet and Laprie (2017), Praat’s autocorrelation-based F0 extraction produces fewer frame errors for both clean and noisy data than the one based on cross-correlation. tremor.praat currently operates with the latter. It may be interesting to test whether tremor.praat’s performance is further improved through the use of autocorrelation.
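To make the idea of autocorrelation-based F0 extraction concrete, a minimal sketch is given below. It is neither Praat's implementation (which involves further refinements) nor part of tremor.praat; the function name, the search range of 75 to 600 Hz and the synthetic test tone are assumptions chosen purely for illustration.

```python
import numpy as np


def estimate_f0_autocorr(frame, sample_rate, f0_min=75.0, f0_max=600.0):
    """Estimate the F0 (in Hz) of one frame from its autocorrelation peak.

    Hypothetical helper for illustration only, not Praat's algorithm.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)  # remove any DC offset
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Restrict the search to lags that correspond to plausible F0 values.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag


# Illustrative input: a 40 ms frame of a pure 200 Hz tone sampled at 16 kHz.
sample_rate = 16000
time = np.arange(int(0.04 * sample_rate)) / sample_rate
tone = np.sin(2.0 * np.pi * 200.0 * time)
estimated_f0 = estimate_f0_autocorr(tone, sample_rate)
```

On real, noisy phonation the peak of the autocorrelation is less clear-cut than for this clean tone, which is where differences between extraction methods would be expected to show.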

Further insight into the mentioned aspects of vocal tremor perception will enable a more practically oriented way of assessment that takes psychoacoustic aspects into account. This could be especially valuable in clinical environments, where a primary concern of both patients and physicians is the ability to measure and track the degree of deterioration or improvement in a manner that agrees with perception.

Bibliography

Anand, S. et al. (2012), “An acoustic-perceptual study of vocal tremor”. In: Journal of Voice 26 (6), pp. 811.e1–811.e7.

Aronson, A. E. et al. (1992), “Rapid voice tremor, or "flutter" in amyotrophic lateral sclerosis”. In: Annals of Otology, Rhinology & Laryngology 101, pp. 511–518.

Barkmeier-Kraemer, J. and Story, B. (2010), “Conceptual and clinical updates on vocal tremor”. In: American Speech-Language-Hearing Association (ASHA) Leader 15 (14).

Bhatia, K. P. et al. (2018), “Consensus Statement on the Classification of Tremors. From the Task Force on Tremor of the International Parkinson and Movement Disorder Society”. In: Movement Disorders 33 (1), pp. 74–87.

Bidelman, G. M. and Krishnan, A. (2009), “Neural correlates of consonance, dissonance, and the hierarchy of musical pitch in the human brainstem”. In: The Journal of Neuroscience 29 (42), pp. 13165–13171.

Boersma, P. and Weenink, D. (2017), Praat: doing phonetics by computer. [Computer program, Version: 6.0.35]. University of Amsterdam.

Bortz, J. and Döring, N. (2006), Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler, 4. Auflage. Berlin: Springer, pp. 181–185.

Brückl, M. (2011). In: Altersbedingte Veränderungen der Stimme und Sprechweise von Frauen. Ed. by W. F. Sendlmeier. Mündliche Kommunikation, Band 7. Berlin: Logos Verlag, p. 103.

Brückl, M. (2012), “Vocal Tremor Measurement Based on Autocorrelation of Contours”. In: Proceedings of INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association. Portland (OR), pp. 715–718.

Brückl, M. (2015), “Measurement of Tremor in the Voices of Speakers with Parkinson’s Disease”. In: Proceedings of the 1st International Conference on Natural Language and Speech Processing. Algiers, pp. 44–48.

Brückl, M. (2017), tremor.praat (Version 3.01). [Computer program, Version 3.01]. Technische Universität Berlin. Berlin, Germany. URL: http://brYkl.de/tremor3.01.zip (visited on 07/28/2018).

Brückl, M. (2018), irrNA: Coefficients of Interrater Reliability – Generalized for Randomly Incomplete Datasets. [R package, Version 0.1.4]. Technische Universität Berlin. Berlin, Germany. URL: https://cran.r-project.org/web/packages/irrNA/index.html (visited on 07/28/2018).

Brückl, M. et al. (2017), “Acoustic Tremor Measurement: Comparing Two Systems”. In: Proceedings of the 10th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA). Florence, pp. 19–22.

Buder, E. H. and Strand, E. A. (2003), “Quantitative and Graphic Acoustic Analysis of Phonatory Modulations: The Modulogram”. In: Journal of Speech, Language, and Hearing Research 46, pp. 475–490.

Cicchetti, D. V. (1994), “Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology”. In: Psychological Assessment 6 (4), pp. 284–290.

Cnockaert, L. et al. (2005), “Fundamental frequency estimation and vocal tremor analysis by means of Morlet wavelet transforms”. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’05). Philadelphia, PA, USA, pp. 393–396.

Cnockaert, L. et al. (2007), “Effect of Intensive Voice Therapy on Vocal Tremor for Parkinson Speakers”. In: Proceedings of INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association. Antwerpen, pp. 1174–1177.

Cnockaert, L. et al. (2008), “Low-frequency vocal modulations in vowels produced by Parkinsonian subjects”. In: Speech Communication 50, pp. 288–300.

DGN (2012), Deutsche Gesellschaft für Neurologie, S1-Leitlinie: Tremor, pp. 1–18. URL: https://www.dgn.org/images/red_leitlinien/LL_2012/pdf/030-011l_S1_Tremor_2012-verlaengert.pdf (visited on 07/28/2018).

Deliyski, D. D. (1993), “Acoustic model and evaluation of pathological voice production”. In: Proceedings of the 3rd Conference on Speech Communication and Technology: Eurospeech 1993. Berlin, Germany, pp. 1969–1972.

Deuschl, G. et al. (1998), “Consensus Statement of the Movement Disorder Society on Tremor”. In: Movement Disorders 13 (3), pp. 2–23.

Deuschl, G. et al. (2001), “The pathophysiology of tremor”. In: Muscle & nerve 24 (6), pp. 716–735.

Dromey, C. et al. (2002), “The Influence of Pitch and Loudness Changes on the Acoustics of Vocal Tremor”. In: Journal of Speech, Language, and Hearing Research 45, pp. 879–890.

Fritz, T. et al. (2009), “Universal Recognition of Three Basic Emotions in Music”. In: Current Biology 19 (7), pp. 573–576.

Gillivan-Murphy, P. (2013), “Voice tremor in Parkinson’s disease (PD). Identification, characterisation and relationship with speech, voice, and disease variables”. Dissertation. Institute of Health & Society, University of Newcastle, pp. 140, 151–153, 201. URL: https://theses.ncl.ac.uk/dspace/bitstream/10443/2170/1/Gillivan-Murphy%2013.pdf (visited on 07/28/2018).

Gillivan-Murphy, P. et al. (2018), “Voice Tremor in Parkinson’s Disease: An Acoustic Study”. In: Journal of Voice. ePub ahead of Print.

Harnsberger, J. D. et al. (2010), “Noise and tremor in the perception of vocal aging in males”. In: Journal of Voice 24 (5), pp. 523–530.

Helmholtz, H. (1877), On the sensations of tone as a physiological basis for the theory of music. New York, NY: Dover Publications, pp. 182, 183, 194.

Hemmerich, A. L. et al. (2017), “The Distribution and Severity of Tremor in Speech Structures of Persons with Vocal Tremor”. In: Journal of Voice 31 (3), pp. 366–377.

Horii, Y. (1989), “Acoustic Analysis of Vocal Vibrato: A Theoretical Interpretation of Data”. In: Journal of Voice 3 (1), pp. 36–43.

Horwitz, R. (2014), “Vocal Modulation Features in the Prediction of Major Depressive Disorder Severity”. Master thesis. Massachusetts Institute of Technology, pp. 11–15. URL: http://hdl.handle.net/1721.1/93072 (visited on 07/28/2018).

Jiang, J. et al. (2000), “Acoustic and Airflow Spectral Analysis of Voice Tremor”. In: Journal of Speech, Language and Hearing Research 43, pp. 191–204.

Jouvet, D. and Laprie, Y. (2017), “Performance Analysis of Several Pitch Detection Algorithms on Simulated and Real Noisy Speech Data”. In: Proceedings of EUSIPCO’2017, 25th European Signal Processing Conference. Kos, Greece, pp. 1614–1618.

Kaestner, G. (1909), “Untersuchungen über den Gefühlseindruck unanalysierter Zweiklänge”. In: Psychologische Studien, Band 4. Ed. by W. Wundt. Leipzig: Engelmann, pp. 473–504.

Kempster, G. B. et al. (2009), “Consensus Auditory-Perceptual Evaluation of Voice: Development of a Standardized Clinical Protocol”. In: American Journal of Speech-Language Pathology 18, pp. 124–132.

Kienast, M. (2002), Phonetische Veränderungen in emotionaler Sprechweise. Berlin: Shaker Verlag, pp. 28–33.

Koda, J. and Ludlow, C. L. (1992), “An evaluation of laryngeal muscle activation in patients with voice tremor”. In: Otolaryngology – Head and Neck Surgery 107 (5), pp. 684–696.

Kreiman, J. and Gerratt, B. R. (2010), “Perceptual Assessment of Voice Quality: Past, Present and Future”. In: Perspectives on Voice and Voice Disorders 20 (2), pp. 62–67.

Kreiman, J. et al. (2003), “Perception of Vocal Tremor”. In: Journal of Speech, Language, and Hearing Research 46, pp. 203–214.

Lester-Smith, R. A. and Story, B. H. (2015), “The effects of physiological adjustments on the perceptual and acoustical characteristics of simulated laryngeal vocal tremor”. In: The Journal of the Acoustical Society of America 138 (2), pp. 953–963.

Lester-Smith, R. A. and Story, B. H. (2016), “The effects of physiological adjustments on the perceptual and acoustical characteristics of vibrato as a model of vocal tremor”. In: The Journal of the Acoustical Society of America 140 (5), pp. 3827–3833.

Linville, S. E. (2000), “The Aging Voice”. In: Voice Quality Measurement. Ed. by R. Kent and M. Ball. San Diego, CA: Singular, pp. 359–376.

Lots, I. S. and Stone, L. (2008), “Perception of musical consonance and dissonance: an outcome of neural synchronization”. In: Journal of the Royal Society Interface 5 (29), pp. 1429–1434.

Ludlow, C. L. et al. (1986), “Phonatory characteristics of vocal fold tremor”. In: Journal of Phonetics 14, pp. 509–515.

Maher, T. F. (1980), “A rigorous test of the proposition that musical intervals have different psychological effects”. In: American Journal of Psychology 95 (2), pp. 309–327.

Malmberg, C. F. (1914), “The Perception of Consonance and Dissonance”. Dissertation. Department of Philosophy and Psychology, University of Iowa, pp. 126–127.

Matlab (R2017a), Version 9.2.0.538062. Natick, Massachusetts, USA: The MathWorks Inc.

McDermott, J. H. et al. (2010), “Individual Differences Reveal the Basis of Consonance”. In: Current Biology 20 (11), pp. 1035–1041.

McDermott, J. H. et al. (2016), “Indifference to dissonance in native Amazonians reveals cultural variation in music perception”. In: Nature 535, pp. 547–550.

Mertens, C. et al. (2015), “Vocal tremor analysis via AM-FM decomposition of empirical modes of the glottal cycle length time series”. In: Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015). Dresden, pp. 766–770.

Metz, S. et al. (1981), “A psychophysical study of the perception of consonance and dissonance”. In: Bulletin of the Psychonomic Society 17 (2), pp. 89–92.

Moulines, E. and Charpentier, F. (1990), “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones”. In: Speech Communication 9, pp. 453–467.

Paulhus, D. L. (1991), “Measurement and control of response bias”. In: Volume 1, Measures of personality and social psychological attitudes. Ed. by J. P. Robinson et al. Measures of social psychological attitudes. San Diego, CA, US: Academic Press, pp. 17–59.

Plack, C. J. (2010), “Musical Consonance: The Importance of Harmonicity”. In: Current Biology 20 (11), pp. R476–R478.

Plomp, R. and Levelt, W. J. M. (1965), “Tonal Consonance and Critical Bandwidth”. In: The Journal of the Acoustical Society of America 38, pp. 548–560.

R Core Team (2017), R: A Language and Environment for Statistical Computing, Version 3.4.3. R Foundation for Statistical Computing. Vienna, Austria. URL: https://www.R-project.org.

Ramig, L. A. and Shipp, T. (1987), “Comparative Measures of Vocal Tremor and Vocal Vibrato”. In: Journal of Voice 1 (2), pp. 162–167.

Rasch, R. and Plomp, R. (1999), “The Perception of Musical Tones”. In: The Psychology of Music, 2nd edition. Ed. by D. Deutsch. San Diego: Academic Press, pp. 89–109.

Raven, H. (2005), “Gemeinsame Kommunikationsstrategien von Sprache und Musik – Musikalische Intervalle im Grundfrequenzverlauf emotionaler Äußerungen”. In: Stimmlicher Ausdruck in der Alltagskommunikation. Ed. by W. F. Sendlmeier and A. Bartels. Mündliche Kommunikation, Band 4. Berlin: Logos, pp. 109–133.

Rosenberg, A. E. (1971), “Effect of Glottal Pulse Shape on the Quality of Natural Vowels”. In: The Journal of the Acoustical Society of America 49, pp. 583–590.

Rothman, H. B. and Arroyo, A. A. (1987), “Acoustic Variability in Vibrato and Its Perceptual Significance”. In: Journal of Voice 1 (2), pp. 123–141.

SPSS (2017), IBM SPSS Statistics, Version 25.0. Armonk, NY: IBM Corp.

Schoentgen, J. (2002), “Modulation frequency and modulation level owing to vocal microtremor”. In: The Journal of the Acoustical Society of America 112 (2), pp. 690–700.

Schultz-Coulon, H. J. (1978), “The Neuromuscular Phonatory Control System and Vocal Function”. In: Acta Oto-Laryngologica 86 (1–6), pp. 142–153.

Shao, J. et al. (2010), “Acoustic analysis of the tremulous voice: Assessing the utility of the correlation dimension and perturbation parameters”. In: Journal of Communication Disorders 43, pp. 35–44.

Stevens, K. N. and Hanson, H. M. (1995), “Classification of Glottal Vibration from Acoustic Measurements”. In: Vocal Fold Physiology: Voice Quality Control. Ed. by O. Fujimura and M. Hirano. San Diego, pp. 147–170.

Sundberg, J. (1994), “Acoustic and psychoacoustic aspects of vocal vibrato”. In: Speech Transmission Laboratory. Quarterly Progress and Status Reports (STL-QPSR) 35 (2–3), pp. 45–68.

Tanaka, Y. et al. (2011), “Vocal Acoustic Characteristics of Patients with Parkinson’s Disease”. In: Folia Phoniatrica et Logopaedica 63, pp. 223–230.

Terhardt, E. (1974), “Pitch, consonance, and harmony”. In: The Journal of the Acoustical Society of America 55, pp. 1061–1069.

Titze, I. R. (1989), “On the relation between subglottal pressure and fundamental frequency in phonation”. In: The Journal of the Acoustical Society of America 85 (2), pp. 901–906.

Titze, I. R. (1994), Principles of voice production. Englewood Cliffs, NJ: Prentice Hall, p. 210.

Vary, P. et al. (1998), Digitale Sprachsignalverarbeitung. Stuttgart: B. G. Teubner, p. 149.

Weenink, D. (2013), “STEVIN Can Praat”. In: Essential Speech and Language Technology for Dutch: Results by the STEVIN programme. Ed. by P. Spyns and J. Odijk. Theory and Applications of Natural Language Processing. San Diego: Springer, pp. 79–94.

Winholtz, W. S. and Ramig, L. O. (1992), “Vocal Tremor Analysis With the Vocal Demodulator”. In: Journal of Speech and Hearing Research 35, pp. 562–573.

Yair, E. and Gath, I. (1988), “On the Use of Pitch Power Spectrum in the Evaluation of Vocal Tremor”. In: Proceedings of the IEEE 76 (9), pp. 1166–1175.

Zwicker, E. (1961), “Subdivision of the Audible Frequency Range into Critical Bands”. In: The Journal of the Acoustical Society of America 33, p. 248.

Zwicker, E. and Fastl, H. (2007), Psychoacoustics – Facts and Models, 3rd edition. Springer, pp. 203–205, 247–253.

Part III.

Appendix

A. Preliminary experiment

MFC experiment

Intraclass correlation coefficient

          ICC         p-value       lower CI limit  upper CI limit
ICC(1)    0.09434231  7.640000E-12  0.05881610      0.1416893
ICC(k)    0.55556332  7.640000E-12  0.42853855      0.6645367
ICC(A,1)  0.09520394  3.484657E-12  0.05974442      0.1424657
ICC(A,k)  0.55804176  3.484657E-12  0.43252254      0.6659989
ICC(C,1)  0.09630340  3.484657E-12  0.06045033      0.1440364
ICC(C,k)  0.56117119  3.484657E-12  0.43569020      0.6687960
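For orientation, the one-way coefficients ICC(1) and ICC(k) can be sketched from the mean squares of a one-way ANOVA as shown below. This is a minimal illustration for a complete stimuli-by-raters matrix only; the coefficients reported here were computed with irrNA, which is generalized for randomly incomplete datasets. The function name and the toy ratings are assumptions and are not data from the experiments.

```python
import numpy as np


def one_way_icc(ratings):
    """Return (ICC(1), ICC(k)) for a complete stimuli-by-raters matrix.

    Hypothetical helper: assumes no missing ratings.
    """
    ratings = np.asarray(ratings, dtype=float)
    n_stimuli, k_raters = ratings.shape
    row_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    # Mean square between stimuli and mean square within stimuli.
    ms_between = k_raters * np.sum((row_means - grand_mean) ** 2) / (n_stimuli - 1)
    ms_within = np.sum((ratings - row_means[:, None]) ** 2) / (n_stimuli * (k_raters - 1))
    # Reliability of a single rating and of the mean over k raters.
    icc_single = (ms_between - ms_within) / (ms_between + (k_raters - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average


# Invented toy data: 3 stimuli rated by 2 raters.
toy_ratings = [[1, 2], [3, 4], [5, 6]]
icc_1, icc_k = one_way_icc(toy_ratings)
```

Because ICC(k) describes the reliability of the mean over all raters, it is generally higher than ICC(1), which is the pattern seen in the table above.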

Linear regression: H1

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_1 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .212 (a)  .045      .037               .2021029275

a. Predictors: (Constant), Hyp_1

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  .227            1    .227         5.567    .020 (b)
   Residual    4.820           118  .041
   Total       5.047           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_1

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.658    .059                                           27.873   .000
   Hyp_1       -.088    .037                -.212                      -2.359   .020

a. Dependent Variable: mean
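The entries of the regression tables in this appendix are linked by standard identities, which can be used to cross-check them. The sketch below recomputes R Square, Adjusted R Square and F from the ANOVA sums of squares of the H1 table above; the function name is an assumption, and small deviations from the printed values are to be expected because the printed sums of squares are rounded to three decimals.

```python
def simple_regression_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Recompute summary statistics from an ANOVA table (hypothetical helper)."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    r_square = ss_regression / ss_total
    # Adjusted R Square penalizes for the number of predictors.
    adjusted_r_square = 1.0 - (1.0 - r_square) * df_total / df_residual
    # F is the ratio of the regression and residual mean squares.
    f_value = (ss_regression / df_regression) / (ss_residual / df_residual)
    return {
        "r_square": r_square,
        "adjusted_r_square": adjusted_r_square,
        "f": f_value,
    }


# Values taken from the ANOVA table for H1 of the preliminary experiment.
h1_check = simple_regression_summary(0.227, 4.820, 1, 118)
```

With a single predictor, the squared t value of the predictor (here 2.359 squared) should likewise agree with F up to rounding.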

Linear regression: H2

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_2 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .687 (a)  .472      .468               .1502084500

a. Predictors: (Constant), Hyp_2

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  2.385           1    2.385        105.696  .000 (b)
   Residual    2.662           118  .023
   Total       5.047           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_2

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.101    .043                                           25.400   .000
   Hyp_2       .282     .027                .687                       10.281   .000

a. Dependent Variable: mean

Linear regression: H3

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_3 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .758 (a)  .575      .571               .1348359521

a. Predictors: (Constant), Hyp_3

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  2.902           1    2.902        159.611  .000 (b)
   Residual    2.145           118  .018
   Total       5.047           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_3

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.522    .012                                           123.645  .000
   Hyp_3       -.022    .002                -.758                      -12.634  .000

a. Dependent Variable: mean

Multiple linear regression: H1 - H3

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_3              .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .095).
2      Hyp_2              .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .095).

a. Dependent Variable: mean

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .758 (a)  .575      .571               .1348359521
2      .769 (b)  .591      .584               .1328630781

a. Predictors: (Constant), Hyp_3
b. Predictors: (Constant), Hyp_3, Hyp_2

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  2.902           1    2.902        159.611  .000 (b)
   Residual    2.145           118  .018
   Total       5.047           119
2  Regression  2.982           2    1.491        84.458   .000 (c)
   Residual    2.065           117  .018
   Total       5.047           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_3
c. Predictors: (Constant), Hyp_3, Hyp_2

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.522    .012                                           123.645  .000
   Hyp_3       -.022    .002                -.758                      -12.634  .000
2  (Constant)  1.391    .063                                           22.134   .000
   Hyp_3       -.017    .003                -.585                      -5.816   .000
   Hyp_2       .088     .041                .214                       2.128    .035

a. Dependent Variable: mean

Excluded Variables (a)

Model     Beta In   t      Sig.  Partial Correlation  Collinearity Statistics: Tolerance
1  Hyp_1  .077 (b)  1.196  .234  .110                 .865
   Hyp_2  .214 (b)  2.128  .035  .193                 .346
2  Hyp_1  .081 (c)  1.273  .205  .117                 .864

a. Dependent Variable: mean
b. Predictors in the Model: (Constant), Hyp_3
c. Predictors in the Model: (Constant), Hyp_3, Hyp_2
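The stepwise output above can be cross-checked in the same spirit. The sketch below recomputes the F statistic for adding Hyp_2 to the model containing Hyp_3 from the two residual sums of squares, and converts the reported tolerance into a variance inflation factor (the reciprocal of the tolerance). Both function names are assumptions; the input values are taken from the tables above, and agreement is only approximate because those values are rounded.

```python
def f_change(ss_residual_reduced, ss_residual_full, df_residual_full, added_predictors=1):
    """F statistic for the predictors added to a nested model (hypothetical helper)."""
    explained_by_added = ss_residual_reduced - ss_residual_full
    return (explained_by_added / added_predictors) / (ss_residual_full / df_residual_full)


def variance_inflation_factor(tolerance):
    """Variance inflation factor as the reciprocal of the tolerance."""
    return 1.0 / tolerance


# Residual sums of squares of model 1 (Hyp_3) and model 2 (Hyp_3, Hyp_2).
f_for_hyp_2 = f_change(2.145, 2.065, 117)
# Tolerance of Hyp_2 given Hyp_3, from the Excluded Variables table.
vif_hyp_2 = variance_inflation_factor(0.346)
```

The recomputed F should be close to the squared t value of Hyp_2 in model 2 (2.128 squared), and the low tolerance indicates that Hyp_2 and Hyp_3 share a large part of their variance.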

B. Experiment A

Intraclass correlation coefficient

          ICC        p-value  lower CI limit  upper CI limit
ICC(1)    0.3466489  0        0.2906645       0.4144247
ICC(k)    0.9474773  0        0.9330306       0.9600998
ICC(A,1)  0.3469185  0        0.2909378       0.4146803
ICC(A,k)  0.9475365  0        0.9331113       0.9601411
ICC(C,1)  0.3518549  0        0.2954749       0.4199287
ICC(C,k)  0.9486057  0        0.9344669       0.9609581

Linear regression: H1

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_1 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .353 (a)  .125      .117               .2845472201

a. Predictors: (Constant), Hyp_1

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  1.362           1    1.362        16.823   .000 (b)
   Residual    9.554           118  .081
   Total       10.916          119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_1

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.177    .084                                           13.942   .000
   Hyp_1       .218     .053                .353                       4.102    .000

a. Dependent Variable: mean

Linear regression: H2

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_2 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .452 (a)  .205      .198               .2712687893

a. Predictors: (Constant), Hyp_2

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  2.233           1    2.233        30.345   .000 (b)
   Residual    8.683           118  .074
   Total       10.916          119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_2

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.033    .089                                           11.573   .000
   Hyp_2       .315     .057                .452                       5.509    .000

a. Dependent Variable: mean

Linear regression: H3

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_3 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .419 (a)  .176      .169               .2761491638

a. Predictors: (Constant), Hyp_3

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  1.918           1    1.918        25.148   .000 (b)
   Residual    8.998           118  .076
   Total       10.916          119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_3

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.065    .091                                           11.660   .000
   Hyp_3       .292     .058                .419                       5.015    .000

a. Dependent Variable: mean

Linear regression: H4

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_4 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .473 (a)  .224      .217               .2680158154

a. Predictors: (Constant), Hyp_4

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  2.440           1    2.440        33.968   .000 (b)
   Residual    8.476           118  .072
   Total       10.916          119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_4

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.505    .024                                           61.528   .000
   Hyp_4       -.081    .014                -.473                      -5.828   .000

a. Dependent Variable: mean

Linear regression: H5

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_5 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .358 (a)  .128      .121               .2840008342

a. Predictors: (Constant), Hyp_5

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  1.399           1    1.399        17.34    .000 (b)
   Residual    9.517           118  .081
   Total       10.916          119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_5

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.505    .026                                           58.04    .000
   Hyp_5       -.068    .016                -.358                      -4.164   .000

a. Dependent Variable: mean

Linear regression: H6

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_6 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .198 (a)  .039      .031               .2981567952

a. Predictors: (Constant), Hyp_6

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  .426            1    .426         4.796    .030 (b)
   Residual    10.490          118  .089
   Total       10.916          119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_6

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.321    .089                                           14.870   .000
   Hyp_6       .121     .055                .198                       2.190    .030

a. Dependent Variable: mean

Linear regression: H7

Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      Hyp_7 (b)          .                  Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .228 (a)  .052      .044               .2961127726

a. Predictors: (Constant), Hyp_7

ANOVA (a)

Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  .570            1    .570         6.497    .012 (b)
   Residual    10.347          118  .088
   Total       10.916          119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_7

Coefficients (a)

               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t        Sig.
1  (Constant)  1.506    .027                                           55.729   .000
   Hyp_7       -.060    .023                -.228                      -2.549   .012

a. Dependent Variable: mean

Linear regression: H8

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_8 (b)           .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,714 (a)   ,509       ,505                ,2130336267

a. Predictors: (Constant), Hyp_8

ANOVA (a)

Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   5,561            1     5,561         122,535   ,000 (b)
  Residual     5,355            118   ,045
  Total        10,916           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_8

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   ,498     ,093                                             5,348    ,000
  Hyp_8        ,674     ,061                 ,714                        11,070   ,000

a. Dependent Variable: mean


Linear regression: H9

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_9 (b)           .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,894 (a)   ,800       ,798                ,1360706636

a. Predictors: (Constant), Hyp_9

ANOVA (a)

Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   8,731            1     8,731         471,583   ,000 (b)
  Residual     2,185            118   ,019
  Total        10,916           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_9

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   ,641     ,042                                             15,377   ,000
  Hyp_9        ,580     ,027                 ,894                        21,716   ,000

a. Dependent Variable: mean
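In a regression with a single predictor, the F statistic of the ANOVA table equals the squared t value of the slope, and t is the ratio of B to its standard error. The sketch below illustrates both relations with the H9 values, with decimal commas written as points. The helper names `f_from_t` and `t_from_coefficient` are assumptions chosen for illustration. Because B and its standard error are printed with three decimals only, the recomputed t is approximate.

```python
def f_from_t(t_value):
    """F statistic of a one-predictor regression, given the slope's t value."""
    return t_value ** 2


def t_from_coefficient(b, std_error):
    """t value of a coefficient: estimate divided by its standard error."""
    return b / std_error


# H9: t = 21.716 for Hyp_9, B = 0.580, Std. Error = 0.027.
f_h9 = f_from_t(21.716)
t_h9 = t_from_coefficient(0.580, 0.027)
print(round(f_h9, 3), round(t_h9, 2))
```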


Linear regression: H10

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_10 (b)          .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,881 (a)   ,777       ,775                ,1437579562

a. Predictors: (Constant), Hyp_10

ANOVA (a)

Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   8,478            1     8,478         410,214   ,000 (b)
  Residual     2,439            118   ,021
  Total        10,916           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_10

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   ,654     ,044                                             14,842   ,000
  Hyp_10       ,571     ,028                 ,881                        20,254   ,000

a. Dependent Variable: mean


Linear regression: H11

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_11 (b)          .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,783 (a)   ,613       ,610                ,1892304292

a. Predictors: (Constant), Hyp_11

ANOVA (a)

Model          Sum of Squares   df    Mean Square   F        Sig.
1 Regression   6,691            1     6,691         186,85   ,000 (b)
  Residual     4,225            118   ,036
  Total        10,916           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_11

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t         Sig.
1 (Constant)   1,512    ,017                                             87,492    ,000
  Hyp_11       -,065    ,005                 -,783                       -13,669   ,000

a. Dependent Variable: mean


Linear regression: H12

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_12 (b)          .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,793 (a)   ,629       ,626                ,1852489283

a. Predictors: (Constant), Hyp_12

ANOVA (a)

Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   6,867            1     6,867         200,099   ,000 (b)
  Residual     4,049            118   ,034
  Total        10,916           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_12

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t         Sig.
1 (Constant)   1,511    ,017                                             89,353    ,000
  Hyp_12       -,071    ,005                 -,793                       -14,146   ,000

a. Dependent Variable: mean


Linear regression: H13

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_13 (b)          .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,338 (a)   ,114       ,107                ,2862430037

a. Predictors: (Constant), Hyp_13

ANOVA (a)

Model          Sum of Squares   df    Mean Square   F        Sig.
1 Regression   1,248            1     1,248         15,231   ,000 (b)
  Residual     9,668            118   ,082
  Total        10,916           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_13

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   1,509    ,026                                             57,713   ,000
  Hyp_13       -,205    ,052                 -,338                       -3,903   ,000

a. Dependent Variable: mean


Linear regression: H14

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_14 (b)          .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,003 (a)   ,000       -,008               ,3041542152

a. Predictors: (Constant), Hyp_14

ANOVA (a)

Model          Sum of Squares   df    Mean Square   F      Sig.
1 Regression   ,000             1     ,000          ,001   ,973 (b)
  Residual     10,916           118   ,093
  Total        10,916           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_14

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   1,506    ,028                                             54,235   ,000
  Hyp_14       ,000     ,005                 -,003                       -,034    ,973

a. Dependent Variable: mean


Linear regression: H15

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_15 (b)          .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,055 (a)   ,003       -,005               ,3036971952

a. Predictors: (Constant), Hyp_15

ANOVA (a)

Model          Sum of Squares   df    Mean Square   F      Sig.
1 Regression   ,033             1     ,033          ,357   ,552 (b)
  Residual     10,883           118   ,092
  Total        10,916           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_15

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   1,456    ,089                                             16,432   ,000
  Hyp_15       ,034     ,056                 ,055                        ,597     ,552

a. Dependent Variable: mean


ANOVA (a)

Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   8,731            1     8,731         471,583   ,000 (b)
  Residual     2,185            118   ,019
  Total        10,916           119
2 Regression   9,327            2     4,664         343,429   ,000 (c)
  Residual     1,589            117   ,014
  Total        10,916           119
3 Regression   9,441            3     3,147         247,411   ,000 (d)
  Residual     1,475            116   ,013
  Total        10,916           119
4 Regression   9,816            4     2,454         256,375   ,000 (e)
  Residual     1,101            115   ,010
  Total        10,916           119

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_9
c. Predictors: (Constant), Hyp_9, Hyp_10
d. Predictors: (Constant), Hyp_9, Hyp_10, Hyp_12
e. Predictors: (Constant), Hyp_9, Hyp_10, Hyp_12, Hyp_4
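The gain from each step of the model sequence can be expressed as an F-change statistic computed from the residual sums of squares of two successive models in the ANOVA table above. The Python sketch below shows the calculation for the step from Model 1 to Model 2, with decimal commas written as points. The function name `f_change` is an illustrative assumption. When a single predictor is added, this statistic equals the squared t value of that predictor in the larger model, up to rounding of the tabulated values.

```python
def f_change(ss_res_reduced, ss_res_full, df_res_full, n_added=1):
    """F statistic for the predictors added between two nested models."""
    explained_by_step = ss_res_reduced - ss_res_full
    return (explained_by_step / n_added) / (ss_res_full / df_res_full)


# Model 1 -> Model 2: residual SS drops from 2.185 to 1.589, residual df = 117.
step_1_to_2 = f_change(2.185, 1.589, df_res_full=117)
print(round(step_1_to_2, 2))
```

Whether a step is retained depends on comparing the associated probability with the entry criterion of the stepwise procedure, which this sketch does not compute.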


Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   ,641     ,042                                             15,377   ,000
  Hyp_9        ,580     ,027                 ,894                        21,716   ,000
2 (Constant)   ,576     ,037                                             15,550   ,000
  Hyp_9        ,339     ,043                 ,523                        7,911    ,000
  Hyp_10       ,284     ,043                 ,438                        6,625    ,000
3 (Constant)   ,710     ,058                                             12,350   ,000
  Hyp_9        ,301     ,043                 ,464                        6,919    ,000
  Hyp_10       ,233     ,045                 ,360                        5,203    ,000
  Hyp_12       -,015    ,005                 -,167                       -2,986   ,003
4 (Constant)   1,005    ,069                                             14,652   ,000
  Hyp_9        ,171     ,043                 ,263                        3,960    ,000
  Hyp_10       ,167     ,040                 ,258                        4,146    ,000
  Hyp_12       -,035    ,005                 -,396                       -6,514   ,000
  Hyp_4        -,044    ,007                 -,259                       -6,257   ,000

a. Dependent Variable: mean

Excluded Variables (a)

                                                                  Collinearity Statistics
Model       Beta In     t        Sig.   Partial Correlation       Tolerance
1 Hyp_1     ,030 (b)    ,665     ,507   ,061                      ,866
  Hyp_2     ,011 (b)    ,237     ,813   ,022                      ,754
  Hyp_3     ,138 (b)    3,292    ,001   ,291                      ,890
  Hyp_4     -,094 (b)   -2,065   ,041   -,188                     ,802
  Hyp_5     -,145 (b)   -3,583   ,000   -,314                     ,939
  Hyp_6     -,108 (b)   -2,545   ,012   -,229                     ,892
  Hyp_7     ,082 (b)    1,901    ,060   ,173                      ,886
  Hyp_8     ,189 (b)    3,490    ,001   ,307                      ,529
  Hyp_10    ,438 (b)    6,625    ,000   ,522                      ,284
  Hyp_11    -,246 (b)   -4,143   ,000   -,358                     ,424
  Hyp_12    -,277 (b)   -4,844   ,000   -,409                     ,434
  Hyp_13    ,070 (b)    1,524    ,130   ,140                      ,806
  Hyp_14    ,124 (b)    3,100    ,002   ,276                      ,980
  Hyp_15    ,023 (b)    ,547     ,585   ,051                      ,999
2 Hyp_1     ,006 (c)    ,148     ,883   ,014                      ,858
  Hyp_2     ,065 (c)    1,592    ,114   ,146                      ,726
  Hyp_3     ,039 (c)    ,933     ,353   ,086                      ,725
  Hyp_4     -,097 (c)   -2,511   ,013   -,227                     ,802
  Hyp_5     -,089 (c)   -2,397   ,018   -,217                     ,875
  Hyp_6     ,013 (c)    ,305     ,761   ,028                      ,687
  Hyp_7     -,015 (c)   -,378    ,706   -,035                     ,756
  Hyp_8     ,109 (c)    2,208    ,029   ,201                      ,490
  Hyp_11    -,155 (c)   -2,825   ,006   -,254                     ,389
  Hyp_12    -,167 (c)   -2,986   ,003   -,267                     ,372
  Hyp_13    -,027 (c)   -,648    ,518   -,060                     ,702
  Hyp_14    ,026 (c)    ,665     ,507   ,062                      ,793
  Hyp_15    -,004 (c)   -,108    ,914   -,010                     ,986
3 Hyp_1     ,138 (d)    2,905    ,004   ,261                      ,486
  Hyp_2     ,217 (d)    4,685    ,000   ,400                      ,461
  Hyp_3     ,181 (d)    3,755    ,000   ,330                      ,450
  Hyp_4     -,259 (d)   -6,257   ,000   -,504                     ,512
  Hyp_5     -,187 (d)   -4,882   ,000   -,414                     ,665
  Hyp_6     ,032 (d)    ,764     ,446   ,071                      ,672
  Hyp_7     -,043 (d)   -1,077   ,284   -,100                     ,719
  Hyp_8     -,447 (d)   -2,467   ,015   -,224                     ,034
  Hyp_11    ,211 (d)    ,644     ,521   ,060                      ,011
  Hyp_13    ,029 (d)    ,644     ,521   ,060                      ,578
  Hyp_14    ,016 (d)    ,425     ,671   ,040                      ,787
  Hyp_15    ,004 (d)    ,116     ,908   ,011                      ,980
4 Hyp_1     -,046 (e)   -,851    ,396   -,079                     ,296
  Hyp_2     -,040 (e)   -,502    ,617   -,047                     ,137
  Hyp_3     -,031 (e)   -,499    ,619   -,047                     ,222
  Hyp_5     -,052 (e)   -1,020   ,310   -,095                     ,333
  Hyp_6     -,040 (e)   -1,048   ,297   -,098                     ,611
  Hyp_7     ,038 (e)    1,020    ,310   ,095                      ,630
  Hyp_8     ,115 (e)    ,608     ,545   ,057                      ,025
  Hyp_11    ,239 (e)    ,840     ,402   ,078                      ,011
  Hyp_13    ,033 (e)    ,840     ,402   ,078                      ,577
  Hyp_14    ,033 (e)    ,987     ,326   ,092                      ,782
  Hyp_15    -,008 (e)   -,269    ,789   -,025                     ,976

a. Dependent Variable: mean
b. Predictors in the Model: (Constant), Hyp_9
c. Predictors in the Model: (Constant), Hyp_9, Hyp_10
d. Predictors in the Model: (Constant), Hyp_9, Hyp_10, Hyp_12
e. Predictors in the Model: (Constant), Hyp_9, Hyp_10, Hyp_12, Hyp_4
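Two columns of the Excluded Variables table can be related to quantities reported elsewhere. The partial correlation follows from the t value and the residual degrees of freedom the model would have if the variable were entered, and the tolerance is the reciprocal of the variance inflation factor (VIF). The Python sketch below illustrates both, with decimal commas written as points. The function names are illustrative assumptions, and the VIF does not appear in the SPSS output above.

```python
import math


def partial_correlation(t_value, df_residual):
    """Partial correlation of a candidate predictor from its t value."""
    return t_value / math.sqrt(t_value ** 2 + df_residual)


def variance_inflation_factor(tolerance):
    """VIF is the reciprocal of the tolerance."""
    return 1.0 / tolerance


# Hyp_10 as a candidate for Model 1: t = 6.625; entering it leaves 117 residual df.
r_partial = partial_correlation(6.625, df_residual=117)
# A tolerance of 0.011, as listed for Hyp_11 in Model 3.
vif_example = variance_inflation_factor(0.011)
print(round(r_partial, 3), round(vif_example, 1))
```

Tolerances close to zero, such as those of Hyp_8 and Hyp_11 in Models 3 and 4, correspond to large VIF values and indicate that the candidate is nearly a linear combination of the predictors already in the model.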


C. Experiment B

Intraclass correlation coefficient (total)

           ICC         p-value   lower CI limit   upper CI limit
ICC(1)     0.2465520   0         0.1905147        0.3221759
ICC(k)     0.9175317   0         0.8889136        0.9417267
ICC(A,1)   0.2467369   0         0.1907221        0.3223313
ICC(A,k)   0.9176070   0         0.8890445        0.9417664
ICC(C,1)   0.2488126   0         0.1924408        0.3247747
ICC(C,k)   0.9184451   0         0.8901362        0.9423750

Intraclass correlation coefficient (noise-noise)

           ICC         p-value   lower CI limit   upper CI limit
ICC(1)     0.3477224   0         0.2610320        0.4659517
ICC(k)     0.9477125   0         0.9231367        0.9673892
ICC(A,1)   0.3479509   0         0.2613075        0.4661190
ICC(A,k)   0.9477624   0         0.9232362        0.9674108
ICC(C,1)   0.3521454   0         0.2648643        0.4707234
ICC(C,k)   0.9486677   0         0.9245281        0.9679884


Intraclass correlation coefficient (noise-no-noise)

           ICC         p-value        lower CI limit   upper CI limit
ICC(1)     0.1037487   1.110223E-15   0.05820481       0.1898769
ICC(k)     0.7973981   1.110223E-15   0.67755117       0.8885040
ICC(A,1)   0.1038514   8.881784E-16   0.05831796       0.1899612
ICC(A,k)   0.7975764   8.881784E-16   0.67800120       0.8885583
ICC(C,1)   0.1042576   8.881784E-16   0.05853178       0.1906649
ICC(C,k)   0.7982790   8.881784E-16   0.67884955       0.8890096
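The single-measure and average-measure coefficients in the ICC tables above are linked by the Spearman-Brown formula, ICC(k) = k * ICC(1) / (1 + (k - 1) * ICC(1)). The Python sketch below applies this relation and inverts it to recover k. The function names are illustrative assumptions, and the value k = 34 is inferred here from the tabulated coefficients rather than stated in this section.

```python
def average_measures_icc(single_icc, k):
    """Spearman-Brown step-up from a single-measure ICC to the mean of k measures."""
    return k * single_icc / (1 + (k - 1) * single_icc)


def implied_k(single_icc, average_icc):
    """Number of measures k implied by a pair of single and average ICC values."""
    return average_icc * (1 - single_icc) / (single_icc * (1 - average_icc))


# ICC(1) and ICC(k) from the "total" table.
k_total = implied_k(0.2465520, 0.9175317)
# Step-up check against the "noise-no-noise" table, assuming k = 34.
icc_k_check = average_measures_icc(0.1037487, 34)
print(round(k_total, 2), round(icc_k_check, 4))
```

The large gap between the single and average coefficients therefore reflects the number of measures being averaged, not a discrepancy between the two estimates.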


Linear regression: H2

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_2 (b)           .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,810 (a)   ,657       ,652                ,1530616642

a. Predictors: (Constant), Hyp_2

ANOVA (a)

Model          Sum of Squares   df   Mean Square   F         Sig.
1 Regression   3,403            1    3,403         145,266   ,000 (b)
  Residual     1,781            76   ,023
  Total        5,184            77

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_2

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t         Sig.
1 (Constant)   1,518    ,017                                             87,595    ,000
  Hyp_2        -,004    ,000                 -,810                       -12,053   ,000

a. Dependent Variable: mean


Linear regression: H3

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_3 (b)           .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,190 (a)   ,036       ,024                ,2563954958

a. Predictors: (Constant), Hyp_3

ANOVA (a)

Model          Sum of Squares   df   Mean Square   F       Sig.
1 Regression   ,188             1    ,188          2,855   ,095 (b)
  Residual     4,996            76   ,066
  Total        5,184            77

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_3

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t       Sig.
1 (Constant)   1,248    ,162                                             7,688   ,000
  Hyp_3        ,181     ,107                 ,190                        1,690   ,095

a. Dependent Variable: mean


Linear regression: H5

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_5 (b)           .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,882 (a)   ,778       ,773                ,1448568048

a. Predictors: (Constant), Hyp_5

ANOVA (a)

Model          Sum of Squares   df   Mean Square   F         Sig.
1 Regression   3,167            1    3,167         150,920   ,000 (b)
  Residual     ,902             43   ,021
  Total        4,069            44

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_5

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   1,497    ,022                                             69,329   ,000
  Hyp_5        10,075   ,820                 ,882                        12,285   ,000

a. Dependent Variable: mean


Linear regression: H6

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_6 (b)           .                   Enter

a. Dependent Variable: mean
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,935 (a)   ,874       ,871                ,1093005312

a. Predictors: (Constant), Hyp_6

ANOVA (a)

Model          Sum of Squares   df   Mean Square   F         Sig.
1 Regression   3,555            1    3,555         297,609   ,000 (b)
  Residual     ,514             43   ,012
  Total        4,069            44

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_6

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   ,661     ,051                                             12,921   ,000
  Hyp_6        ,562     ,033                 ,935                        17,251   ,000

a. Dependent Variable: mean


Multiple regression analysis: H1 - H4

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Hyp_2               .                   Stepwise (Criteria: Probability-of-F-to-enter <= ,050, Probability-of-F-to-remove >= ,095).
2       Hyp_1               .                   Stepwise (Criteria: Probability-of-F-to-enter <= ,050, Probability-of-F-to-remove >= ,095).
3       Hyp_3               .                   Stepwise (Criteria: Probability-of-F-to-enter <= ,050, Probability-of-F-to-remove >= ,095).
4       Hyp_4               .                   Stepwise (Criteria: Probability-of-F-to-enter <= ,050, Probability-of-F-to-remove >= ,095).

a. Dependent Variable: mean

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,810 (a)   ,657       ,652                ,1530616642
2       ,838 (b)   ,703       ,695                ,1433433223
3       ,860 (c)   ,739       ,728                ,1352741195
4       ,888 (d)   ,788       ,776                ,1227442996

a. Predictors: (Constant), Hyp_2
b. Predictors: (Constant), Hyp_2, Hyp_1
c. Predictors: (Constant), Hyp_2, Hyp_1, Hyp_3
d. Predictors: (Constant), Hyp_2, Hyp_1, Hyp_3, Hyp_4


ANOVA (a)

Model          Sum of Squares   df   Mean Square   F         Sig.
1 Regression   3,403            1    3,403         145,266   ,000 (b)
  Residual     1,781            76   ,023
  Total        5,184            77
2 Regression   3,643            2    1,821         88,643    ,000 (c)
  Residual     1,541            75   ,021
  Total        5,184            77
3 Regression   3,830            3    1,277         69,760    ,000 (d)
  Residual     1,354            74   ,018
  Total        5,184            77
4 Regression   4,084            4    1,021         67,767    ,000 (e)
  Residual     1,100            73   ,015
  Total        5,184            77

a. Dependent Variable: mean
b. Predictors: (Constant), Hyp_2
c. Predictors: (Constant), Hyp_2, Hyp_1
d. Predictors: (Constant), Hyp_2, Hyp_1, Hyp_3
e. Predictors: (Constant), Hyp_2, Hyp_1, Hyp_3, Hyp_4

Coefficients (a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t         Sig.
1 (Constant)   1,518    ,017                                             87,595    ,000
  Hyp_2        -,004    ,000                 -,810                       -12,053   ,000
2 (Constant)   1,195    ,096                                             12,456    ,000
  Hyp_2        -,002    ,001                 -,468                       -3,959    ,000
  Hyp_1        ,213     ,062                 ,404                        3,414     ,001
3 (Constant)   ,926     ,124                                             7,495     ,000
  Hyp_2        -,002    ,001                 -,469                       -4,200    ,000
  Hyp_1        ,212     ,059                 ,403                        3,612     ,001
  Hyp_3        ,180     ,056                 ,190                        3,196     ,002
4 (Constant)   ,243     ,201                                             1,211     ,230
  Hyp_2        -,003    ,001                 -,563                       -5,422    ,000
  Hyp_1        ,232     ,054                 ,441                        4,337     ,000
  Hyp_3        ,618     ,118                 ,650                        5,229     ,000
  Hyp_4        ,391     ,095                 ,527                        4,108     ,000

a. Dependent Variable: mean

Excluded Variables (a)

                                                                Collinearity Statistics
Model      Beta In     t        Sig.   Partial Correlation      Tolerance
1 Hyp_1    ,404 (b)    3,414    ,001   ,367                     ,283
  Hyp_3    ,190 (b)    2,972    ,004   ,325                     1,000
  Hyp_4    -,086 (b)   -1,252   ,215   -,143                    ,942
2 Hyp_3    ,190 (c)    3,196    ,002   ,348                     1,000
  Hyp_4    -,077 (c)   -1,196   ,235   -,138                    ,941
3 Hyp_4    ,527 (d)    4,108    ,000   ,433                     ,177

a. Dependent Variable: mean
b. Predictors in the Model: (Constant), Hyp_2
c. Predictors in the Model: (Constant), Hyp_2, Hyp_1
d. Predictors in the Model: (Constant), Hyp_2, Hyp_1, Hyp_3
