Measuring, estimating, and understanding the psychometric function: A commentary

STANLEY A. KLEIN
University of California, Berkeley, California

The psychometric function, relating the subject's response to the physical stimulus, is fundamental to psychophysics. This paper examines various psychometric function topics, many inspired by this special symposium issue of Perception & Psychophysics: What are the relative merits of objective yes/no versus forced choice tasks (including threshold variance)? What are the relative merits of adaptive versus constant stimuli methods? What are the relative merits of likelihood versus up–down staircase adaptive methods? Is 2AFC free of substantial bias? Is there no efficient adaptive method for objective yes/no tasks? Should adaptive methods aim for 90% correct? Can adding more responses to forced choice and objective yes/no tasks reduce the threshold variance? What is the best way to deal with lapses? How is the Weibull function intimately related to the d′ function? What causes bias in the likelihood goodness-of-fit? What causes bias in slope estimates from adaptive methods? How good are nonparametric methods for estimating psychometric function parameters? Of what value is the psychometric function slope? How are various psychometric functions related to each other? The resolution of many of these issues is surprising.

I thank all the authors contributing to this special issue for their fine papers. I am especially grateful for the many wonderful suggestions and comments that I have had from David Foster, Lew Harvey, Christian Kaernbach, Marjorie Leek, Neil Macmillan, Jeff Miller, Hans Strasburger, Bernhard Treutwein, Christopher Tyler, and Felix Wichmann. This research was supported by Grant R01EY04776 from the National Institutes of Health. Correspondence should be addressed to S. A. Klein, School of Optometry, University of California, Berkeley, CA 94720-2020.


Perception & Psychophysics, 2001, 63 (8), 1421-1455

Psychophysics, as its metaphysical sounding name indicates, is the scientific discipline that explores the connection between physical stimuli and subjective responses. The psychometric function (PF) provides the fundamental data for psychophysics, with the PF abscissa being the stimulus strength and the ordinate measuring the observer's response. I shudder when I think about the many hours researchers (including myself) have wasted in using inefficient procedures to measure the PF, as well as when I see procedures being used that do not reveal all that could be extracted with the same expenditure of time; and, of course, when I see erroneous conclusions being drawn because of biased methodologies. The articles in this special symposium issue of Perception & Psychophysics deal with these and many more issues concerning the PF. I was a reviewer on all but one of these articles and have watched them mature. I have now been given the opportunity to comment on them once more. This time I need not quibble with minor items but rather can comment on several deeper issues. My commentary is divided into three sections.

1. What is the PF and how is it specified? Inspired by Strasburger's article (Strasburger, 2001a) on a new definition of slope as applied to a wide variety of PF shapes, I will comment on several items: the connections between several forms of the psychometric function (Weibull, cumulative normal, and d′); the relationship between slope of the PF with a linear versus a logarithmic abscissa; and the connection between PFs and signal detection theory.

2. What are the best experimental techniques for measuring the PF? In most of the articles in this symposium, the two-alternative forced choice (2AFC) technique is used to measure threshold. The emphasis on 2AFC is appropriate; 2AFC seems to be the most common methodology for this purpose. Yet for the same reason I shall raise questions about the 2AFC technique. I shall argue that both its limited ability to reveal underlying processes and its inefficiency should demote it from being the method of choice. Kaernbach (2001b) introduces an "unforced choice" method that offers an improvement over standard 2AFC. The article by Linschoten, Harvey, Eller, and Jafek (2001) on measuring thresholds for taste and smell is relevant here: Because each of their trials takes a long time, an optimal methodology is needed.

3. What are the best analytic techniques for estimating the properties of PFs once the data have been collected? Many of the papers in this special issue are relevant to this question.

The articles in this special issue can be divided into two broad groups: those that did not surprise me, and those that did. Among the first group, Leek (2001) gives a fine historical overview of adaptive procedures. She also provides a fairly complete list of references in this field. For details on modern statistical methods for analyzing data, the pair of papers by Wichmann and Hill (2001a, 2001b) offer an excellent tutorial. Linschoten et al. (2001) also provide a good methodological overview, comparing different methods for obtaining data. However, since these articles were



nonsurprising and for the most part did not shake my previous view of the world, I feel comfortable with them and feel no urge to shower them with words. On the other hand, there were a number of articles in this issue whose results surprised me. They caused me to stop and consider whether my previous thinking had been wrong or whether the article was wrong. Among the present articles, six contained surprises: (1) the Miller and Ulrich (2001) nonparametric method for analyzing PFs, (2) the Strasburger (2001b) finding of extremely steep psychometric functions for letter discrimination, (3) the Strasburger (2001a) new definition of psychometric function slope, (4) the Wichmann and Hill (2001a) analysis of biased goodness-of-fit and bias due to lapses, (5) the Kaernbach (2001a) modification to 2AFC, and (6) the Kaernbach (2001b) analysis of why staircase methods produce slope estimates that are too steep. The issues raised in these articles are instructive, and they constitute the focus of my commentary. Item 3 is covered in Section I, Item 5 will be discussed in Section II, and the remainder are discussed in Section III. A detailed overview of the main conclusions will be presented in the summary at the end of this paper. That might be a good place to begin.

When I began working on this article, more and more threads captured my attention, causing the article to become uncomfortably large and diffuse. In discussing this situation with the editor, I decided to split my article into two publications. The first of them is the present article, focused on the nine articles of this special issue and including a number of overview items useful for comparing the different types of PFs that the present authors use. The second paper will be submitted for publication in Perception & Psychophysics in the future (Klein, 2002).

The articles in this special issue of Perception & Psychophysics do not cover all facets of the PF. Here I should like to single out four earlier articles for special mention. King-Smith, Grigsby, Vingrys, Benes, and Supowit (1994) provide one of the most thoughtful approaches to likelihood methods, with important new insights. Treutwein's (1995) comprehensive, well-organized overview of adaptive methods should be required reading for anyone interested in the PF. Kontsevich and Tyler's (1999) adaptive method for estimating both threshold and slope is probably the best algorithm available for that task and should be looked at carefully. Finally, the paper with which I am most familiar in this general area is McKee, Klein, and Teller's (1985) investigation of threshold confidence limits in probit fits to 2AFC data. In looking over these papers, I have been struck by how Treutwein (1995), King-Smith et al. (1994), and McKee et al. (1985) all point out problems with the 2AFC methodology, a theme I will continue to address in Section II of this commentary.

I. TYPES OF PSYCHOMETRIC FUNCTIONS

The Probability-Based (High-Threshold) Correction for Guessing

The PF is commonly written as follows

P(x) = γ + (1 - γ - λ) p(x),    (1A)

where γ = P(0) is the lower asymptote, 1 - λ is the upper asymptote, p(x) is the PF that goes from 0% to 100%, and P(x) is the PF representing the data that goes from γ to 1 - λ. The stimulus strength, x, typically goes from 0 to a large value for detection and from large negative to large positive values for discrimination. For discrimination tasks, where P(x) can go from 0% (for negative stimulus values) to 100% (for positive values), there is typically symmetry between the negative and positive range, so that λ = γ. Unless otherwise stated, I will for simplicity ignore lapses (errors made to perceptible stimuli) and take λ = 0, so that P(x) becomes

P(x) = γ + (1 - γ) p(x).    (1B)

In the Section III commentary on Strasburger (2001b) and Wichmann and Hill (2001a), I will discuss the benefit of setting the lapse rate λ to a small value (like λ = 1%) rather than 0 (or the 0.01% value that Strasburger used), in order to minimize slope bias.

When coupled with a high threshold assumption, Equation 1B is powerful in connecting different methodologies. The high threshold assumption is that the observer is in one of two states: detect or not detect. The detect state occurs with probability p(x). If the stimulus is not detected, then one guesses with the guess rate γ. Given this assumption, the percent correct will be

P(x) = p(x) + γ [1 - p(x)],    (1C)

which is identical to Equation 1B. The first term corresponds to the occasions when one detects the stimulus, and the second term corresponds to the occasions when one does not.

Equation 1 is often called the correction for guessing transformation. The correction for guessing is clearer if Equation 1B is rewritten as

p(x) = [P(x) - P(0)] / [1 - P(0)].    (2)

The beauty of Equation 2, together with the high threshold assumption, is that even though γ = P(0) can change, the fundamental PF, p(x), is unchanged. That is, one can alter γ by changing the number of alternatives in a forced choice task [γ = 1/(number of alternatives)], or one can alter the false alarm rate in a yes/no task; p(x) remains unchanged and recoverable through Equation 2.
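As a concrete check of this bookkeeping, here is a minimal Matlab sketch of Equations 1B and 2 (my own illustration, not code from this paper), assuming a 2AFC guess rate of γ = 0.5:

% Minimal sketch of the high-threshold correction for guessing.
gamma = 0.5;                       % lower asymptote, P(0), e.g., 2AFC
p     = 0:0.1:1;                   % underlying PF, 0% to 100%
P     = gamma + (1 - gamma).*p;    % Equation 1B: PF for the observed data
pBack = (P - gamma)./(1 - gamma);  % Equation 2: recover p(x) from P(x)
max(abs(pBack - p))                % zero, up to round-off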

Unfortunately, when one looks at actual data, one will discover that p(x) does change as γ changes for both yes/no and forced choice tasks. For this and other reasons, the high threshold assumption has been discredited. A modern method, signal detection theory, for doing the correction for guessing is to do the correction after a z-score transformation. I was surprised that signal detection theory was barely mentioned in any of the articles constituting this special issue. I consider this to be sufficiently important that I want to clarify it at the outset. Before I can introduce the newer approach to the correction for guessing, the connection between probability and z-score is needed.



The Connection Between Probability and z-Score

The Gaussian distribution and its integral, the cumulative normal function, play a fundamental role in many approaches to PFs. The cumulative normal function Φ(z) is the function that connects the z score (z) to probability (prob):

prob = Φ(z) = ∫ from -∞ to z of (2π)^(-1/2) exp(-y²/2) dy.    (3A)

The function Φ(z) and its inverse are available in programs such as Excel, but not in Matlab. For Matlab, one must use

prob = Φ(z) = [1 + erf(z/√2)] / 2,    (3B)

where the error function, erf(x) = (2/√π) ∫ from 0 to x of exp(-y²) dy, is a standard Matlab function. I usually check that Φ(-1) = 0.1587, Φ(0) = 0.5, and Φ(1) = 0.8413 in order to be sure that I am using erf properly. The inverse cumulative normal function, used to go from prob to z, is given by

z = Φ^(-1)(prob) = √2 erfinv(2 prob - 1).    (4)
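A minimal Matlab sketch of Equations 3B and 4, including the sanity checks just mentioned (my own illustration):

% Cumulative normal and its inverse via erf and erfinv.
Phi    = @(z)    0.5*(1 + erf(z/sqrt(2)));     % Equation 3B
PhiInv = @(prob) sqrt(2)*erfinv(2*prob - 1);   % Equation 4
[Phi(-1)  Phi(0)  Phi(1)]                      % 0.1587  0.5000  0.8413
PhiInv(Phi(0.6745))                            % round trip returns 0.6745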

Equations 3 and 4 do not specify whether one uses prob = p or prob = P (see Equation 1 for the distinction between the two). In this commentary, both definitions will be used, with the choice depending on how one does the correction for guessing, our next topic. The choice prob = p(x) means that a cumulative normal function is being used for the underlying PF and that the correction for guessing is done as in Equations 1 and 2. On the other hand, in a yes/no task, if we choose prob = P(x), then one is doing the z-score transform of the PF data before the correction for guessing. As is clarified in the next section, this procedure is a signal detection "correction for guessing," and the PF will be called the d′ function. The distinction between the two methods for correction for guessing has generated confusion and is partly responsible for why many researchers do not appreciate the simple connection between the PF and d′.

The z-Score Correction for Bias and Signal Detection Theory: Yes/No

Equation 2 is a common approach to correction for guessing in both yes/no and forced choice tasks, and it will be found in many of the articles in this special issue. In a yes/no method for detection, the lower asymptote P(0) = γ is the false alarm rate, the probability of saying "yes" when a blank stimulus is present. In the past, one instructed subjects to keep γ low. A few blank catch trials were included to encourage subjects to maintain their low false alarm rate. Today the correction is done using a z-score ordinate, and it is now called the correction for response bias, or simply the bias correction. One uses as many trials at the zero level as at other levels, one encourages more false alarms, and the framework is called signal detection theory.

The z-score (d′) correction for bias provides a direct, but not well appreciated, connection between the yes/no psychometric function and the signal detection d′. The two steps are as follows: (1) Convert the percent correct, P(x), to z scores using Equation 4. (2) Do the correction for bias by choosing the zero point of the ordinate to be the z score for the point x = 0. Finally, give the ordinate the name d′. This procedure can be written as

d′(x) = z(x) - z(0).    (5)

For a detection task, z(x) is called the z score of the hit rate, and z(0) is called the z score of the lower asymptote (γ), or the false alarm rate. It is the false alarm rate because the stimulus at x = 0 is the blank stimulus. In order to be able to do this correction for bias accurately, one must put as many trials at x = 0 as one puts at the other levels. Note the similarity of Equations 2 and 5. The main difference between Equations 2 and 5 is whether one makes the correction in probability or in z score. In order to distinguish this approach from the older yes/no approach associated with high threshold theory, it is often called the objective yes/no method, where "objective" means that the response bias correction of Equation 5 is used.
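A minimal Matlab sketch of Equation 5 (my own illustration), using the Figure 1 example in which the false alarm rate is γ = 0.1587:

% Objective yes/no d' from a hit rate and a false alarm rate.
PhiInv = @(prob) sqrt(2)*erfinv(2*prob - 1);   % z score, Equation 4
pHit = 0.50;                  % hit rate at threshold in this example
pFA  = 0.1587;                % false alarm rate, P(0) = gamma
dprime = PhiInv(pHit) - PhiInv(pFA)   % Equation 5: 0 - (-1) = 1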

Figure 1 and its associated Matlab Code 1 in the Appendix illustrate the process represented in Equation 2. Panel a is a Weibull PF on a linear abscissa, to be introduced in Equation 12. Panel b is the z score of panel a. The lower asymptote is at z = -1, corresponding to P = 15.87%. For now the only important point is that in panel b, if one measures the curve from the bottom of the plot (z = -1), then the ordinate becomes d′, because d′ = z - z(0) = z + 1. Panels d and e are the same as panels a and b, except that instead of a linear abscissa, they have natural log abscissas. More will be said about these figures later.

I am ignoring for now the interesting question of what happens to the shape of the psychometric function as one changes the false alarm rate γ. If one uses multiple ratings rather than the binary yes/no response, one ends up with M - 1 PFs for M rating categories, and each PF has a different γ. For simplicity, this paper assumes a unity ROC slope, which guarantees that the d′ function is independent of γ. The ROC slopes can be measured using an objective yes/no method, as mentioned in Section II in the list of advantages of the yes/no method over the forced choice method.

I bring up the d′ function (Equation 5) and signal detection theory at the very beginning of this commentary because it is an excellent methodology for measuring thresholds efficiently, it can easily be extended to the suprathreshold regime (it does not saturate at P = 1), and it has a solid theoretical underpinning. Yet it is barely mentioned in any of the articles in this issue. So the reader needs to keep in mind that there is an alternative approach to PFs. I would strongly recommend the book Detection Theory: A User's Guide (Macmillan & Creelman, 1991) for anyone interested in the signal detection approach to psychophysical experiments (including the effect of nonunity ROC slopes).



The z-Score Correction for Bias and Signal Detection Theory: 2AFC

One might have thought that for 2AFC the connection between the PF and d′ is well established, namely (Green & Swets, 1966),

d′(x) = √2 z(x),    (6)

where z(x) is the z score of the average of P1(x) for correct judgments in Interval 1 and P2(x) for correct judgments in Interval 2. Typically, this average, P(x), is calculated by dividing the total number of correct trials by the total number of trials. It is generally assumed that the 2AFC procedure eliminates the effect of response bias on threshold. However, in this section I will argue that the d′ as defined in Equation 6 is affected by an interval bias when one interval is selected more than the other.

I have been thinking a lot about response bias in 2AFC tasks because of my recent experience as a subject in a temporal 2AFC contrast discrimination study, where contrast is defined as the change in luminance divided by the background luminance. In these experiments, I noticed that I had a strong tendency to choose the second interval more than the first. The second interval typically appears subjectively to be about 5% higher in contrast than it really is. Whether this is a perceptual effect because the in-


Figure 1. Six views of the Weibull function, P_weibull = 1 - (1 - γ) exp(-x_t^β), where γ = 0.1587, β = 2, and x_t is the stimulus strength in threshold units. Panels a-c have a linear abscissa, with x_t = 1 being the threshold; panels d-f have a natural log abscissa, with y_t = 0 being the threshold. In panels a and d, the ordinate is probability. The asterisk in panel d, at y_t = 0, is the point of maximum slope on a logarithmic abscissa. In panels b and e, the ordinate is the z score of panels a and d; the lower asymptote is z = -1. If the ordinate is redefined so that the origin is at the lower asymptote, the new ordinate, shown on the right of panels b and e, is d′(x_t) = z(x_t) - z(0), corresponding to the signal detection d′ for an objective yes/no task. In panels c and f, the ordinate is the log–log slope of d′. At x_t = 0, the log–log slope is β. The log–log slope falls rapidly as the stimulus strength approaches threshold. The Matlab program that generated this figure is Appendix Code 1.


tervals are too close together in time (800 msec) or a cognitive effect does not matter for the present article. What does matter is that this bias produces a downward bias in d′. With feedback, lots of practice, and lots of experience being a subject, I was able to reduce this interval bias and equalize the number of times I responded with each interval. Naïve subjects may have a more difficult time reducing the bias. In this section, I show that there is a very simple method for removing the interval bias, by converting the 2AFC data to a PF that goes from 0% to 100%.

The recognition of bias in 2AFC is not new. Green and Swets (1966), in their Appendix III.3.4, point out that the bias in choice of interval does result in a downward bias in d′. However, they imply that the effect of this bias is small and can typically be ignored. I should like to question that implication by using one of their own examples to show that the bias can be substantial.

In a run of 200 trials, the Green and Swets example (Green & Swets, 1966, p. 410) has 95 out of 100 correct when the test stimulus is in the second interval (z2 = 1.645) and 50 out of 100 correct (z1 = 0) when the stimulus is in the first interval. The standard 2AFC way to analyze these data would be to average the probabilities: (95% + 50%)/2 = 72.5% correct (z_correct = 0.598), corresponding to d′ = z √2 = 0.845. However, Green and Swets (p. 410) point out that according to signal detection theory, one should analyze these data by averaging the z scores rather than averaging the probabilities, or

d′ = √2 (z2 + z1)/2 = √2 × 1.645/2 = 1.163.    (7)

The ratio between these two ways of calculating d′ is 1.163/0.845 = 1.376. Since d′ is approximately linearly related to signal strength in discrimination tasks, this 38% reduction in d′ corresponds to an erroneous 38% increase in predicted contrast discrimination threshold when one calculates threshold the standard way. Note that if there had been no bias, so that the responses would be approximately equally divided across the two intervals, then z2 ≈ z1, and Equation 7 would be identical to the more familiar Equation 6. Since bias is fairly common, especially among new observers, the use of Equation 7 to calculate d′ seems much more reasonable than using Equation 6. It is surprising that the bias correction in Equation 7 is rarely used.
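The two ways of averaging are easy to compare numerically. A minimal Matlab sketch of the Green and Swets example (my own illustration):

% Equation 6 (z of the averaged probability) versus Equation 7 (average z).
PhiInv = @(prob) sqrt(2)*erfinv(2*prob - 1);
P1 = 0.50;  P2 = 0.95;                          % correct in Intervals 1 and 2
dBiased   = sqrt(2)*PhiInv((P1 + P2)/2)         % 0.845, pulled down by the bias
dUnbiased = sqrt(2)*(PhiInv(P1) + PhiInv(P2))/2 % 1.163
dUnbiased/dBiased                               % about 1.38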

Green and Swets (1966) present a different analysis. Instead of comparing d′ values for the biased versus nonbiased conditions, they convert the d′s back to percent correct. The corrected percent correct (corresponding to d′ = 1.163) is 79.5%. In terms of percent correct, the bias seems to be a small effect, shifting percent correct a mere 7%, from 72.5% to 79.5%. However, the d′ ratio of 1.38 is a better measure of the bias, since it is directly related to the error in discrimination threshold estimates.

A further comment on the magnitude of the bias may be useful. The preceding subsection discussed the criterion bias of yes/no methods, which contributes linearly to d′ (Equation 5). The present section discusses the 2AFC interval bias that contributes quadratically to d′. Thus, for small amounts of bias, the decrease in d′ is negligible. However, as I have pointed out, the bias can be large enough to make a significant contribution to d′.

It is instructive to view the bias correction for the full PF corresponding to this example. Cumulative normal PFs are shown in the upper left panel of Figure 2. The curves labeled C1 and C2 are the probability correct for the first and second intervals, I1 and I2. The asterisks correspond to the stimulus strength used in this example, at 50% and 95% correct. The dot–dashed line is the average of the two PFs for the individual intervals. The slope of the PF is set by the dot–dashed line (the averaged data) being at 50% (the lower asymptote) for zero stimulus strength. The lower left panel is the z-score version of the upper panel. The dashed line is the average of the two solid lines for z scores in I1 and I2. This is the signal detection method of averaging that Green and Swets (1966) present as the proper way to do the averaging. The dot–dashed line shown in the upper left panel is the z score of the average probability. Notice that the z score for the averaged probability is lower than the averaged z score, indicating a downward bias in d′ due to the interval bias, as discussed at the beginning of this section. The right pair of panels are the same as the left pair, except that instead of plotting C1, we plot 1 - C1, the probability of responding I2 incorrectly. The dashed line is half the difference between the two solid lines. The other difference is that in the lower right panel we have multiplied the dashed and dash–dotted lines by √2, so that these lines are d′ values rather than z scores.

The final step in dealing with the 2AFC bias is to flip the 1 - C1 curve horizontally to negative abscissa values, as in Figure 3. The ordinate is still the probability correct in Interval 2. The abscissa becomes the difference in stimulus strengths between I2 and I1. The flipped branch is the probability of responding I2 when the I2 stimulus strength is less than that of I1 (an incorrect response). Figure 3a, with the ordinate going from 0% to 100%, is the proper way to represent 2AFC discrimination data. The other item that I have changed is the scale on the abscissa, to show what might happen in a real experiment. The ordinate values of 50% and 95% for the Green and Swets (1966) example have been placed at a contrast difference of 5%. The negative 5% value corresponds to the case in which the positive test pattern is in the first interval. Threshold corresponds to the inverse of the PF slope. The bottom panel shows the standard signal detection representation of the signal in I1 and I2. d′ is the distance between these symmetric stimuli in standard deviation units. The Gaussians are centered at ±5%. The vertical line at -5% is the criterion, such that 50% and 95% of the area of the two Gaussians is above the criterion. The z-score difference of 1.645 between the two Gaussians must be divided by √2 to get d′, because each trial had two stimulus presentations with independent informa-


tion for the judgment. This procedure, identical to Equation 7, gives the same d′ as before.
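A minimal Matlab sketch of this folding procedure (my own illustration; the 5% contrast difference is the example value used above):

% Fold 2AFC data into a PF that runs from 0% to 100%: plot the probability
% of responding "Interval 2" against the signed strength difference (I2 - I1).
dContrast  = 5;                      % test contrast difference, in percent
pCorrectI2 = 0.95;                   % test in I2: P(say I2) = P(correct)
pCorrectI1 = 0.50;                   % test in I1: P(say I2) = 1 - P(correct)
xSigned = [-dContrast, +dContrast];
pSayI2  = [1 - pCorrectI1, pCorrectI2];
zSayI2  = sqrt(2)*erfinv(2*pSayI2 - 1);   % z scores: 0 and 1.645
alpha   = diff(xSigned)/diff(zSayI2);     % 1/slope of the folded PF, about 6.1
PSE     = xSigned(1) - alpha*zSayI2(1)    % point of subjective equality, -5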

Three Distinctions for Clarifying PFs

In dealing with PFs, three distinctions need to be made: yes/no versus forced choice, detection versus discrimination, and constant stimuli versus adaptive methods. These distinctions are usually clear, but I should like to point out some subtleties.

For the forced choice versus yes/no distinction, there are two sorts of forced choice tasks. The standard version has multiple intervals, separated spatially or temporally, and the stimulus is in only one of the intervals. In the other version, one of N stimuli is shown, and the observer responds with a number from 1 to N. For example, Strasburger (2001b) presented 1 of 10 letters to the observer in a 10AFC task. Yes/no tasks have some similarity to the latter type of forced choice task. Consider, for example, a detection experiment in which one of five contrasts (including a blank) is presented to the observer, and the observer responds with numbers from 1 to 5. This would be classified as a rating scale method of constant stimuli yes/no task, since only a single stimulus is presented and the rating is based on a one-dimensional intensity.

The detection/discrimination distinction is usually based on whether the reference stimulus is a natural zero point. For example, suppose the task is to detect a high spatial frequency test pattern added to a spatially identical reference

Figure 2. 2AFC psychometric functions with a strong bias in favor of responding Interval 2 (I2). The bias is chosen from a specific example presented by Green and Swets (1966), such that the observer has 50% and 95% correct when the test is in I1 and I2, respectively. These points are marked by asterisks. The psychometric function being plotted is a cumulative normal. In all panels the abscissa is x_t, the stimulus strength. (a) The psychometric functions for probability correct in I1 and I2 are shown and labeled C1 and C2. The average of the two probabilities, labeled "average," is the dot–dashed line; it is the curve that is usually reported. The diamond is at 72.5%, the average percent correct of the two asterisks. (b) Same as panel a, except that instead of showing C1, we show 1 - C1, the probability of saying I2 when the test was in I1. The abscissa is now labeled "probability of I2 response." (c) z scores of the three probabilities in panel a. An additional dashed line is shown that is the average of the C1 and C2 z-score curves. The diamond is the z score of the diamond in panel a, and the star is the z-score average of the two panel c asterisks. (d) The sign of the C2 curve in panel c is flipped to correspond to panel b. The dashed and dot–dashed lines of panel c have been multiplied by √2 in panel d, so that they become d′.


pattern. If the reference pattern has zero contrast, the task is detection. If the reference pattern has a high contrast, the task is discrimination. Klein (1985) discusses these tasks in terms of monopolar and bipolar cues. For discrimination, a bipolar cue must be available, whereby the test pattern can be either positive or negative in relation to the reference. If one cannot discriminate the negative cue from the positive, then it can be called a detection task.

Finally, the constant stimuli versus adaptive method distinction is based on the former's having preassigned test levels and the latter's having levels that shift to a desired placement. The output of the constant stimulus method is a full PF and is thus fully entitled to be included in this special issue. The output of adaptive methods is typically only a single number, the threshold, specifying the location but not the shape of the PF. Two of the papers in this issue (Kaernbach, 2001b; Strasburger, 2001b) explore the possibility of also extracting the slope from adaptive data that concentrates trials around one level. Even though adaptive methods do not measure much about the PF, they are so popular that they are well represented in this special issue.

The 10 rows of Table 1 present the articles in this special issue (including the present article). Columns 2–5 correspond to the four categories associated with the first two

Figure 3. The 2AFC discrimination PF from Figure 2 has been extended to the 0% to 100% range without rescaling the ordinate. In panels a and b, the two right-hand panels of Figure 2 have been modified by flipping the sign of the abscissa of the negative slope branch, where the test is in Interval 1 (I1). The new abscissa is now the stimulus strength in I2 minus the strength in I1. The abscissa scaling has been modified to be in stimulus units. In this example, the test stimulus has a contrast of 5%. The ordinate in panel a is the probability that the response is I2. The ordinate in panel b is the z score of the panel a ordinate. The asterisks are at the same points as in Figure 2. Panel c is the standard signal detection picture when noise is added to the signal. The abscissa has been modified to being the activity in I2 minus the activity in I1. The units of activation have arbitrarily been chosen to be the same as the units of panels a and b. Activity distributions are shown for stimuli of -5 and +5 units, corresponding to the asterisks of panels a and b. The subject's criterion is at -5 units of activation. The upper abscissa is in z-score units, where the Gaussians have unit variance. The areas under the two distributions above the criterion are 50% and 95%, in agreement with the probabilities shown in panel a.


distinctions. The last column classifies the articles according to the adaptive versus constant stimuli distinction. In order to open up the full variety of PFs for discussion and to enable a deeper understanding of the relationship among different types of PFs, I will now clarify the interrelationship of the various categories of PFs and their connection to the signal detection approach.

Lack of adaptive method for yes/no tasks with controlled false alarm rate. In Section II, I will bring up a number of advantages of the yes/no task in comparison with the 2AFC task (see also Kaernbach, 1990). Given those yes/no advantages, one may wonder why the 2AFC method is so popular. The usual answer is that 2AFC has no response bias. However, as has been discussed, the yes/no method allows an unbiased d′ to be calculated, and the 2AFC method does allow an interval bias that affects d′. Another reason for the prevalence of 2AFC experiments is that a multitude of adaptive methods are available for 2AFC but barely any are available for an objective yes/no task, in which the false alarm rate is measured so that d′ can be calculated. In Table 1, with one exception, the rows with adaptive methods are associated with the forced choice method. The exception is Leek (2001), who discusses adaptive yes/no methods in which no blank trials are presented. In that case, the false alarm rate is not measured, so d′ cannot be calculated. This type of yes/no method does not belong in the "objective" category of concern for the present paper. The "1990" entries in Table 1 refer to Kaernbach's (1990) description of a staircase method for an objective yes/no task in which an equal number of blanks are intermixed with the signal trials. Since Kaernbach could have used that method for Kaernbach (2001b), I placed the "1990" entry in his slot.

Kaernbach's (1990) yes/no staircase rules are simple. The signal level changes according to a rule such as the following: Move down one level for correct responses, a hit <yes|signal> or a correct rejection <no|blank>; move up three levels for wrong responses, a miss <no|signal> or a false alarm <yes|blank>. This rule, similar to the one down–three up rule used in 2AFC, places the trials so that the average of the hit rate and correct rejection rate is 75%.

What is now needed is a mechanism to get the observer to establish an optimal criterion that equalizes the number of "yes" and "no" responses (the ROC negative diagonal). This situation is identical to the problem of getting 2AFC observers to equalize the number of responses to Intervals 1 and 2. The quadratic bias in d′ is the same in both cases. The simplest way to get subjects to equalize their responses is to give them feedback about any bias in their responses. With equal "yes" and "no" responses, the 75% correct corresponds to a z score of 0.674 and a d′ = 2z = 1.349. If the subject does not have equal numbers of "yes" and "no" responses, then the d′ would be calculated by d′ = z_hit - z_false alarm. I do hope that Kaernbach's clever yes/no objective staircase will be explored by others. One should be able to enhance it with ratings and multiple stimuli (Klein, 2002).
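A minimal Matlab simulation of this staircase rule (my own illustration, not Kaernbach's code; the observer model, with d′ growing linearly with level and the criterion held midway, is an assumption made only for the demo):

% Kaernbach-style objective yes/no staircase: blanks and signals intermixed;
% one level down after a correct response, three levels up after an error.
PhiFun  = @(z) 0.5*(1 + erf(z/sqrt(2)));
level   = 10;                          % starting signal level (arbitrary units)
history = zeros(1,400);
for trial = 1:400
    isSignal = rand < 0.5;             % half the trials are blanks
    dp   = max(level,0)/5;             % assumed observer: d' = level/5
    pYes = PhiFun((2*isSignal - 1)*dp/2);   % criterion midway between blank and signal
    correct = ((rand < pYes) == isSignal);  % hit or correct rejection counts as correct
    if correct, level = level - 1; else level = level + 3; end
    history(trial) = level;
end
mean(history(201:end))    % hovers near the level giving 75% correct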

Forced choice detection. As can be seen in the second column of Table 1, a popular category in this special issue is the forced choice method for detection. This method is used by Strasburger (2001b), Kaernbach (2001a), Linschoten et al. (2001), and Wichmann and Hill (2001a, 2001b). These researchers use PFs based on probability ordinates. The connection between P and d′ is different from the yes/no case given by Equation 5. For 2AFC, signal detection theory provides a simple connection between d′ and probability correct: d′(x) = √2 z(x). In the preceding section, I discussed the option of averaging the probabilities and then taking the z score (high threshold approach) or averaging the z scores and then calculating the probability (signal detection approach). The signal detection method is better because it has a stronger empirical basis and avoids bias.

For an m-AFC task with m > 2, the connection between d′ and P is more complicated than it is for m = 2. The connection, given by a fairly simple integral (Green & Swets, 1966; Macmillan & Creelman, 1991), has been tabulated by Hacker and Ratcliff (1979) and by Macmillan and Creelman (1991). One problem with these tables is that they are based on the assumption of unity ROC slope. It is known that there are many cases in which the ROC slope is not unity (Green & Swets, 1966), so these tables connecting

Table 1
Classification of the 10 Articles in This Special Issue According to Three Distinctions: Forced Choice Versus Yes/No, Detection Versus Discrimination, Adaptive Versus Constant Stimuli

                               Detection                  Discrimination            Adaptive or
Source                       m-AFC       Yes/No       m-AFC            Yes/No       Constant Stimuli
Leek (2001)                  General     loose γ      ✓                loose γ      A
Wichmann & Hill (2001a)      2AFC        (✓)          (✓)              (✓)          C
Wichmann & Hill (2001b)      2AFC        (✓)          (✓)              (✓)          C
Linschoten et al. (2001)     2AFC                                                   A
Strasburger (2001a)          General     ✓            ✓                             C
Strasburger (2001b)          10AFC                                                  A
Kaernbach (2001a)            General                  ✓                             A
Kaernbach (2001b)            General     (1990)       ✓                (1990)       A
Miller & Ulrich (2001)       for γ = 0   ✓            (for 0%–100%)    ✓            C
Klein (present)              General     ✓            ✓                ✓            Both


d′ and probability correct should be treated cautiously. W. P. Banks and I (unpublished) investigated this topic and found that near the P = 50% correct point, the dependence of d′ on ROC slope is minimal. Away from the 50% point, the dependence can be strong.
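For reference, the integral mentioned above is easy to evaluate numerically. A minimal Matlab sketch (my own illustration), assuming unity ROC slope:

% Percent correct for m-AFC from d': Pc = integral of phi(t - d')*Phi(t)^(m-1) dt.
PhiFun = @(t) 0.5*(1 + erf(t/sqrt(2)));         % cumulative normal
phiFun = @(t) exp(-t.^2/2)/sqrt(2*pi);          % normal density
mafcPc = @(dp,m) integral(@(t) phiFun(t-dp).*PhiFun(t).^(m-1), -10, 10);
mafcPc(1, 2)      % 0.7602 for 2AFC at d' = 1 (same as Phi(1/sqrt(2)))
mafcPc(1, 10)     % far lower percent correct for 10AFC at the same d'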

Yes/no detection. Linschoten et al. (2001) compare three methods (limits, staircase, 2AFC likelihood) for measuring thresholds with a small number of trials. In the method of limits, one starts with a subthreshold stimulus and gradually increases the strength. On each trial, the observer says "yes" or "no" with respect to whether or not the stimulus is detected. Although this is a classic yes/no detection method, it will not be discussed in this article, because there is no control of response bias. That is, blanks were not intermixed with signals.

In Table 1, the Wichmann and Hill (2001a, 2001b) articles are marked with ✓s in parentheses because, although these authors write only about 2AFC, they mention that all their methods are equally applicable for yes/no or m-AFC of any type (detection/discrimination). And their Matlab implementations are fully general. All the special issue articles reporting experimental data used a forced choice method. This bias in favor of the forced choice methodology is found not only in this special issue; it is widespread in the psychophysics community. Given the advantages of the yes/no method (see discussion in Section II), I hope that once yes/no adaptive methods are accepted, they will become the method of choice.

m-AFC discrimination. It is common practice to represent 2AFC results as a plot of percent correct, averaged over all trials, versus stimulus strength. This PF goes from 50% to 100%. The asymmetry between the lower and upper asymptotes introduces some inefficiency in threshold estimation, as will be discussed. Another problem with the standard 50% to 100% plot is that a bias in choice of interval will produce an underestimate of d′, as has been shown earlier. Researchers often use the 2AFC method because they believe that it avoids biased threshold estimates. It is therefore surprising that the relatively simple correction for 2AFC bias discussed earlier is rarely done.

The issue of bias in m-AFC also occurs when m > 2. In Strasburger's 10AFC letter discrimination task, it is common for subjects to have biases for responding with particular letters when guessing. Any imbalance in the response bias for different letters will result in a reduction of d′, as it did in 2AFC.

There are several benefits of viewing the 2AFC discrimination data in terms of a PF going from 0% to 100%. Not only does it provide a simple way of viewing and calculating the interval bias, it also enables new methods for estimating the PF parameters, such as those proposed by Miller and Ulrich (2001), as will be discussed in Section III. In my original comments on the Miller and Ulrich paper, I pointed out that because their nonparametric procedure has uniform weighting of the different PF levels, their method does not apply to 2AFC tasks. The asymmetric binomial error bars near the 50% and 100% levels cause the uniform weighting of the Miller and Ulrich approach to be nonoptimal. However, I now realize that the 2AFC discrimination task can be fit by a cumulative normal going from 0% to 100%. Because of that insight, I have marked the discrimination forced choice column of Table 1 for the Miller and Ulrich (2001) paper, with the proviso that the PF goes from 0% to 100%. Owing to the popularity of 2AFC, this modification greatly expands the relevance of their nonparametric approach.

Yes/no discrimination. Kaernbach (2001b) and Miller and Ulrich (2001) offer theoretical articles that examine properties of PFs that go from P = 0% to 100% (P = p in Equation 1). In both cases, the PF is the cumulative normal (Equation 3). Although these PFs with γ = 0 could be for a yes/no detection task with a zero false alarm rate (not plausible) or a forced choice detection task with an infinite number of alternatives (not plausible either), I suspect that the authors had in mind a yes/no discrimination task (Table 1, column 5). A typical discrimination task in vision is contrast discrimination, in which the observer responds to whether the presented contrast is greater than or less than a memorized reference. Feedback reinforces the stability of the reference. In a typical discrimination task, the reference is one exemplar from a continuum of stimulus strengths. If the reference is at a special zero point rather than being an element of a smooth continuum, the task is no longer a simple discrimination task. Zero contrast would be an example of a special reference. Klein (1985) discusses several examples which illustrate how a natural zero can complicate the analysis. One might wonder how to connect the PF from the detection regime, in which the reference is zero contrast, to the discrimination regime, in which the reference (pedestal) is at a high contrast. The d′ function to be introduced in Equation 20 does a reasonably good job of fitting data across the full range of pedestal strength, going from detection to discrimination.

The connection of the discrimination PF to the signal detection PF is the same as that for yes/no detection, given in Equation 5: d′(x) = z(x) - z(0), where z is the z score of the probability of a "greater than" judgment. In a discrimination task, the bias z(0) is the z score of the probability of saying "greater" when the reference stimulus is presented. The stimulus strength x that gives z(x) = 0 is the point of subjective equality (x = PSE). If the cumulative normal PF of Equation 3 is used, then the z score is linearly proportional to the stimulus strength: z(x) = (x - PSE)/threshold, where threshold is defined to be the point at which d′ = 1. I will come back to these distinctions between detection and discrimination PFs after presenting more groundwork regarding thresholds, log abscissas, and PF shapes (see Equations 14 and 17).

Definition of Threshold

Threshold is often defined as the stimulus strength that produces a probability correct halfway up the PF. If humans operated according to a high-threshold assumption, this definition of threshold would be stable across different experimental methods. However, as I discussed following Equation 2, high-threshold theory has been discredited.


According to the more successful signal detection theory (Green & Swets, 1966), the d′ at the midpoint of the PF changes according to the number of alternatives in a forced choice method and according to the false alarm rate in a yes/no method. This variability of d′ with method is a good reason not to define threshold as the halfway point of the PF.

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1, for simplicity. This definition applies to both yes/no and m-AFC tasks and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = 15.87%, corresponding to a z score of z = -1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.

Strasburger's Suggestion on Specifying Slope: A Logarithmic Abscissa

In many of the articles in this special issue, a logarithmic abscissa, such as decibels, is used. Many shapes of PFs have been used with a log abscissa. Most delightful, but frustrating, is Strasburger's (2001a) paper on the PF maximum slope. He compares the Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′, using a logarithmic abscissa. The present section is the outcome of my struggles with a number of issues raised by Strasburger (2001a) and my attempt to clarify them.

I will typically express stimulus strength, x, in threshold units:

x_t = x / α,    (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units, y, as well as in linear units, x:

y_t = log_e(x_t) = y - Y,    (9)

where y = log_e(x) is the natural log of the stimulus and Y = log_e(α) is the threshold on the log abscissa.

The slope of the psychometric function P(y_t) with a logarithmic abscissa is

slope_log(y_t) = dP(y_t)/dy_t    (10A)
               = (1 - γ) dp(y_t)/dy_t,    (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log–log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log–log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log–log slope is β for a Weibull PF (Equation 12) and β for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log–log slope using a d′ ordinate.

A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per log_e) are not familiar. Then it dawned on me that there is a simple connection between slope with a log_e abscissa, slope_log = dP(y_t)/dy_t, and slope with a linear abscissa, slope_lin = dP(x_t)/dx_t, namely,

slope_lin = dP(x_t)/dx_t = [dP(y_t)/dy_t] × [dy_t/dx_t] = slope_log / x_t,    (11)

because dy_t/dx_t = [d log_e(x_t)]/dx_t = 1/x_t. At threshold, x_t = 1 (y_t = 0), so at that point Strasburger's slope with a logarithmic axis is identical to my familiar slope plotted in threshold units on a linear axis. The simple connection between slope on the log and linear abscissas converted me to being a strong supporter of using a natural log abscissa.
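Equation 11 is easy to verify numerically. A minimal Matlab sketch (my own illustration), using the Weibull of Figure 1:

% Check that slope_lin equals slope_log divided by x_t.
gamma = 0.1587;  beta = 2;
P  = @(xt) 1 - (1-gamma)*exp(-xt.^beta);        % the Figure 1 Weibull
xt = 1.5;  h = 1e-6;
slopeLin = (P(xt+h) - P(xt-h))/(2*h);                 % dP/dx_t
slopeLog = (P(xt*exp(h)) - P(xt*exp(-h)))/(2*h);      % dP/dy_t, with y_t = log x_t
[slopeLin  slopeLog/xt]                               % the two entries agree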

The Weibull and cumulative normal psychometric functions. To provide a background for Strasburger's article, I will discuss three PFs: Weibull, cumulative normal, and d′, as well as their close connections. A common parameterization of the PF is given by the Weibull function:

p_weib(x_t) = 1 - k^(x_t^β),    (12)

where p_weib(x_t), the PF that goes from 0% to 100%, is related to P_weib(x_t), the probability of a correct response, by Equation 1; β is the slope; and k controls the definition of threshold: p_weib(1) = 1 - k is the percent correct at threshold (x_t = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

p_weib(y_t) = 1 - k^[exp(β y_t)].    (13)

Panel a of Figure 1 is a plot of Equation 12 (Weibull function as a function of x on a linear abscissa) for the case β = 2 and k = exp(-1) = 0.368. The choice β = 2 makes the Weibull an upside-down Gaussian. Panel d is the same function, this time plotted as a function of y, corresponding to a natural log abscissa. With this choice of k, the point of maximum slope as a function of y_t is the threshold point (y_t = 0). The point of maximum slope at y_t = 0 is marked with an asterisk in panel d. The same point in panel a, at x_t = 1, is not the point of maximum slope on a linear abscissa, because of Equation 11. When plotted as a d′ function [a z-score transform of P(x_t)] in panel b, the Weibull accelerates below threshold and decelerates above thresh-


old, in agreement with a wide range of experimental data. The acceleration and deceleration are most clearly seen by the slope of d′ in the log–log coordinates of panel e. The log–log d′ slope is plotted in panels c and f. At x_t = 0, the log–log slope is 2, corresponding to our choice of β = 2. The slope falls surprisingly rapidly as stimulus strength increases, and the slope is near 1 at x_t = 2. If we had chosen β = 4 for the Weibull function, then the log–log d′ slope would have gone from 4 at x_t = 0 to near 2 at x_t = 2. Equation 20 will present a d′ function that captures this behavior.
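A minimal Matlab sketch in the spirit of panels b, e, and f of Figure 1 (my own illustration, not the author's Appendix Code 1):

% Weibull PF, its d' transform (Equation 5), and the log-log d' slope.
gamma  = 0.1587;  beta = 2;
PhiInv = @(prob) sqrt(2)*erfinv(2*prob - 1);
yt = linspace(log(0.05), log(3), 500);     % uniform spacing in natural log units
xt = exp(yt);                              % stimulus in threshold units
P  = 1 - (1-gamma)*exp(-xt.^beta);         % the Figure 1 Weibull
dp = PhiInv(P) - PhiInv(gamma);            % d' = z(x) - z(0)
llSlope = gradient(log(dp), yt(2)-yt(1));  % d log d' / d log x_t
[llSlope(1)  interp1(xt, llSlope, 2)]      % near beta = 2 at low x_t, near 1 at x_t = 2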

Another PF commonly used is based on the cumulative normal (Equation 3):

p_Φ(y_t) = Φ(y_t / σ)    (14A)

or

z(y_t) = y_t / σ = (y - Y)/σ,    (14B)

where σ is the standard error of the underlying Gaussian function (its connection to β will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to -∞, needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissa.

Two examples of Strasburger's maximum slope (his Equations 7 and 17) are

β′ = β exp(-1) = 0.368 β    (15)

for the Weibull function (Equation 13) and

β′ = 1/(σ √(2π)) = 0.399/σ    (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(-1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (y_t = 0). Note that Equations 12–16 are missing the (1 - γ) factor that is present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.
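A minimal numerical check of Equations 15 and 16 (my own illustration):

% Maximum slope of p at threshold (y_t = 0) for the two PFs.
beta = 2;  sigma = 1;  h = 1e-6;
pWeib = @(yt) 1 - exp(-1).^(exp(beta*yt));          % Equation 13 with k = exp(-1)
pNorm = @(yt) 0.5*(1 + erf((yt/sigma)/sqrt(2)));    % Equation 14A
[(pWeib(h) - pWeib(-h))/(2*h)   beta*exp(-1)]       % 0.7358  0.7358  (Equation 15)
[(pNorm(h) - pNorm(-h))/(2*h)   0.399/sigma]        % 0.3989  0.3990  (Equation 16)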

An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = 0.5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(x_t = 1) = 75%. At 10% above threshold, P(x_t = 1.1) ≈ 75% + 0.1 β′ = 78.99%, which is quite close to the exact value of 78.96%. This example shows how β′, defined with a natural log abscissa, y_t, is relevant to a linear abscissa, x_t.

Now that the logarithmic abscissa has been introduced, this is a good place to stop and point out that the log abscissa is fine for detection but not for discrimination, since the x = 0 point is often in the middle of the range. However, the cumulative normal PF is especially relevant to discrimination, where the PF goes from 0% to 100% and would be written as

(17A)

or

(17B)

where x 5 PSE is the point of subjective equality Theparameter a is the threshold since dcent(a) 5 zF(a)zF(0) 5 1 The threshold a is also s standard deviationsof the Gaussian probability density function (pdf ) Thereciprocal of a is the PF slope I tend to use the letter sfor a unitless standard deviation as occurs for the loga-rithmic variable y I use a for a standard deviation thathas the units of the stimulus x (like percent contrast) Thecomparison of Equation 14B for detection and Equation17B for discrimination is useful Although Equation 14is used in most of the articles in this special issue whichare concerned with detection tasks the techniques that I willbe discussing are also relevant to discrimination tasks forwhich Equation 17 is used

Threshold and the Weibull Function
To illustrate how a d′ = 1 definition of threshold works for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = .1587, corresponding to a z score of z_FA = −1, as in Figure 1. The z score at threshold (d′ = 1) is z_Th = z_FA + 1 = 0, corresponding to a probability of P(1) = .50. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)]/[1 − P(0)] = .5943. The Weibull function becomes

P_weib(x_t) = 1 − (1 − .1587) .5943^(x_t^β).  (18A)

If I had defined d′ = 2 to be threshold, then z_Th = z_FA + 2 = 1, leading to k = (1 − .8413)/(1 − .1587) = .1886, giving

P_weib(x_t) = 1 − (1 − .1587) .1886^(x_t^β).  (18B)

As another example, suppose the false alarm rate in a yes/no task is at 50%, not uncommon in a signal detection experiment with blanks and test stimuli intermixed. Then threshold at d′ = 1 would occur at 84.13%, and the PF would be

P_weib(x_t) = 1 − .5 × .3173^(x_t^β),  (18C)

with P_weib(0) = .50 and P_weib(1) = .8413. This case corresponds to defining d′ on the ROC vertical intercept.

For 2AFC, the connection between d′ and z is z = d′/2^0.5. Thus z_Th = 2^−0.5 = 0.7071, corresponding to P_weib(1) = .7602, leading to k = .4795 in Equation 12. These connections will be clarified when an explicit form (Equation 20) is given for the d′ function d′(x_t).
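Each of the k values above comes from the same two-line calculation. The sketch below (my own illustration; Phi and kval are hypothetical helper names) reproduces them from the false-alarm rate and the d′ chosen to define threshold.

% Minimal sketch: k = [1 - P(1)] / [1 - P(0)] for the threshold definitions above.
Phi = @(z) 0.5*(1 + erf(z/sqrt(2)));              % cumulative normal
kval = @(zFA, zTh) (1 - Phi(zTh))/(1 - Phi(zFA));
k18A = kval(-1, -1 + 1)                           % yes/no, FA = .1587, d' = 1: 0.5943
k18B = kval(-1, -1 + 2)                           % yes/no, FA = .1587, d' = 2: 0.1886
k18C = kval( 0,  0 + 1)                           % yes/no, FA = .50,   d' = 1: 0.3173
k2AFC = (1 - Phi(1/sqrt(2)))/(1 - 0.5)            % 2AFC, z = d'/sqrt(2):       0.4795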


Complications With Strasburger's Advocacy of Maximum Slope

Here I will mention three complications implicated in Strasburger's suggestion of defining slope at the point of maximum slope on a logarithmic abscissa.

1. As can be seen in Equation 11, the slopes on linear and logarithmic abscissas are related by a factor of 1/x_t. Because of this extra factor, the maximum slope occurs at a lower stimulus strength on linear as compared with log axes. This makes the notion of maximum slope less fundamental. However, since we are usually interested in the percent error of threshold estimates, it turns out that a logarithmic abscissa is the most relevant axis, supporting Strasburger's log abscissa definition.

2. The maximum slope is not necessarily at threshold (as Strasburger points out). For the Weibull functions defined in Equation 13 with k = exp(−1) and the cumulative normal function in Equation 14, the maximum slope (on a log axis) does occur at threshold. However, the two points are decoupled in the generalized Weibull function defined in Equation 12. For the Quick version of the Weibull function (Equation 13 with k = 0.5, placing threshold halfway up the PF), the threshold is below the point of maximum slope; the derivative of P(x_t) at threshold is

β′_thresh = dP(x_t)/dx_t = (1 − γ) β log_e(2)/2 = 0.347 (1 − γ) β,

which is slightly different from the maximum slope as given by Equation 15. Similarly, when threshold is defined at d′ = 1, the threshold is not at the point of maximum slope. People who fit psychometric functions would probably prefer reporting slopes at the detection threshold rather than at the point of maximum slope.

3. One of the most important considerations in selecting a point at which to measure threshold is the question of how to minimize the bias and standard error of the threshold estimate. The goal of adaptive methods is to place trials at one level. In order to avoid a threshold bias due to an improper estimate of PF slope, the test level should be at the defined threshold. The variance of the threshold estimate when the data are concentrated at a single level (as with adaptive procedures) is given by Gourevitch and Galanter (1967) as

var(Y) = P(1 − P) / {N [dP(y_t)/dy_t]²}.  (19)

If the binomial error factor of P(1 − P)/N were not present, the optimal placement of trials for measuring threshold would be at the point of maximum slope. However, the presence of the P(1 − P) factor shifts the optimal point to a higher level. Wichmann and Hill (2001b) consider how trial placement affects threshold variance for the method of constant stimuli. This topic will be considered in Section II.

Connecting the Weibull and d′ Functions
Figures 5–7 of Strasburger (2001a) compare the log–log d′ slope, b, with the Weibull PF slope, β. Strasburger's connection of b = 0.88β is problematic, since the d′ version of the Weibull PF does not have a fixed log–log slope. An improved d′ representation of the Weibull function is the topic of the present section. A useful parameterization for d′, which I shall refer to as the Stromeyer–Foley function, was introduced by Stromeyer and Klein (1974) and used extensively by Foley (1994):

d′(x_t) = x_t^b / [a + (1 − a) x_t^(b + w − 1)],  (20)

where x_t is the stimulus strength in threshold units, as in Equation 8. The factors with a in the denominator are present so that d′ equals unity at threshold (x_t = 1, or x = α). At low x_t, d′ ≈ x_t^b/a. The exponent b (the log–log slope of d′ at very low contrasts) controls the amount of facilitation near threshold. At high x_t, Equation 20 becomes d′ ≈ x_t^(1−w)/(1 − a). The parameter w is the log–log slope of the test threshold versus pedestal contrast function (the TvC, or "dipper," function) at strong pedestals. One must be a bit cautious, however, because in yes/no procedures the TvC slope can be decoupled from w if the signal detection ROC curve has nonunity slope (Stromeyer & Klein, 1974). The parameter a controls the point at which the d′ function begins to saturate. Typical values of these unitless parameters are b = 2, w = 0.5, and a = 0.6 (Yu, Klein, & Levi, 2001). The function in Equation 20 accelerates at low contrast (log–log slope of b) and decelerates at high contrasts (log–log slope of 1 − w), in general agreement with a broad range of data.

For 2AFC, z = d′/√2, so from Equation 3 the connection between d′ and probability correct is

P(x_t) = .5 + .5 erf[d′(x_t)/2].  (21)

To establish the connection between the PFs specified in Equations 12 and 20–21, one must first have the two agree at threshold. For 2AFC with d′ = 1, the threshold at P = .7602 can be enforced by choosing k = .4795 (see the discussion following Equation 18C), so that Equations 1 and 12 become

P(x_t) = 1 − (1 − .5) .4795^(x_t^β).  (22)

Modifying k as in Equation 22 leaves the Weibull shape unchanged and shifts only the definition of threshold. If b = 1.06β, w = 1 − 0.39β, and a = 0.614, then for all values of β the Weibull and d′ functions (Equations 21 and 22 vs. Equation 12) differ by less than 0.0004 for all stimulus strengths. At very low stimulus strengths, b = β; the value b = 1.06β is a compromise for getting an overall optimal fit.
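The closeness of this match is easy to check. The sketch below (my own check, not code from the paper) evaluates Equations 20–22 for β = 2 with the parameter mapping just given and reports the largest discrepancy.

% Minimal sketch: compare the 2AFC Weibull (Equation 22) with the
% Stromeyer-Foley d' function passed through Equation 21, for beta = 2.
beta = 2;
b = 1.06*beta;  w = 1 - 0.39*beta;  a = 0.614;    % parameter mapping from the text
xt = linspace(0.05, 3, 500);                      % stimulus strength in threshold units
dprime = xt.^b ./ (a + (1 - a)*xt.^(b + w - 1));  % Equation 20
P_sf = 0.5 + 0.5*erf(dprime/2);                   % Equation 21
P_weib = 1 - 0.5*0.4795.^(xt.^beta);              % Equation 22
maxdiff = max(abs(P_sf - P_weib))                 % on the order of 1e-4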

Strasburger (2001a) is concerned with the same issue. In his Table 1, he reports d′ log–log slopes of


[0.8847, 1.8379, 3.131, 4.421] for β = [1, 2, 3.5, 5]. If the d′ slope is taken to be a measure of the d′ exponent b, then the ratio b/β is [0.8847, 0.9190, 0.8946, 0.8842] for the four β values. Our value of b/β = 1.06 differs substantially from 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a). Pelli (1985) and Strasburger (2001a) used d′ functions with a = 1 in Equation 20. With a = 0.614, our d′ function (Equation 20) starts saturating near threshold, in agreement with experimental data. The saturation lowers the effective value of b. I was very surprised to find that the Stromeyer–Foley d′ function did such a good job in matching the Weibull function across the whole range of β and x. For a long time I had thought a different fit would be needed for each value of β.

For the same Weibull function in a yes/no method (false alarm rate = 50%), the parameters b and w are identical to the 2AFC case above. Only the value of a shifts, from a = 0.614 (2AFC) to a = 0.54 (yes/no). In order to make x_t = 1 at d′ = 1, k becomes 0.3173 (see Equation 18C) instead of 0.4795 (2AFC). As with 2AFC, the difference between the Weibull and d′ functions is less than 0.004 (<0.4%) for all stimulus strengths and all values of β, and for both a linear and a logarithmic abscissa. These values mean that for both the yes/no and the 2AFC methods, the d′ function (Equation 20) corresponding to a Weibull function with β = 2 has a low-contrast log–log slope of b = 2.12 and a high-contrast log–log slope of 1 − w = 0.78. The high-contrast TvC (test contrast vs. pedestal contrast discrimination function, sometimes called the "dipper" function) log–log slope would be w = 0.22.

I also carried out a similar analysis, asking what cumulative normal σ is best matched to the Weibull. I found σ = 1.1720/β. However, the match is poor, with errors of 4% (rather than <0.04% in the preceding analysis). The poor fit of the two functions makes the fitting procedure sensitive to the weighting used in the fitting procedure. If one uses a weighting factor of [P(1 − P)/N], where P is the PF, one will find a quite different value for σ than if one ignored the weighting or if one chose the weighting on the basis of the data rather than the fitting function.

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section, inspired by Linschoten et al. (2001), is an examination of various experimental methods for gathering data to estimate thresholds. Linschoten et al. compare three 2AFC methods for measuring smell and taste thresholds: a maximum-likelihood adaptive method, an up–down staircase method, and an ascending method of limits. The special aspect of their experiments was that in measuring smell and taste, each 2AFC trial takes between 40 and 60 sec (Lew Harvey, personal communication) because of the long duration needed for the nostrils or mouth to recover their sensitivity. The purpose of the Linschoten et al. paper is to examine which of the three methods is most accurate when the number of trials is low. Linschoten et al. conclude that the 2AFC method is a good method for this task. My prior experience with forced choice tasks and low numbers of trials (Manny & Klein, 1985; McKee et al., 1985) made me wonder whether there might be better methods to use in circumstances in which the number of trials is limited. This topic is considered in Section II.

Compare 2AFC to Yes/No
The commonly accepted framework for comparing 2AFC and yes/no threshold estimates is signal detection theory. Initially, I will make the common assumption that the d′ function is a power function (Equation 20 with a = 1):

d′ = c x_t^β = c exp(β y_t).  (23)

Later in this section, the experimentally more accurate form, Equation 20 with a ≠ 1, will be used. The variance of the threshold estimate will now be calculated for 2AFC and yes/no for the simplest case of trials placed at a single stimulus level y, the goal of most adaptive methods. The PF slope relates ordinate variance to abscissa variance (from Gourevitch & Galanter, 1967):

var(Y) = var(d′) / (dd′/dy_t)²,  (24)

where Y = y − y_t is the threshold estimate for a natural log abscissa, as given in Equation 9. This formula for var(Y) is called the asymptotic formula, in that it becomes exact in the limit of a large number of total trials, N. Wichmann and Hill (2001b) show that for the method of constant stimuli, a proper choice of testing levels brings one to the asymptotic region for N = 120 (the smallest value that they explored). An improper choice of testing levels requires a larger N in order to get to the asymptotic region. Equation 24 is based on having the data at a single level, as occurs asymptotically in adaptive procedures. In addition, the definition of threshold must be at the point being tested. If levels are tested away from threshold, there will be an additional contribution to the threshold variance (Finney, 1971; McKee et al., 1985) if the slope is a variable parameter in the fitting procedure. Equation 24 reaches its asymptotic value more rapidly for adaptive methods, with focused placement of trials, than for the constant stimulus method, when slope is a free parameter.

Equation 24 needs analytic expressions for the derivative and the variance of d′. The derivative is obtained from Equation 23:

dd′/dy_t = β d′.  (25)

The variance of d′ for both yes/no (d′ = z_hit + z_correct rejection) and 2AFC (d′ = √2 z_ave) is

var(d′) = 2 var(z) = 4π exp(z²) var(P).  (26)


For the yes/no case, Equation 26 is based on a unity-slope ROC curve, with the subject's criterion at the negative diagonal of the ROC curve, where the hit rate equals the correct rejection rate. This would be the most efficient operating point. The variance for a nonunity-slope ROC curve was considered by Klein, Stromeyer, and Ganz (1974). From binomial statistics,

var(P) = P(1 − P)/n,  (27)

with n = N for 2AFC and n = N/2 for yes/no, since for this binary yes/no task the number of signal and blank trials is half the total number of trials.

Putting all of these factors together gives

var(Y) = 4π exp(z²) P(1 − P) / [n (β d′)²]  (28)

for both the yes/no and the 2AFC cases. The only difference between the two cases is the definition of n and the definition of d′ (d′_2AFC = 2^0.5 z and d′_YN = 2z for a symmetric criterion). When this is expressed in terms of N (number of trials) and z (z score of probability correct), we get

var(Y) = 2π exp(z²) P(1 − P) / [N (β z)²]  (29)

for both 2AFC and yes/no. That is, when the percent corrects are all equal (P_2AFC = P_hit = P_correct rejection), the threshold variances for 2AFC and yes/no are equal. The minimum value of var(Y) is 1.64/(Nβ²), occurring at z = 1.57 (P = 94%). This value of 94% correct is the optimum value at which to test for both 2AFC and yes/no tasks, independent of β.
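Equation 29 can be minimized in a few lines; the sketch below (mine) locates the optimum near z = 1.57 (P ≈ 94%) with a value of about 1.64.

% Minimal sketch: optimal testing level implied by Equation 29.
z = linspace(0.1, 3, 2000);                       % z score of probability correct
P = 0.5*(1 + erf(z/sqrt(2)));
v = 2*pi*exp(z.^2).*P.*(1 - P)./z.^2;             % var(Y)*N*beta^2, Equation 29
[vmin, imin] = min(v);
fprintf('min var(Y)*N*beta^2 = %.2f at z = %.2f (P = %.2f)\n', vmin, z(imin), P(imin));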

This result, that the threshold variance is the same for 2AFC and yes/no methods, seems to disagree with Figure 4 of McKee et al. (1985), where the standard errors of the threshold estimates are plotted for 2AFC and yes/no as a function of the number of trials. The 2AFC standard error (their Figure 4) is twice that of yes/no, corresponding to a factor of four in variance. However, the McKee et al. analysis is misleading, since all they (we) did for the yes/no task was to use Equation 2 to expand the PF to a 0% to 100% range, thereby doubling the PF slope. That rescaling is not the way to convert a 2AFC detection task to the yes/no task of the same detectability. A signal detection methodology is needed for that conversion, as was done in the present analysis.

One surprise is that the optimal testing probability is independent of the PF slope β. This finding is a general property of PFs in which the dependence on the slope parameter has the form x^β. The 94% level corresponds to d′_YN = 3.15 and d′_2AFC = 2.23. For the commonly used P = 75% (z = 0.675, d′_YN = 1.35, d′_2AFC = 0.954), var(Y) = 4.08/(Nβ²), which is about 2.5 times the optimal value obtained by placing the stimuli at 94%. Green (1990) had previously pointed out that the 94% and 92% levels were the optimal points to test for the cumulative normal and logit functions, respectively, because of the minimum threshold estimate variability when one is testing at that point on the PF. Other PFs can have optimal points at slightly lower levels, but these are still above the levels to which adaptive methods are normally directed.

It is also useful to compare 2AFC and yes/no at the same d′, rather than at their optimal points. In that case, for d′ values fixed at [1, 2, 3], the ratio of the 2AFC variance to the yes/no variance is [1.82, 1.36, 0.79]. At low d′ there is an advantage for the 2AFC method, and that reverses at high d′, as is expected from the optimal d′ being higher for yes/no than for 2AFC. In general, one should run the yes/no method at d′ levels above 2.0 (84% correct in the blank and signal intervals corresponds to d′ = 2).

The equality of the optimal var(Y) for 2AFC and yes/no methods is not only surprising, it seems paradoxical, as can be seen by the following gedankenexperiment. Suppose subjects close their eyes on the first interval of the 2AFC trial and base their judgments on just the second interval, as if they were doing a yes/no task, since blanks and stimuli are randomly intermixed. Instead of saying "yes" or "no," they would respond "Interval 2" or "Interval 1," respectively, so that the 2AFC task would become a yes/no task. The paradox is that one would expect the variance to be worse than for the eyes-open 2AFC case, since one is ignoring information. However, Equation 29 shows that the 2AFC and yes/no methods have identical optimal variance. I suspect that the resolution of this paradox is that in the yes/no method with a stable criterion, the optimal variance occurs at a higher d′ value (3.15) than the 2AFC value (2.23). The d′ power law function that I chose does not saturate at large d′ levels, giving a benefit to the yes/no method.

In order to check whether the paradox still holds with a more realistic PF (a d′ function that saturates at large d′), I now compare the 2AFC and yes/no variance using a Weibull function with a slope β. In order to connect 2AFC and yes/no, I need to use signal detection theory and the Stromeyer–Foley d′ function (Equation 20) with the "Weibull parameters" b = 1.06β, a = 0.614, and w = 1 − 0.39β. Following Equation 22, I pointed out that with these parameter values, the difference between the 2AFC Weibull function and the 2AFC d′ function (converted to a probability ordinate) is less than 0.004. After the derivative, Equation 25, is appropriately modified, we find that the optimal 2AFC variance is 3.42/(Nβ²). For the yes/no case, the optimal variance is 4.02/(Nβ²). This result resolves the paradox. The optimal 2AFC variance is about 18% lower than the yes/no variance, so closing one's eyes in one of the intervals does indeed hurt.

We shouldn't forget that if one counts stimulus presentations, N_p, rather than trials, N, the optimal 2AFC variance becomes 6.84/(N_p β²), as opposed to 4.02/(N_p β²) for yes/no. Thus, for the Linschoten experiments, with each stimulus presentation being slow, the yes/no methodology is more accurate for a given number of stimulus presentations, even at low d′ values. Kaernbach (1990) compares 2AFC


and yes/no adaptive methods with simulations and actual experiments. His abscissa is number of presentations, and his results are compatible with our findings.

The objective yes/no method that I have been discussing is inefficient, since 50% of the trials are blanks. Greater efficiency is possible if several points on the PF are measured (method of constant stimuli), with the number of blank trials the same as at the other levels. For example, if five levels of stimuli were measured, including blanks, then the blanks would be only 20% of the total, rather than 50%. For the 2AFC paradigm, the subject would still have to look at 50% of the presentations being blanks. The yes/no efficiency can be further improved by having the subject respond to the multiple stimuli with multiple ratings, rather than a binary yes/no response. This rating scale method of constant stimuli is my favorite method, and I have been using it for 20 years (Klein & Stromeyer, 1980). The multiple ratings allow one to measure the psychometric function shape for d′s as large as 4 (well into the dipper region where stimuli are slightly above threshold). It also allows one to measure the ROC slope (presence of multiplicative noise). Simulations of its benefits will be presented in Klein (2002).

Insight into how the number of responses affects threshold variance can be obtained by using the cumulative normal PF (Equation 17) considered by Kaernbach (2001b) and Miller and Ulrich (2001), with γ = 0. Typically this is the case of yes/no discrimination, but it can also be 2AFC discrimination, with an ordinate going from 0% to 100%, as I have discussed in connection with Figure 3. For a cumulative normal PF with a binary response and for a single stimulus level being tested, the threshold variance is (from Equation 19) as follows:

var(thresh) = var(P)/(dP/dy)² = 2π σ² exp(z²) P(1 − P)/N,  (30)

from Equation 27 and forthcoming Equations 40 and 41. The optimal level to test is at P = .5, so the optimal variance becomes

var(thresh) = (π/2) σ²/N.  (31)

If, instead of a binary response, the observer gives an analog response (many, many rating categories), then the variance is the familiar connection between standard deviation and standard error:

var(thresh) = σ²/N.  (32)

With four response categories, the variance is 1.19σ²/N, which is closer to the analog than to the binary case. If stimuli with multiple strengths are intermixed (method of constant stimuli), then the benefit of going from a binary response to multiple responses is greater than that indicated in Equations 31–32 (Klein, 2002).
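The binary-response penalty in Equations 30 and 31 can be seen by evaluating Equation 30 across test levels, as in the brief sketch below (my own, with σ²/N factored out).

% Minimal sketch: binary-response threshold variance (Equation 30) in units of sigma^2/N.
z = linspace(-2, 2, 401);                         % z score of the test level
P = 0.5*(1 + erf(z/sqrt(2)));
vbin = 2*pi*exp(z.^2).*P.*(1 - P);                % Equation 30 with sigma = 1, N = 1
min(vbin)                                         % pi/2 = 1.57 at P = .5 (Equation 31); an analog response gives 1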

A brief list of other problems with the 2AFC method applied to detection comprises the following.

1. 2AFC requires the subject to memorize and then compare the results of two subjective responses. It is cognitively simpler to report each response as it arrives. Subjects without much experience can easily make mistakes (lapses), simply becoming confused about the order of the stimuli. Klein (1985) discusses memory load issues relevant to 2AFC.

2. The 2AFC methodology makes various types of analysis and modeling more difficult, because it requires that one average over all criteria. The response variable in 2AFC is the Interval 2 activation minus the Interval 1 activation (see the abscissa in Figure 2). This variable goes from minus infinity to plus infinity, whereas the yes/no variable goes from zero to infinity. It therefore seems more difficult to model the effects of probability summation and uncertainty for 2AFC than for yes/no.

3. Models that relate psychophysical performance to underlying processes require information about how the noise varies with signal strength (multiplicative noise). The rating scale method of constant stimuli not only measures d′ as a function of contrast, it also is able to provide an estimate of how the variance of the signal distribution (the ROC slope) increases with contrast. The 2AFC method lacks that information.

4. Both the yes/no and the 2AFC methods are supposed to eliminate the effects of bias. Earlier, I pointed out that in my recent 2AFC discrimination experiments, when the stimulus is near threshold, I find I have a strong bias in favor of responding "Stimulus 2." Until I learned to compensate for this bias (not typically done by naïve subjects), my d′ was underestimated. As I have discussed, it is possible to compensate for this bias, but that is rarely done.

Improving the Brownian (Up–Down) Staircase
Adaptive methods with a goal of placing trials at an optimal point on the PF are efficient. Leek (2001) provides a nice overview of these methods. There are two general categories of adaptive methods: those based on maximum (or mean) likelihood, and those based on simpler staircase rules. The present section is concerned with the older staircase methods, which I will call Brownian methods. I used the name "Brownian" because of the up–down staircase's similarity to one-dimensional Brownian motion, in which the direction of each step is random. The familiar up–down staircase is Brownian because at each trial the decision about whether to go up or down is decided by a random factor governed by probability P(x). I want to make this connection to Brownian motion because I suspect that there are results in the physics literature that are relevant to our psychophysical staircases. I hope these connections get made.

A Brownian staircase has a minimal memory of the run history: it needs to remember the previous step (including direction) and the previous number of reversals or number of trials (used for deciding when to stop and when to decrease step size). A typical Brownian rule is to decrease signal strength by one step (like 1 dB) after a correct response and to increase it three steps after an incorrect response (a one down–three up rule).


I was pleased to learn recently that this rule works nicely for an objective yes/no task (Kaernbach, 1990), as well as for 2AFC, as discussed earlier.

In recent years, likelihood methods (Pentland, 1980; Watson & Pelli, 1983) have become popular and have somewhat displaced Brownian methods. There is a widespread feeling that likelihood methods have substantially smaller threshold variance than do staircases in which thresholds are obtained by simply averaging the levels tested. Given the importance of this issue for informing researchers about how best to measure thresholds, it is surprising that there have been so few simulations comparing staircases and optimal methods. Pentland's (1980) results, showing very large threshold variance for a staircase method, are so dramatically different from the results of my simulations that I am skeptical. The best previous simulations of which I am aware are those of Green (1990), who compared the threshold variance from several 2AFC staircase methods with the ideal variance (Equation 24). He found that as long as the staircase rule placed trials above 80% correct, the ratio of observed to ideal threshold variance was about 1.4. This value is the square of the inverse of Green's finding that the ratio of the expected to the observed sigma is about .85.

In my own simulations (Klein, 2002), I found that as long as the final step size was less than 0.2σ, the threshold variance from simply averaging over levels was close to the optimal value. Thus the staircase threshold variance is about the same as the variance found by other optimal methods. One especially interesting finding was that the common method of averaging an even number of reversal points gave a variance that was about 10% higher than averaging over all levels. This result made me wonder how the common practice of averaging reversals ever got started. Since Green (1990) averaged over just the reversal points, I suspect that his results are an overestimate of the staircase variance. In addition, since Green specifies step size in decibels, it is difficult for me to determine whether his final step size was small enough. Since the threshold variance is inversely related to slope, it is important to relate step size to slope rather than to decibels. The most natural reference unit is σ, the standard deviation associated with the cumulative normal. In summary, I suggest that this general feeling of distrust of the simple Brownian staircase is misguided and that simple staircases are often as good as likelihood methods. In the section Summary of Advantages of Brownian Staircase, I point out several advantages of the simple staircase over the likelihood methods.
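For readers who want to try this kind of comparison, a minimal Brownian staircase simulation is sketched below. It is my own illustration (not the Klein, 2002, code): a one down–three up rule run on a cumulative normal PF, with threshold estimated by averaging the tested levels after the early trials are discarded.

% Minimal sketch: one down-three up Brownian staircase with threshold
% estimated by averaging the tested levels.
rng(1);                                           % reproducible example
sigma = 1;  ntrials = 200;  step = 0.2*sigma;     % final-step-size guideline from the text
Pfun = @(x) 0.5*(1 + erf(x/(sigma*sqrt(2))));     % PF with its 50% point at 0
level = 2*sigma;                                  % deliberately poor starting point
levels = zeros(1, ntrials);
for t = 1:ntrials
    levels(t) = level;
    if rand < Pfun(level)
        level = level - step;                     % one step down after a correct response
    else
        level = level + 3*step;                   % three steps up after an error
    end
end
threshEst = mean(levels(21:end))                  % near the 75% point, about 0.67*sigma here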

A qualification should be made to the claim above that averaging levels of a simple staircase is as good as a fancy likelihood method. If the staircase method uses a too-small step size at the beginning, and if the starting point is far from threshold, then the simple staircase can be inefficient in reaching the threshold region. However, at the end of that run the astute experimenter should notice the large discrepancy between the starting point and the threshold, and the run should be redone with an improved starting point or with a larger initial step size that will be reduced as the staircase proceeds.

Increase number of 2AFC response categories. Independent of the type of adaptive method used, the 2AFC task has a flaw whereby a series of lucky guesses at low stimulus levels can erroneously drive adaptive methods to levels that are too low. Some adaptive methods with a small number of trials are unable to fully recover from a string of lucky guesses early in the run. One solution is to allow the subject to have more than two responses. For reasons that I have never understood, people like to have only two response categories in a 2AFC task. Just as I am reluctant to take part in 2AFC experiments, I am even more reluctant to take part in two-response experiments. The reason is simple: I always feel that I have within me more than one bit of information after looking at a stimulus. I usually have at least four or more subjective states (2 bits) for a yes/no task (HN, LN, LY, HY, where H and L are high and low confidence and N and Y are did not and did see it). For 2AFC, my internal states are H1, L1, L2, H2, for high and low confidence on Intervals 1 and 2. Kaernbach (2001a), in this special issue, does an excellent job of justifying the low-confidence response category. He provides solid arguments, simulations, and experimental data on how a low-confidence category can minimize the "lucky guess" problem.

Kaernbach (2001a) groups the L1 and L2 categories together into a "don't know" category and argues persuasively that the inclusion of this third response can increase the precision and accuracy of threshold estimates while also leaving the subject a happier person. He calls it the "unforced choice" method. Most of his article concerns the understandable fear of introducing a response bias, exactly what 2AFC was invented to avoid. He has developed an intelligent rule for what stimulus level should follow a "don't know" response. Kaernbach shows, with both simulations and real data, that with his method this potential response bias problem has only a second-order effect on threshold estimates (similar to the 2AFC interval bias discussed around Equation 7), and its advantages outweigh the disadvantages. I urge anyone using the 2AFC method to read it. I conjecture that Kaernbach's method is especially useful in trimming the lower tail of the distribution of threshold estimates and also in reducing the interval bias of 2AFC tasks, since the low-confidence rating would tend to be used for trials on which there is the most uncertainty.

Although Kaernbach has managed to convince me of the usefulness of his three-response method, he might have trouble convincing others because of the response bias question. I should like to offer a solution that might quell the response bias fears. Rather than using 2AFC with three responses, one can increase the number of responses to four (H1, L1, L2, H2), as discussed two paragraphs above. The L rating can be used when subjects feel they are guessing.

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A 5-up–1-down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct


of 5/6 = 83.33%. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = 2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low-confidence rating is that when one is below threshold, giving a low-confidence rating will prevent long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.
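The convergence level and the guessing shift quoted in this paragraph follow from a one-line expected-value calculation, sketched here for illustration.

% Minimal sketch: convergence point and guessing drift of a 5-up/1-down rule.
down = 1; up = 5;                                 % levels moved after correct / wrong responses
Pconverge = up/(up + down)                        % 5/6 = .8333, where the expected shift is zero
guessShift = 0.5*(-down) + 0.5*up                 % +2 levels per trial when guessing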

Another reason not to be fearful of the multiple-response 2AFC method is that after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations, discussed earlier in this section, and Green's (1990) reveal that averaging over the levels (excluding a number of initial levels to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging-levels method disagree, that run could be thrown out.

Summary of advantages of Brownian staircase. Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since they give the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (Psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on one-step-ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape.

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope β or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low-spatial-frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates of mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data-gathering group argued that we should use a data-gathering methodology that would allow estimation of the PF slope for each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin, and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules: Goodness-of-Fit
There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation
Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions, in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation. Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits
The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals in order to avoid the starting level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis
The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′ function, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

pdf(i) = [P(i) − P(i − 1)] / [s(i) − s(i − 1)],  (33)

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

μ_r = [1/(r + 1)] Σ_i pdf(i) [s(i)^(r+1) − s(i − 1)^(r+1)].  (34)

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus,

f[s(i)] − f[s(i − 1)] = f′(s_mid) [s(i) − s(i − 1)],  (35)

where f(s) = s^(r+1) and f′(s_mid) = (r + 1) s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

μ_r = Σ_i pdf(i) s_mid^r Δs,  (36)

where Δs = s(i) − s(i − 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

μ_0 = Σ_i pdf(i) Δs = P(i_max) − P(i_min),  (37)

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P(i_min), must be zero, as Miller and Ulrich (2001) point out.


The next simplest SK moment is for r = 1:

μ_1 = Σ_i pdf(i) [s(i)² − s(i − 1)²]/2 = Σ_i [P(i) − P(i − 1)] [s(i) + s(i − 1)]/2,  (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or of the PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of the PSE for discrimination).

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: if this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method of constant stimuli data in which the PF goes from 0% to 100%. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels, and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, −.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to these unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than 33%, and P2 can be any number greater than 67%. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide set of possible alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the center of the PF is given by Equation 38 to be μ_1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ_1 = 2.97. However, given the data, with 49 of 98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ_1 = 4. Thus we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
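The two centroid values in this example are easy to reproduce. The sketch below is my own illustration of the Spearman–Kärber calculation (Equations 33 and 38) together with a trial-weighted pool-adjacent-violators monotonizer in the spirit of Miller and Ulrich's procedure; all names are mine, and it is not the appendix code.

% Minimal sketch: SK centroid (Equation 38) for the five-level example,
% before and after a trial-weighted monotonization of the probabilities.
s = 1:5;  n = [1 1 1 98 1];  c = [0 1 1 49 1];
P = c./n;                                         % [0 1 1 .5 1]
mu1 = @(P,s) sum(diff(P).*(s(2:end) + s(1:end-1))/2);   % Equation 38
mu1_raw = mu1(P, s)                               % 2.00
Pm = P; cm = c; nm = n; members = num2cell(1:numel(P));
i = 1;
while i < numel(Pm)
    if Pm(i) > Pm(i+1)                            % violation: pool levels, weighted by trials
        cm(i) = cm(i) + cm(i+1);  nm(i) = nm(i) + nm(i+1);
        Pm(i) = cm(i)/nm(i);      members{i} = [members{i}, members{i+1}];
        cm(i+1) = []; nm(i+1) = []; Pm(i+1) = []; members(i+1) = [];
        i = max(i - 1, 1);                        % re-check the previous pair
    else
        i = i + 1;
    end
end
Pmono = zeros(size(P));
for j = 1:numel(Pm), Pmono(members{j}) = Pm(j); end   % [0 .51 .51 .51 1]
mu1_mono = mu1(Pmono, s)                          % 2.97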

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being 1% and 99%. The probabilities at the sample points are P(i) = .01, .1222,


.50, .8778, and .99, corresponding to z scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ_1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

(39)

where var(P) is given by Equation 27 and PF slope is

(40)

where for the cumulative normal dzi dxi 5 1s where sis the standard deviation of the pdf and

(41)

from Equation 3A By combining these equations one gets

(42)

Note that if all trials were placed at the optimal stimuluslocation of z 5 0 (P 5 5) Equation 42 would becomes2p 2Ni as discussed earlier When all five levels arecombined the total variance is smaller than each individ-ual variance as given by the variance summation formula

(43)

Finally upon plugging in the five values of zi rangingfrom ndash2328 to 12328 and Pi ranging from 1 to 99the standard error of the PF location is

(44)

This value is precisely the value obtained in our MonteCarlo simulations of the nonparametric SK method andalso by the slope fixed parametric probit analysis Thistriple confirmation shows that the SK method is excel-lent in the realm where it applies (equal number of trialsat levels that span 0 to 100) It also confirms our sim-ple analytic formula
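
As a check on the arithmetic, Equations 42–44 can be evaluated directly. The following lines are a minimal sketch, assuming σ = 1 (the z-score units used above).

z = [-2.328 -1.164 0 1.164 2.328];               % the five tested levels (sigma = 1)
P = [.01 .1222 .5 .8778 .99];                    % probabilities at those levels
Ni = 8;                                          % trials per level
vari = 2*pi * P .* (1-P) .* exp(z.^2) / Ni;      % Equation 42: per-level variance of the PSE
varTot = 1 / sum(1 ./ vari);                     % Equation 43
SE = sqrt(varTot)                                % Equation 44: about 0.284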

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

var_tot = SE² = 3.24/N_tot.   (45)

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that, of the five stimulus levels, the two at P = 1% and 99% are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = .5). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit
Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

X² = Σ_i [p(data_i) − P_i]²/var(P_i),   (46)

where p(data_i) and P_i are the percent correct of the datum and of the PF at level i. The binomial var(P_i) is our old friend

var(P_i) = P_i (1 − P_i)/N_i,   (47)

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

X² = Σ_i residual_i²,   (48)

where the residuals are the difference between the predicted and observed probabilities divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

residual_i = [p(data_i) − P_i]/SE_i.   (49)

The second method is to use the normalized log-likelihood that Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2 Σ_i N_i {p(data_i) ln[p(data_i)/P_i] + [1 − p(data_i)] ln[(1 − p(data_i))/(1 − P_i)]}.   (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

prob_chisq = Gamma(df/2, χ²/2),   (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that, for a reasonably large number of degrees of freedom (like df > 10), the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z score.
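
The two numerical checks just quoted can be reproduced directly in Matlab (a two-line sketch; note that gammainc takes the χ²/2 value as its first argument and df/2 as its second).

gammainc(18/2, 18/2)          % Gamma(9, 9): about .54, close to the asymptotic value of .5
gammainc((18 + 6)/2, 18/2)    % Gamma(9, 12): about .845, close to the .841 for a unity z score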

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square, given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100%, and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46,

X_i² = [p(data_i) − P_i]² N_i/[P_i (1 − P_i)] = (O_i − E_i)²/[E_i (1 − P_i)],   (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

wt_i = {N_i!/[O_i! (N_i − O_i)!]} × P_i^O_i (1 − P_i)^(N_i − O_i).   (53)

The expected value <X_i²> is

<X_i²> = Σ_{O_i} wt_i (O_i − E_i)²/[E_i (1 − P_i)].   (54)

A bit of algebra, based on Σ_{O_i} wt_i = 1, gives a surprising result:

<X_i²> = 1.   (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

<LL_i> = 2 Σ_{O_i} wt_i {O_i ln(O_i/E_i) + (N_i − O_i) ln[(N_i − O_i)/(N_i − E_i)]}.   (56)

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

<LL_i> = 2 [0.25 × 2 ln(2) + 0.25 × 2 ln(2)] = 2 ln(2) = 1.386.   (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the 1 − P_i factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85, the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.
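
Both limiting values can be checked with a few lines that enumerate the binomial outcomes directly. This is a sketch in the spirit of Matlab Code 2, not that program; it assumes the N_i = 2, P_i = .5 case discussed above.

Ni = 2;  Pi = .5;  Ei = Ni*Pi;                        % the case of Wichmann and Hill's Figure 7a
O  = 0:Ni;                                            % all possible numbers correct at one level
wt = arrayfun(@(o) nchoosek(Ni,o), O) .* Pi.^O .* (1-Pi).^(Ni-O);   % Equation 53
X2 = sum( wt .* (O-Ei).^2 / (Ei*(1-Pi)) )             % Equation 54: exactly 1 (Equation 55)
t1 = O      .* log(max(O,eps)    ./ Ei);              % O*ln(O/E), with 0*ln(0) treated as 0
t2 = (Ni-O) .* log(max(Ni-O,eps) ./ (Ni-Ei));         % (N-O)*ln((N-O)/(N-E))
LL = 2 * sum( wt .* (t1 + t2) )                       % Equation 56: 2*ln(2) = 1.386 (Equation 57)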

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method 1 by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
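
As an illustration of approach 3, here is a minimal parametric-bootstrap sketch. It is not Foster and Bischof's implementation; it assumes the fixed-slope cumulative normal design of the constant-stimuli simulations earlier in this commentary, and it refits only the threshold, by a coarse grid search on the likelihood.

s = [-2.328 -1.164 0 1.164 2.328];  Ni = 8;             % the 5-level design used above
Pfit = .5 + .5*erf(s/sqrt(2));                          % best-fitting PF (unity slope, zero threshold)
nBoot = 1000;  thr = zeros(1, nBoot);  tGrid = -1:.01:1;
for r = 1:nBoot
    nc = sum(rand(Ni, numel(s)) < repmat(Pfit, Ni, 1)); % resampled binomial data
    LL = zeros(size(tGrid));
    for k = 1:numel(tGrid)                              % refit threshold only, slope fixed
        Pk = .5 + .5*erf((s - tGrid(k))/sqrt(2));
        LL(k) = sum(nc.*log(Pk) + (Ni-nc).*log(1-Pk));
    end
    [mx, kBest] = max(LL);
    thr(r) = tGrid(kBest);
end
std(thr)                                                % bootstrap SE of threshold; compare with Equation 44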

Wichmann and Hill and Strasburger: Lapses and Bias Calculation
The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels, with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used (probit [cumulative normal], logit, and Weibull, corresponding to the rows of Table 2), and two error metrics were used (chi-square minimization and likelihood maximization), corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                  No Lapse         No Lapse         Lapse            Lapse
Psychometric      λ = .01          λ = .0001        λ = .01          λ = .0001
Function          L      χ²        L      χ²        L      χ²        L      χ²
Probit            1.05   1.05      1.00   0.99      1.05   1.05      0.36   0.34
Logit             1.76   1.76      1.66   1.66      1.76   1.76      0.84   0.72
Weibull           1.01   1.01      0.97   0.97      1.01   1.01      0.26   0.26

Note: The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done, and the precise definitions of the fitting functions, are given in the Matlab code. The functions being fit are

P = .5 + .5 erf(z/√2)   (probit, as in Equation 3B),   (58A)
P = 1/[1 + exp(−z)]   (logit),   (58B)
P = 1 − exp[−exp(z)]   (Weibull, as in Equation 1),   (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed the following: (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
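
To make the fitting concrete, here is a minimal likelihood-maximization sketch for the probit row of Table 2. It is not Matlab Code 3; it assumes the parameterization of Equation 58A with symmetric asymptotes γ = λ, uses the 0.3 initial slope mentioned above, and takes a zero initial threshold as an additional assumption.

x = [0 1 6];  Ncor = [25 42 49];  Ntot = [50 50 50];     % the "Lapse" data set of Table 2
lambda = .0001;                                          % the small lapse parameter of the fit
probit = @(z) .5 + .5*erf(z/sqrt(2));                    % Equation 58A
PF = @(q) lambda + (1 - 2*lambda) * probit(q(2)*(x - q(1)));      % gamma = lambda
negLL = @(q) -sum( Ncor.*log(PF(q)) + (Ntot - Ncor).*log(1 - PF(q)) );
q = fminsearch(negLL, [0 .3]);                           % q = [threshold p1, slope p2]
q(2)                                                     % slope estimate; cf. the 0.36 entry in Table 2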

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally, the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods
In the previous section, I examined how lapses produce

a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope, b, of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method, using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled, so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing, there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than average the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF, but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
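
The flavor of such an enumeration can be conveyed in a few lines. This is a sketch only, not Matlab Code 4; it reproduces just the shift comparison (panels a and d of Figure 5), without the monotonizing and extrapolating scripts, and it assumes a flat PF at P = .5, so that all response sequences are equally likely and the enumeration gives the exact expectation.

nT = 10;                                     % trials per staircase
grid = -nT:0.1:nT;                           % level grid fine enough to hold shifted levels
nc  = zeros(size(grid)); nt  = zeros(size(grid));    % pooled correct/total, no shift
ncS = zeros(size(grid)); ntS = zeros(size(grid));    % pooled correct/total, after shift
seqs = dec2bin(0:2^nT-1) - '0';              % every possible response sequence (flat PF, P = .5)
for s = 1:size(seqs,1)
    r = seqs(s,:);                           % 1 = correct, 0 = incorrect
    lev = cumsum([0, -(2*r(1:end-1)-1)]);    % Brownian staircase: down after correct, up after incorrect
    thr = mean(lev);                         % run threshold = mean of the tested levels
    for t = 1:nT
        k  = round((lev(t)       - grid(1))*10) + 1;     % unshifted bin
        kS = round((lev(t) - thr - grid(1))*10) + 1;     % bin after shifting by the run threshold
        nt(k)   = nt(k)   + 1;  nc(k)   = nc(k)   + r(t);
        ntS(kS) = ntS(kS) + 1;  ncS(kS) = ncS(kS) + r(t);
    end
end
use = nt > 0;  useS = ntS > 0;
plot(grid(use), nc(use)./nt(use), 'o-', grid(useS), ncS(useS)./ntS(useS), 's-')
xlabel('stimulus level re starting point'); ylabel('proportion correct')
legend('no shift (stays flat at .5)', 'shifted by run mean (spurious slope)')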

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.


Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair, I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.
3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.
6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.
7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.
8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).
2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.
3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.
5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).
6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals of initial trials are thrown out) rather than average an even number of reversals. Both of these findings were surprising.
7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.
8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.
9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.
10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).
3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.
4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with much fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.
5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1
Weibull Function: Probability, d′, and Log-Log d′ Slope
Code for generating Figure 1. The listing sets the lower asymptote gamma = .5 - .5*erf(1/sqrt(2)) (the probability corresponding to a z score of -1), the slope beta = 2, and a stimulus range x in threshold units. It computes the Weibull function p = gamma + (1 - gamma)*(1 - exp(-x.^beta)), converts probability to z score with z = sqrt(2)*erfinv(2*p - 1), forms dprime = z + 1 [d′ = z(hit) - z(false alarm)], and takes the derivative of log(dprime) to obtain the log-log d′ slope. The same three quantities are then recomputed for a natural log abscissa y, using p = gamma + (1 - gamma)*(1 - exp(-exp(beta*y))), and the results are plotted as the six panels (a)-(f) of Figure 1.

Matlab Code 2
Goodness-of-Fit: Likelihood Versus Chi-Square
Relevant to Wichmann and Hill (2001a) and Figure 4. The program examines a wide range of predicted probabilities p and numbers of trials per level, Nall = [1 2 4 8 16]. For each number of trials N and each p, it sums over all possible numbers of correct responses Obs = 0:N, weighting each term by the binomial probability of Obs: the weighted contributions to the log-likelihood deviance [using Obs*log(E/(Obs + eps)) and (N - Obs)*log((N - E)/(N - Obs + eps)), Equation 56] and to chi-square [weight*(Obs - E)^2/(E*(1 - p)), Equation 52], where E = N*p is the expected number correct. The expected contribution to the log-likelihood deviance from each level is then plotted against p for each N, together with the chi-square curve, which is the same for all N.

Matlab Code 3
Downward Slope Bias Due to Lapses
Relevant to Wichmann and Hill (2001a) and Table 2. A function, sse = probitML2(params, i, ilapse), is called by the main program. The psychometric function information is stimulus = [0 1 6], N = [50 50 50], Obs = [25 42 50]; for the last two lapse categories a lapse is produced by removing one correct response at the highest level [Obs(3) = Obs(3) - 1]. The z score of the PF is zz = (stimulus - params(1))/params(2), and, depending on the function index, the probability is a probit [cumulative normal, (1 + erf(zz/sqrt(2)))/2], a logit [1/(1 + exp(-zz))], or a Weibull [1 - exp(-exp(zz))]. The expected number correct is Exp = N*(lapse + prob*(1 - 2*lapse)), and the function returns either the log-likelihood deviance or the chi-square error summed over the three levels. The main program loops over the six combinations of function type and error metric (probit, logit, and Weibull, each fit by maximum likelihood and by chi-square) and over the four lapse categories, uses fmins to minimize the error, and prints the fitted slope (the second parameter), producing the entries of Table 2.

APPENDIX (Continued)

Matlab Code 4
Brownian Staircase Enumerations for Slope Bias
Monotonize. The first section makes a staircase psychometric function monotonic. A flag marks whether nonmonotonicity has been detected; while it remains set, the nonmonotonic spread ii is increased, and the program loops over all PF levels, computing prob = cor./(tot + eps) and checking whether prob(i2 - ii) > prob(i2) with trials present at level i2. Whenever such a nonmonotonicity is found, the numbers correct and the numbers of trials over the offending levels are replaced by their averages, and the flag is set again.

Note that the denominator in the probability calculation is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Extrapolate. The second section extrapolates to untested levels: it counts the low levels with no trials, fills them in with a very small number of trials (.00001) and with the probability of the lowest tested level, and does the same extrapolation at the high end.

Averaging. The third section accumulates results across staircases: probAll sums the individual-staircase PFs [prob1 = cor./(tot + eps)], probTotAll counts the number of staircases contributing at each level, and corAll and totAll sum the numbers correct and the numbers of trials for the pooled average.

Main Program. The main program uses nbits = 10 staircase steps and enumerates every possible staircase. For each integer i from 0 to 2^nbits - 1, a = bitget(i, [1:nbits]) gives the sequence of hits [a(k) = 1] and misses [a(k) = 0]; step = 1 - 2*a(1:end-1) steps the level down after a hit and up after a miss, and aCum = [0 cumsum(step)] accumulates the steps into a stimulus level for each trial. For the shift option (ishift = 1) the levels are shifted by ave = round(mean(aCum)) so that each staircase is centered; for ishift = 0 there is no shift. The numbers correct and the numbers of trials at each level are counted for each staircase and combined in three ways (iopt = 1, simple staircase averaging; iopt = 2, monotonize and then average; iopt = 3, extrapolate and then average). Finally, prob = corAll./(totAll + eps) gives the PF from the pooled data, probave = probAll./(probTotAll + eps) gives the average of the individual PFs, and the six panels (no shift versus shift, crossed with the three data manipulations) are plotted against stimulus level.

Note—The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


nonsurprising and, for the most part, did not shake my previous view of the world. I feel comfortable with them and feel no urge to shower them with words. On the other hand, there were a number of articles in this issue whose results surprised me. They caused me to stop and consider whether my previous thinking had been wrong or whether the article was wrong. Among the present articles, six contained surprises: (1) the Miller and Ulrich (2001) nonparametric method for analyzing PFs, (2) the Strasburger (2001b) finding of extremely steep psychometric functions for letter discrimination, (3) the Strasburger (2001a) new definition of psychometric function slope, (4) the Wichmann and Hill (2001a) analysis of biased goodness-of-fit and bias due to lapses, (5) the Kaernbach (2001a) modification to 2AFC, and (6) the Kaernbach (2001b) analysis of why staircase methods produce slope estimates that are too steep. The issues raised in these articles are instructive, and they constitute the focus of my commentary. Item 3 is covered in Section I, Item 5 will be discussed in Section II, and the remainder are discussed in Section III. A detailed overview of the main conclusions will be presented in the summary at the end of this paper. That might be a good place to begin.

When I began working on this article, more and more threads captured my attention, causing the article to become uncomfortably large and diffuse. In discussing this situation with the editor, I decided to split my article into two publications. The first of them is the present article, focused on the nine articles of this special issue and including a number of overview items useful for comparing the different types of PFs that the present authors use. The second paper will be submitted for publication in Perception & Psychophysics in the future (Klein, 2002).

The articles in this special issue of Perception & Psychophysics do not cover all facets of the PF. Here I should like to single out four earlier articles for special mention. King-Smith, Grigsby, Vingrys, Benes, and Supowit (1994) provide one of the most thoughtful approaches to likelihood methods, with important new insights. Treutwein's (1995) comprehensive, well-organized overview of adaptive methods should be required reading for anyone interested in the PF. Kontsevich and Tyler's (1999) adaptive method for estimating both threshold and slope is probably the best algorithm available for that task and should be looked at carefully. Finally, the paper with which I am most familiar in this general area is McKee, Klein, and Teller's (1985) investigation of threshold confidence limits in probit fits to 2AFC data. In looking over these papers, I have been struck by how Treutwein (1995), King-Smith et al. (1994), and McKee et al. (1985) all point out problems with the 2AFC methodology, a theme I will continue to address in Section II of this commentary.

I. TYPES OF PSYCHOMETRIC FUNCTIONS

The Probability-Based (High-Threshold) Correction for Guessing

The PF is commonly written as follows

P(x) = γ + (1 − γ − λ) p(x),  (1A)

where γ = P(0) is the lower asymptote, 1 − λ is the upper asymptote, p(x) is the PF that goes from 0% to 100%, and P(x) is the PF representing the data, which goes from γ to 1 − λ. The stimulus strength, x, typically goes from 0 to a large value for detection and from large negative to large positive values for discrimination. For discrimination tasks, where P(x) can go from 0% (for negative stimulus values) to 100% (for positive values), there is typically symmetry between the negative and positive range, so that λ = γ. Unless otherwise stated, I will, for simplicity, ignore lapses (errors made to perceptible stimuli) and take λ = 0, so that P(x) becomes

P(x) = γ + (1 − γ) p(x).  (1B)

In the Section III commentary on Strasburger (2001b) and Wichmann and Hill (2001a), I will discuss the benefit of setting the lapse rate λ to a small value (like λ = 1%) rather than 0 (or the 0.01% value that Strasburger used) to minimize slope bias.

When coupled with a high threshold assumption, Equation 1B is powerful in connecting different methodologies. The high threshold assumption is that the observer is in one of two states: detect or not detect. The detect state occurs with probability p(x). If the stimulus is not detected, then one guesses with the guess rate γ. Given this assumption, the percent correct will be

P(x) = p(x) + γ [1 − p(x)],  (1C)

which is identical to Equation 1B. The first term corresponds to the occasions when one detects the stimulus, and the second term corresponds to the occasions when one does not.

Equation 1 is often called the correction for guessing transformation. The correction for guessing is clearer if Equation 1B is rewritten as

p(x) = [P(x) − P(0)] / [1 − P(0)].  (2)

The beauty of Equation 2, together with the high threshold assumption, is that even though γ = P(0) can change, the fundamental PF, p(x), is unchanged. That is, one can alter γ by changing the number of alternatives in a forced choice task [γ = 1/(number of alternatives)], or one can alter the false alarm rate in a yes/no task; p(x) remains unchanged and recoverable through Equation 2.
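To make the transformation concrete, here is a minimal Matlab sketch of Equations 1B and 2; the guess rate and the probability values are purely illustrative, not data from any of the articles:

gamma = 0.25;                        % guess rate, e.g., 4AFC (illustrative)
P = [0.30 0.55 0.85 0.98];           % observed proportions correct (illustrative)
p = (P - gamma) ./ (1 - gamma);      % Equation 2: correction for guessing
Pback = gamma + (1 - gamma) .* p;    % Equation 1B recovers the original P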

Unfortunately, when one looks at actual data, one will discover that p(x) does change as γ changes, for both yes/no and forced choice tasks. For this and other reasons, the high threshold assumption has been discredited. A modern method, signal detection theory, for doing the correction for guessing is to do the correction after a z-score transformation. I was surprised that signal detection theory was barely mentioned in any of the articles constituting this special issue. I consider this to be sufficiently important that I want to clarify it at the outset. Before I can introduce the newer approach to the correction for guessing, the connection between probability and z-score is needed.



The Connection Between Probability and z-Score
The Gaussian distribution and its integral, the cumulative normal function, play a fundamental role in many approaches to PFs. The cumulative normal function Φ(z) is the function that connects the z score (z) to probability (prob):

prob = Φ(z) = ∫_{−∞}^{z} exp(−y²/2) dy / √(2π).  (3A)

The function Φ(z) and its inverse are available in programs such as Excel, but not in Matlab. For Matlab one must use

prob = Φ(z) = [1 + erf(z/√2)] / 2,  (3B)

where the error function,

erf(x) = (2/√π) ∫_{0}^{x} exp(−y²) dy,

is a standard Matlab function. I usually check that Φ(−1) = 0.1587, Φ(0) = 0.5, and Φ(1) = 0.8413, in order to be sure that I am using erf properly. The inverse cumulative normal function, used to go from prob to z, is given by

z = Φ⁻¹(prob) = √2 erfinv(2 prob − 1).  (4)

Equations 3 and 4 do not specify whether one uses prob = p or prob = P (see Equation 1 for the distinction between the two). In this commentary both definitions will be used, with the choice depending on how one does the correction for guessing—our next topic. The choice prob = p(x) means that a cumulative normal function is being used for the underlying PF and that the correction for guessing is done as in Equations 1 and 2. On the other hand, in a yes/no task, if we choose prob = P(x), then one is doing the z-score transform of the PF data before the correction for guessing. As is clarified in the next section, this procedure is a signal detection "correction for guessing," and the PF will be called the d′ function. The distinction between the two methods for correction for guessing has generated confusion and is partly responsible for why many researchers do not appreciate the simple connection between the PF and d′.
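As a minimal Matlab illustration of Equations 3B and 4 (a sketch, not the Appendix listing), the two conversions and the check values mentioned above are:

Phi    = @(z)    0.5*(1 + erf(z/sqrt(2)));      % Equation 3B
PhiInv = @(prob) sqrt(2)*erfinv(2*prob - 1);    % Equation 4
[Phi(-1) Phi(0) Phi(1)]                          % should return 0.1587, 0.5000, 0.8413
PhiInv(0.8413)                                   % should return a value near 1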

The z-Score Correction for Bias and Signal Detection Theory: Yes/No

Equation 2 is a common approach to correction for guessing in both yes/no and forced choice tasks, and it will be found in many of the articles in this special issue. In a yes/no method for detection, the lower asymptote P(0) = γ is the false alarm rate, the probability of saying "yes" when a blank stimulus is present. In the past, one instructed subjects to keep γ low. A few blank catch trials were included to encourage subjects to maintain their low false alarm rate. Today the correction is done using a z-score ordinate, and it is now called the correction for response bias, or simply the bias correction. One uses as many trials at the zero level as at other levels, one encourages more false alarms, and the framework is called signal detection theory.

The z-score (d′) correction for bias provides a direct but not well appreciated connection between the yes/no psychometric function and the signal detection d′. The two steps are as follows: (1) Convert the percent correct, P(x), to z scores using Equation 4. (2) Do the correction for bias by choosing the zero point of the ordinate to be the z score for the point x = 0. Finally, give the ordinate the name d′. This procedure can be written as

d′(x) = z(x) − z(0).  (5)

For a detection task, z(x) is called the z score of the hit rate, and z(0) is called the z score of the lower asymptote (γ), or the false alarm rate. It is the false alarm rate because the stimulus at x = 0 is the blank stimulus. In order to be able to do this correction for bias accurately, one must put as many trials at x = 0 as one puts at the other levels. Note the similarity of Equations 2 and 5. The main difference between Equations 2 and 5 is whether one makes the correction in probability or in z score. In order to distinguish this approach from the older yes/no approach associated with high threshold theory, it is often called the objective yes/no method, where "objective" means that the response bias correction of Equation 5 is used.
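A two-line Matlab sketch of Equation 5, with purely illustrative hit and false alarm rates, is:

pHit = 0.84; pFA = 0.16;                    % illustrative rates
z = @(p) sqrt(2)*erfinv(2*p - 1);           % Equation 4
dprime = z(pHit) - z(pFA)                   % Equation 5: about 1.99 here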

Figure 1 and its associated Matlab Code 1 in the Appendix illustrate the process represented in Equation 2. Panel a is a Weibull PF on a linear abscissa, to be introduced in Equation 12. Panel b is the z score of panel a. The lower asymptote is at z = −1, corresponding to P = 15.87%. For now, the only important point is that in panel b, if one measures the curve from the bottom of the plot (z = −1), then the ordinate becomes d′, because d′ = z − z(0) = z + 1. Panels d and e are the same as panels a and b, except that instead of a linear abscissa they have natural log abscissas. More will be said about these figures later.

I am ignoring for now the interesting question of what happens to the shape of the psychometric function as one changes the false alarm rate γ. If one uses multiple ratings rather than the binary yes/no response, one ends up with M − 1 PFs for M rating categories, and each PF has a different γ. For simplicity, this paper assumes a unity ROC slope, which guarantees that the d′ function is independent of γ. The ROC slopes can be measured using an objective yes/no method, as mentioned in Section II in the list of advantages of the yes/no method over the forced choice method.

I bring up the d′ function (Equation 5) and signal detection theory at the very beginning of this commentary because it is an excellent methodology for measuring thresholds efficiently, it can easily be extended to the suprathreshold regime (it does not saturate at P = 1), and it has a solid theoretical underpinning. Yet it is barely mentioned in any of the articles in this issue. So the reader needs to keep in mind that there is an alternative approach to PFs. I would strongly recommend the book Detection Theory: A User's Guide (Macmillan & Creelman, 1991) for anyone interested in the signal detection approach to psychophysical experiments (including the effect of nonunity ROC slopes).



The z-Score Correction for Bias and Signal Detection Theory: 2AFC

One might have thought that for 2AFC the connection between the PF and d′ is well established—namely (Green & Swets, 1966),

d′(x) = √2 z(x),  (6)

where z(x) is the z score of the average of P1(x) for correct judgments in Interval 1 and P2(x) for correct judgments in Interval 2. Typically this average, P(x), is calculated by dividing the total number of correct trials by the total number of trials. It is generally assumed that the 2AFC procedure eliminates the effect of response bias on threshold. However, in this section I will argue that the d′ as defined in Equation 6 is affected by an interval bias when one interval is selected more than the other.

I have been thinking a lot about response bias in 2AFC tasks because of my recent experience as a subject in a temporal 2AFC contrast discrimination study, where contrast is defined as the change in luminance divided by the background luminance. In these experiments I noticed that I had a strong tendency to choose the second interval more than the first. The second interval typically appears subjectively to be about 5% higher in contrast than it really is. Whether this is a perceptual effect, because the intervals are too close together in time (800 msec), or a cognitive effect does not matter for the present article.



Figure 1. Six views of the Weibull function Pweibull = 1 − (1 − γ)exp(−xt^β), where γ = 0.1587, β = 2, and xt is the stimulus strength in threshold units. Panels a-c have a linear abscissa, with xt = 1 being the threshold; panels d-f have a natural log abscissa, with yt = 0 being the threshold. In panels a and d the ordinate is probability. The asterisk in panel d at yt = 0 is the point of maximum slope on a logarithmic abscissa. In panels b and e the ordinate is the z score of panels a and d; the lower asymptote is z = −1. If the ordinate is redefined so that the origin is at the lower asymptote, the new ordinate, shown on the right of panels b and e, is d′(xt) = z(xt) − z(0), corresponding to the signal detection d′ for an objective yes/no task. In panels c and f the ordinate is the log-log slope of d′. At xt = 0 the log-log slope = β. The log-log slope falls rapidly as the stimulus strength approaches threshold. The Matlab program that generated this figure is Appendix Code 1.


What does matter is that this bias produces a downward bias in d′. With feedback, lots of practice, and lots of experience being a subject, I was able to reduce this interval bias and equalize the number of times I responded with each interval. Naïve subjects may have a more difficult time reducing the bias. In this section I show that there is a very simple method for removing the interval bias, by converting the 2AFC data to a PF that goes from 0% to 100%.

The recognition of bias in 2AFC is not new. Green and Swets (1966), in their Appendix III.3.4, point out that the bias in choice of interval does result in a downward bias in d′. However, they imply that the effect of this bias is small and can typically be ignored. I should like to question that implication by using one of their own examples to show that the bias can be substantial.

In a run of 200 trials, the Green and Swets example (Green & Swets, 1966, p. 410) has 95 out of 100 correct when the test stimulus is in the second interval (z2 = 1.645) and 50 out of 100 correct (z1 = 0) when the stimulus is in the first interval. The standard 2AFC way to analyze these data would be to average the probabilities, (95% + 50%)/2 = 72.5% correct (zcorrect = 0.598), corresponding to d′ = z√2 = 0.845. However, Green and Swets (p. 410) point out that, according to signal detection theory, one should analyze these data by averaging the z scores rather than averaging the probabilities, or

d′ = (z2 + z1)/√2 = 1.645/√2 = 1.163.  (7)

The ratio between these two ways of calculating d′ is 1.163/0.845 = 1.376. Since d′ is approximately linearly related to signal strength in discrimination tasks, this 38% reduction in d′ corresponds to an erroneous 38% increase in predicted contrast discrimination threshold when one calculates threshold the standard way. Note that if there had been no bias, so that the responses would be approximately equally divided across the two intervals, then z2 ≈ z1 and Equation 7 would be identical to the more familiar Equation 6. Since bias is fairly common, especially among new observers, the use of Equation 7 to calculate d′ seems much more reasonable than using Equation 6. It is surprising that the bias correction in Equation 7 is rarely used.
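The two calculations are easy to compare directly. The following Matlab sketch reproduces the numbers of this example (the probabilities are from Green & Swets, 1966; the code itself is only illustrative):

P1 = 0.50; P2 = 0.95;                       % correct when test is in I1 and I2
z  = @(p) sqrt(2)*erfinv(2*p - 1);          % Equation 4
dBiased   = sqrt(2)*z((P1 + P2)/2)          % Equation 6: 0.845
dUnbiased = (z(P1) + z(P2))/sqrt(2)         % Equation 7: 1.163
dUnbiased/dBiased                           % the ratio, about 1.376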

Green and Swets (1966) present a different analysis. Instead of comparing d′ values for the biased versus nonbiased conditions, they convert the d′s back to percent correct. The corrected percent correct (corresponding to d′ = 1.163) is 79.5%. In terms of percent correct, the bias seems to be a small effect, shifting percent correct a mere 7%, from 72.5% to 79.5%. However, the d′ ratio of 1.38 is a better measure of the bias, since it is directly related to the error in discrimination threshold estimates.

A further comment on the magnitude of the bias may be useful. The preceding subsection discussed the criterion bias of yes/no methods, which contributes linearly to d′ (Equation 5). The present section discusses the 2AFC interval bias, which contributes quadratically to d′. Thus, for small amounts of bias the decrease in d′ is negligible. However, as I have pointed out, the bias can be large enough to make a significant contribution to d′.

It is instructive to view the bias correction for the full PF corresponding to this example. Cumulative normal PFs are shown in the upper left panel of Figure 2. The curves labeled C1 and C2 are the probability correct for the first and second intervals, I1 and I2. The asterisks correspond to the stimulus strength used in this example, at 50% and 95% correct. The dot-dashed line is the average of the two PFs for the individual intervals. The slope of the PF is set by the dot-dashed line (the averaged data) being at 50% (the lower asymptote) for zero stimulus strength. The lower left panel is the z-score version of the upper panel. The dashed line is the average of the two solid lines for z scores in I1 and I2. This is the signal detection method of averaging that Green and Swets (1966) present as the proper way to do the averaging. The dot-dashed line is the z score of the average probability that is shown in the upper left panel. Notice that the z score for the averaged probability is lower than the averaged z score, indicating a downward bias in d′ due to the interval bias, as discussed at the beginning of this section. The right pair of panels are the same as the left pair, except that instead of plotting C1 we plot 1 − C1, the probability of responding I2 incorrectly. The dashed line is half the difference between the two solid lines. The other difference is that in the lower right panel we have multiplied the dashed and dash-dotted lines by √2, so that these lines are d′ values rather than z scores.

The final step in dealing with the 2AFC bias is to flip the 1 − C1 curve horizontally to negative abscissa values, as in Figure 3. The ordinate is still the probability correct in Interval 2. The abscissa becomes the difference in stimulus strengths between I2 and I1. The flipped branch is the probability of responding I2 when the I2 stimulus strength is less than that of I1 (an incorrect response). Figure 3a, with the ordinate going from 0% to 100%, is the proper way to represent 2AFC discrimination data. The other item that I have changed is the scale on the abscissa, to show what might happen in a real experiment. The ordinate values of 50% and 95% for the Green and Swets (1966) example have been placed at a contrast difference of 5%. The negative 5% value corresponds to the case in which the positive test pattern is in the first interval. Threshold corresponds to the inverse of the PF slope. The bottom panel shows the standard signal detection representation of the signal in I1 and I2; d′ is the distance between these symmetric stimuli in standard deviation units. The Gaussians are centered at ±5. The vertical line at −5 is the criterion, such that 50% and 95% of the areas of the two Gaussians are above the criterion. The z-score difference of 1.645 between the two Gaussians must be divided by √2 to get d′, because each trial had two stimulus presentations with independent information for the judgment.


This procedure, identical to Equation 7, gives the same d′ as before.

Three Distinctions for Clarifying PFs
In dealing with PFs, three distinctions need to be made: yes/no versus forced choice, detection versus discrimination, and constant stimuli versus adaptive methods. These distinctions are usually clear, but I should like to point out some subtleties.

For the forced choice versus yes/no distinction, there are two sorts of forced choice tasks. The standard version has multiple intervals, separated spatially or temporally, and the stimulus is in only one of the intervals. In the other version, one of N stimuli is shown, and the observer responds with a number from 1 to N. For example, Strasburger (2001b) presented 1 of 10 letters to the observer in a 10AFC task. Yes/no tasks have some similarity to the latter type of forced choice task. Consider, for example, a detection experiment in which one of five contrasts (including a blank) is presented to the observer, and the observer responds with numbers from 1 to 5. This would be classified as a rating scale method of constant stimuli yes/no task, since only a single stimulus is presented and the rating is based on a one-dimensional intensity.

The detection/discrimination distinction is usually based on whether the reference stimulus is a natural zero point. For example, suppose the task is to detect a high spatial frequency test pattern added to a spatially identical reference pattern.


Figure 2. 2AFC psychometric functions with a strong bias in favor of responding Interval 2 (I2). The bias is chosen from a specific example presented by Green and Swets (1966), such that the observer has 50% and 95% correct when the test is in I1 and I2, respectively. These points are marked by asterisks. The psychometric function being plotted is a cumulative normal. In all panels the abscissa is xt, the stimulus strength. (a) The psychometric functions for probability correct in I1 and I2 are shown and labeled C1 and C2. The average of the two probabilities, labeled average, is the dot-dashed line; it is the curve that is usually reported. The diamond is at 72.5%, the average percent correct of the two asterisks. (b) Same as panel a, except that instead of showing C1 we show 1 − C1, the probability of saying I2 when the test was in I1. The ordinate is now labeled "probability of I2 response." (c) z scores of the three probabilities in panel a. An additional dashed line is shown that is the average of the C1 and C2 z-score curves. The diamond is the z score of the diamond in panel a, and the star is the z-score average of the two panel c asterisks. (d) The sign of the C2 curve in panel c is flipped to correspond to panel b. The dashed and dot-dashed lines of panel c have been multiplied by √2 in panel d so that they become d′.


If the reference pattern has zero contrast, the task is detection. If the reference pattern has a high contrast, the task is discrimination. Klein (1985) discusses these tasks in terms of monopolar and bipolar cues. For discrimination, a bipolar cue must be available, whereby the test pattern can be either positive or negative in relation to the reference. If one cannot discriminate the negative cue from the positive, then it can be called a detection task.

Finally, the constant stimuli versus adaptive method distinction is based on the former's having preassigned test levels and the latter's having levels that shift to a desired placement. The output of the constant stimulus method is a full PF and is thus fully entitled to be included in this special issue. The output of adaptive methods is typically only a single number, the threshold, specifying the location but not the shape of the PF. Two of the papers in this issue (Kaernbach, 2001b; Strasburger, 2001b) explore the possibility of also extracting the slope from adaptive data that concentrate trials around one level. Even though adaptive methods do not measure much about the PF, they are so popular that they are well represented in this special issue.

The 10 rows of Table 1 present the articles in this special issue (including the present article). Columns 2-5 correspond to the four categories associated with the first two distinctions.

Figure 3. 2AFC discrimination PF from Figure 2 has been extended to the 0% to 100% range without rescaling the ordinate. In panels a and b, the two right-hand panels of Figure 2 have been modified by flipping the sign of the abscissa of the negative slope branch, where the test is in Interval 1 (I1). The new abscissa is now the stimulus strength in I2 minus the strength in I1. The abscissa scaling has been modified to be in stimulus units. In this example the test stimulus has a contrast of 5%. The ordinate in panel a is the probability that the response is I2. The ordinate in panel b is the z score of the panel a ordinate. The asterisks are at the same points as in Figure 2. Panel c is the standard signal detection picture when noise is added to the signal. The abscissa has been modified to being the activity in I2 minus the activity in I1. The units of activation have arbitrarily been chosen to be the same as the units of panels a and b. Activity distributions are shown for stimuli of −5 and +5 units, corresponding to the asterisks of panels a and b. The subject's criterion is at −5 units of activation. The upper abscissa is in z-score units, where the Gaussians have unit variance. The areas under the two distributions above the criterion are 50% and 95%, in agreement with the probabilities shown in panel a.



The last column classifies the articles according to the adaptive versus constant stimuli distinction. In order to open up the full variety of PFs for discussion and to enable a deeper understanding of the relationship among different types of PFs, I will now clarify the interrelationship of the various categories of PFs and their connection to the signal detection approach.

Lack of adaptive method for yes/no tasks with controlled false alarm rate. In Section II, I will bring up a number of advantages of the yes/no task in comparison with the 2AFC task (see also Kaernbach, 1990). Given those yes/no advantages, one may wonder why the 2AFC method is so popular. The usual answer is that 2AFC has no response bias. However, as has been discussed, the yes/no method allows an unbiased d′ to be calculated, and the 2AFC method does allow an interval bias that affects d′. Another reason for the prevalence of 2AFC experiments is that a multitude of adaptive methods are available for 2AFC, but barely any are available for an objective yes/no task in which the false alarm rate is measured so that d′ can be calculated. In Table 1, with one exception, the rows with adaptive methods are associated with the forced choice method. The exception is Leek (2001), who discusses adaptive yes/no methods in which no blank trials are presented. In that case the false alarm rate is not measured, so d′ cannot be calculated. This type of yes/no method does not belong in the "objective" category of concern for the present paper. The "1990" entries in Table 1 refer to Kaernbach's (1990) description of a staircase method for an objective yes/no task in which an equal number of blanks are intermixed with the signal trials. Since Kaernbach could have used that method for Kaernbach (2001b), I placed the "1990" entry in his slot.

Kaernbach's (1990) yes/no staircase rules are simple. The signal level changes according to a rule such as the following: Move down one level for correct responses, a hit <yes|signal> or a correct rejection <no|blank>; move up three levels for wrong responses, a miss <no|signal> or a false alarm <yes|blank>. This rule, similar to the one down-three up rule used in 2AFC, places the trials so that the average of the hit rate and correct rejection rate is 75%.

What is now needed is a mechanism to get the observer to establish an optimal criterion that equalizes the number of "yes" and "no" responses (the ROC negative diagonal). This situation is identical to the problem of getting 2AFC observers to equalize the number of responses to Intervals 1 and 2. The quadratic bias in d′ is the same in both cases. The simplest way to get subjects to equalize their responses is to give them feedback about any bias in their responses. With equal "yes" and "no" responses, the 75% correct corresponds to a z score of 0.674 and a d′ = 2z = 1.349. If the subject does not have equal numbers of "yes" and "no" responses, then the d′ would be calculated by d′ = zhit − zfalse alarm. I do hope that Kaernbach's clever yes/no objective staircase will be explored by others. One should be able to enhance it with ratings and multiple stimuli (Klein, 2002).
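A minimal simulation sketch of such a staircase may make the rule concrete. Everything in the sketch is an assumption chosen for illustration—the observer is modeled with a d′ that is linear in stimulus level, an unbiased criterion at every level, and an arbitrary step size; none of these choices come from Kaernbach (1990):

alpha = 1; step = 0.1; level = 2;            % illustrative threshold, step size, start level
nTrials = 200; track = zeros(1, nTrials);
for t = 1:nTrials
    track(t) = level;
    dp = level/alpha;                        % assumed d' = level/alpha
    crit = dp/2;                             % assumed unbiased criterion
    if rand < 0.5                            % half the trials are signals
        sayYes = (randn + dp) > crit;        % response to a signal
        correct = sayYes;                    % hit is correct
    else                                     % the other half are blanks
        sayYes = randn > crit;               % response to a blank
        correct = ~sayYes;                   % correct rejection is correct
    end
    if correct
        level = level - step;                % down one level for a correct response
    else
        level = level + 3*step;              % up three levels for an error
    end
    level = max(level, 0);
end
% The track hovers near the level at which the average of the hit and
% correct rejection rates is 75%, i.e., near d' = 1.349.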

Forced choice detection. As can be seen in the second column of Table 1, a popular category in this special issue is the forced choice method for detection. This method is used by Strasburger (2001b), Kaernbach (2001a), Linschoten et al. (2001), and Wichmann and Hill (2001a, 2001b). These researchers use PFs based on probability ordinates. The connection between P and d′ is different from the yes/no case given by Equation 5. For 2AFC, signal detection theory provides a simple connection between d′ and probability correct: d′(x) = √2 z(x). In the preceding section, I discussed the option of averaging the probabilities and then taking the z score (high threshold approach) or averaging the z scores and then calculating the probability (signal detection approach). The signal detection method is better because it has a stronger empirical basis and avoids bias.

For an m-AFC task with m > 2, the connection between d′ and P is more complicated than it is for m = 2. The connection, given by a fairly simple integral (Green & Swets, 1966; Macmillan & Creelman, 1991), has been tabulated by Hacker and Ratcliff (1979) and by Macmillan and Creelman (1991). One problem with these tables is that they are based on the assumption of unity ROC slope. It is known that there are many cases in which the ROC slope is not unity (Green & Swets, 1966), so these tables connecting d′ and probability correct should be treated cautiously.

Table 1
Classification of the 10 Articles in This Special Issue According to Three Distinctions: Forced Choice Versus Yes/No, Detection Versus Discrimination, Adaptive Versus Constant Stimuli

                              Detection              Discrimination            Adaptive or
Source                        m-AFC      Yes/No      m-AFC           Yes/No    Constant Stimuli
Leek (2001)                   General    loose γ     ✓               loose γ   A
Wichmann & Hill (2001a)       2AFC       (✓)         (✓)             (✓)       C
Wichmann & Hill (2001b)       2AFC       (✓)         (✓)             (✓)       C
Linschoten et al. (2001)      2AFC                                             A
Strasburger (2001a)           General    ✓           ✓                         C
Strasburger (2001b)           10AFC                                            A
Kaernbach (2001a)             General                ✓                         A
Kaernbach (2001b)             General    (1990)      ✓               (1990)    A
Miller & Ulrich (2001)                   for γ = 0   ✓ (for 0-100%)  ✓         C
Klein (present)               General    ✓           ✓               ✓         Both


W. P. Banks and I (unpublished) investigated this topic and found that near the P = 50% correct point, the dependence of d′ on ROC slope is minimal. Away from the 50% point, the dependence can be strong.
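For readers who prefer to compute rather than interpolate tables, the integral itself is short. The sketch below assumes equal-variance (unity-ROC-slope) Gaussians—exactly the assumption being questioned here—so its output deserves the same caution:

m = 4; dprime = 1;                           % illustrative values
z = -8:0.001:8;                              % integration grid
phi = exp(-0.5*(z - dprime).^2)/sqrt(2*pi);  % density of the signal variable
Phi = 0.5*(1 + erf(z/sqrt(2)));              % cumulative normal of each noise variable
Pc = trapz(z, phi .* Phi.^(m-1))             % probability correct for m-AFC
% For m = 2 this reduces to Phi(dprime/sqrt(2)), about .76 at dprime = 1.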

Yes/No detection. Linschoten et al. (2001) compare three methods (limits, staircase, 2AFC likelihood) for measuring thresholds with a small number of trials. In the method of limits, one starts with a subthreshold stimulus and gradually increases the strength. On each trial, the observer says "yes" or "no" with respect to whether or not the stimulus is detected. Although this is a classic yes/no detection method, it will not be discussed in this article, because there is no control of response bias. That is, blanks were not intermixed with signals.

In Table 1, the Wichmann and Hill (2001a, 2001b) articles are marked with ✓s in parentheses because, although these authors write only about 2AFC, they mention that all their methods are equally applicable for yes/no or m-AFC of any type (detection/discrimination). And their Matlab implementations are fully general. All the special issue articles reporting experimental data used a forced choice method. This bias in favor of the forced choice methodology is found not only in this special issue; it is widespread in the psychophysics community. Given the advantages of the yes/no method (see discussion in Section II), I hope that once yes/no adaptive methods are accepted, they will become the method of choice.

m-AFC discrimination. It is common practice to represent 2AFC results as a plot of percent correct, averaged over all trials, versus stimulus strength. This PF goes from 50% to 100%. The asymmetry between the lower and upper asymptotes introduces some inefficiency in threshold estimation, as will be discussed. Another problem with the standard 50% to 100% plot is that a bias in choice of interval will produce an underestimate of d′, as has been shown earlier. Researchers often use the 2AFC method because they believe that it avoids biased threshold estimates. It is therefore surprising that the relatively simple correction for 2AFC bias discussed earlier is rarely done.

The issue of bias in m-AFC also occurs when m > 2. In Strasburger's 10AFC letter discrimination task, it is common for subjects to have biases for responding with particular letters when guessing. Any imbalance in the response bias for different letters will result in a reduction of d′, as it did in 2AFC.

There are several benefits of viewing the 2AFC discrimination data in terms of a PF going from 0% to 100%. Not only does it provide a simple way of viewing and calculating the interval bias, it also enables new methods for estimating the PF parameters, such as those proposed by Miller and Ulrich (2001), as will be discussed in Section III. In my original comments on the Miller and Ulrich paper, I pointed out that because their nonparametric procedure has uniform weighting of the different PF levels, their method does not apply to 2AFC tasks. The asymmetric binomial error bars near the 50% and 100% levels cause the uniform weighting of the Miller and Ulrich approach to be nonoptimal. However, I now realize that the 2AFC discrimination task can be fit by a cumulative normal going from 0% to 100%. Because of that insight, I have marked the discrimination forced choice column of Table 1 for the Miller and Ulrich (2001) paper, with the proviso that the PF goes from 0% to 100%. Owing to the popularity of 2AFC, this modification greatly expands the relevance of their nonparametric approach.

Yes/No discrimination. Kaernbach (2001b) and Miller and Ulrich (2001) offer theoretical articles that examine properties of PFs that go from P = 0% to 100% (P = p in Equation 1). In both cases the PF is the cumulative normal (Equation 3). Although these PFs with γ = 0 could be for a yes/no detection task with a zero false alarm rate (not plausible) or a forced choice detection task with an infinite number of alternatives (not plausible either), I suspect that the authors had in mind a yes/no discrimination task (Table 1, column 5). A typical discrimination task in vision is contrast discrimination, in which the observer responds to whether the presented contrast is greater than or less than a memorized reference. Feedback reinforces the stability of the reference. In a typical discrimination task, the reference is one exemplar from a continuum of stimulus strengths. If the reference is at a special zero point, rather than being an element of a smooth continuum, the task is no longer a simple discrimination task. Zero contrast would be an example of a special reference. Klein (1985) discusses several examples which illustrate how a natural zero can complicate the analysis. One might wonder how to connect the PF from the detection regime, in which the reference is zero contrast, to the discrimination regime, in which the reference (pedestal) is at a high contrast. The d′ function to be introduced in Equation 20 does a reasonably good job of fitting data across the full range of pedestal strength, going from detection to discrimination.

The connection of the discrimination PF to the signal detection PF is the same as that for yes/no detection, given in Equation 5: d′(x) = z(x) − z(0), where z is the z score of the probability of a "greater than" judgment. In a discrimination task, the bias z(0) is the z score of the probability of saying "greater" when the reference stimulus is presented. The stimulus strength x that gives z(x) = 0 is the point of subjective equality (x = PSE). If the cumulative normal PF of Equation 3 is used, then the z score is linearly proportional to the stimulus strength, z(x) = (x − PSE)/threshold, where threshold is defined to be the point at which d′ = 1. I will come back to these distinctions between detection and discrimination PFs after presenting more groundwork regarding thresholds, log abscissas, and PF shapes (see Equations 14 and 17).

Definition of Threshold
Threshold is often defined as the stimulus strength that produces a probability correct halfway up the PF. If humans operated according to a high-threshold assumption, this definition of threshold would be stable across different experimental methods. However, as I discussed following Equation 2, high-threshold theory has been discredited.


According to the more successful signal detection theory (Green & Swets, 1966), the d′ at the midpoint of the PF changes according to the number of alternatives in a forced choice method and according to the false alarm rate in a yes/no method. This variability of d′ with method is a good reason not to define threshold as the halfway point of the PF.

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1, for simplicity. This definition applies to both yes/no and m-AFC tasks and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = 15.87%, corresponding to a z score of z = −1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.
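The arithmetic of this example is a one-liner in Matlab; the only inputs are the false alarm rate and the d′ level chosen as threshold (the code is an illustration, not part of the Appendix listings):

gamma = 0.1587;                              % false alarm rate of Figure 1
Phi    = @(z) 0.5*(1 + erf(z/sqrt(2)));
PhiInv = @(p) sqrt(2)*erfinv(2*p - 1);
Phi(PhiInv(gamma) + 1)                       % hit rate at threshold for d' = 1: 0.50
Phi(PhiInv(gamma) + 2)                       % hit rate at threshold for d' = 2: 0.8413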

Strasburger's Suggestion on Specifying Slope: A Logarithmic Abscissa

In many of the articles in this special issue, a logarithmic abscissa such as decibels is used. Many shapes of PFs have been used with a log abscissa. Most delightful but frustrating is Strasburger's (2001a) paper on the PF maximum slope. He compares the Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′ using a logarithmic abscissa. The present section is the outcome of my struggles with a number of issues raised by Strasburger (2001a) and my attempt to clarify them.

I will typically express stimulus strength x in thresh-old units

xt = x/α,  (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units, y, as well as in linear units, x:

yt = loge(xt) = y − Y,  (9)

where y = loge(x) is the natural log of the stimulus and Y = loge(α) is the threshold on the log abscissa.

The slope of the psychometric function P(yt) with a logarithmic abscissa is

slopelog(yt) = dP(yt)/dyt  (10A)
 = (1 − γ) dp(yt)/dyt,  (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log-log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log-log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log-log slope is β for a Weibull PF (Equation 12) and b for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log-log slope using a d′ ordinate.

A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per loge) are not familiar. Then it dawned on me that there is a simple connection between slope with a loge abscissa, slopelog = dP(yt)/dyt, and slope with a linear abscissa, slopelin = dP(xt)/dxt, namely,

slopelin = dP(xt)/dxt = [dP(yt)/dyt] × [dyt/dxt] = slopelog/xt,  (11)

because dyt/dxt = [d loge(xt)]/(dxt) = 1/xt. At threshold, xt = 1 (yt = 0), so at that point Strasburger's slope with a logarithmic axis is identical to my familiar slope plotted in threshold units on a linear axis. The simple connection between slope on the log and linear abscissas converted me to being a strong supporter of using a natural log abscissa.
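A quick numerical check of Equation 11, using the Weibull of Equation 12 purely as an example, is:

beta = 2; k = exp(-1); h = 1e-5; xt = 1.5;           % illustrative values
p = @(x) 1 - k.^(x.^beta);                           % Equation 12
slopeLin = (p(xt + h) - p(xt - h))/(2*h);            % slope on a linear abscissa
slopeLog = (p(xt*exp(h)) - p(xt*exp(-h)))/(2*h);     % slope on a natural log abscissa
[slopeLin  slopeLog/xt]                              % the two entries agree (Equation 11)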

The Weibull and cumulative normal psychometric functions. To provide a background for Strasburger's article, I will discuss three PFs—Weibull, cumulative normal, and d′—as well as their close connections. A common parameterization of the PF is given by the Weibull function,

pweib(xt) = 1 − k^(xt^β),  (12)

where pweib(xt), the PF that goes from 0% to 100%, is related to Pweib(xt), the probability of a correct response, by Equation 1; β is the slope; and k controls the definition of threshold: pweib(1) = 1 − k is the percent correct at threshold (xt = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

pweib(yt) = 1 − k^[exp(β yt)].  (13)

Panel a of Figure 1 is a plot of Equation 12 (Weibull function as a function of x on a linear abscissa) for the case β = 2 and k = exp(−1) = 0.368. The choice β = 2 makes the Weibull an upside-down Gaussian. Panel d is the same function, this time plotted as a function of y, corresponding to a natural log abscissa. With this choice of k, the point of maximum slope as a function of yt is the threshold point (yt = 0). The point of maximum slope at yt = 0 is marked with an asterisk in panel d. The same point in panel a, at xt = 1, is not the point of maximum slope on a linear abscissa, because of Equation 11. When plotted as a d′ function [a z-score transform of P(xt)] in panel b, the Weibull accelerates below threshold and decelerates above threshold, in agreement with a wide range of experimental data.


The acceleration and deceleration are most clearly seen in the slope of d′ in the log-log coordinates of panel e. The log-log d′ slope is plotted in panels c and f. At xt = 0 the log-log slope is 2, corresponding to our choice of β = 2. The slope falls surprisingly rapidly as stimulus strength increases, and the slope is near 1 at xt = 2. If we had chosen β = 4 for the Weibull function, then the log-log d′ slope would have gone from 4 at xt = 0 to near 2 at xt = 2. Equation 20 will present a d′ function that captures this behavior.
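A minimal Matlab sketch of these three views (it parallels, but is not, Appendix Code 1; the grid spacing is arbitrary) is:

gamma = 0.5 - 0.5*erf(1/sqrt(2));             % lower asymptote, z = -1 (0.1587)
beta = 2; xt = 0.01:0.01:3;                   % stimulus in threshold units
P = gamma + (1 - gamma)*(1 - exp(-xt.^beta)); % Weibull with k = exp(-1) (Equations 1B, 12)
z = sqrt(2)*erfinv(2*P - 1);                  % z score of the hit rate
dprime = z + 1;                               % d' = z - z(0) = z + 1 (Equation 5)
llslope = diff(log(dprime))./diff(log(xt));   % log-log slope of d'
% llslope starts near beta = 2 at small xt and is near 1 by xt = 2.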

Another PF commonly used is based on the cumulativenormal (Equation 3)

(14A)

or

(14B)

where s is the standard error of the underlying Gaussianfunction (its connection to b will be clarified later) Notethat the cumulative normal PF in Equation 14 should beused only with a logarithmic abscissa because y goes to

yen needed for the 0 lower asymptote whereas theWeibull function can be used with both the linear (Equa-tion 12) and the log (Equation 13) abscissa

Two examples of Strasburgerrsquos maximum slope (hisEquations 7 and 17) are

(15)

for the Weibull function (Equation 13) and

(16)

for the cumulative normal (Equation 14) In Equation 15 Iset k in Equation 12 to be k 5 exp( 1) This choiceamounts to defining threshold so that the maximum slopeoccurs at threshold ( yt 5 0) Note that Equations 12ndash16are missing the (1 g) factor that are present in Strasburg-errsquos Equations 7 and 17 because we are dealing with prather than P Equations 15 and 16 provide a clear simpleand close connection between the slopes of the Weibull andcumulative normal functions so I am grateful to Stras-burger for that insight

An example might help in showing how Equation 16works Consider a 2AFC task (g 5 05) assuming a cumu-lative normal PF with s 5 10 According to Equation 16bcent 5 0399 At threshold (assumed for now to be the pointof maximum slope) P(xt 5 1) 5 75 At 10 abovethreshold P(xt 5 11) 75 1 01 bcent 5 07899 which isquite close to the exact value of 7896 This example showshow bcent defined with a natural log abscissa yt is relevantto a linear abscissa xt

Now that the logarithmic abscissa has been introducedthis is a good place to stop and point out that the log ab-scissa is fine for detection but not for discrimination since

the x 5 0 point is often in the middle of the range How-ever the cumulative normal PF is especially relevant to dis-crimination where the PF goes from 0 to 100 andwould be written as

(17A)

or

(17B)

where x 5 PSE is the point of subjective equality Theparameter a is the threshold since dcent(a) 5 zF(a)zF(0) 5 1 The threshold a is also s standard deviationsof the Gaussian probability density function (pdf ) Thereciprocal of a is the PF slope I tend to use the letter sfor a unitless standard deviation as occurs for the loga-rithmic variable y I use a for a standard deviation thathas the units of the stimulus x (like percent contrast) Thecomparison of Equation 14B for detection and Equation17B for discrimination is useful Although Equation 14is used in most of the articles in this special issue whichare concerned with detection tasks the techniques that I willbe discussing are also relevant to discrimination tasks forwhich Equation 17 is used

Threshold and the Weibull FunctionTo illustrate how a dcent 5 1 definition of threshold works

for a Weibull function let us start with a yesno task inwhich the false alarm rate is P(0) 5 1587 correspondingto a z score of zFA 5 1 as in Figure 1 The z score atthreshold (dcent 5 1) is zTh 5 zFA11 5 0 corresponding to aprobability of P(1) 5 50 From Equations 1 and 12 thek value for this definition of threshold is given by k 5[1 P(1)][1 P(0)] 5 05943 The Weibull function be-comes

(18A)

If I had defined dcent 5 2 to be threshold then zTh 5 zFA125 1 leading to k 5 (1 08413)(1 01587) 5 01886giving

(18B)

As another example suppose the false alarm rate in ayesno task is at 50 not uncommon in a signal detectionexperiment with blanks and test stimuli intermixed Thenthreshold at dcent 5 1 would occur at 8413 and the PFwould be

(18C)

with Pweib(0) 5 50 and Pweib(1) 5 8413 This casecorresponds to defining dcent on the ROC vertical intercept

For 2AFC the connection between dcent and z is z 5 dcent205Thus zTh 5 2 05 5 07071 corresponding to Pweib(1) 57602 leading to k 5 04795 in Equation 12 These con-nections will be clarified when an explicit form (Equation20)is given for the dcent function dcent(xt)

P xtxt

weib ( ) = 1 0 5 0 3173 b

P xtxt

weib ( ) = 1 1 0 1587 0 1886( ) b

P xtxt

weib ( ) = 1 1 0 1587 0 5943( ) b

z x xtF ( ) = PSE

a

p x P x xt tF F F( ) = ( ) = aelig

egraveoumloslash

PSEa

cent = =bs p s

12

0 399

cent = =b b bexp( ) 1 0 368

z yy y Y

tt( ) = =s s

p yy

tt

F F( ) =aeligegraveccedil

oumloslashdivides

1432 KLEIN

Complications With Strasburgerrsquos Advocacy of Maximum Slope

Here I will mention three complications implicated inStrasburgerrsquos suggestion of defining slope at the point ofmaximum slope on a logarithmic abscissa

1 As can be seen in Equation 11 the slopes on linearand logarithmic abscissas are related by a factor of 1xtBecause of this extra factor the maximum slope occursat a lower stimulus strength on linear as compared withlog axes This makes the notion of maximum slope lessfundamental However since we are usually interested inthe percent error of threshold estimates it turns out thata logarithmic abscissa is the most relevant axis support-ing Strasburgerrsquos log abscissa definition

2 The maximum slope is not necessarily at threshold(as Strasburger points out) For the Weibull functions de-fined in Equation 13 with k 5 exp( 1) and the cumula-tive normal function in Equation 14 the maximum slope(on a log axis) does occur at threshold However the twopoints are decoupled in the generalized Weibull functiondefined in Equation 12 For the Quick version of theWeibull function (Equation 13 with k 5 05 placingthreshold halfway up the PF) the threshold is below thepoint of maximum slope the derivative of P(xt) atthreshold is

which is slightly different from the maximum slope asgiven by Equation 15 Similarly when threshold is de-fined at dcent 5 1 the threshold is not at the point of max-imum slope People who fit psychometric functions wouldprobably prefer reporting slopes at the detection thresh-old rather than at the point of maximum slope

3 One of the most important considerations in selectinga point at which to measure threshold is the question ofhow to minimize the bias and standard error of the thresh-old estimate The goal of adaptive methods is to place tri-als at one level In order to avoid a threshold bias due to aimproper estimate of PF slope the test level should be atthe defined threshold The variance of the threshold esti-mate when the data are concentrated as a single level (aswith adaptive procedures) is given by Gourevitch and Gal-anter (1967) as

(19)

If the binomial error factor of P(1 P)N were not presentthe optimal placement of trials for measuring thresholdwould be at the point of maximum slope However thepresence of the P(1 P) factor shifts the optimal point toa higher level Wichmann and Hill (2001b) consider howtrial placement affects threshold variance for the methodof constant stimuli This topic will be considered in Sec-tion II

Connecting the Weibull and dcent FunctionsFigures 5ndash7 of Strasburger (2001a) compare the logndash

log dcent slope b with the Weibull PF slope b Strasburgerrsquosconnection of b 5 088b is problematic since the dcent ver-sion of the Weibull PF does not have a fixed logndashlog slopeAn improved dcent representation of the Weibull function isthe topic of the present section A useful parameterizationfor dcent which I shall refer to as the StromeyerndashFoley func-tion was introduced by Stromeyer and Klein (1974) andused extensively by Foley (1994)

(20)

where xt is the stimulus strength in threshold units as inEquation 8 The factors with a in the denominator are pres-ent so that dcent equals unity at threshold (xt 5 1 or x 5 a)At low xt dcent x t

ba The exponent b (the logndashlog slope ofdcent at very low contrasts) controls the amount of facilita-tion near threshold At high xt Equation 20 becomes dcentx t

1 w (1 a) The parameter w is the logndash log slope of thetest threshold versus pedestal contrast function (the tvc orldquodipperrdquo function) at strong pedestals One must be a bitcautious however because in yesno procedures the tvcslope can be decoupled from w if the signal detection ROCcurve has nonunity slope (Stromeyer amp Klein 1974) Theparameter a controls the point at which the dcent function be-gins to saturate Typical values of these unitless parametersare b 5 2 w 5 05 and a 5 06 (Yu Klein amp Levi 2001)The function in Equation 20 accelerates at low contrast(logndashlog slope of b) and decelerates at high contrasts (logndashlog slope of 1 w) in general agreement with a broadrange of data

For 2AFC z 5 dcentIuml2 so from Equation 3 the connec-tion between dcent and probability correct is

(21)

To establish the connection between the PFs specified inEquations12 and 20ndash21 one must first have the two agreeat threshold For 2AFC with the dcent 5 1 the threshold atP 5 7602 can be enforced by choosing k 5 04795 (seethe discussion following Equation 18C) so that Equa-tions 1 and 12 become

(22)

Modifying k as in Equation 22 leaves the Weibull shapeunchanged and shifts only the definition of threshold Ifb 5 106b w 5 1 039b and a 5 0614 then for allvalues of b the Weibull and dcent functions (Equations 21and 22 vs Equation 12) differ by less than 00004 for allstimulus strengths At very low stimulus strengths b 5b The value b 5 106b is a compromise for getting anoverall optimal fit

Strasburger (2001a) is concerned with the same issueIn his Table 1 he reports dcent logndashlog slopes of [8847

P xtxt( ) = 1 1 5 4795( )

b

P xd x

tt( ) = +

cent( )eacute

eumlecircecirc

5 5erf2

cent( ) =d xx

a a xt

tb

tb w+ 1 + 1( )

var( )( )

YP P

N

dP y

dy

t

t

=( )eacute

eumlecircecirc

12

cent =( )

= =b g b g bthresh

dP x

dx

t

t

e( )log ( )

( ) 12

2347 1

MEASURING THE PSYCHOMETRIC FUNCTION 1433

18379 3131 4421] for b 5 [1 2 35 5] If the dcent slopeis taken to be a measure of the dcent exponent b then theratio bb is [08847 09190 08946 08842] for the fourb values Our value of bb 5 106 differs substantiallyfrom 08 (Pelli 1985) and 088 (Strasburger 2001a) Pelli(1985) and Strasburger (2001a) used dcent functions with a 51 in Equation 20 With a 5 0614 our dcent function (Equa-tion 20) starts saturating near threshold in agreement withexperimental data The saturation lowers the effective valueof b I was very surprised to find that the StromeyerndashFoley dcent function did such a good job in matching theWeibull function across the whole range of b and x Fora long time I had thought a different fit would be neededfor each value of b

For the same Weibull function in a yesno method (falsealarm rate 5 50) the parameters b and w are identicalto the 2AFC case above Only the value of a shifts froma 5 0614 (2AFC) to a 5 054 (yesno) In order to makext 5 1 at dcent 5 1 k becomes 03173 (see Equation 18C)instead of 04795 (2AFC) As with 2AFC the differencebetween the Weibull and dcent function is less than 0004(lt04) for all stimulus strengths and all values of b andfor both a linear and a logarithmic abscissa These valuesmean that for both the yes no and the 2AFC methods thedcent function (Equation 20) corresponding to a Weibullfunction with b 5 2 has a low-contrast logndashlog slope ofb 5 212 and a high-contrast logndashlog slope of 1 w 5078 The high-contrast tvc (test contrast vs pedestal con-trast discrimination function sometimes called the ldquodip-perrdquo function) logndashlog slope would be w 5 022

I also carried out a similar analysis asking what cumu-lative normal s is best matched to the Weibull I founds 5 11720b However the match is poor with errors of4 (rather than lt004 in the preceding analysis) Thepoor fit of the two functions makes the fitting proceduresensitive to the weighting used in the fitting procedure Ifone uses a weighting factor of [P(1 P)N] where P is thePF one will find a quite different value for s than if oneignored the weighting or if one chose the weighting on thebasis of the data rather than the fitting function

II COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING

THE PSYCHOMETRIC FUNCTION

This section inspired by Linschoten et al (2001) is anexamination of various experimental methods for gather-ing data to estimate thresholds Linschoten et al comparethree 2AFC methods for measuring smell and taste thresh-olds a maximum-likelihood adaptive method an upndashdownstaircase method and an ascending method of limits Thespecial aspect of their experiments was that in measuringsmell and taste each 2AFC trial takes between 40 and60 sec (Lew Harvey personal communication) because ofthe long duration needed for the nostrils or mouth to re-cover their sensitivity The purpose of the Linschoten et alpaper is to examine which of the three methods is mostaccurate when the number of trials is low Linschoten et al

conclude that the 2AFC method is a good method for thistask My prior experience with forced choice tasks and lownumber of trials (Manny amp Klein 1985 McKee et al1985) made me wonder whether there might be better meth-ods to use in circumstances in which the number of trialsis limited This topic is considered in Section II

Compare 2AFC to YesNoThe commonly accepted framework for comparing

2AFC and yesno threshold estimates is signal detectiontheory Initially I will make the common assumption thatthe dcent function is a power function (Equation 20 witha 5 1)

(23)

Later in this section the experimentally more accurateform Equation 20 with a 1 will be used The varianceof the threshold estimate will now be calculated for 2AFCand yesno for the simplest case of trials placed at a sin-gle stimulus level y the goal of most adaptive methodsThe PF slope relates ordinate variance to abscissa vari-ance (from Gourevitch amp Galanter 1967)

(24)

where Y 5 y yt is the threshold estimate for a natural logabscissa as given in Equation 9 This formula for var(Y )is called the asymptotic formula in that it becomes exactin the limit of large number of total trials N Wichmannand Hill (2001b) show that for the method of constantstimuli a proper choice of testing levels brings one to theasymptotic region for N 120 (the smallest value thatthey explored) Improper choice of testing levels requiresa larger N in order to get to the asymptotic region Equa-tion 24 is based on having the data at a single level as oc-curs asymptotically in adaptive procedures In addition thedefinition of threshold must be at the point being testedIf levels are tested away from threshold there will be anadditional contribution to the threshold variance (Finney1971 McKee et al 1985) if the slope is a variable param-eter in the fitting procedure Equation 24 reaches its as-ymptotic value more rapidly for adaptive methods with fo-cused placement of trials than with the constant stimulusmethod when slope is a free parameter

Equation 24 needs analytic expressions for the deriva-tive and the variance of dcent The derivative is obtained fromEquation 23

(25)

The variance of dcent for both yesno (dcent 5 zhit 1 zcorrect rejection)and 2AFC (dcent 5 Iuml2zave) is

(26)var var( ) exp var( ) cent( ) = = ( )d z z P2 4 2p

dddy

bdt

cent = cent

var( )var( )

Yd

dddyt

= cent

centaeligegraveccedil

oumloslashdivide

2

cent = = ( )d c bytb

texp

1434 KLEIN

For the yes no case Equation 26 is based on a unityslope ROC curve with the subjectrsquos criterion at the neg-ative diagonal of the ROC curve where the hit rate equalsthe correct rejection rate This would be the most effi-cient operating point The variance for a nonunity slopeROC curve was considered by Klein Stromeyer andGanz (1974) From binomial statistics

(27)

with n 5 N for 2AFC and n 5 N2 for yesno since forthis binary yesno task the number of signal and blanktrials is half the total number of trials

Putting all of these factors together gives

(28)

for both the yesno and the 2AFC cases The only dif-ference between the two cases is the definition of n andthe definition of dcent (dcent2AFC 5 205z and dcentYN 5 2z for asymmetric criterion) When this is expressed in terms ofN (number of trials) and z (z score of probability cor-rect) we get

(29)

for both 2AFC and yesno That is when the percent cor-rects are all equal (P2AFC 5 Phit 5 Pcorrect rejection) thethreshold variance for 2AFC and yesno are equal Theminimum value of var(Y ) is 164(Nb2) occurring at z 5157 (P 5 94) This value of 94 correct is the optimumvalue at which to test for both 2AFC and yesno tasks in-dependent of b

This result that the threshold variance is the same for2AFC and yesno methods seems to disagree with Figure 4of McKee et al (1985) where the standard errors of thethreshold estimates are plotted for 2AFC and yesno asa function of the number of trials The 2AFC standarderror (their Figure 4) is twice that of yesno correspond-ing to a factor of four in variance However the McKeeet al analysis is misleading since all they (we) did for theyesno task was to use Equation 2 to expand the PF to a 0to 100 range thereby doubling the PF slope Thatrescaling is not the way to convert a 2AFC detection taskto the yesno task of the same detectability A signal de-tection methodology is needed for that conversion aswas done in the present analysis

One surprise is that the optimal testing probability isindependent of the PF slope b This finding is a generalproperty of PFs in which the dependence on the slopeparameter has the form xb The 94 level corresponds todcentYN 5 315 and dcent2AFC 5 223 For the commonly usedP 5 75 (z 5 0765 dcentYN 5 135 dcent2AFC 5 0954)var(Y ) 5 408(Nb2) which is about 25 times the opti-mal value obtained by placing the stimuli at 94 Green(1990) had previously pointed out that the 94 and 92levels were the optimal points to test for the cumulative

normal and logit functions respectively because of theminimum threshold estimate variability when one is test-ing at that point on the PF Other PFs can have optimalpoints at slightly lower levels but these are still above thelevels to which adaptive methods are normally directed

It is also useful to compare the 2AFC and yesno at thesame dcent rather than at their optimal points In that case fordcent values fixed at [1 2 3] the ratio of the 2AFC varianceto the yesno variance is [182 136 079] At low dcent thereis an advantage for the 2AFC method and that reverses athigh dcent as is expected from the optimal dcent being higher foryesno than for 2AFC In general one should run the yesno method at dcent levels above 20 (84 correct in the blankand signal intervals corresponds to dcent 5 2)

The equality of the optimal var(Y ) for 2AFC and yesno methods is not only surprising it seems paradoxicalmdashas can be seen by the following gedankenexperiment Sup-pose subjects close their eyes on the first interval of the2AFC trial and base their judgments on just the secondinterval as if they were doing a yesno task since blanksand stimuli are randomly intermixed Instead of sayingldquoyesrdquo or ldquonordquo they would respond ldquoInterval 2rdquo or ldquoInter-val 1rdquo respectively so that the 2AFC task would become ayesno task The paradox is that one would expect the vari-ance to be worse than for the eyes open 2AFC case sinceone is ignoring information However Equation29 showsthat the 2AFC and yesno methods have identical optimalvariance I suspect that the resolution of this paradox is thatin the yesno method with a stable criterion the optimalvariance occurs at a higher dcent value (315) than the 2AFCvalue (223) The dcent power law function that I chose doesnot saturate at large dcent levels giving a benefit to the yesno method

In order to check whether the paradox still holds witha more realistic PF (a dcent function that saturates at largedcent ) I now compare the 2AFC and yesno variance usinga Weibull function with a slope b In order to connect2AFC and yesno I need to use signal detection theoryand the StromeyerndashFoley dcent function (Equation 20) withthe ldquoWeibull parametersrdquo b 5 106b a 5 0614 and w 51 39b Following Equation 22 I pointed out that withthese parameter values the difference between the 2AFCWeibull function and the 2AFC dcent function (converted toa probability ordinate) is less than 0004 After the de-rivative Equation 25 is appropriately modified we findthat the optimal 2AFC variance is 342(Nb2) For the yesno case the optimal variance is 402(Nb2) This result re-solves the paradox The optimal 2AFC variance is about18 lower than the yesno variance so closing onersquos eyesin one of the intervals does indeed hurt

We shouldnrsquot forget that if one counts stimulus presen-tations Np rather than trials N the optimal 2AFC variancebecomes 684(Npb2) as opposed to 402(Npb2) for yesnoThus for the Linschoten experiments with each stimuluspresentation being slow the yesno methodology is moreaccurate for a given number of stimulus presentationseven at low dcent values Kaernbach (1990) compares 2AFC

var( ) exp( )

( )Y z

P P

N bz= ( )2

122

p

var( ) exp( )

( )Y z

P P

n bd= ( ) cent

412

2p

var( )( )

PP P

n= 1

MEASURING THE PSYCHOMETRIC FUNCTION 1435

and yesno adaptive methods with simulations and ac-tual experiments His abscissa is number of presenta-tions and his results are compatible with our findings

The objective yesno method that I have been discussingis inefficient since 50 of the trials are blanks Greaterefficiency is possible if several points on the PF are mea-sured (method of constant stimuli) with the number of blanktrials the same as at the other levels For example if fivelevels of stimuli were measured including blanks then theblanks would be only 20 of the total rather than 50 Forthe 2AFC paradigm the subject would still have to look at50 of the presentations being blanks The yesno eff i-ciency can be further improved by having the subjectrespond to the multiple stimuli with multiple ratings ratherthan a binary yesno response This rating scale methodof constant stimuli is my favorite method and I have beenusing it for 20 years (Klein amp Stromeyer 1980) The mul-tiple ratings allow one to measure the psychometric func-tion shape for dcents as large as 4 (well into the dipper regionwhere stimuli are slightly above threshold) It also allowsone to measure the ROC slope (presence of multiplicativenoise) Simulations of its benefits will be presented inKlein (2002)

Insight into how the number of responses affects thresh-old variance can be obtained by using the cumulative nor-mal PF (Equation 17) considered by Kaernbach (2001b)and Miller and Ulrich (2001) with g 5 0 Typically this isthe case of yesno discrimination but it can also be 2AFCdiscrimination with an ordinate going from 0 to 100as I have discussed in connection with Figure 3 For a cu-mulative normal PF with a binary response and for a sin-gle stimulus level being tested the threshold variance is(Equation 19) as follows

(30)

from Equation 27 and forthcoming Equations 40 and 41The optimal level to test is at P 5 05 so the optimal vari-ance becomes

(31)

If instead of a binary response the observer gives an ana-log response (many many rating categories) then the vari-ance is the familiar connection between standard deviationand standard error

(32)

With four response categories the variance is 119s2Nwhich is closer to the analog than to the binary case Ifstimuli with multiple strengths are intermixed (methodof constant stimuli) then the benefit of going from a bi-nary response to multiple responses is greater than that in-dicated in Equations 31ndash32 (Klein 2002)

A brief list of other problems with the 2AFC method ap-plied to detection comprises the following

1 2AFC requires the subject to memorize and thencompare the results of two subjective responses It is cog-nitively simpler to report each response as it arrives Sub-jects without much experience can easily make mistakes(lapses) simply becoming confused about the order of thestimuli Klein (1985) discusses memory load issues rele-vant to 2AFC

2 The 2AFC methodology makes various types ofanalysis and modeling more difficult because it requiresthat one average over all criteria The response variable in2AFC is the Interval 2 activation minus the Interval 1 ac-tivation (see the abscissa in Figure 2) This variable goesfrom minus infinity to plus infinity whereas the yesnovariable goes from zero to infinity It therefore seems moredifficult to model the effects of probability summation anduncertainty for 2AFC than for yesno

3 Models that relate psychophysical performance to un-derlying processes require information about how the noisevaries with signal strength (multiplicative noise) The rat-ing scale method of constant stimuli not only measures dcentas a function of contrast it also is able to provide an esti-mate of how the variance of the signal distribution (theROC slope) increases with contrast The 2AFC methodlacks that information

4 Both the yesno and 2AFC methods are supposed toeliminate the effects of bias Earlier I pointed out that in myrecent 2AFC discrimination experiments when the stim-ulus is near threshold I find I have a strong bias in favor ofresponding ldquoStimulus 2rdquo Until I learned to compensatefor this bias (not typically done by naiumlve subjects) my dcentwas underestimated As I have discussed it is possible tocompensate for this bias but that is rarely done

Improving the Brownian (Up ndashDown) StaircaseAdaptive methods with a goal of placing trials at an

optimal point on the PF are efficient Leek (2001) pro-vides a nice overview of these methods There are twogeneral categories of adaptive methods those based onmaximum (or mean) likelihood and those based on sim-pler staircase rules The present section is concernedwith the older staircase methods which I will callBrownian methods I used the name ldquoBrownianrdquo becauseof the upndashdown staircasersquos similarity to one-dimensionalBrownian motion in which the direction of each step israndom The familiar upndashdown staircase is Brownianbecause at each trial the decision about whether to go upor down is decided by a random factor governed by prob-ability P(x) I want to make this connection to Brownianmotion because I suspect that there are results in thephysics literature that are relevant to our psychophysicalstaircases I hope these connections get made

A Brownian staircase has a minimal memory of the runhistory it needs to remember the previous step (includingdirection) and the previous number of reversals or numberof trials (used for when to stop and for when to decreasestep size) A typical Brownian rule is to decrease signalstrength by one step (like 1 dB) after a correct responseand to increase it three steps after an incorrect response (a

var( thresh) = s 2

N

var( thresh) = timesp s2

2

N

var(var( )

( )exp thresh) =aeligegraveccedil

oumloslashdivide

= ( )P

N dPdy

P P zN2

22

2 1p s

1436 KLEIN

one downndashthree up rule) I was pleased to learn recentlythat his rule works nicely for an objective yesno task(Kaernbach 1990) as well as 2AFC as discussed earlier

In recent years likelihood methods (Pentland 1980Watson amp Pelli 1983) have become popular and havesomewhat displaced Brownian methods There is a wide-spread feeling that likelihood methods have substantiallysmaller threshold variance than do staircases in whichthresholds are obtained by simply averaging the levelstested Given the importance of this issue for informingresearchers about how best to measure thresholds it issurprising that there have been so few simulations com-paring staircases and optimal methods Pentlandrsquos (1980)results showing very large threshold variance for a stair-case method are so dramatically different from the re-sults of my simulations that I am skeptical The best pre-vious simulations of which I am aware are those of Green(1990) who compared the threshold variance from sev-eral 2AFC staircase methods with the ideal variance(Equation 24) He found that as long as the staircase ruleplaced trials above 80 correct the ratio of observed toideal threshold variance was about 14 This value is thesquare of the inverse of Greenrsquos finding that the ratio ofthe expected to the observed sigma is about 85

In my own simulations (Klein 2002) I found that aslong as the final step size was less than 02s the thresh-old variance from simply averaging over levels was closeto the optimal value Thus the staircase threshold varianceis about the same as the variance found by other optimalmethods One especially interesting finding was that thecommon method of averaging an even number of rever-sal points gave a variance that was about 10 higher thanaveraging over all levels This result made me wonderhow the common practice of averaging reversals ever gotstarted Since Green (1990) averaged over just the reversalpoints I suspect that his results are an overestimate of thestaircase variance In addition since Green specifies stepsize in decibels it is difficult for me to determine whetherhis final step size was small enough Since the thresholdvariance is inversely related to slope it is important to re-late step size to slope rather than to decibels The most nat-ural reference unit is s the standard deviation associatedwith the cumulative normal In summary I suggest that thisgeneral feeling of distrust of the simple Brownian staircaseis misguided and that simple staircases are often as goodas likelihood methods In the section Summary of Advan-tages of Brownian Staircase I point out several advantagesof the simple staircase over the likelihood methods

A qualification should be made to the claim above thataveraging levels of a simple staircase is as good as a fancylikelihood method If the staircase method uses a too smallstep size at the beginning and if the starting point is farfrom threshold then the simple staircase can be inefficientin reaching the threshold region However at the end ofthat run the astute experimenter should notice the largediscrepancy between the starting point and the thresholdand the run should be redone with an improved startingpoint or with a larger initial step size that will be reducedas the staircase proceeds

Increase number of 2AFC response categories In-dependent of the type of adaptive method used the 2AFCtask has a flaw whereby a series of lucky guesses at lowstimulus levels can erroneously drive adaptive methods tolevels that are too low Some adaptive methods with a smallnumber of trials are unable to fully recover from a string oflucky guesses early in the run One solution is to allow thesubject to have more than two responses For reasons thatI have never understood people like to have only two re-sponse categories in a 2AFC task Just as I am reluctant totake part in 2AFC experiments I am even more reluctantto take part in two response experiments The reason issimple I always feel that I have within me more than onebit of information after looking at a stimulus I usually haveat least four or more subjective states (2 bits) for a yesnotask HN LN LY HY where H and L are high and lowconfidence and N and Y are did not and did see it For a2AFC my internal states are H1 L1 L2 H2 for high andlow confidence on Intervals 1 and 2 Kaernbach (2001a) inthis special issue does an excellent job of justifying thelow-confidence response category He provides solid ar-guments simulations and experimental data on how alow-confidence category can minimize the ldquolucky guessrdquoproblem

Kaernbach (2001a) groups the L1 and L2 categoriestogether into a ldquodonrsquot knowrdquo category and argues per-suasively that the inclusion of this third response can in-crease the precision and accuracy of threshold estimateswhile also leaving the subject a happier person He callsit the ldquounforced choicerdquo method Most of his article con-cerns the understandable fear of introducing a responsebias exactly what 2AFC was invented to avoid He has de-veloped an intelligent rule for what stimulus level shouldfollow a ldquodonrsquot knowrdquo response Kaernbach shows withboth simulations and real data that with his method thispotential response bias problem has only a second-ordereffect on threshold estimates (similar to the 2AFC inter-val bias discussed around Equation 7) and its advantagesoutweigh the disadvantages I urge anyone using the 2AFCmethod to read it I conjecture that Kaernbachrsquos method isespecially useful in trimming the lower tail of the distri-bution of threshold estimates and also in reducing the in-terval bias of 2AFC tasks since the low confidence ratingwould tend to be used for trials on which there is the mostuncertainty

Although Kaernbach has managed to convince me ofthe usefulness of his three-response method he might havetrouble convincing others because of the response biasquestion I should like to offer a solution that might quellthe response bias fears Rather than using 2AFC with threeresponses one can increase the number of responses tofourmdashH1 L1 L2 H2mdashas discussed two paragraphsabove The L rating can be used when subjects feel theyare guessing

To illustrate how this staircase would work considerthe case in which we would like the 2AFC staircase to con-verge to about 84 correct A 5 upndash1 down staircase(5 levels up for a wrong response and 1 level down for acorrect response) leads to an average probability correct

MEASURING THE PSYCHOMETRIC FUNCTION 1437

of 56 5 8333 If one had zero information about thetrial and guessed then one would be correct half the timeand the staircase would shift an average of (5 1)2 512 levels Thus in his scheme with a single ldquodonrsquot knowrdquoresponse Kaernbach would have the staircase move up2 levels following a ldquodonrsquot knowrdquo response One mightworry that continued use of the ldquodonrsquot knowrdquo responsewould continuously increase the level so that one wouldget an upward bias But Kaernbach points out that theldquodonrsquot knowrdquo response replaces incorrect responses to agreater extent than it replaces correct responses so thatthe 12 level shift is actually a more negative shift than the15 shift associated with a wrong response For my sug-gestion of four responses the step shifts would be ndash1 1113 and 15 levels for responses HC LC LI and HIwhere H and L stand for high and low confidence and Cand I stand for correct and incorrect A slightly more con-servative set of responses would be ndash1 0 14 15 inwhich case the low-confidence correct judgment leaves thelevel unchanged In all these examples including Kaern-bachrsquos three-response method the low-confidence re-sponse has an average level shift of 12 when one is guess-ing The most important feature of the low confidencerating is that when one is below threshold giving a lowconfidence rating will prevent one from long strings oflucky guesses that bring one to an inappropriately lowlevel This is the situation that can be difficult to overcomeif it occurs in the early phase of a staircase when the stepsize can be large The use of the confidence rating wouldbe especially important for short runs where the extra bitof information from the subject is useful for minimizingerroneous estimates of threshold

Another reason not to be fearful of the multiple-response2AFC method is that after the run is finished one can es-timate threshold by using a maximum likelihood proce-dure rather than by averaging the run locations Standardlikelihood analysis would have to be modified for Kaern-bachrsquos three-response method in order to deal with theldquodonrsquot knowrdquo category For the suggested four-responsemethod one could simply ignore the confidence ratings forthe likelihood analysis My simulations discussed earlierin this section and Greenrsquos (1990) reveal that averagingover the levels (excluding a number of initial levels toavoid the starting point bias) has about as good a preci-sion and accuracy as the best of the likelihood methodsIf the threshold estimates from the likelihood method andfrom the averaging levels method disagree that run couldbe thrown out

Summary of advantages of Brownian staircaseSince there is no substantial variance advantage to thefancier likelihood methods one can pay more attention toa number of advantages associated with simple staircases

1 At the beginning of a run likelihood methods mayhave too little inertia (the sequential stimulus levels jumparound easily) unless a ldquostiff rdquo prior is carefully chosenBy inertia I refer to how difficult it is for the next level todeviate from the present level At the beginning of a runone would like to familiarize the subject with the stimu-

lus by presenting a series of stimuli that gradually increasein difficulty This natural feature of the Brownian stair-case is usually missing in QUEST-like methods Theslightly inefficient first few trials of a simple staircasemay actually be a benefit since it gives the subject somepractice with slightly visible stimuli

2 Toward the end of a maximum likelihood run one canhave the opposite problem of excessive inertia wherebyit is difficult to have substantial shifts of test contrastThis can be a problem if nonstationary effects are presentFor example thresholds can increase in mid-run owingto fatigue or to adaptation if a suprathreshold pedestal ormask is present In an old-fashioned Brownian staircasewith moderate inertia a quick look at the staircase trackshows the change of threshold in mid-run That observa-tion can provide an important clue to possible problemswith the current procedures A method that reveals non-stationarity is especially important for difficult subjectssuch as infants or patients in whom fatigue or inatten-tion can be an important factor

3 Several researchers have told me that the simplicity ofthe staircase rules is itself an advantage It makes writingprograms easier It is nicer for teaching students It alsosimplifies the uncertain business of guessing likelihoodpriors and communicating the methodology when writ-ing articles

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learnmore about the PF than just the single threshold valueEstimating the PF slope is the first step in this directionTo estimate slope one must test the PF at more than onepoint Methods are available to do this (Kaernbach 2001bKontsevich amp Tyler 1999) and one could even adapt sim-ple Brownian staircase procedures to give better slope es-timates with minimum harm to threshold estimates (seethe comments in Strasburger 2001b)

Probably the best method for getting both thresholds andslopes is the Y (Psi) method of Kontsevich and Tyler(1999) The Y method is not totally novel it borrows tech-niques from many previous researchersmdashas Kontsevichand Tyler point out in a delightful paragraph that both clar-ifies their method and states its ancestry This paragraphsummarizes what is probably the most accurate and preciseapproach to estimating thresholds and slope so I reproduceit here

To summarize the Y method is a combination of solutionsknown from the literature The method updates the poste-rior probability distribution across the sampled space ofthe psychometric functions based on Bayesrsquo rule (Hall1968 Watson amp Pelli 1983) The space of the psychome-tric functions is two-dimensional (King-Smith amp Rose1997 Watson amp Pelli 1983) Evaluation of the psycho-metric function is based on computing the mean of theposterior probability distribution (Emerson 1986 King-Smith et al 1994) The termination rule is based on thenumber of trials as the most practical option (King-Smith

1438 KLEIN

et al 1994 Watson amp Pelli 1983) The placement of eachnew trial is based on one-step ahead minimum search(King-Smith 1984) of the expected entropy cost function(Pelli 1987)

A simplified rough description of the Y trial placementfor 2AFC runs is that initially trials are placed at about the90 correct level to establish a solid threshold Oncethreshold has been roughly established trials are placed atabout the 90 and the 70 levels to pin down the slopeThe precise levels depend on the assumed PF shape

Before one starts measuring slopes however one mustfirst be convinced that knowledge of slope can be impor-tant It is so for several reasons

1 Parametric fitting procedures either require knowl-edge of slope b or must have data that adequately con-strain the slope estimate Part of the discussion that fol-lows will be on how to obtain reliable threshold estimateswhen slope is not well known A tight knowledge of slopecan minimize the error of the threshold estimate

2 Strasburger (2001a 2001b) gives good argumentsfor the desirability of knowing the shape of the PF He isinterested in differences between foveal and peripheralvisual processing and he uses the PF slope as an indica-tion of differences

3 Slopes are different for different stimuli and thiscan lead to misleading results if slope is ignored It mayhelp to describe my experience with the Modelfest pro-ject (Carney et al 2000) to make this issue concreteThis continuing project involves a dozen laboratoriesaround the world We have measured detection thresh-olds for a wide variety of visual stimuli The data areused for developing an improved model of spatial visionSome of the stimuli were simple easily memorized lowspatial frequency patterns that tend to have shallowerslopes because a stimulus-known-exactly template canbe used efficiently and with minimal uncertainty On theother hand tiny Modelfest stimuli can have a good dealof spatial uncertainty and are therefore expected to pro-duce PFs with steeper slopes Because of the slope changethe pattern of thresholds will depend on the level atwhich threshold is measured When this rich dataset isanalyzed with f ilter models that make estimates formechanism bandwidths and mechanism pooling thesedifferent slope-dependent threshold patterns will lead todifferent estimates for the model parameters Several ofus in the Modelfest data gathering group argued that weshould use a data-gathering methodology that wouldallow estimation of PF slope of each of the 45 stimuliHowever I must admit that when I was a subject my pa-tience wore thin and I too chose the quicker runs Al-though I fully understand the attraction of short runs fo-cused on thresholds and ignoring slope I still think thereare good reasons for spreading trials out For one thinginterspersing trials that are slightly visible helps the sub-ject maintain memory of the test pattern

4 Slopes can directly resolve differences between mod-els In my own research for example my colleagues and

I are able to use the PF slope to distinguish long-range fa-cilitation that produces a low-level excitation or noise re-duction (typical with a cross-orientation surround) from ahigher level uncertainty reduction (typical with a same-orientation surround)

Throwing-Out Rules Goodness-of-FitThere are occasions when a staircase goes sour Most

often this happens when the subjectrsquos sensitivity changesin the middle of a run possibly because of fatigue It istherefore good to have a well-specified rule that allowsnonstationary runs to be thrown out The throwing-out rulecould be based on whether the PF slope estimate is toolow or too high Alternatively a standard approach is tolook at the chi-square goodness-of-f it statistic whichwould involve an assumption about the shape of the un-derlying PF I have a commentary in Section III regard-ing biases in the chi-square metric that were found byWichmann and Hill (2001a) Biases make it difficult touse standard chi-square tables The trouble with this ap-proach of basing the goodness-of-fit just on the cumulateddata is that it ignores the sequential history of the run TheBrownian upndashdown staircase makes the sequential historyvividly visible with a plot of sequentially tested levelsWhen fatigue or adaptation sets in one sees the tested lev-els shift from one stable point to another Leek Hanna andMarshall (1991) developed a method of assessing the non-stationarity of a psychometric function over the course ofits measurement using two interleaved tracks EarlierHall (1983) also suggested a technique to address the sameproblem I encourage further research into the develop-ment of a non-stationarity statistic that can be used to dis-card bad runs This statistic would take the sequential his-tory into account as well as the PF shape and the chi-square

The development of a ldquothrowing-outrdquo rule is the ex-treme case of the notion of weighted averaging One im-portant reason for developing good estimators for good-ness-of-fit and good estimators for threshold variance isto be able to optimally combine thresholds from repeatedexperiments I shall return to this issue in connectionwith the Wichmann and Hill (2001a) analysis of bias ingoodness-of-fit metrics

Final Thought The Method Should Depend on the Situation

Also important is that the method chosen should be ap-propriate for the task For example suppose one is usingwell-experienced observers for testing and retesting sim-ilar conditions in which one is looking for small changesin threshold or one is interested in the PF shape In suchcases one has a good estimate of the expected thresholdand the signal detection method of constant stimuli withblanks and with ratings might be best On the other handfor clinical testing in which one is highly uncertain aboutan individualrsquos threshold and criterion stability a 2AFClikelihood or staircase method (with a confidence rating)might be best

MEASURING THE PSYCHOMETRIC FUNCTION 1439

III DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPEPLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with whatto do with the data after they have been collected Wehave here the standard Bayesian situation Given the PFit is easy to generate samples of data with confidence in-tervals But we have the inverse situation in which we aregiven the data and need to estimate properties of the PFand get confidence limits for those estimates The pair ofarticles in this special issue by Wichmann and Hill (2001a2001b) as well as the articles by King-Smith et al (1994)Treutwein (1995) and Treutwein and Strasburger (1999)do an outstanding job of introducing the issues and providemany important insights For example Emerson (1986)and King-Smith et al emphasize the advantages of meanlikelihood over maximum likelihood The speed of mod-ern computers makes it just as easy to calculate meanlikelihood as it is to calculate maximum likelihood As an-other example Treutwein and Strasburger (1999) demon-strate the power of using Bayesian priors for estimatingparameters by showing how one can estimate all four pa-rameters of Equation 1A in fewer than 100 trials This im-portant finding deserves further examination

In this section I will focus on four items First I will dis-cuss the relative merits of parametric versus nonparamet-ric methods for estimating the PF properties The paperby Miller and Ulrich (2001) provides a detailed analysis ofa nonparametric method for obtaining all the moments ofthe PF I will discuss some advantages and limitations oftheir method Second is the question of goodness-of-fitWichmann and Hill (2001a) show that when the numberof trials is small the goodness-of-fit can be very biasedI will examine the cause of that bias Third is the issue dis-cussed by Wichmann and Hill (2001a) that of how lapseseffect slope estimates This topic is relevant to Strasburg-errsquos (2001b) data and analysis Finally there is the possiblycontroversial topic of an upward slope bias in adaptivemethods that is relevant to the papers of Kaernbach (2001b)and Strasburger (2001b)

Parametric Versus Nonparametric FitsThe main split between methods for estimating pa-

rameters of the psychometric function is the distinctionbetween parametric and nonparametric methods In aparametric method one uses a PF that involves severalparameters and uses the data to estimate the parametersFor example one might know the shape of the PF and es-timate just the threshold parameter McKee et al (1985)show how slope uncertainty affects threshold uncertaintyin probit analysis The two papers by Wichmann and Hill(2001a 2001b) are an excellent introduction to many is-sues in parametric fitting An example of a nonparamet-ric method for estimating threshold would be to averagethe levels of a Brownian staircase after excluding the firstfour or so reversals in order to avoid the starting level biasBefore I served as guest editor of this special symposium

issue I believed that if one had prior knowledge of the exactPF shape a parametric method for estimating threshold wasprobably better than nonparametric methods After read-ing the articles presented here as well as carrying out myrecent simulations I no longer believe this I think thatin the proper situation nonparametric methods are as goodas parametric methods Furthermore there are situationsin which the shape of the PF is unknown and nonparamet-ric methods can be better

Miller and Ulrichrsquos Article on the Nonparametric SpearmanndashKaumlrber Method of Analysis

The standard way to extract information from psycho-metric data is to assume a functional shape such as a Wei-bull cumulative normal logit dcent and then vary the pa-rameters until likelihood is maximized or chi-square isminimized Miller and Ulrich (2001) propose using theSpearmanndashKaumlrber (SK) nonparametric method whichdoes not require a functional shape They start with a prob-ability density function defined by them as the derivativeof the PF

(33)

where s(i) is the stimulus intensity at the ith level andP(i) is the probability correct at the that level The notationpdf(i) is meant to indicate that it is the pdf for the intervalbetween i ndash 1 and i Miller and Ulrich give the r th momentof this distribution as their Equation 3

(34)

The next four equations are my attempt to clarify their ap-proach I do this because when applicable it is a finemethod that deserves more attention Equation 34 prob-ably came from the mean value theorem of calculus

(35)

where f (s) 5 sr11 and f cent(smid) 5 (r 1 1)smidr is the deriv-

ative of f with respect to s somewhere within the intervalbetween s(i 1) and s(i) For a slowly varying functionsmid would be near the midpoint Given Equation35 Equa-tion 34 becomes

(36)

where Ds 5 s(i) s(i 1) Equation 36 is what one wouldmean by the rth moment of the distribution The case r 50 is important since it gives the area under the pdf

(37)

by using the definition in Equation 33 The summation inEquation 37 gives the area under the pdf For it to be aproper pdf its area must be unity which means that theprobability at the highest level P(imax) must be unity andthe probability at the lowest level Pmin must be zero asMiller and Ulrich (2001) point out

m0 = =aringpdf( ) ( ) ( ) max mini s P i P ii

D

mrr

i

i s s= aring pdf mid( ) D

f s i f s i f s s i s i( ) ( ) ( ) ( ) ( ) [ ] [ ] = cent [ ]1 1mid

mrr r

ri s i s i=

+ [ ]aring + +11

11 1pdf( ) ( ) ( )

pdf( )( ) ( )

( ) ( )i

P i P i

s i s i= 1

1

1440 KLEIN

The next simplest SK moment is for r 5 1

(38)

from Equation 33 This first moment provides an esti-mate of the centroid of the distribution corresponding tothe center of the psychometric function It could be usedas an estimate of threshold for detection or PSE for dis-crimination An alternate estimator would be the medianof the pdf corresponding to the 50 point of the PF (theproper definition of PSE for discrimination)

Miller and Ulrich (2001) use Monte Carlo simulationsto claim that the SK method does a better job of extract-ing properties of the psychometric function than do para-metric methods One must be a bit careful here since intheir parametric fitting they generate the data by usingone PF and then fit it with other PFs But still it is a sur-prising claim if this simple method is so good why havepeople not been using it for years There are two possibleanswers to this question One is that people never thoughtof using it before Miller and Ulrich The other possibil-ity is that it has problems I have concluded that the an-swer is a combination of both factors I think that there isan important place for the SK approach but its limitationsmust be understood That is my next topic

The most important limitation of the SK nonparamet-ric method is that it has only been shown to work well formethod of constant stimulus data in which the PF goesfrom 0 to 1 I question whether it can be applied with thesame confidence to 2AFC and to adaptive procedures withunequal numbers of trials at each level The reason for thislimitation is that the SK method does not offer a simpleway to weight the different samples Let us consider twosituations with unequal weighting adaptive methods and2AFC detection tasks

Adaptive methods place trials near threshold but withvery uneven distribution of number of trials at differentlevels The SK method gives equal weighting to sparselypopulated levels that can be highly variable thus con-tributing to a greater variance of parameter estimates Letus illustrate this point for an extreme case with five levelss 5 [1 2 3 4 5] Suppose the adaptive method placed [11 1 98 1] trials at the five levels and the number correctwere [0 1 1 49 1] giving probabilities [0 1 1 5 1] andpdf 5 [1 0 5 5] The following PEST-like (Taylor ampCreelman 1967) rules could lead to this unusual dataMove down a level if the presently tested level has greaterthan P1 correct move up three levels if the presentlytested level has less than P2 correct P1 can be anynumber less than 33 and P2 can be any number greaterthan 67 The example data would be produced by start-ing at Level 5 and getting four in a row correct followedby an error followed by a wide set possible of alternat-ing responses for which the PEST rules would leave thefourth level as the level to be tested The SK estimate of the

center of the PF is given by Equation 38 to be micro1 5 200Since a pdf with a negative probability is distasteful tosome one can monotonize the PF according to the proce-dure of Miller and Ulrich (which does include the numberof trials at each level) The new probabilities are [0 5151 51 1] with pdf 5 [51 0 0 49] The new estimate ofthreshold is micro1 5 297 However given the data with 4998correct at Level 4 an estimate that includes a weightingaccording to the number of trials at each level would placethe centroid of the distribution very close to micro1 5 4 Thuswe see that because the SK procedure ignores the numberof trials at each level it produces erroneous estimates ofthe PF properties This example with shifting thresholdswas chosen to dramatize the problems of unequal weight-ing and the effect of monotonizing

A similar problem can occur even with the method ofconstant stimuli with equal number of trials at each levelbut with unequal binomial weighting as occurs in 2AFCdata The unequal weighting occurs because of the 50lower asymptote The binomial statistics error bar of dataat 55 correct is much larger than for data at 95 cor-rect Parametric procedures such as probit analysis giveless weighting to the lower data This is why the most ef-ficient placement of trials in 2AFC is above 90 correctas I have discussed in Section II Thus it is expected thatfor 2AFC the SK method would give threshold estimatesless precise than the estimates given by a parametricmethod that includes weighting

I originally thought that the SK method would fail on alltypes of 2AFC data However in the discussion followingEquation 7 I pointed out that for 2AFC discriminationtasks there is a way of extending the data to the 0ndash100range that produces weightings similar to those for a yesnotask Rather than use the standard Equation 2 to deal withthe 50 lower asymptote one should use the method dis-cussed in connection with Figure 3 in which one can usefor the ordinate the percent of correct responses in Inter-val 2 and use the signal strength in Interval 2 minus thatin Interval 1 for the abscissa That procedure produces aPF that goes from 0 to 100 thereby making it suit-able for SK analysis while also removing a possible bias

Now let us examine the SK method as applied to datafor which it is expected to work such as discriminationPFs that go from 0 to 100 These are tasks in which thelocation of the psychometric function is the PSE and theslope is the threshold

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: Since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being .01 and .99. The probabilities at the sample points are P(i) = .01, .1222, .50, .8778, and .99, corresponding to z scores of z(i) = -2.328, -1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ1, given by Equation 38,

μ1 = Σ_i pdf(i) [s(i) + s(i-1)]/2 = Σ_i [P(i) - P(i-1)] [s(i) + s(i-1)]/2 .    (38)

I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

var(PSE_i) = var(P_i) / (dP_i/dx_i)²,    (39)

where var(P) is given by Equation 27 and the PF slope is

dP_i/dx_i = (dP_i/dz_i) × (dz_i/dx_i),    (40)

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

dP_i/dz_i = exp(-z_i²/2) / sqrt(2π),    (41)

from Equation 3A. By combining these equations, one gets

var(PSE_i) = 2πσ² P_i (1 - P_i) exp(z_i²) / N_i .    (42)

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula

1/var_tot = Σ_i 1/var(PSE_i) .    (43)

Finally, upon plugging in the five values of z_i, ranging from -2.328 to +2.328, and P_i, ranging from .01 to .99, the standard error of the PF location is

SE = sqrt(var_tot) = 0.284 .    (44)

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

var_tot = SE² = 3.24/N_tot .    (45)

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up-down staircase, that place trials near the optimal point.
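
The chain of Equations 39-45 is short enough to verify numerically; the sketch below is mine (not Miller and Ulrich's code) and reproduces SE = 0.284 and var = 3.24/N_tot for the placement discussed above.

z  = [-2.328 -1.164 0 1.164 2.328];        % z scores of the five levels
P  = [.01 .1222 .50 .8778 .99];            % probabilities at those levels
Ni = 8;  sigma = 1;                        % 8 trials per level, unit-sigma PF
varPSE = 2*pi*sigma^2 * P.*(1-P).*exp(z.^2) / Ni;   % Equation 42, per level
varTot = 1/sum(1./varPSE);                 % Equation 43
SE     = sqrt(varTot)                      % = 0.284  (Equation 44)
varTot*40                                  % = 3.24, so var = 3.24/Ntot (Equation 45)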

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that, of the five stimulus levels, the two at P = .01 and .99 are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = .5). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus one cannot claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

X² = Σ_i [p(data_i) - P_i]² / var(P_i),    (46)

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend

var(P_i) = P_i (1 - P_i) / N_i,    (47)

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

X² = Σ_i residual_i²,    (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

residual_i = [p(data_i) - P_i] / SE_i .    (49)

The second method is to use the normalized log-likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2 Σ_i N_i { p(data_i) ln[p(data_i)/P_i] + [1 - p(data_i)] ln[(1 - p(data_i))/(1 - P_i)] } .    (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.
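
As a concrete illustration, the two statistics can be computed side by side; the counts below are made up for this sketch.

Pfit = [.55 .75 .95];   N = [20 20 20];   k = [12 16 18];   % predicted PF, trials, correct
pdat = k./N;                                % observed proportions
X2 = sum((pdat - Pfit).^2 ./ (Pfit.*(1-Pfit)./N))                      % Equation 46
xlogx = @(q) q.*log(max(q, realmin));       % handles the 0*log(0) terms
LL = 2*sum(N.*(xlogx(pdat) - pdat.*log(Pfit) + xlogx(1-pdat) - (1-pdat).*log(1-Pfit)))   % Equation 50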

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF is not linear in its parameters, one typically makes the linearity assumption anyway in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

prob_chisq = Gamma(df/2, χ²/2),    (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that, for a reasonably large number of degrees of freedom (like df > 10), the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = 0.54, which is very close to the asymptotic value of 0.5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = 0.845, which is indeed close to the value of 0.841 expected for a unity z score.
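
In Matlab the check looks like this (note that gammainc takes its arguments in the order gammainc(x, a)):

df = 18;
gammainc(18/2, df/2)        % = 0.54 for chi-square = df, near the asymptotic 0.5
gammainc((18+6)/2, df/2)    % = 0.845 for chi-square = df + sqrt(2*df), near 0.841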

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square distribution (Equation 51).

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials at each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities at each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = .5 to 1.0 and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

X_i² = N_i [p(data_i) - P_i]² / [P_i (1 - P_i)] = (O_i - E_i)² / [E_i (1 - P_i)],    (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible outcomes for O_i. The weighting wt_i is the expected probability from binomial statistics:

wt_i = C(N_i, O_i) P_i^O_i (1 - P_i)^(N_i - O_i),    (53)

where C(N_i, O_i) is the binomial coefficient.

The expected value <X_i²> is

<X_i²> = Σ_{O_i} wt_i (O_i - E_i)² / [E_i (1 - P_i)] .    (54)

A bit of algebra based on Σ_{O_i} wt_i = 1 gives a surprising result:

<X_i²> = 1 .    (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P_i. Having each deviate contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens for any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

<LL_i> = 2 Σ_{O_i} wt_i [ O_i ln(O_i/E_i) + (N_i - O_i) ln((N_i - O_i)/(N_i - E_i)) ] .    (56)

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

<LL_i> = 2 [0.25 × 2 ln(2) + 0.25 × 2 ln(2)] = 2 ln(2) = 1.386 .    (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero because of the (1 - P_i) factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i > .85 the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.
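
Both limiting cases can be verified by the same kind of enumeration used for Figure 4; the sketch below is mine (not Matlab Code 2) and does the N_i = 2, P_i = .5 case.

Ni = 2;  Pi = 0.5;  Ei = Ni*Pi;
Oi = 0:Ni;
wt = arrayfun(@(o) nchoosek(Ni,o), Oi) .* Pi.^Oi .* (1-Pi).^(Ni-Oi);     % Equation 53
X2i = sum(wt .* (Oi-Ei).^2 / (Ei*(1-Pi)))                                % Equation 55: exactly 1
LLi = sum(wt .* 2.*(Oi.*log(max(Oi,1)/Ei) + (Ni-Oi).*log(max(Ni-Oi,1)/(Ni-Ei))))   % Equation 57: 1.386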

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance) at each level. In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53-54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1-16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39-43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method 1 by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
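
For concreteness, inverse variance weighting of repeated threshold estimates looks like this (the numbers are made up for the sketch):

th = [1.05 0.92 1.20];          % three threshold estimates of the same condition
v  = [0.010 0.015 0.040];       % their variance estimates (e.g., from bootstrap)
w  = 1./v;
thBar  = sum(w.*th)/sum(w)      % weighted mean threshold
varBar = 1/sum(w)               % variance of the weighted mean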

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. They show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a nonzero lapse parameter of λ = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used (probit [cumulative normal], logit, and Weibull), corresponding to the rows of Table 2, and two error metrics were used (chi-square minimization and likelihood maximization), corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                 No Lapse          No Lapse          Lapse             Lapse
Psychometric     λ = .01           λ = .0001         λ = .01           λ = .0001
Function         L       χ²        L       χ²        L       χ²        L       χ²
Probit           1.05    1.05      1.00    0.99      1.05    1.05      0.36    0.34
Logit            1.76    1.76      1.66    1.66      1.76    1.76      0.84    0.72
Weibull          1.01    1.01      0.97    0.97      1.01    1.01      0.26    0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details of how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

probit (erf, Equation 3B):   P(z) = .5 + .5 erf(z / sqrt(2)),    (58A)
logit:                       P(z) = 1 / [1 + exp(-z)],           (58B)
Weibull:                     P(z) = 1 - exp[-exp(z)],            (58C)

with z = p2 (y - p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that: (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
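
A stripped-down version of such a fit, for the logit row only and for likelihood maximization only, might look as follows; this is my own sketch using Matlab's fminsearch, not the Matlab Code 3 listing.

% Logit PF with symmetric guess/lapse rate lambda; free parameters p = [p1 p2].
y = [0 1 6];   N = [50 50 50];   k = [25 42 49];   % the "Lapse" data set
lambda = .01;                                      % fixed lapse (and guess) rate
PF    = @(p) lambda + (1-2*lambda)./(1 + exp(-p(2)*(y - p(1))));
negLL = @(p) -sum(k.*log(PF(p)) + (N-k).*log(1 - PF(p)));
pBest = fminsearch(negLL, [0.5 1])                 % returns [threshold slope]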

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope b of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods: I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method, using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled, so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = .0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = .01, .1222, .50, .8778, and .99. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2-4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach, this extrapolation is necessary, since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.

The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.
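
The essential idea of such a routine is pooling adjacent violators, weighted by the number of trials; a compact sketch of my own (not the published Matlab Code 4), which reproduces the [0 .51 .51 .51 1] example from the SK discussion, is:

P = [0 1 1 .5 1];   Ntr = [1 1 1 98 1];    % example data and trial counts
k = P.*Ntr;  n = Ntr;  m = ones(size(P));  % correct counts, trials, bin widths
i = 1;
while i < numel(k)
    if k(i)/n(i) > k(i+1)/n(i+1)           % monotonicity violation: pool the bins
        k(i) = k(i)+k(i+1);  n(i) = n(i)+n(i+1);  m(i) = m(i)+m(i+1);
        k(i+1) = [];  n(i+1) = [];  m(i+1) = [];
        i = max(i-1, 1);                   % pooling can create an upstream violation
    else
        i = i+1;
    end
end
Pmono = repelem(k./n, m)                   % = [0 .51 .51 .51 1]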

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair, I show an additional dot-dashed line with right-left symmetry that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias: The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)] / [1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) - z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective," I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.
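
A bare-bones simulation of one run of this rule, with an assumed linear d′ growth and arbitrary parameters of my own, is sketched below.

rng(1);  nTrials = 100;  level = 10;  crit = 0.5;  levels = zeros(1,nTrials);
for t = 1:nTrials
    levels(t) = level;
    isSignal = rand < 0.5;                     % randomly intermix blanks and signals
    dprime   = 0.2*level;                      % assumed d' model: grows with level
    saysYes  = (isSignal*dprime + randn) > crit;
    correct  = (saysYes == isSignal);
    if correct, level = level - 1;             % down 1 for hits / correct rejections
    else        level = level + 3;             % up 3 for false alarms / misses
    end
    level = max(level, 1);                     % keep the level within range
end
threshEst = mean(levels(21:end))               % average levels after the initial trials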

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log-log slope of the d′ function, β. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is .0004 on a probability ordinate going from .50 to 1.00. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a), because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared, using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variances leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up-down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, through about the fourth reversal, are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman-Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection, because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = .0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications: The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28(Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

(Continued on next page)

APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1
Weibull Function: Probability, d′ and Log–Log d′ Slope (Code for Generating Figure 1)

Matlab Code 2
Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)

Matlab Code 3
Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)

Matlab Code 4
Brownian Staircase Enumerations for Slope Bias

Note on Code 4: In line 6 the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note: The main program of Code 4 has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


The Connection Between Probability and z-Score

The Gaussian distribution and its integral, the cumulative normal function, play a fundamental role in many approaches to PFs. The cumulative normal function Φ(z) is the function that connects the z-score (z) to probability (prob):

prob = Φ(z) = ∫_{−∞}^{z} [exp(−y²/2)/√(2π)] dy.  (3A)

The function Φ(z) and its inverse are available in programs such as Excel, but not in Matlab. For Matlab one must use

prob = Φ(z) = [1 + erf(z/√2)]/2,  (3B)

where the error function

erf(x) = (2/√π) ∫_{0}^{x} exp(−y²) dy

is a standard Matlab function. I usually check that Φ(−1) = 0.1587, Φ(0) = 0.5, and Φ(1) = 0.8413 in order to be sure that I am using erf properly. The inverse cumulative normal function, used to go from prob to z, is given by

z = Φ^{−1}(prob) = √2 erfinv(2 prob − 1).  (4)
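As a quick illustration of Equations 3B and 4, the following Matlab fragment (a minimal sketch using only the built-in erf and erfinv functions) reproduces the checks mentioned above:

z = -1:1;                           % test z-scores
prob = (1 + erf(z/sqrt(2)))/2       % Equation 3B gives 0.1587  0.5000  0.8413
zBack = sqrt(2)*erfinv(2*prob - 1)  % Equation 4 recovers -1  0  1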

Equations 3 and 4 do not specify whether one uses prob = p or prob = P (see Equation 1 for the distinction between the two). In this commentary both definitions will be used, with the choice depending on how one does the correction for guessing (our next topic). The choice prob = p(x) means that a cumulative normal function is being used for the underlying PF and that the correction for guessing is done as in Equations 1 and 2. On the other hand, in a yes/no task, if we choose prob = P(x), then one is doing the z-score transform of the PF data before the correction for guessing. As is clarified in the next section, this procedure is a signal detection "correction for guessing," and the PF will be called the d′ function. The distinction between the two methods for correction for guessing has generated confusion and is partly responsible for why many researchers do not appreciate the simple connection between the PF and d′.

The z-Score Correction for Bias and Signal Detection Theory: Yes/No

Equation 2 is a common approach to correction for guessing in both yes/no and forced choice tasks, and it will be found in many of the articles in this special issue. In a yes/no method for detection, the lower asymptote P(0) = γ is the false alarm rate, the probability of saying "yes" when a blank stimulus is present. In the past, one instructed subjects to keep γ low. A few blank catch trials were included to encourage subjects to maintain their low false alarm rate. Today the correction is done using a z-score ordinate, and it is now called the correction for response bias, or simply the bias correction. One uses as many trials at the zero level as at other levels, one encourages more false alarms, and the framework is called signal detection theory.

The z-score (d′) correction for bias provides a direct but not well appreciated connection between the yes/no psychometric function and the signal detection d′. The two steps are as follows: (1) Convert the percent correct, P(x), to z scores using Equation 4. (2) Do the correction for bias by choosing the zero point of the ordinate to be the z score for the point x = 0. Finally, give the ordinate the name d′. This procedure can be written as

d′(x) = z(x) − z(0).  (5)

For a detection task, z(x) is called the z score of the hit rate, and z(0) is called the z score of the lower asymptote (γ), or the false alarm rate. It is the false alarm rate because the stimulus at x = 0 is the blank stimulus. In order to be able to do this correction for bias accurately, one must put as many trials at x = 0 as one puts at the other levels. Note the similarity of Equations 2 and 5. The main difference between Equations 2 and 5 is whether one makes the correction in probability or in z score. In order to distinguish this approach from the older yes/no approach associated with high threshold theory, it is often called the objective yes/no method, where "objective" means that the response bias correction of Equation 5 is used.
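A minimal Matlab sketch of Equation 5, with made-up (hypothetical) trial counts, shows the two steps (z transform, then subtraction of the false alarm z score):

Nsignal = 100; Nblank = 100;        % hypothetical numbers of signal and blank trials
hits = 84; falseAlarms = 16;        % hypothetical response counts
zHit = sqrt(2)*erfinv(2*(hits/Nsignal) - 1);        % Equation 4
zFA  = sqrt(2)*erfinv(2*(falseAlarms/Nblank) - 1);
dprime = zHit - zFA                 % Equation 5; close to 2 for these counts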

Figure 1 and its associated Matlab Code 1 in the Appendix illustrate the process represented in Equation 2. Panel a is a Weibull PF on a linear abscissa, to be introduced in Equation 12. Panel b is the z score of panel a. The lower asymptote is at z = −1, corresponding to P = .1587. For now, the only important point is that in panel b, if one measures the curve from the bottom of the plot (z = −1), then the ordinate becomes d′, because d′ = z − z(0) = z + 1. Panels d and e are the same as panels a and b, except that instead of a linear abscissa they have natural log abscissas. More will be said about these figures later.

I am ignoring for now the interesting question of what happens to the shape of the psychometric function as one changes the false alarm rate γ. If one uses multiple ratings rather than the binary yes/no response, one ends up with M − 1 PFs for M rating categories, and each PF has a different γ. For simplicity, this paper assumes a unity ROC slope, which guarantees that the d′ function is independent of γ. The ROC slopes can be measured using an objective yes/no method, as mentioned in Section II in the list of advantages of the yes/no method over the forced choice method.

I bring up the d′ function (Equation 5) and signal detection theory at the very beginning of this commentary because it is an excellent methodology for measuring thresholds efficiently, it can easily be extended to the suprathreshold regime (it does not saturate at P = 1), and it has a solid theoretical underpinning. Yet it is barely mentioned in any of the articles in this issue. So the reader needs to keep in mind that there is an alternative approach to PFs. I would strongly recommend the book Detection Theory: A User's Guide (Macmillan & Creelman,


1991) for anyone interested in the signal detection approach to psychophysical experiments (including the effect of nonunity ROC slopes).

The z-Score Correction for Bias and Signal Detection Theory: 2AFC

One might have thought that for 2AFC the connection between the PF and d′ is well established, namely (Green & Swets, 1966)

d′(x) = √2 z(x),  (6)

where z(x) is the z score of the average of P1(x) for correct judgments in Interval 1 and P2(x) for correct judgments in Interval 2. Typically this average, P(x), is calculated by dividing the total number of correct trials by the total number of trials. It is generally assumed that the 2AFC procedure eliminates the effect of response bias on threshold. However, in this section I will argue that the d′ as defined in Equation 6 is affected by an interval bias, when one interval is selected more than the other.

I have been thinking a lot about response bias in 2AFC tasks because of my recent experience as a subject in a temporal 2AFC contrast discrimination study, where contrast is defined as the change in luminance divided by the background luminance. In these experiments I noticed that I had a strong tendency to choose the second interval more than the first. The second interval typically appears subjectively to be about 5% higher in contrast than it really is.


Figure 1. Six views of the Weibull function P_weibull = 1 − (1 − γ)exp(−x_t^β), where γ = 0.1587, β = 2, and x_t is the stimulus strength in threshold units. Panels a–c have a linear abscissa, with x_t = 1 being the threshold. Panels d–f have a natural log abscissa, with y_t = 0 being the threshold. In panels a and d, the ordinate is probability. The asterisk in panel d at y_t = 0 is the point of maximum slope on a logarithmic abscissa. In panels b and e, the ordinate is the z score of panels a and d. The lower asymptote is z = −1. If the ordinate is redefined so that the origin is at the lower asymptote, the new ordinate, shown on the right of panels b and e, is d′(x_t) = z(x_t) − z(0), corresponding to the signal detection d′ for an objective yes/no task. In panels c and f, the ordinate is the log–log slope of d′. At x_t = 0 the log–log slope equals β. The log–log slope falls rapidly as the stimulus strength approaches threshold. The Matlab program that generated this figure is Appendix Code 1.


Whether this is a perceptual effect because the intervals are too close together in time (800 msec) or a cognitive effect does not matter for the present article. What does matter is that this bias produces a downward bias in d′. With feedback, lots of practice, and lots of experience being a subject, I was able to reduce this interval bias and equalize the number of times I responded with each interval. Naïve subjects may have a more difficult time reducing the bias. In this section, I show that there is a very simple method for removing the interval bias, by converting the 2AFC data to a PF that goes from 0% to 100%.

The recognition of bias in 2AFC is not new. Green and Swets (1966), in their Appendix III.3.4, point out that the bias in choice of interval does result in a downward bias in d′. However, they imply that the effect of this bias is small and can typically be ignored. I should like to question that implication by using one of their own examples to show that the bias can be substantial.

In a run of 200 trials, the Green and Swets example (Green & Swets, 1966, p. 410) has 95 out of 100 correct when the test stimulus is in the second interval (z2 = 1.645) and 50 out of 100 correct (z1 = 0) when the stimulus is in the first interval. The standard 2AFC way to analyze these data would be to average the probabilities: (95% + 50%)/2 = 72.5% correct (z_correct = 0.598), corresponding to d′ = z√2 = 0.845. However, Green and Swets (p. 410) point out that, according to signal detection theory, one should analyze these data by averaging the z scores rather than averaging the probabilities, or

d′ = √2 (z1 + z2)/2 = √2 × 1.645/2 = 1.163.  (7)

The ratio between these two ways of calculating d′ is 1.163/0.845 = 1.376. Since d′ is approximately linearly related to signal strength in discrimination tasks, this 38% reduction in d′ corresponds to an erroneous 38% increase in predicted contrast discrimination threshold when one calculates threshold the standard way. Note that if there had been no bias, so that the responses would be approximately equally divided across the two intervals, then z2 ≈ z1 and Equation 7 would be identical to the more familiar Equation 6. Since bias is fairly common, especially among new observers, the use of Equation 7 to calculate d′ seems much more reasonable than using Equation 6. It is surprising that the bias correction in Equation 7 is rarely used.
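The Green and Swets numbers above can be checked with a few lines of Matlab (a sketch; the only inputs are the two proportions correct from their example):

P1 = 0.50; P2 = 0.95;                       % proportion correct with test in Interval 1 and 2
z1 = sqrt(2)*erfinv(2*P1 - 1);              % 0
z2 = sqrt(2)*erfinv(2*P2 - 1);              % 1.645
zOfAvgP = sqrt(2)*erfinv(2*(P1+P2)/2 - 1);  % z score of the averaged probability, 0.598
dEq6 = sqrt(2)*zOfAvgP                      % Equation 6: 0.845
dEq7 = sqrt(2)*(z1 + z2)/2                  % Equation 7: 1.163
ratio = dEq7/dEq6                           % 1.376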

Green and Swets (1966) present a different analysis. Instead of comparing d′ values for the biased versus nonbiased conditions, they convert the d′s back to percent correct. The corrected percent correct (corresponding to d′ = 1.163) is 79.5%. In terms of percent correct, the bias seems to be a small effect, shifting percent correct a mere 7%, from 72.5% to 79.5%. However, the d′ ratio of 1.38 is a better measure of the bias, since it is directly related to the error in discrimination threshold estimates.

A further comment on the magnitude of the bias may be useful. The preceding subsection discussed the criterion bias of yes/no methods, which contributes linearly to d′ (Equation 5). The present section discusses the 2AFC interval bias, which contributes quadratically to d′. Thus, for small amounts of bias, the decrease in d′ is negligible. However, as I have pointed out, the bias can be large enough to make a significant contribution to d′.

It is instructive to view the bias correction for the full PF corresponding to this example. Cumulative normal PFs are shown in the upper left panel of Figure 2. The curves labeled C1 and C2 are the probability correct for the first and second intervals, I1 and I2. The asterisks correspond to the stimulus strength used in this example, at 50% and 95% correct. The dot–dashed line is the average of the two PFs for the individual intervals. The slope of the PF is set by the dot–dashed line (the averaged data) being at 50% (the lower asymptote) for zero stimulus strength. The lower left panel is the z-score version of the upper panel. The dashed line is the average of the two solid lines for z scores in I1 and I2. This is the signal detection method of averaging that Green and Swets (1966) present as the proper way to do the averaging. The dot–dashed line shown in the upper left panel is the z score of the average probability. Notice that the z score for the averaged probability is lower than the averaged z score, indicating a downward bias in d′ due to the interval bias, as discussed at the beginning of this section. The right pair of panels are the same as the left pair, except that instead of plotting C1 we plot 1 − C1, the probability of responding I2 incorrectly. The dashed line is half the difference between the two solid lines. The other difference is that in the lower right panel we have multiplied the dashed and dash–dotted lines by √2, so that these lines are d′ values rather than z scores.

The final step in dealing with the 2AFC bias is to flip the 1 − C1 curve horizontally to negative abscissa values, as in Figure 3. The ordinate is still the probability correct in Interval 2. The abscissa becomes the difference in stimulus strengths between I2 and I1. The flipped branch is the probability of responding I2 when the I2 stimulus strength is less than that of I1 (an incorrect response). Figure 3a, with the ordinate going from 0% to 100%, is the proper way to represent 2AFC discrimination data. The other item that I have changed is the scale on the abscissa, to show what might happen in a real experiment. The ordinate values of 50% and 95% for the Green and Swets (1966) example have been placed at a contrast difference of 5%. The negative 5% value corresponds to the case in which the positive test pattern is in the first interval. Threshold corresponds to the inverse of the PF slope. The bottom panel shows the standard signal detection representation of the signal in I1 and I2; d′ is the distance between these symmetric stimuli in standard deviation units. The Gaussians are centered at ±5%. The vertical line at −5% is the criterion, such that 50% and 95% of the area of the two Gaussians is above the criterion. The z-score difference of 1.645 between the two Gaussians must be divided by √2 to get d′, because each trial had two stimulus presentations with independent


information for the judgment. This procedure, identical to Equation 7, gives the same d′ as before.

Three Distinctions for Clarifying PFs

In dealing with PFs, three distinctions need to be made: yes/no versus forced choice, detection versus discrimination, and constant stimuli versus adaptive methods. These distinctions are usually clear, but I should like to point out some subtleties.

For the forced choice versus yes/no distinction, there are two sorts of forced choice tasks. The standard version has multiple intervals, separated spatially or temporally, and the stimulus is in only one of the intervals. In the other version, one of N stimuli is shown, and the observer responds with a number from 1 to N. For example, Strasburger (2001b) presented 1 of 10 letters to the observer in a 10AFC task. Yes/no tasks have some similarity to the latter type of forced choice task. Consider, for example, a detection experiment in which one of five contrasts (including a blank) are presented to the observer, and the observer responds with numbers from 1 to 5. This would be classified as a rating scale, method of constant stimuli, yes/no task, since only a single stimulus is presented and the rating is based on a one-dimensional intensity.

The detection/discrimination distinction is usually based on whether the reference stimulus is a natural zero point.


Figure 2. 2AFC psychometric functions with a strong bias in favor of responding Interval 2 (I2). The bias is chosen from a specific example presented by Green and Swets (1966), such that the observer has 50% and 95% correct when the test is in I1 and I2, respectively. These points are marked by asterisks. The psychometric function being plotted is a cumulative normal. In all panels the abscissa is x_t, the stimulus strength. (a) The psychometric functions for probability correct in I1 and I2 are shown and labeled C1 and C2. The average of the two probabilities, labeled "average," is the dot–dashed line; it is the curve that is usually reported. The diamond is at 72.5%, the average percent correct of the two asterisks. (b) Same as panel a, except that instead of showing C1 we show 1 − C1, the probability of saying I2 when the test was in I1. The abscissa is now labeled "probability of I2 response." (c) z scores of the three probabilities in panel a. An additional dashed line is shown that is the average of the C1 and C2 z-score curves. The diamond is the z score of the diamond in panel a, and the star is the z-score average of the two panel c asterisks. (d) The sign of the C2 curve in panel c is flipped to correspond to panel b. The dashed and dot–dashed lines of panel c have been multiplied by √2 in panel d, so that they become d′.


For example, suppose the task is to detect a high spatial frequency test pattern added to a spatially identical reference pattern. If the reference pattern has zero contrast, the task is detection. If the reference pattern has a high contrast, the task is discrimination. Klein (1985) discusses these tasks in terms of monopolar and bipolar cues. For discrimination, a bipolar cue must be available, whereby the test pattern can be either positive or negative in relation to the reference. If one cannot discriminate the negative cue from the positive, then it can be called a detection task.

Finally, the constant stimuli versus adaptive method distinction is based on the former's having preassigned test levels and the latter's having levels that shift to a desired placement. The output of the constant stimulus method is a full PF and is thus fully entitled to be included in this special issue. The output of adaptive methods is typically only a single number, the threshold, specifying the location but not the shape of the PF. Two of the papers in this issue (Kaernbach, 2001b; Strasburger, 2001b) explore the possibility of also extracting the slope from adaptive data that concentrates trials around one level. Even though adaptive methods do not measure much about the PF, they are so popular that they are well represented in this special issue.

The 10 rows of Table 1 present the articles in this special issue (including the present article). Columns 2–5 correspond to the four categories associated with the first two distinctions.

Figure 3. The 2AFC discrimination PF from Figure 2 has been extended to the 0% to 100% range without rescaling the ordinate. In panels a and b, the two right-hand panels of Figure 2 have been modified by flipping the sign of the abscissa of the negative slope branch, where the test is in Interval 1 (I1). The new abscissa is now the stimulus strength in I2 minus the strength in I1. The abscissa scaling has been modified to be in stimulus units. In this example the test stimulus has a contrast of 5%. The ordinate in panel a is the probability that the response is I2. The ordinate in panel b is the z score of the panel a ordinate. The asterisks are at the same points as in Figure 2. Panel c is the standard signal detection picture when noise is added to the signal. The abscissa has been modified to being the activity in I2 minus the activity in I1. The units of activation have arbitrarily been chosen to be the same as the units of panels a and b. Activity distributions are shown for stimuli of −5 and +5 units, corresponding to the asterisks of panels a and b. The subject's criterion is at −5 units of activation. The upper abscissa is in z-score units, where the Gaussians have unit variance. The areas under the two distributions above the criterion are 50% and 95%, in agreement with the probabilities shown in panel a.


The last column classifies the articles according to the adaptive versus constant stimuli distinction. In order to open up the full variety of PFs for discussion, and to enable a deeper understanding of the relationship among different types of PFs, I will now clarify the interrelationship of the various categories of PFs and their connection to the signal detection approach.

Lack of adaptive method for yes/no tasks with controlled false alarm rate. In Section II, I will bring up a number of advantages of the yes/no task in comparison with the 2AFC task (see also Kaernbach, 1990). Given those yes/no advantages, one may wonder why the 2AFC method is so popular. The usual answer is that 2AFC has no response bias. However, as has been discussed, the yes/no method allows an unbiased d′ to be calculated, and the 2AFC method does allow an interval bias that affects d′. Another reason for the prevalence of 2AFC experiments is that a multitude of adaptive methods are available for 2AFC, but barely any are available for an objective yes/no task in which the false alarm rate is measured so that d′ can be calculated. In Table 1, with one exception, the rows with adaptive methods are associated with the forced choice method. The exception is Leek (2001), who discusses adaptive yes/no methods in which no blank trials are presented. In that case, the false alarm rate is not measured, so d′ cannot be calculated. This type of yes/no method does not belong in the "objective" category of concern for the present paper. The "1990" entries in Table 1 refer to Kaernbach's (1990) description of a staircase method for an objective yes/no task in which an equal number of blanks are intermixed with the signal trials. Since Kaernbach could have used that method for Kaernbach (2001b), I placed the "1990" entry in his slot.

Kaernbach's (1990) yes/no staircase rules are simple. The signal level changes according to a rule such as the following: Move down one level for correct responses, a hit <yes|signal> or a correct rejection <no|blank>; move up three levels for wrong responses, a miss <no|signal> or a false alarm <yes|blank>. This rule, similar to the one down–three up rule used in 2AFC, places the trials so that the average of the hit rate and correct rejection rate is 75%.

What is now needed is a mechanism to get the observer to establish an optimal criterion that equalizes the number of "yes" and "no" responses (the ROC negative diagonal). This situation is identical to the problem of getting 2AFC observers to equalize the number of responses to Intervals 1 and 2. The quadratic bias in d′ is the same in both cases. The simplest way to get subjects to equalize their responses is to give them feedback about any bias in their responses. With equal "yes" and "no" responses, the 75% correct corresponds to a z score of 0.674 and a d′ = 2z = 1.349. If the subject does not have equal numbers of "yes" and "no" responses, then the d′ would be calculated by d′ = z_hit − z_false alarm. I do hope that Kaernbach's clever yes/no objective staircase will be explored by others. One should be able to enhance it with ratings and multiple stimuli (Klein, 2002).
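A rough Matlab simulation sketch of such an objective yes/no staircase is given below. The observer model (a cumulative normal observer whose d′ grows linearly with stimulus level) and all numerical parameters are illustrative assumptions, not values taken from Kaernbach (1990):

level = 10; step = 1;                % assumed starting level and step size
hits = 0; nSig = 0; fas = 0; nBlank = 0;
for t = 1:400
    isSignal = rand < 0.5;           % blanks intermixed with signal trials
    dp   = 0.15*level;               % assumed observer d' at the current level
    crit = dp/2;                     % assumed unbiased criterion
    sayYes = (isSignal*dp + randn) > crit;
    if isSignal, nSig = nSig + 1;   hits = hits + sayYes;
    else         nBlank = nBlank + 1; fas = fas + sayYes; end
    if sayYes == isSignal, level = level - step;    % down 1 for hit or correct rejection
    else                   level = level + 3*step;  % up 3 for miss or false alarm
    end
end
dprime = sqrt(2)*erfinv(2*hits/nSig - 1) - sqrt(2)*erfinv(2*fas/nBlank - 1)
% the pooled d' should come out roughly near 1.35, the value targeted by the rule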

Forced choice detection. As can be seen in the second column of Table 1, a popular category in this special issue is the forced choice method for detection. This method is used by Strasburger (2001b), Kaernbach (2001a), Linschoten et al. (2001), and Wichmann and Hill (2001a, 2001b). These researchers use PFs based on probability ordinates. The connection between P and d′ is different from the yes/no case given by Equation 5. For 2AFC, signal detection theory provides a simple connection between d′ and probability correct: d′(x) = √2 z(x). In the preceding section, I discussed the option of averaging the probabilities and then taking the z score (high threshold approach) or averaging the z scores and then calculating the probability (signal detection approach). The signal detection method is better because it has a stronger empirical basis and avoids bias.

For an m-AFC task with m > 2, the connection between d′ and P is more complicated than it is for m = 2. The connection, given by a fairly simple integral (Green & Swets, 1966; Macmillan & Creelman, 1991), has been tabulated by Hacker and Ratcliff (1979) and by Macmillan and Creelman (1991). One problem with these tables is that they are based on the assumption of unity ROC slope.

Table 1
Classification of the 10 Articles in This Special Issue According to Three Distinctions: Forced Choice Versus Yes/No, Detection Versus Discrimination, Adaptive Versus Constant Stimuli

Source                       Detection                  Discrimination              Adaptive or
                             m-AFC      Yes/No          m-AFC          Yes/No       Constant Stimuli
Leek (2001)                  General    loose γ         ✓              loose γ      A
Wichmann & Hill (2001a)      2AFC       (✓)             (✓)            (✓)          C
Wichmann & Hill (2001b)      2AFC       (✓)             (✓)            (✓)          C
Linschoten et al. (2001)     2AFC                                                   A
Strasburger (2001a)          General    ✓               ✓                           C
Strasburger (2001b)          10AFC                                                  A
Kaernbach (2001a)            General    ✓                                           A
Kaernbach (2001b)            General    (1990)          ✓              (1990)       A
Miller & Ulrich (2001)       for γ = 0                  ✓ (for 0–100%) ✓            C
Klein (present)              General    ✓               ✓              ✓            Both


It is known that there are many cases in which the ROC slope is not unity (Green & Swets, 1966), so these tables connecting d′ and probability correct should be treated cautiously. W. P. Banks and I (unpublished) investigated this topic and found that near the P = 50% correct point, the dependence of d′ on ROC slope is minimal. Away from the 50% point, the dependence can be strong.
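The "fairly simple integral" mentioned above can, under the unity-ROC-slope assumption used in the published tables, be evaluated numerically in a few lines of Matlab (a sketch, not the authors' code):

dprime = 1; m = 2;                          % for example, d' = 1 in a 2AFC (m = 2) task
z   = -8:0.001:8;                           % integration grid
phi = exp(-0.5*(z - dprime).^2)/sqrt(2*pi); % pdf of the signal alternative
Phi = (1 + erf(z/sqrt(2)))/2;               % cdf of each of the m-1 noise alternatives
Pc  = trapz(z, phi.*Phi.^(m-1))             % proportion correct; 0.76 for d' = 1, m = 2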

Yes/No detection. Linschoten et al. (2001) compare three methods (limits, staircase, 2AFC likelihood) for measuring thresholds with a small number of trials. In the method of limits, one starts with a subthreshold stimulus and gradually increases the strength. On each trial, the observer says "yes" or "no" with respect to whether or not the stimulus is detected. Although this is a classic yes/no detection method, it will not be discussed in this article, because there is no control of response bias. That is, blanks were not intermixed with signals.

In Table 1, the Wichmann and Hill (2001a, 2001b) articles are marked with checks in parentheses because, although these authors write only about 2AFC, they mention that all their methods are equally applicable for yes/no or m-AFC of any type (detection/discrimination). And their Matlab implementations are fully general. All the special issue articles reporting experimental data used a forced choice method. This bias in favor of the forced choice methodology is found not only in this special issue; it is widespread in the psychophysics community. Given the advantages of the yes/no method (see discussion in Section II), I hope that once yes/no adaptive methods are accepted, they will become the method of choice.

m-AFC discrimination. It is common practice to represent 2AFC results as a plot of percent correct averaged over all trials versus stimulus strength. This PF goes from 50% to 100%. The asymmetry between the lower and upper asymptotes introduces some inefficiency in threshold estimation, as will be discussed. Another problem with the standard 50% to 100% plot is that a bias in choice of interval will produce an underestimate of d′, as has been shown earlier. Researchers often use the 2AFC method because they believe that it avoids biased threshold estimates. It is therefore surprising that the relatively simple correction for 2AFC bias discussed earlier is rarely done.

The issue of bias in m-AFC also occurs when m > 2. In Strasburger's 10AFC letter discrimination task, it is common for subjects to have biases for responding with particular letters when guessing. Any imbalance in the response bias for different letters will result in a reduction of d′, as it did in 2AFC.

There are several benefits of viewing the 2AFC discrimination data in terms of a PF going from 0% to 100%. Not only does it provide a simple way of viewing and calculating the interval bias, it also enables new methods for estimating the PF parameters, such as those proposed by Miller and Ulrich (2001), as will be discussed in Section III. In my original comments on the Miller and Ulrich paper, I pointed out that because their nonparametric procedure has uniform weighting of the different PF levels, their method does not apply to 2AFC tasks. The asymmetric binomial error bars near the 50% and 100% levels cause the uniform weighting of the Miller and Ulrich approach to be nonoptimal. However, I now realize that the 2AFC discrimination task can be fit by a cumulative normal going from 0% to 100%. Because of that insight, I have marked the discrimination forced choice column of Table 1 for the Miller and Ulrich (2001) paper, with the proviso that the PF goes from 0% to 100%. Owing to the popularity of 2AFC, this modification greatly expands the relevance of their nonparametric approach.

Yes/No discrimination. Kaernbach (2001b) and Miller and Ulrich (2001) offer theoretical articles that examine properties of PFs that go from P = 0% to 100% (P = p in Equation 1). In both cases, the PF is the cumulative normal (Equation 3). Although these PFs with γ = 0 could be for a yes/no detection task with a zero false alarm rate (not plausible) or a forced choice detection task with an infinite number of alternatives (not plausible either), I suspect that the authors had in mind a yes/no discrimination task (Table 1, column 5). A typical discrimination task in vision is contrast discrimination, in which the observer responds to whether the presented contrast is greater than or less than a memorized reference. Feedback reinforces the stability of the reference. In a typical discrimination task, the reference is one exemplar from a continuum of stimulus strengths. If the reference is at a special zero point rather than being an element of a smooth continuum, the task is no longer a simple discrimination task. Zero contrast would be an example of a special reference. Klein (1985) discusses several examples which illustrate how a natural zero can complicate the analysis. One might wonder how to connect the PF from the detection regime, in which the reference is zero contrast, to the discrimination regime, in which the reference (pedestal) is at a high contrast. The d′ function to be introduced in Equation 20 does a reasonably good job of fitting data across the full range of pedestal strength, going from detection to discrimination.

The connection of the discrimination PF to the signal detection PF is the same as that for yes/no detection, given in Equation 5: d′(x) = z(x) − z(0), where z is the z score of the probability of a "greater than" judgment. In a discrimination task, the bias z(0) is the z score of the probability of saying "greater" when the reference stimulus is presented. The stimulus strength x that gives z(x) = 0 is the point of subjective equality (x = PSE). If the cumulative normal PF of Equation 3 is used, then the z score is linearly proportional to the stimulus strength: z(x) = (x − PSE)/threshold, where threshold is defined to be the point at which d′ = 1. I will come back to these distinctions between detection and discrimination PFs after presenting more groundwork regarding thresholds, log abscissas, and PF shapes (see Equations 14 and 17).
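For instance, PSE and threshold can be read off a straight-line fit to the z scores, as in this Matlab sketch (the "greater than" proportions below are made-up illustrative data, not results from any of the cited studies):

x = [-2 -1 0 1 2];                      % stimulus minus reference (e.g., % contrast)
Pgreater = [0.05 0.25 0.55 0.80 0.96];  % hypothetical proportions of "greater" responses
z = sqrt(2)*erfinv(2*Pgreater - 1);     % z scores, approximately (x - PSE)/alpha
c = polyfit(x, z, 1);                   % unweighted straight-line fit: z = c(1)*x + c(2)
alpha = 1/c(1)                          % threshold, the level at which d' = 1
PSE = -c(2)/c(1)                        % point of subjective equality, where z = 0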

Definition of Threshold

Threshold is often defined as the stimulus strength that produces a probability correct halfway up the PF. If humans operated according to a high-threshold assumption, this definition of threshold would be stable across different experimental methods. However, as I discussed following Equation 2, high-threshold theory has been discredited.


According to the more successful signal detection theory (Green & Swets, 1966), the d′ at the midpoint of the PF changes according to the number of alternatives in a forced choice method and according to the false alarm rate in a yes/no method. This variability of d′ with method is a good reason not to define threshold as the halfway point of the PF.

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1, for simplicity. This definition applies to both yes/no and m-AFC tasks and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = .1587, corresponding to a z score of z = −1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.

Strasburger's Suggestion on Specifying Slope: A Logarithmic Abscissa

In many of the articles in this special issue, a logarithmic abscissa such as decibels is used. Many shapes of PFs have been used with a log abscissa. Most delightful but frustrating is Strasburger's (2001a) paper on the PF maximum slope. He compares the Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′ using a logarithmic abscissa. The present section is the outcome of my struggles with a number of issues raised by Strasburger (2001a) and my attempt to clarify them.

I will typically express stimulus strength x in threshold units,

x_t = x/α,  (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units y as well as in linear units x,

y_t = log_e(x_t) = y − Y,  (9)

where y = log_e(x) is the natural log of the stimulus and Y = log_e(α) is the threshold on the log abscissa.

The slope of the psychometric function P(y_t) with a logarithmic abscissa is

slope_log(y_t) = dP(y_t)/dy_t  (10A)
               = (1 − γ) dp(y_t)/dy_t,  (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log–log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log–log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log–log slope is β for a Weibull PF (Equation 12) and β for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log–log slope using a d′ ordinate.

A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per log_e) are not familiar. Then it dawned on me that there is a simple connection between slope with a log_e abscissa, slope_log = dP(y_t)/dy_t, and slope with a linear abscissa, slope_lin = dP(x_t)/dx_t, namely

slope_lin = dP(x_t)/dx_t = [dP(y_t)/dy_t] [dy_t/dx_t] = slope_log/x_t,  (11)

because dy_t/dx_t = d log_e(x_t)/dx_t = 1/x_t. At threshold, x_t = 1 (y_t = 0), so at that point Strasburger's slope with a logarithmic axis is identical to my familiar slope plotted in threshold units on a linear axis. The simple connection between slope on the log and linear abscissas converted me to being a strong supporter of using a natural log abscissa.

The Weibull and cumulative normal psychometric functions. To provide a background for Strasburger's article, I will discuss three PFs (Weibull, cumulative normal, and d′) as well as their close connections. A common parameterization of the PF is given by the Weibull function,

p_weib(x_t) = 1 − k^(x_t^β),  (12)

where p_weib(x_t), the PF that goes from 0% to 100%, is related to P_weib(x_t), the probability of a correct response, by Equation 1; β is the slope; and k controls the definition of threshold: p_weib(1) = 1 − k is the percent correct at threshold (x_t = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

p_weib(y_t) = 1 − k^[exp(β y_t)].  (13)

Panel a of Figure 1 is a plot of Equation 12 (the Weibull function as a function of x on a linear abscissa) for the case β = 2 and k = exp(−1) = 0.368. The choice β = 2 makes the Weibull an upside-down Gaussian. Panel d is the same function, this time plotted as a function of y, corresponding to a natural log abscissa. With this choice of k, the point of maximum slope as a function of y_t is the threshold point (y_t = 0). The point of maximum slope at y_t = 0 is marked with an asterisk in panel d. The same point in panel a, at x_t = 1, is not the point of maximum slope on a linear abscissa, because of Equation 11. When plotted as a d′ function [a z-score transform of P(x_t)] in panel b, the Weibull accelerates below threshold and decelerates above threshold,


in agreement with a wide range of experimental data. The acceleration and deceleration are most clearly seen by the slope of d′ in the log–log coordinates of panel e. The log–log d′ slope is plotted in panels c and f. At x_t = 0, the log–log slope is 2, corresponding to our choice of β = 2. The slope falls surprisingly rapidly as stimulus strength increases, and the slope is near 1 at x_t = 2. If we had chosen β = 4 for the Weibull function, then the log–log d′ slope would have gone from 4 at x_t = 0 to near 2 at x_t = 2. Equation 20 will present a d′ function that captures this behavior.
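This behavior of the log–log d′ slope can be checked numerically with a short Matlab sketch of the Figure 1 construction (γ = .1587, β = 2):

gamma = 0.1587; beta = 2;
xt = 0.01:0.01:3;                            % stimulus strength in threshold units
P  = 1 - (1-gamma)*exp(-xt.^beta);           % Weibull PF of Figure 1, panel a
z  = sqrt(2)*erfinv(2*P - 1);                % z score, panel b
dprime = z + 1;                              % d' = z - z(0), since z(0) = -1
llslope = diff(log(dprime))./diff(log(xt));  % log-log slope of d', panels c and f
[llslope(1) llslope(end)]                    % about 2 at low strengths, near 1 above threshold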

Another PF commonly used is based on the cumulative normal (Equation 3):

p_Φ(y_t) = Φ(y_t/σ)  (14A)

or

z(y_t) = y_t/σ = (y − Y)/σ,  (14B)

where σ is the standard error of the underlying Gaussian function (its connection to β will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to −∞, needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissa.

Two examples of Strasburger's maximum slope (his Equations 7 and 17) are

β′ = β exp(−1) = 0.368 β  (15)

for the Weibull function (Equation 13) and

β′ = 1/(σ√(2π)) = 0.399/σ  (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(−1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (y_t = 0). Note that Equations 12–16 are missing the (1 − γ) factor that is present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.

An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = 0.5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(x_t = 1) = 75%. At 10% above threshold, P(x_t = 1.1) ≈ .75 + .1 β′ = .7899, which is quite close to the exact value of .7896. This example shows how β′, defined with a natural log abscissa y_t, is relevant to a linear abscissa x_t.

Now that the logarithmic abscissa has been introduced, this is a good place to stop and point out that the log abscissa is fine for detection but not for discrimination, since the x = 0 point is often in the middle of the range. However, the cumulative normal PF is especially relevant to discrimination, where the PF goes from 0% to 100%, and would be written as

p_Φ(x_t) = P_Φ(x_t) = Φ[(x − PSE)/α]  (17A)

or

z_Φ(x_t) = (x − PSE)/α,  (17B)

where x = PSE is the point of subjective equality. The parameter α is the threshold, since d′(α) = z_Φ(α) − z_Φ(0) = 1. The threshold α is also the standard deviation of the Gaussian probability density function (pdf). The reciprocal of α is the PF slope. I tend to use the letter σ for a unitless standard deviation, as occurs for the logarithmic variable y. I use α for a standard deviation that has the units of the stimulus x (like percent contrast). The comparison of Equation 14B for detection and Equation 17B for discrimination is useful. Although Equation 14 is used in most of the articles in this special issue, which are concerned with detection tasks, the techniques that I will be discussing are also relevant to discrimination tasks, for which Equation 17 is used.

Threshold and the Weibull Function

To illustrate how a d′ = 1 definition of threshold works for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = .1587, corresponding to a z score of z_FA = −1, as in Figure 1. The z score at threshold (d′ = 1) is z_Th = z_FA + 1 = 0, corresponding to a probability of P(1) = 50%. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)]/[1 − P(0)] = 0.5943. The Weibull function becomes

P_weib(x_t) = 1 − (1 − 0.1587) × 0.5943^(x_t^β).  (18A)

If I had defined d′ = 2 to be threshold, then z_Th = z_FA + 2 = 1, leading to k = (1 − 0.8413)/(1 − 0.1587) = 0.1886, giving

P_weib(x_t) = 1 − (1 − 0.1587) × 0.1886^(x_t^β).  (18B)

As another example suppose the false alarm rate in ayesno task is at 50 not uncommon in a signal detectionexperiment with blanks and test stimuli intermixed Thenthreshold at dcent 5 1 would occur at 8413 and the PFwould be

(18C)

with Pweib(0) 5 50 and Pweib(1) 5 8413 This casecorresponds to defining dcent on the ROC vertical intercept

For 2AFC the connection between dcent and z is z 5 dcent205Thus zTh 5 2 05 5 07071 corresponding to Pweib(1) 57602 leading to k 5 04795 in Equation 12 These con-nections will be clarified when an explicit form (Equation20)is given for the dcent function dcent(xt)

P xtxt

weib ( ) = 1 0 5 0 3173 b

P xtxt

weib ( ) = 1 1 0 1587 0 1886( ) b

P xtxt

weib ( ) = 1 1 0 1587 0 5943( ) b

z x xtF ( ) = PSE

a

p x P x xt tF F F( ) = ( ) = aelig

egraveoumloslash

PSEa

cent = =bs p s

12

0 399

cent = =b b bexp( ) 1 0 368

z yy y Y

tt( ) = =s s

p yy

tt

F F( ) =aeligegraveccedil

oumloslashdivides

1432 KLEIN

Complications With Strasburgerrsquos Advocacy of Maximum Slope

Here I will mention three complications implicated inStrasburgerrsquos suggestion of defining slope at the point ofmaximum slope on a logarithmic abscissa

1 As can be seen in Equation 11 the slopes on linearand logarithmic abscissas are related by a factor of 1xtBecause of this extra factor the maximum slope occursat a lower stimulus strength on linear as compared withlog axes This makes the notion of maximum slope lessfundamental However since we are usually interested inthe percent error of threshold estimates it turns out thata logarithmic abscissa is the most relevant axis support-ing Strasburgerrsquos log abscissa definition

2 The maximum slope is not necessarily at threshold(as Strasburger points out) For the Weibull functions de-fined in Equation 13 with k 5 exp( 1) and the cumula-tive normal function in Equation 14 the maximum slope(on a log axis) does occur at threshold However the twopoints are decoupled in the generalized Weibull functiondefined in Equation 12 For the Quick version of theWeibull function (Equation 13 with k 5 05 placingthreshold halfway up the PF) the threshold is below thepoint of maximum slope the derivative of P(xt) atthreshold is

which is slightly different from the maximum slope asgiven by Equation 15 Similarly when threshold is de-fined at dcent 5 1 the threshold is not at the point of max-imum slope People who fit psychometric functions wouldprobably prefer reporting slopes at the detection thresh-old rather than at the point of maximum slope

3 One of the most important considerations in selectinga point at which to measure threshold is the question ofhow to minimize the bias and standard error of the thresh-old estimate The goal of adaptive methods is to place tri-als at one level In order to avoid a threshold bias due to aimproper estimate of PF slope the test level should be atthe defined threshold The variance of the threshold esti-mate when the data are concentrated as a single level (aswith adaptive procedures) is given by Gourevitch and Gal-anter (1967) as

(19)

If the binomial error factor of P(1 P)N were not presentthe optimal placement of trials for measuring thresholdwould be at the point of maximum slope However thepresence of the P(1 P) factor shifts the optimal point toa higher level Wichmann and Hill (2001b) consider howtrial placement affects threshold variance for the methodof constant stimuli This topic will be considered in Sec-tion II

Connecting the Weibull and dcent FunctionsFigures 5ndash7 of Strasburger (2001a) compare the logndash

log dcent slope b with the Weibull PF slope b Strasburgerrsquosconnection of b 5 088b is problematic since the dcent ver-sion of the Weibull PF does not have a fixed logndashlog slopeAn improved dcent representation of the Weibull function isthe topic of the present section A useful parameterizationfor dcent which I shall refer to as the StromeyerndashFoley func-tion was introduced by Stromeyer and Klein (1974) andused extensively by Foley (1994)

(20)

where xt is the stimulus strength in threshold units as inEquation 8 The factors with a in the denominator are pres-ent so that dcent equals unity at threshold (xt 5 1 or x 5 a)At low xt dcent x t

ba The exponent b (the logndashlog slope ofdcent at very low contrasts) controls the amount of facilita-tion near threshold At high xt Equation 20 becomes dcentx t

1 w (1 a) The parameter w is the logndash log slope of thetest threshold versus pedestal contrast function (the tvc orldquodipperrdquo function) at strong pedestals One must be a bitcautious however because in yesno procedures the tvcslope can be decoupled from w if the signal detection ROCcurve has nonunity slope (Stromeyer amp Klein 1974) Theparameter a controls the point at which the dcent function be-gins to saturate Typical values of these unitless parametersare b 5 2 w 5 05 and a 5 06 (Yu Klein amp Levi 2001)The function in Equation 20 accelerates at low contrast(logndashlog slope of b) and decelerates at high contrasts (logndashlog slope of 1 w) in general agreement with a broadrange of data

For 2AFC z 5 dcentIuml2 so from Equation 3 the connec-tion between dcent and probability correct is

(21)

To establish the connection between the PFs specified inEquations12 and 20ndash21 one must first have the two agreeat threshold For 2AFC with the dcent 5 1 the threshold atP 5 7602 can be enforced by choosing k 5 04795 (seethe discussion following Equation 18C) so that Equa-tions 1 and 12 become

(22)

Modifying k as in Equation 22 leaves the Weibull shapeunchanged and shifts only the definition of threshold Ifb 5 106b w 5 1 039b and a 5 0614 then for allvalues of b the Weibull and dcent functions (Equations 21and 22 vs Equation 12) differ by less than 00004 for allstimulus strengths At very low stimulus strengths b 5b The value b 5 106b is a compromise for getting anoverall optimal fit

Strasburger (2001a) is concerned with the same issueIn his Table 1 he reports dcent logndashlog slopes of [8847

P xtxt( ) = 1 1 5 4795( )

b

P xd x

tt( ) = +

cent( )eacute

eumlecircecirc

5 5erf2

cent( ) =d xx

a a xt

tb

tb w+ 1 + 1( )

var( )( )

YP P

N

dP y

dy

t

t

=( )eacute

eumlecircecirc

12

cent =( )

= =b g b g bthresh

dP x

dx

t

t

e( )log ( )

( ) 12

2347 1

MEASURING THE PSYCHOMETRIC FUNCTION 1433

18379 3131 4421] for b 5 [1 2 35 5] If the dcent slopeis taken to be a measure of the dcent exponent b then theratio bb is [08847 09190 08946 08842] for the fourb values Our value of bb 5 106 differs substantiallyfrom 08 (Pelli 1985) and 088 (Strasburger 2001a) Pelli(1985) and Strasburger (2001a) used dcent functions with a 51 in Equation 20 With a 5 0614 our dcent function (Equa-tion 20) starts saturating near threshold in agreement withexperimental data The saturation lowers the effective valueof b I was very surprised to find that the StromeyerndashFoley dcent function did such a good job in matching theWeibull function across the whole range of b and x Fora long time I had thought a different fit would be neededfor each value of b

For the same Weibull function in a yesno method (falsealarm rate 5 50) the parameters b and w are identicalto the 2AFC case above Only the value of a shifts froma 5 0614 (2AFC) to a 5 054 (yesno) In order to makext 5 1 at dcent 5 1 k becomes 03173 (see Equation 18C)instead of 04795 (2AFC) As with 2AFC the differencebetween the Weibull and dcent function is less than 0004(lt04) for all stimulus strengths and all values of b andfor both a linear and a logarithmic abscissa These valuesmean that for both the yes no and the 2AFC methods thedcent function (Equation 20) corresponding to a Weibullfunction with b 5 2 has a low-contrast logndashlog slope ofb 5 212 and a high-contrast logndashlog slope of 1 w 5078 The high-contrast tvc (test contrast vs pedestal con-trast discrimination function sometimes called the ldquodip-perrdquo function) logndashlog slope would be w 5 022

I also carried out a similar analysis asking what cumu-lative normal s is best matched to the Weibull I founds 5 11720b However the match is poor with errors of4 (rather than lt004 in the preceding analysis) Thepoor fit of the two functions makes the fitting proceduresensitive to the weighting used in the fitting procedure Ifone uses a weighting factor of [P(1 P)N] where P is thePF one will find a quite different value for s than if oneignored the weighting or if one chose the weighting on thebasis of the data rather than the fitting function

II COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING

THE PSYCHOMETRIC FUNCTION

This section inspired by Linschoten et al (2001) is anexamination of various experimental methods for gather-ing data to estimate thresholds Linschoten et al comparethree 2AFC methods for measuring smell and taste thresh-olds a maximum-likelihood adaptive method an upndashdownstaircase method and an ascending method of limits Thespecial aspect of their experiments was that in measuringsmell and taste each 2AFC trial takes between 40 and60 sec (Lew Harvey personal communication) because ofthe long duration needed for the nostrils or mouth to re-cover their sensitivity The purpose of the Linschoten et alpaper is to examine which of the three methods is mostaccurate when the number of trials is low Linschoten et al

conclude that the 2AFC method is a good method for thistask My prior experience with forced choice tasks and lownumber of trials (Manny amp Klein 1985 McKee et al1985) made me wonder whether there might be better meth-ods to use in circumstances in which the number of trialsis limited This topic is considered in Section II

Compare 2AFC to YesNoThe commonly accepted framework for comparing

2AFC and yesno threshold estimates is signal detectiontheory Initially I will make the common assumption thatthe dcent function is a power function (Equation 20 witha 5 1)

(23)

Later in this section the experimentally more accurateform Equation 20 with a 1 will be used The varianceof the threshold estimate will now be calculated for 2AFCand yesno for the simplest case of trials placed at a sin-gle stimulus level y the goal of most adaptive methodsThe PF slope relates ordinate variance to abscissa vari-ance (from Gourevitch amp Galanter 1967)

(24)

where Y 5 y yt is the threshold estimate for a natural logabscissa as given in Equation 9 This formula for var(Y )is called the asymptotic formula in that it becomes exactin the limit of large number of total trials N Wichmannand Hill (2001b) show that for the method of constantstimuli a proper choice of testing levels brings one to theasymptotic region for N 120 (the smallest value thatthey explored) Improper choice of testing levels requiresa larger N in order to get to the asymptotic region Equa-tion 24 is based on having the data at a single level as oc-curs asymptotically in adaptive procedures In addition thedefinition of threshold must be at the point being testedIf levels are tested away from threshold there will be anadditional contribution to the threshold variance (Finney1971 McKee et al 1985) if the slope is a variable param-eter in the fitting procedure Equation 24 reaches its as-ymptotic value more rapidly for adaptive methods with fo-cused placement of trials than with the constant stimulusmethod when slope is a free parameter

Equation 24 needs analytic expressions for the deriva-tive and the variance of dcent The derivative is obtained fromEquation 23

(25)

The variance of dcent for both yesno (dcent 5 zhit 1 zcorrect rejection)and 2AFC (dcent 5 Iuml2zave) is

(26)var var( ) exp var( ) cent( ) = = ( )d z z P2 4 2p

dddy

bdt

cent = cent

var( )var( )

Yd

dddyt

= cent

centaeligegraveccedil

oumloslashdivide

2

cent = = ( )d c bytb

texp

1434 KLEIN

For the yes no case Equation 26 is based on a unityslope ROC curve with the subjectrsquos criterion at the neg-ative diagonal of the ROC curve where the hit rate equalsthe correct rejection rate This would be the most effi-cient operating point The variance for a nonunity slopeROC curve was considered by Klein Stromeyer andGanz (1974) From binomial statistics

(27)

with n 5 N for 2AFC and n 5 N2 for yesno since forthis binary yesno task the number of signal and blanktrials is half the total number of trials

Putting all of these factors together gives

(28)

for both the yesno and the 2AFC cases The only dif-ference between the two cases is the definition of n andthe definition of dcent (dcent2AFC 5 205z and dcentYN 5 2z for asymmetric criterion) When this is expressed in terms ofN (number of trials) and z (z score of probability cor-rect) we get

(29)

for both 2AFC and yesno That is when the percent cor-rects are all equal (P2AFC 5 Phit 5 Pcorrect rejection) thethreshold variance for 2AFC and yesno are equal Theminimum value of var(Y ) is 164(Nb2) occurring at z 5157 (P 5 94) This value of 94 correct is the optimumvalue at which to test for both 2AFC and yesno tasks in-dependent of b

This result that the threshold variance is the same for2AFC and yesno methods seems to disagree with Figure 4of McKee et al (1985) where the standard errors of thethreshold estimates are plotted for 2AFC and yesno asa function of the number of trials The 2AFC standarderror (their Figure 4) is twice that of yesno correspond-ing to a factor of four in variance However the McKeeet al analysis is misleading since all they (we) did for theyesno task was to use Equation 2 to expand the PF to a 0to 100 range thereby doubling the PF slope Thatrescaling is not the way to convert a 2AFC detection taskto the yesno task of the same detectability A signal de-tection methodology is needed for that conversion aswas done in the present analysis

One surprise is that the optimal testing probability isindependent of the PF slope b This finding is a generalproperty of PFs in which the dependence on the slopeparameter has the form xb The 94 level corresponds todcentYN 5 315 and dcent2AFC 5 223 For the commonly usedP 5 75 (z 5 0765 dcentYN 5 135 dcent2AFC 5 0954)var(Y ) 5 408(Nb2) which is about 25 times the opti-mal value obtained by placing the stimuli at 94 Green(1990) had previously pointed out that the 94 and 92levels were the optimal points to test for the cumulative

normal and logit functions respectively because of theminimum threshold estimate variability when one is test-ing at that point on the PF Other PFs can have optimalpoints at slightly lower levels but these are still above thelevels to which adaptive methods are normally directed

It is also useful to compare the 2AFC and yesno at thesame dcent rather than at their optimal points In that case fordcent values fixed at [1 2 3] the ratio of the 2AFC varianceto the yesno variance is [182 136 079] At low dcent thereis an advantage for the 2AFC method and that reverses athigh dcent as is expected from the optimal dcent being higher foryesno than for 2AFC In general one should run the yesno method at dcent levels above 20 (84 correct in the blankand signal intervals corresponds to dcent 5 2)

The equality of the optimal var(Y ) for 2AFC and yesno methods is not only surprising it seems paradoxicalmdashas can be seen by the following gedankenexperiment Sup-pose subjects close their eyes on the first interval of the2AFC trial and base their judgments on just the secondinterval as if they were doing a yesno task since blanksand stimuli are randomly intermixed Instead of sayingldquoyesrdquo or ldquonordquo they would respond ldquoInterval 2rdquo or ldquoInter-val 1rdquo respectively so that the 2AFC task would become ayesno task The paradox is that one would expect the vari-ance to be worse than for the eyes open 2AFC case sinceone is ignoring information However Equation29 showsthat the 2AFC and yesno methods have identical optimalvariance I suspect that the resolution of this paradox is thatin the yesno method with a stable criterion the optimalvariance occurs at a higher dcent value (315) than the 2AFCvalue (223) The dcent power law function that I chose doesnot saturate at large dcent levels giving a benefit to the yesno method

In order to check whether the paradox still holds witha more realistic PF (a dcent function that saturates at largedcent ) I now compare the 2AFC and yesno variance usinga Weibull function with a slope b In order to connect2AFC and yesno I need to use signal detection theoryand the StromeyerndashFoley dcent function (Equation 20) withthe ldquoWeibull parametersrdquo b 5 106b a 5 0614 and w 51 39b Following Equation 22 I pointed out that withthese parameter values the difference between the 2AFCWeibull function and the 2AFC dcent function (converted toa probability ordinate) is less than 0004 After the de-rivative Equation 25 is appropriately modified we findthat the optimal 2AFC variance is 342(Nb2) For the yesno case the optimal variance is 402(Nb2) This result re-solves the paradox The optimal 2AFC variance is about18 lower than the yesno variance so closing onersquos eyesin one of the intervals does indeed hurt

We shouldnrsquot forget that if one counts stimulus presen-tations Np rather than trials N the optimal 2AFC variancebecomes 684(Npb2) as opposed to 402(Npb2) for yesnoThus for the Linschoten experiments with each stimuluspresentation being slow the yesno methodology is moreaccurate for a given number of stimulus presentationseven at low dcent values Kaernbach (1990) compares 2AFC

var( ) exp( )

( )Y z

P P

N bz= ( )2

122

p

var( ) exp( )

( )Y z

P P

n bd= ( ) cent

412

2p

var( )( )

PP P

n= 1

MEASURING THE PSYCHOMETRIC FUNCTION 1435

and yesno adaptive methods with simulations and ac-tual experiments His abscissa is number of presenta-tions and his results are compatible with our findings

The objective yesno method that I have been discussingis inefficient since 50 of the trials are blanks Greaterefficiency is possible if several points on the PF are mea-sured (method of constant stimuli) with the number of blanktrials the same as at the other levels For example if fivelevels of stimuli were measured including blanks then theblanks would be only 20 of the total rather than 50 Forthe 2AFC paradigm the subject would still have to look at50 of the presentations being blanks The yesno eff i-ciency can be further improved by having the subjectrespond to the multiple stimuli with multiple ratings ratherthan a binary yesno response This rating scale methodof constant stimuli is my favorite method and I have beenusing it for 20 years (Klein amp Stromeyer 1980) The mul-tiple ratings allow one to measure the psychometric func-tion shape for dcents as large as 4 (well into the dipper regionwhere stimuli are slightly above threshold) It also allowsone to measure the ROC slope (presence of multiplicativenoise) Simulations of its benefits will be presented inKlein (2002)

Insight into how the number of responses affects thresh-old variance can be obtained by using the cumulative nor-mal PF (Equation 17) considered by Kaernbach (2001b)and Miller and Ulrich (2001) with g 5 0 Typically this isthe case of yesno discrimination but it can also be 2AFCdiscrimination with an ordinate going from 0 to 100as I have discussed in connection with Figure 3 For a cu-mulative normal PF with a binary response and for a sin-gle stimulus level being tested the threshold variance is(Equation 19) as follows

(30)

from Equation 27 and forthcoming Equations 40 and 41The optimal level to test is at P 5 05 so the optimal vari-ance becomes

(31)

If instead of a binary response the observer gives an ana-log response (many many rating categories) then the vari-ance is the familiar connection between standard deviationand standard error

(32)

With four response categories the variance is 119s2Nwhich is closer to the analog than to the binary case Ifstimuli with multiple strengths are intermixed (methodof constant stimuli) then the benefit of going from a bi-nary response to multiple responses is greater than that in-dicated in Equations 31ndash32 (Klein 2002)

A brief list of other problems with the 2AFC method ap-plied to detection comprises the following

1 2AFC requires the subject to memorize and thencompare the results of two subjective responses It is cog-nitively simpler to report each response as it arrives Sub-jects without much experience can easily make mistakes(lapses) simply becoming confused about the order of thestimuli Klein (1985) discusses memory load issues rele-vant to 2AFC

2 The 2AFC methodology makes various types ofanalysis and modeling more difficult because it requiresthat one average over all criteria The response variable in2AFC is the Interval 2 activation minus the Interval 1 ac-tivation (see the abscissa in Figure 2) This variable goesfrom minus infinity to plus infinity whereas the yesnovariable goes from zero to infinity It therefore seems moredifficult to model the effects of probability summation anduncertainty for 2AFC than for yesno

3 Models that relate psychophysical performance to un-derlying processes require information about how the noisevaries with signal strength (multiplicative noise) The rat-ing scale method of constant stimuli not only measures dcentas a function of contrast it also is able to provide an esti-mate of how the variance of the signal distribution (theROC slope) increases with contrast The 2AFC methodlacks that information

4 Both the yesno and 2AFC methods are supposed toeliminate the effects of bias Earlier I pointed out that in myrecent 2AFC discrimination experiments when the stim-ulus is near threshold I find I have a strong bias in favor ofresponding ldquoStimulus 2rdquo Until I learned to compensatefor this bias (not typically done by naiumlve subjects) my dcentwas underestimated As I have discussed it is possible tocompensate for this bias but that is rarely done

Improving the Brownian (Up ndashDown) StaircaseAdaptive methods with a goal of placing trials at an

optimal point on the PF are efficient Leek (2001) pro-vides a nice overview of these methods There are twogeneral categories of adaptive methods those based onmaximum (or mean) likelihood and those based on sim-pler staircase rules The present section is concernedwith the older staircase methods which I will callBrownian methods I used the name ldquoBrownianrdquo becauseof the upndashdown staircasersquos similarity to one-dimensionalBrownian motion in which the direction of each step israndom The familiar upndashdown staircase is Brownianbecause at each trial the decision about whether to go upor down is decided by a random factor governed by prob-ability P(x) I want to make this connection to Brownianmotion because I suspect that there are results in thephysics literature that are relevant to our psychophysicalstaircases I hope these connections get made

A Brownian staircase has a minimal memory of the runhistory it needs to remember the previous step (includingdirection) and the previous number of reversals or numberof trials (used for when to stop and for when to decreasestep size) A typical Brownian rule is to decrease signalstrength by one step (like 1 dB) after a correct responseand to increase it three steps after an incorrect response (a

var( thresh) = s 2

N

var( thresh) = timesp s2

2

N

var(var( )

( )exp thresh) =aeligegraveccedil

oumloslashdivide

= ( )P

N dPdy

P P zN2

22

2 1p s

1436 KLEIN

one downndashthree up rule) I was pleased to learn recentlythat his rule works nicely for an objective yesno task(Kaernbach 1990) as well as 2AFC as discussed earlier

In recent years likelihood methods (Pentland 1980Watson amp Pelli 1983) have become popular and havesomewhat displaced Brownian methods There is a wide-spread feeling that likelihood methods have substantiallysmaller threshold variance than do staircases in whichthresholds are obtained by simply averaging the levelstested Given the importance of this issue for informingresearchers about how best to measure thresholds it issurprising that there have been so few simulations com-paring staircases and optimal methods Pentlandrsquos (1980)results showing very large threshold variance for a stair-case method are so dramatically different from the re-sults of my simulations that I am skeptical The best pre-vious simulations of which I am aware are those of Green(1990) who compared the threshold variance from sev-eral 2AFC staircase methods with the ideal variance(Equation 24) He found that as long as the staircase ruleplaced trials above 80 correct the ratio of observed toideal threshold variance was about 14 This value is thesquare of the inverse of Greenrsquos finding that the ratio ofthe expected to the observed sigma is about 85

In my own simulations (Klein 2002) I found that aslong as the final step size was less than 02s the thresh-old variance from simply averaging over levels was closeto the optimal value Thus the staircase threshold varianceis about the same as the variance found by other optimalmethods One especially interesting finding was that thecommon method of averaging an even number of rever-sal points gave a variance that was about 10 higher thanaveraging over all levels This result made me wonderhow the common practice of averaging reversals ever gotstarted Since Green (1990) averaged over just the reversalpoints I suspect that his results are an overestimate of thestaircase variance In addition since Green specifies stepsize in decibels it is difficult for me to determine whetherhis final step size was small enough Since the thresholdvariance is inversely related to slope it is important to re-late step size to slope rather than to decibels The most nat-ural reference unit is s the standard deviation associatedwith the cumulative normal In summary I suggest that thisgeneral feeling of distrust of the simple Brownian staircaseis misguided and that simple staircases are often as goodas likelihood methods In the section Summary of Advan-tages of Brownian Staircase I point out several advantagesof the simple staircase over the likelihood methods

A qualification should be made to the claim above thataveraging levels of a simple staircase is as good as a fancylikelihood method If the staircase method uses a too smallstep size at the beginning and if the starting point is farfrom threshold then the simple staircase can be inefficientin reaching the threshold region However at the end ofthat run the astute experimenter should notice the largediscrepancy between the starting point and the thresholdand the run should be redone with an improved startingpoint or with a larger initial step size that will be reducedas the staircase proceeds

Increase number of 2AFC response categories In-dependent of the type of adaptive method used the 2AFCtask has a flaw whereby a series of lucky guesses at lowstimulus levels can erroneously drive adaptive methods tolevels that are too low Some adaptive methods with a smallnumber of trials are unable to fully recover from a string oflucky guesses early in the run One solution is to allow thesubject to have more than two responses For reasons thatI have never understood people like to have only two re-sponse categories in a 2AFC task Just as I am reluctant totake part in 2AFC experiments I am even more reluctantto take part in two response experiments The reason issimple I always feel that I have within me more than onebit of information after looking at a stimulus I usually haveat least four or more subjective states (2 bits) for a yesnotask HN LN LY HY where H and L are high and lowconfidence and N and Y are did not and did see it For a2AFC my internal states are H1 L1 L2 H2 for high andlow confidence on Intervals 1 and 2 Kaernbach (2001a) inthis special issue does an excellent job of justifying thelow-confidence response category He provides solid ar-guments simulations and experimental data on how alow-confidence category can minimize the ldquolucky guessrdquoproblem

Kaernbach (2001a) groups the L1 and L2 categoriestogether into a ldquodonrsquot knowrdquo category and argues per-suasively that the inclusion of this third response can in-crease the precision and accuracy of threshold estimateswhile also leaving the subject a happier person He callsit the ldquounforced choicerdquo method Most of his article con-cerns the understandable fear of introducing a responsebias exactly what 2AFC was invented to avoid He has de-veloped an intelligent rule for what stimulus level shouldfollow a ldquodonrsquot knowrdquo response Kaernbach shows withboth simulations and real data that with his method thispotential response bias problem has only a second-ordereffect on threshold estimates (similar to the 2AFC inter-val bias discussed around Equation 7) and its advantagesoutweigh the disadvantages I urge anyone using the 2AFCmethod to read it I conjecture that Kaernbachrsquos method isespecially useful in trimming the lower tail of the distri-bution of threshold estimates and also in reducing the in-terval bias of 2AFC tasks since the low confidence ratingwould tend to be used for trials on which there is the mostuncertainty

Although Kaernbach has managed to convince me ofthe usefulness of his three-response method he might havetrouble convincing others because of the response biasquestion I should like to offer a solution that might quellthe response bias fears Rather than using 2AFC with threeresponses one can increase the number of responses tofourmdashH1 L1 L2 H2mdashas discussed two paragraphsabove The L rating can be used when subjects feel theyare guessing

To illustrate how this staircase would work considerthe case in which we would like the 2AFC staircase to con-verge to about 84 correct A 5 upndash1 down staircase(5 levels up for a wrong response and 1 level down for acorrect response) leads to an average probability correct

MEASURING THE PSYCHOMETRIC FUNCTION 1437

of 56 5 8333 If one had zero information about thetrial and guessed then one would be correct half the timeand the staircase would shift an average of (5 1)2 512 levels Thus in his scheme with a single ldquodonrsquot knowrdquoresponse Kaernbach would have the staircase move up2 levels following a ldquodonrsquot knowrdquo response One mightworry that continued use of the ldquodonrsquot knowrdquo responsewould continuously increase the level so that one wouldget an upward bias But Kaernbach points out that theldquodonrsquot knowrdquo response replaces incorrect responses to agreater extent than it replaces correct responses so thatthe 12 level shift is actually a more negative shift than the15 shift associated with a wrong response For my sug-gestion of four responses the step shifts would be ndash1 1113 and 15 levels for responses HC LC LI and HIwhere H and L stand for high and low confidence and Cand I stand for correct and incorrect A slightly more con-servative set of responses would be ndash1 0 14 15 inwhich case the low-confidence correct judgment leaves thelevel unchanged In all these examples including Kaern-bachrsquos three-response method the low-confidence re-sponse has an average level shift of 12 when one is guess-ing The most important feature of the low confidencerating is that when one is below threshold giving a lowconfidence rating will prevent one from long strings oflucky guesses that bring one to an inappropriately lowlevel This is the situation that can be difficult to overcomeif it occurs in the early phase of a staircase when the stepsize can be large The use of the confidence rating wouldbe especially important for short runs where the extra bitof information from the subject is useful for minimizingerroneous estimates of threshold

Another reason not to be fearful of the multiple-response2AFC method is that after the run is finished one can es-timate threshold by using a maximum likelihood proce-dure rather than by averaging the run locations Standardlikelihood analysis would have to be modified for Kaern-bachrsquos three-response method in order to deal with theldquodonrsquot knowrdquo category For the suggested four-responsemethod one could simply ignore the confidence ratings forthe likelihood analysis My simulations discussed earlierin this section and Greenrsquos (1990) reveal that averagingover the levels (excluding a number of initial levels toavoid the starting point bias) has about as good a preci-sion and accuracy as the best of the likelihood methodsIf the threshold estimates from the likelihood method andfrom the averaging levels method disagree that run couldbe thrown out

Summary of advantages of Brownian staircaseSince there is no substantial variance advantage to thefancier likelihood methods one can pay more attention toa number of advantages associated with simple staircases

1 At the beginning of a run likelihood methods mayhave too little inertia (the sequential stimulus levels jumparound easily) unless a ldquostiff rdquo prior is carefully chosenBy inertia I refer to how difficult it is for the next level todeviate from the present level At the beginning of a runone would like to familiarize the subject with the stimu-

lus by presenting a series of stimuli that gradually increasein difficulty This natural feature of the Brownian stair-case is usually missing in QUEST-like methods Theslightly inefficient first few trials of a simple staircasemay actually be a benefit since it gives the subject somepractice with slightly visible stimuli

2 Toward the end of a maximum likelihood run one canhave the opposite problem of excessive inertia wherebyit is difficult to have substantial shifts of test contrastThis can be a problem if nonstationary effects are presentFor example thresholds can increase in mid-run owingto fatigue or to adaptation if a suprathreshold pedestal ormask is present In an old-fashioned Brownian staircasewith moderate inertia a quick look at the staircase trackshows the change of threshold in mid-run That observa-tion can provide an important clue to possible problemswith the current procedures A method that reveals non-stationarity is especially important for difficult subjectssuch as infants or patients in whom fatigue or inatten-tion can be an important factor

3 Several researchers have told me that the simplicity ofthe staircase rules is itself an advantage It makes writingprograms easier It is nicer for teaching students It alsosimplifies the uncertain business of guessing likelihoodpriors and communicating the methodology when writ-ing articles

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learnmore about the PF than just the single threshold valueEstimating the PF slope is the first step in this directionTo estimate slope one must test the PF at more than onepoint Methods are available to do this (Kaernbach 2001bKontsevich amp Tyler 1999) and one could even adapt sim-ple Brownian staircase procedures to give better slope es-timates with minimum harm to threshold estimates (seethe comments in Strasburger 2001b)

Probably the best method for getting both thresholds andslopes is the Y (Psi) method of Kontsevich and Tyler(1999) The Y method is not totally novel it borrows tech-niques from many previous researchersmdashas Kontsevichand Tyler point out in a delightful paragraph that both clar-ifies their method and states its ancestry This paragraphsummarizes what is probably the most accurate and preciseapproach to estimating thresholds and slope so I reproduceit here

To summarize the Y method is a combination of solutionsknown from the literature The method updates the poste-rior probability distribution across the sampled space ofthe psychometric functions based on Bayesrsquo rule (Hall1968 Watson amp Pelli 1983) The space of the psychome-tric functions is two-dimensional (King-Smith amp Rose1997 Watson amp Pelli 1983) Evaluation of the psycho-metric function is based on computing the mean of theposterior probability distribution (Emerson 1986 King-Smith et al 1994) The termination rule is based on thenumber of trials as the most practical option (King-Smith

1438 KLEIN

et al 1994 Watson amp Pelli 1983) The placement of eachnew trial is based on one-step ahead minimum search(King-Smith 1984) of the expected entropy cost function(Pelli 1987)

A simplified rough description of the Y trial placementfor 2AFC runs is that initially trials are placed at about the90 correct level to establish a solid threshold Oncethreshold has been roughly established trials are placed atabout the 90 and the 70 levels to pin down the slopeThe precise levels depend on the assumed PF shape

Before one starts measuring slopes however one mustfirst be convinced that knowledge of slope can be impor-tant It is so for several reasons

1 Parametric fitting procedures either require knowl-edge of slope b or must have data that adequately con-strain the slope estimate Part of the discussion that fol-lows will be on how to obtain reliable threshold estimateswhen slope is not well known A tight knowledge of slopecan minimize the error of the threshold estimate

2 Strasburger (2001a 2001b) gives good argumentsfor the desirability of knowing the shape of the PF He isinterested in differences between foveal and peripheralvisual processing and he uses the PF slope as an indica-tion of differences

3 Slopes are different for different stimuli and thiscan lead to misleading results if slope is ignored It mayhelp to describe my experience with the Modelfest pro-ject (Carney et al 2000) to make this issue concreteThis continuing project involves a dozen laboratoriesaround the world We have measured detection thresh-olds for a wide variety of visual stimuli The data areused for developing an improved model of spatial visionSome of the stimuli were simple easily memorized lowspatial frequency patterns that tend to have shallowerslopes because a stimulus-known-exactly template canbe used efficiently and with minimal uncertainty On theother hand tiny Modelfest stimuli can have a good dealof spatial uncertainty and are therefore expected to pro-duce PFs with steeper slopes Because of the slope changethe pattern of thresholds will depend on the level atwhich threshold is measured When this rich dataset isanalyzed with f ilter models that make estimates formechanism bandwidths and mechanism pooling thesedifferent slope-dependent threshold patterns will lead todifferent estimates for the model parameters Several ofus in the Modelfest data gathering group argued that weshould use a data-gathering methodology that wouldallow estimation of PF slope of each of the 45 stimuliHowever I must admit that when I was a subject my pa-tience wore thin and I too chose the quicker runs Al-though I fully understand the attraction of short runs fo-cused on thresholds and ignoring slope I still think thereare good reasons for spreading trials out For one thinginterspersing trials that are slightly visible helps the sub-ject maintain memory of the test pattern

4 Slopes can directly resolve differences between mod-els In my own research for example my colleagues and

I are able to use the PF slope to distinguish long-range fa-cilitation that produces a low-level excitation or noise re-duction (typical with a cross-orientation surround) from ahigher level uncertainty reduction (typical with a same-orientation surround)

Throwing-Out Rules Goodness-of-FitThere are occasions when a staircase goes sour Most

often this happens when the subjectrsquos sensitivity changesin the middle of a run possibly because of fatigue It istherefore good to have a well-specified rule that allowsnonstationary runs to be thrown out The throwing-out rulecould be based on whether the PF slope estimate is toolow or too high Alternatively a standard approach is tolook at the chi-square goodness-of-f it statistic whichwould involve an assumption about the shape of the un-derlying PF I have a commentary in Section III regard-ing biases in the chi-square metric that were found byWichmann and Hill (2001a) Biases make it difficult touse standard chi-square tables The trouble with this ap-proach of basing the goodness-of-fit just on the cumulateddata is that it ignores the sequential history of the run TheBrownian upndashdown staircase makes the sequential historyvividly visible with a plot of sequentially tested levelsWhen fatigue or adaptation sets in one sees the tested lev-els shift from one stable point to another Leek Hanna andMarshall (1991) developed a method of assessing the non-stationarity of a psychometric function over the course ofits measurement using two interleaved tracks EarlierHall (1983) also suggested a technique to address the sameproblem I encourage further research into the develop-ment of a non-stationarity statistic that can be used to dis-card bad runs This statistic would take the sequential his-tory into account as well as the PF shape and the chi-square

The development of a ldquothrowing-outrdquo rule is the ex-treme case of the notion of weighted averaging One im-portant reason for developing good estimators for good-ness-of-fit and good estimators for threshold variance isto be able to optimally combine thresholds from repeatedexperiments I shall return to this issue in connectionwith the Wichmann and Hill (2001a) analysis of bias ingoodness-of-fit metrics

Final Thought The Method Should Depend on the Situation

Also important is that the method chosen should be ap-propriate for the task For example suppose one is usingwell-experienced observers for testing and retesting sim-ilar conditions in which one is looking for small changesin threshold or one is interested in the PF shape In suchcases one has a good estimate of the expected thresholdand the signal detection method of constant stimuli withblanks and with ratings might be best On the other handfor clinical testing in which one is highly uncertain aboutan individualrsquos threshold and criterion stability a 2AFClikelihood or staircase method (with a confidence rating)might be best

MEASURING THE PSYCHOMETRIC FUNCTION 1439

III DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPEPLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with whatto do with the data after they have been collected Wehave here the standard Bayesian situation Given the PFit is easy to generate samples of data with confidence in-tervals But we have the inverse situation in which we aregiven the data and need to estimate properties of the PFand get confidence limits for those estimates The pair ofarticles in this special issue by Wichmann and Hill (2001a2001b) as well as the articles by King-Smith et al (1994)Treutwein (1995) and Treutwein and Strasburger (1999)do an outstanding job of introducing the issues and providemany important insights For example Emerson (1986)and King-Smith et al emphasize the advantages of meanlikelihood over maximum likelihood The speed of mod-ern computers makes it just as easy to calculate meanlikelihood as it is to calculate maximum likelihood As an-other example Treutwein and Strasburger (1999) demon-strate the power of using Bayesian priors for estimatingparameters by showing how one can estimate all four pa-rameters of Equation 1A in fewer than 100 trials This im-portant finding deserves further examination

In this section I will focus on four items First I will dis-cuss the relative merits of parametric versus nonparamet-ric methods for estimating the PF properties The paperby Miller and Ulrich (2001) provides a detailed analysis ofa nonparametric method for obtaining all the moments ofthe PF I will discuss some advantages and limitations oftheir method Second is the question of goodness-of-fitWichmann and Hill (2001a) show that when the numberof trials is small the goodness-of-fit can be very biasedI will examine the cause of that bias Third is the issue dis-cussed by Wichmann and Hill (2001a) that of how lapseseffect slope estimates This topic is relevant to Strasburg-errsquos (2001b) data and analysis Finally there is the possiblycontroversial topic of an upward slope bias in adaptivemethods that is relevant to the papers of Kaernbach (2001b)and Strasburger (2001b)

Parametric Versus Nonparametric FitsThe main split between methods for estimating pa-

rameters of the psychometric function is the distinctionbetween parametric and nonparametric methods In aparametric method one uses a PF that involves severalparameters and uses the data to estimate the parametersFor example one might know the shape of the PF and es-timate just the threshold parameter McKee et al (1985)show how slope uncertainty affects threshold uncertaintyin probit analysis The two papers by Wichmann and Hill(2001a 2001b) are an excellent introduction to many is-sues in parametric fitting An example of a nonparamet-ric method for estimating threshold would be to averagethe levels of a Brownian staircase after excluding the firstfour or so reversals in order to avoid the starting level biasBefore I served as guest editor of this special symposium

issue I believed that if one had prior knowledge of the exactPF shape a parametric method for estimating threshold wasprobably better than nonparametric methods After read-ing the articles presented here as well as carrying out myrecent simulations I no longer believe this I think thatin the proper situation nonparametric methods are as goodas parametric methods Furthermore there are situationsin which the shape of the PF is unknown and nonparamet-ric methods can be better

Miller and Ulrichrsquos Article on the Nonparametric SpearmanndashKaumlrber Method of Analysis

The standard way to extract information from psycho-metric data is to assume a functional shape such as a Wei-bull cumulative normal logit dcent and then vary the pa-rameters until likelihood is maximized or chi-square isminimized Miller and Ulrich (2001) propose using theSpearmanndashKaumlrber (SK) nonparametric method whichdoes not require a functional shape They start with a prob-ability density function defined by them as the derivativeof the PF

(33)

where s(i) is the stimulus intensity at the ith level andP(i) is the probability correct at the that level The notationpdf(i) is meant to indicate that it is the pdf for the intervalbetween i ndash 1 and i Miller and Ulrich give the r th momentof this distribution as their Equation 3

(34)

The next four equations are my attempt to clarify their ap-proach I do this because when applicable it is a finemethod that deserves more attention Equation 34 prob-ably came from the mean value theorem of calculus

(35)

where f (s) 5 sr11 and f cent(smid) 5 (r 1 1)smidr is the deriv-

ative of f with respect to s somewhere within the intervalbetween s(i 1) and s(i) For a slowly varying functionsmid would be near the midpoint Given Equation35 Equa-tion 34 becomes

(36)

where Ds 5 s(i) s(i 1) Equation 36 is what one wouldmean by the rth moment of the distribution The case r 50 is important since it gives the area under the pdf

(37)

by using the definition in Equation 33 The summation inEquation 37 gives the area under the pdf For it to be aproper pdf its area must be unity which means that theprobability at the highest level P(imax) must be unity andthe probability at the lowest level Pmin must be zero asMiller and Ulrich (2001) point out

m0 = =aringpdf( ) ( ) ( ) max mini s P i P ii

D

mrr

i

i s s= aring pdf mid( ) D

f s i f s i f s s i s i( ) ( ) ( ) ( ) ( ) [ ] [ ] = cent [ ]1 1mid

mrr r

ri s i s i=

+ [ ]aring + +11

11 1pdf( ) ( ) ( )

pdf( )( ) ( )

( ) ( )i

P i P i

s i s i= 1

1

1440 KLEIN

The next simplest SK moment is for r 5 1

(38)

from Equation 33 This first moment provides an esti-mate of the centroid of the distribution corresponding tothe center of the psychometric function It could be usedas an estimate of threshold for detection or PSE for dis-crimination An alternate estimator would be the medianof the pdf corresponding to the 50 point of the PF (theproper definition of PSE for discrimination)

Miller and Ulrich (2001) use Monte Carlo simulationsto claim that the SK method does a better job of extract-ing properties of the psychometric function than do para-metric methods One must be a bit careful here since intheir parametric fitting they generate the data by usingone PF and then fit it with other PFs But still it is a sur-prising claim if this simple method is so good why havepeople not been using it for years There are two possibleanswers to this question One is that people never thoughtof using it before Miller and Ulrich The other possibil-ity is that it has problems I have concluded that the an-swer is a combination of both factors I think that there isan important place for the SK approach but its limitationsmust be understood That is my next topic

The most important limitation of the SK nonparamet-ric method is that it has only been shown to work well formethod of constant stimulus data in which the PF goesfrom 0 to 1 I question whether it can be applied with thesame confidence to 2AFC and to adaptive procedures withunequal numbers of trials at each level The reason for thislimitation is that the SK method does not offer a simpleway to weight the different samples Let us consider twosituations with unequal weighting adaptive methods and2AFC detection tasks

Adaptive methods place trials near threshold but withvery uneven distribution of number of trials at differentlevels The SK method gives equal weighting to sparselypopulated levels that can be highly variable thus con-tributing to a greater variance of parameter estimates Letus illustrate this point for an extreme case with five levelss 5 [1 2 3 4 5] Suppose the adaptive method placed [11 1 98 1] trials at the five levels and the number correctwere [0 1 1 49 1] giving probabilities [0 1 1 5 1] andpdf 5 [1 0 5 5] The following PEST-like (Taylor ampCreelman 1967) rules could lead to this unusual dataMove down a level if the presently tested level has greaterthan P1 correct move up three levels if the presentlytested level has less than P2 correct P1 can be anynumber less than 33 and P2 can be any number greaterthan 67 The example data would be produced by start-ing at Level 5 and getting four in a row correct followedby an error followed by a wide set possible of alternat-ing responses for which the PEST rules would leave thefourth level as the level to be tested The SK estimate of the

center of the PF is given by Equation 38 to be micro1 5 200Since a pdf with a negative probability is distasteful tosome one can monotonize the PF according to the proce-dure of Miller and Ulrich (which does include the numberof trials at each level) The new probabilities are [0 5151 51 1] with pdf 5 [51 0 0 49] The new estimate ofthreshold is micro1 5 297 However given the data with 4998correct at Level 4 an estimate that includes a weightingaccording to the number of trials at each level would placethe centroid of the distribution very close to micro1 5 4 Thuswe see that because the SK procedure ignores the numberof trials at each level it produces erroneous estimates ofthe PF properties This example with shifting thresholdswas chosen to dramatize the problems of unequal weight-ing and the effect of monotonizing

A similar problem can occur even with the method ofconstant stimuli with equal number of trials at each levelbut with unequal binomial weighting as occurs in 2AFCdata The unequal weighting occurs because of the 50lower asymptote The binomial statistics error bar of dataat 55 correct is much larger than for data at 95 cor-rect Parametric procedures such as probit analysis giveless weighting to the lower data This is why the most ef-ficient placement of trials in 2AFC is above 90 correctas I have discussed in Section II Thus it is expected thatfor 2AFC the SK method would give threshold estimatesless precise than the estimates given by a parametricmethod that includes weighting

I originally thought that the SK method would fail on alltypes of 2AFC data However in the discussion followingEquation 7 I pointed out that for 2AFC discriminationtasks there is a way of extending the data to the 0ndash100range that produces weightings similar to those for a yesnotask Rather than use the standard Equation 2 to deal withthe 50 lower asymptote one should use the method dis-cussed in connection with Figure 3 in which one can usefor the ordinate the percent of correct responses in Inter-val 2 and use the signal strength in Interval 2 minus thatin Interval 1 for the abscissa That procedure produces aPF that goes from 0 to 100 thereby making it suit-able for SK analysis while also removing a possible bias

Now let us examine the SK method as applied to datafor which it is expected to work such as discriminationPFs that go from 0 to 100 These are tasks in which thelocation of the psychometric function is the PSE and theslope is the threshold

A potentially important claim by Miller and Ulrich(2001) is that the SK method performs better than probitanalysis This is a surprising claim since their data aregenerated by a probit PF one would expect probit analy-sis to do well especially since probit analysis does espe-cially well with discrimination PFs that go from 0 to 100To help clarify this issue Miller and I did a number ofsimulations We focused on the case of N 5 40 with 8 tri-als at each of 5 levels The cumulative normal PF (Equa-tion 14) was used with equally spaced z scores and with thelowest and highest probabilities being 1 and 99 Theprobabilities at the sample points are P(i) 5 1 1222

m1

2 21

2

11

2

=

= [ ] +

aring

aring

pdf( )( ) ( )

( ) ( )( ) ( )

is i s i

P i P is i s i

MEASURING THE PSYCHOMETRIC FUNCTION 1441

50 8778 and 99 corresponding to z scores ofz(i) 5 ndash2328 1164 0 1164 and 2328 Nonmonot-onicity is eliminated as Miller and Ulrich suggest (seethe Appendix for Matlab Code 4 for monotonizing) I didMonte Carlo simulations to obtain the standard deviationof the location of the PF micro1 given by Equation38 I foundthat both the SK method and the parametric probit method(assuming a fixed unity slope) gave a standard deviationof between 028 and 029 Miller and Ulrichrsquos result re-ported in their Table 17 gives a standard error of 0071However their Gaussian had a standard deviation of 025whereas my z-score units correspond to a standard devi-ation of unity Thus their standard error must be multipliedby 4 giving a standard error of 0284 in excellent agree-ment with my value So at least in this instance the SKmethod did not do better than the parametric method (withslope fixed) and the surprise mentioned at the beginningof this paragraph did not materialize I did not examineother examples where nonparametric methods might dobetter than the matched parametric method However thesimulations of McKee et al (1985) give some insight intowhy probit analysis with floating slope can do poorly evenon data generated from probit PFs McKee et al showthat when the number of trials is low and if stimulus lev-els near both 0 and 100 are not tested the slope can bevery flat and the predicted threshold can be quite distantIf care is not taken to place a bound on these probit esti-mates one would expect to find deviant threshold esti-mates The SK method on the other hand has built-inbounds for threshold estimates so outliers are avoidedThese bounds could explain why SK can do better thanprobit When the PF slope is uncertain nonparametricmethods might work better than parametric methods adecided advantage for the SK method

It is instructive to also do an analytic calculation of thevariance based on an analysis similar to what was doneearlier If only the ith level of the five had been testedthe variance of the PF location (the PSE in a discrimina-tion task) would have been as follows (Gourevitch ampGalanter 1967)

(39)

where var(P) is given by Equation 27 and PF slope is

(40)

where for the cumulative normal dzi dxi 5 1s where sis the standard deviation of the pdf and

(41)

from Equation 3A By combining these equations one gets

(42)

Note that if all trials were placed at the optimal stimuluslocation of z 5 0 (P 5 5) Equation 42 would becomes2p 2Ni as discussed earlier When all five levels arecombined the total variance is smaller than each individ-ual variance as given by the variance summation formula

(43)

Finally upon plugging in the five values of zi rangingfrom ndash2328 to 12328 and Pi ranging from 1 to 99the standard error of the PF location is

(44)

This value is precisely the value obtained in our MonteCarlo simulations of the nonparametric SK method andalso by the slope fixed parametric probit analysis Thistriple confirmation shows that the SK method is excel-lent in the realm where it applies (equal number of trialsat levels that span 0 to 100) It also confirms our sim-ple analytic formula

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

var_tot = SE² = 3.24/N_tot.   (45)

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up-down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that, of the five stimulus levels, the two at P = 1% and 99% are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli, combined with multiple response ratings in a signal detection framework, may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = 50%). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

X² = Σ_i [p(data_i) - P_i]² / var(P_i),   (46)

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend

var(P_i) = P_i(1 - P_i)/N_i,   (47)

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

X² = Σ_i residual_i²,   (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

residual_i = [p(data_i) - P_i]/SE_i.   (49)

The second method is to use the normalized log likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2 Σ_i { N_i p(data_i) ln[p(data_i)/P_i] + N_i [1 - p(data_i)] ln[(1 - p(data_i))/(1 - P_i)] }.   (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.
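For readers who want a concrete illustration of Equations 46-50, the two statistics can be computed from per-level data in a few lines. This is a sketch of mine with invented numbers (Ni, pObs, and Pfit are hypothetical values, not data from any of the papers discussed), not code from the Appendix:

    % Hedged sketch of Equations 46-50 for hypothetical per-level data.
    Ni   = [20 20 20 20];          % trials at each level (invented)
    pObs = [0.55 0.70 0.85 0.95];  % observed proportion correct (invented)
    Pfit = [0.60 0.72 0.83 0.92];  % fitted PF value at each level (invented)
    varP = Pfit.*(1 - Pfit)./Ni;                         % Equation 47
    X2   = sum((pObs - Pfit).^2 ./ varP)                 % Equations 46, 48-49
    LL   = 2*sum( Ni.*pObs.*log((pObs + eps)./Pfit) + ...
                  Ni.*(1 - pObs).*log((1 - pObs + eps)./(1 - Pfit)) )   % Equation 50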

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, and γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

prob_chisq = Gamma(df/2, χ²/2),   (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that, for a reasonably large number of degrees of freedom (like df > 10), the chi-square distribution is close to a Gaussian with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = 0.54, which is very close to the asymptotic value of 0.5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = 0.845, which is indeed close to the value of 0.841 expected for a unity z score.
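In Matlab this sanity check takes two lines. The sketch below is mine; note that gammainc takes its arguments in the order gammainc(χ²/2, df/2), reversed relative to the Gamma(df/2, χ²/2) notation of Equation 51:

    % Hedged sketch of the Gamma-function sanity check described above.
    df = 18;
    gammainc(df/2, df/2)                   % about 0.54, near the asymptotic 0.5
    gammainc((df + sqrt(2*df))/2, df/2)    % about 0.845, near 0.841 (a unity z score)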

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations, for comparison with the expected chi-square distribution given by Equation 51.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials at each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [0.52, 0.85]; in Figure 8a, with a strong downward bias, the interval is [0.72, 0.99]. That relatively innocuous shift, from testing at lower probabilities to testing at higher probabilities, made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities at each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100%, and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

X²_i = [p(data_i) - P_i]² / var(P_i) = (O_i - E_i)² / [E_i(1 - P_i)],   (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X²_i is obtained by doing a weighted sum over all possible outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

wt_i = {N_i! / [O_i! (N_i - O_i)!]} P_i^(O_i) (1 - P_i)^(N_i - O_i).   (53)

The expected value <X²_i> is

<X²_i> = Σ_(O_i) wt_i (O_i - E_i)² / [E_i(1 - P_i)].   (54)

A bit of algebra, based on Σ_(O_i) wt_i = 1 (together with the binomial variance, Σ_(O_i) wt_i (O_i - E_i)² = N_i P_i(1 - P_i) = E_i(1 - P_i)), gives a surprising result:

<X²_i> = 1.   (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P_i. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square at any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

<LL_i> = 2 Σ_(O_i) wt_i { O_i ln(O_i/E_i) + (N_i - O_i) ln[(N_i - O_i)/(N_i - E_i)] }.   (56)

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.
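The enumeration for a single level is simple enough to sketch directly (my own minimal version, not the author's Matlab Code 2, which loops over a range of probabilities and trial numbers to generate Figure 4):

    % Expected per-level contributions by enumeration over all outcomes O.
    N  = 2;  P = 0.5;                       % e.g., the curve labeled 2, at P = .5
    O  = 0:N;  E = N*P;
    wt = arrayfun(@(o) nchoosek(N, o), O) .* P.^O .* (1 - P).^(N - O);   % Equation 53
    X2i = sum(wt .* (O - E).^2 / (E*(1 - P)))   % Equation 54: equals 1 exactly
    t1  = O .* log((O + eps)/E);                % the eps makes 0*log(0) terms equal 0
    t2  = (N - O) .* log((N - O + eps)/(N - E));
    LLi = 2*sum(wt .* (t1 + t2))                % Equation 56: 2*log(2) = 1.386 here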

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

<LL_i> = 2 × 0.25 × [2 ln(2) + 2 ln(2)] = 2 ln(2) = 1.386.   (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in their Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the (1 - P_i) factor in Equation 53, except for O_i = N_i, with E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i > .85 the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true, rather than sampled, parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53-54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1-16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see why X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39-43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with the threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.

Wichmann and Hill, and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters, and that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a nonzero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value of the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                              No Lapse                          Lapse
Psychometric      λ = .01         λ = .0001        λ = .01         λ = .0001
Function          L      χ²       L      χ²        L      χ²       L      χ²
Probit            1.05   1.05     1.00   0.99      1.05   1.05     0.36   0.34
Logit             1.76   1.76     1.66   1.66      1.76   1.76     0.84   0.72
Weibull           1.01   1.01     0.97   0.97      1.01   1.01     0.26   0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

probit:   P(z) = .5 + .5 erf(z/√2)  (Equation 3B),   (58A)
logit:    P(z) = 1/[1 + exp(-z)],   (58B)
Weibull:  P(z) = 1 - exp[-exp(z)],   (58C)

with z = p2(y - p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed the following: (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
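To make the mechanics concrete, here is a hedged sketch of this kind of fit (my own simplified version, not the author's Matlab Code 3): a maximum-likelihood fit of the probit form (Equation 58A) with a fixed, symmetric lapse rate. The variable names, the use of fminsearch, and the eps guard are my own choices.

    % Hedged sketch: probit fit with fixed symmetric lapses (gamma = lambda).
    x      = [0 1 6];               % stimulus levels
    Ntr    = [50 50 50];            % trials per level
    Ncorr  = [25 42 49];            % the "Lapse" data set; use [25 42 50] for "No Lapse"
    lambda = 0.0001;                % lapse rate being tested (also try 0.01)
    PF     = @(p) lambda + (1 - 2*lambda)*(0.5 + 0.5*erf(p(2)*(x - p(1))/sqrt(2)));
    negLL  = @(p) -sum( Ncorr.*log(PF(p) + eps) + (Ntr - Ncorr).*log(1 - PF(p) + eps) );
    pBest  = fminsearch(negLL, [0 0.3])   % [threshold slope]; try several starting slopes

Changing lambda and the count at the top level selects among the four column pairs of Table 2.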

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (a large chi-square), because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and to fix the lapse parameter at zero or a small number. Normally, the weighting is based solely on the expected analytic PF rather than on the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope b of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic. Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods; I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination, with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method, using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled, so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a fourfold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing, there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2-4. The analysis involves three steps.

First, the data are monotonized (a sketch of one such routine is given after the three steps). Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total numbers correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
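Here is a minimal sketch of one way to implement the monotonizing step, as promised above (a trial-weighted pool-adjacent-violators routine of my own; the author's Matlab Code 4 may differ in its details):

    % Weighted pool-adjacent-violators: make the per-level proportions P
    % nondecreasing, pooling adjacent violators weighted by the trial counts Ntr.
    % (Save as monotonizePF.m.)
    function Pm = monotonizePF(P, Ntr)
        val = P(:);  wt = Ntr(:);  len = ones(size(val));   % one block per level
        i = 1;
        while i < length(val)
            if val(i) > val(i+1)                  % violation: merge the two blocks
                val(i) = (wt(i)*val(i) + wt(i+1)*val(i+1)) / (wt(i) + wt(i+1));
                wt(i)  = wt(i) + wt(i+1);
                len(i) = len(i) + len(i+1);
                val(i+1) = []; wt(i+1) = []; len(i+1) = [];
                if i > 1, i = i - 1; end          % recheck upstream for new violations
            else
                i = i + 1;
            end
        end
        Pm = repelem(val, len);                   % expand pooled blocks back to the levels
    end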

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs, even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
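Because the full enumeration is lengthy, the following hedged sketch (mine, not the author's Matlab Code 4) illustrates the central shift effect by simulation rather than enumeration: Brownian one-up/one-down runs on a PF that is flat at 50%, pooled with and without first aligning each run to its own mean level.

    % Simulate many short Brownian (one-up/one-down) staircases on a flat PF
    % (P = .5 at every level) and pool the trials with and without the shift step.
    rng(0);                                   % fixed seed, for reproducibility
    nRuns = 2000; nTrials = 10; nLev = 21; start = 11;
    nCorr = zeros(2, nLev); nTot = zeros(2, nLev);   % row 1: no shift; row 2: shift
    for r = 1:nRuns
        lev = start; levels = zeros(1, nTrials); isCorrect = zeros(1, nTrials);
        for t = 1:nTrials
            levels(t) = lev;
            isCorrect(t) = rand < 0.5;        % flat PF: 50% correct at every level
            lev = min(max(lev + 1 - 2*isCorrect(t), 1), nLev);  % down if correct, up if not
        end
        shift = round(mean(levels)) - start;  % nonparametric threshold estimate for this run
        for pass = 1:2
            sh = (pass - 1)*shift;            % pass 1: raw levels; pass 2: aligned levels
            for t = 1:nTrials
                L = min(max(levels(t) - sh, 1), nLev);
                nCorr(pass, L) = nCorr(pass, L) + isCorrect(t);
                nTot(pass, L)  = nTot(pass, L) + 1;
            end
        end
    end
    PFnoShift = nCorr(1,:) ./ max(nTot(1,:), 1);   % should hover near .5 at all levels
    PFshifted = nCorr(2,:) ./ max(nTot(2,:), 1);   % should steepen: the shift-induced bias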

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure. The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias: The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on the left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's steps (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm, like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)] / [1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) - z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percentage of correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, whereas objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (the ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log-log slope of the d′ function, β. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from .50 to 1.00. For this good fit, the ratio of the Weibull exponent β to the d′ exponent b is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared, using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variances leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods whose lower asymptotes are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and by increasing the number of stimulus levels (Klein, 2002).

6. The simple up-down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals' worth of initial trials are thrown out) than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means of improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10 The method should depend on the situation

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman-Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems for 2AFC detection, because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination, because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and they prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to those of Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications: The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.

Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.

Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.

Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.

Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.

Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.

Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.

Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.

Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.

Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.

Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.

Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.

Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.

King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.

Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America, 2, 1560-1585.

Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.

Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.

Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.

Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.

Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.

Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.

Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.

McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.

Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.

Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28(Suppl.), 366.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.

Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.

Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.

Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)

Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias
The program has sections that monotonize, extrapolate, and average the enumerated staircases, followed by a main program. Note: In the monotonizing section the denominator is tot+eps, where eps is the smallest floating point number that Matlab can use; because of this choice, if there are zero trials at a level, the associated probability will be zero.
Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


man, 1991) for anyone interested in the signal detection approach to psychophysical experiments (including the effect of nonunity ROC slopes).

The z-Score Correction for Bias and Signal Detection Theory: 2AFC

One might have thought that for 2AFC the connection between the PF and d′ is well established, namely (Green & Swets, 1966)

d′(x) = √2 z(x),  (6)

where z(x) is the z score of the average of P1(x) for correct judgments in Interval 1 and P2(x) for correct judgments in Interval 2. Typically this average, P(x), is calculated by dividing the total number of correct trials by the total number of trials. It is generally assumed that the 2AFC procedure eliminates the effect of response bias on threshold. However, in this section I will argue that the d′ as defined in Equation 6 is affected by an interval bias when one interval is selected more than the other.

I have been thinking a lot about response bias in 2AFC tasks because of my recent experience as a subject in a temporal 2AFC contrast discrimination study, where contrast is defined as the change in luminance divided by the background luminance. In these experiments I noticed that I had a strong tendency to choose the second interval more than the first. The second interval typically appears subjectively to be about 5% higher in contrast than it really is.

Figure 1. Six views of the Weibull function P_weibull = 1 − (1 − γ)exp(−x_t^β), where γ = 0.1587, β = 2, and x_t is the stimulus strength in threshold units. Panels a–c have a linear abscissa, with x_t = 1 being the threshold; panels d–f have a natural log abscissa, with y_t = 0 being the threshold. In panels a and d, the ordinate is probability. The asterisk in panel d, at y_t = 0, is the point of maximum slope on a logarithmic abscissa. In panels b and e, the ordinate is the z score of panels a and d. The lower asymptote is z = −1. If the ordinate is redefined so that the origin is at the lower asymptote, the new ordinate, shown on the right of panels b and e, is d′(x_t) = z(x_t) − z(0), corresponding to the signal detection d′ for an objective yes/no task. In panels c and f, the ordinate is the log–log slope of d′. At x_t = 0 the log–log slope = β. The log–log slope falls rapidly as the stimulus strength approaches threshold. The Matlab program that generated this figure is Appendix Code 1.

Whether this is a perceptual effect, because the intervals are too close together in time (800 msec), or a cognitive effect does not matter for the present article. What does matter is that this bias produces a downward bias in d′. With feedback, lots of practice, and lots of experience being a subject, I was able to reduce this interval bias and equalize the number of times I responded with each interval. Naïve subjects may have a more difficult time reducing the bias. In this section I show that there is a very simple method for removing the interval bias by converting the 2AFC data to a PF that goes from 0% to 100%.

The recognition of bias in 2AFC is not new. Green and Swets (1966), in their Appendix III.3.4, point out that the bias in choice of interval does result in a downward bias in d′. However, they imply that the effect of this bias is small and can typically be ignored. I should like to question that implication by using one of their own examples to show that the bias can be substantial.

In a run of 200 trials, the Green and Swets example (Green & Swets, 1966, p. 410) has 95 out of 100 correct when the test stimulus is in the second interval (z2 = 1.645) and 50 out of 100 correct (z1 = 0) when the stimulus is in the first interval. The standard 2AFC way to analyze these data would be to average the probabilities: (95% + 50%)/2 = 72.5% correct (z_correct = 0.598), corresponding to d′ = z√2 = 0.845. However, Green and Swets (p. 410) point out that according to signal detection theory one should analyze these data by averaging the z scores rather than averaging the probabilities, or

d′ = √2 (z1 + z2)/2 = √2 × 1.645/2 = 1.163.  (7)

The ratio between these two ways of calculating d′ is 1.163/0.845 = 1.376. Since d′ is approximately linearly related to signal strength in discrimination tasks, this 38% reduction in d′ corresponds to an erroneous 38% increase in predicted contrast discrimination threshold when one calculates threshold the standard way. Note that if there had been no bias, so that the responses would be approximately equally divided across the two intervals, then z2 ≈ z1 and Equation 7 would be identical to the more familiar Equation 6. Since bias is fairly common, especially among new observers, the use of Equation 7 to calculate d′ seems much more reasonable than using Equation 6. It is surprising that the bias correction in Equation 7 is rarely used.
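To make the two calculations concrete, here is a minimal Matlab sketch (my own illustration, not code from the Appendix) that reproduces the numbers in this example:

    P1 = 0.50; P2 = 0.95;                    % Green & Swets example: 50/100 and 95/100 correct
    z = @(P) sqrt(2)*erfinv(2*P - 1);        % z score (inverse cumulative normal)
    dprime_eq6 = sqrt(2)*z((P1 + P2)/2)      % average the probabilities first: 0.845
    dprime_eq7 = sqrt(2)*(z(P1) + z(P2))/2   % average the z scores (Equation 7): 1.163
    ratio = dprime_eq7/dprime_eq6            % about 1.38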

Green and Swets (1966) present a different analysis. Instead of comparing d′ values for the biased versus nonbiased conditions, they convert the d′s back to percent correct. The corrected percent correct (corresponding to d′ = 1.163) is 79.5%. In terms of percent correct, the bias seems to be a small effect, shifting percent correct a mere 7%, from 72.5% to 79.5%. However, the d′ ratio of 1.38 is a better measure of the bias, since it is directly related to the error in discrimination threshold estimates.

A further comment on the magnitude of the bias may be useful. The preceding subsection discussed the criterion bias of yes/no methods, which contributes linearly to d′ (Equation 5). The present section discusses the 2AFC interval bias that contributes quadratically to d′. Thus, for small amounts of bias, the decrease in d′ is negligible. However, as I have pointed out, the bias can be large enough to make a significant contribution to d′.

It is instructive to view the bias correction for the full PF corresponding to this example. Cumulative normal PFs are shown in the upper left panel of Figure 2. The curves labeled C1 and C2 are the probability correct for the first and second intervals, I1 and I2. The asterisks correspond to the stimulus strength used in this example, at 50% and 95% correct. The dot-dashed line is the average of the two PFs for the individual intervals. The slope of the PF is set by the dot-dashed line (the averaged data) being at 50% (the lower asymptote) for zero stimulus strength. The lower left panel is the z-score version of the upper panel. The dashed line is the average of the two solid lines for z scores in I1 and I2. This is the signal detection method of averaging that Green and Swets (1966) present as the proper way to do the averaging. The dot-dashed line is the z score of the average probability shown in the upper left panel. Notice that the z score for the averaged probability is lower than the averaged z score, indicating a downward bias in d′ due to the interval bias, as discussed at the beginning of this section. The right pair of panels are the same as the left pair, except that instead of plotting C1, we plot 1 − C1, the probability of responding I2 incorrectly. The dashed line is half the difference between the two solid lines. The other difference is that in the lower right panel we have multiplied the dashed and dash-dotted lines by √2, so that these lines are d′ values rather than z scores.

The final step in dealing with the 2AFC bias is to flip the 1 − C1 curve horizontally to negative abscissa values, as in Figure 3. The ordinate is still the probability correct in interval 2. The abscissa becomes the difference in stimulus strengths between I2 and I1. The flipped branch is the probability of responding I2 when the I2 stimulus strength is less than that of I1 (an incorrect response). Figure 3a, with the ordinate going from 0% to 100%, is the proper way to represent 2AFC discrimination data. The other item that I have changed is the scale on the abscissa, to show what might happen in a real experiment. The ordinate values of 50% and 95% for the Green and Swets (1966) example have been placed at a contrast difference of 5%. The negative 5% value corresponds to the case in which the positive test pattern is in the first interval. Threshold corresponds to the inverse of the PF slope. The bottom panel shows the standard signal detection representation of the signal in I1 and I2. d′ is the distance between these symmetric stimuli in standard deviation units. The Gaussians are centered at ±5. The vertical line at −5 is the criterion, such that 50% and 95% of the area of the two Gaussians is above the criterion. The z-score difference of 1.645 between the two Gaussians must be divided by √2 to get d′, because each trial had two stimulus presentations with independent

information for the judgment. This procedure, identical to Equation 7, gives the same d′ as before.

Three Distinctions for Clarifying PFs
In dealing with PFs, three distinctions need to be made: yes/no versus forced choice, detection versus discrimination, and constant stimuli versus adaptive methods. These distinctions are usually clear, but I should like to point out some subtleties.

For the forced choice versus yes/no distinction, there are two sorts of forced choice tasks. The standard version has multiple intervals, separated spatially or temporally, and the stimulus is in only one of the intervals. In the other version, one of N stimuli is shown, and the observer responds with a number from 1 to N. For example, Strasburger (2001b) presented 1 of 10 letters to the observer in a 10AFC task. Yes/no tasks have some similarity to the latter type of forced choice task. Consider, for example, a detection experiment in which one of five contrasts (including a blank) is presented to the observer, and the observer responds with numbers from 1 to 5. This would be classified as a rating scale method of constant stimuli yes/no task, since only a single stimulus is presented and the rating is based on a one-dimensional intensity.

The detection/discrimination distinction is usually based on whether the reference stimulus is a natural zero point. For example, suppose the task is to detect a high spatial frequency test pattern added to a spatially identical reference pattern.

Figure 2. 2AFC psychometric functions with a strong bias in favor of responding Interval 2 (I2). The bias is chosen from a specific example presented by Green and Swets (1966) such that the observer has 50% and 95% correct when the test is in I1 and I2, respectively. These points are marked by asterisks. The psychometric function being plotted is a cumulative normal. In all panels the abscissa is x_t, the stimulus strength. (a) The psychometric functions for probability correct in I1 and I2 are shown and labeled C1 and C2. The average of the two probabilities, labeled "average," is the dot-dashed line; it is the curve that is usually reported. The diamond is at 72.5%, the average percent correct of the two asterisks. (b) Same as panel a, except that instead of showing C1 we show 1 − C1, the probability of saying I2 when the test was in I1. The abscissa is now labeled "probability of I2 response." (c) z scores of the three probabilities in panel a. An additional dashed line is shown that is the average of the C1 and C2 z-score curves. The diamond is the z score of the diamond in panel a, and the star is the z-score average of the two panel c asterisks. (d) The sign of the C2 curve in panel c is flipped to correspond to panel b. The dashed and dot-dashed lines of panel c have been multiplied by √2 in panel d, so that they become d′.

If the reference pattern has zero contrast, the task is detection. If the reference pattern has a high contrast, the task is discrimination. Klein (1985) discusses these tasks in terms of monopolar and bipolar cues. For discrimination, a bipolar cue must be available, whereby the test pattern can be either positive or negative in relation to the reference. If one cannot discriminate the negative cue from the positive, then it can be called a detection task.

Finally, the constant stimuli versus adaptive method distinction is based on the former's having preassigned test levels and the latter's having levels that shift to a desired placement. The output of the constant stimulus method is a full PF and is thus fully entitled to be included in this special issue. The output of adaptive methods is typically only a single number, the threshold, specifying the location but not the shape of the PF. Two of the papers in this issue (Kaernbach, 2001b; Strasburger, 2001b) explore the possibility of also extracting the slope from adaptive data that concentrates trials around one level. Even though adaptive methods do not measure much about the PF, they are so popular that they are well represented in this special issue.

Figure 3. The 2AFC discrimination PF from Figure 2 has been extended to the 0% to 100% range without rescaling the ordinate. In panels a and b, the two right-hand panels of Figure 2 have been modified by flipping the sign of the abscissa of the negative slope branch, where the test is in Interval 1 (I1). The new abscissa is now the stimulus strength in I2 minus the strength in I1. The abscissa scaling has been modified to be in stimulus units. In this example the test stimulus has a contrast of 5%. The ordinate in panel a is the probability that the response is I2. The ordinate in panel b is the z score of the panel a ordinate. The asterisks are at the same points as in Figure 2. Panel c is the standard signal detection picture when noise is added to the signal. The abscissa has been modified to being the activity in I2 minus the activity in I1. The units of activation have arbitrarily been chosen to be the same as the units of panels a and b. Activity distributions are shown for stimuli of −5 and +5 units, corresponding to the asterisks of panels a and b. The subject's criterion is at −5 units of activation. The upper abscissa is in z-score units, where the Gaussians have unit variance. The areas under the two distributions above the criterion are 50% and 95%, in agreement with the probabilities shown in panel a.

The 10 rows of Table 1 present the articles in this special issue (including the present article). Columns 2–5 correspond to the four categories associated with the first two distinctions. The last column classifies the articles according to the adaptive versus constant stimuli distinction. In order to open up the full variety of PFs for discussion and to enable a deeper understanding of the relationship among different types of PFs, I will now clarify the interrelationship of the various categories of PFs and their connection to the signal detection approach.

Lack of adaptive method for yes/no tasks with controlled false alarm rate. In Section II, I will bring up a number of advantages of the yes/no task in comparison with the 2AFC task (see also Kaernbach, 1990). Given those yes/no advantages, one may wonder why the 2AFC method is so popular. The usual answer is that 2AFC has no response bias. However, as has been discussed, the yes/no method allows an unbiased d′ to be calculated, and the 2AFC method does allow an interval bias that affects d′. Another reason for the prevalence of 2AFC experiments is that a multitude of adaptive methods are available for 2AFC, but barely any are available for an objective yes/no task in which the false alarm rate is measured so that d′ can be calculated. In Table 1, with one exception, the rows with adaptive methods are associated with the forced choice method. The exception is Leek (2001), who discusses adaptive yes/no methods in which no blank trials are presented. In that case the false alarm rate is not measured, so d′ cannot be calculated. This type of yes/no method does not belong in the "objective" category of concern for the present paper. The "1990" entries in Table 1 refer to Kaernbach's (1990) description of a staircase method for an objective yes/no task in which an equal number of blanks are intermixed with the signal trials. Since Kaernbach could have used that method for Kaernbach (2001b), I placed the "1990" entry in his slot.

Kaernbach's (1990) yes/no staircase rules are simple. The signal level changes according to a rule such as the following: Move down one level for correct responses, a hit <yes|signal> or a correct rejection <no|blank>; move up three levels for wrong responses, a miss <no|signal> or a false alarm <yes|blank>. This rule, similar to the one-down three-up rule used in 2AFC, places the trials so that the average of the hit rate and correct rejection rate is 75%.
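For readers who want to try this rule, the following Matlab sketch simulates such an objective yes/no staircase for a hypothetical observer; the assumed d′-versus-level function and the step size in stimulus levels are illustrative assumptions of mine, not part of Kaernbach's (1990) specification.

    % Objective yes/no staircase: signals and blanks are intermixed with equal
    % probability; move down 1 level after a correct response (hit or correct
    % rejection) and up 3 levels after an error (miss or false alarm).
    rng(1);
    nTrials = 400; level = 8; levels = zeros(1,nTrials);
    dprimeOf = @(lev) (lev/5).^2;            % assumed d' versus level (illustrative only)
    for t = 1:nTrials
        levels(t)  = level;
        isSignal   = rand < 0.5;             % half the trials are blanks
        criterion  = dprimeOf(level)/2;      % unbiased criterion (ROC negative diagonal)
        internal   = randn + isSignal*dprimeOf(level);
        saysYes    = internal > criterion;
        if saysYes == isSignal               % hit or correct rejection
            level = level - 1;               % one down
        else                                 % miss or false alarm
            level = level + 3;               % three up
        end
        level = max(level, 1);
    end
    mean(levels(round(nTrials/2):end))       % hovers where avg(hit, CR) is near 75%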

What is now needed is a mechanism to get the observer to establish an optimal criterion that equalizes the number of "yes" and "no" responses (the ROC negative diagonal). This situation is identical to the problem of getting 2AFC observers to equalize the number of responses to Intervals 1 and 2. The quadratic bias in d′ is the same in both cases. The simplest way to get subjects to equalize their responses is to give them feedback about any bias in their responses. With equal "yes" and "no" responses, the 75% correct corresponds to a z score of 0.674 and a d′ = 2z = 1.349. If the subject does not have equal numbers of "yes" and "no" responses, then the d′ would be calculated by d′ = z_hit − z_false alarm. I do hope that Kaernbach's clever yes/no objective staircase will be explored by others. One should be able to enhance it with ratings and multiple stimuli (Klein, 2002).

Forced choice detection. As can be seen in the second column of Table 1, a popular category in this special issue is the forced choice method for detection. This method is used by Strasburger (2001b), Kaernbach (2001a), Linschoten et al. (2001), and Wichmann and Hill (2001a, 2001b). These researchers use PFs based on probability ordinates. The connection between P and d′ is different from the yes/no case given by Equation 5. For 2AFC, signal detection theory provides a simple connection between d′ and probability correct: d′(x) = √2 z(x). In the preceding section I discussed the option of averaging the probabilities and then taking the z score (high-threshold approach) or averaging the z scores and then calculating the probability (signal detection approach). The signal detection method is better because it has a stronger empirical basis and avoids bias.

For an m-AFC task with m > 2, the connection between d′ and P is more complicated than it is for m = 2. The connection, given by a fairly simple integral (Green & Swets, 1966; Macmillan & Creelman, 1991), has been tabulated by Hacker and Ratcliff (1979) and by Macmillan and Creelman (1991). One problem with these tables is that they are based on the assumption of unity ROC slope. It is known that there are many cases in which the ROC slope is not unity (Green & Swets, 1966), so these tables connecting

Table 1
Classification of the 10 Articles in This Special Issue According to Three Distinctions:
Forced Choice Versus Yes/No, Detection Versus Discrimination, Adaptive Versus Constant Stimuli

                              Detection                    Discrimination              Adaptive or
Source                        m-AFC       Yes/No           m-AFC          Yes/No       Constant Stimuli
Leek (2001)                   General     loose γ          ✓              loose γ      A
Wichmann & Hill (2001a)       2AFC        (✓)              (✓)            (✓)          C
Wichmann & Hill (2001b)       2AFC        (✓)              (✓)            (✓)          C
Linschoten et al. (2001)      2AFC                                                     A
Strasburger (2001a)           General     ✓                ✓                           C
Strasburger (2001b)           10AFC                                                    A
Kaernbach (2001a)             General     ✓                                            A
Kaernbach (2001b)             General     (1990)           ✓              (1990)       A
Miller & Ulrich (2001)        for γ = 0   ✓                (for 0–100%)   ✓            C
Klein (present)               General     ✓                ✓              ✓            Both

d′ and probability correct should be treated cautiously. W. P. Banks and I (unpublished) investigated this topic and found that near the P = 50% correct point, the dependence of d′ on ROC slope is minimal. Away from the 50% point the dependence can be strong.

Yes/no detection. Linschoten et al. (2001) compare three methods (limits, staircase, 2AFC likelihood) for measuring thresholds with a small number of trials. In the method of limits, one starts with a subthreshold stimulus and gradually increases the strength. On each trial, the observer says "yes" or "no" with respect to whether or not the stimulus is detected. Although this is a classic yes/no detection method, it will not be discussed in this article because there is no control of response bias. That is, blanks were not intermixed with signals.

In Table 1, the Wichmann and Hill (2001a, 2001b) articles are marked with check marks in parentheses because, although these authors write only about 2AFC, they mention that all their methods are equally applicable for yes/no or m-AFC of any type (detection/discrimination). And their Matlab implementations are fully general. All the special issue articles reporting experimental data used a forced choice method. This bias in favor of the forced choice methodology is found not only in this special issue; it is widespread in the psychophysics community. Given the advantages of the yes/no method (see discussion in Section II), I hope that once yes/no adaptive methods are accepted, they will become the method of choice.

m-AFC discrimination. It is common practice to represent 2AFC results as a plot of percent correct, averaged over all trials, versus stimulus strength. This PF goes from 50% to 100%. The asymmetry between the lower and upper asymptotes introduces some inefficiency in threshold estimation, as will be discussed. Another problem with the standard 50% to 100% plot is that a bias in choice of interval will produce an underestimate of d′, as has been shown earlier. Researchers often use the 2AFC method because they believe that it avoids biased threshold estimates. It is therefore surprising that the relatively simple correction for 2AFC bias, discussed earlier, is rarely done.

The issue of bias in m-AFC also occurs when m > 2. In Strasburger's 10AFC letter discrimination task, it is common for subjects to have biases for responding with particular letters when guessing. Any imbalance in the response bias for different letters will result in a reduction of d′, as it did in 2AFC.

There are several benefits of viewing the 2AFC discrimination data in terms of a PF going from 0% to 100%. Not only does it provide a simple way of viewing and calculating the interval bias, it also enables new methods for estimating the PF parameters, such as those proposed by Miller and Ulrich (2001), as will be discussed in Section III. In my original comments on the Miller and Ulrich paper, I pointed out that because their nonparametric procedure has uniform weighting of the different PF levels, their method does not apply to 2AFC tasks. The asymmetric binomial error bars near the 50% and 100% levels cause the uniform weighting of the Miller and Ulrich approach to be nonoptimal. However, I now realize that the 2AFC discrimination task can be fit by a cumulative normal going from 0% to 100%. Because of that insight I have marked the discrimination forced choice column of Table 1 for the Miller and Ulrich (2001) paper, with the proviso that the PF goes from 0% to 100%. Owing to the popularity of 2AFC, this modification greatly expands the relevance of their nonparametric approach.

Yes/no discrimination. Kaernbach (2001b) and Miller and Ulrich (2001) offer theoretical articles that examine properties of PFs that go from P = 0% to 100% (P = p in Equation 1). In both cases the PF is the cumulative normal (Equation 3). Although these PFs with γ = 0 could be for a yes/no detection task with a zero false alarm rate (not plausible) or a forced choice detection task with an infinite number of alternatives (not plausible either), I suspect that the authors had in mind a yes/no discrimination task (Table 1, col. 5). A typical discrimination task in vision is contrast discrimination, in which the observer responds to whether the presented contrast is greater than or less than a memorized reference. Feedback reinforces the stability of the reference. In a typical discrimination task the reference is one exemplar from a continuum of stimulus strengths. If the reference is at a special zero point, rather than being an element of a smooth continuum, the task is no longer a simple discrimination task. Zero contrast would be an example of a special reference. Klein (1985) discusses several examples which illustrate how a natural zero can complicate the analysis. One might wonder how to connect the PF from the detection regime, in which the reference is zero contrast, to the discrimination regime, in which the reference (pedestal) is at a high contrast. The d′ function to be introduced in Equation 20 does a reasonably good job of fitting data across the full range of pedestal strength, going from detection to discrimination.

The connection of the discrimination PF to the signal detection PF is the same as that for yes/no detection, given in Equation 5: d′(x) = z(x) − z(0), where z is the z score of the probability of a "greater than" judgment. In a discrimination task, the bias z(0) is the z score of the probability of saying "greater" when the reference stimulus is presented. The stimulus strength x that gives z(x) = 0 is the point of subjective equality (x = PSE). If the cumulative normal PF of Equation 3 is used, then the z score is linearly proportional to the stimulus strength: z(x) = (x − PSE)/threshold, where threshold is defined to be the point at which d′ = 1. I will come back to these distinctions between detection and discrimination PFs after presenting more groundwork regarding thresholds, log abscissas, and PF shapes (see Equations 14 and 17).
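As an illustration of this discrimination bookkeeping, the following Matlab sketch estimates the PSE and the d′ = 1 threshold from hypothetical "greater than" proportions (the data values here are invented for the example):

    x    = [2 4 6 8 10];                     % test contrasts (%), hypothetical
    Pgr  = [0.07 0.26 0.55 0.80 0.94];       % proportion of "greater" responses, hypothetical
    z    = sqrt(2)*erfinv(2*Pgr - 1);        % z score of each proportion
    c    = polyfit(x, z, 1);                 % straight line z(x) = c(1)*x + c(2)
    PSE  = -c(2)/c(1)                        % stimulus where z(PSE) = 0
    thresh = 1/c(1)                          % change in x giving d' = z(x) - z(PSE) = 1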

Definition of Threshold
Threshold is often defined as the stimulus strength that produces a probability correct halfway up the PF. If humans operated according to a high-threshold assumption, this definition of threshold would be stable across different experimental methods. However, as I discussed following Equation 2, high-threshold theory has been discredited.

According to the more successful signal detection theory (Green & Swets, 1966), the d′ at the midpoint of the PF changes according to the number of alternatives in a forced choice method and according to the false alarm rate in a yes/no method. This variability of d′ with method is a good reason not to define threshold as the halfway point of the PF.

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1, for simplicity. This definition applies to both yes/no and m-AFC tasks and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = 0.1587, corresponding to a z score of z = −1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.

Strasburger's Suggestion on Specifying Slope: A Logarithmic Abscissa
In many of the articles in this special issue, a logarithmic abscissa such as decibels is used. Many shapes of PFs have been used with a log abscissa. Most delightful but frustrating is Strasburger's (2001a) paper on the PF maximum slope. He compares the Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′, using a logarithmic abscissa. The present section is the outcome of my struggles with a number of issues raised by Strasburger (2001a) and my attempt to clarify them.

I will typically express stimulus strength x in threshold units:

x_t = x/α,  (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units y as well as in linear units x:

y_t = log_e(x_t) = y − Y,  (9)

where y = log_e(x) is the natural log of the stimulus and Y = log_e(α) is the threshold on the log abscissa.

The slope of the psychometric function P(y_t) with a logarithmic abscissa is

slope_log(y_t) = dP(y_t)/dy_t  (10A)
              = (1 − γ) dp(y_t)/dy_t,  (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log–log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log–log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log–log slope is β for a Weibull PF (Equation 12) and b for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log–log slope using a d′ ordinate.

A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per log_e) are not familiar. Then it dawned on me that there is a simple connection between slope with a log_e abscissa, slope_log = dP(y_t)/dy_t, and slope with a linear abscissa, slope_lin = dP(x_t)/dx_t, namely

slope_lin = dP(x_t)/dx_t = [dP(y_t)/dy_t] × (dy_t/dx_t) = slope_log/x_t,  (11)

because dy_t/dx_t = d log_e(x_t)/dx_t = 1/x_t. At threshold, x_t = 1 (y_t = 0), so at that point Strasburger's slope with a logarithmic axis is identical to my familiar slope plotted in threshold units on a linear axis. The simple connection between slope on the log and linear abscissas converted me to being a strong supporter of using a natural log abscissa.

The Weibull and cumulative normal psychometric functions. To provide a background for Strasburger's article, I will discuss three PFs, Weibull, cumulative normal, and d′, as well as their close connections. A common parameterization of the PF is given by the Weibull function:

p_weib(x_t) = 1 − k^(x_t^β),  (12)

where p_weib(x_t), the PF that goes from 0% to 100%, is related to P_weib(x_t), the probability of a correct response, by Equation 1; β is the slope; and k controls the definition of threshold: p_weib(1) = 1 − k is the percent correct at threshold (x_t = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

p_weib(y_t) = 1 − k^[exp(βy_t)].  (13)

Panel a of Figure 1 is a plot of Equation 12 (Weibull function as a function of x on a linear abscissa) for the case β = 2 and k = exp(−1) = 0.368. The choice β = 2 makes the Weibull an upside-down Gaussian. Panel d is the same function, this time plotted as a function of y, corresponding to a natural log abscissa. With this choice of k, the point of maximum slope as a function of y_t is the threshold point (y_t = 0). The point of maximum slope at y_t = 0 is marked with an asterisk in panel d. The same point in panel a, at x_t = 1, is not the point of maximum slope on a linear abscissa, because of Equation 11. When plotted as a d′ function [a z-score transform of P(x_t)] in panel b, the Weibull accelerates below threshold and decelerates above

threshold, in agreement with a wide range of experimental data. The acceleration and deceleration are most clearly seen by the slope of d′ in the log–log coordinates of panel e. The log–log d′ slope is plotted in panels c and f. At x_t = 0 the log–log slope is 2, corresponding to our choice of β = 2. The slope falls surprisingly rapidly as stimulus strength increases, and the slope is near 1 at x_t = 2. If we had chosen β = 4 for the Weibull function, then the log–log d′ slope would have gone from 4 at x_t = 0 to near 2 at x_t = 2. Equation 20 will present a d′ function that captures this behavior.
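The curves just described can be regenerated with a few lines of Matlab. This is a sketch along the lines of Appendix Code 1 (not a verbatim copy of it), using the γ = 0.1587 and β = 2 values given in the Figure 1 caption:

    gamma = 0.1587; beta = 2; inc = 0.01;
    x  = inc:inc:3;                          % stimulus in threshold units (linear abscissa)
    p  = 1 - (1-gamma)*exp(-x.^beta);        % Weibull PF of Figure 1 (panel a)
    z  = sqrt(2)*erfinv(2*p - 1);            % z score of the Weibull (panel b)
    dp = z + 1;                              % d' = z(x) - z(0), since z(0) = -1
    xm = x(1:end-1) + inc/2;                 % midpoints for the numerical derivative
    llslope = diff(log(dp))./diff(log(x));   % log-log slope of d' (panels c and f)
    [llslope(1) interp1(xm, llslope, 2)]     % about beta at low x, near 1 by x = 2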

Another PF commonly used is based on the cumulative normal (Equation 3):

p_Φ(y_t) = Φ(y_t/σ)  (14A)

or

z(y_t) = y_t/σ = (y − Y)/σ,  (14B)

where σ is the standard error of the underlying Gaussian function (its connection to β will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to −∞, needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissa.

Two examples of Strasburger's maximum slope (his Equations 7 and 17) are

β′ = β exp(−1) = 0.368β  (15)

for the Weibull function (Equation 13) and

β′ = 1/(σ√(2π)) = 0.399/σ  (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(−1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (y_t = 0). Note that Equations 12–16 are missing the (1 − γ) factor that is present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.

An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = 0.5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(x_t = 1) = 0.75. At 10% above threshold, P(x_t = 1.1) ≈ 0.75 + 0.1β′ = 0.7899, which is quite close to the exact value of 0.7896. This example shows how β′, defined with a natural log abscissa y_t, is relevant to a linear abscissa x_t.

Now that the logarithmic abscissa has been introduced, this is a good place to stop and point out that the log abscissa is fine for detection but not for discrimination, since the x = 0 point is often in the middle of the range. However, the cumulative normal PF is especially relevant to discrimination, where the PF goes from 0% to 100% and would be written as

p_Φ(x) = P_Φ(x_t) = Φ((x − PSE)/α)  (17A)

or

z_Φ(x) = (x − PSE)/α,  (17B)

where x = PSE is the point of subjective equality. The parameter α is the threshold, since d′(α) = z_Φ(α) − z_Φ(0) = 1. The threshold α is also the standard deviation of the Gaussian probability density function (pdf). The reciprocal of α is the PF slope. I tend to use the letter σ for a unitless standard deviation, as occurs for the logarithmic variable y; I use α for a standard deviation that has the units of the stimulus x (like percent contrast). The comparison of Equation 14B for detection and Equation 17B for discrimination is useful. Although Equation 14 is used in most of the articles in this special issue, which are concerned with detection tasks, the techniques that I will be discussing are also relevant to discrimination tasks, for which Equation 17 is used.

Threshold and the Weibull Function
To illustrate how a d′ = 1 definition of threshold works for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = 0.1587, corresponding to a z score of z_FA = −1, as in Figure 1. The z score at threshold (d′ = 1) is z_Th = z_FA + 1 = 0, corresponding to a probability of P(1) = 50%. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)]/[1 − P(0)] = 0.5943. The Weibull function becomes

P_weib(x_t) = 1 − (1 − 0.1587) × 0.5943^(x_t^β).  (18A)

If I had defined d′ = 2 to be threshold, then z_Th = z_FA + 2 = 1, leading to k = (1 − 0.8413)/(1 − 0.1587) = 0.1886, giving

P_weib(x_t) = 1 − (1 − 0.1587) × 0.1886^(x_t^β).  (18B)

As another example, suppose the false alarm rate in a yes/no task is at 50%, not uncommon in a signal detection experiment with blanks and test stimuli intermixed. Then threshold at d′ = 1 would occur at 84.13%, and the PF would be

P_weib(x_t) = 1 − 0.5 × 0.3173^(x_t^β),  (18C)

with P_weib(0) = 50% and P_weib(1) = 84.13%. This case corresponds to defining d′ on the ROC vertical intercept.

For 2AFC, the connection between d′ and z is z = d′/2^0.5. Thus z_Th = 2^−0.5 = 0.7071, corresponding to P_weib(1) = 76.02%, leading to k = 0.4795 in Equation 12. These connections will be clarified when an explicit form (Equation 20) is given for the d′ function d′(x_t).
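A small Matlab check (my own illustration, using only the inverse-error-function form of the cumulative normal) reproduces the k values just derived:

    Phi  = @(z) 0.5*(1 + erf(z/sqrt(2)));         % cumulative normal
    kval = @(PFA, zTh) (1 - Phi(zTh))/(1 - PFA);  % k = [1 - P(1)]/[1 - P(0)]
    kval(Phi(-1), -1 + 1)        % yes/no, FA = 15.87%, d' = 1:  0.5943
    kval(Phi(-1), -1 + 2)        % yes/no, FA = 15.87%, d' = 2:  0.1886
    kval(0.5,  0 + 1)            % yes/no, FA = 50%,    d' = 1:  0.3173
    kval(0.5,  1/sqrt(2))        % 2AFC,  gamma = 50%,  d' = 1:  0.4795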


Complications With Strasburger's Advocacy of Maximum Slope
Here I will mention three complications implicated in Strasburger's suggestion of defining slope at the point of maximum slope on a logarithmic abscissa.

1. As can be seen in Equation 11, the slopes on linear and logarithmic abscissas are related by a factor of 1/x_t. Because of this extra factor, the maximum slope occurs at a lower stimulus strength on linear as compared with log axes. This makes the notion of maximum slope less fundamental. However, since we are usually interested in the percent error of threshold estimates, it turns out that a logarithmic abscissa is the most relevant axis, supporting Strasburger's log abscissa definition.

2. The maximum slope is not necessarily at threshold (as Strasburger points out). For the Weibull functions defined in Equation 13 with k = exp(−1) and the cumulative normal function in Equation 14, the maximum slope (on a log axis) does occur at threshold. However, the two points are decoupled in the generalized Weibull function defined in Equation 12. For the Quick version of the Weibull function (Equation 13 with k = 0.5, placing threshold halfway up the PF), the threshold is below the point of maximum slope; the derivative of P(x_t) at threshold is

β′_thresh = dP(x_t)/dx_t = (1 − γ)β log_e(2)/2 = 0.347(1 − γ)β,

which is slightly different from the maximum slope as given by Equation 15. Similarly, when threshold is defined at d′ = 1, the threshold is not at the point of maximum slope. People who fit psychometric functions would probably prefer reporting slopes at the detection threshold rather than at the point of maximum slope.

3. One of the most important considerations in selecting a point at which to measure threshold is the question of how to minimize the bias and standard error of the threshold estimate. The goal of adaptive methods is to place trials at one level. In order to avoid a threshold bias due to an improper estimate of PF slope, the test level should be at the defined threshold. The variance of the threshold estimate when the data are concentrated at a single level (as with adaptive procedures) is given by Gourevitch and Galanter (1967) as

var(Y) = [P(1 − P)/N] / [dP(y_t)/dy_t]².  (19)

If the binomial error factor of P(1 − P)/N were not present, the optimal placement of trials for measuring threshold would be at the point of maximum slope. However, the presence of the P(1 − P) factor shifts the optimal point to a higher level. Wichmann and Hill (2001b) consider how trial placement affects threshold variance for the method of constant stimuli. This topic will be considered in Section II.

Connecting the Weibull and d′ Functions
Figures 5–7 of Strasburger (2001a) compare the log–log d′ slope, b, with the Weibull PF slope, β. Strasburger's connection of b = 0.88β is problematic, since the d′ version of the Weibull PF does not have a fixed log–log slope. An improved d′ representation of the Weibull function is the topic of the present section. A useful parameterization for d′, which I shall refer to as the Stromeyer–Foley function, was introduced by Stromeyer and Klein (1974) and used extensively by Foley (1994):

d′(x_t) = x_t^b / [a + (1 − a) x_t^(b + w − 1)],  (20)

where x_t is the stimulus strength in threshold units, as in Equation 8. The factors with a in the denominator are present so that d′ equals unity at threshold (x_t = 1, or x = α). At low x_t, d′ ≈ x_t^b / a. The exponent b (the log–log slope of d′ at very low contrasts) controls the amount of facilitation near threshold. At high x_t, Equation 20 becomes d′ ≈ x_t^(1−w)/(1 − a). The parameter w is the log–log slope of the test threshold versus pedestal contrast function (the tvc or "dipper" function) at strong pedestals. One must be a bit cautious, however, because in yes/no procedures the tvc slope can be decoupled from w if the signal detection ROC curve has nonunity slope (Stromeyer & Klein, 1974). The parameter a controls the point at which the d′ function begins to saturate. Typical values of these unitless parameters are b = 2, w = 0.5, and a = 0.6 (Yu, Klein, & Levi, 2001). The function in Equation 20 accelerates at low contrast (log–log slope of b) and decelerates at high contrasts (log–log slope of 1 − w), in general agreement with a broad range of data.
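A sketch of Equation 20 in Matlab, using the typical parameter values just quoted (this is my own illustration, not the Appendix code), shows the low- and high-contrast log–log slopes directly:

    b = 2; w = 0.5; a = 0.6;                       % typical values (Yu, Klein, & Levi, 2001)
    dprime = @(xt) xt.^b ./ (a + (1-a)*xt.^(b + w - 1));   % Equation 20
    dprime(1)                                      % equals 1 at threshold by construction
    xt = [0.01 0.1 1 10 100];
    slopes = diff(log(dprime(xt)))./diff(log(xt))  % near b at low xt, near 1-w at high xt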

For 2AFC, z = d′/√2, so from Equation 3 the connection between d′ and probability correct is

P(x_t) = 0.5 + 0.5 erf[d′(x_t)/2].  (21)

To establish the connection between the PFs specified in Equations 12 and 20–21, one must first have the two agree at threshold. For 2AFC with d′ = 1, the threshold at P = 76.02% can be enforced by choosing k = 0.4795 (see the discussion following Equation 18C), so that Equations 1 and 12 become

P(x_t) = 1 − (1 − 0.5) × 0.4795^(x_t^β).  (22)

Modifying k as in Equation 22 leaves the Weibull shape unchanged and shifts only the definition of threshold. If b = 1.06β, w = 1 − 0.39β, and a = 0.614, then for all values of β the Weibull and d′ functions (Equations 21 and 22 vs. Equation 12) differ by less than 0.0004 for all stimulus strengths. At very low stimulus strengths, b = β. The value b = 1.06β is a compromise for getting an overall optimal fit.
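The closeness of the two parameterizations can be checked numerically. The following sketch (my own, using the parameter mapping stated above for β = 2) computes the maximum difference, which can be compared with the figure quoted in the text:

    beta = 2;                                      % Weibull slope
    b = 1.06*beta; w = 1 - 0.39*beta; a = 0.614;   % mapping given in the text
    xt = 0.01:0.01:5;                              % stimulus in threshold units
    dp = xt.^b ./ (a + (1-a)*xt.^(b + w - 1));     % Equation 20
    P_dprime  = 0.5 + 0.5*erf(dp/2);               % Equation 21 (2AFC)
    P_weibull = 1 - 0.5*0.4795.^(xt.^beta);        % Equation 22
    max(abs(P_dprime - P_weibull))                 % maximum difference between the two PFs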

Strasburger (2001a) is concerned with the same issue.

In his Table 1 he reports d′ log–log slopes of [0.8847, 1.8379, 3.131, 4.421] for β = [1, 2, 3.5, 5]. If the d′ slope is taken to be a measure of the d′ exponent b, then the ratio b/β is [0.8847, 0.9190, 0.8946, 0.8842] for the four β values. Our value of b/β = 1.06 differs substantially from 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a). Pelli (1985) and Strasburger (2001a) used d′ functions with a = 1 in Equation 20. With a = 0.614, our d′ function (Equation 20) starts saturating near threshold, in agreement with experimental data. The saturation lowers the effective value of b. I was very surprised to find that the Stromeyer–Foley d′ function did such a good job in matching the Weibull function across the whole range of β and x. For a long time I had thought a different fit would be needed for each value of β.

For the same Weibull function in a yes/no method (false alarm rate = 50%), the parameters b and w are identical to the 2AFC case above. Only the value of a shifts, from a = 0.614 (2AFC) to a = 0.54 (yes/no). In order to make x_t = 1 at d′ = 1, k becomes 0.3173 (see Equation 18C) instead of 0.4795 (2AFC). As with 2AFC, the difference between the Weibull and d′ function is less than 0.004 (<0.4%) for all stimulus strengths and all values of β, and for both a linear and a logarithmic abscissa. These values mean that for both the yes/no and the 2AFC methods, the d′ function (Equation 20) corresponding to a Weibull function with β = 2 has a low-contrast log–log slope of b = 2.12 and a high-contrast log–log slope of 1 − w = 0.78. The high-contrast tvc (test contrast vs. pedestal contrast discrimination function, sometimes called the "dipper" function) log–log slope would be w = 0.22.

I also carried out a similar analysis asking what cumulative normal σ is best matched to the Weibull. I found σ = 1.1720/β. However, the match is poor, with errors of 4% (rather than <0.04% in the preceding analysis). The poor fit of the two functions makes the fitting procedure sensitive to the weighting used in the fitting procedure. If one uses a weighting factor of [P(1 − P)/N], where P is the PF, one will find a quite different value for σ than if one ignored the weighting or if one chose the weighting on the basis of the data rather than the fitting function.

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section, inspired by Linschoten et al. (2001), is an examination of various experimental methods for gathering data to estimate thresholds. Linschoten et al. compare three 2AFC methods for measuring smell and taste thresholds: a maximum-likelihood adaptive method, an up-down staircase method, and an ascending method of limits. The special aspect of their experiments was that in measuring smell and taste, each 2AFC trial takes between 40 and 60 sec (Lew Harvey, personal communication) because of the long duration needed for the nostrils or mouth to recover their sensitivity. The purpose of the Linschoten et al. paper is to examine which of the three methods is most accurate when the number of trials is low. Linschoten et al. conclude that the 2AFC method is a good method for this task. My prior experience with forced choice tasks and low number of trials (Manny & Klein, 1985; McKee et al., 1985) made me wonder whether there might be better methods to use in circumstances in which the number of trials is limited. This topic is considered in Section II.

Compare 2AFC to YesNoThe commonly accepted framework for comparing

2AFC and yesno threshold estimates is signal detectiontheory Initially I will make the common assumption thatthe dcent function is a power function (Equation 20 witha 5 1)

(23)

Later in this section the experimentally more accurateform Equation 20 with a 1 will be used The varianceof the threshold estimate will now be calculated for 2AFCand yesno for the simplest case of trials placed at a sin-gle stimulus level y the goal of most adaptive methodsThe PF slope relates ordinate variance to abscissa vari-ance (from Gourevitch amp Galanter 1967)

(24)

where Y 5 y yt is the threshold estimate for a natural logabscissa as given in Equation 9 This formula for var(Y )is called the asymptotic formula in that it becomes exactin the limit of large number of total trials N Wichmannand Hill (2001b) show that for the method of constantstimuli a proper choice of testing levels brings one to theasymptotic region for N 120 (the smallest value thatthey explored) Improper choice of testing levels requiresa larger N in order to get to the asymptotic region Equa-tion 24 is based on having the data at a single level as oc-curs asymptotically in adaptive procedures In addition thedefinition of threshold must be at the point being testedIf levels are tested away from threshold there will be anadditional contribution to the threshold variance (Finney1971 McKee et al 1985) if the slope is a variable param-eter in the fitting procedure Equation 24 reaches its as-ymptotic value more rapidly for adaptive methods with fo-cused placement of trials than with the constant stimulusmethod when slope is a free parameter

Equation 24 needs analytic expressions for the deriva-tive and the variance of dcent The derivative is obtained fromEquation 23

(25)

The variance of dcent for both yesno (dcent 5 zhit 1 zcorrect rejection)and 2AFC (dcent 5 Iuml2zave) is

(26)var var( ) exp var( ) cent( ) = = ( )d z z P2 4 2p

dddy

bdt

cent = cent

var( )var( )

Yd

dddyt

= cent

centaeligegraveccedil

oumloslashdivide

2

cent = = ( )d c bytb

texp

1434 KLEIN

For the yes no case Equation 26 is based on a unityslope ROC curve with the subjectrsquos criterion at the neg-ative diagonal of the ROC curve where the hit rate equalsthe correct rejection rate This would be the most effi-cient operating point The variance for a nonunity slopeROC curve was considered by Klein Stromeyer andGanz (1974) From binomial statistics

(27)

with n 5 N for 2AFC and n 5 N2 for yesno since forthis binary yesno task the number of signal and blanktrials is half the total number of trials

Putting all of these factors together gives

(28)

for both the yesno and the 2AFC cases The only dif-ference between the two cases is the definition of n andthe definition of dcent (dcent2AFC 5 205z and dcentYN 5 2z for asymmetric criterion) When this is expressed in terms ofN (number of trials) and z (z score of probability cor-rect) we get

(29)

for both 2AFC and yesno That is when the percent cor-rects are all equal (P2AFC 5 Phit 5 Pcorrect rejection) thethreshold variance for 2AFC and yesno are equal Theminimum value of var(Y ) is 164(Nb2) occurring at z 5157 (P 5 94) This value of 94 correct is the optimumvalue at which to test for both 2AFC and yesno tasks in-dependent of b

This result that the threshold variance is the same for2AFC and yesno methods seems to disagree with Figure 4of McKee et al (1985) where the standard errors of thethreshold estimates are plotted for 2AFC and yesno asa function of the number of trials The 2AFC standarderror (their Figure 4) is twice that of yesno correspond-ing to a factor of four in variance However the McKeeet al analysis is misleading since all they (we) did for theyesno task was to use Equation 2 to expand the PF to a 0to 100 range thereby doubling the PF slope Thatrescaling is not the way to convert a 2AFC detection taskto the yesno task of the same detectability A signal de-tection methodology is needed for that conversion aswas done in the present analysis

One surprise is that the optimal testing probability isindependent of the PF slope b This finding is a generalproperty of PFs in which the dependence on the slopeparameter has the form xb The 94 level corresponds todcentYN 5 315 and dcent2AFC 5 223 For the commonly usedP 5 75 (z 5 0765 dcentYN 5 135 dcent2AFC 5 0954)var(Y ) 5 408(Nb2) which is about 25 times the opti-mal value obtained by placing the stimuli at 94 Green(1990) had previously pointed out that the 94 and 92levels were the optimal points to test for the cumulative

normal and logit functions respectively because of theminimum threshold estimate variability when one is test-ing at that point on the PF Other PFs can have optimalpoints at slightly lower levels but these are still above thelevels to which adaptive methods are normally directed

It is also useful to compare the 2AFC and yesno at thesame dcent rather than at their optimal points In that case fordcent values fixed at [1 2 3] the ratio of the 2AFC varianceto the yesno variance is [182 136 079] At low dcent thereis an advantage for the 2AFC method and that reverses athigh dcent as is expected from the optimal dcent being higher foryesno than for 2AFC In general one should run the yesno method at dcent levels above 20 (84 correct in the blankand signal intervals corresponds to dcent 5 2)

The equality of the optimal var(Y ) for 2AFC and yesno methods is not only surprising it seems paradoxicalmdashas can be seen by the following gedankenexperiment Sup-pose subjects close their eyes on the first interval of the2AFC trial and base their judgments on just the secondinterval as if they were doing a yesno task since blanksand stimuli are randomly intermixed Instead of sayingldquoyesrdquo or ldquonordquo they would respond ldquoInterval 2rdquo or ldquoInter-val 1rdquo respectively so that the 2AFC task would become ayesno task The paradox is that one would expect the vari-ance to be worse than for the eyes open 2AFC case sinceone is ignoring information However Equation29 showsthat the 2AFC and yesno methods have identical optimalvariance I suspect that the resolution of this paradox is thatin the yesno method with a stable criterion the optimalvariance occurs at a higher dcent value (315) than the 2AFCvalue (223) The dcent power law function that I chose doesnot saturate at large dcent levels giving a benefit to the yesno method

In order to check whether the paradox still holds witha more realistic PF (a dcent function that saturates at largedcent ) I now compare the 2AFC and yesno variance usinga Weibull function with a slope b In order to connect2AFC and yesno I need to use signal detection theoryand the StromeyerndashFoley dcent function (Equation 20) withthe ldquoWeibull parametersrdquo b 5 106b a 5 0614 and w 51 39b Following Equation 22 I pointed out that withthese parameter values the difference between the 2AFCWeibull function and the 2AFC dcent function (converted toa probability ordinate) is less than 0004 After the de-rivative Equation 25 is appropriately modified we findthat the optimal 2AFC variance is 342(Nb2) For the yesno case the optimal variance is 402(Nb2) This result re-solves the paradox The optimal 2AFC variance is about18 lower than the yesno variance so closing onersquos eyesin one of the intervals does indeed hurt

We shouldnrsquot forget that if one counts stimulus presen-tations Np rather than trials N the optimal 2AFC variancebecomes 684(Npb2) as opposed to 402(Npb2) for yesnoThus for the Linschoten experiments with each stimuluspresentation being slow the yesno methodology is moreaccurate for a given number of stimulus presentationseven at low dcent values Kaernbach (1990) compares 2AFC

var( ) exp( )

( )Y z

P P

N bz= ( )2

122

p

var( ) exp( )

( )Y z

P P

n bd= ( ) cent

412

2p

var( )( )

PP P

n= 1

MEASURING THE PSYCHOMETRIC FUNCTION 1435

and yesno adaptive methods with simulations and ac-tual experiments His abscissa is number of presenta-tions and his results are compatible with our findings

The objective yesno method that I have been discussingis inefficient since 50 of the trials are blanks Greaterefficiency is possible if several points on the PF are mea-sured (method of constant stimuli) with the number of blanktrials the same as at the other levels For example if fivelevels of stimuli were measured including blanks then theblanks would be only 20 of the total rather than 50 Forthe 2AFC paradigm the subject would still have to look at50 of the presentations being blanks The yesno eff i-ciency can be further improved by having the subjectrespond to the multiple stimuli with multiple ratings ratherthan a binary yesno response This rating scale methodof constant stimuli is my favorite method and I have beenusing it for 20 years (Klein amp Stromeyer 1980) The mul-tiple ratings allow one to measure the psychometric func-tion shape for dcents as large as 4 (well into the dipper regionwhere stimuli are slightly above threshold) It also allowsone to measure the ROC slope (presence of multiplicativenoise) Simulations of its benefits will be presented inKlein (2002)

Insight into how the number of responses affects thresh-old variance can be obtained by using the cumulative nor-mal PF (Equation 17) considered by Kaernbach (2001b)and Miller and Ulrich (2001) with g 5 0 Typically this isthe case of yesno discrimination but it can also be 2AFCdiscrimination with an ordinate going from 0 to 100as I have discussed in connection with Figure 3 For a cu-mulative normal PF with a binary response and for a sin-gle stimulus level being tested the threshold variance is(Equation 19) as follows

(30)

from Equation 27 and forthcoming Equations 40 and 41The optimal level to test is at P 5 05 so the optimal vari-ance becomes

(31)

If instead of a binary response the observer gives an ana-log response (many many rating categories) then the vari-ance is the familiar connection between standard deviationand standard error

(32)

With four response categories the variance is 119s2Nwhich is closer to the analog than to the binary case Ifstimuli with multiple strengths are intermixed (methodof constant stimuli) then the benefit of going from a bi-nary response to multiple responses is greater than that in-dicated in Equations 31ndash32 (Klein 2002)

A brief list of other problems with the 2AFC method ap-plied to detection comprises the following

1 2AFC requires the subject to memorize and thencompare the results of two subjective responses It is cog-nitively simpler to report each response as it arrives Sub-jects without much experience can easily make mistakes(lapses) simply becoming confused about the order of thestimuli Klein (1985) discusses memory load issues rele-vant to 2AFC

2 The 2AFC methodology makes various types ofanalysis and modeling more difficult because it requiresthat one average over all criteria The response variable in2AFC is the Interval 2 activation minus the Interval 1 ac-tivation (see the abscissa in Figure 2) This variable goesfrom minus infinity to plus infinity whereas the yesnovariable goes from zero to infinity It therefore seems moredifficult to model the effects of probability summation anduncertainty for 2AFC than for yesno

3 Models that relate psychophysical performance to un-derlying processes require information about how the noisevaries with signal strength (multiplicative noise) The rat-ing scale method of constant stimuli not only measures dcentas a function of contrast it also is able to provide an esti-mate of how the variance of the signal distribution (theROC slope) increases with contrast The 2AFC methodlacks that information

4 Both the yesno and 2AFC methods are supposed toeliminate the effects of bias Earlier I pointed out that in myrecent 2AFC discrimination experiments when the stim-ulus is near threshold I find I have a strong bias in favor ofresponding ldquoStimulus 2rdquo Until I learned to compensatefor this bias (not typically done by naiumlve subjects) my dcentwas underestimated As I have discussed it is possible tocompensate for this bias but that is rarely done

Improving the Brownian (Up ndashDown) StaircaseAdaptive methods with a goal of placing trials at an

optimal point on the PF are efficient Leek (2001) pro-vides a nice overview of these methods There are twogeneral categories of adaptive methods those based onmaximum (or mean) likelihood and those based on sim-pler staircase rules The present section is concernedwith the older staircase methods which I will callBrownian methods I used the name ldquoBrownianrdquo becauseof the upndashdown staircasersquos similarity to one-dimensionalBrownian motion in which the direction of each step israndom The familiar upndashdown staircase is Brownianbecause at each trial the decision about whether to go upor down is decided by a random factor governed by prob-ability P(x) I want to make this connection to Brownianmotion because I suspect that there are results in thephysics literature that are relevant to our psychophysicalstaircases I hope these connections get made

A Brownian staircase has a minimal memory of the runhistory it needs to remember the previous step (includingdirection) and the previous number of reversals or numberof trials (used for when to stop and for when to decreasestep size) A typical Brownian rule is to decrease signalstrength by one step (like 1 dB) after a correct responseand to increase it three steps after an incorrect response (a

var( thresh) = s 2

N

var( thresh) = timesp s2

2

N

var(var( )

( )exp thresh) =aeligegraveccedil

oumloslashdivide

= ( )P

N dPdy

P P zN2

22

2 1p s

1436 KLEIN

one downndashthree up rule) I was pleased to learn recentlythat his rule works nicely for an objective yesno task(Kaernbach 1990) as well as 2AFC as discussed earlier

In recent years likelihood methods (Pentland 1980Watson amp Pelli 1983) have become popular and havesomewhat displaced Brownian methods There is a wide-spread feeling that likelihood methods have substantiallysmaller threshold variance than do staircases in whichthresholds are obtained by simply averaging the levelstested Given the importance of this issue for informingresearchers about how best to measure thresholds it issurprising that there have been so few simulations com-paring staircases and optimal methods Pentlandrsquos (1980)results showing very large threshold variance for a stair-case method are so dramatically different from the re-sults of my simulations that I am skeptical The best pre-vious simulations of which I am aware are those of Green(1990) who compared the threshold variance from sev-eral 2AFC staircase methods with the ideal variance(Equation 24) He found that as long as the staircase ruleplaced trials above 80 correct the ratio of observed toideal threshold variance was about 14 This value is thesquare of the inverse of Greenrsquos finding that the ratio ofthe expected to the observed sigma is about 85

In my own simulations (Klein 2002) I found that aslong as the final step size was less than 02s the thresh-old variance from simply averaging over levels was closeto the optimal value Thus the staircase threshold varianceis about the same as the variance found by other optimalmethods One especially interesting finding was that thecommon method of averaging an even number of rever-sal points gave a variance that was about 10 higher thanaveraging over all levels This result made me wonderhow the common practice of averaging reversals ever gotstarted Since Green (1990) averaged over just the reversalpoints I suspect that his results are an overestimate of thestaircase variance In addition since Green specifies stepsize in decibels it is difficult for me to determine whetherhis final step size was small enough Since the thresholdvariance is inversely related to slope it is important to re-late step size to slope rather than to decibels The most nat-ural reference unit is s the standard deviation associatedwith the cumulative normal In summary I suggest that thisgeneral feeling of distrust of the simple Brownian staircaseis misguided and that simple staircases are often as goodas likelihood methods In the section Summary of Advan-tages of Brownian Staircase I point out several advantagesof the simple staircase over the likelihood methods

A qualification should be made to the claim above thataveraging levels of a simple staircase is as good as a fancylikelihood method If the staircase method uses a too smallstep size at the beginning and if the starting point is farfrom threshold then the simple staircase can be inefficientin reaching the threshold region However at the end ofthat run the astute experimenter should notice the largediscrepancy between the starting point and the thresholdand the run should be redone with an improved startingpoint or with a larger initial step size that will be reducedas the staircase proceeds

Increase number of 2AFC response categories In-dependent of the type of adaptive method used the 2AFCtask has a flaw whereby a series of lucky guesses at lowstimulus levels can erroneously drive adaptive methods tolevels that are too low Some adaptive methods with a smallnumber of trials are unable to fully recover from a string oflucky guesses early in the run One solution is to allow thesubject to have more than two responses For reasons thatI have never understood people like to have only two re-sponse categories in a 2AFC task Just as I am reluctant totake part in 2AFC experiments I am even more reluctantto take part in two response experiments The reason issimple I always feel that I have within me more than onebit of information after looking at a stimulus I usually haveat least four or more subjective states (2 bits) for a yesnotask HN LN LY HY where H and L are high and lowconfidence and N and Y are did not and did see it For a2AFC my internal states are H1 L1 L2 H2 for high andlow confidence on Intervals 1 and 2 Kaernbach (2001a) inthis special issue does an excellent job of justifying thelow-confidence response category He provides solid ar-guments simulations and experimental data on how alow-confidence category can minimize the ldquolucky guessrdquoproblem

Kaernbach (2001a) groups the L1 and L2 categoriestogether into a ldquodonrsquot knowrdquo category and argues per-suasively that the inclusion of this third response can in-crease the precision and accuracy of threshold estimateswhile also leaving the subject a happier person He callsit the ldquounforced choicerdquo method Most of his article con-cerns the understandable fear of introducing a responsebias exactly what 2AFC was invented to avoid He has de-veloped an intelligent rule for what stimulus level shouldfollow a ldquodonrsquot knowrdquo response Kaernbach shows withboth simulations and real data that with his method thispotential response bias problem has only a second-ordereffect on threshold estimates (similar to the 2AFC inter-val bias discussed around Equation 7) and its advantagesoutweigh the disadvantages I urge anyone using the 2AFCmethod to read it I conjecture that Kaernbachrsquos method isespecially useful in trimming the lower tail of the distri-bution of threshold estimates and also in reducing the in-terval bias of 2AFC tasks since the low confidence ratingwould tend to be used for trials on which there is the mostuncertainty

Although Kaernbach has managed to convince me ofthe usefulness of his three-response method he might havetrouble convincing others because of the response biasquestion I should like to offer a solution that might quellthe response bias fears Rather than using 2AFC with threeresponses one can increase the number of responses tofourmdashH1 L1 L2 H2mdashas discussed two paragraphsabove The L rating can be used when subjects feel theyare guessing

To illustrate how this staircase would work considerthe case in which we would like the 2AFC staircase to con-verge to about 84 correct A 5 upndash1 down staircase(5 levels up for a wrong response and 1 level down for acorrect response) leads to an average probability correct

MEASURING THE PSYCHOMETRIC FUNCTION 1437

of 56 5 8333 If one had zero information about thetrial and guessed then one would be correct half the timeand the staircase would shift an average of (5 1)2 512 levels Thus in his scheme with a single ldquodonrsquot knowrdquoresponse Kaernbach would have the staircase move up2 levels following a ldquodonrsquot knowrdquo response One mightworry that continued use of the ldquodonrsquot knowrdquo responsewould continuously increase the level so that one wouldget an upward bias But Kaernbach points out that theldquodonrsquot knowrdquo response replaces incorrect responses to agreater extent than it replaces correct responses so thatthe 12 level shift is actually a more negative shift than the15 shift associated with a wrong response For my sug-gestion of four responses the step shifts would be ndash1 1113 and 15 levels for responses HC LC LI and HIwhere H and L stand for high and low confidence and Cand I stand for correct and incorrect A slightly more con-servative set of responses would be ndash1 0 14 15 inwhich case the low-confidence correct judgment leaves thelevel unchanged In all these examples including Kaern-bachrsquos three-response method the low-confidence re-sponse has an average level shift of 12 when one is guess-ing The most important feature of the low confidencerating is that when one is below threshold giving a lowconfidence rating will prevent one from long strings oflucky guesses that bring one to an inappropriately lowlevel This is the situation that can be difficult to overcomeif it occurs in the early phase of a staircase when the stepsize can be large The use of the confidence rating wouldbe especially important for short runs where the extra bitof information from the subject is useful for minimizingerroneous estimates of threshold

Another reason not to be fearful of the multiple-response2AFC method is that after the run is finished one can es-timate threshold by using a maximum likelihood proce-dure rather than by averaging the run locations Standardlikelihood analysis would have to be modified for Kaern-bachrsquos three-response method in order to deal with theldquodonrsquot knowrdquo category For the suggested four-responsemethod one could simply ignore the confidence ratings forthe likelihood analysis My simulations discussed earlierin this section and Greenrsquos (1990) reveal that averagingover the levels (excluding a number of initial levels toavoid the starting point bias) has about as good a preci-sion and accuracy as the best of the likelihood methodsIf the threshold estimates from the likelihood method andfrom the averaging levels method disagree that run couldbe thrown out

Summary of advantages of Brownian staircaseSince there is no substantial variance advantage to thefancier likelihood methods one can pay more attention toa number of advantages associated with simple staircases

1 At the beginning of a run likelihood methods mayhave too little inertia (the sequential stimulus levels jumparound easily) unless a ldquostiff rdquo prior is carefully chosenBy inertia I refer to how difficult it is for the next level todeviate from the present level At the beginning of a runone would like to familiarize the subject with the stimu-

lus by presenting a series of stimuli that gradually increasein difficulty This natural feature of the Brownian stair-case is usually missing in QUEST-like methods Theslightly inefficient first few trials of a simple staircasemay actually be a benefit since it gives the subject somepractice with slightly visible stimuli

2 Toward the end of a maximum likelihood run one canhave the opposite problem of excessive inertia wherebyit is difficult to have substantial shifts of test contrastThis can be a problem if nonstationary effects are presentFor example thresholds can increase in mid-run owingto fatigue or to adaptation if a suprathreshold pedestal ormask is present In an old-fashioned Brownian staircasewith moderate inertia a quick look at the staircase trackshows the change of threshold in mid-run That observa-tion can provide an important clue to possible problemswith the current procedures A method that reveals non-stationarity is especially important for difficult subjectssuch as infants or patients in whom fatigue or inatten-tion can be an important factor

3 Several researchers have told me that the simplicity ofthe staircase rules is itself an advantage It makes writingprograms easier It is nicer for teaching students It alsosimplifies the uncertain business of guessing likelihoodpriors and communicating the methodology when writ-ing articles

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learnmore about the PF than just the single threshold valueEstimating the PF slope is the first step in this directionTo estimate slope one must test the PF at more than onepoint Methods are available to do this (Kaernbach 2001bKontsevich amp Tyler 1999) and one could even adapt sim-ple Brownian staircase procedures to give better slope es-timates with minimum harm to threshold estimates (seethe comments in Strasburger 2001b)

Probably the best method for getting both thresholds andslopes is the Y (Psi) method of Kontsevich and Tyler(1999) The Y method is not totally novel it borrows tech-niques from many previous researchersmdashas Kontsevichand Tyler point out in a delightful paragraph that both clar-ifies their method and states its ancestry This paragraphsummarizes what is probably the most accurate and preciseapproach to estimating thresholds and slope so I reproduceit here

To summarize the Y method is a combination of solutionsknown from the literature The method updates the poste-rior probability distribution across the sampled space ofthe psychometric functions based on Bayesrsquo rule (Hall1968 Watson amp Pelli 1983) The space of the psychome-tric functions is two-dimensional (King-Smith amp Rose1997 Watson amp Pelli 1983) Evaluation of the psycho-metric function is based on computing the mean of theposterior probability distribution (Emerson 1986 King-Smith et al 1994) The termination rule is based on thenumber of trials as the most practical option (King-Smith

1438 KLEIN

et al 1994 Watson amp Pelli 1983) The placement of eachnew trial is based on one-step ahead minimum search(King-Smith 1984) of the expected entropy cost function(Pelli 1987)

A simplified rough description of the Y trial placementfor 2AFC runs is that initially trials are placed at about the90 correct level to establish a solid threshold Oncethreshold has been roughly established trials are placed atabout the 90 and the 70 levels to pin down the slopeThe precise levels depend on the assumed PF shape

Before one starts measuring slopes however one mustfirst be convinced that knowledge of slope can be impor-tant It is so for several reasons

1 Parametric fitting procedures either require knowl-edge of slope b or must have data that adequately con-strain the slope estimate Part of the discussion that fol-lows will be on how to obtain reliable threshold estimateswhen slope is not well known A tight knowledge of slopecan minimize the error of the threshold estimate

2 Strasburger (2001a 2001b) gives good argumentsfor the desirability of knowing the shape of the PF He isinterested in differences between foveal and peripheralvisual processing and he uses the PF slope as an indica-tion of differences

3 Slopes are different for different stimuli and thiscan lead to misleading results if slope is ignored It mayhelp to describe my experience with the Modelfest pro-ject (Carney et al 2000) to make this issue concreteThis continuing project involves a dozen laboratoriesaround the world We have measured detection thresh-olds for a wide variety of visual stimuli The data areused for developing an improved model of spatial visionSome of the stimuli were simple easily memorized lowspatial frequency patterns that tend to have shallowerslopes because a stimulus-known-exactly template canbe used efficiently and with minimal uncertainty On theother hand tiny Modelfest stimuli can have a good dealof spatial uncertainty and are therefore expected to pro-duce PFs with steeper slopes Because of the slope changethe pattern of thresholds will depend on the level atwhich threshold is measured When this rich dataset isanalyzed with f ilter models that make estimates formechanism bandwidths and mechanism pooling thesedifferent slope-dependent threshold patterns will lead todifferent estimates for the model parameters Several ofus in the Modelfest data gathering group argued that weshould use a data-gathering methodology that wouldallow estimation of PF slope of each of the 45 stimuliHowever I must admit that when I was a subject my pa-tience wore thin and I too chose the quicker runs Al-though I fully understand the attraction of short runs fo-cused on thresholds and ignoring slope I still think thereare good reasons for spreading trials out For one thinginterspersing trials that are slightly visible helps the sub-ject maintain memory of the test pattern

4 Slopes can directly resolve differences between mod-els In my own research for example my colleagues and

I are able to use the PF slope to distinguish long-range fa-cilitation that produces a low-level excitation or noise re-duction (typical with a cross-orientation surround) from ahigher level uncertainty reduction (typical with a same-orientation surround)

Throwing-Out Rules Goodness-of-FitThere are occasions when a staircase goes sour Most

often this happens when the subjectrsquos sensitivity changesin the middle of a run possibly because of fatigue It istherefore good to have a well-specified rule that allowsnonstationary runs to be thrown out The throwing-out rulecould be based on whether the PF slope estimate is toolow or too high Alternatively a standard approach is tolook at the chi-square goodness-of-f it statistic whichwould involve an assumption about the shape of the un-derlying PF I have a commentary in Section III regard-ing biases in the chi-square metric that were found byWichmann and Hill (2001a) Biases make it difficult touse standard chi-square tables The trouble with this ap-proach of basing the goodness-of-fit just on the cumulateddata is that it ignores the sequential history of the run TheBrownian upndashdown staircase makes the sequential historyvividly visible with a plot of sequentially tested levelsWhen fatigue or adaptation sets in one sees the tested lev-els shift from one stable point to another Leek Hanna andMarshall (1991) developed a method of assessing the non-stationarity of a psychometric function over the course ofits measurement using two interleaved tracks EarlierHall (1983) also suggested a technique to address the sameproblem I encourage further research into the develop-ment of a non-stationarity statistic that can be used to dis-card bad runs This statistic would take the sequential his-tory into account as well as the PF shape and the chi-square

The development of a ldquothrowing-outrdquo rule is the ex-treme case of the notion of weighted averaging One im-portant reason for developing good estimators for good-ness-of-fit and good estimators for threshold variance isto be able to optimally combine thresholds from repeatedexperiments I shall return to this issue in connectionwith the Wichmann and Hill (2001a) analysis of bias ingoodness-of-fit metrics

Final Thought The Method Should Depend on the Situation

Also important is that the method chosen should be ap-propriate for the task For example suppose one is usingwell-experienced observers for testing and retesting sim-ilar conditions in which one is looking for small changesin threshold or one is interested in the PF shape In suchcases one has a good estimate of the expected thresholdand the signal detection method of constant stimuli withblanks and with ratings might be best On the other handfor clinical testing in which one is highly uncertain aboutan individualrsquos threshold and criterion stability a 2AFClikelihood or staircase method (with a confidence rating)might be best

MEASURING THE PSYCHOMETRIC FUNCTION 1439

III DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPEPLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with whatto do with the data after they have been collected Wehave here the standard Bayesian situation Given the PFit is easy to generate samples of data with confidence in-tervals But we have the inverse situation in which we aregiven the data and need to estimate properties of the PFand get confidence limits for those estimates The pair ofarticles in this special issue by Wichmann and Hill (2001a2001b) as well as the articles by King-Smith et al (1994)Treutwein (1995) and Treutwein and Strasburger (1999)do an outstanding job of introducing the issues and providemany important insights For example Emerson (1986)and King-Smith et al emphasize the advantages of meanlikelihood over maximum likelihood The speed of mod-ern computers makes it just as easy to calculate meanlikelihood as it is to calculate maximum likelihood As an-other example Treutwein and Strasburger (1999) demon-strate the power of using Bayesian priors for estimatingparameters by showing how one can estimate all four pa-rameters of Equation 1A in fewer than 100 trials This im-portant finding deserves further examination

In this section I will focus on four items First I will dis-cuss the relative merits of parametric versus nonparamet-ric methods for estimating the PF properties The paperby Miller and Ulrich (2001) provides a detailed analysis ofa nonparametric method for obtaining all the moments ofthe PF I will discuss some advantages and limitations oftheir method Second is the question of goodness-of-fitWichmann and Hill (2001a) show that when the numberof trials is small the goodness-of-fit can be very biasedI will examine the cause of that bias Third is the issue dis-cussed by Wichmann and Hill (2001a) that of how lapseseffect slope estimates This topic is relevant to Strasburg-errsquos (2001b) data and analysis Finally there is the possiblycontroversial topic of an upward slope bias in adaptivemethods that is relevant to the papers of Kaernbach (2001b)and Strasburger (2001b)

Parametric Versus Nonparametric FitsThe main split between methods for estimating pa-

rameters of the psychometric function is the distinctionbetween parametric and nonparametric methods In aparametric method one uses a PF that involves severalparameters and uses the data to estimate the parametersFor example one might know the shape of the PF and es-timate just the threshold parameter McKee et al (1985)show how slope uncertainty affects threshold uncertaintyin probit analysis The two papers by Wichmann and Hill(2001a 2001b) are an excellent introduction to many is-sues in parametric fitting An example of a nonparamet-ric method for estimating threshold would be to averagethe levels of a Brownian staircase after excluding the firstfour or so reversals in order to avoid the starting level biasBefore I served as guest editor of this special symposium

issue I believed that if one had prior knowledge of the exactPF shape a parametric method for estimating threshold wasprobably better than nonparametric methods After read-ing the articles presented here as well as carrying out myrecent simulations I no longer believe this I think thatin the proper situation nonparametric methods are as goodas parametric methods Furthermore there are situationsin which the shape of the PF is unknown and nonparamet-ric methods can be better

Miller and Ulrichrsquos Article on the Nonparametric SpearmanndashKaumlrber Method of Analysis

The standard way to extract information from psycho-metric data is to assume a functional shape such as a Wei-bull cumulative normal logit dcent and then vary the pa-rameters until likelihood is maximized or chi-square isminimized Miller and Ulrich (2001) propose using theSpearmanndashKaumlrber (SK) nonparametric method whichdoes not require a functional shape They start with a prob-ability density function defined by them as the derivativeof the PF

(33)

where s(i) is the stimulus intensity at the ith level andP(i) is the probability correct at the that level The notationpdf(i) is meant to indicate that it is the pdf for the intervalbetween i ndash 1 and i Miller and Ulrich give the r th momentof this distribution as their Equation 3

(34)

The next four equations are my attempt to clarify their ap-proach I do this because when applicable it is a finemethod that deserves more attention Equation 34 prob-ably came from the mean value theorem of calculus

(35)

where f (s) 5 sr11 and f cent(smid) 5 (r 1 1)smidr is the deriv-

ative of f with respect to s somewhere within the intervalbetween s(i 1) and s(i) For a slowly varying functionsmid would be near the midpoint Given Equation35 Equa-tion 34 becomes

(36)

where Ds 5 s(i) s(i 1) Equation 36 is what one wouldmean by the rth moment of the distribution The case r 50 is important since it gives the area under the pdf

(37)

by using the definition in Equation 33 The summation inEquation 37 gives the area under the pdf For it to be aproper pdf its area must be unity which means that theprobability at the highest level P(imax) must be unity andthe probability at the lowest level Pmin must be zero asMiller and Ulrich (2001) point out

m0 = =aringpdf( ) ( ) ( ) max mini s P i P ii

D

mrr

i

i s s= aring pdf mid( ) D

f s i f s i f s s i s i( ) ( ) ( ) ( ) ( ) [ ] [ ] = cent [ ]1 1mid

mrr r

ri s i s i=

+ [ ]aring + +11

11 1pdf( ) ( ) ( )

pdf( )( ) ( )

( ) ( )i

P i P i

s i s i= 1

1

1440 KLEIN

The next simplest SK moment is for r 5 1

(38)

from Equation 33 This first moment provides an esti-mate of the centroid of the distribution corresponding tothe center of the psychometric function It could be usedas an estimate of threshold for detection or PSE for dis-crimination An alternate estimator would be the medianof the pdf corresponding to the 50 point of the PF (theproper definition of PSE for discrimination)

Miller and Ulrich (2001) use Monte Carlo simulationsto claim that the SK method does a better job of extract-ing properties of the psychometric function than do para-metric methods One must be a bit careful here since intheir parametric fitting they generate the data by usingone PF and then fit it with other PFs But still it is a sur-prising claim if this simple method is so good why havepeople not been using it for years There are two possibleanswers to this question One is that people never thoughtof using it before Miller and Ulrich The other possibil-ity is that it has problems I have concluded that the an-swer is a combination of both factors I think that there isan important place for the SK approach but its limitationsmust be understood That is my next topic

The most important limitation of the SK nonparamet-ric method is that it has only been shown to work well formethod of constant stimulus data in which the PF goesfrom 0 to 1 I question whether it can be applied with thesame confidence to 2AFC and to adaptive procedures withunequal numbers of trials at each level The reason for thislimitation is that the SK method does not offer a simpleway to weight the different samples Let us consider twosituations with unequal weighting adaptive methods and2AFC detection tasks

Adaptive methods place trials near threshold but withvery uneven distribution of number of trials at differentlevels The SK method gives equal weighting to sparselypopulated levels that can be highly variable thus con-tributing to a greater variance of parameter estimates Letus illustrate this point for an extreme case with five levelss 5 [1 2 3 4 5] Suppose the adaptive method placed [11 1 98 1] trials at the five levels and the number correctwere [0 1 1 49 1] giving probabilities [0 1 1 5 1] andpdf 5 [1 0 5 5] The following PEST-like (Taylor ampCreelman 1967) rules could lead to this unusual dataMove down a level if the presently tested level has greaterthan P1 correct move up three levels if the presentlytested level has less than P2 correct P1 can be anynumber less than 33 and P2 can be any number greaterthan 67 The example data would be produced by start-ing at Level 5 and getting four in a row correct followedby an error followed by a wide set possible of alternat-ing responses for which the PEST rules would leave thefourth level as the level to be tested The SK estimate of the

center of the PF is given by Equation 38 to be micro1 5 200Since a pdf with a negative probability is distasteful tosome one can monotonize the PF according to the proce-dure of Miller and Ulrich (which does include the numberof trials at each level) The new probabilities are [0 5151 51 1] with pdf 5 [51 0 0 49] The new estimate ofthreshold is micro1 5 297 However given the data with 4998correct at Level 4 an estimate that includes a weightingaccording to the number of trials at each level would placethe centroid of the distribution very close to micro1 5 4 Thuswe see that because the SK procedure ignores the numberof trials at each level it produces erroneous estimates ofthe PF properties This example with shifting thresholdswas chosen to dramatize the problems of unequal weight-ing and the effect of monotonizing

A similar problem can occur even with the method ofconstant stimuli with equal number of trials at each levelbut with unequal binomial weighting as occurs in 2AFCdata The unequal weighting occurs because of the 50lower asymptote The binomial statistics error bar of dataat 55 correct is much larger than for data at 95 cor-rect Parametric procedures such as probit analysis giveless weighting to the lower data This is why the most ef-ficient placement of trials in 2AFC is above 90 correctas I have discussed in Section II Thus it is expected thatfor 2AFC the SK method would give threshold estimatesless precise than the estimates given by a parametricmethod that includes weighting

I originally thought that the SK method would fail on alltypes of 2AFC data However in the discussion followingEquation 7 I pointed out that for 2AFC discriminationtasks there is a way of extending the data to the 0ndash100range that produces weightings similar to those for a yesnotask Rather than use the standard Equation 2 to deal withthe 50 lower asymptote one should use the method dis-cussed in connection with Figure 3 in which one can usefor the ordinate the percent of correct responses in Inter-val 2 and use the signal strength in Interval 2 minus thatin Interval 1 for the abscissa That procedure produces aPF that goes from 0 to 100 thereby making it suit-able for SK analysis while also removing a possible bias

Now let us examine the SK method as applied to datafor which it is expected to work such as discriminationPFs that go from 0 to 100 These are tasks in which thelocation of the psychometric function is the PSE and theslope is the threshold

A potentially important claim by Miller and Ulrich(2001) is that the SK method performs better than probitanalysis This is a surprising claim since their data aregenerated by a probit PF one would expect probit analy-sis to do well especially since probit analysis does espe-cially well with discrimination PFs that go from 0 to 100To help clarify this issue Miller and I did a number ofsimulations We focused on the case of N 5 40 with 8 tri-als at each of 5 levels The cumulative normal PF (Equa-tion 14) was used with equally spaced z scores and with thelowest and highest probabilities being 1 and 99 Theprobabilities at the sample points are P(i) 5 1 1222

m1

2 21

2

11

2

=

= [ ] +

aring

aring

pdf( )( ) ( )

( ) ( )( ) ( )

is i s i

P i P is i s i

MEASURING THE PSYCHOMETRIC FUNCTION 1441

50 8778 and 99 corresponding to z scores ofz(i) 5 ndash2328 1164 0 1164 and 2328 Nonmonot-onicity is eliminated as Miller and Ulrich suggest (seethe Appendix for Matlab Code 4 for monotonizing) I didMonte Carlo simulations to obtain the standard deviationof the location of the PF micro1 given by Equation38 I foundthat both the SK method and the parametric probit method(assuming a fixed unity slope) gave a standard deviationof between 028 and 029 Miller and Ulrichrsquos result re-ported in their Table 17 gives a standard error of 0071However their Gaussian had a standard deviation of 025whereas my z-score units correspond to a standard devi-ation of unity Thus their standard error must be multipliedby 4 giving a standard error of 0284 in excellent agree-ment with my value So at least in this instance the SKmethod did not do better than the parametric method (withslope fixed) and the surprise mentioned at the beginningof this paragraph did not materialize I did not examineother examples where nonparametric methods might dobetter than the matched parametric method However thesimulations of McKee et al (1985) give some insight intowhy probit analysis with floating slope can do poorly evenon data generated from probit PFs McKee et al showthat when the number of trials is low and if stimulus lev-els near both 0 and 100 are not tested the slope can bevery flat and the predicted threshold can be quite distantIf care is not taken to place a bound on these probit esti-mates one would expect to find deviant threshold esti-mates The SK method on the other hand has built-inbounds for threshold estimates so outliers are avoidedThese bounds could explain why SK can do better thanprobit When the PF slope is uncertain nonparametricmethods might work better than parametric methods adecided advantage for the SK method

It is instructive to also do an analytic calculation of thevariance based on an analysis similar to what was doneearlier If only the ith level of the five had been testedthe variance of the PF location (the PSE in a discrimina-tion task) would have been as follows (Gourevitch ampGalanter 1967)

(39)

where var(P) is given by Equation 27 and PF slope is

(40)

where for the cumulative normal dzi dxi 5 1s where sis the standard deviation of the pdf and

(41)

from Equation 3A By combining these equations one gets

(42)

Note that if all trials were placed at the optimal stimuluslocation of z 5 0 (P 5 5) Equation 42 would becomes2p 2Ni as discussed earlier When all five levels arecombined the total variance is smaller than each individ-ual variance as given by the variance summation formula

(43)

Finally upon plugging in the five values of zi rangingfrom ndash2328 to 12328 and Pi ranging from 1 to 99the standard error of the PF location is

(44)

This value is precisely the value obtained in our MonteCarlo simulations of the nonparametric SK method andalso by the slope fixed parametric probit analysis Thistriple confirmation shows that the SK method is excel-lent in the realm where it applies (equal number of trialsat levels that span 0 to 100) It also confirms our sim-ple analytic formula

The data presented in Miller and Ulrichrsquos Table 17 giveus a fine opportunity to ask about the efficiency of themethod of constant stimuli that they use For the N 5 40example discussed here the variance is

(45)

This value is more than twice the variance of p (2 Ntot)that is obtained for the same PF using adaptive methodsincluding the simple upndashdown staircase that place trialsnear the optimal point

The large variance for the Miller and Ulrich stimulusplacement is not surprising given that of the five stimu-lus levels the two at P 5 1 and 99 are quite inefficientThe inefficient placement of trials is a common problemfor the method of constant stimuli as opposed to adaptivemethods However as I mentioned earlier the method ofconstant stimuli combined with multiple response ratingsin a signal detection framework may be the best of allmethods for detection studies for revealing properties of thesystem near threshold while maintaining a high efficiency

A curious question comes up about the optimal placementof trials for measuring the location of the psychometric func-tion With the original function the optimal placement is atthe steepest part of the function (at P 5 0) However withthe SK method when one is determining the location ofthe pdf one might think that the optimal placement of sam-ples should occur at the points where the pdf is steepestThis contradiction is resolved when one realizes that onlythe original samples are independent The pdf samples areobtained by taking differences of adjacent PF values soneighboring samples are anticorrelated Thus one cannot

var tottot

= =SEN

2 3 24

SE = =var tot12 0 2844

var var tot = aring1 1i

var( )exp

PSE i i i

i

i

P Pz

N= ( ) ( )

2 12

2

ps

dP

dz

zi

i

iaeligegraveccedil

oumloslashdivide

=( )2 2

2

exp

p

dP

dx

dP

dz

dz

dxi

i

i

i

i

i

= times

var( )var

PSE i

i

i

i

P

dP

dx

=( )

aeligegraveccedil

oumloslashdivide

2

1442 KLEIN

claim that for estimating the PSE the optimal placementof trials is at the steep portion of the pdf

Wichmann and Hillrsquos Biased Goodness-of-FitWhenever I fit a psychometric function to data I also

calculate the goodness-of-fit The second half of Wich-mann and Hill (2001a) is devoted to that topic and I havesome comments on their findings There are two standardmethods for calculating the goodness-of-fit (Press Teu-kolsky Vetterling amp Flannery 1992) The first method isto calculate the X 2 statistic as given by

(46)

where p (datai) and Pi are the percent correct of the datumand the PF at level i The binomial var(Pi ) is our old friend

(47)

where Ni is the number of trials at level i The X 2 is of in-terest because the binomial distribution is asymptoticallya Gaussian so that the X 2 statistic asymptotically has a chi-square distribution Alternatively one can write X 2 as

(48)

where the residuals are the difference between the pre-dicted and observed probabilities divided by the standarderror of the predicted probabilities SEi 5 [var(Pi )]05

(49)

The second method is to use the normalized log-likelihood that Wichmann and Hill call the deviance D Iwill use the letters LL to remind the reader that I am deal-ing with log likelihood

(50)

Similar to the X 2 statistic LL asymptotically also has a chi-square distribution

In order to calculate the goodness-of-fit from the chi-square statistic one usually makes the assumption that weare dealing with a linear model for Pi A linear modelmeans that the various parameters controlling the shape ofthe psychometric function (like a b g in Equation 1 forthe Weibull function) should contribute linearly HoweverEquations 1 8 and 12 show that only g contributes linearlyEven though the PF function is not linear in its parametersone typically makes the linearity assumption anyway inorder to use the chi-square goodness-of-fit tables The pur-pose of this section is to examine the validity of the linear-ity assumption

The chi-square goodness-of-fit distribution for a linearmodel is tabulated in most statistics books It is given bythe Gamma function (Press et al 1992)

(51)

where the degrees of freedom df is the number of datapoints minus the number of parameters The Gamma func-tion is a standard function (called Gammainc in Matlab)In order to check that the Gamma function is functioningproperly I often check that for a reasonably large numberof degrees of freedom (like df gt10) the chi-square distri-bution is close to Gaussian with a mean df and the stan-dard deviation (2df )05 As an example for df 5 18 wehave Gamma(9 9) 5 054 which is very close to the as-ymptotic value of 05 To check the standard deviation wehave Gamma [182 (18 1 6)2] 5 845 which is indeedclose to the value of 0841 expected for a unity z-score

With a sparse number of trials or with probabilities nearunity or with nonlinear models (all of which occur in fit-ting psychometric functions) one might worry about thevalidity of using the chi-square distribution (from standardtables or Equation 51) Monte Carlo simulations would bemore trustworthy That is what Wichmann and Hill (2001a)use for the log likelihood deviance LL In Monte Carlosimulations one replicates the experimental conditionsand uses different randomly generated data sets based onbinomial statistics Wichmann and Hill do 10000 runs foreach condition They calculate LL for each run and thenplot a histogram of the Monte Carlo simulations for com-parison with the expected chi-square given by Equa-tion 50 based on the chi-square distribution

Their most surprising finding is the strong bias displayedin their Figures 7 and 8 Their solid line is the chi-squaredistribution from Equation 51 Both Figures 7a and 8a arefor a total of 120 trials per run with 60 levels and 2 trialsfor each level In Figure 7a with a strong upward bias thetrials span the probability interval [052 085] in Fig-ure 8a with a strong downward bias the interval is [072099] That relatively innocuous shift from testing at lowerprobabilities to testing at higher probabilities made a dra-matic shift in the bias of the LL distribution relative to thechi-square distribution The shift perplexed me so I de-cided to investigate both X 2 and likelihood for differentnumbers of trials and different probabilities for each level

Matlab Code 2 in the Appendix is the program that cal-culates LL and X 2 for a PF ranging from P 5 50 to100 and for 1 2 4 8 or 16 trials at each level TheMatlab code calculates Equations 46 and 50 by enumer-ating all possible outcomes rather than by doing MonteCarlo simulations Consider the contribution to chi squarefrom one level in the sum of Equation 46

(52)

where Oi 5 Ni p(datai) and Ei 5 Ni Pi The mean value ofX 2

i is obtained by doing a weighted sum over all possible

Xp P

P P

N

O E

E Pi

i i

i i

i

i i

i i

2

2 2

1 1=

[ ]=

( )( )

( )

( )

data

prob_chisq Gamma= ( )df 2 22c

LL N pp

P

N pp

P

i ii

i

i

i ii

i

=

+ [ ] eacute

eumlecirc

aring2

11

( )ln( )

( ) ln( )

datadata

datadata

1

residualdata

ii i

i

P P

SE=

( )

X ii

2 2= aring residual

var PP P

Ni

i i

i( ) =

( )1

Xp P

P

i i

ii

2

2

=( )[ ]

( )aringdata

var

MEASURING THE PSYCHOMETRIC FUNCTION 1443

outcomes for Oi The weighting wti is the expected prob-ability from binomial statistics

(53)

The expected value lt X 2i gt is

(54)

A bit of algebra based on aringOiwti 5 1 gives a surprising

result

(55)

That is each term of the X 2 sum has an expectation valueof unity independent of Ni and independent of P Havingeach deviant contribute unity to the sum in Equation 46is desired since Ei 5NiPi is the exact value rather than afitted value based on the sampled data In Figure 4 thehorizontal line is the contribution to chi square for anyprobability level I had wrongly thought that one neededNi to be fairly big before each term contributed a unityamount on the average to X 2 It happens with any Ni

If we try the same calculation for each term of thesummation in Equation 50 for LL we have

(56)

Unfortunately there is no simple way to do the summationsas was done for X 2 so I resort to the program of MatlabCode 2 The output of the program is shown in Figure 4The five curves are for Ni 5 1 2 4 8 and 16 as indicatedIt can be seen that for low values of Pi near 5 the contri-bution of each term is greater than 1 and that for high val-ues (near 1) the contribution is less than 1 As Ni increasesthe contribution of most levels gets closer to 1 as expectedThere are however disturbing deviations from unity at highlevels of Pi An examination of the case Ni 5 2 is usefulsince it is the case for which Wichmann and Hill (2001a)showed strong biases in their Figures 7 and 8 (left-handpanels) This examination will show why Figure 4 has theshape shown

I first examine the case where the levels are biased to-ward the lower asymptote For Ni 5 2 and Pi 5 05 theweights in Equation 53 are 14 12 and 14 for Oi 5 0 1or 2 and Ei 5 Ni Pi 5 1 Equation 56 simplifies since thefactors with Oi 5 0 and Oi 5 1 vanish from the first termand the factors with Oi 5 2 and Oi 5 1 vanish from the sec-ond term The Oi 5 1 terms vanish because log(11) 5 0Equation 56 becomes

(57)

Figure 4 shows that this is precisely the contribution ofeach term at Pi 5 5 Thus if the PF levels were skewed to

the low side as in Wichmann and Hillrsquos Figure 7 (left panel)the present analysis predicts that the LL statistic wouldbe biased to the high side For the 60 levels in Figure 7(left panel) their deviance statistic was biased about 20too high which is compatible with our calculation

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero because of the (1 − P_i) factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85, the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.
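The crossover near P_i = .85 for N_i = 2 can be reproduced by evaluating Equation 56 directly (a sketch only; the Appendix program, Matlab Code 2, does the full enumeration over all N_i, and the variable names below are mine):

% Expected per-level deviance <LL_i> (Equation 56) for N_i = 2 trials.
N = 2;
for P = [.5 .7 .85 .95 .99]
    O  = 0:N;
    wt = exp(gammaln(N+1) - gammaln(O+1) - gammaln(N-O+1)) .* P.^O .* (1-P).^(N-O);
    E  = N*P;
    term = zeros(size(O));
    for k = 1:numel(O)                       % 0*ln(0) terms are treated as 0
        o = O(k);
        if o > 0, term(k) = term(k) + o*log(o/E); end
        if o < N, term(k) = term(k) + (N-o)*log((N-o)/(N-E)); end
    end
    LL = 2*sum(wt .* term);
    fprintf('P = %.2f  <LL> = %.3f\n', P, LL); % about 1.39 at P = .5, below 1 beyond P = .85
end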

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a

[Figure 4 appears here. Abscissa: psychometric function level (p_i); ordinate: log likelihood (D) at each level; curves labeled N = 1, 2, 4, 8, 16, plus the horizontal line labeled X² for all N.]

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method 1 by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
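A sketch of how methods 1 and 2 combine in practice (the three runs, their variances, and their goodness-of-fit values below are hypothetical numbers of mine, chosen only to illustrate the weighting):

% Inverse-variance weighting of repeated threshold estimates, with each
% run's variance inflated by its reduced chi-square when that exceeds 1.
th   = [1.10 0.95 1.30];          % hypothetical threshold estimates from three runs
v    = [.010 .012 .011];          % their variances from the best-fitting PF (method 1)
chi2 = [ 8.2 11.5 25.0];          % goodness-of-fit of each run
df   = [10 10 10];                % degrees of freedom of each fit
vAdj = v .* max(chi2 ./ df, 1);   % method 2: inflate only when reduced chi-square > 1
w    = 1 ./ vAdj;                 % inverse-variance weights
thBar = sum(w .* th) / sum(w);    % weighted mean threshold
seBar = sqrt(1 / sum(w));         % standard error of the weighted mean
fprintf('weighted threshold = %.3f +/- %.3f\n', thBar, seBar);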

Wichmann and Hill, and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels, with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                        No Lapse        No Lapse        Lapse           Lapse
                        λ = .01         λ = .0001       λ = .01         λ = .0001
Psychometric Function   L      χ²       L      χ²       L      χ²       L      χ²
Probit                  1.05   1.05     1.00   0.99     1.05   1.05     0.36   0.34
Logit                   1.76   1.76     1.66   1.66     1.76   1.76     0.84   0.72
Weibull                 1.01   1.01     0.97   0.97     1.01   1.01     0.26   0.26

Note: The columns are for different lapse rates (λ) and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

maximization, corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

probit (cumulative normal, as in Equation 3B):   P = .5 + .5 erf(z / √2),   (58A)

logit:   P = 1 / [1 + exp(−z)],   (58B)

Weibull:   P = 1 − exp[−exp(z)],   (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
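A compact version of this fit (a sketch in the spirit of Matlab Code 3; the probit-with-lapse parameterization below is mine, and the exact Table 2 values come from the Appendix program rather than from this sketch) looks like this:

% Fit threshold p(1) and slope p(2) of a probit PF with a fixed, symmetric
% lapse rate to the three-level data of Table 2 by chi-square minimization.
x      = [0 1 6];                % stimulus levels
N      = [50 50 50];             % trials per level
Obs    = [25 42 49];             % correct responses (49/50 at the top level = one lapse)
lambda = .0001;                  % fixed lapse parameter; gamma = lambda (symmetric PF)
Pfun   = @(p) lambda + (1 - 2*lambda) * .5 .* (1 + erf(p(2)*(x - p(1)) / sqrt(2)));
chisq  = @(p) sum((Obs - N.*Pfun(p)).^2 ./ (N.*Pfun(p).*(1 - Pfun(p)) + eps));
pFit   = fminsearch(chisq, [0 .3]);    % [threshold slope]; the starting slope matters (see below)
fprintf('threshold = %.2f   slope = %.2f\n', pFit(1), pFit(2));

Minimizing a binomial negative log-likelihood in place of chisq would give the corresponding likelihood-based fit.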

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns, the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in the Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
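One simple safeguard is a multi-start search: rerun the minimization from several initial slopes and keep the best result (the sketch below reuses the chisq objective defined above; the particular grid of starting slopes is my own choice):

best = Inf;
for slope0 = [.1 .3 1 3]                       % spread of initial slope guesses
    [p, fval] = fminsearch(chisq, [0 slope0]);
    if fval < best, best = fval; pBest = p; end
end
fprintf('best fit: threshold = %.2f   slope = %.2f   chi-square = %.2f\n', ...
        pBest(1), pBest(2), best);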

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.
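The first approach can be sketched in a few lines: make λ a third free parameter but add a prior penalty that keeps it small. The beta-shaped prior, the logistic reparameterization, and the toy data below are my own illustrative choices, not the specific priors used by Treutwein and Strasburger (1999) or Wichmann and Hill (2001a).

% Maximum a posteriori fit: likelihood for [threshold slope lambda] plus a
% prior on lambda that concentrates its mass near zero.
x = [0 1 6];  N = [50 50 50];  Obs = [25 42 49];       % same toy data as above
lamOf  = @(u) 1 ./ (1 + exp(-u));                      % keep lambda in (0,1) during the search
Pfun   = @(p, lam) lam + (1 - 2*lam) * .5 .* (1 + erf(p(2)*(x - p(1)) / sqrt(2)));
nll    = @(p, lam) -sum(Obs.*log(Pfun(p,lam) + eps) + (N - Obs).*log(1 - Pfun(p,lam) + eps));
logPri = @(lam) log(lam + eps) + 49*log(1 - lam);      % beta(2,50)-shaped prior, up to a constant
cost   = @(q) nll(q(1:2), lamOf(q(3))) - logPri(lamOf(q(3)));
qFit   = fminsearch(cost, [0 1 log(.01/.99)]);         % q = [threshold slope u], lambda = lamOf(u)
fprintf('threshold = %.2f   slope = %.2f   lambda = %.4f\n', qFit(1), qFit(2), lamOf(qFit(3)));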

Biased Slope in Adaptive Methods
In the previous section, I examined how lapses produce

a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope b of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = .10 (because of the 10AFC task) and a lapse rate of λ = .0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
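Written out, the variance arithmetic of the preceding paragraph is simply

$$
\mathrm{var} \approx \frac{1.75}{N\beta^{2}} = \frac{1.75}{25 \times 5.5^{2}} \approx 0.0023,
\qquad SE = \sqrt{0.0023} \approx 0.05,
$$

and, because the variance scales as 1/β², doubling β cuts the threshold variance by a factor of about 4, which is the four-fold reduction mentioned above.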

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing, there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = .01, .1222, .50, .8778, and .99. Suppose that, because of binomial variability, the middle point was .70 rather than .50. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be .30 rather than .50. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.
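This steepening can be made concrete with a small calculation (a sketch only; the interpolation grid and the crude rise-per-level measure below are my own choices, not part of the Miller and Ulrich analysis):

% Shift-induced steepening for the five-point PF of Miller and Ulrich (2001).
% Two perturbed PFs (middle point moved up or down) are aligned on their
% interpolated 50% thresholds and averaged; the central rise of the average
% exceeds that of the unperturbed PF.
levels = 1:5;
Ptrue  = [.01 .1222 .50 .8778 .99];
Pup    = Ptrue;  Pup(3)   = .70;                % middle point fluctuates high
Pdown  = Ptrue;  Pdown(3) = .30;                % middle point fluctuates low
thr    = @(P) interp1(P, levels, .5);           % interpolated 50% threshold
xs     = 2:.1:4;                                % common abscissa after alignment
shiftP = @(P) interp1(levels - thr(P), P, xs - 3);   % align each threshold at level 3
Pavg   = (shiftP(Pup) + shiftP(Pdown)) / 2;
rise   = @(P, x) interp1(x, P, 3.5) - interp1(x, P, 2.5);   % rise over the central level
fprintf('central rise: true PF %.3f, shifted-and-averaged PF %.3f\n', ...
        rise(Ptrue, levels), rise(Pavg, xs));   % roughly .38 vs .52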

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).
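For readers who want to experiment with this step, a compact weighted pool-adjacent-violators routine (my own variant with toy data; Klein's own monotonizing script is part of Matlab Code 4 in the Appendix) is:

% Monotonize a PF estimate: merge adjacent levels that violate monotonicity,
% weighting each level by its number of trials.
cor  = [2 1 4 3 6];                 % hypothetical number correct at each level
tot  = [4 4 6 4 6];                 % hypothetical number of trials at each level
vals = cor ./ tot;                  % raw PF: .50 .25 .67 .75 1.00
wts  = tot;                         % block weights
reps = ones(size(vals));            % how many original levels each block spans
k = 1;
while k < numel(vals)
    if vals(k) > vals(k+1)          % violation: merge blocks k and k+1
        vals(k) = (wts(k)*vals(k) + wts(k+1)*vals(k+1)) / (wts(k) + wts(k+1));
        wts(k)  = wts(k) + wts(k+1);
        reps(k) = reps(k) + reps(k+1);
        vals(k+1) = [];  wts(k+1) = [];  reps(k+1) = [];
        k = max(k - 1, 1);          % merging can expose a new violation upstream
    else
        k = k + 1;
    end
end
Pmono = repelem(vals, reps);        % monotonized PF, one value per original level
disp(Pmono)                         % .375 .375 .667 .750 1.000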

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward


slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
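The core of the enumeration can be sketched in a few lines (a stripped-down version in the spirit of Matlab Code 4, covering only the no-manipulation and shift cases and pooling the raw responses; the variable names are mine):

% Enumerate every 10-trial Brownian staircase for a flat PF fixed at P = .5
% and pool the responses at each level, with and without shifting each run
% by its (rounded) mean level before pooling.
nbits = 10;                                  % trials per staircase
nLev  = 2*nbits + 2;                         % enough levels to hold any excursion
for ishift = 0:1                             % 0 = no shift, 1 = shift by mean level
    cor = zeros(1, nLev);  tot = zeros(1, nLev);
    for i = 0:2^nbits - 1                    % all equally likely response sequences
        a    = bitget(i, 1:nbits);           % 1 = correct, 0 = incorrect on each trial
        step = 1 - 2*a(1:end-1);             % down one level after a correct, up after an incorrect
        lev  = [0 cumsum(step)];             % level visited on each trial
        if ishift, lev = lev - round(mean(lev)); end
        idx  = lev + nbits + 1;              % map levels onto array indices
        for t = 1:nbits
            cor(idx(t)) = cor(idx(t)) + a(t);
            tot(idx(t)) = tot(idx(t)) + 1;
        end
    end
    P = cor ./ max(tot, 1);                  % pooled PF
    fprintf('ishift = %d, pooled PF at levels -2..+2: %s\n', ...
            ishift, mat2str(round(P(nbits-1:nbits+3), 3)));
end
% Without the shift the pooled PF stays flat at .5; with the shift it acquires
% an apparent slope even though the underlying PF has none.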

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure. The

[Figure 5 appears here. Panels (a)–(c), left column: no shift; panels (d)–(f), right column: shift. Rows: no data manipulation, monotonized PF, extrapolated PF. Abscissa: stimulus levels; ordinate: probability correct.]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair, I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)] / [1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods but that objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and


signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log–log slope of the d′ function, β. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from .50 to 1.00. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared, using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals of initial trials are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better


than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials was unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = .0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with much fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28(Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficiency estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)

Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias (scripts for monotonizing, extrapolating, averaging, and the main program)

Note: In the monotonizing script, the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


tervals are too close together in time (800 msec) or a cognitive effect does not matter for the present article. What does matter is that this bias produces a downward bias in d′. With feedback, lots of practice, and lots of experience being a subject, I was able to reduce this interval bias and equalize the number of times I responded with each interval. Naïve subjects may have a more difficult time reducing the bias. In this section I show that there is a very simple method for removing the interval bias by converting the 2AFC data to a PF that goes from 0% to 100%.

The recognition of bias in 2AFC is not new. Green and Swets (1966), in their Appendix III.3.4, point out that the bias in choice of interval does result in a downward bias in d′. However, they imply that the effect of this bias is small and can typically be ignored. I should like to question that implication by using one of their own examples to show that the bias can be substantial.

In a run of 200 trials, the Green and Swets example (Green & Swets, 1966, p. 410) has 95 out of 100 correct when the test stimulus is in the second interval (z2 = 1.645) and 50 out of 100 correct (z1 = 0) when the stimulus is in the first interval. The standard 2AFC way to analyze these data would be to average the probabilities: (95% + 50%)/2 = 72.5% correct (zcorrect = 0.598), corresponding to d′ = z√2 = 0.845. However, Green and Swets (p. 410) point out that, according to signal detection theory, one should analyze these data by averaging the z scores rather than averaging the probabilities, or

d′ = (z2 + z1)/√2 = 1.645/√2 = 1.163.    (7)

The ratio between these two ways of calculating d′ is 1.163/0.845 = 1.376. Since d′ is approximately linearly related to signal strength in discrimination tasks, this 38% reduction in d′ corresponds to an erroneous 38% increase in predicted contrast discrimination threshold when one calculates threshold the standard way. Note that if there had been no bias, so that the responses were approximately equally divided across the two intervals, then z2 ≈ z1 and Equation 7 would be identical to the more familiar Equation 6. Since bias is fairly common, especially among new observers, the use of Equation 7 to calculate d′ seems much more reasonable than using Equation 6. It is surprising that the bias correction in Equation 7 is rarely used.
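As a concrete check of these numbers, the following short Matlab/Octave sketch (added here for illustration; it is not part of the original article) computes the biased and unbiased d′ from the two interval percent corrects:

  % Green & Swets (1966, p. 410) example: 95/100 correct when the test is in
  % Interval 2 and 50/100 correct when it is in Interval 1.
  P2 = 0.95; P1 = 0.50;
  z  = @(P) sqrt(2)*erfinv(2*P - 1);          % inverse cumulative normal (z score)
  dprime_biased   = sqrt(2)*z((P2 + P1)/2);   % average the probabilities, then take z
  dprime_unbiased = (z(P2) + z(P1))/sqrt(2);  % average the z scores (Equation 7)
  ratio = dprime_unbiased/dprime_biased;      % about 1.38
  fprintf('biased d'' = %.3f, unbiased d'' = %.3f, ratio = %.3f\n', ...
          dprime_biased, dprime_unbiased, ratio)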

Green and Swets (1966) present a different analysis. Instead of comparing d′ values for the biased versus nonbiased conditions, they convert the d′s back to percent correct. The corrected percent correct (corresponding to d′ = 1.163) is 79.5%. In terms of percent correct, the bias seems to be a small effect, shifting percent correct a mere 7%, from 72.5% to 79.5%. However, the d′ ratio of 1.38 is a better measure of the bias, since it is directly related to the error in discrimination threshold estimates.

A further comment on the magnitude of the bias may be useful. The preceding subsection discussed the criterion bias of yes/no methods, which contributes linearly to d′ (Equation 5). The present section discusses the 2AFC interval bias, which contributes quadratically to d′. Thus, for small amounts of bias, the decrease in d′ is negligible. However, as I have pointed out, the bias can be large enough to make a significant contribution to d′.

It is instructive to view the bias correction for the fullPF corresponding to this example Cumulative normalPFs are shown in the upper left panel of Figure 2 Thecurves labeled C1 and C2 are the probability correct forthe first and second intervals I1 and I2 The asterisks cor-respond to the stimulus strength used in this example at50 and 95 correct The dotndashdashed line is the aver-age of the two PFs for the individual intervals The slopeof the PF is set by the dotndashdashed line (the averageddata) being at 50 (the lower asymptote) for zero stim-ulus strength The lower left panel is the z-score version ofthe upper panel The dashed line is the average of the twosolid lines for z scores in I1 and I2 This is the signal de-tection method of averaging that Green and Swets (1966)present as the proper way to do the averaging The dotndashdashed line shown in the upper left panel is the z score ofthe average probability Notice that the z score for the av-eraged probability is lower than the averaged z score in-dicating a downward bias in dcent due to the interval bias asdiscussed at the beginning of this section The right pairof panels are the same as the left pair except that insteadof plotting C1 we plot 1 C1 the probability of re-sponding I2 incorrectly The dashed line is half the dif-ference between the two solid lines The other differenceis that in the lower right panel we have multiplied thedashed and dashndashdotted lines by Iuml2 so that these linesare dcent values rather than z scores

The final step in dealing with the 2AFC bias is to flipthe 1 C1 curve horizontally to negative abscissa valuesas in Figure 3 The ordinate is still the probability correctin interval 2 The abscissa becomes the difference instimulus strengths between I2 and I1 The flipped branchis the probability of responding I2 when the I2 stimulusstrength is less than that of I1 (an incorrect response)Figure 3a with the ordinate going from 0 to 100 is theproper way to represent 2AFC discrimination data Theother item that I have changed is the scale on the abscissato show what might happen in a real experiment The or-dinate values of 50 and 95 for the Green and Swets(1966) example have been placed at a contrast differenceof 5 The negative 5 value corresponds to the case inwhich the positive test pattern is in the f irst intervalThreshold corresponds to the inverse of the PF slopeThe bottom panel shows the standard signal detectionrepresentation of the signal in I1 and I2 dcent is the distancebetween these symmetric stimuli in standard deviationunits The Gaussians are centered at 65 The verticalline at 5 is the criterion such that 50 and 95 ofthe area of the two Gaussians is above the criterion Thez-score difference of 1645 between the two Gaussiansmust be divided by Iuml2 to get dcent because each trial hadtwo stimulus presentations with independent informa-


tion for the judgment. This procedure, identical to Equation 7, gives the same d′ as before.
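To make the flipping construction of Figure 3 concrete, here is a minimal Matlab/Octave sketch (an illustration added here, with hypothetical variable names) that converts 2AFC trial counts, sorted by which interval contained the test, into the 0%-to-100% PF of Figure 3a:

  % Hypothetical 2AFC data at one test contrast c (in percent):
  % when the test was in I2, nCorrI2 of nI2 trials were answered "I2";
  % when the test was in I1, nCorrI1 of nI1 trials were answered "I1".
  c = 5; nI2 = 100; nCorrI2 = 95; nI1 = 100; nCorrI1 = 50;

  % Branch 1: test in I2 -> abscissa +c, ordinate = probability of an I2 response.
  xPlus  = +c;  pPlus  = nCorrI2/nI2;          % 0.95 in the Green & Swets example

  % Branch 2: test in I1 -> abscissa -c, ordinate = probability of an (incorrect)
  % I2 response, i.e., 1 minus the probability correct in I1.
  xMinus = -c;  pMinus = 1 - nCorrI1/nI1;      % 0.50 in the example

  % The two points [xMinus pMinus; xPlus pPlus] lie on a single PF going from
  % 0 to 1, so an unbiased d' is the z-score difference divided by sqrt(2):
  z = @(P) sqrt(2)*erfinv(2*P - 1);
  dprime = (z(pPlus) - z(pMinus))/sqrt(2);     % 1.163, matching Equation 7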

Three Distinctions for Clarifying PFs
In dealing with PFs, three distinctions need to be made: yes/no versus forced choice, detection versus discrimination, and constant stimuli versus adaptive methods. These distinctions are usually clear, but I should like to point out some subtleties.

For the forced choice versus yes/no distinction, there are two sorts of forced choice tasks. The standard version has multiple intervals, separated spatially or temporally, and the stimulus is in only one of the intervals. In the other version, one of N stimuli is shown and the observer re-

sponds with a number from 1 to N For example Strasburg-er (2001b) presented 1 of 10 letters to the observer in a10AFC task Yesno tasks have some similarity to the lattertype of forced choice task Consider for example a detec-tion experiment in which one of five contrasts (includinga blank) are presented to the observer and the observer re-sponds with numbers from 1 to 5 This would be classifiedas a rating scale method of constant stimuli yesno tasksince only a single stimulus is presented and the rating isbased on a one-dimensional intensity

The detectiondiscrimination distinction is usually basedon whether the reference stimulus is a natural zero pointFor example suppose the task is to detect a high spatial fre-quency test pattern added to a spatially identical reference

[Figure 2 appears here. Its four panels plot, versus stimulus strength: (a) probability correct for C1, C2, and their average; (b) probability of an I2 response, showing 1 − C1, C2, and diff/2; (c) z score correct for C1, C2, and the z-average; (d) z score for an I2 response, showing the biased and unbiased d′.]

Figure 2. 2AFC psychometric functions with a strong bias in favor of responding Interval 2 (I2). The bias is chosen from a specific example presented by Green and Swets (1966), such that the observer has 50% and 95% correct when the test is in I1 and I2, respectively. These points are marked by asterisks. The psychometric function being plotted is a cumulative normal. In all panels the abscissa is xt, the stimulus strength. (a) The psychometric functions for probability correct in I1 and I2 are shown and labeled C1 and C2. The average of the two probabilities, labeled "average," is the dot–dashed line; it is the curve that is usually reported. The diamond is at 72.5%, the average percent correct of the two asterisks. (b) Same as panel a, except that instead of showing C1 we show 1 − C1, the probability of saying I2 when the test was in I1. The ordinate is now labeled "probability of I2 response." (c) z scores of the three probabilities in panel a. An additional dashed line is shown that is the average of the C1 and C2 z-score curves. The diamond is the z score of the diamond in panel a, and the star is the z-score average of the two panel c asterisks. (d) The sign of the C2 curve in panel c is flipped to correspond to panel b. The dashed and dot–dashed lines of panel c have been multiplied by √2 in panel d, so that they become d′.


pattern If the reference pattern has zero contrast the taskis detection If the reference pattern has a high contrast thetask is discrimination Klein (1985) discusses these tasks interms of monopolar and bipolar cues For discriminationa bipolar cue must be available whereby the test pattern canbe either positive or negative in relation to the reference Ifone cannot discriminate the negative cue from the positivethen it can be called a detection task

Finally the constant stimuli versus adaptive method dis-tinction is based on the formerrsquos having preassigned testlevels and the latterrsquos having levels that shift to a desiredplacement The output of the constant stimulus method is

a full PF and is thus fully entitled to be included in this spe-cial issue The output of adaptive methods is typically onlya single number the threshold specifying the location butnot the shape of the PF Two of the papers in this issue(Kaernbach 2001b Strasburger 2001b) explore the pos-sibility of also extracting the slope from adaptive data thatconcentrates trials around one level Even though adaptivemethods do not measure much about the PF they are sopopular that they are well represented in this special issue

The 10 rows of Table 1 present the articles in this special issue (including the present article). Columns 2–5 correspond to the four categories associated with the first two

Figure 3. The 2AFC discrimination PF from Figure 2 has been extended to the 0% to 100% range without rescaling the ordinate. In panels a and b, the two right-hand panels of Figure 2 have been modified by flipping the sign of the abscissa of the negative slope branch, where the test is in Interval 1 (I1). The new abscissa is now the stimulus strength in I2 minus the strength in I1. The abscissa scaling has been modified to be in stimulus units. In this example, the test stimulus has a contrast of 5%. The ordinate in panel a is the probability that the response is I2. The ordinate in panel b is the z score of the panel a ordinate. The asterisks are at the same points as in Figure 2. Panel c is the standard signal detection picture when noise is added to the signal. The abscissa has been modified to being the activity in I2 minus the activity in I1. The units of activation have arbitrarily been chosen to be the same as the units of panels a and b. Activity distributions are shown for stimuli of −5 and +5 units, corresponding to the asterisks of panels a and b. The subject's criterion is at −5 units of activation. The upper abscissa is in z-score units, where the Gaussians have unit variance. The areas under the two distributions above the criterion are 50% and 95%, in agreement with the probabilities shown in panel a.

[Figure 3 appears here. Panel (a): probability of an I2 response versus the contrast of I2 minus the contrast of I1. Panel (b): the z score of that response probability. Panel (c): the two unit-variance activity distributions for "response when test in I1" and "response when test in I2," the criterion line, and the separation labeled d′ = 1.645 on the z-score axis.]


distinctions The last column classifies the articles accord-ing to the adaptive versus constant stimuli distinction Inorder to open up the full variety of PFs for discussion andto enable a deeper understanding of the relationship amongdifferent types of PFs I will now clarify the interrelation-ship of the various categories of PFs and their connectionto the signal detection approach

Lack of adaptive method for yesno tasks with con-trolled false alarm rate In Section II I will bring up anumber of advantages of the yesno task in comparisonwith to the 2AFC task (see also Kaernbach 1990) Giventhose yesno advantages one may wonder why the 2AFCmethod is so popular The usual answer is that 2AFC hasno response bias However as has been discussed the yesno method allows an unbiased dcent to be calculated and the2AFC method does allow an interval bias that affects dcentAnother reason for the prevalence of 2AFC experimentsis that a multitude of adaptive methods are available for2AFC but barely any available for an objective yesnotask in which the false alarm rate is measured so that dcentcan be calculated In Table 1 with one exception the rowswith adaptive methods are associated with the forcedchoice method The exception is Leek (2001) who dis-cusses adaptive yesno methods in which no blank trialsare presented In that case the false alarm rate is notmeasured so dcent cannot be calculated This type of yesnomethod does not belong in the ldquoobjectiverdquo category of con-cern for the present paper The ldquo1990rdquo entries in Table 1refer to Kaernbachrsquos (1990) description of a staircasemethod for an objective yesno task in which an equalnumber of blanks are intermixed with the signal trialsSince Kaernbach could have used that method for Kaern-bach (2001b) I placed the ldquo1990rdquo entry in his slot

Kaernbach's (1990) yes/no staircase rules are simple. The signal level changes according to a rule such as the following: Move down one level for correct responses, a hit <yes|signal> or a correct rejection <no|blank>; move up three levels for wrong responses, a miss <no|signal> or a false alarm <yes|blank>. This rule, similar to the one down–three up rule used in 2AFC, places the trials so that the average of the hit rate and correct rejection rate is 75%.

What is now needed is a mechanism to get the observer to establish an optimal criterion that equalizes the number of "yes" and "no" responses (the ROC negative diagonal). This situation is identical to the problem of getting 2AFC observers to equalize the number of responses to Intervals 1 and 2. The quadratic bias in d′ is the same in both cases. The simplest way to get subjects to equalize their responses is to give them feedback about any bias in their responses. With equal "yes" and "no" responses, the 75% correct corresponds to a z score of 0.674 and a d′ = 2z = 1.349. If the subject does not have equal numbers of "yes" and "no" responses, then the d′ would be calculated by d′ = zhit − zfalse alarm. I do hope that Kaernbach's clever yes/no objective staircase will be explored by others. One should be able to enhance it with ratings and multiple stimuli (Klein, 2002).
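A minimal Matlab/Octave sketch of this objective yes/no staircase rule follows. It is an illustration added here (not code from the article), with a hypothetical d′-versus-level function standing in for the observer:

  nTrials = 200; level = 10;               % starting level (arbitrary units)
  levels = zeros(1,nTrials);
  dHyp = @(lev) (lev/8).^2;                % hypothetical d' as a function of level
  Phi  = @(z) 0.5*(1 + erf(z/sqrt(2)));    % cumulative normal
  for t = 1:nTrials
    levels(t) = level;
    isSignal = rand < 0.5;                 % half the trials are blanks
    pYes = Phi(isSignal*dHyp(level) - dHyp(level)/2);   % unbiased criterion midway
    sayYes  = rand < pYes;
    correct = (sayYes == isSignal);        % hit or correct rejection
    if correct
      level = level - 1;                   % one level down for a correct response
    else
      level = level + 3;                   % three levels up for a miss or false alarm
    end
    level = max(level, 1);                 % keep the level positive
  end
  threshEst = mean(levels(51:end));        % average the tested levels after a warm-up

The staircase converges where the average of the hit and correct rejection rates is about 75%, that is, near d′ = 1.35, as stated above.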

Forced choice detection. As can be seen in the second column of Table 1, a popular category in this special issue is the forced choice method for detection. This method is used by Strasburger (2001b), Kaernbach (2001a), Linschoten et al. (2001), and Wichmann and Hill (2001a, 2001b). These researchers use PFs based on probability ordinates. The connection between P and d′ is different from the yes/no case given by Equation 5. For 2AFC, signal detection theory provides a simple connection between d′ and probability correct: d′(x) = √2 z(x). In the preceding section I discussed the option of averaging the probabilities and then taking the z score (high-threshold approach) or averaging the z scores and then calculating the probability (signal detection approach). The signal detection method is better because it has a stronger empirical basis and avoids bias.

For an m-AFC task with m > 2, the connection between d′ and P is more complicated than it is for m = 2. The connection, given by a fairly simple integral (Green & Swets, 1966; Macmillan & Creelman, 1991), has been tabulated by Hacker and Ratcliff (1979) and by Macmillan and Creelman (1991). One problem with these tables is that they are based on the assumption of unity ROC slope. It is known that there are many cases in which the ROC slope is not unity (Green & Swets, 1966), so these tables connecting

Table 1
Classification of the 10 Articles in This Special Issue According to Three Distinctions: Forced Choice Versus Yes/No, Detection Versus Discrimination, Adaptive Versus Constant Stimuli

                              Detection             Discrimination           Adaptive or
Source                        m-AFC      Yes/No     m-AFC          Yes/No    Constant Stimuli
Leek (2001)                   General    loose γ    ✓              loose γ   A
Wichmann & Hill (2001a)       2AFC       (✓)        (✓)            (✓)       C
Wichmann & Hill (2001b)       2AFC       (✓)        (✓)            (✓)       C
Linschoten et al. (2001)      2AFC       –          –              –         A
Strasburger (2001a)           General    ✓          ✓              –         C
Strasburger (2001b)           10AFC      –          –              –         A
Kaernbach (2001a)             General    ✓          –              –         A
Kaernbach (2001b)             General    (1990)     ✓              (1990)    A
Miller & Ulrich (2001)        –          for γ = 0  ✓ (for 0–100%) ✓         C
Klein (present)               General    ✓          ✓              ✓         Both


d′ and probability correct should be treated cautiously. W. P. Banks and I (unpublished) investigated this topic and found that near the P = 50% correct point the dependence of d′ on ROC slope is minimal. Away from the 50% point the dependence can be strong.
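For readers who would rather compute than interpolate tables, the standard unity-ROC-slope integral connecting d′ to percent correct in an m-AFC task, Pc = ∫ φ(t − d′) Φ(t)^(m−1) dt, can be evaluated numerically. The following Matlab/Octave sketch is an illustration added here (not from the article), using a simple trapezoidal sum:

  dprime = 1; m = 10;                         % example: d' = 1 in a 10AFC task
  t   = -8:0.01:8;                            % integration grid
  phi = exp(-0.5*(t - dprime).^2)/sqrt(2*pi); % normal density centered on the signal
  Phi = 0.5*(1 + erf(t/sqrt(2)));             % cumulative normal for the m-1 noise alternatives
  Pc  = trapz(t, phi .* Phi.^(m-1));          % probability correct
  fprintf('m = %d, d'' = %.2f  ->  Pc = %.4f\n', m, dprime, Pc)

As a self-check, setting m = 2 returns Pc = Φ(d′/√2), the familiar 2AFC relation.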

YesNo detection Linschoten et al (2001) comparethree methods (limits staircase 2AFC likelihood) for mea-suring thresholds with a small number of trials In themethod of limits one starts with a subthreshold stimulusand gradually increases the strength On each trial the ob-server says ldquoyesrdquo or ldquonordquo with respect to whether or not thestimulus is detected Although this is a classic yesno de-tection method it will not be discussed in this article be-cause there is no control of response bias That is blankswere not intermixed with signals

In Table 1 the Wichmann and Hill (2001a 2001b) arti-cles are marked with 3s in parentheses because althoughthese authors write only about 2AFC they mention that alltheir methods are equally applicable for yesno or m-AFCof any type (detectiondiscrimination) And their Matlabimplementations are fully general All the special issue ar-ticles reporting experimental data used a forced choicemethod This bias in favor of the forced choice methodol-ogy is found not only in this special issue it is widespreadin the psychophysics community Given the advantages ofthe yesno method (see discussion in Section II) I hopethat once yesno adaptive methods are accepted they willbecome the method of choice

m-AFC discrimination It is common practice to rep-resent 2AFC results as a plot of percent correct averagedover all trials versus stimulus strength This PF goes from50 to 100 The asymmetry between the lower andupper asymptotes introduces some inefficiency in thresh-old estimation as will be discussed Another problem withthe standard 50 to 100 plot is that a bias in choice ofinterval will produce an underestimate of dcent as has beenshown earlier Researchers often use the 2AFC method be-cause they believe that it avoids biased threshold estimatesIt is therefore surprising that the relatively simple correc-tion for 2AFC bias discussed earlier is rarely done

The issue of bias in m-AFC also occurs when m 2 InStrasburgerrsquos 10AFC letter discrimination task it is com-mon for subjects to have biases for responding with par-ticular letters when guessing Any imbalance in the re-sponse bias for different letters will result in a reductionof dcent as it did in 2AFC

There are several benefits of viewing the 2AFC dis-crimination data in terms of a PF going from 0 to 100Not only does it provide a simple way of viewing and cal-culating the interval bias it also enables new methods forestimating the PF parameters such as those proposed byMiller and Ulrich (2001) as will be discussed in Sec-tion III In my original comments on the Miller and Ulrichpaper I pointed out that because their nonparametric pro-cedure has uniform weighting of the different PF levelstheir method does not apply to 2AFC tasks The asymmet-ric binomial error bars near the 50 and 100 levelscause the uniform weighting of the Miller and Ulrich ap-

proach to be nonoptimal However I now realize that the2AFC discrimination task can be fit by a cumulative nor-mal going from 0 to 100 Because of that insight Ihave marked the discrimination forced choice column ofTable 1 for the Miller and Ulrich (2001) paper with theproviso that the PF goes from 0 to 100 Owing to thepopularity of 2AFC this modification greatly expands therelevance of their nonparametric approach

YesNo discrimination Kaernbach (2001b) and Millerand Ulrich (2001) offer theoretical articles that examineproperties of PFs that go from P 5 0 to 100 (P 5 pin Equation 1) In both cases the PF is the cumulative nor-mal (Equation 3) Although these PFs with g 5 0 couldbe for a yesno detection task with a zero false alarm rate(not plausible) or a forced choice detection task with an in-finite number of alternatives (not plausible either) I sus-pect that the authors had in mind a yesno discriminationtask (Table 1 col 5) A typical discrimination task in vi-sion is contrast discrimination in which the observer re-sponds to whether the presented contrast is greater than orless than a memorized reference Feedback reinforces thestability of the reference In a typical discrimination taskthe reference is one exemplar from a continuum of stimu-lus strengths If the reference is at a special zero pointrather than being an element of a smooth continuum thetask is no longer a simple discrimination task Zero con-trast would be an example of a special reference Klein(1985) discusses several examples which illustrate how anatural zero can complicate the analysis One might won-der how to connect the PF from the detection regime inwhich the reference is zero contrast to the discriminationregime in which the reference (pedestal) is at a high con-trast The dcent function to be introduced in Equation 20 doesa reasonably good job of fitting data across the full range ofpedestal strength going from detection to discrimination

The connection of the discrimination PF to the signal detection PF is the same as that for yes/no detection given in Equation 5: d′(x) = z(x) − z(0), where z is the z score of the probability of a "greater than" judgment. In a discrimination task, the bias z(0) is the z score of the probability of saying "greater" when the reference stimulus is presented. The stimulus strength x that gives z(x) = 0 is the point of subjective equality (x = PSE). If the cumulative normal PF of Equation 3 is used, then the z score is linearly proportional to the stimulus strength: z(x) = (x − PSE)/threshold, where threshold is defined to be the point at which d′ = 1. I will come back to these distinctions between detection and discrimination PFs after presenting more groundwork regarding thresholds, log abscissas, and PF shapes (see Equations 14 and 17).

Definition of Threshold
Threshold is often defined as the stimulus strength that

produces a probability correct halfway up the PF If humansoperated according to a high-threshold assumption thisdefinition of threshold would be stable across different ex-perimental methods However as I discussed followingEquation 2 high-threshold theory has been discredited


According to the more successful signal detection theory(Green amp Swets 1966) the dcent at the midpoint of the PFchanges according to the number of alternatives in aforced choice method and according to the false alarm ratein a yesno method This variability of dcent with method is agood reason not to define threshold as the halfway pointof the PF

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1 for simplicity. This definition applies to both yes/no and m-AFC tasks and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = 15.87%, corresponding to a z score of z = −1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.
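These threshold probabilities are easy to verify numerically. The short Matlab/Octave sketch below (a check added here for illustration) computes the probability correct at a d′ = 1 threshold for 2AFC and the hit rates at d′ = 1 and d′ = 2 for the yes/no example with a 15.87% false alarm rate:

  Phi = @(z) 0.5*(1 + erf(z/sqrt(2)));   % cumulative normal
  % 2AFC: P = Phi(d'/sqrt(2)), so d' = 1 gives about 76% correct.
  P2afc = Phi(1/sqrt(2))                 % 0.7602
  % Yes/no: z(hit) = z(false alarm) + d', with false alarm rate 15.87% (z = -1).
  zFA = -1;
  Phit_d1 = Phi(zFA + 1)                 % 0.5000 at d' = 1
  Phit_d2 = Phi(zFA + 2)                 % 0.8413 at d' = 2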

Strasburger's Suggestion on Specifying Slope: A Logarithmic Abscissa

In many of the articles in this special issue a logarithmicabscissa such as decibels is used Many shapes of PFs havebeen used with a log abscissa Most delightful but frus-trating is Strasburgerrsquos (2001a) paper on the PF maximumslope He compares the Weibull logistic Quick cumula-tive normal hyperbolic tangent and signal detection dcentusing a logarithmic abscissa The present section is the out-come of my struggles with a number of issues raised byStrasburger (2001a) and my attempt to clarify them

I will typically express stimulus strength x in threshold units:

xt = x/α,    (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units y as well as in linear units x:

yt = loge(xt) = y − Y,    (9)

where y = loge(x) is the natural log of the stimulus and Y = loge(α) is the threshold on the log abscissa.

The slope of the psychometric function P(yt) with a logarithmic abscissa is

slopelog(yt) = dP(yt)/dyt    (10A)
            = (1 − γ) dp(yt)/dyt,    (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log–log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log–log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log–log slope is β for a Weibull PF (Equation 12) and b for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log–log slope using a d′ ordinate.

A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per loge) are not familiar. Then it dawned on me that there is a simple connection between slope with a loge abscissa, slopelog = dP(yt)/dyt, and slope with a linear abscissa, slopelin = dP(xt)/dxt, namely

slopelin = dP(xt)/dxt = [dP(yt)/dyt] × dyt/dxt = slopelog/xt,    (11)

because dyt dxt 5 [d loge(xt)] (dxt) 5 1xt At thresholdxt 5 1 ( yt 5 0) so at that point Strasburgerrsquos slope with alogarithmic axis is identical to my familiar slope plotted inthreshold units on a linear axis The simple connection be-tween slope on the log and linear abscissas converted me tobeing a strong supporter of using a natural log abscissa

The Weibull and cumulative normal psychometric functions. To provide a background for Strasburger's article, I will discuss three PFs, Weibull, cumulative normal, and d′, as well as their close connections. A common parameterization of the PF is given by the Weibull function:

pweib(xt) = 1 − k^(xt^β),    (12)

where pweib(xt), the PF that goes from 0% to 100%, is related to Pweib(xt), the probability of a correct response, by Equation 1, β is the slope, and k controls the definition of threshold: pweib(1) = 1 − k is the percent correct at threshold (xt = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

pweib(yt) = 1 − k^[exp(βyt)].    (13)

Panel a of Figure 1 is a plot of Equation 12 (Weibull function as a function of x on a linear abscissa) for the case β = 2 and k = exp(−1) = 0.368. The choice β = 2 makes the Weibull an upside-down Gaussian. Panel d is the same function, this time plotted as a function of y, corresponding to a natural log abscissa. With this choice of k, the point of maximum slope as a function of yt is the threshold point (yt = 0). The point of maximum slope at yt = 0 is marked with an asterisk in panel d. The same point in panel a, at xt = 1, is not the point of maximum slope on a linear abscissa, because of Equation 11. When plotted as a d′ function [a z-score transform of P(xt)] in panel b, the Weibull accelerates below threshold and decelerates above thresh-


old in agreement with a wide range of experimental dataThe acceleration and deceleration are most clearly seen bythe slope of dcent in the logndashlog coordinates of panel e Thelogndashlog dcent slope is plotted in panels c and f At xt 5 0 thelogndashlog slope is 2 corresponding to our choice of b 5 2The slope falls surprisingly rapidly as stimulus strength in-creases and the slope is near 1 at xt 5 2 If we had chosenb 5 4 for the Weibull function then the logndashlog dcent slopewould have gone from 4 at xt 5 0 to near 2 at xt 5 2Equation 20 will present a dcent function that captures thisbehavior

Another PF commonly used is based on the cumulative normal (Equation 3):

pF(yt) = Φ(yt/σ)    (14A)

or

z(yt) = yt/σ = (y − Y)/σ,    (14B)

where σ is the standard error of the underlying Gaussian function (its connection to β will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to −∞, needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissa.

Two examples of Strasburger's maximum slope (his Equations 7 and 17) are

β′ = β exp(−1) = 0.368β    (15)

for the Weibull function (Equation 13) and

β′ = 1/(σ√(2π)) = 0.399/σ    (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(−1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (yt = 0). Note that Equations 12–16 are missing the (1 − γ) factor that is present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.

An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = 0.5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(xt = 1) = 75%. At 10% above threshold, P(xt = 1.1) ≈ .75 + .1β′ = .7899, which is quite close to the exact value of .7896. This example shows how β′, defined with a natural log abscissa yt, is relevant to a linear abscissa xt.

Now that the logarithmic abscissa has been introducedthis is a good place to stop and point out that the log ab-scissa is fine for detection but not for discrimination since

the x 5 0 point is often in the middle of the range How-ever the cumulative normal PF is especially relevant to dis-crimination where the PF goes from 0 to 100 andwould be written as

pF(x) = PF(x) = Φ[(x − PSE)/α]    (17A)

or

zF(x) = (x − PSE)/α,    (17B)

where x = PSE is the point of subjective equality. The parameter α is the threshold, since d′(α) = zF(α) − zF(0) = 1. The threshold α is also the standard deviation of the Gaussian probability density function (pdf). The reciprocal of α is the PF slope. I tend to use the letter σ for a unitless standard deviation, as occurs for the logarithmic variable y; I use α for a standard deviation that has the units of the stimulus x (like percent contrast). The comparison of Equation 14B for detection and Equation 17B for discrimination is useful. Although Equation 14 is used in most of the articles in this special issue, which are concerned with detection tasks, the techniques that I will be discussing are also relevant to discrimination tasks, for which Equation 17 is used.

Threshold and the Weibull Function
To illustrate how a d′ = 1 definition of threshold works for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = 15.87%, corresponding to a z score of zFA = −1, as in Figure 1. The z score at threshold (d′ = 1) is zTh = zFA + 1 = 0, corresponding to a probability of P(1) = 50%. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)]/[1 − P(0)] = 0.5943. The Weibull function becomes

Pweib(xt) = 1 − (1 − 0.1587) × 0.5943^(xt^β).    (18A)

If I had defined d′ = 2 to be threshold, then zTh = zFA + 2 = 1, leading to k = (1 − 0.8413)/(1 − 0.1587) = 0.1886, giving

Pweib(xt) = 1 − (1 − 0.1587) × 0.1886^(xt^β).    (18B)

As another example, suppose the false alarm rate in a yes/no task is at 50%, not uncommon in a signal detection experiment with blanks and test stimuli intermixed. Then threshold at d′ = 1 would occur at 84.13%, and the PF would be

Pweib(xt) = 1 − 0.5 × 0.3173^(xt^β),    (18C)

with Pweib(0) = 50% and Pweib(1) = 84.13%. This case corresponds to defining d′ on the ROC vertical intercept.

For 2AFC, the connection between d′ and z is z = d′/2^0.5. Thus zTh = 2^−0.5 = 0.7071, corresponding to Pweib(1) = 76.02%, leading to k = 0.4795 in Equation 12. These connections will be clarified when an explicit form (Equation 20) is given for the d′ function d′(xt).
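The k values quoted in this subsection all follow from k = [1 − P(1)]/[1 − P(0)] once the threshold probability P(1) is fixed by the chosen d′ criterion. The small Matlab/Octave sketch below (a check added for illustration) reproduces the four numbers:

  Phi = @(z) 0.5*(1 + erf(z/sqrt(2)));
  % Yes/no, false alarm rate 15.87% (zFA = -1), threshold at d' = 1 and d' = 2:
  P0 = Phi(-1);
  k18A = (1 - Phi(-1 + 1))/(1 - P0)        % 0.5943
  k18B = (1 - Phi(-1 + 2))/(1 - P0)        % 0.1886
  % Yes/no with a 50% false alarm rate, threshold at d' = 1:
  k18C = (1 - Phi(0 + 1))/(1 - 0.5)        % 0.3173
  % 2AFC, threshold at d' = 1, where P = Phi(1/sqrt(2)) = 0.7602:
  k2AFC = (1 - Phi(1/sqrt(2)))/(1 - 0.5)   % 0.4795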


Complications With Strasburger's Advocacy of Maximum Slope
Here I will mention three complications implicated in Strasburger's suggestion of defining slope at the point of maximum slope on a logarithmic abscissa.

1 As can be seen in Equation 11 the slopes on linearand logarithmic abscissas are related by a factor of 1xtBecause of this extra factor the maximum slope occursat a lower stimulus strength on linear as compared withlog axes This makes the notion of maximum slope lessfundamental However since we are usually interested inthe percent error of threshold estimates it turns out thata logarithmic abscissa is the most relevant axis support-ing Strasburgerrsquos log abscissa definition

2. The maximum slope is not necessarily at threshold (as Strasburger points out). For the Weibull functions defined in Equation 13 with k = exp(−1) and the cumulative normal function in Equation 14, the maximum slope (on a log axis) does occur at threshold. However, the two points are decoupled in the generalized Weibull function defined in Equation 12. For the Quick version of the Weibull function (Equation 13 with k = 0.5, placing threshold halfway up the PF), the threshold is below the point of maximum slope; the derivative of P(xt) at threshold is

β′thresh = dP(xt)/dxt = (1 − γ)β loge(2)/2 = 0.347(1 − γ)β,

which is slightly different from the maximum slope as given by Equation 15. Similarly, when threshold is defined at d′ = 1, the threshold is not at the point of maximum slope. People who fit psychometric functions would probably prefer reporting slopes at the detection threshold rather than at the point of maximum slope.

3 One of the most important considerations in selectinga point at which to measure threshold is the question ofhow to minimize the bias and standard error of the thresh-old estimate The goal of adaptive methods is to place tri-als at one level In order to avoid a threshold bias due to aimproper estimate of PF slope the test level should be atthe defined threshold The variance of the threshold esti-mate when the data are concentrated as a single level (aswith adaptive procedures) is given by Gourevitch and Gal-anter (1967) as

var(Y) = [P(1 − P)/N] / [dP(yt)/dyt]².    (19)

If the binomial error factor of P(1 P)N were not presentthe optimal placement of trials for measuring thresholdwould be at the point of maximum slope However thepresence of the P(1 P) factor shifts the optimal point toa higher level Wichmann and Hill (2001b) consider howtrial placement affects threshold variance for the methodof constant stimuli This topic will be considered in Sec-tion II

Connecting the Weibull and d′ Functions
Figures 5–7 of Strasburger (2001a) compare the log–log d′ slope b with the Weibull PF slope β. Strasburger's connection of b = 0.88β is problematic, since the d′ version of the Weibull PF does not have a fixed log–log slope. An improved d′ representation of the Weibull function is the topic of the present section. A useful parameterization for d′, which I shall refer to as the Stromeyer–Foley function, was introduced by Stromeyer and Klein (1974) and used extensively by Foley (1994):

d′(xt) = xt^b / [a + (1 − a) xt^(b − 1 + w)],    (20)

where xt is the stimulus strength in threshold units, as in Equation 8. The factors with a in the denominator are present so that d′ equals unity at threshold (xt = 1, or x = α). At low xt, d′ ≈ xt^b/a. The exponent b (the log–log slope of d′ at very low contrasts) controls the amount of facilitation near threshold. At high xt, Equation 20 becomes d′ ≈ xt^(1−w)/(1 − a). The parameter w is the log–log slope of the test threshold versus pedestal contrast function (the tvc or "dipper" function) at strong pedestals. One must be a bit cautious, however, because in yes/no procedures the tvc slope can be decoupled from w if the signal detection ROC curve has nonunity slope (Stromeyer & Klein, 1974). The parameter a controls the point at which the d′ function begins to saturate. Typical values of these unitless parameters are b = 2, w = 0.5, and a = 0.6 (Yu, Klein, & Levi, 2001). The function in Equation 20 accelerates at low contrast (log–log slope of b) and decelerates at high contrasts (log–log slope of 1 − w), in general agreement with a broad range of data.

For 2AFC, z = d′/√2, so from Equation 3 the connection between d′ and probability correct is

P(xt) = .5 + .5 erf[d′(xt)/2].    (21)

To establish the connection between the PFs specified in Equations 12 and 20–21, one must first have the two agree at threshold. For 2AFC with d′ = 1, the threshold at P = 76.02% can be enforced by choosing k = 0.4795 (see the discussion following Equation 18C), so that Equations 1 and 12 become

P(xt) = 1 − .5 × .4795^(xt^β).    (22)

Modifying k as in Equation 22 leaves the Weibull shape unchanged and shifts only the definition of threshold. If b = 1.06β, w = 1 − 0.39β, and a = 0.614, then for all values of β the Weibull and d′ functions (Equations 21 and 22 vs. Equation 12) differ by less than 0.0004 for all stimulus strengths. At very low stimulus strengths, b = β. The value b = 1.06β is a compromise for getting an overall optimal fit.
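The closeness of this match is easy to check numerically. The sketch below (a Matlab/Octave check added for illustration, using the parameter mapping just quoted) compares the 2AFC Weibull of Equation 22 with the d′ function of Equations 20–21 over a wide range of stimulus strengths:

  beta = 2;                                  % Weibull slope
  b = 1.06*beta; w = 1 - 0.39*beta; a = 0.614;       % "Weibull parameters" for Equation 20
  xt = 10.^(-1:0.01:0.5);                    % stimulus strength in threshold units
  Pweib = 1 - 0.5*0.4795.^(xt.^beta);        % Equation 22 (2AFC Weibull, threshold at d' = 1)
  dprime = xt.^b ./ (a + (1 - a)*xt.^(b - 1 + w));   % Equation 20 (Stromeyer-Foley)
  Pd = 0.5 + 0.5*erf(dprime/2);              % Equation 21 (2AFC probability correct)
  maxDiff = max(abs(Pweib - Pd))             % a few parts in 10^4, per the match described above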

Strasburger (2001a) is concerned with the same issue. In his Table 1, he reports d′ log–log slopes of [0.8847, 1.8379, 3.131, 4.421] for β = [1, 2, 3.5, 5].


If the d′ slope is taken to be a measure of the d′ exponent b, then the ratio b/β is [0.8847, 0.9190, 0.8946, 0.8842] for the four β values. Our value of b/β = 1.06 differs substantially from 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a). Pelli (1985) and Strasburger (2001a) used d′ functions with a = 1 in Equation 20. With a = 0.614, our d′ function (Equation 20) starts saturating near threshold, in agreement with experimental data. The saturation lowers the effective value of b. I was very surprised to find that the Stromeyer–Foley d′ function did such a good job in matching the Weibull function across the whole range of β and x. For a long time I had thought a different fit would be needed for each value of β.

For the same Weibull function in a yes/no method (false alarm rate = 50%), the parameters b and w are identical to the 2AFC case above. Only the value of a shifts, from a = 0.614 (2AFC) to a = 0.54 (yes/no). In order to make xt = 1 at d′ = 1, k becomes 0.3173 (see Equation 18C) instead of 0.4795 (2AFC). As with 2AFC, the difference between the Weibull and d′ function is less than 0.004 (<0.4%) for all stimulus strengths and all values of β, and for both a linear and a logarithmic abscissa. These values mean that, for both the yes/no and the 2AFC methods, the d′ function (Equation 20) corresponding to a Weibull function with β = 2 has a low-contrast log–log slope of b = 2.12 and a high-contrast log–log slope of 1 − w = 0.78. The high-contrast tvc (test contrast vs. pedestal contrast discrimination function, sometimes called the "dipper" function) log–log slope would be w = 0.22.

I also carried out a similar analysis asking what cumulative normal σ is best matched to the Weibull. I found σ = 1.1720/β. However, the match is poor, with errors of 4% (rather than <0.4% in the preceding analysis). The poor fit of the two functions makes the fitting procedure sensitive to the weighting used in the fitting procedure. If one uses a weighting factor of [P(1 − P)/N], where P is the PF, one will find a quite different value for σ than if one ignored the weighting, or if one chose the weighting on the basis of the data rather than the fitting function.

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section inspired by Linschoten et al (2001) is anexamination of various experimental methods for gather-ing data to estimate thresholds Linschoten et al comparethree 2AFC methods for measuring smell and taste thresh-olds a maximum-likelihood adaptive method an upndashdownstaircase method and an ascending method of limits Thespecial aspect of their experiments was that in measuringsmell and taste each 2AFC trial takes between 40 and60 sec (Lew Harvey personal communication) because ofthe long duration needed for the nostrils or mouth to re-cover their sensitivity The purpose of the Linschoten et alpaper is to examine which of the three methods is mostaccurate when the number of trials is low Linschoten et al

conclude that the 2AFC method is a good method for thistask My prior experience with forced choice tasks and lownumber of trials (Manny amp Klein 1985 McKee et al1985) made me wonder whether there might be better meth-ods to use in circumstances in which the number of trialsis limited This topic is considered in Section II

Compare 2AFC to Yes/No
The commonly accepted framework for comparing 2AFC and yes/no threshold estimates is signal detection theory. Initially, I will make the common assumption that the d′ function is a power function (Equation 20 with a = 1):

d′ = xt^b = exp(byt).    (23)

Later in this section the experimentally more accurateform Equation 20 with a 1 will be used The varianceof the threshold estimate will now be calculated for 2AFCand yesno for the simplest case of trials placed at a sin-gle stimulus level y the goal of most adaptive methodsThe PF slope relates ordinate variance to abscissa vari-ance (from Gourevitch amp Galanter 1967)

var(Y) = var(d′) / (dd′/dyt)²,    (24)

where Y 5 y yt is the threshold estimate for a natural logabscissa as given in Equation 9 This formula for var(Y )is called the asymptotic formula in that it becomes exactin the limit of large number of total trials N Wichmannand Hill (2001b) show that for the method of constantstimuli a proper choice of testing levels brings one to theasymptotic region for N 120 (the smallest value thatthey explored) Improper choice of testing levels requiresa larger N in order to get to the asymptotic region Equa-tion 24 is based on having the data at a single level as oc-curs asymptotically in adaptive procedures In addition thedefinition of threshold must be at the point being testedIf levels are tested away from threshold there will be anadditional contribution to the threshold variance (Finney1971 McKee et al 1985) if the slope is a variable param-eter in the fitting procedure Equation 24 reaches its as-ymptotic value more rapidly for adaptive methods with fo-cused placement of trials than with the constant stimulusmethod when slope is a free parameter

Equation 24 needs analytic expressions for the derivative and the variance of d′. The derivative is obtained from Equation 23:

dd′/dyt = bd′.    (25)

The variance of d′ for both yes/no (d′ = zhit + zcorrect rejection) and 2AFC (d′ = √2 zave) is

var(d′) = 2 var(z) = 4π exp(z²) var(P).    (26)


For the yes no case Equation 26 is based on a unityslope ROC curve with the subjectrsquos criterion at the neg-ative diagonal of the ROC curve where the hit rate equalsthe correct rejection rate This would be the most effi-cient operating point The variance for a nonunity slopeROC curve was considered by Klein Stromeyer andGanz (1974) From binomial statistics

var(P) = P(1 − P)/n,    (27)

with n = N for 2AFC and n = N/2 for yes/no, since for this binary yes/no task the number of signal and blank trials is half the total number of trials.

Putting all of these factors together gives

var(Y) = 4π exp(z²) P(1 − P) / [n(bd′)²]    (28)

for both the yes/no and the 2AFC cases. The only difference between the two cases is the definition of n and the definition of d′ (d′2AFC = 2^0.5 z and d′YN = 2z for a symmetric criterion). When this is expressed in terms of N (number of trials) and z (z score of probability correct), we get

var(Y) = 2π exp(z²) P(1 − P) / [N(bz)²]    (29)

for both 2AFC and yes/no. That is, when the percent corrects are all equal (P2AFC = Phit = Pcorrect rejection), the threshold variances for 2AFC and yes/no are equal. The minimum value of var(Y) is 1.64/(Nb²), occurring at z = 1.57 (P = 94%). This value of 94% correct is the optimum value at which to test for both 2AFC and yes/no tasks, independent of b.
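The optimal placement quoted here can be reproduced with a simple grid search over z. The following Matlab/Octave sketch (a check added for illustration) evaluates Equation 29 with N = 1 and b = 1 and finds its minimum:

  z = 0.1:0.001:3;                             % z score of probability correct
  P = 0.5*(1 + erf(z/sqrt(2)));                % probability correct
  varY = 2*pi*exp(z.^2).*P.*(1 - P)./(z.^2);   % Equation 29 with N = 1, b = 1
  [vmin, i] = min(varY);
  fprintf('minimum var(Y)*N*b^2 = %.2f at z = %.2f (P = %.3f)\n', vmin, z(i), P(i))
  % prints roughly 1.64 at z = 1.57 (P = 0.94)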

This result that the threshold variance is the same for2AFC and yesno methods seems to disagree with Figure 4of McKee et al (1985) where the standard errors of thethreshold estimates are plotted for 2AFC and yesno asa function of the number of trials The 2AFC standarderror (their Figure 4) is twice that of yesno correspond-ing to a factor of four in variance However the McKeeet al analysis is misleading since all they (we) did for theyesno task was to use Equation 2 to expand the PF to a 0to 100 range thereby doubling the PF slope Thatrescaling is not the way to convert a 2AFC detection taskto the yesno task of the same detectability A signal de-tection methodology is needed for that conversion aswas done in the present analysis

One surprise is that the optimal testing probability is independent of the PF slope b. This finding is a general property of PFs in which the dependence on the slope parameter has the form x^b. The 94% level corresponds to d′YN = 3.15 and d′2AFC = 2.23. For the commonly used P = 75% (z = 0.674, d′YN = 1.35, d′2AFC = 0.954), var(Y) = 4.08/(Nb²), which is about 2.5 times the optimal value obtained by placing the stimuli at 94%. Green (1990) had previously pointed out that the 94% and 92% levels were the optimal points to test for the cumulative

normal and logit functions respectively because of theminimum threshold estimate variability when one is test-ing at that point on the PF Other PFs can have optimalpoints at slightly lower levels but these are still above thelevels to which adaptive methods are normally directed

It is also useful to compare the 2AFC and yes/no methods at the same d′, rather than at their optimal points. In that case, for d′ values fixed at [1, 2, 3], the ratio of the 2AFC variance to the yes/no variance is [1.82, 1.36, 0.79]. At low d′ there is an advantage for the 2AFC method, and that reverses at high d′, as is expected from the optimal d′ being higher for yes/no than for 2AFC. In general, one should run the yes/no method at d′ levels above 2.0 (84% correct in the blank and signal intervals corresponds to d′ = 2).

The equality of the optimal var(Y ) for 2AFC and yesno methods is not only surprising it seems paradoxicalmdashas can be seen by the following gedankenexperiment Sup-pose subjects close their eyes on the first interval of the2AFC trial and base their judgments on just the secondinterval as if they were doing a yesno task since blanksand stimuli are randomly intermixed Instead of sayingldquoyesrdquo or ldquonordquo they would respond ldquoInterval 2rdquo or ldquoInter-val 1rdquo respectively so that the 2AFC task would become ayesno task The paradox is that one would expect the vari-ance to be worse than for the eyes open 2AFC case sinceone is ignoring information However Equation29 showsthat the 2AFC and yesno methods have identical optimalvariance I suspect that the resolution of this paradox is thatin the yesno method with a stable criterion the optimalvariance occurs at a higher dcent value (315) than the 2AFCvalue (223) The dcent power law function that I chose doesnot saturate at large dcent levels giving a benefit to the yesno method

In order to check whether the paradox still holds with a more realistic PF (a d′ function that saturates at large d′), I now compare the 2AFC and yes/no variance using a Weibull function with a slope β. In order to connect 2AFC and yes/no, I need to use signal detection theory and the Stromeyer–Foley d′ function (Equation 20) with the "Weibull parameters" b = 1.06β, a = 0.614, and w = 1 − 0.39β. Following Equation 22, I pointed out that with these parameter values the difference between the 2AFC Weibull function and the 2AFC d′ function (converted to a probability ordinate) is less than 0.0004. After the derivative, Equation 25, is appropriately modified, we find that the optimal 2AFC variance is 3.42/(Nβ²). For the yes/no case, the optimal variance is 4.02/(Nβ²). This result resolves the paradox. The optimal 2AFC variance is about 18% lower than the yes/no variance, so closing one's eyes in one of the intervals does indeed hurt.

We shouldn't forget that if one counts stimulus presentations Np rather than trials N, the optimal 2AFC variance becomes 6.84/(Npβ²), as opposed to 4.02/(Npβ²) for yes/no. Thus, for the Linschoten experiments, with each stimulus presentation being slow, the yes/no methodology is more accurate for a given number of stimulus presentations, even at low d′ values. Kaernbach (1990) compares 2AFC


and yesno adaptive methods with simulations and ac-tual experiments His abscissa is number of presenta-tions and his results are compatible with our findings

The objective yesno method that I have been discussingis inefficient since 50 of the trials are blanks Greaterefficiency is possible if several points on the PF are mea-sured (method of constant stimuli) with the number of blanktrials the same as at the other levels For example if fivelevels of stimuli were measured including blanks then theblanks would be only 20 of the total rather than 50 Forthe 2AFC paradigm the subject would still have to look at50 of the presentations being blanks The yesno eff i-ciency can be further improved by having the subjectrespond to the multiple stimuli with multiple ratings ratherthan a binary yesno response This rating scale methodof constant stimuli is my favorite method and I have beenusing it for 20 years (Klein amp Stromeyer 1980) The mul-tiple ratings allow one to measure the psychometric func-tion shape for dcents as large as 4 (well into the dipper regionwhere stimuli are slightly above threshold) It also allowsone to measure the ROC slope (presence of multiplicativenoise) Simulations of its benefits will be presented inKlein (2002)

Insight into how the number of responses affects threshold variance can be obtained by using the cumulative normal PF (Equation 17), considered by Kaernbach (2001b) and Miller and Ulrich (2001), with γ = 0. Typically this is the case of yes/no discrimination, but it can also be 2AFC discrimination, with an ordinate going from 0% to 100%, as I have discussed in connection with Figure 3. For a cumulative normal PF with a binary response and for a single stimulus level being tested, the threshold variance is (Equation 19) as follows:

var(thresh) = var(P)/(dP/dy)² = 2πσ² exp(z²) P(1 − P)/N,    (30)

from Equation 27 and forthcoming Equations 40 and 41. The optimal level to test is at P = 0.5, so the optimal variance becomes

var(thresh) = (π/2) × σ²/N.    (31)

If, instead of a binary response, the observer gives an analog response (many, many rating categories), then the variance is the familiar connection between standard deviation and standard error:

var(thresh) = σ²/N.    (32)

With four response categories, the variance is 1.19σ²/N, which is closer to the analog than to the binary case. If stimuli with multiple strengths are intermixed (method of constant stimuli), then the benefit of going from a binary response to multiple responses is greater than that indicated in Equations 31–32 (Klein, 2002).
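A quick numeric check of the binary case is possible from Equation 30. The Matlab/Octave sketch below (a check added for illustration) evaluates the variance across test levels and confirms the (π/2)σ²/N penalty at P = 0.5 relative to an analog response:

  sigma = 1; N = 100;
  P = 0.01:0.001:0.99;
  z = sqrt(2)*erfinv(2*P - 1);                        % z score of each test level
  varThresh = 2*pi*sigma^2*exp(z.^2).*P.*(1 - P)/N;   % Equation 30 (binary response)
  [vmin, i] = min(varThresh);
  fprintf('binary:  min var = %.4f at P = %.2f (pi/2 * sigma^2/N = %.4f)\n', ...
          vmin, P(i), pi/2*sigma^2/N)
  fprintf('analog:  var = %.4f (sigma^2/N)\n', sigma^2/N)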

A brief list of other problems with the 2AFC method applied to detection comprises the following.

1. 2AFC requires the subject to memorize and then compare the results of two subjective responses. It is cognitively simpler to report each response as it arrives. Subjects without much experience can easily make mistakes (lapses), simply becoming confused about the order of the stimuli. Klein (1985) discusses memory load issues relevant to 2AFC.

2. The 2AFC methodology makes various types of analysis and modeling more difficult, because it requires that one average over all criteria. The response variable in 2AFC is the Interval 2 activation minus the Interval 1 activation (see the abscissa in Figure 2). This variable goes from minus infinity to plus infinity, whereas the yes/no variable goes from zero to infinity. It therefore seems more difficult to model the effects of probability summation and uncertainty for 2AFC than for yes/no.

3. Models that relate psychophysical performance to underlying processes require information about how the noise varies with signal strength (multiplicative noise). The rating scale method of constant stimuli not only measures d′ as a function of contrast, it also is able to provide an estimate of how the variance of the signal distribution (the ROC slope) increases with contrast. The 2AFC method lacks that information.

4. Both the yes/no and 2AFC methods are supposed to eliminate the effects of bias. Earlier I pointed out that in my recent 2AFC discrimination experiments, when the stimulus is near threshold, I find I have a strong bias in favor of responding "Stimulus 2." Until I learned to compensate for this bias (not typically done by naïve subjects), my d′ was underestimated. As I have discussed, it is possible to compensate for this bias, but that is rarely done.

Improving the Brownian (Up–Down) Staircase

Adaptive methods with a goal of placing trials at an optimal point on the PF are efficient. Leek (2001) provides a nice overview of these methods. There are two general categories of adaptive methods: those based on maximum (or mean) likelihood, and those based on simpler staircase rules. The present section is concerned with the older staircase methods, which I will call Brownian methods. I used the name "Brownian" because of the up–down staircase's similarity to one-dimensional Brownian motion, in which the direction of each step is random. The familiar up–down staircase is Brownian because at each trial the decision about whether to go up or down is decided by a random factor governed by probability P(x). I want to make this connection to Brownian motion because I suspect that there are results in the physics literature that are relevant to our psychophysical staircases. I hope these connections get made.

A Brownian staircase has a minimal memory of the run history: it needs to remember the previous step (including direction) and the previous number of reversals or number of trials (used for when to stop and for when to decrease step size). A typical Brownian rule is to decrease signal strength by one step (like 1 dB) after a correct response and to increase it three steps after an incorrect response (a one down–three up rule). I was pleased to learn recently that this rule works nicely for an objective yes/no task (Kaernbach, 1990) as well as 2AFC, as discussed earlier.

In recent years, likelihood methods (Pentland, 1980; Watson & Pelli, 1983) have become popular and have somewhat displaced Brownian methods. There is a widespread feeling that likelihood methods have substantially smaller threshold variance than do staircases in which thresholds are obtained by simply averaging the levels tested. Given the importance of this issue for informing researchers about how best to measure thresholds, it is surprising that there have been so few simulations comparing staircases and optimal methods. Pentland's (1980) results, showing very large threshold variance for a staircase method, are so dramatically different from the results of my simulations that I am skeptical. The best previous simulations of which I am aware are those of Green (1990), who compared the threshold variance from several 2AFC staircase methods with the ideal variance (Equation 24). He found that as long as the staircase rule placed trials above 80% correct, the ratio of observed to ideal threshold variance was about 1.4. This value is the square of the inverse of Green's finding that the ratio of the expected to the observed sigma is about .85.

In my own simulations (Klein, 2002), I found that as long as the final step size was less than 0.2σ, the threshold variance from simply averaging over levels was close to the optimal value. Thus, the staircase threshold variance is about the same as the variance found by other optimal methods. One especially interesting finding was that the common method of averaging an even number of reversal points gave a variance that was about 10% higher than averaging over all levels. This result made me wonder how the common practice of averaging reversals ever got started. Since Green (1990) averaged over just the reversal points, I suspect that his results are an overestimate of the staircase variance. In addition, since Green specifies step size in decibels, it is difficult for me to determine whether his final step size was small enough. Since the threshold variance is inversely related to slope, it is important to relate step size to slope rather than to decibels. The most natural reference unit is σ, the standard deviation associated with the cumulative normal. In summary, I suggest that this general feeling of distrust of the simple Brownian staircase is misguided and that simple staircases are often as good as likelihood methods. In the section Summary of Advantages of Brownian Staircase, I point out several advantages of the simple staircase over the likelihood methods.
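A minimal simulation of the kind described above can be sketched as follows. The psychometric function, staircase rule (2-down–1-up), run length, and step size are my own illustrative choices, not those of Klein (2002) or Green (1990); the point is only to show how threshold variability can be estimated by averaging levels across many simulated runs. (normcdf requires the Statistics Toolbox.)

% Sketch: 2AFC Brownian staircase; threshold estimated by averaging tested levels.
% Illustrative assumptions: cumulative normal PF (sigma = 1), 2-down-1-up rule,
% fixed step of 0.1*sigma, 100 trials per run, 1000 runs.
rng(1); sigma = 1; step = 0.1*sigma; nTrials = 100; nRuns = 1000;
est = zeros(nRuns,1);
for r = 1:nRuns
    x = 2*sigma; nDown = 0; levels = zeros(nTrials,1);
    for t = 1:nTrials
        levels(t) = x;
        pCorrect = 0.5 + 0.5*normcdf(x/sigma);      % 2AFC PF centered on x = 0
        if rand < pCorrect                          % correct response
            nDown = nDown + 1;
            if nDown == 2, x = x - step; nDown = 0; end
        else                                        % incorrect response
            x = x + step; nDown = 0;
        end
    end
    est(r) = mean(levels(21:end));                  % average levels, skipping 20 startup trials
end
% The mean estimate sits at the rule's convergence level (70.7% correct for 2-down-1-up);
% the SD across runs is the threshold variability of interest.
fprintf('mean of estimates = %.3f, SD across runs = %.3f\n', mean(est), std(est));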

A qualification should be made to the claim above that averaging levels of a simple staircase is as good as a fancy likelihood method. If the staircase method uses a too-small step size at the beginning, and if the starting point is far from threshold, then the simple staircase can be inefficient in reaching the threshold region. However, at the end of that run the astute experimenter should notice the large discrepancy between the starting point and the threshold, and the run should be redone with an improved starting point or with a larger initial step size that will be reduced as the staircase proceeds.

Increase number of 2AFC response categories. Independent of the type of adaptive method used, the 2AFC task has a flaw whereby a series of lucky guesses at low stimulus levels can erroneously drive adaptive methods to levels that are too low. Some adaptive methods with a small number of trials are unable to fully recover from a string of lucky guesses early in the run. One solution is to allow the subject to have more than two responses. For reasons that I have never understood, people like to have only two response categories in a 2AFC task. Just as I am reluctant to take part in 2AFC experiments, I am even more reluctant to take part in two-response experiments. The reason is simple: I always feel that I have within me more than one bit of information after looking at a stimulus. I usually have at least four or more subjective states (2 bits) for a yes/no task: HN, LN, LY, HY, where H and L are high and low confidence and N and Y are did not and did see it. For 2AFC, my internal states are H1, L1, L2, H2, for high and low confidence on Intervals 1 and 2. Kaernbach (2001a), in this special issue, does an excellent job of justifying the low-confidence response category. He provides solid arguments, simulations, and experimental data on how a low-confidence category can minimize the "lucky guess" problem.

Kaernbach (2001a) groups the L1 and L2 categories together into a "don't know" category and argues persuasively that the inclusion of this third response can increase the precision and accuracy of threshold estimates while also leaving the subject a happier person. He calls it the "unforced choice" method. Most of his article concerns the understandable fear of introducing a response bias, exactly what 2AFC was invented to avoid. He has developed an intelligent rule for what stimulus level should follow a "don't know" response. Kaernbach shows, with both simulations and real data, that with his method this potential response bias problem has only a second-order effect on threshold estimates (similar to the 2AFC interval bias discussed around Equation 7), and its advantages outweigh the disadvantages. I urge anyone using the 2AFC method to read it. I conjecture that Kaernbach's method is especially useful in trimming the lower tail of the distribution of threshold estimates and also in reducing the interval bias of 2AFC tasks, since the low-confidence rating would tend to be used for trials on which there is the most uncertainty.

Although Kaernbach has managed to convince me of the usefulness of his three-response method, he might have trouble convincing others because of the response bias question. I should like to offer a solution that might quell the response bias fears. Rather than using 2AFC with three responses, one can increase the number of responses to four (H1, L1, L2, H2), as discussed two paragraphs above. The L rating can be used when subjects feel they are guessing.

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A 5-up–1-down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct of 5/6 = 83.33%. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low-confidence rating is that when one is below threshold, giving a low-confidence rating will prevent long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.
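The convergence level and the +2 average shift quoted above follow directly from setting the expected level shift to zero (a worked restatement of the text, not an additional assumption):

\[
P_c(-1) + (1 - P_c)(+5) = 0 \;\Rightarrow\; P_c = \tfrac{5}{6} \approx 83.3\%,
\qquad
\text{guessing: } \tfrac{1}{2}(-1) + \tfrac{1}{2}(+5) = +2 \text{ levels.}
\]

The same +2 average shift results from the low-confidence pairs (+1, +3) and (0, +4) in the four-response schemes above.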

Another reason not to be fearful of the multiple-response 2AFC method is that after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations discussed earlier in this section, and Green's (1990), reveal that averaging over the levels (excluding a number of initial levels to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging-levels method disagree, that run could be thrown out.

Summary of advantages of Brownian staircase. Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since it gives the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (Psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on one-step-ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape.
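The quoted recipe can be made concrete with a toy implementation. The sketch below is my own condensed version of a Ψ-like trial-placement loop (the 2AFC Weibull observer, coarse parameter grids, and 30-trial run are arbitrary choices for illustration); it is not Kontsevich and Tyler's code.

% Toy Psi-like adaptive run: Bayesian update over a 2-D (threshold, slope) grid,
% one-step-ahead minimum expected entropy placement, fixed number of trials.
logT = -1:0.1:1;  beta = 1:0.5:5;  xCand = -1.5:0.1:1.5;   % grids (illustrative)
[T, B] = meshgrid(logT, beta);
post = ones(size(T)); post = post/sum(post(:));            % flat prior
pfun = @(x) 0.5 + 0.5*(1 - exp(-10.^(B.*(x - T))));        % 2AFC Weibull, log abscissa
trueT = 0.2; trueB = 3;                                    % simulated observer
for trial = 1:30
    H = zeros(size(xCand));
    for k = 1:numel(xCand)                                 % expected entropy for each level
        pC = pfun(xCand(k));
        pBar = sum(post(:).*pC(:));                        % predictive P(correct)
        pstC = post.*pC;     pstC = pstC/sum(pstC(:));
        pstW = post.*(1-pC); pstW = pstW/sum(pstW(:));
        H(k) = pBar*(-sum(pstC(:).*log(pstC(:)+eps))) + ...
               (1-pBar)*(-sum(pstW(:).*log(pstW(:)+eps)));
    end
    [~, k] = min(H);  xTest = xCand(k);                    % place trial at minimum entropy
    r = rand < 0.5 + 0.5*(1 - exp(-10.^(trueB*(xTest - trueT))));   % simulated response
    pC = pfun(xTest);                                      % Bayes update of the posterior
    if r, post = post.*pC; else, post = post.*(1-pC); end
    post = post/sum(post(:));
end
threshEst = sum(post(:).*T(:));  slopeEst = sum(post(:).*B(:));     % posterior means
fprintf('threshold = %.2f, slope = %.2f\n', threshEst, slopeEst);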

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope β or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low-spatial-frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates of mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data-gathering group argued that we should use a data-gathering methodology that would allow estimation of the PF slope of each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin, and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules, Goodness-of-Fit

There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation. Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section, I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits

The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals, in order to avoid the starting level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that, in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′ function, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

pdf(i) = [P(i) − P(i−1)] / [s(i) − s(i−1)],    (33)

where s(i) is the stimulus intensity at the ith level, and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

μ_r = Σ_i pdf(i) [s(i)^(r+1) − s(i−1)^(r+1)] / (r + 1).    (34)

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

f[s(i)] − f[s(i−1)] = f′(s_mid) [s(i) − s(i−1)],    (35)

where f(s) = s^(r+1) and f′(s_mid) = (r + 1) s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

μ_r = Σ_i pdf(i) s_mid^r Δs,    (36)

where Δs = s(i) − s(i−1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

μ_0 = Σ_i pdf(i) Δs = P(i_max) − P(i_min),    (37)

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P(i_min), must be zero, as Miller and Ulrich (2001) point out.


The next simplest SK moment is for r = 1:

μ_1 = Σ_i pdf(i) [s(i)² − s(i−1)²] / 2 = Σ_i [P(i) − P(i−1)] [s(i) + s(i−1)] / 2,    (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: if this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method-of-constant-stimuli data in which the PF goes from 0% to 100%. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels, and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, −.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to these unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than .33, and P2 can be any number greater than .67. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide possible set of alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the center of the PF is given by Equation 38 to be μ1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ1 = 2.97. However, given the data, with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ1 = 4. Thus, we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
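The numbers in this example are easy to verify. The following few lines (my own check, evaluating Equation 38 directly) reproduce the two SK estimates:

% Spearman-Karber first moment (Equation 38) for the example above.
s  = 1:5;                                  % stimulus levels
P  = [0 1 1 .5 1];                         % raw probabilities correct
Pm = [0 .51 .51 .51 1];                    % after monotonizing (N-weighted pooling)
mu1 = @(P) sum(diff(P).*(s(2:end) + s(1:end-1))/2);   % sum of [P(i)-P(i-1)][s(i)+s(i-1)]/2
fprintf('mu1 (raw pdf [1 0 -.5 .5]):          %.2f\n', mu1(P));    % 2.00
fprintf('mu1 (monotonized pdf [.51 0 0 .49]): %.2f\n', mu1(Pm));   % 2.97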

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percentage of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being .01 and .99. The probabilities at the sample points are P(i) = .01, .1222, .50, .8778, and .99, corresponding to z scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between .28 and .29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of .071. However, their Gaussian had a standard deviation of .25, whereas my z-score units correspond to a standard deviation of unity. Thus, their standard error must be multiplied by 4, giving a standard error of .284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low and stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

var(PSE_i) = var(P_i) / (dP_i/dx_i)²,    (39)

where var(P) is given by Equation 27, and the PF slope is

dP_i/dx_i = (dP_i/dz_i)(dz_i/dx_i),    (40)

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

dP_i/dz_i = exp(−z_i²/2) / √(2π),    (41)

from Equation 3A. By combining these equations, one gets

var(PSE_i) = 2πσ² P_i(1 − P_i) exp(z_i²) / N_i.    (42)

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

1/var_tot = Σ_i 1/var_i.    (43)

Finally, upon plugging in the five values of z_i, ranging from −2.328 to +2.328, and P_i, ranging from .01 to .99, the standard error of the PF location is

SE = var_tot^(1/2) = 0.284.    (44)

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method, and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
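The triple confirmation can be reproduced in a few lines. This sketch (mine) simply evaluates Equations 42–44 for the five levels given above:

% Analytic SE of the PF location (Equations 42-44) for 8 trials at each of 5 levels, sigma = 1.
sigma = 1; Ni = 8; Ntot = 40;
z = [-2.328 -1.164 0 1.164 2.328];
P = [.01 .1222 .50 .8778 .99];
varLevel = 2*pi*sigma^2 .* P.*(1-P) .* exp(z.^2) / Ni;   % Equation 42, each level separately
varTot = 1/sum(1./varLevel);                             % Equation 43
fprintf('SE = %.3f (Equation 44)\n', sqrt(varTot));      % 0.284
fprintf('varTot * Ntot = %.2f (Equation 45)\n', varTot*Ntot);   % 3.24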

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

var_tot = SE² = 3.24/N_tot.    (45)

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that of the five stimulus levels, the two at P = .01 and .99 are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = .5). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot claim that for estimating the PSE the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

X² = Σ_i [p(data_i) − P_i]² / var(P_i),    (46)

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend

var(P_i) = P_i (1 − P_i) / N_i,    (47)

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

X² = Σ_i residual_i²,    (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

residual_i = [p(data_i) − P_i] / SE_i.    (49)

The second method is to use the normalized log likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2 Σ_i N_i { p(data_i) ln[p(data_i)/P_i] + [1 − p(data_i)] ln[(1 − p(data_i))/(1 − P_i)] }.    (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

prob_chisq = Gamma(df/2, χ²/2),    (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z-score.
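These two checks take one line each in Matlab. Note that gammainc takes the variable as its first argument, so the prob_chisq of Equation 51 is gammainc(chisq/2, df/2):

% Checking Equation 51 with Matlab's incomplete gamma function.
df = 18;
gammainc(df/2, df/2)                     % about .54, near the asymptotic .5
gammainc((df + sqrt(2*df))/2, df/2)      % about .845, near .841 for a unity z score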

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square, given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100% and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

X_i² = [p(data_i) − P_i]² N_i / [P_i(1 − P_i)] = (O_i − E_i)² / [E_i(1 − P_i)],    (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

wt_i = [N_i! / (O_i!(N_i − O_i)!)] P_i^O_i (1 − P_i)^(N_i − O_i).    (53)

The expected value ⟨X_i²⟩ is

⟨X_i²⟩ = Σ_{O_i} wt_i (O_i − E_i)² / [E_i(1 − P_i)].    (54)

A bit of algebra based on Σ_{O_i} wt_i = 1 gives a surprising result:

⟨X_i²⟩ = 1.    (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.
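The algebra behind Equation 55 is short: the weighted sum in the numerator of Equation 54 is just the binomial variance of O_i,

\[
\langle X_i^2 \rangle
= \frac{\sum_{O_i} \mathrm{wt}_i \,(O_i - E_i)^2}{E_i (1 - P_i)}
= \frac{\operatorname{var}(O_i)}{N_i P_i (1 - P_i)}
= \frac{N_i P_i (1 - P_i)}{N_i P_i (1 - P_i)} = 1 .
\]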

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

⟨LL_i⟩ = 2 Σ_{O_i} wt_i { O_i ln(O_i/E_i) + (N_i − O_i) ln[(N_i − O_i)/(N_i − E_i)] }.    (56)

Unfortunately, there is no simple way to do the summations, as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i (near .5) the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.
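For readers who do not want to retype the Appendix listing, the core of that enumeration fits in a few lines. The sketch below is my own condensed version (it uses binopdf from the Statistics Toolbox), not the Matlab Code 2 listing itself:

% Expected per-level contributions to X^2 and to the deviance LL (Equations 52-56),
% computed by enumeration over all binomial outcomes.
P = .5:.01:.99;
for N = [1 2 4 8 16]
    EX2 = zeros(size(P)); ELL = zeros(size(P));
    for k = 1:numel(P)
        O = 0:N;  E = N*P(k);
        wt = binopdf(O, N, P(k));                        % Equation 53
        EX2(k) = sum(wt.*(O - E).^2) / (E*(1 - P(k)));   % Equations 52 and 54 (equals 1)
        t1 = O.*log(O./E);                   t1(O == 0) = 0;   % 0*log(0) treated as 0
        t2 = (N - O).*log((N - O)./(N - E)); t2(O == N) = 0;
        ELL(k) = 2*sum(wt.*(t1 + t2));                   % Equation 56
    end
    fprintf('N = %2d: <X2> at P = .5 is %.3f, <LL> is %.3f\n', N, EX2(1), ELL(1));
end
% For N = 2 the printed <LL> is 1.386, the value of Equation 57; plotting ELL against P
% for each N reproduces the curves of Figure 4.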

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

⟨LL_i⟩ = 2 [0.25 × 2 ln(2) + 0.25 × 2 ln(2)] = 2 ln(2) = 1.386.    (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the 1 − P_i factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85, the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance) at each level. In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse-variance weighting. There are three ways to calculate threshold variance. (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
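For concreteness, the inverse-variance combination referred to above is the standard one (this restatement is mine):

\[
\hat{t} = \frac{\sum_k t_k / \sigma_k^2}{\sum_k 1/\sigma_k^2},
\qquad
\operatorname{var}(\hat{t}) = \frac{1}{\sum_k 1/\sigma_k^2},
\]

where t_k is the threshold estimate from the kth run and σ_k² is its estimated variance (obtained by any of the three methods just listed).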

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.01% (i.e., λ = 0.0001; H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used (probit [cumulative normal], logit, and Weibull), corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                    No Lapse        No Lapse        Lapse           Lapse
Psychometric        λ = .01         λ = .0001       λ = .01         λ = .0001
Function            L      χ²       L      χ²       L      χ²       L      χ²
Probit              1.05   1.05     1.00   0.99     1.05   1.05     0.36   0.34
Logit               1.76   1.76     1.66   1.66     1.76   1.76     0.84   0.72
Weibull             1.01   1.01     0.97   0.97     1.01   1.01     0.26   0.26

Note: The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = 0.01 (first and third pairs of data columns) and λ = 0.0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

probit:   P = 0.5 + 0.5 erf(z/√2)    (Equation 3B),    (58A)

logit:    P = 1 / [1 + exp(−z)],    (58B)

Weibull:  P = 1 − exp[−exp(z)]    (Equation 13),    (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) when there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter; (2) when the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50); (3) when λ = .0001, the PF slope was reduced dramatically when a single lapse was present; and (4) for all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
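For readers who want the fitting functions in code form, they can be written as Matlab anonymous functions (this restatement is mine; the authoritative definitions, including the γ = λ lapse correction of Equation 1A, are in the Matlab Code 3 listing):

% Equations 58A-58C as anonymous functions, with z = p2*(y - p1).
z       = @(p, y) p(2)*(y - p(1));                 % p(1) = threshold, p(2) = slope
probit  = @(p, y) 0.5 + 0.5*erf(z(p, y)/sqrt(2));  % Equation 58A
logit   = @(p, y) 1./(1 + exp(-z(p, y)));          % Equation 58B
weibull = @(p, y) 1 - exp(-exp(z(p, y)));          % Equation 58C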

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods
In the previous section I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope β of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.

The three PF shapes used for the fits reported in Table 2 are

probit (erf, Equation 3B):   P = [1 + erf(z/√2)]/2,        (58A)

logit:                       P = 1/[1 + exp(−z)],           (58B)

Weibull:                     P = 1 − exp[−exp(z)],          (58C)

Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 0.1 (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
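The arithmetic behind that standard error, as a quick check (β = 5.5 and a 25-trial run, as in the text):

% Threshold variance for the 10AFC task (asymptotic formula quoted in the text).
beta = 5.5;  N = 25;                  % Strasburger's estimated slope; trials per run
varT = 1.75/(N*beta^2);               % = 0.058/N for beta = 5.5
seT  = sqrt(varT);                    % standard error in natural log units
fprintf('var = %.4f, SE = %.3f (about 5%%)\n', varT, seT);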

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach, this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total numbers correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
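To make the monotonizing step (the first step above) concrete, here is a minimal sketch of a trial-weighted pool-adjacent-violators pass; it illustrates the idea rather than reproducing Kaernbach's or the Appendix's routine, and the example proportions and trial counts are invented.

% Monotonize an estimated PF by merging neighboring blocks whose trial-weighted
% means violate monotonicity (a generic pool-adjacent-violators pass).
P = [0.2 0.5 0.4 0.7 0.6 0.9];   % per-level proportions correct (assumed example)
n = [10  12   8   9  11  10 ];   % trials at each level (assumed example)

val = P;  w = n;  lo = 1:numel(P);   % start with one block per level
k = 1;
while k < numel(val)
    if val(k) > val(k+1)                              % violation: merge blocks k, k+1
        val(k) = (w(k)*val(k) + w(k+1)*val(k+1)) / (w(k) + w(k+1));
        w(k)   = w(k) + w(k+1);
        val(k+1) = [];  w(k+1) = [];  lo(k+1) = [];
        k = max(k-1, 1);                              % re-check against the previous block
    else
        k = k + 1;
    end
end
Pmono = zeros(size(P));                               % expand block values back to levels
lo(end+1) = numel(P) + 1;
for b = 1:numel(val), Pmono(lo(b):lo(b+1)-1) = val(b); end
disp(Pmono)                                           % nondecreasing version of the input PF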

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
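For readers who want the core of the enumeration idea without digging through the Appendix, here is a minimal sketch (my own condensation, not the Appendix listing): it enumerates every 10-trial one-up/one-down staircase on a flat PF and compares pooling the raw counts with averaging the per-run PFs; the shift, monotonize, and extrapolate options of Matlab Code 4 are omitted.

% Enumerate every possible 10-trial one-up/one-down (Brownian) staircase on a
% flat PF (P = .5), then compare pooling raw counts with averaging per-run PFs.
nbits = 10;  nlev = 2*nbits + 1;                    % trials per staircase; possible levels
corAll = zeros(1,nlev);  totAll  = zeros(1,nlev);   % pooled counts across staircases
pfSum  = zeros(1,nlev);  pfCount = zeros(1,nlev);   % for averaging per-run PFs

for i = 0:2^nbits - 1
    a    = bitget(i, 1:nbits);                      % 1 = correct, 0 = incorrect on each trial
    step = 1 - 2*a;                                 % down one level after correct, up after incorrect
    lev  = nbits + 1 + [0 cumsum(step(1:end-1))];   % level at which each trial is run
    cor  = zeros(1,nlev);  tot = zeros(1,nlev);
    for t = 1:nbits
        cor(lev(t)) = cor(lev(t)) + a(t);
        tot(lev(t)) = tot(lev(t)) + 1;
    end
    corAll = corAll + cor;   totAll = totAll + tot; % pool raw responses
    tested = tot > 0;                               % average PFs only at tested levels
    pfSum(tested)   = pfSum(tested) + cor(tested)./tot(tested);
    pfCount(tested) = pfCount(tested) + 1;
end

pooledPF = corAll ./ max(totAll, 1);                % PF from pooled counts
avgPF    = pfSum  ./ max(pfCount, 1);               % PF from averaging per-run PFs
plot(1:nlev, pooledPF, '-', 1:nlev, avgPF, '--'); grid on
xlabel('stimulus level'); ylabel('probability correct');

Because the generating PF is flat at 50%, every response sequence is equally likely, so unweighted enumeration is exact; both curves should hug 50%, matching the no-shift, no-manipulation case described next.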

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

[Figure 5: six panels (a)–(f); abscissa, stimulus levels (0–20); ordinates, probability correct and trial frequency; left column, no shift (no data manipulation, monotonized, and extrapolated PFs); right column, the same three cases with shifting.]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
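As a compact illustration of the two corrections (the hit and false alarm values are arbitrary):

% Two ways of compensating for a nonzero lower asymptote P(0).
x  = [1 2 3];                  % stimulus strengths (arbitrary)
P  = [0.30 0.60 0.90];         % raw probabilities of a "yes" at each strength
P0 = 0.20;                     % lower asymptote (false alarm rate), P(0)
z  = @(p) sqrt(2)*erfinv(2*p - 1);       % probability-to-z-score conversion

p_corrected = (P - P0) ./ (1 - P0);      % Equation 2: correction in probability space
dprime      = z(P) - z(P0);              % Equation 5: correction in z-score space
disp([p_corrected; dprime])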

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods but that objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates the PF slope using a probability ordinate, β′, to the low-stimulus-strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when one uses a d′ function given by d′ = c_t^β. The optimum variance of 1.64/(Nβ²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nβ²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variances leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods whose lower asymptotes are less than 50%.

4. Kaernbach (2001a) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (once the initial trials, about four reversals' worth, are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF but also at the sequential history of the run, so that mid-run shifts in threshold can be detected. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit statistics for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and they prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).
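In the spirit of the Appendix's Matlab Code 2, the following sketch computes the expected value of each statistic by exact enumeration over the binomial outcomes at a single test level; the normalizations are in the style of Equations 52 and 56 rather than copies of them.

% Expected chi-square (X2) versus expected likelihood deviance for one test level
% with N trials and true probability correct p, computed by exact enumeration.
N = 2;                                    % small number of trials per level
for p = [0.55 0.75 0.95]                  % a few test-level probabilities
    E  = N*p;                             % expected number correct
    X2 = 0;  D = 0;
    for Obs = 0:N                         % all possible numbers correct
        w  = nchoosek(N,Obs) * p^Obs * (1-p)^(N-Obs);     % binomial probability
        X2 = X2 + w * (Obs - E)^2 / (E*(1-p));            % chi-square term (Eq. 52 style)
        D  = D  + w * (-2) * ( Obs*log(E/(Obs+eps)) + ...
                 (N-Obs)*log((N-E+eps)/(N-Obs+eps)) );    % deviance term (Eq. 56 style)
    end
    fprintf('p = %.2f:  E[X2] = %.3f   E[deviance] = %.3f\n', p, X2, D);
end

The chi-square expectation comes out at 1 regardless of p or N, whereas the expected deviance departs from 1 for small N, which is the bias pattern discussed in the text.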

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.

Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.

Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.

Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.

Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.

Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.

Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.

Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.

Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.

Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.

Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.

Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.

Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.

King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.

Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.

Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.

Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.

Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.

Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.

Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.

Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.

Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.

McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.

Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.

Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.

Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.

Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.

Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1: Weibull Function Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)
[Matlab listing]

Matlab Code 2: Goodness-of-Fit, Likelihood Versus Chi-Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)
[Matlab listing]

Matlab Code 3: Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)
[Matlab listing]

Matlab Code 4: Brownian Staircase Enumerations for Slope Bias (Monotonize, Extrapolate, Averaging, and Main Program scripts)
[Matlab listing]
Note that in line 6 of the monotonizing script the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.
Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


tion for the judgment. This procedure, identical to Equation 7, gives the same d′ as before.

Three Distinctions for Clarifying PFs
In dealing with PFs, three distinctions need to be made: yes/no versus forced choice, detection versus discrimination, and constant stimuli versus adaptive methods. These distinctions are usually clear, but I should like to point out some subtleties.

For the forced choice versus yes/no distinction, there are two sorts of forced choice tasks. The standard version has multiple intervals, separated spatially or temporally, and the stimulus is in only one of the intervals. In the other version, one of N stimuli is shown, and the observer responds with a number from 1 to N. For example, Strasburger (2001b) presented 1 of 10 letters to the observer in a 10AFC task. Yes/no tasks have some similarity to the latter type of forced choice task. Consider, for example, a detection experiment in which one of five contrasts (including a blank) is presented to the observer, and the observer responds with numbers from 1 to 5. This would be classified as a rating scale, method of constant stimuli, yes/no task, since only a single stimulus is presented and the rating is based on a one-dimensional intensity.

The detection/discrimination distinction is usually based on whether the reference stimulus is a natural zero point. For example, suppose the task is to detect a high spatial frequency test pattern added to a spatially identical reference pattern.

[Figure 2: four panels (a)–(d) plotting, against stimulus strength, probability correct (C1, C2, average), probability of an I2 response (1 − C1, C2, diff/2), z score correct (with z-average), and z score for an I2 response (biased and unbiased d′).]

Figure 2. 2AFC psychometric functions with a strong bias in favor of responding Interval 2 (I2). The bias is chosen from a specific example presented by Green and Swets (1966), such that the observer has 50% and 95% correct when the test is in I1 and I2, respectively. These points are marked by asterisks. The psychometric function being plotted is a cumulative normal. In all panels the abscissa is x_t, the stimulus strength. (a) The psychometric functions for probability correct in I1 and I2 are shown and labeled C1 and C2. The average of the two probabilities, labeled "average," is the dot–dashed line; it is the curve that is usually reported. The diamond is at 72.5%, the average percent correct of the two asterisks. (b) Same as panel a, except that instead of showing C1 we show 1 − C1, the probability of saying I2 when the test was in I1. The abscissa is now labeled "probability of I2 response." (c) z scores of the three probabilities in panel a. An additional dashed line is shown that is the average of the C1 and C2 z-score curves. The diamond is the z score of the diamond in panel a, and the star is the z-score average of the two panel c asterisks. (d) The sign of the C2 curve in panel c is flipped to correspond to panel b. The dashed and dot–dashed lines of panel c have been multiplied by √2 in panel d so that they become d′.


If the reference pattern has zero contrast, the task is detection. If the reference pattern has a high contrast, the task is discrimination. Klein (1985) discusses these tasks in terms of monopolar and bipolar cues. For discrimination, a bipolar cue must be available, whereby the test pattern can be either positive or negative in relation to the reference. If one cannot discriminate the negative cue from the positive, then it can be called a detection task.

Finally, the constant stimuli versus adaptive method distinction is based on the former's having preassigned test levels and the latter's having levels that shift to a desired placement. The output of the constant stimulus method is a full PF and is thus fully entitled to be included in this special issue. The output of adaptive methods is typically only a single number, the threshold, specifying the location but not the shape of the PF. Two of the papers in this issue (Kaernbach, 2001b; Strasburger, 2001b) explore the possibility of also extracting the slope from adaptive data that concentrate trials around one level. Even though adaptive methods do not measure much about the PF, they are so popular that they are well represented in this special issue.

The 10 rows of Table 1 present the articles in this special issue (including the present article). Columns 2–5 correspond to the four categories associated with the first two distinctions.

Figure 3. The 2AFC discrimination PF from Figure 2 has been extended to the 0% to 100% range without rescaling the ordinate. In panels a and b, the two right-hand panels of Figure 2 have been modified by flipping the sign of the abscissa of the negative-slope branch, where the test is in Interval 1 (I1). The new abscissa is now the stimulus strength in I2 minus the strength in I1. The abscissa scaling has been modified to be in stimulus units. In this example the test stimulus has a contrast of 5. The ordinate in panel a is the probability that the response is I2. The ordinate in panel b is the z score of the panel a ordinate. The asterisks are at the same points as in Figure 2. Panel c is the standard signal detection picture when noise is added to the signal. The abscissa has been modified to being the activity in I2 minus the activity in I1. The units of activation have arbitrarily been chosen to be the same as the units of panels a and b. Activity distributions are shown for stimuli of 5 and 15 units, corresponding to the asterisks of panels a and b. The subject's criterion is at 5 units of activation. The upper abscissa is in z-score units, where the Gaussians have unit variance. The areas under the two distributions above the criterion are 50% and 95%, in agreement with the probabilities shown in panel a.

[Figure 3: panels (a)–(c) plotting probability of an I2 response, its z score, and the signal detection picture (criterion at 5 units, d′ = 1.645) against contrast or activity in I2 minus I1, over the range −15 to +15.]

The last column classifies the articles according to the adaptive versus constant stimuli distinction. In order to open up the full variety of PFs for discussion and to enable a deeper understanding of the relationship among different types of PFs, I will now clarify the interrelationship of the various categories of PFs and their connection to the signal detection approach.

Lack of adaptive method for yes/no tasks with controlled false alarm rate. In Section II, I will bring up a number of advantages of the yes/no task in comparison with the 2AFC task (see also Kaernbach, 1990). Given those yes/no advantages, one may wonder why the 2AFC method is so popular. The usual answer is that 2AFC has no response bias. However, as has been discussed, the yes/no method allows an unbiased d′ to be calculated, and the 2AFC method does allow an interval bias that affects d′. Another reason for the prevalence of 2AFC experiments is that a multitude of adaptive methods are available for 2AFC but barely any are available for an objective yes/no task in which the false alarm rate is measured so that d′ can be calculated. In Table 1, with one exception, the rows with adaptive methods are associated with the forced choice method. The exception is Leek (2001), who discusses adaptive yes/no methods in which no blank trials are presented. In that case the false alarm rate is not measured, so d′ cannot be calculated. This type of yes/no method does not belong in the "objective" category of concern for the present paper. The "1990" entries in Table 1 refer to Kaernbach's (1990) description of a staircase method for an objective yes/no task in which an equal number of blanks are intermixed with the signal trials. Since Kaernbach could have used that method for Kaernbach (2001b), I placed the "1990" entry in his slot.

Kaernbach's (1990) yes/no staircase rules are simple. The signal level changes according to a rule such as the following: Move down one level for correct responses, a hit <yes | signal> or a correct rejection <no | blank>; move up three levels for wrong responses, a miss <no | signal> or a false alarm <yes | blank>. This rule, similar to the one-down three-up rule used in 2AFC, places the trials so that the average of the hit rate and correct rejection rate is 75%.

What is now needed is a mechanism to get the observer to establish an optimal criterion that equalizes the number of "yes" and "no" responses (the ROC negative diagonal). This situation is identical to the problem of getting 2AFC observers to equalize the number of responses to Intervals 1 and 2. The quadratic bias in d′ is the same in both cases. The simplest way to get subjects to equalize their responses is to give them feedback about any bias in their responses. With equal "yes" and "no" responses, the 75% correct corresponds to a z score of 0.674 and a d′ = 2z = 1.349. If the subject does not have equal numbers of "yes" and "no" responses, then the d′ would be calculated by d′ = z_hit − z_false alarm. I do hope that Kaernbach's clever yes/no objective staircase will be explored by others. One should be able to enhance it with ratings and multiple stimuli (Klein, 2002).
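A minimal sketch of that staircase rule with a simulated signal detection observer follows; the d′-versus-level function, the criterion, and the step size are illustrative assumptions, with the criterion chosen to roughly balance the "yes" and "no" responses.

% Kaernbach's (1990) objective yes/no staircase: blanks and signals are randomly
% intermixed; the level moves down one step after a correct response (hit or
% correct rejection) and up three steps after an error (miss or false alarm).
rng(1);                                       % reproducible illustration
ntrials = 1000;  level = 10;  stepSize = 1;
dprimeAt = @(lev) lev/8;                      % assumed d' versus level (illustration)
crit = 0.7;                                   % assumed fixed internal criterion
nyes = [0 0];  npres = [0 0];                 % [blank signal] counters for "yes"

for t = 1:ntrials
    isSignal = rand < 0.5;                           % intermix blanks and signals
    r   = isSignal*dprimeAt(level) + randn;          % internal response
    yes = r > crit;
    correct = (yes == isSignal);                     % hit or correct rejection
    npres(isSignal+1) = npres(isSignal+1) + 1;
    nyes(isSignal+1)  = nyes(isSignal+1)  + yes;
    if correct, level = max(level - stepSize, 0);    % one step down when correct
    else        level = level + 3*stepSize;          % three steps up when wrong
    end
end

z = @(p) sqrt(2)*erfinv(2*p - 1);
dprime = z(nyes(2)/npres(2)) - z(nyes(1)/npres(1));  % d' = z(hit) - z(false alarm)
fprintf('aggregate hit = %.2f, false alarm = %.2f, d'' = %.2f\n', ...
        nyes(2)/npres(2), nyes(1)/npres(1), dprime);

With roughly balanced "yes" and "no" responses, the aggregate d' settles near the 1.35 value mentioned in the text.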

Forced choice detection. As can be seen in the second column of Table 1, a popular category in this special issue is the forced choice method for detection. This method is used by Strasburger (2001b), Kaernbach (2001a), Linschoten et al. (2001), and Wichmann and Hill (2001a, 2001b). These researchers use PFs based on probability ordinates. The connection between P and d′ is different from the yes/no case given by Equation 5. For 2AFC, signal detection theory provides a simple connection between d′ and probability correct: d′(x) = √2 z(x). In the preceding section I discussed the option of averaging the probabilities and then taking the z score (high-threshold approach) or averaging the z scores and then calculating the probability (signal detection approach). The signal detection method is better, because it has a stronger empirical basis and avoids bias.

For an m-AFC task with m > 2, the connection between d′ and P is more complicated than it is for m = 2. The connection, given by a fairly simple integral (Green & Swets, 1966; Macmillan & Creelman, 1991), has been tabulated by Hacker and Ratcliff (1979) and by Macmillan and Creelman (1991). One problem with these tables is that they are based on the assumption of unity ROC slope. It is known that there are many cases in which the ROC slope is not unity (Green & Swets, 1966), so these tables connecting d′ and probability correct should be treated cautiously.

Table 1
Classification of the 10 Articles in This Special Issue According to Three Distinctions: Forced Choice Versus Yes/No, Detection Versus Discrimination, Adaptive Versus Constant Stimuli
(Column entries, in order: Detection m-AFC; Detection Yes/No; Discrimination m-AFC; Discrimination Yes/No; Adaptive [A] or Constant Stimuli [C])

Leek (2001): General; loose γ; ✓; loose γ; A
Wichmann & Hill (2001a): 2AFC; (✓); (✓); (✓); C
Wichmann & Hill (2001b): 2AFC; (✓); (✓); (✓); C
Linschoten et al. (2001): 2AFC; A
Strasburger (2001a): General; ✓; ✓; C
Strasburger (2001b): 10AFC; A
Kaernbach (2001a): General; ✓; A
Kaernbach (2001b): General; (1990); ✓; (1990); A
Miller & Ulrich (2001): for γ = 0; ✓; (for 0–100%); ✓; C
Klein (present): General; ✓; ✓; ✓; Both


d′ and probability correct should be treated cautiously. W. P. Banks and I (unpublished) investigated this topic and found that near the P = 50% correct point, the dependence of d′ on ROC slope is minimal. Away from the 50% point, the dependence can be strong.
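The tabulated relation has the form of a one-dimensional integral over the noise distributions. The following is a small numerical sketch of it under the unity-ROC-slope assumption that the text cautions about; the function name is my own.

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def mafc_prob_correct(d_prime, m):
    """P(correct) for m-AFC under equal-variance Gaussian signal detection
    (unity ROC slope): the signal interval must exceed all m-1 noise intervals."""
    integrand = lambda t: norm.pdf(t - d_prime) * norm.cdf(t) ** (m - 1)
    return quad(integrand, -np.inf, np.inf)[0]

print(round(mafc_prob_correct(1.0, 2), 4))   # 2AFC: ~0.76
print(round(mafc_prob_correct(1.0, 10), 4))  # 10AFC: lower for the same d'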

Yes/No detection. Linschoten et al. (2001) compare three methods (limits, staircase, 2AFC likelihood) for measuring thresholds with a small number of trials. In the method of limits, one starts with a subthreshold stimulus and gradually increases the strength. On each trial, the observer says "yes" or "no" with respect to whether or not the stimulus is detected. Although this is a classic yes/no detection method, it will not be discussed in this article, because there is no control of response bias. That is, blanks were not intermixed with signals.

In Table 1, the Wichmann and Hill (2001a, 2001b) articles are marked with check marks (✓) in parentheses because, although these authors write only about 2AFC, they mention that all their methods are equally applicable for yes/no or m-AFC of any type (detection/discrimination). And their Matlab implementations are fully general. All the special issue articles reporting experimental data used a forced choice method. This bias in favor of the forced choice methodology is found not only in this special issue; it is widespread in the psychophysics community. Given the advantages of the yes/no method (see discussion in Section II), I hope that once yes/no adaptive methods are accepted, they will become the method of choice.

m-AFC discrimination. It is common practice to represent 2AFC results as a plot of percent correct, averaged over all trials, versus stimulus strength. This PF goes from 50% to 100%. The asymmetry between the lower and upper asymptotes introduces some inefficiency in threshold estimation, as will be discussed. Another problem with the standard 50% to 100% plot is that a bias in choice of interval will produce an underestimate of d′, as has been shown earlier. Researchers often use the 2AFC method because they believe that it avoids biased threshold estimates. It is therefore surprising that the relatively simple correction for 2AFC bias discussed earlier is rarely done.

The issue of bias in m-AFC also occurs when m > 2. In Strasburger's 10AFC letter discrimination task, it is common for subjects to have biases for responding with particular letters when guessing. Any imbalance in the response bias for different letters will result in a reduction of d′, as it did in 2AFC.

There are several benefits of viewing the 2AFC discrimination data in terms of a PF going from 0% to 100%. Not only does it provide a simple way of viewing and calculating the interval bias, it also enables new methods for estimating the PF parameters, such as those proposed by Miller and Ulrich (2001), as will be discussed in Section III. In my original comments on the Miller and Ulrich paper, I pointed out that because their nonparametric procedure has uniform weighting of the different PF levels, their method does not apply to 2AFC tasks. The asymmetric binomial error bars near the 50% and 100% levels cause the uniform weighting of the Miller and Ulrich approach to be nonoptimal. However, I now realize that the 2AFC discrimination task can be fit by a cumulative normal going from 0% to 100%. Because of that insight, I have marked the discrimination forced choice column of Table 1 for the Miller and Ulrich (2001) paper, with the proviso that the PF goes from 0% to 100%. Owing to the popularity of 2AFC, this modification greatly expands the relevance of their nonparametric approach.

Yes/No discrimination. Kaernbach (2001b) and Miller and Ulrich (2001) offer theoretical articles that examine properties of PFs that go from P = 0% to 100% (P = p in Equation 1). In both cases, the PF is the cumulative normal (Equation 3). Although these PFs with γ = 0 could be for a yes/no detection task with a zero false alarm rate (not plausible) or a forced choice detection task with an infinite number of alternatives (not plausible either), I suspect that the authors had in mind a yes/no discrimination task (Table 1, column 5). A typical discrimination task in vision is contrast discrimination, in which the observer responds to whether the presented contrast is greater than or less than a memorized reference. Feedback reinforces the stability of the reference. In a typical discrimination task, the reference is one exemplar from a continuum of stimulus strengths. If the reference is at a special zero point, rather than being an element of a smooth continuum, the task is no longer a simple discrimination task. Zero contrast would be an example of a special reference. Klein (1985) discusses several examples which illustrate how a natural zero can complicate the analysis. One might wonder how to connect the PF from the detection regime, in which the reference is zero contrast, to the discrimination regime, in which the reference (pedestal) is at a high contrast. The d′ function to be introduced in Equation 20 does a reasonably good job of fitting data across the full range of pedestal strength, going from detection to discrimination.

The connection of the discrimination PF to the signal detection PF is the same as that for yes/no detection, given in Equation 5: d′(x) = z(x) − z(0), where z is the z score of the probability of a "greater than" judgment. In a discrimination task, the bias, z(0), is the z score of the probability of saying "greater" when the reference stimulus is presented. The stimulus strength x that gives z(x) = 0 is the point of subjective equality (x = PSE). If the cumulative normal PF of Equation 3 is used, then the z score is linearly proportional to the stimulus strength: z(x) = (x − PSE)/threshold, where threshold is defined to be the point at which d′ = 1. I will come back to these distinctions between detection and discrimination PFs after presenting more groundwork regarding thresholds, log abscissas, and PF shapes (see Equations 14 and 17).

Definition of Threshold
Threshold is often defined as the stimulus strength that produces a probability correct halfway up the PF. If humans operated according to a high-threshold assumption, this definition of threshold would be stable across different experimental methods. However, as I discussed following Equation 2, high-threshold theory has been discredited.


According to the more successful signal detection theory (Green & Swets, 1966), the d′ at the midpoint of the PF changes according to the number of alternatives in a forced choice method and according to the false alarm rate in a yes/no method. This variability of d′ with method is a good reason not to define threshold as the halfway point of the PF.

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1, for simplicity. This definition applies to both yes/no and m-AFC tasks and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = 15.87%, corresponding to a z score of z = −1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.
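The two numbers in this example follow directly from d′ = z(hit) − z(false alarm). A small sketch of that arithmetic (the function name is my own):

from scipy.stats import norm

def hit_rate_at_threshold(false_alarm_rate, d_prime_criterion):
    """Hit rate at the stimulus level where d' equals the chosen criterion,
    using d' = z(hit) - z(false alarm) as in Equation 5."""
    z_fa = norm.ppf(false_alarm_rate)
    return norm.cdf(z_fa + d_prime_criterion)

print(hit_rate_at_threshold(0.1587, 1.0))  # ~0.50
print(hit_rate_at_threshold(0.1587, 2.0))  # ~0.8413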

Strasburger's Suggestion on Specifying Slope: A Logarithmic Abscissa

In many of the articles in this special issue, a logarithmic abscissa such as decibels is used. Many shapes of PFs have been used with a log abscissa. Most delightful, but frustrating, is Strasburger's (2001a) paper on the PF maximum slope. He compares the Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′, using a logarithmic abscissa. The present section is the outcome of my struggles with a number of issues raised by Strasburger (2001a) and my attempt to clarify them.

I will typically express stimulus strength, x, in threshold units:

x_t = x/α,  (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units, y, as well as in linear units, x:

y_t = log_e(x_t) = y − Y,  (9)

where y = log_e(x) is the natural log of the stimulus and Y = log_e(α) is the threshold on the log abscissa.

The slope of the psychometric function P(y_t) with a logarithmic abscissa is

slope_log(y_t) = dP(y_t)/dy_t  (10A)
              = (1 − γ) dp(y_t)/dy_t,  (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log–log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log–log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log–log slope is β for a Weibull PF (Equation 12) and b for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log–log slope using a d′ ordinate.

A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per log_e) are not familiar. Then it dawned on me that there is a simple connection between slope with a log_e abscissa, slope_log = dP(y_t)/dy_t, and slope with a linear abscissa, slope_lin = dP(x_t)/dx_t, namely,

slope_lin(x_t) = dP(x_t)/dx_t = [dP(y_t)/dy_t] × (dy_t/dx_t) = slope_log(y_t)/x_t,  (11)

because dy_t/dx_t = d log_e(x_t)/dx_t = 1/x_t. At threshold, x_t = 1 (y_t = 0), so at that point Strasburger's slope with a logarithmic axis is identical to my familiar slope plotted in threshold units on a linear axis. The simple connection between slope on the log and linear abscissas converted me to being a strong supporter of using a natural log abscissa.

The Weibull and cumulative normal psychometric functions. To provide a background for Strasburger's article, I will discuss three PFs (Weibull, cumulative normal, and d′), as well as their close connections. A common parameterization of the PF is given by the Weibull function:

p_weib(x_t) = 1 − k^(x_t^β),  (12)

where p_weib(x_t), the PF that goes from 0% to 100%, is related to P_weib(x_t), the probability of a correct response, by Equation 1; β is the slope; and k controls the definition of threshold: p_weib(1) = 1 − k is the percent correct at threshold (x_t = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

p_weib(y_t) = 1 − k^[exp(β y_t)].  (13)

Panel a of Figure 1 is a plot of Equation 12 (the Weibull function as a function of x on a linear abscissa) for the case β = 2 and k = exp(−1) = 0.368. The choice β = 2 makes the Weibull an upside-down Gaussian. Panel d is the same function, this time plotted as a function of y, corresponding to a natural log abscissa. With this choice of k, the point of maximum slope as a function of y_t is the threshold point (y_t = 0). The point of maximum slope at y_t = 0 is marked with an asterisk in panel d. The same point in panel a, at x_t = 1, is not the point of maximum slope on a linear abscissa, because of Equation 11. When plotted as a d′ function [a z-score transform of P(x_t)] in panel b, the Weibull accelerates below threshold and decelerates above threshold, in agreement with a wide range of experimental data. The acceleration and deceleration are most clearly seen by the slope of d′ in the log–log coordinates of panel e. The log–log d′ slope is plotted in panels c and f. At x_t = 0, the log–log slope is 2, corresponding to our choice of β = 2. The slope falls surprisingly rapidly as stimulus strength increases, and the slope is near 1 at x_t = 2. If we had chosen β = 4 for the Weibull function, then the log–log d′ slope would have gone from 4 at x_t = 0 to near 2 at x_t = 2. Equation 20 will present a d′ function that captures this behavior.
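A small numerical sketch can reproduce this fall of the log–log d′ slope. It assumes the Figure 1 parameters quoted in the text (β = 2, k = exp(−1), false alarm rate γ = 0.1587); the finite-difference slope estimate and the function names are mine.

import numpy as np
from scipy.stats import norm

beta, k, gamma = 2.0, np.exp(-1), 0.1587      # Figure 1 parameters from the text

def d_prime_from_weibull(x_t):
    p = 1 - k ** (x_t ** beta)                # Equation 12
    P = gamma + (1 - gamma) * p               # Equation 1
    return norm.ppf(P) - norm.ppf(gamma)      # Equation 5

def loglog_slope(x_t, dx=1e-3):
    return (np.log(d_prime_from_weibull(x_t * (1 + dx)))
            - np.log(d_prime_from_weibull(x_t))) / np.log(1 + dx)

print(loglog_slope(0.05), loglog_slope(2.0))  # ~2 at low x_t, ~1 near x_t = 2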

Another PF commonly used is based on the cumulative normal (Equation 3):

p_F(y_t) = Φ(y_t /σ),  (14A)

or

z(y_t) = y_t /σ = (y − Y)/σ,  (14B)

where σ is the standard error of the underlying Gaussian function (its connection to β will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to −∞, as needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissas.

Two examples of Strasburger's maximum slope (his Equations 7 and 17) are

β′ = β exp(−1) = 0.368β  (15)

for the Weibull function (Equation 13), and

β′ = 1/(σ√(2π)) = 0.399/σ  (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(−1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (y_t = 0). Note that Equations 12–16 are missing the (1 − γ) factor that is present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.

An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = 0.5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(x_t = 1) = 75%. At 10% above threshold, P(x_t = 1.1) ≈ 0.75 + 0.1β′ = 0.7899, which is quite close to the exact value of 0.7896. This example shows how β′, defined with a natural log abscissa, y_t, is relevant to a linear abscissa, x_t.

Now that the logarithmic abscissa has been introduced, this is a good place to stop and point out that the log abscissa is fine for detection but not for discrimination, since the x = 0 point is often in the middle of the range. However, the cumulative normal PF is especially relevant to discrimination, where the PF goes from 0% to 100% and would be written as

p_F(x) = P_F(x) = Φ[(x − PSE)/α],  (17A)

or

z_F(x) = (x − PSE)/α,  (17B)

where x = PSE is the point of subjective equality. The parameter α is the threshold, since d′(α) = z_F(α) − z_F(0) = 1. The threshold α is also the standard deviation of the Gaussian probability density function (pdf). The reciprocal of α is the PF slope. I tend to use the letter σ for a unitless standard deviation, as occurs for the logarithmic variable y; I use α for a standard deviation that has the units of the stimulus x (like percent contrast). The comparison of Equation 14B for detection and Equation 17B for discrimination is useful. Although Equation 14 is used in most of the articles in this special issue, which are concerned with detection tasks, the techniques that I will be discussing are also relevant to discrimination tasks, for which Equation 17 is used.
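For concreteness, a sketch of fitting Equation 17A to discrimination data to recover PSE and α. The data values, the starting guesses, and the use of a plain unweighted least-squares fit are illustrative assumptions; a serious fit would use binomial (likelihood-based) weighting, as discussed later in this paper.

import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit

def discrimination_pf(x, pse, alpha):
    """Equation 17A: probability of a 'greater than' response."""
    return norm.cdf((x - pse) / alpha)

# Hypothetical contrast-discrimination data: signed increments (x) and the
# observed proportion of "greater" responses at each level.
x = np.array([-0.04, -0.02, 0.0, 0.02, 0.04])
p_greater = np.array([0.05, 0.25, 0.55, 0.80, 0.97])

(pse_hat, alpha_hat), _ = curve_fit(discrimination_pf, x, p_greater, p0=[0.0, 0.02])
print("PSE =", pse_hat, " threshold (d' = 1 increment) =", alpha_hat)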

Threshold and the Weibull Function
To illustrate how a d′ = 1 definition of threshold works for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = 15.87%, corresponding to a z score of z_FA = −1, as in Figure 1. The z score at threshold (d′ = 1) is z_Th = z_FA + 1 = 0, corresponding to a probability of P(1) = 50%. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)]/[1 − P(0)] = 0.5943. The Weibull function becomes

P_weib(x_t) = 1 − (1 − 0.1587)(0.5943)^(x_t^β).  (18A)

If I had defined d′ = 2 to be threshold, then z_Th = z_FA + 2 = 1, leading to k = (1 − 0.8413)/(1 − 0.1587) = 0.1886, giving

P_weib(x_t) = 1 − (1 − 0.1587)(0.1886)^(x_t^β).  (18B)

As another example, suppose the false alarm rate in a yes/no task is at 50%, not uncommon in a signal detection experiment with blanks and test stimuli intermixed. Then threshold at d′ = 1 would occur at 84.13%, and the PF would be

P_weib(x_t) = 1 − 0.5(0.3173)^(x_t^β),  (18C)

with P_weib(0) = 50% and P_weib(1) = 84.13%. This case corresponds to defining d′ on the ROC vertical intercept.

For 2AFC, the connection between d′ and z is z = d′/√2. Thus, z_Th = 2^−0.5 = 0.7071, corresponding to P_weib(1) = 76.02%, leading to k = 0.4795 in Equation 12. These connections will be clarified when an explicit form (Equation 20) is given for the d′ function, d′(x_t).
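The k values quoted in Equations 18A–18C and in the 2AFC case can be checked with a few lines; the helper function is my own, built directly on k = [1 − P(1)]/[1 − P(0)].

from scipy.stats import norm

def weibull_k(false_alarm_rate, d_prime_at_threshold):
    """k = [1 - P(1)] / [1 - P(0)] for a Weibull PF whose threshold is
    defined at a fixed d' (yes/no case; Equations 1, 12, and 18)."""
    p0 = false_alarm_rate
    p1 = norm.cdf(norm.ppf(p0) + d_prime_at_threshold)
    return (1 - p1) / (1 - p0)

print(round(weibull_k(0.1587, 1.0), 4))  # ~0.5943 (Equation 18A)
print(round(weibull_k(0.1587, 2.0), 4))  # ~0.1886 (Equation 18B)
print(round(weibull_k(0.5, 1.0), 4))     # ~0.3173 (Equation 18C)
# 2AFC: P(0) = 0.5 and the threshold criterion is z = d'/sqrt(2).
print(round((1 - norm.cdf(1 / 2 ** 0.5)) / 0.5, 4))  # ~0.4795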


Complications With Strasburger's Advocacy of Maximum Slope
Here I will mention three complications implicated in Strasburger's suggestion of defining slope at the point of maximum slope on a logarithmic abscissa.

1. As can be seen in Equation 11, the slopes on linear and logarithmic abscissas are related by a factor of 1/x_t. Because of this extra factor, the maximum slope occurs at a lower stimulus strength on linear as compared with log axes. This makes the notion of maximum slope less fundamental. However, since we are usually interested in the percent error of threshold estimates, it turns out that a logarithmic abscissa is the most relevant axis, supporting Strasburger's log abscissa definition.

2. The maximum slope is not necessarily at threshold (as Strasburger points out). For the Weibull functions defined in Equation 13 with k = exp(−1) and the cumulative normal function in Equation 14, the maximum slope (on a log axis) does occur at threshold. However, the two points are decoupled in the generalized Weibull function defined in Equation 12. For the Quick version of the Weibull function (Equation 13 with k = 0.5, placing threshold halfway up the PF), the threshold is below the point of maximum slope; the derivative of P(x_t) at threshold is

β′_thresh = dP(x_t)/dx_t = (1 − γ) β log_e(2)/2 = 0.347(1 − γ)β,

which is slightly different from the maximum slope as given by Equation 15. Similarly, when threshold is defined at d′ = 1, the threshold is not at the point of maximum slope. People who fit psychometric functions would probably prefer reporting slopes at the detection threshold rather than at the point of maximum slope.

3. One of the most important considerations in selecting a point at which to measure threshold is the question of how to minimize the bias and standard error of the threshold estimate. The goal of adaptive methods is to place trials at one level. In order to avoid a threshold bias due to an improper estimate of PF slope, the test level should be at the defined threshold. The variance of the threshold estimate when the data are concentrated at a single level (as with adaptive procedures) is given by Gourevitch and Galanter (1967) as

var(Y) = P(1 − P) / {N [dP(y_t)/dy_t]²}.  (19)

If the binomial error factor of P(1 − P)/N were not present, the optimal placement of trials for measuring threshold would be at the point of maximum slope. However, the presence of the P(1 − P) factor shifts the optimal point to a higher level. Wichmann and Hill (2001b) consider how trial placement affects threshold variance for the method of constant stimuli. This topic will be considered in Section II.

Connecting the Weibull and d′ Functions
Figures 5–7 of Strasburger (2001a) compare the log–log d′ slope, b, with the Weibull PF slope, β. Strasburger's connection of b = 0.88β is problematic, since the d′ version of the Weibull PF does not have a fixed log–log slope. An improved d′ representation of the Weibull function is the topic of the present section. A useful parameterization for d′, which I shall refer to as the Stromeyer–Foley function, was introduced by Stromeyer and Klein (1974) and used extensively by Foley (1994):

d′(x_t) = x_t^b / [a + (1 − a) x_t^(b + w − 1)],  (20)

where x_t is the stimulus strength in threshold units, as in Equation 8. The factors with a in the denominator are present so that d′ equals unity at threshold (x_t = 1, or x = α). At low x_t, d′ ≈ x_t^b/a. The exponent b (the log–log slope of d′ at very low contrasts) controls the amount of facilitation near threshold. At high x_t, Equation 20 becomes d′ ≈ x_t^(1−w)/(1 − a). The parameter w is the log–log slope of the test threshold versus pedestal contrast function (the tvc, or "dipper," function) at strong pedestals. One must be a bit cautious, however, because in yes/no procedures the tvc slope can be decoupled from w if the signal detection ROC curve has nonunity slope (Stromeyer & Klein, 1974). The parameter a controls the point at which the d′ function begins to saturate. Typical values of these unitless parameters are b = 2, w = 0.5, and a = 0.6 (Yu, Klein, & Levi, 2001). The function in Equation 20 accelerates at low contrast (log–log slope of b) and decelerates at high contrasts (log–log slope of 1 − w), in general agreement with a broad range of data.

For 2AFC, z = d′/√2, so from Equation 3 the connection between d′ and probability correct is

P(x_t) = 0.5 + 0.5 erf[d′(x_t)/2].  (21)

To establish the connection between the PFs specified in Equations 12 and 20–21, one must first have the two agree at threshold. For 2AFC with d′ = 1, the threshold at P = 76.02% can be enforced by choosing k = 0.4795 (see the discussion following Equation 18C), so that Equations 1 and 12 become

P(x_t) = 1 − 0.5(0.4795)^(x_t^β).  (22)

Modifying k as in Equation 22 leaves the Weibull shape unchanged and shifts only the definition of threshold. If b = 1.06β, w = 1 − 0.39β, and a = 0.614, then for all values of β the Weibull and d′ functions (Equations 21 and 22 vs. Equation 12) differ by less than 0.0004 for all stimulus strengths. At very low stimulus strengths, b = β. The value b = 1.06β is a compromise for getting an overall optimal fit.
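The closeness of the two functions can be checked numerically. The sketch below assumes the forms of Equations 20–22 as reconstructed above, with the parameter mapping quoted in the text; it simply prints the maximum absolute difference for a few β values.

import numpy as np
from scipy.stats import norm

def weibull_2afc(x_t, beta):
    return 1 - 0.5 * 0.4795 ** (x_t ** beta)           # Equation 22

def stromeyer_foley_2afc(x_t, beta):
    b, w, a = 1.06 * beta, 1 - 0.39 * beta, 0.614      # mapping quoted in the text
    d = x_t ** b / (a + (1 - a) * x_t ** (b + w - 1))  # Equation 20 (as reconstructed)
    return norm.cdf(d / np.sqrt(2))                    # Equation 21

x = np.linspace(0.01, 3, 1000)
for beta in (1, 2, 3.5, 5):
    diff = np.max(np.abs(weibull_2afc(x, beta) - stromeyer_foley_2afc(x, beta)))
    print(beta, round(diff, 5))   # differences stay well under a percentage point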

Strasburger (2001a) is concerned with the same issue. In his Table 1, he reports d′ log–log slopes of [0.8847, 1.8379, 3.131, 4.421] for β = [1, 2, 3.5, 5]. If the d′ slope is taken to be a measure of the d′ exponent b, then the ratio b/β is [0.8847, 0.9190, 0.8946, 0.8842] for the four β values. Our value of b/β = 1.06 differs substantially from 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a). Pelli (1985) and Strasburger (2001a) used d′ functions with a = 1 in Equation 20. With a = 0.614, our d′ function (Equation 20) starts saturating near threshold, in agreement with experimental data. The saturation lowers the effective value of b. I was very surprised to find that the Stromeyer–Foley d′ function did such a good job in matching the Weibull function across the whole range of β and x. For a long time I had thought a different fit would be needed for each value of β.

For the same Weibull function in a yes/no method (false alarm rate = 50%), the parameters b and w are identical to the 2AFC case above. Only the value of a shifts, from a = 0.614 (2AFC) to a = 0.54 (yes/no). In order to make x_t = 1 at d′ = 1, k becomes 0.3173 (see Equation 18C) instead of 0.4795 (2AFC). As with 2AFC, the difference between the Weibull and d′ functions is less than 0.0004 (<0.04%) for all stimulus strengths and all values of β, and for both a linear and a logarithmic abscissa. These values mean that, for both the yes/no and the 2AFC methods, the d′ function (Equation 20) corresponding to a Weibull function with β = 2 has a low-contrast log–log slope of b = 2.12 and a high-contrast log–log slope of 1 − w = 0.78. The high-contrast tvc (test contrast vs. pedestal contrast discrimination function, sometimes called the "dipper" function) log–log slope would be w = 0.22.

I also carried out a similar analysis, asking what cumulative normal σ is best matched to the Weibull. I found σ = 1.1720/β. However, the match is poor, with errors of 4% (rather than <0.04%, as in the preceding analysis). The poor fit of the two functions makes the fitting procedure sensitive to the weighting used in the fitting procedure. If one uses a weighting factor of [P(1 − P)/N]^−1, where P is the PF, one will find a quite different value for σ than if one ignored the weighting or if one chose the weighting on the basis of the data rather than the fitting function.

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section, inspired by Linschoten et al. (2001), is an examination of various experimental methods for gathering data to estimate thresholds. Linschoten et al. compare three 2AFC methods for measuring smell and taste thresholds: a maximum-likelihood adaptive method, an up–down staircase method, and an ascending method of limits. The special aspect of their experiments was that, in measuring smell and taste, each 2AFC trial takes between 40 and 60 sec (Lew Harvey, personal communication) because of the long duration needed for the nostrils or mouth to recover their sensitivity. The purpose of the Linschoten et al. paper is to examine which of the three methods is most accurate when the number of trials is low. Linschoten et al. conclude that the 2AFC method is a good method for this task. My prior experience with forced choice tasks and low numbers of trials (Manny & Klein, 1985; McKee et al., 1985) made me wonder whether there might be better methods to use in circumstances in which the number of trials is limited. This topic is considered in Section II.

Compare 2AFC to Yes/No
The commonly accepted framework for comparing 2AFC and yes/no threshold estimates is signal detection theory. Initially, I will make the common assumption that the d′ function is a power function (Equation 20 with a = 1):

d′ = x_t^b = exp(b y_t).  (23)

Later in this section, the experimentally more accurate form, Equation 20 with a ≠ 1, will be used. The variance of the threshold estimate will now be calculated for 2AFC and yes/no for the simplest case of trials placed at a single stimulus level, y, the goal of most adaptive methods. The PF slope relates ordinate variance to abscissa variance (from Gourevitch & Galanter, 1967):

var(Y) = var(d′) / (dd′/dy_t)²,  (24)

where Y = y − y_t is the threshold estimate for a natural log abscissa, as given in Equation 9. This formula for var(Y) is called the asymptotic formula, in that it becomes exact in the limit of a large number of total trials, N. Wichmann and Hill (2001b) show that, for the method of constant stimuli, a proper choice of testing levels brings one to the asymptotic region for N = 120 (the smallest value that they explored). Improper choice of testing levels requires a larger N in order to get to the asymptotic region. Equation 24 is based on having the data at a single level, as occurs asymptotically in adaptive procedures. In addition, the definition of threshold must be at the point being tested. If levels are tested away from threshold, there will be an additional contribution to the threshold variance (Finney, 1971; McKee et al., 1985) if the slope is a variable parameter in the fitting procedure. Equation 24 reaches its asymptotic value more rapidly for adaptive methods with focused placement of trials than with the constant stimulus method when slope is a free parameter.

Equation 24 needs analytic expressions for the derivative and the variance of d′. The derivative is obtained from Equation 23:

dd′/dy_t = b d′.  (25)

The variance of d′ for both yes/no (d′ = z_hit + z_correct rejection) and 2AFC (d′ = √2 z_ave) is

var(d′) = 2 var(z) = 4π exp(z²) var(P).  (26)

dddy

bdt

cent = cent

var( )var( )

Yd

dddyt

= cent

centaeligegraveccedil

oumloslashdivide

2

cent = = ( )d c bytb

texp


For the yes/no case, Equation 26 is based on a unity-slope ROC curve with the subject's criterion at the negative diagonal of the ROC curve, where the hit rate equals the correct rejection rate. This would be the most efficient operating point. The variance for a nonunity-slope ROC curve was considered by Klein, Stromeyer, and Ganz (1974). From binomial statistics,

var(P) = P(1 − P)/n,  (27)

with n = N for 2AFC and n = N/2 for yes/no, since for this binary yes/no task the number of signal and blank trials is half the total number of trials.

Putting all of these factors together gives

var(Y) = 4π exp(z²) P(1 − P) / (n b² d′²)  (28)

for both the yes/no and the 2AFC cases. The only difference between the two cases is the definition of n and the definition of d′ (d′_2AFC = √2 z and d′_YN = 2z for a symmetric criterion). When this is expressed in terms of N (number of trials) and z (z score of probability correct), we get

var(Y) = 2π exp(z²) P(1 − P) / (N b² z²)  (29)

for both 2AFC and yes/no. That is, when the percent corrects are all equal (P_2AFC = P_hit = P_correct rejection), the threshold variances for 2AFC and yes/no are equal. The minimum value of var(Y) is 1.64/(Nb²), occurring at z = 1.57 (P = 94%). This value of 94% correct is the optimum value at which to test for both 2AFC and yes/no tasks, independent of b.
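A quick numerical check of Equation 29, as reconstructed above, recovers these figures; the grid search and the function name are my own.

import numpy as np
from scipy.stats import norm

def var_Y_times_N_b2(z):
    """N * b^2 * var(Y) from Equation 29, as a function of the z score of
    the probability correct at the tested level."""
    P = norm.cdf(z)
    return 2 * np.pi * np.exp(z ** 2) * P * (1 - P) / z ** 2

z = np.linspace(0.2, 3.0, 2801)
v = var_Y_times_N_b2(z)
i = np.argmin(v)
print(z[i], norm.cdf(z[i]), v[i])        # about z = 1.57, P = 0.94, minimum = 1.64
print(var_Y_times_N_b2(norm.ppf(0.75)))  # about 4.08 at the common 75% level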

This result, that the threshold variance is the same for 2AFC and yes/no methods, seems to disagree with Figure 4 of McKee et al. (1985), where the standard errors of the threshold estimates are plotted for 2AFC and yes/no as a function of the number of trials. The 2AFC standard error (their Figure 4) is twice that of yes/no, corresponding to a factor of four in variance. However, the McKee et al. analysis is misleading, since all they (we) did for the yes/no task was to use Equation 2 to expand the PF to a 0% to 100% range, thereby doubling the PF slope. That rescaling is not the way to convert a 2AFC detection task to the yes/no task of the same detectability. A signal detection methodology is needed for that conversion, as was done in the present analysis.

One surprise is that the optimal testing probability is independent of the PF slope, b. This finding is a general property of PFs in which the dependence on the slope parameter has the form x^b. The 94% level corresponds to d′_YN = 3.15 and d′_2AFC = 2.23. For the commonly used P = 75% (z = 0.675, d′_YN = 1.35, d′_2AFC = 0.954), var(Y) = 4.08/(Nb²), which is about 2.5 times the optimal value obtained by placing the stimuli at 94%. Green (1990) had previously pointed out that the 94% and 92% levels were the optimal points to test for the cumulative normal and logit functions, respectively, because of the minimum threshold estimate variability when one is testing at that point on the PF. Other PFs can have optimal points at slightly lower levels, but these are still above the levels to which adaptive methods are normally directed.

It is also useful to compare the 2AFC and yes/no methods at the same d′, rather than at their optimal points. In that case, for d′ values fixed at [1, 2, 3], the ratio of the yes/no variance to the 2AFC variance is [1.82, 1.36, 0.79]. At low d′ there is an advantage for the 2AFC method, and that reverses at high d′, as is expected from the optimal d′ being higher for yes/no than for 2AFC. In general, one should run the yes/no method at d′ levels above 2.0 (84% correct in the blank and signal intervals corresponds to d′ = 2).

The equality of the optimal var(Y) for 2AFC and yes/no methods is not only surprising, it seems paradoxical, as can be seen by the following gedankenexperiment. Suppose subjects close their eyes on the first interval of the 2AFC trial and base their judgments on just the second interval, as if they were doing a yes/no task, since blanks and stimuli are randomly intermixed. Instead of saying "yes" or "no," they would respond "Interval 2" or "Interval 1," respectively, so that the 2AFC task would become a yes/no task. The paradox is that one would expect the variance to be worse than for the eyes-open 2AFC case, since one is ignoring information. However, Equation 29 shows that the 2AFC and yes/no methods have identical optimal variance. I suspect that the resolution of this paradox is that in the yes/no method with a stable criterion, the optimal variance occurs at a higher d′ value (3.15) than the 2AFC value (2.23). The d′ power law function that I chose does not saturate at large d′ levels, giving a benefit to the yes/no method.

In order to check whether the paradox still holds with a more realistic PF (a d′ function that saturates at large d′), I now compare the 2AFC and yes/no variance using a Weibull function with a slope β. In order to connect 2AFC and yes/no, I need to use signal detection theory and the Stromeyer–Foley d′ function (Equation 20) with the "Weibull parameters" b = 1.06β, a = 0.614, and w = 1 − 0.39β. Following Equation 22, I pointed out that with these parameter values the difference between the 2AFC Weibull function and the 2AFC d′ function (converted to a probability ordinate) is less than 0.0004. After the derivative, Equation 25, is appropriately modified, we find that the optimal 2AFC variance is 3.42/(Nβ²). For the yes/no case, the optimal variance is 4.02/(Nβ²). This result resolves the paradox. The optimal 2AFC variance is about 18% lower than the yes/no variance, so closing one's eyes in one of the intervals does indeed hurt.

We shouldn't forget that, if one counts stimulus presentations, N_p, rather than trials, N, the optimal 2AFC variance becomes 6.84/(N_pβ²), as opposed to 4.02/(N_pβ²) for yes/no. Thus, for the Linschoten experiments, with each stimulus presentation being slow, the yes/no methodology is more accurate for a given number of stimulus presentations, even at low d′ values. Kaernbach (1990) compares 2AFC and yes/no adaptive methods with simulations and actual experiments. His abscissa is number of presentations, and his results are compatible with our findings.

The objective yes/no method that I have been discussing is inefficient, since 50% of the trials are blanks. Greater efficiency is possible if several points on the PF are measured (method of constant stimuli), with the number of blank trials the same as at the other levels. For example, if five levels of stimuli were measured, including blanks, then the blanks would be only 20% of the total, rather than 50%. For the 2AFC paradigm, the subject would still have to look at 50% of the presentations being blanks. The yes/no efficiency can be further improved by having the subject respond to the multiple stimuli with multiple ratings, rather than a binary yes/no response. This rating scale method of constant stimuli is my favorite method, and I have been using it for 20 years (Klein & Stromeyer, 1980). The multiple ratings allow one to measure the psychometric function shape for d′s as large as 4 (well into the dipper region, where stimuli are slightly above threshold). It also allows one to measure the ROC slope (presence of multiplicative noise). Simulations of its benefits will be presented in Klein (2002).

Insight into how the number of responses affects threshold variance can be obtained by using the cumulative normal PF (Equation 17) considered by Kaernbach (2001b) and Miller and Ulrich (2001), with γ = 0. Typically this is the case of yes/no discrimination, but it can also be 2AFC discrimination, with an ordinate going from 0% to 100%, as I have discussed in connection with Figure 3. For a cumulative normal PF with a binary response and for a single stimulus level being tested, the threshold variance is (Equation 19) as follows:

var(thresh) = var(P)/(dP/dy)² = 2πσ² exp(z²) P(1 − P)/N,  (30)

from Equation 27 and forthcoming Equations 40 and 41. The optimal level to test is at P = 0.5, so the optimal variance becomes

var(thresh) = (π/2) × σ²/N.  (31)

If, instead of a binary response, the observer gives an analog response (many, many rating categories), then the variance is the familiar connection between standard deviation and standard error:

var(thresh) = σ²/N.  (32)

With four response categories, the variance is 1.19σ²/N, which is closer to the analog than to the binary case. If stimuli with multiple strengths are intermixed (method of constant stimuli), then the benefit of going from a binary response to multiple responses is greater than that indicated in Equations 31–32 (Klein, 2002).
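A short evaluation of Equation 30, as reconstructed above, shows the binary-response penalty directly; the chosen test levels are arbitrary illustrative values.

import numpy as np
from scipy.stats import norm

def binary_thresh_var(P, sigma=1.0, N=100):
    """Equation 30: threshold variance for a binary response when all N
    trials are placed at the level where the PF equals P."""
    z = norm.ppf(P)
    return 2 * np.pi * sigma ** 2 * np.exp(z ** 2) * P * (1 - P) / N

for P in (0.5, 0.69, 0.84):
    print(P, binary_thresh_var(P))          # minimum is at P = 0.5
print("analog response:", 1.0 / 100)        # Equation 32: sigma^2 / N
print("binary penalty at P = 0.5:", np.pi / 2)  # ratio of Eq. 31 to Eq. 32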

A brief list of other problems with the 2AFC method applied to detection comprises the following.

1. 2AFC requires the subject to memorize and then compare the results of two subjective responses. It is cognitively simpler to report each response as it arrives. Subjects without much experience can easily make mistakes (lapses), simply becoming confused about the order of the stimuli. Klein (1985) discusses memory load issues relevant to 2AFC.

2. The 2AFC methodology makes various types of analysis and modeling more difficult because it requires that one average over all criteria. The response variable in 2AFC is the Interval 2 activation minus the Interval 1 activation (see the abscissa in Figure 2). This variable goes from minus infinity to plus infinity, whereas the yes/no variable goes from zero to infinity. It therefore seems more difficult to model the effects of probability summation and uncertainty for 2AFC than for yes/no.

3. Models that relate psychophysical performance to underlying processes require information about how the noise varies with signal strength (multiplicative noise). The rating scale method of constant stimuli not only measures d′ as a function of contrast, it also is able to provide an estimate of how the variance of the signal distribution (the ROC slope) increases with contrast. The 2AFC method lacks that information.

4. Both the yes/no and 2AFC methods are supposed to eliminate the effects of bias. Earlier, I pointed out that in my recent 2AFC discrimination experiments, when the stimulus is near threshold, I find I have a strong bias in favor of responding "Stimulus 2." Until I learned to compensate for this bias (not typically done by naïve subjects), my d′ was underestimated. As I have discussed, it is possible to compensate for this bias, but that is rarely done.

Improving the Brownian (Up–Down) Staircase
Adaptive methods with a goal of placing trials at an optimal point on the PF are efficient. Leek (2001) provides a nice overview of these methods. There are two general categories of adaptive methods: those based on maximum (or mean) likelihood, and those based on simpler staircase rules. The present section is concerned with the older staircase methods, which I will call Brownian methods. I used the name "Brownian" because of the up–down staircase's similarity to one-dimensional Brownian motion, in which the direction of each step is random. The familiar up–down staircase is Brownian because at each trial the decision about whether to go up or down is decided by a random factor governed by probability P(x). I want to make this connection to Brownian motion because I suspect that there are results in the physics literature that are relevant to our psychophysical staircases. I hope these connections get made.

A Brownian staircase has a minimal memory of the run history: it needs to remember the previous step (including direction) and the previous number of reversals or number of trials (used for when to stop and for when to decrease step size). A typical Brownian rule is to decrease signal strength by one step (like 1 dB) after a correct response and to increase it three steps after an incorrect response (a one-down–three-up rule). I was pleased to learn recently that this rule works nicely for an objective yes/no task (Kaernbach, 1990), as well as for 2AFC, as discussed earlier.

In recent years, likelihood methods (Pentland, 1980; Watson & Pelli, 1983) have become popular and have somewhat displaced Brownian methods. There is a widespread feeling that likelihood methods have substantially smaller threshold variance than do staircases in which thresholds are obtained by simply averaging the levels tested. Given the importance of this issue for informing researchers about how best to measure thresholds, it is surprising that there have been so few simulations comparing staircases and optimal methods. Pentland's (1980) results, showing very large threshold variance for a staircase method, are so dramatically different from the results of my simulations that I am skeptical. The best previous simulations of which I am aware are those of Green (1990), who compared the threshold variance from several 2AFC staircase methods with the ideal variance (Equation 24). He found that as long as the staircase rule placed trials above 80% correct, the ratio of observed to ideal threshold variance was about 1.4. This value is the square of the inverse of Green's finding that the ratio of the expected to the observed sigma is about 0.85.

In my own simulations (Klein, 2002), I found that as long as the final step size was less than 0.2σ, the threshold variance from simply averaging over levels was close to the optimal value. Thus, the staircase threshold variance is about the same as the variance found by other optimal methods. One especially interesting finding was that the common method of averaging an even number of reversal points gave a variance that was about 10% higher than averaging over all levels. This result made me wonder how the common practice of averaging reversals ever got started. Since Green (1990) averaged over just the reversal points, I suspect that his results are an overestimate of the staircase variance. In addition, since Green specifies step size in decibels, it is difficult for me to determine whether his final step size was small enough. Since the threshold variance is inversely related to slope, it is important to relate step size to slope rather than to decibels. The most natural reference unit is σ, the standard deviation associated with the cumulative normal. In summary, I suggest that this general feeling of distrust of the simple Brownian staircase is misguided and that simple staircases are often as good as likelihood methods. In the section Summary of Advantages of Brownian Staircase, I point out several advantages of the simple staircase over the likelihood methods.
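A rough Monte Carlo sketch of this kind of comparison is given below. It is not the simulation reported in Klein (2002); the observer model, step size, starting point, number of discarded trials, and the comparison against the asymptotic value of Equation 19 are all illustrative choices, and the two printed numbers should merely be of comparable size.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma, Y_true, step, n_trials, n_runs = 1.0, 0.0, 0.1, 200, 500

def p_correct(y):                  # 2AFC cumulative-normal PF on a log abscissa
    return 0.5 + 0.5 * norm.cdf((y - Y_true) / sigma)

estimates = []
for _ in range(n_runs):
    y, track = 2.0, []
    for _ in range(n_trials):
        track.append(y)
        y += -step if rng.random() < p_correct(y) else 3 * step  # 1-down/3-up
    estimates.append(np.mean(track[50:]))   # average levels, dropping early trials

# Asymptotic variance (Equation 19) with all trials at the 75%-correct point:
dPdy = 0.5 * norm.pdf(0.0) / sigma
ideal = 0.75 * 0.25 / (n_trials * dPdy ** 2)
print("simulated var:", np.var(estimates), " asymptotic (Eq. 19):", ideal)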

A qualification should be made to the claim above that averaging levels of a simple staircase is as good as a fancy likelihood method. If the staircase method uses too small a step size at the beginning, and if the starting point is far from threshold, then the simple staircase can be inefficient in reaching the threshold region. However, at the end of that run, the astute experimenter should notice the large discrepancy between the starting point and the threshold, and the run should be redone with an improved starting point or with a larger initial step size that will be reduced as the staircase proceeds.

Increase number of 2AFC response categories. Independent of the type of adaptive method used, the 2AFC task has a flaw whereby a series of lucky guesses at low stimulus levels can erroneously drive adaptive methods to levels that are too low. Some adaptive methods with a small number of trials are unable to fully recover from a string of lucky guesses early in the run. One solution is to allow the subject to have more than two responses. For reasons that I have never understood, people like to have only two response categories in a 2AFC task. Just as I am reluctant to take part in 2AFC experiments, I am even more reluctant to take part in two-response experiments. The reason is simple: I always feel that I have within me more than one bit of information after looking at a stimulus. I usually have at least four or more subjective states (2 bits) for a yes/no task: HN, LN, LY, HY, where H and L are high and low confidence, and N and Y are did not and did see it. For a 2AFC task, my internal states are H1, L1, L2, H2, for high and low confidence on Intervals 1 and 2. Kaernbach (2001a), in this special issue, does an excellent job of justifying the low-confidence response category. He provides solid arguments, simulations, and experimental data on how a low-confidence category can minimize the "lucky guess" problem.

Kaernbach (2001a) groups the L1 and L2 categories together into a "don't know" category and argues persuasively that the inclusion of this third response can increase the precision and accuracy of threshold estimates while also leaving the subject a happier person. He calls it the "unforced choice" method. Most of his article concerns the understandable fear of introducing a response bias, exactly what 2AFC was invented to avoid. He has developed an intelligent rule for what stimulus level should follow a "don't know" response. Kaernbach shows, with both simulations and real data, that with his method this potential response bias problem has only a second-order effect on threshold estimates (similar to the 2AFC interval bias discussed around Equation 7), and its advantages outweigh the disadvantages. I urge anyone using the 2AFC method to read it. I conjecture that Kaernbach's method is especially useful in trimming the lower tail of the distribution of threshold estimates and also in reducing the interval bias of 2AFC tasks, since the low-confidence rating would tend to be used for trials on which there is the most uncertainty.

Although Kaernbach has managed to convince me of the usefulness of his three-response method, he might have trouble convincing others because of the response bias question. I should like to offer a solution that might quell the response bias fears. Rather than using 2AFC with three responses, one can increase the number of responses to four (H1, L1, L2, H2), as discussed two paragraphs above. The L rating can be used when subjects feel they are guessing.

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A five-up–one-down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct of 5/6 = 83.33%. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low-confidence rating is that when one is below threshold, giving a low-confidence rating will prevent one from long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.
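The four-response update rule amounts to a small lookup table; the sketch below encodes the step shifts listed above, while the function name and the notion of an integer "level index" are merely illustrative.

# A minimal sketch of the four-response 2AFC level-update rule described above.
STEP_SHIFT = {
    ("high", True): -1,   # HC: high-confidence correct
    ("low", True): +1,    # LC: low-confidence correct
    ("low", False): +3,   # LI: low-confidence incorrect
    ("high", False): +5,  # HI: high-confidence incorrect
}

def next_level(level, confidence, correct):
    """Return the next staircase level (in step units)."""
    return level + STEP_SHIFT[(confidence, correct)]

# When guessing, low-confidence responses are right half the time,
# so the expected shift is (+1 + 3) / 2 = +2 levels, matching the text.
print(next_level(10, "low", True), next_level(10, "low", False))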

Another reason not to be fearful of the multiple-response 2AFC method is that, after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations, discussed earlier in this section, and Green's (1990) reveal that averaging over the levels (excluding a number of initial levels, to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging levels method disagree, that run could be thrown out.

Summary of advantages of Brownian staircase. Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia, I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since it gives the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (Psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on one-step-ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level, to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels, to pin down the slope. The precise levels depend on the assumed PF shape.
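The following is a bare-bones sketch of the one-step-ahead expected-entropy search that the quoted paragraph describes. It is not Kontsevich and Tyler's implementation: the two-dimensional grid ranges, the cumulative-normal parameterization, and all names are my own illustrative choices.

import numpy as np
from scipy.stats import norm

# Grid over candidate PF parameters: log threshold Y and spread s.
Y_grid = np.linspace(-1.0, 1.0, 41)
s_grid = np.linspace(0.1, 1.0, 19)
YY, SS = np.meshgrid(Y_grid, s_grid, indexing="ij")
posterior = np.full(YY.shape, 1.0 / YY.size)     # flat prior

levels = np.linspace(-1.5, 1.5, 61)              # candidate test levels (log units)

def pf(y, Y, s):                                 # 2AFC cumulative-normal PF
    return 0.5 + 0.5 * norm.cdf((y - Y) / s)

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def next_test_level(posterior):
    """One-step-ahead expected-entropy minimization over candidate levels."""
    best_level, best_h = None, np.inf
    for y in levels:
        like_c = pf(y, YY, SS)                   # P(correct | parameters)
        p_c = (posterior * like_c).sum()         # predicted P(correct)
        post_c = posterior * like_c / p_c
        post_i = posterior * (1 - like_c) / (1 - p_c)
        h = p_c * entropy(post_c) + (1 - p_c) * entropy(post_i)
        if h < best_h:
            best_level, best_h = y, h
    return best_level

def update(posterior, y, correct):
    like = pf(y, YY, SS) if correct else 1 - pf(y, YY, SS)
    posterior = posterior * like
    return posterior / posterior.sum()

# One simulated trial against a hypothetical observer with Y = 0.2, s = 0.4:
y = next_test_level(posterior)
correct = np.random.default_rng(0).random() < pf(y, 0.2, 0.4)
posterior = update(posterior, y, correct)
print("tested level:", y, " posterior mean Y:", (posterior * YY).sum())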

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope b or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low spatial frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates for mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data gathering group argued that we should use a data-gathering methodology that would allow estimation of the PF slope of each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin, and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules, Goodness-of-Fit
There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation: Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section, I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits
The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals in order to avoid the starting level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

$$\mathrm{pdf}(i) = \frac{P(i) - P(i-1)}{s(i) - s(i-1)}, \qquad (33)$$

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

$$\mu_r = \sum_i \mathrm{pdf}(i)\,\frac{s(i)^{r+1} - s(i-1)^{r+1}}{r+1}. \qquad (34)$$

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

$$f[s(i)] - f[s(i-1)] = f'(s_{\mathrm{mid}})\,\bigl[s(i) - s(i-1)\bigr], \qquad (35)$$

where $f(s) = s^{r+1}$ and $f'(s_{\mathrm{mid}}) = (r+1)\,s_{\mathrm{mid}}^{\,r}$ is the derivative of f with respect to s somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

$$\mu_r = \sum_i \mathrm{pdf}(i)\, s_{\mathrm{mid}}^{\,r}\, \Delta s, \qquad (36)$$

where Δs = s(i) − s(i − 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

$$\mu_0 = \sum_i \mathrm{pdf}(i)\,\Delta s(i) = P(i_{\max}) - P(i_{\min}), \qquad (37)$$

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P(i_min), must be zero, as Miller and Ulrich (2001) point out.


The next simplest SK moment is for r = 1:

$$\mu_1 = \sum_i \mathrm{pdf}(i)\,\frac{s(i)^2 - s(i-1)^2}{2} = \sum_i \bigl[P(i) - P(i-1)\bigr]\,\frac{s(i) + s(i-1)}{2}, \qquad (38)$$

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: If this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method of constant stimuli data in which the PF goes from 0% to 100%. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, −.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to this unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than 33%, and P2 can be any number greater than 67%. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide possible set of alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the center of the PF is given by Equation 38 to be μ₁ = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ₁ = 2.97. However, given the data with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ₁ = 4. Thus we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
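To make the arithmetic of this example easy to check, here is a small Matlab sketch (my own illustration, not the Appendix code; the weighted pool-adjacent-violators pass is simply my guess at a monotonizing procedure and need not match Miller and Ulrich's exactly) that reproduces the two centroids:

s = [1 2 3 4 5];             % stimulus levels
N = [1 1 1 98 1];            % trials per level
P = [0 1 1 .5 1];            % proportion correct per level

mu1 = @(Q) sum(diff(Q) .* (s(2:end) + s(1:end-1)) / 2);   % Equation 38
mu1(P)                        % = 2.00, the raw SK centroid

% Monotonize by pooling adjacent violators, weighted by trial counts.
Pm = P;  Nm = N;  grp = num2cell(1:numel(P));
i = 1;
while i < numel(Pm)
    if Pm(i+1) < Pm(i)                                 % violation: pool the two cells
        Pm(i)  = (Pm(i)*Nm(i) + Pm(i+1)*Nm(i+1)) / (Nm(i) + Nm(i+1));
        Nm(i)  = Nm(i) + Nm(i+1);
        grp{i} = [grp{i} grp{i+1}];
        Pm(i+1) = [];  Nm(i+1) = [];  grp(i+1) = [];
        i = max(i - 1, 1);                             % back up and recheck
    else
        i = i + 1;
    end
end
Pmono = zeros(size(P));
for j = 1:numel(grp), Pmono(grp{j}) = Pm(j); end
Pmono                         % = [0 .51 .51 .51 1]
mu1(Pmono)                    % = 2.97, the monotonized SK centroid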

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: Since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being 1% and 99%. The probabilities at the sample points are P(i) = 1%, 12.22%, 50%, 87.78%, and 99%, corresponding to z scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix, Matlab Code 4, for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ₁, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus, their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

$$\mathrm{var}(\mathrm{PSE}_i) = \frac{\mathrm{var}(P_i)}{\left(dP_i/dx_i\right)^2}, \qquad (39)$$

where var(P) is given by Equation 27 and the PF slope is

$$\frac{dP_i}{dx_i} = \frac{dP_i}{dz_i} \times \frac{dz_i}{dx_i}, \qquad (40)$$

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

$$\frac{dP_i}{dz_i} = \frac{\exp\!\left(-z_i^2/2\right)}{\sqrt{2\pi}}, \qquad (41)$$

from Equation 3A. By combining these equations, one gets

$$\mathrm{var}(\mathrm{PSE}_i) = \frac{2\pi\sigma^2\, P_i\,(1 - P_i)\exp\!\left(z_i^2\right)}{N_i}. \qquad (42)$$

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

$$\frac{1}{\mathrm{var}_{\mathrm{tot}}} = \sum_i \frac{1}{\mathrm{var}_i}. \qquad (43)$$

Finally, upon plugging in the five values of z_i, ranging from −2.328 to +2.328, and P_i, ranging from 1% to 99%, the standard error of the PF location is

$$SE = \mathrm{var}_{\mathrm{tot}}^{1/2} = 0.284. \qquad (44)$$

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method, and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
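A few lines of Matlab (again my own sketch, not part of the Appendix code) make the same point by evaluating Equations 42–44 directly:

z     = [-2.328 -1.164 0 1.164 2.328];     % z scores of the five levels
P     = 0.5*(1 + erf(z/sqrt(2)));          % = [.01 .1222 .50 .8778 .99]
Ni    = 8;                                 % trials per level (N = 40 in all)
sigma = 1;                                 % pdf standard deviation in z-score units

varPSE = 2*pi*sigma^2 * P.*(1 - P).*exp(z.^2) / Ni;   % Equation 42, one value per level
varTot = 1/sum(1./varPSE);                            % Equation 43
SE     = sqrt(varTot)                                 % Equation 44: about 0.284

varOpt = pi*sigma^2/(2*40)                 % all 40 trials at z = 0: about 0.039, less than half varTot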

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

$$\mathrm{var}_{\mathrm{tot}} = SE^2 = \frac{3.24}{N_{\mathrm{tot}}}. \qquad (45)$$

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that of the five stimulus levels, the two at P = 1% and 99% are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = .5). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot claim that for estimating the PSE the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit
Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

$$X^2 = \sum_i \frac{\bigl[p(\mathrm{data}_i) - P_i\bigr]^2}{\mathrm{var}(P_i)}, \qquad (46)$$

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend

$$\mathrm{var}(P_i) = \frac{P_i\,(1 - P_i)}{N_i}, \qquad (47)$$

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

$$X^2 = \sum_i \mathrm{residual}_i^2, \qquad (48)$$

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

$$\mathrm{residual}_i = \frac{p(\mathrm{data}_i) - P_i}{SE_i}. \qquad (49)$$

The second method is to use the normalized log-likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

$$LL = 2\sum_i \left\{ N_i\, p(\mathrm{data}_i)\,\ln\!\left[\frac{p(\mathrm{data}_i)}{P_i}\right] + N_i\,\bigl[1 - p(\mathrm{data}_i)\bigr]\,\ln\!\left[\frac{1 - p(\mathrm{data}_i)}{1 - P_i}\right] \right\}. \qquad (50)$$

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.
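As a concrete illustration, the two statistics can be computed in a few lines of Matlab; this sketch uses made-up data and my own variable names, and is not Wichmann and Hill's implementation:

Ni = [20 20 20 20];                 % trials per level (made-up data)
ki = [11 14 17 19];                 % number correct
Pi = [.55 .70 .85 .95];             % predicted probability correct at each level

pi_   = ki./Ni;                                    % observed proportion correct
varPi = Pi.*(1 - Pi)./Ni;                          % Equation 47
X2    = sum((pi_ - Pi).^2 ./ varPi)                % Equation 46

t1 = ki      .* log(max(pi_,     realmin)./Pi);          % 0*log(0) handled as 0
t2 = (Ni-ki) .* log(max(1 - pi_, realmin)./(1 - Pi));
LL = 2*sum(t1 + t2)                                % Equation 50 (the deviance)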

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, and γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

$$\mathrm{prob\_chisq} = \mathrm{Gamma}\!\left(\frac{df}{2}, \frac{\chi^2}{2}\right), \qquad (51)$$

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z-score.
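These checks are easy to reproduce; note that Matlab's gammainc takes its arguments in the order gammainc(x, a), so the Gamma(df/2, χ²/2) of Equation 51 is entered with the χ²/2 argument first:

df = 18;
gammainc(df/2, df/2)            % Gamma(9, 9) in the text's notation: about .54
gammainc((df + 6)/2, df/2)      % Gamma(9, 12): about .845, one standard deviation (sqrt(2*df) = 6) above df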

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100% and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

$$X_i^2 = \frac{\bigl[p(\mathrm{data}_i) - P_i\bigr]^2}{P_i\,(1 - P_i)/N_i} = \frac{(O_i - E_i)^2}{E_i\,(1 - P_i)}, \qquad (52)$$

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible outcomes for O_i. The weighting, wt_{O_i}, is the expected probability from binomial statistics:

$$wt_{O_i} = \binom{N_i}{O_i}\, P_i^{\,O_i}\,(1 - P_i)^{N_i - O_i}. \qquad (53)$$

The expected value ⟨X_i²⟩ is

$$\langle X_i^2 \rangle = \sum_{O_i} wt_{O_i}\,\frac{(O_i - E_i)^2}{E_i\,(1 - P_i)}. \qquad (54)$$

A bit of algebra based on $\sum_{O_i} wt_{O_i} = 1$ gives a surprising result:

$$\langle X_i^2 \rangle = 1. \qquad (55)$$

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

$$\langle LL_i \rangle = \sum_{O_i} wt_{O_i}\; 2\left[ O_i \ln\!\left(\frac{O_i}{E_i}\right) + (N_i - O_i)\ln\!\left(\frac{N_i - O_i}{N_i - E_i}\right) \right]. \qquad (56)$$

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

$$\langle LL_i \rangle = 2\bigl[0.25 \times 2\ln(2) + 0.25 \times 2\ln(2)\bigr] = 2\ln(2) = 1.386. \qquad (57)$$

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the 1 − P_i factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about 85%, the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.
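The enumeration behind these two cases is compact enough to sketch here (a simplified stand-in for Matlab Code 2, with my own variable names); changing Ni and Pi traces out the curves of Figure 4:

Ni = 2;   Pi = 0.5;                           % the case worked through above
O  = 0:Ni;                                    % all possible numbers correct at this level
wt = arrayfun(@(o) nchoosek(Ni, o), O) .* Pi.^O .* (1 - Pi).^(Ni - O);   % Equation 53
Ei = Ni*Pi;

EX2 = sum(wt .* (O - Ei).^2 / (Ei*(1 - Pi)))  % Equation 54; comes out 1 for any Ni and 0 < Pi < 1

t1  = O        .* log(max(O,      realmin)/Ei);          % 0*log(0) treated as 0
t2  = (Ni - O) .* log(max(Ni - O, realmin)/(Ni - Ei));
ELL = sum(wt .* 2 .* (t1 + t2))               % Equation 56; equals 2*log(2) = 1.386 here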

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method 1 by the reduced chi-square (chi-square divided by the degrees of freedom), if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.

Wichmann and Hill, and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used (probit [cumulative normal], logit, and Weibull), corresponding to the rows of Table 2, and two error metrics were used (chi-square minimization and likelihood maximization), corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                     No Lapse        No Lapse        Lapse           Lapse
Psychometric         λ = .01         λ = .0001       λ = .01         λ = .0001
Function             L      χ²       L      χ²       L      χ²       L      χ²
Probit               1.05   1.05     1.00   0.99     1.05   1.05     0.36   0.34
Logit                1.76   1.76     1.66   1.66     1.76   1.76     0.84   0.72
Weibull              1.01   1.01     0.97   0.97     1.01   1.01     0.26   0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

$$P_{\mathrm{probit}} = 0.5 + 0.5\,\mathrm{erf}\!\left(\frac{z}{\sqrt{2}}\right) \quad \text{(Equation 3B)}, \qquad (58\mathrm{A})$$

$$P_{\mathrm{logit}} = \frac{1}{1 + \exp(-z)}, \qquad (58\mathrm{B})$$

$$P_{\mathrm{Weibull}} = 1 - \exp\bigl[-\exp(z)\bigr] \quad \text{(Equation 13)}, \qquad (58\mathrm{C})$$

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
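A stripped-down version of the kind of fit that generated Table 2 is sketched below. It handles only the probit row and only likelihood maximization; the data, the fixed lapse value, and the starting slope are the ones described in the text, but the code itself is my own sketch rather than Matlab Code 3, so the numbers it returns should only roughly match the table.

x   = [0 1 6];                % stimulus levels (as in the text)
N   = [50 50 50];             % trials per level
k   = [25 42 49];             % number correct; 49/50 at the top level is the single lapse
lam = 0.0001;                 % fixed lapse (and guess) rate; try 0.01 for comparison

probit = @(z) 0.5*(1 + erf(z/sqrt(2)));
pf     = @(p) lam + (1 - 2*lam)*probit(p(2)*(x - p(1)));   % gamma = lambda, symmetric PF
negLL  = @(p) -sum(k.*log(pf(p)) + (N - k).*log(1 - pf(p)));

pHat  = fminsearch(negLL, [0.5 0.3]);   % slope started near 0.3 (see the local-minima discussion);
slope = pHat(2)                         % the threshold start of 0.5 is an arbitrary choice of mine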

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally, the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods
In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope, β, of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic. Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials, one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing, there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than average the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations, in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
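Since the enumeration itself is in Matlab Code 4, the sketch below substitutes a quick Monte Carlo version of the same idea (my own code and parameter choices, not the Appendix program): Brownian staircases are run on a flat PF at 50% correct, each run's levels are shifted by that run's mean level, and the shifted trials are pooled. The pooled PF rises with level even though the true PF is flat, illustrating the shift-induced slope bias.

nRuns = 20000;  nTrials = 10;  nLev = 21;  start = 11;   % arbitrary choices of mine
bins  = -nLev:0.5:nLev;                                  % shifted-level bins
nCorr = zeros(size(bins));  nTot = zeros(size(bins));
rng(1);
for r = 1:nRuns
    lev = zeros(1, nTrials);  resp = zeros(1, nTrials);  L = start;
    for t = 1:nTrials
        lev(t)  = L;
        resp(t) = rand < 0.5;                 % flat PF: 50% correct at every level
        L = L - resp(t) + (1 - resp(t));      % down 1 after correct, up 1 after error
        L = min(max(L, 1), nLev);
    end
    thr = mean(lev);                          % nonparametric threshold estimate for this run
    for t = 1:nTrials                         % pool trials at levels re the estimated threshold
        [~, j] = min(abs(bins - (lev(t) - thr)));
        nCorr(j) = nCorr(j) + resp(t);  nTot(j) = nTot(j) + 1;
    end
end
use = nTot > 50;
plot(bins(use), nCorr(use)./nTot(use), 'o-')  % rises with level even though the true PF is flat
xlabel('level relative to estimated threshold'), ylabel('proportion correct')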

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure. The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair, I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing, there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)] / [1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective," I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.0004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a), because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c t^b. The optimum variance of 1.64/(N b²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(N b²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about the fourth reversal, are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.
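As a concrete illustration of item 6, the following is a minimal simulation sketch of a Brownian (one-up, one-down) staircase whose threshold is estimated by averaging the tested levels after the trials preceding the fourth reversal are discarded. The simulated observer (a cumulative normal PF going from 0% to 100% with its 50% point at y = 0), the step size, and the run length are illustrative assumptions, not parameters from the text.

  % Brownian (one-up one-down) staircase; it converges toward the 50% point (y = 0).
  pf = @(y) 0.5*(1 + erf(y/(0.5*sqrt(2))));     % hypothetical 0-100% PF, sigma = 0.5
  nTrials = 100; step = 0.1; y = 0.3;           % start somewhat above threshold
  levels = zeros(1,nTrials); resp = zeros(1,nTrials);
  for t = 1:nTrials
      levels(t) = y;
      resp(t) = rand < pf(y);                   % 1 = "greater"/correct, 0 = otherwise
      if resp(t), y = y - step; else y = y + step; end   % down after 1, up after 0
  end
  rev = find(diff(resp) ~= 0) + 1;              % trials at which the track reverses
  if isempty(rev), rev = 1; end
  firstUsed = rev(min(4, numel(rev)));          % discard trials before the 4th reversal
  thresholdEstimate = mean(levels(firstUsed:end))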

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.
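To make item 9 concrete, here is a minimal parametric bootstrap sketch for the threshold standard error from a single run. The stimulus levels, trial counts, 2AFC Weibull form, and the unweighted least-squares fit are illustrative assumptions only, not the specific procedures of Foster and Bischof (1991) or Wichmann and Hill (2001b).

  % Parametric bootstrap of a 2AFC Weibull threshold from one run of data.
  x   = [0.2 0.4 0.6 0.8 1.2];                 % stimulus levels (hypothetical)
  N   = [20 20 20 20 20];                      % trials per level
  Obs = [11 13 15 18 20];                      % number correct at each level
  pf  = @(c,x) 1 - 0.5*0.4795.^((x/c(1)).^c(2));  % 2AFC Weibull, d' = 1 at threshold
  sse = @(c,k) sum((k./N - pf(c,x)).^2);       % crude unweighted least squares
  c0  = fminsearch(@(c) sse(c,Obs), [0.6 2]);  % fitted [threshold, beta]
  nBoot = 500; thBoot = zeros(1,nBoot);
  for i = 1:nBoot
      kSim = arrayfun(@(n,p) sum(rand(n,1) < p), N, pf(c0,x));  % binomial resample
      cB = fminsearch(@(c) sse(c,kSim), c0);
      thBoot(i) = cB(1);
  end
  thresholdSE = std(thBoot)                    % bootstrap estimate of threshold SE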

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
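A minimal sketch of the Spearman–Kärber mean (threshold) computed from constant-stimulus data may help fix ideas; the levels and proportions below are made up, and the simple cummax monotonization stands in for a proper pooling of adjacent violators.

  % Spearman-Karber estimate of the PF mean from levels x and proportions P.
  x = [1 2 3 4 5 6];                         % stimulus levels (hypothetical)
  P = [0.02 0.10 0.35 0.60 0.88 0.98];       % proportion "greater" at each level
  P = cummax(P);                             % crude monotonization (R2014b or later)
  Pfull = [0, P, 1];                         % pad so the PF spans 0 to 1
  xfull = [x(1) - 1, x, x(end) + 1];         % assumed spacing for the padded levels
  mid = (xfull(1:end-1) + xfull(2:end)) / 2; % midpoints between successive levels
  T = sum(diff(Pfull) .* mid)                % SK threshold (mean of the implied pdf)
  % Note: every level gets equal weight; the estimator ignores how many trials
  % were run at each level, which is the limitation discussed above.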

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and they prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).
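The following small enumeration, written fresh here in the spirit of the appendix's Matlab Code 2 (the p and N values are merely illustrative), shows the effect: the expected X² contribution of a level equals 1 regardless of N, whereas the expected likelihood-deviance contribution departs from 1 when trials per level are few.

  % Expected chi-square versus likelihood-deviance contribution of one level
  % with N trials at true probability p, averaged over the binomial outcomes.
  p = 0.9; N = 2;                              % few trials per level, as in Figure 4
  k = 0:N;                                     % all possible numbers correct
  w = arrayfun(@(kk) nchoosek(N,kk), k) .* p.^k .* (1-p).^(N-k);  % binomial weights
  E = N*p;
  X2 = (k - E).^2 ./ (E*(1-p));                % chi-square contribution (Equation 52)
  D  = 2*(k.*log((k+eps)/E) + (N-k).*log((N-k+eps)/(N-E+eps)));   % deviance (Equation 56)
  expectedX2 = sum(w.*X2)                      % equals 1: unbiased at any level
  expectedD  = sum(w.*D)                       % departs from 1 for small N: biased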

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that, if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1: Weibull Function: Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)
Matlab Code 2: Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)
Matlab Code 3: Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)
Matlab Code 4: Brownian Staircase Enumerations for Slope Bias (Monotonization, Extrapolation, Averaging, and Main Program)

[Matlab program listings for Codes 1–4 not reproduced here; they are available from the author at the address above.]

Note on the monotonization step of Code 4: In line 6 the denominator is tot+eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note on the main program of Code 4: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


pattern. If the reference pattern has zero contrast, the task is detection. If the reference pattern has a high contrast, the task is discrimination. Klein (1985) discusses these tasks in terms of monopolar and bipolar cues. For discrimination, a bipolar cue must be available, whereby the test pattern can be either positive or negative in relation to the reference. If one cannot discriminate the negative cue from the positive, then it can be called a detection task.

Finally, the constant stimuli versus adaptive method distinction is based on the former's having preassigned test levels and the latter's having levels that shift to a desired placement. The output of the constant stimulus method is a full PF and is thus fully entitled to be included in this special issue. The output of adaptive methods is typically only a single number, the threshold, specifying the location but not the shape of the PF. Two of the papers in this issue (Kaernbach, 2001b; Strasburger, 2001b) explore the possibility of also extracting the slope from adaptive data that concentrate trials around one level. Even though adaptive methods do not measure much about the PF, they are so popular that they are well represented in this special issue.

The 10 rows of Table 1 present the articles in this special issue (including the present article). Columns 2–5 correspond to the four categories associated with the first two

Figure 3. The 2AFC discrimination PF from Figure 2 has been extended to the 0% to 100% range without rescaling the ordinate. In panels a and b, the two right-hand panels of Figure 2 have been modified by flipping the sign of the abscissa of the negative slope branch, where the test is in Interval 1 (I1). The new abscissa is now the stimulus strength in I2 minus the strength in I1. The abscissa scaling has been modified to be in stimulus units. In this example, the test stimulus has a contrast of 5%. The ordinate in panel a is the probability that the response is I2. The ordinate in panel b is the z score of the panel a ordinate. The asterisks are at the same points as in Figure 2. Panel c is the standard signal detection picture when noise is added to the signal. The abscissa has been modified to being the activity in I2 minus the activity in I1. The units of activation have arbitrarily been chosen to be the same as the units of panels a and b. Activity distributions are shown for stimuli of 5 and 15 units, corresponding to the asterisks of panels a and b. The subject's criterion is at 5 units of activation. The upper abscissa is in z-score units, where the Gaussians have unit variance. The areas under the two distributions above the criterion are 50% and 95%, in agreement with the probabilities shown in panel a.


distinctions. The last column classifies the articles according to the adaptive versus constant stimuli distinction. In order to open up the full variety of PFs for discussion and to enable a deeper understanding of the relationship among different types of PFs, I will now clarify the interrelationship of the various categories of PFs and their connection to the signal detection approach.

Lack of adaptive method for yes/no tasks with controlled false alarm rate. In Section II, I will bring up a number of advantages of the yes/no task in comparison with the 2AFC task (see also Kaernbach, 1990). Given those yes/no advantages, one may wonder why the 2AFC method is so popular. The usual answer is that 2AFC has no response bias. However, as has been discussed, the yes/no method allows an unbiased d′ to be calculated, and the 2AFC method does allow an interval bias that affects d′. Another reason for the prevalence of 2AFC experiments is that a multitude of adaptive methods are available for 2AFC, but barely any are available for an objective yes/no task in which the false alarm rate is measured so that d′ can be calculated. In Table 1, with one exception, the rows with adaptive methods are associated with the forced choice method. The exception is Leek (2001), who discusses adaptive yes/no methods in which no blank trials are presented. In that case, the false alarm rate is not measured, so d′ cannot be calculated. This type of yes/no method does not belong in the "objective" category of concern for the present paper. The "1990" entries in Table 1 refer to Kaernbach's (1990) description of a staircase method for an objective yes/no task in which an equal number of blanks are intermixed with the signal trials. Since Kaernbach could have used that method for Kaernbach (2001b), I placed the "1990" entry in his slot.

Kaernbach's (1990) yes/no staircase rules are simple. The signal level changes according to a rule such as the following: Move down one level for correct responses, a hit <yes|signal> or a correct rejection <no|blank>; move up three levels for wrong responses, a miss <no|signal> or a false alarm <yes|blank>. This rule, similar to the one-down–three-up rule used in 2AFC, places the trials so that the average of the hit rate and correct rejection rate is 75%.

What is now needed is a mechanism to get the observer to establish an optimal criterion that equalizes the number of "yes" and "no" responses (the ROC negative diagonal). This situation is identical to the problem of getting 2AFC observers to equalize the number of responses to Intervals 1 and 2. The quadratic bias in d′ is the same in both cases. The simplest way to get subjects to equalize their responses is to give them feedback about any bias in their responses. With equal "yes" and "no" responses, the 75% correct corresponds to a z score of 0.674 and a d′ = 2z = 1.349. If the subject does not have equal numbers of "yes" and "no" responses, then the d′ would be calculated by d′ = zhit − zfalse alarm. I do hope that Kaernbach's clever yes/no objective staircase will be explored by others. One should be able to enhance it with ratings and multiple stimuli (Klein, 2002).
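As an illustration of how these pieces fit together, here is a minimal simulation sketch of the objective yes/no staircase just described. The simulated observer's d′ function, fixed criterion, step size, and trial counts are illustrative assumptions, not values from the text; with a balanced criterion of 0.674, the track settles near the level where d′ = 1.349.

  % Objective yes/no staircase: blanks and signals intermixed; one step down
  % after a correct response (hit or correct rejection), three steps up after
  % a wrong response (miss or false alarm).
  Phi = @(z) 0.5*(1 + erf(z/sqrt(2)));
  dprime = @(y) exp(y);                  % hypothetical d' versus log stimulus level
  crit = 0.674;                          % observer's fixed criterion (z units)
  y = 1; down = 0.05; up = 3*down;
  nTrials = 1000; levels = zeros(1,nTrials);
  for t = 1:nTrials
      levels(t) = y;
      blank = rand < 0.5;                             % half the trials are blanks
      pYes = Phi((~blank)*dprime(y) - crit);          % P("yes") on this trial
      yes = rand < pYes;
      correct = (yes && ~blank) || (~yes && blank);   % hit or correct rejection
      if correct, y = y - down; else y = y + up; end
  end
  convergedLevel = mean(levels(200:end))  % level where (hit + CR)/2 = 75%, d' = 1.349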

Forced choice detection. As can be seen in the second column of Table 1, a popular category in this special issue is the forced choice method for detection. This method is used by Strasburger (2001b), Kaernbach (2001a), Linschoten et al. (2001), and Wichmann and Hill (2001a, 2001b). These researchers use PFs based on probability ordinates. The connection between P and d′ is different from the yes/no case given by Equation 5. For 2AFC, signal detection theory provides a simple connection between d′ and probability correct: d′(x) = √2 z(x). In the preceding section, I discussed the option of averaging the probabilities and then taking the z score (high-threshold approach) or averaging the z scores and then calculating the probability (signal detection approach). The signal detection method is better because it has a stronger empirical basis and avoids bias.

For an m-AFC task with m > 2, the connection between d′ and P is more complicated than it is for m = 2. The connection, given by a fairly simple integral (Green & Swets, 1966; Macmillan & Creelman, 1991), has been tabulated by Hacker and Ratcliff (1979) and by Macmillan and Creelman (1991). One problem with these tables is that they are based on the assumption of unity ROC slope. It is known that there are many cases in which the ROC slope is not unity (Green & Swets, 1966), so these tables connecting

Table 1
Classification of the 10 Articles in This Special Issue According to Three Distinctions: Forced Choice Versus Yes/No, Detection Versus Discrimination, Adaptive Versus Constant Stimuli

                              Detection                 Discrimination           Adaptive or
Source                        m-AFC        Yes/No       m-AFC          Yes/No    Constant Stimuli
Leek (2001)                   General, loose γ   ✓      loose γ                  A
Wichmann & Hill (2001a)       2AFC         (✓)          (✓)            (✓)       C
Wichmann & Hill (2001b)       2AFC         (✓)          (✓)            (✓)       C
Linschoten et al. (2001)      2AFC                                               A
Strasburger (2001a)           General      ✓            ✓                        C
Strasburger (2001b)           10AFC                                              A
Kaernbach (2001a)             General      ✓                                     A
Kaernbach (2001b)             General      (1990)       ✓              (1990)    A
Miller & Ulrich (2001)        for γ = 0    ✓            (for 0%–100%)  ✓         C
Klein (present)               General      ✓            ✓              ✓         Both


d′ and probability correct should be treated cautiously. W. P. Banks and I (unpublished) investigated this topic and found that near the P = 50% correct point, the dependence of d′ on ROC slope is minimal. Away from the 50% point, the dependence can be strong.
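For readers who want the integral itself rather than the tables, a minimal numerical version (assuming, as the tables do, equal-variance Gaussians and unity ROC slope) is:

  % Percent correct for m-AFC at a given d', computed from the standard
  % signal detection integral Pc = integral of phi(t - d') * PHI(t)^(m-1) dt.
  phi = @(t) exp(-t.^2/2)/sqrt(2*pi);
  PHI = @(t) 0.5*(1 + erf(t/sqrt(2)));
  dprime = 1; m = 4;                           % example values
  t = linspace(-8, 8, 4001);
  Pc = trapz(t, phi(t - dprime) .* PHI(t).^(m-1))
  % For m = 2 this reduces to Pc = PHI(dprime/sqrt(2)) = 0.7602 at d' = 1.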

Yes/no detection. Linschoten et al. (2001) compare three methods (limits, staircase, 2AFC likelihood) for measuring thresholds with a small number of trials. In the method of limits, one starts with a subthreshold stimulus and gradually increases the strength. On each trial, the observer says "yes" or "no" with respect to whether or not the stimulus is detected. Although this is a classic yes/no detection method, it will not be discussed in this article, because there is no control of response bias. That is, blanks were not intermixed with signals.

In Table 1, the Wichmann and Hill (2001a, 2001b) articles are marked with ✓s in parentheses because, although these authors write only about 2AFC, they mention that all their methods are equally applicable for yes/no or m-AFC of any type (detection/discrimination). And their Matlab implementations are fully general. All the special issue articles reporting experimental data used a forced choice method. This bias in favor of the forced choice methodology is found not only in this special issue; it is widespread in the psychophysics community. Given the advantages of the yes/no method (see discussion in Section II), I hope that once yes/no adaptive methods are accepted, they will become the method of choice.

m-AFC discrimination. It is common practice to represent 2AFC results as a plot of percent correct, averaged over all trials, versus stimulus strength. This PF goes from 50% to 100%. The asymmetry between the lower and upper asymptotes introduces some inefficiency in threshold estimation, as will be discussed. Another problem with the standard 50% to 100% plot is that a bias in choice of interval will produce an underestimate of d′, as has been shown earlier. Researchers often use the 2AFC method because they believe that it avoids biased threshold estimates. It is therefore surprising that the relatively simple correction for 2AFC bias discussed earlier is rarely done.

The issue of bias in m-AFC also occurs when m > 2. In Strasburger's 10AFC letter discrimination task, it is common for subjects to have biases for responding with particular letters when guessing. Any imbalance in the response bias for different letters will result in a reduction of d′, as it did in 2AFC.

There are several benefits of viewing the 2AFC discrimination data in terms of a PF going from 0% to 100%. Not only does it provide a simple way of viewing and calculating the interval bias, it also enables new methods for estimating the PF parameters, such as those proposed by Miller and Ulrich (2001), as will be discussed in Section III. In my original comments on the Miller and Ulrich paper, I pointed out that, because their nonparametric procedure has uniform weighting of the different PF levels, their method does not apply to 2AFC tasks. The asymmetric binomial error bars near the 50% and 100% levels cause the uniform weighting of the Miller and Ulrich approach to be nonoptimal. However, I now realize that the 2AFC discrimination task can be fit by a cumulative normal going from 0% to 100%. Because of that insight, I have marked the discrimination forced choice column of Table 1 for the Miller and Ulrich (2001) paper, with the proviso that the PF goes from 0% to 100%. Owing to the popularity of 2AFC, this modification greatly expands the relevance of their nonparametric approach.

Yes/no discrimination. Kaernbach (2001b) and Miller and Ulrich (2001) offer theoretical articles that examine properties of PFs that go from P = 0% to 100% (P = p in Equation 1). In both cases, the PF is the cumulative normal (Equation 3). Although these PFs with γ = 0 could be for a yes/no detection task with a zero false alarm rate (not plausible) or a forced choice detection task with an infinite number of alternatives (not plausible either), I suspect that the authors had in mind a yes/no discrimination task (Table 1, column 5). A typical discrimination task in vision is contrast discrimination, in which the observer responds to whether the presented contrast is greater than or less than a memorized reference. Feedback reinforces the stability of the reference. In a typical discrimination task, the reference is one exemplar from a continuum of stimulus strengths. If the reference is at a special zero point rather than being an element of a smooth continuum, the task is no longer a simple discrimination task. Zero contrast would be an example of a special reference. Klein (1985) discusses several examples which illustrate how a natural zero can complicate the analysis. One might wonder how to connect the PF from the detection regime, in which the reference is zero contrast, to the discrimination regime, in which the reference (pedestal) is at a high contrast. The d′ function to be introduced in Equation 20 does a reasonably good job of fitting data across the full range of pedestal strength, going from detection to discrimination.

The connection of the discrimination PF to the signal detection PF is the same as that for yes/no detection, given in Equation 5: d′(x) = z(x) − z(0), where z is the z score of the probability of a "greater than" judgment. In a discrimination task, the bias z(0) is the z score of the probability of saying "greater" when the reference stimulus is presented. The stimulus strength x that gives z(x) = 0 is the point of subjective equality (x = PSE). If the cumulative normal PF of Equation 3 is used, then the z score is linearly proportional to the stimulus strength: z(x) = (x − PSE)/threshold, where threshold is defined to be the point at which d′ = 1. I will come back to these distinctions between detection and discrimination PFs after presenting more groundwork regarding thresholds, log abscissas, and PF shapes (see Equations 14 and 17).

Definition of Threshold
Threshold is often defined as the stimulus strength that produces a probability correct halfway up the PF. If humans operated according to a high-threshold assumption, this definition of threshold would be stable across different experimental methods. However, as I discussed following Equation 2, high-threshold theory has been discredited.


According to the more successful signal detection theory (Green & Swets, 1966), the d′ at the midpoint of the PF changes according to the number of alternatives in a forced choice method and according to the false alarm rate in a yes/no method. This variability of d′ with method is a good reason not to define threshold as the halfway point of the PF.

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1, for simplicity. This definition applies to both yes/no and m-AFC tasks and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = 15.87%, corresponding to a z score of z = −1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.

Strasburger's Suggestion on Specifying Slope: A Logarithmic Abscissa

In many of the articles in this special issue, a logarithmic abscissa such as decibels is used. Many shapes of PFs have been used with a log abscissa. Most delightful, but frustrating, is Strasburger's (2001a) paper on the PF maximum slope. He compares the Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′ using a logarithmic abscissa. The present section is the outcome of my struggles with a number of issues raised by Strasburger (2001a) and my attempt to clarify them.

I will typically express stimulus strength x in threshold units,

xt = x/α,    (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units, y, as well as in linear units, x:

yt = loge(xt) = y − Y,    (9)

where y = loge(x) is the natural log of the stimulus and Y = loge(α) is the threshold on the log abscissa.
The slope of the psychometric function P(yt) with a logarithmic abscissa is

slopelog(yt) = dP/dyt    (10A)
             = (1 − γ) dp/dyt,    (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log–log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log–log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log–log slope is β for a Weibull PF (Equation 12) and b for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log–log slope using a d′ ordinate.
A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per loge) are not familiar. Then it dawned on me that there is a simple connection between slope with a loge abscissa, slopelog = dP(yt)/dyt, and slope with a linear abscissa, slopelin = dP(xt)/dxt, namely,

slopelin = dP(xt)/dxt = [dP(yt)/dyt] × dyt/dxt = slopelog/xt,    (11)

because dyt/dxt = d loge(xt)/dxt = 1/xt. At threshold, xt = 1 (yt = 0), so at that point Strasburger's slope with a logarithmic axis is identical to my familiar slope plotted in threshold units on a linear axis. The simple connection between slope on the log and linear abscissas converted me to being a strong supporter of using a natural log abscissa.
The Weibull and cumulative normal psychometric functions. To provide a background for Strasburger's article, I will discuss three PFs: Weibull, cumulative normal, and d′, as well as their close connections. A common parameterization of the PF is given by the Weibull function

pweib(xt) = 1 − k^(xt^β),    (12)

where pweib(xt), the PF that goes from 0% to 100%, is related to Pweib(xt), the probability of a correct response, by Equation 1; β is the slope; and k controls the definition of threshold: pweib(1) = 1 − k is the percent correct at threshold (xt = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

pweib(yt) = 1 − k^[exp(β yt)].    (13)

Panel a of Figure 1 is a plot of Equation 12 (Weibull function as a function of x on a linear abscissa) for the case β = 2 and k = exp(−1) = 0.368. The choice β = 2 makes the Weibull an upside-down Gaussian. Panel d is the same function, this time plotted as a function of y, corresponding to a natural log abscissa. With this choice of k, the point of maximum slope as a function of yt is the threshold point (yt = 0). The point of maximum slope at yt = 0 is marked with an asterisk in panel d. The same point in panel a, at xt = 1, is not the point of maximum slope on a linear abscissa, because of Equation 11. When plotted as a d′ function [a z-score transform of P(xt)] in panel b, the Weibull accelerates below threshold and decelerates above threshold, in agreement with a wide range of experimental data. The acceleration and deceleration are most clearly seen by the slope of d′ in the log–log coordinates of panel e. The log–log d′ slope is plotted in panels c and f. At xt = 0, the log–log slope is 2, corresponding to our choice of β = 2. The slope falls surprisingly rapidly as stimulus strength increases, and the slope is near 1 at xt = 2. If we had chosen β = 4 for the Weibull function, then the log–log d′ slope would have gone from 4 at xt = 0 to near 2 at xt = 2. Equation 20 will present a d′ function that captures this behavior.
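A short numerical check of Equation 11 (using the β = 2, k = exp(−1) Weibull of Figure 1; the test point xt = 1.5 is arbitrary) may be reassuring:

  % Slope on a linear abscissa equals slope on a natural-log abscissa divided by xt.
  beta = 2; k = exp(-1);
  p = @(x) 1 - k.^(x.^beta);                          % Equation 12 (0-100% form)
  xt = 1.5; h = 1e-5;
  slopeLin = (p(xt + h) - p(xt - h)) / (2*h);         % dp/dx
  slopeLog = (p(xt*exp(h)) - p(xt*exp(-h))) / (2*h);  % dp/dy, with y = ln(x)
  [slopeLin, slopeLog/xt]                             % the two entries agree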

Another PF commonly used is based on the cumulative normal (Equation 3):

pΦ(yt) = Φ(yt/σ)    (14A)

or

z(yt) = yt/σ = (y − Y)/σ,    (14B)

where σ is the standard error of the underlying Gaussian function (its connection to β will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to −∞, needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissa.
Two examples of Strasburger's maximum slope (his Equations 7 and 17) are

β′ = β exp(−1) = 0.368β    (15)

for the Weibull function (Equation 13) and

β′ = 1/(σ√(2π)) = 0.399/σ    (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(−1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (yt = 0). Note that Equations 12–16 are missing the (1 − γ) factors that are present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.
An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = 0.5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(xt = 1) = 75%. At 10% above threshold, P(xt = 1.1) ≈ 75% + 0.1β′ = 0.7899, which is quite close to the exact value of 0.7896. This example shows how β′, defined with a natural log abscissa yt, is relevant to a linear abscissa xt.
Now that the logarithmic abscissa has been introduced, this is a good place to stop and point out that the log abscissa is fine for detection but not for discrimination, since the x = 0 point is often in the middle of the range. However, the cumulative normal PF is especially relevant to discrimination, where the PF goes from 0% to 100% and would be written as

pΦ(xt) = PΦ(xt) = Φ[(x − PSE)/α]    (17A)

or

zΦ(xt) = (x − PSE)/α,    (17B)

where x = PSE is the point of subjective equality. The parameter α is the threshold, since d′(α) = zΦ(α) − zΦ(0) = 1. The threshold α is also the standard deviation of the Gaussian probability density function (pdf). The reciprocal of α is the PF slope. I tend to use the letter σ for a unitless standard deviation, as occurs for the logarithmic variable y. I use α for a standard deviation that has the units of the stimulus x (like percent contrast). The comparison of Equation 14B for detection and Equation 17B for discrimination is useful. Although Equation 14 is used in most of the articles in this special issue, which are concerned with detection tasks, the techniques that I will be discussing are also relevant to discrimination tasks, for which Equation 17 is used.
Threshold and the Weibull Function
To illustrate how a d′ = 1 definition of threshold works for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = 15.87%, corresponding to a z score of zFA = −1, as in Figure 1. The z score at threshold (d′ = 1) is zTh = zFA + 1 = 0, corresponding to a probability of P(1) = 50%. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)]/[1 − P(0)] = 0.5943. The Weibull function becomes

Pweib(xt) = 1 − (1 − 0.1587) × 0.5943^(xt^β).    (18A)

If I had defined d′ = 2 to be threshold, then zTh = zFA + 2 = 1, leading to k = (1 − 0.8413)/(1 − 0.1587) = 0.1886, giving

Pweib(xt) = 1 − (1 − 0.1587) × 0.1886^(xt^β).    (18B)

As another example, suppose the false alarm rate in a yes/no task is at 50%, not uncommon in a signal detection experiment with blanks and test stimuli intermixed. Then threshold at d′ = 1 would occur at 84.13%, and the PF would be

Pweib(xt) = 1 − 0.5 × 0.3173^(xt^β),    (18C)

with Pweib(0) = 50% and Pweib(1) = 84.13%. This case corresponds to defining d′ on the ROC vertical intercept.
For 2AFC, the connection between d′ and z is z = d′/√2. Thus zTh = 2^−0.5 = 0.7071, corresponding to Pweib(1) = 76.02%, leading to k = 0.4795 in Equation 12. These connections will be clarified when an explicit form (Equation 20) is given for the d′ function d′(xt).
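The k values used in Equations 18A–18C and in the 2AFC case follow directly from the d′ = 1 (or d′ = 2) threshold convention; a short check, using nothing beyond the formulas above, is:

  % k = [1 - P(threshold)] / [1 - P(0)] for a fixed-d' threshold convention.
  PHI = @(z) 0.5*(1 + erf(z/sqrt(2)));
  zFA = -1;  k_18A = (1 - PHI(zFA + 1)) / (1 - PHI(zFA))   % 0.5943 (Equation 18A)
  zFA = -1;  k_18B = (1 - PHI(zFA + 2)) / (1 - PHI(zFA))   % 0.1886 (Equation 18B)
  zFA =  0;  k_18C = (1 - PHI(zFA + 1)) / (1 - PHI(zFA))   % 0.3173 (Equation 18C)
  k_2AFC = (1 - PHI(1/sqrt(2))) / (1 - 0.5)                % 0.4795 (2AFC, d' = 1)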


Complications With Strasburger's Advocacy of Maximum Slope

Here I will mention three complications implicated inStrasburgerrsquos suggestion of defining slope at the point ofmaximum slope on a logarithmic abscissa

1 As can be seen in Equation 11 the slopes on linearand logarithmic abscissas are related by a factor of 1xtBecause of this extra factor the maximum slope occursat a lower stimulus strength on linear as compared withlog axes This makes the notion of maximum slope lessfundamental However since we are usually interested inthe percent error of threshold estimates it turns out thata logarithmic abscissa is the most relevant axis support-ing Strasburgerrsquos log abscissa definition

2 The maximum slope is not necessarily at threshold(as Strasburger points out) For the Weibull functions de-fined in Equation 13 with k 5 exp( 1) and the cumula-tive normal function in Equation 14 the maximum slope(on a log axis) does occur at threshold However the twopoints are decoupled in the generalized Weibull functiondefined in Equation 12 For the Quick version of theWeibull function (Equation 13 with k 5 05 placingthreshold halfway up the PF) the threshold is below thepoint of maximum slope the derivative of P(xt) atthreshold is

which is slightly different from the maximum slope asgiven by Equation 15 Similarly when threshold is de-fined at dcent 5 1 the threshold is not at the point of max-imum slope People who fit psychometric functions wouldprobably prefer reporting slopes at the detection thresh-old rather than at the point of maximum slope

3 One of the most important considerations in selectinga point at which to measure threshold is the question ofhow to minimize the bias and standard error of the thresh-old estimate The goal of adaptive methods is to place tri-als at one level In order to avoid a threshold bias due to aimproper estimate of PF slope the test level should be atthe defined threshold The variance of the threshold esti-mate when the data are concentrated as a single level (aswith adaptive procedures) is given by Gourevitch and Gal-anter (1967) as

(19)

If the binomial error factor of P(1 P)N were not presentthe optimal placement of trials for measuring thresholdwould be at the point of maximum slope However thepresence of the P(1 P) factor shifts the optimal point toa higher level Wichmann and Hill (2001b) consider howtrial placement affects threshold variance for the methodof constant stimuli This topic will be considered in Sec-tion II

Connecting the Weibull and dcent FunctionsFigures 5ndash7 of Strasburger (2001a) compare the logndash

log dcent slope b with the Weibull PF slope b Strasburgerrsquosconnection of b 5 088b is problematic since the dcent ver-sion of the Weibull PF does not have a fixed logndashlog slopeAn improved dcent representation of the Weibull function isthe topic of the present section A useful parameterizationfor dcent which I shall refer to as the StromeyerndashFoley func-tion was introduced by Stromeyer and Klein (1974) andused extensively by Foley (1994)

(20)

where xt is the stimulus strength in threshold units as inEquation 8 The factors with a in the denominator are pres-ent so that dcent equals unity at threshold (xt 5 1 or x 5 a)At low xt dcent x t

ba The exponent b (the logndashlog slope ofdcent at very low contrasts) controls the amount of facilita-tion near threshold At high xt Equation 20 becomes dcentx t

1 w (1 a) The parameter w is the logndash log slope of thetest threshold versus pedestal contrast function (the tvc orldquodipperrdquo function) at strong pedestals One must be a bitcautious however because in yesno procedures the tvcslope can be decoupled from w if the signal detection ROCcurve has nonunity slope (Stromeyer amp Klein 1974) Theparameter a controls the point at which the dcent function be-gins to saturate Typical values of these unitless parametersare b 5 2 w 5 05 and a 5 06 (Yu Klein amp Levi 2001)The function in Equation 20 accelerates at low contrast(logndashlog slope of b) and decelerates at high contrasts (logndashlog slope of 1 w) in general agreement with a broadrange of data

For 2AFC z 5 dcentIuml2 so from Equation 3 the connec-tion between dcent and probability correct is

(21)

To establish the connection between the PFs specified inEquations12 and 20ndash21 one must first have the two agreeat threshold For 2AFC with the dcent 5 1 the threshold atP 5 7602 can be enforced by choosing k 5 04795 (seethe discussion following Equation 18C) so that Equa-tions 1 and 12 become

P(x_t) = 1 − 0.5 (0.4795)^(x_t^β).        (22)

Modifying k as in Equation 22 leaves the Weibull shape unchanged and shifts only the definition of threshold. If b = 1.06β, w = 1 − 0.39β, and a = 0.614, then for all values of β the Weibull and d′ functions (Equations 21 and 22 vs. Equation 12) differ by less than 0.0004 for all stimulus strengths. At very low stimulus strengths, b = β. The value b = 1.06β is a compromise for getting an overall optimal fit.
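
The closeness of this match is easy to verify numerically. The following sketch (my own check, assuming the reconstructed forms of Equations 20–22 above) compares the two 2AFC functions for β = 2; the maximum discrepancy it reports is consistent with the 0.0004 figure:

% Compare the 2AFC Weibull (Equation 22) with the d'-based PF (Equations 20-21)
% using the mapping b = 1.06*beta, w = 1 - 0.39*beta, a = 0.614, k = 0.4795.
beta = 2;
b = 1.06*beta;  w = 1 - 0.39*beta;  a = 0.614;
xt = logspace(-1, 1, 400);                    % stimulus strength in threshold units
Pweib = 1 - 0.5*0.4795.^(xt.^beta);           % Equation 22
dp    = xt.^b ./ (a + (1-a).*xt.^(b+w-1));    % Equation 20
P2afc = 0.5 + 0.5*erf(dp/2);                  % Equation 21, since z = d'/sqrt(2)
max(abs(Pweib - P2afc))                       % stays below 0.0004 for beta = 2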

Strasburger (2001a) is concerned with the same issue. In his Table 1, he reports d′ log–log slopes of [0.8847, 1.8379, 3.131, 4.421] for β = [1, 2, 3.5, 5]. If the d′ slope is taken to be a measure of the d′ exponent b, then the ratio b/β is [0.8847, 0.9190, 0.8946, 0.8842] for the four β values. Our value of b/β = 1.06 differs substantially from 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a). Pelli (1985) and Strasburger (2001a) used d′ functions with a = 1 in Equation 20. With a = 0.614, our d′ function (Equation 20) starts saturating near threshold, in agreement with experimental data. The saturation lowers the effective value of b. I was very surprised to find that the Stromeyer–Foley d′ function did such a good job in matching the Weibull function across the whole range of β and x. For a long time I had thought a different fit would be needed for each value of β.

For the same Weibull function in a yes/no method (false-alarm rate = 50%), the parameters b and w are identical to the 2AFC case above. Only the value of a shifts, from a = 0.614 (2AFC) to a = 0.54 (yes/no). In order to make x_t = 1 at d′ = 1, k becomes 0.3173 (see Equation 18C) instead of 0.4795 (2AFC). As with 2AFC, the difference between the Weibull and d′ function is less than 0.0004 (<0.04%) for all stimulus strengths and all values of β, and for both a linear and a logarithmic abscissa. These values mean that for both the yes/no and the 2AFC methods, the d′ function (Equation 20) corresponding to a Weibull function with β = 2 has a low-contrast log–log slope of b = 2.12 and a high-contrast log–log slope of 1 − w = 0.78. The high-contrast tvc (test contrast vs. pedestal contrast discrimination function, sometimes called the "dipper" function) log–log slope would be w = 0.22.

I also carried out a similar analysis, asking what cumulative normal σ is best matched to the Weibull. I found σ = 1.1720/β. However, the match is poor, with errors of 4% (rather than <0.04% in the preceding analysis). The poor fit of the two functions makes the fitting procedure sensitive to the weighting used in the fitting procedure. If one uses a weighting factor based on P(1 − P)/N (the binomial variance), where P is the PF, one will find a quite different value for σ than if one ignored the weighting, or if one chose the weighting on the basis of the data rather than the fitting function.

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section, inspired by Linschoten et al. (2001), is an examination of various experimental methods for gathering data to estimate thresholds. Linschoten et al. compare three 2AFC methods for measuring smell and taste thresholds: a maximum-likelihood adaptive method, an up–down staircase method, and an ascending method of limits. The special aspect of their experiments was that in measuring smell and taste, each 2AFC trial takes between 40 and 60 sec (Lew Harvey, personal communication) because of the long duration needed for the nostrils or mouth to recover their sensitivity. The purpose of the Linschoten et al. paper is to examine which of the three methods is most accurate when the number of trials is low. Linschoten et al. conclude that the 2AFC method is a good method for this task. My prior experience with forced choice tasks and a low number of trials (Manny & Klein, 1985; McKee et al., 1985) made me wonder whether there might be better methods to use in circumstances in which the number of trials is limited. This topic is considered below.

Compare 2AFC to Yes/No

The commonly accepted framework for comparing 2AFC and yes/no threshold estimates is signal detection theory. Initially, I will make the common assumption that the d′ function is a power function (Equation 20 with a = 1):

d′ = x_t^b = exp(b y_t).        (23)

Later in this section, the experimentally more accurate form, Equation 20 with a ≠ 1, will be used. The variance of the threshold estimate will now be calculated for 2AFC and yes/no for the simplest case of trials placed at a single stimulus level, y, the goal of most adaptive methods. The PF slope relates ordinate variance to abscissa variance (from Gourevitch & Galanter, 1967):

var(Y) = var(d′) / (dd′/dy_t)^2,        (24)

where Y = y − y_t is the threshold estimate for a natural log abscissa, as given in Equation 9. This formula for var(Y) is called the asymptotic formula, in that it becomes exact in the limit of a large number of total trials, N. Wichmann and Hill (2001b) show that for the method of constant stimuli, a proper choice of testing levels brings one to the asymptotic region for N ≥ 120 (the smallest value that they explored). Improper choice of testing levels requires a larger N in order to get to the asymptotic region. Equation 24 is based on having the data at a single level, as occurs asymptotically in adaptive procedures. In addition, the definition of threshold must be at the point being tested. If levels are tested away from threshold, there will be an additional contribution to the threshold variance (Finney, 1971; McKee et al., 1985) if the slope is a variable parameter in the fitting procedure. Equation 24 reaches its asymptotic value more rapidly for adaptive methods, with focused placement of trials, than with the constant stimulus method when slope is a free parameter.

Equation 24 needs analytic expressions for the derivative and the variance of d′. The derivative is obtained from Equation 23:

dd′/dy_t = b d′.        (25)

The variance of d′ for both yes/no (d′ = z_hit + z_correct rejection) and 2AFC (d′ = √2 z_ave) is

var(d′) = 2 var(z) = 4π exp(z^2) var(P).        (26)


For the yes/no case, Equation 26 is based on a unity slope ROC curve, with the subject's criterion at the negative diagonal of the ROC curve, where the hit rate equals the correct rejection rate. This would be the most efficient operating point. The variance for a nonunity slope ROC curve was considered by Klein, Stromeyer, and Ganz (1974). From binomial statistics,

var(P) = P(1 − P)/n,        (27)

with n = N for 2AFC and n = N/2 for yes/no, since for this binary yes/no task the number of signal and blank trials is half the total number of trials.

Putting all of these factors together gives

var(Y) = 4π exp(z^2) P(1 − P) / [n (b d′)^2]        (28)

for both the yes/no and the 2AFC cases. The only difference between the two cases is the definition of n and the definition of d′ (d′_2AFC = 2^0.5 z and d′_YN = 2z for a symmetric criterion). When this is expressed in terms of N (number of trials) and z (z score of probability correct), we get

var(Y) = 2π exp(z^2) P(1 − P) / [N (b z)^2]        (29)

for both 2AFC and yes/no. That is, when the percent corrects are all equal (P_2AFC = P_hit = P_correct rejection), the threshold variances for 2AFC and yes/no are equal. The minimum value of var(Y) is 1.64/(Nb^2), occurring at z = 1.57 (P = 94%). This value of 94% correct is the optimum value at which to test for both 2AFC and yes/no tasks, independent of b.
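
These numbers are easy to reproduce from Equation 29. The sketch below (mine, using only the reconstructed equation) evaluates var(Y) in units of 1/(Nb^2) as a function of the z score of the tested level:

% Threshold variance (Equation 29) in units of 1/(N*b^2) as a function of the
% z score of the proportion correct at the single tested level.
z = 0.05:0.001:3;
P = 0.5 + 0.5*erf(z/sqrt(2));            % probability correct at that level
v = 2*pi*exp(z.^2).*P.*(1-P)./z.^2;      % var(Y) * N * b^2
[vmin, imin] = min(v);
[vmin z(imin) P(imin)]                   % about [1.64  1.57  0.94]
z75 = sqrt(2)*erfinv(2*0.75 - 1);        % z score for P = .75
2*pi*exp(z75^2)*0.75*0.25/z75^2          % about 4.08, i.e. roughly 2.5 times the minimum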

This result, that the threshold variance is the same for 2AFC and yes/no methods, seems to disagree with Figure 4 of McKee et al. (1985), where the standard errors of the threshold estimates are plotted for 2AFC and yes/no as a function of the number of trials. The 2AFC standard error (their Figure 4) is twice that of yes/no, corresponding to a factor of four in variance. However, the McKee et al. analysis is misleading, since all they (we) did for the yes/no task was to use Equation 2 to expand the PF to a 0% to 100% range, thereby doubling the PF slope. That rescaling is not the way to convert a 2AFC detection task to the yes/no task of the same detectability. A signal detection methodology is needed for that conversion, as was done in the present analysis.

One surprise is that the optimal testing probability is independent of the PF slope, b. This finding is a general property of PFs in which the dependence on the slope parameter has the form x^b. The 94% level corresponds to d′_YN = 3.15 and d′_2AFC = 2.23. For the commonly used P = 75% (z = 0.6745, d′_YN = 1.35, d′_2AFC = 0.954), var(Y) = 4.08/(Nb^2), which is about 2.5 times the optimal value obtained by placing the stimuli at 94%. Green (1990) had previously pointed out that the 94% and 92% levels were the optimal points to test for the cumulative normal and logit functions, respectively, because of the minimum threshold estimate variability when one is testing at that point on the PF. Other PFs can have optimal points at slightly lower levels, but these are still above the levels to which adaptive methods are normally directed.

It is also useful to compare the 2AFC and yes/no methods at the same d′, rather than at their optimal points. In that case, for d′ values fixed at [1, 2, 3], the ratio of the yes/no variance to the 2AFC variance is [1.82, 1.36, 0.79]. At low d′ there is an advantage for the 2AFC method, and that reverses at high d′, as is expected from the optimal d′ being higher for yes/no than for 2AFC. In general, one should run the yes/no method at d′ levels above 2.0 (84% correct in the blank and signal intervals corresponds to d′ = 2).
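
Those ratios follow directly from Equations 26–28, with n = N for 2AFC and n = N/2 for yes/no; the following sketch (my own check under that reading) reproduces them:

% Ratio of yes/no to 2AFC threshold variance at fixed d' (Equations 26-28),
% in units where N*b^2 = 1; n = N for 2AFC and n = N/2 for yes/no.
dp = [1 2 3];
z2 = dp/sqrt(2);  P2 = 0.5 + 0.5*erf(z2/sqrt(2));    % 2AFC: d' = sqrt(2)*z
zy = dp/2;        Py = 0.5 + 0.5*erf(zy/sqrt(2));    % yes/no: d' = 2*z, symmetric criterion
var2afc = 4*pi*exp(z2.^2).*P2.*(1-P2)./(1.0*dp.^2);  % n = N
varyn   = 4*pi*exp(zy.^2).*Py.*(1-Py)./(0.5*dp.^2);  % n = N/2
varyn./var2afc                                       % about [1.82  1.36  0.79]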

The equality of the optimal var(Y) for 2AFC and yes/no methods is not only surprising, it seems paradoxical, as can be seen by the following gedankenexperiment. Suppose subjects close their eyes on the first interval of the 2AFC trial and base their judgments on just the second interval, as if they were doing a yes/no task, since blanks and stimuli are randomly intermixed. Instead of saying "yes" or "no," they would respond "Interval 2" or "Interval 1," respectively, so that the 2AFC task would become a yes/no task. The paradox is that one would expect the variance to be worse than for the eyes-open 2AFC case, since one is ignoring information. However, Equation 29 shows that the 2AFC and yes/no methods have identical optimal variance. I suspect that the resolution of this paradox is that in the yes/no method with a stable criterion, the optimal variance occurs at a higher d′ value (3.15) than the 2AFC value (2.23). The d′ power law function that I chose does not saturate at large d′ levels, giving a benefit to the yes/no method.

In order to check whether the paradox still holds with a more realistic PF (a d′ function that saturates at large d′), I now compare the 2AFC and yes/no variance using a Weibull function with a slope β. In order to connect 2AFC and yes/no, I need to use signal detection theory and the Stromeyer–Foley d′ function (Equation 20) with the "Weibull parameters" b = 1.06β, a = 0.614, and w = 1 − 0.39β. Following Equation 22, I pointed out that with these parameter values the difference between the 2AFC Weibull function and the 2AFC d′ function (converted to a probability ordinate) is less than 0.0004. After the derivative, Equation 25, is appropriately modified, we find that the optimal 2AFC variance is 3.42/(Nb^2). For the yes/no case, the optimal variance is 4.02/(Nb^2). This result resolves the paradox. The optimal 2AFC variance is about 18% lower than the yes/no variance, so closing one's eyes in one of the intervals does indeed hurt.

We shouldn't forget that if one counts stimulus presentations, N_p, rather than trials, N, the optimal 2AFC variance becomes 6.84/(N_p b^2), as opposed to 4.02/(N_p b^2) for yes/no. Thus, for the Linschoten experiments, with each stimulus presentation being slow, the yes/no methodology is more accurate for a given number of stimulus presentations, even at low d′ values. Kaernbach (1990) compares 2AFC and yes/no adaptive methods with simulations and actual experiments. His abscissa is number of presentations, and his results are compatible with our findings.

The objective yes/no method that I have been discussing is inefficient, since 50% of the trials are blanks. Greater efficiency is possible if several points on the PF are measured (method of constant stimuli), with the number of blank trials the same as at the other levels. For example, if five levels of stimuli were measured, including blanks, then the blanks would be only 20% of the total rather than 50%. For the 2AFC paradigm, the subject would still have to look at 50% of the presentations being blanks. The yes/no efficiency can be further improved by having the subject respond to the multiple stimuli with multiple ratings rather than a binary yes/no response. This rating scale method of constant stimuli is my favorite method, and I have been using it for 20 years (Klein & Stromeyer, 1980). The multiple ratings allow one to measure the psychometric function shape for d′ values as large as 4 (well into the dipper region, where stimuli are slightly above threshold). It also allows one to measure the ROC slope (presence of multiplicative noise). Simulations of its benefits will be presented in Klein (2002).

Insight into how the number of responses affects threshold variance can be obtained by using the cumulative normal PF (Equation 17) considered by Kaernbach (2001b) and Miller and Ulrich (2001), with γ = 0. Typically this is the case of yes/no discrimination, but it can also be 2AFC discrimination, with an ordinate going from 0% to 100%, as I have discussed in connection with Figure 3. For a cumulative normal PF with a binary response and for a single stimulus level being tested, the threshold variance is (Equation 19) as follows:

var(thresh) = var(P) / (dP/dy)^2 = 2π σ^2 exp(z^2) P(1 − P)/N,        (30)

from Equation 27 and forthcoming Equations 40 and 41. The optimal level to test is at P = .5, so the optimal variance becomes

var(thresh) = (π/2) σ^2/N.        (31)

If, instead of a binary response, the observer gives an analog response (many, many rating categories), then the variance is the familiar connection between standard deviation and standard error:

var(thresh) = σ^2/N.        (32)

With four response categories, the variance is 1.19σ^2/N, which is closer to the analog than to the binary case. If stimuli with multiple strengths are intermixed (method of constant stimuli), then the benefit of going from a binary response to multiple responses is greater than that indicated in Equations 31–32 (Klein, 2002).

A brief list of other problems with the 2AFC method applied to detection comprises the following.

1. 2AFC requires the subject to memorize and then compare the results of two subjective responses. It is cognitively simpler to report each response as it arrives. Subjects without much experience can easily make mistakes (lapses), simply becoming confused about the order of the stimuli. Klein (1985) discusses memory load issues relevant to 2AFC.

2. The 2AFC methodology makes various types of analysis and modeling more difficult, because it requires that one average over all criteria. The response variable in 2AFC is the Interval 2 activation minus the Interval 1 activation (see the abscissa in Figure 2). This variable goes from minus infinity to plus infinity, whereas the yes/no variable goes from zero to infinity. It therefore seems more difficult to model the effects of probability summation and uncertainty for 2AFC than for yes/no.

3. Models that relate psychophysical performance to underlying processes require information about how the noise varies with signal strength (multiplicative noise). The rating scale method of constant stimuli not only measures d′ as a function of contrast, it also is able to provide an estimate of how the variance of the signal distribution (the ROC slope) increases with contrast. The 2AFC method lacks that information.

4. Both the yes/no and 2AFC methods are supposed to eliminate the effects of bias. Earlier I pointed out that in my recent 2AFC discrimination experiments, when the stimulus is near threshold, I find I have a strong bias in favor of responding "Stimulus 2." Until I learned to compensate for this bias (not typically done by naïve subjects), my d′ was underestimated. As I have discussed, it is possible to compensate for this bias, but that is rarely done.

Improving the Brownian (Up–Down) Staircase

Adaptive methods with a goal of placing trials at an optimal point on the PF are efficient. Leek (2001) provides a nice overview of these methods. There are two general categories of adaptive methods: those based on maximum (or mean) likelihood, and those based on simpler staircase rules. The present section is concerned with the older staircase methods, which I will call Brownian methods. I used the name "Brownian" because of the up–down staircase's similarity to one-dimensional Brownian motion, in which the direction of each step is random. The familiar up–down staircase is Brownian because at each trial the decision about whether to go up or down is decided by a random factor governed by probability P(x). I want to make this connection to Brownian motion because I suspect that there are results in the physics literature that are relevant to our psychophysical staircases. I hope these connections get made.

A Brownian staircase has a minimal memory of the run history: it needs to remember the previous step (including direction) and the previous number of reversals or number of trials (used for when to stop and for when to decrease step size). A typical Brownian rule is to decrease signal strength by one step (like 1 dB) after a correct response and to increase it three steps after an incorrect response (a one down–three up rule). I was pleased to learn recently that this rule works nicely for an objective yes/no task (Kaernbach, 1990) as well as 2AFC, as discussed earlier.
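
As an illustration (not the simulation code reported in Klein, 2002), a one down–three up run on a simulated 2AFC Weibull observer takes only a few lines; the observer, step size, and run length below are my own arbitrary choices:

% One-down / three-up Brownian staircase on a simulated 2AFC Weibull observer
% (gamma = .5, k = exp(-1), beta = 2, true threshold at logx = 0). The rule
% converges near the 75%-correct level; threshold is estimated by averaging
% the tested levels after discarding the first 10 trials.
rng(1);
beta = 2;  step = 0.05;                            % fixed step size, in log10 units
Pcorrect = @(logx) 1 - 0.5*exp(-10.^(beta*logx));  % 2AFC Weibull observer
level = 0.3;  nTrials = 200;  levels = zeros(1, nTrials);
for trial = 1:nTrials
    levels(trial) = level;
    if rand < Pcorrect(level)
        level = level - step;          % correct response: one step down
    else
        level = level + 3*step;        % incorrect response: three steps up
    end
end
threshEst = mean(levels(11:end))       % scatters around the 75%-correct level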

In recent years, likelihood methods (Pentland, 1980; Watson & Pelli, 1983) have become popular and have somewhat displaced Brownian methods. There is a widespread feeling that likelihood methods have substantially smaller threshold variance than do staircases in which thresholds are obtained by simply averaging the levels tested. Given the importance of this issue for informing researchers about how best to measure thresholds, it is surprising that there have been so few simulations comparing staircases and optimal methods. Pentland's (1980) results, showing very large threshold variance for a staircase method, are so dramatically different from the results of my simulations that I am skeptical. The best previous simulations of which I am aware are those of Green (1990), who compared the threshold variance from several 2AFC staircase methods with the ideal variance (Equation 24). He found that as long as the staircase rule placed trials above 80% correct, the ratio of observed to ideal threshold variance was about 1.4. This value is the square of the inverse of Green's finding that the ratio of the expected to the observed sigma is about 0.85.

In my own simulations (Klein, 2002), I found that as long as the final step size was less than 0.2σ, the threshold variance from simply averaging over levels was close to the optimal value. Thus the staircase threshold variance is about the same as the variance found by other optimal methods. One especially interesting finding was that the common method of averaging an even number of reversal points gave a variance that was about 10% higher than averaging over all levels. This result made me wonder how the common practice of averaging reversals ever got started. Since Green (1990) averaged over just the reversal points, I suspect that his results are an overestimate of the staircase variance. In addition, since Green specifies step size in decibels, it is difficult for me to determine whether his final step size was small enough. Since the threshold variance is inversely related to slope, it is important to relate step size to slope rather than to decibels. The most natural reference unit is σ, the standard deviation associated with the cumulative normal. In summary, I suggest that this general feeling of distrust of the simple Brownian staircase is misguided and that simple staircases are often as good as likelihood methods. In the section Summary of Advantages of Brownian Staircase, I point out several advantages of the simple staircase over the likelihood methods.

A qualification should be made to the claim above that averaging levels of a simple staircase is as good as a fancy likelihood method. If the staircase method uses too small a step size at the beginning, and if the starting point is far from threshold, then the simple staircase can be inefficient in reaching the threshold region. However, at the end of that run the astute experimenter should notice the large discrepancy between the starting point and the threshold, and the run should be redone with an improved starting point or with a larger initial step size that will be reduced as the staircase proceeds.

Increase number of 2AFC response categories. Independent of the type of adaptive method used, the 2AFC task has a flaw whereby a series of lucky guesses at low stimulus levels can erroneously drive adaptive methods to levels that are too low. Some adaptive methods with a small number of trials are unable to fully recover from a string of lucky guesses early in the run. One solution is to allow the subject to have more than two responses. For reasons that I have never understood, people like to have only two response categories in a 2AFC task. Just as I am reluctant to take part in 2AFC experiments, I am even more reluctant to take part in two-response experiments. The reason is simple: I always feel that I have within me more than one bit of information after looking at a stimulus. I usually have at least four or more subjective states (2 bits) for a yes/no task: HN, LN, LY, HY, where H and L are high and low confidence and N and Y are did not and did see it. For 2AFC, my internal states are H1, L1, L2, H2, for high and low confidence on Intervals 1 and 2. Kaernbach (2001a), in this special issue, does an excellent job of justifying the low-confidence response category. He provides solid arguments, simulations, and experimental data on how a low-confidence category can minimize the "lucky guess" problem.

Kaernbach (2001a) groups the L1 and L2 categories together into a "don't know" category and argues persuasively that the inclusion of this third response can increase the precision and accuracy of threshold estimates while also leaving the subject a happier person. He calls it the "unforced choice" method. Most of his article concerns the understandable fear of introducing a response bias, exactly what 2AFC was invented to avoid. He has developed an intelligent rule for what stimulus level should follow a "don't know" response. Kaernbach shows, with both simulations and real data, that with his method this potential response bias problem has only a second-order effect on threshold estimates (similar to the 2AFC interval bias discussed around Equation 7), and its advantages outweigh the disadvantages. I urge anyone using the 2AFC method to read it. I conjecture that Kaernbach's method is especially useful in trimming the lower tail of the distribution of threshold estimates, and also in reducing the interval bias of 2AFC tasks, since the low-confidence rating would tend to be used for trials on which there is the most uncertainty.

Although Kaernbach has managed to convince me of the usefulness of his three-response method, he might have trouble convincing others because of the response bias question. I should like to offer a solution that might quell the response bias fears. Rather than using 2AFC with three responses, one can increase the number of responses to four (H1, L1, L2, H2), as discussed two paragraphs above. The L rating can be used when subjects feel they are guessing.

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A 5-up–1-down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct of 5/6 = 83.33%. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low-confidence rating is that when one is below threshold, giving a low-confidence rating will prevent one from long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.

Another reason not to be fearful of the multiple-response 2AFC method is that, after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations, discussed earlier in this section, and Green's (1990) reveal that averaging over the levels (excluding a number of initial levels to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging-levels method disagree, that run could be thrown out.

Summary of advantages of Brownian staircase

Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since it gives the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue, or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (Psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on one-step ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape.

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope, β, or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low spatial frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates for mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data-gathering group argued that we should use a data-gathering methodology that would allow estimation of PF slope for each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin, and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules, Goodness-of-Fit

There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions, in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation: Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section, I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits

The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals, in order to avoid the starting level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that, in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

pdf(i) = [P(i) − P(i − 1)] / [s(i) − s(i − 1)],        (33)

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

μ_r = [1/(r + 1)] Σ_i pdf(i) [s(i)^(r+1) − s(i − 1)^(r+1)].        (34)

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

f[s(i)] − f[s(i − 1)] = f′(s_mid) [s(i) − s(i − 1)],        (35)

where f(s) = s^(r+1) and f′(s_mid) = (r + 1) s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

μ_r = Σ_i pdf(i) s_mid^r Δs,        (36)

where Δs = s(i) − s(i − 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

μ_0 = Σ_i pdf(i) Δs = P(i_max) − P(i_min),        (37)

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P(i_min), must be zero, as Miller and Ulrich (2001) point out.


The next simplest SK moment is for r = 1:

μ_1 = (1/2) Σ_i pdf(i) [s(i)^2 − s(i − 1)^2] = (1/2) Σ_i [P(i) − P(i − 1)] [s(i) + s(i − 1)],        (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: if this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method-of-constant-stimuli data in which the PF goes from 0% to 100%. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, −.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to this unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than 33% and P2 can be any number greater than 67%. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide possible set of alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the center of the PF is given by Equation 38 to be μ_1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ_1 = 2.97. However, given the data, with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ_1 = 4. Thus we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
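
Both centroid values are easy to reproduce; the sketch below (mine, not the Appendix's Matlab Code 4) evaluates Equation 38 for the raw and for the monotonized probabilities quoted above:

% Spearman-Karber first moment (Equation 38) for the staircase example above.
s     = 1:5;                               % stimulus levels
Praw  = [0  1    1    0.5  1];             % raw probabilities correct
Pmono = [0  .51  .51  .51  1];             % after trial-weighted monotonizing
sk1   = @(P) 0.5*sum(diff(P).*(s(2:end) + s(1:end-1)));   % Equation 38
[sk1(Praw)  sk1(Pmono)]                    % gives [2.00  2.97]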

A similar problem can occur even with the method of constant stimuli, with an equal number of trials at each level but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being 1% and 99%. The probabilities at the sample points are P(i) = 1%, 12.22%, 50%, 87.78%, and 99%, corresponding to z scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ_1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result reported in their Table 17 gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

var(PSE_i) = var(P_i) / (dP_i/dx_i)^2,        (39)

where var(P) is given by Equation 27 and the PF slope is

dP_i/dx_i = (dP_i/dz_i) × (dz_i/dx_i),        (40)

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

dP_i/dz_i = exp(−z_i^2/2) / √(2π),        (41)

from Equation 3A. By combining these equations, one gets

var(PSE_i) = 2π σ^2 exp(z_i^2) P_i(1 − P_i) / N_i.        (42)

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ^2 π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

1/var_tot = Σ_i (1/var_i).        (43)

Finally, upon plugging in the five values of z_i, ranging from −2.328 to +2.328, and P_i, ranging from 1% to 99%, the standard error of the PF location is

SE = (var_tot)^(1/2) = 0.2844.        (44)

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method, and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
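
The analytic value is a few lines of arithmetic; the following sketch (mine, assuming the reconstructed Equations 42–43, with σ = 1 and N_i = 8) reproduces it:

% Analytic standard error of the PF location (Equations 42-44) for the
% 5-level example: z(i) = [-2.328 ... 2.328], 8 trials per level, sigma = 1.
z  = [-2.328 -1.164 0 1.164 2.328];
P  = 0.5 + 0.5*erf(z/sqrt(2));              % the probabilities 1% ... 99%
Ni = 8;  sigma = 1;
vi = 2*pi*sigma^2*exp(z.^2).*P.*(1-P)/Ni;   % Equation 42, one level at a time
vtot = 1/sum(1./vi);                        % Equation 43
SE = sqrt(vtot)                             % about 0.284 (Equation 44)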

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

var_tot = SE^2 = 3.24/N_tot.        (45)

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that, of the five stimulus levels, the two at P = 1% and 99% are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = .5). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus one cannot claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X^2 statistic, as given by

X^2 = Σ_i [p(data_i) − P_i]^2 / var(P_i),        (46)

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend:

var(P_i) = P_i(1 − P_i)/N_i,        (47)

where N_i is the number of trials at level i. The X^2 is of interest because the binomial distribution is asymptotically a Gaussian, so that the X^2 statistic asymptotically has a chi-square distribution. Alternatively, one can write X^2 as

X^2 = Σ_i residual_i^2,        (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

residual_i = [p(data_i) − P_i] / SE_i.        (49)

The second method is to use the normalized log likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2 Σ_i N_i { p(data_i) ln[p(data_i)/P_i] + [1 − p(data_i)] ln[(1 − p(data_i))/(1 − P_i)] }.        (50)

Similar to the X^2 statistic, LL asymptotically also has a chi-square distribution.

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

prob_chisq = Gamma(df/2, χ^2/2),        (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = 0.54, which is very close to the asymptotic value of 0.5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = 0.845, which is indeed close to the value of 0.841 expected for a unity z-score.
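
In Matlab those two checks are one-liners; note that gammainc takes the χ^2/2 value as its first argument and df/2 as its second:

% Checks on the regularized incomplete gamma function used in Equation 51.
gammainc(18/2, 18/2)       % df = 18, chi-square = 18 (the mean): about 0.54
gammainc((18+6)/2, 18/2)   % chi-square one standard deviation (6) above df: about 0.845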

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [0.52, 0.85]; in Figure 8a, with a strong downward bias, the interval is [0.72, 0.99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X^2 and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 0.5 to 1.0 and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

X_i^2 = \frac{(O_i - E_i)^2}{E_i\,(1 - P_i)},   (52)

where O_i = N_i p_i(data) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible outcomes for O_i.


The weighting wt_i is the expected probability from binomial statistics:

wt_i = \frac{N_i!}{O_i!\,(N_i - O_i)!}\; P_i^{O_i}\,(1 - P_i)^{N_i - O_i}.   (53)

The expected value ⟨X_i²⟩ is

\langle X_i^2 \rangle = \sum_{O_i} wt_i\, \frac{(O_i - E_i)^2}{E_i\,(1 - P_i)}.   (54)

A bit of algebra, based on Σ_{O_i} wt_i = 1, gives a surprising result:

\langle X_i^2 \rangle = 1.   (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviate contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

\langle LL_i \rangle = 2 \sum_{O_i} wt_i \left[ O_i \ln\!\frac{O_i}{E_i} + (N_i - O_i)\,\ln\!\frac{N_i - O_i}{N_i - E_i} \right].   (56)

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

\langle LL_i \rangle = 2\,\bigl[0.25 \times 2\ln(2) + 0.25 \times 2\ln(2)\bigr] = 2\ln(2) = 1.386.   (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.
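The enumeration is short enough to show in a few lines. This sketch (my own condensation in the spirit of Matlab Code 2, not the Appendix listing itself) reproduces both expectations for N_i = 2 and P_i = 0.5, confirming ⟨X_i²⟩ = 1 (Equation 55) and ⟨LL_i⟩ = 1.386 (Equation 57):

% Enumeration over all outcomes O = 0..N (Equations 53-56); my own sketch.
N = 2;  P = 0.5;  E = N*P;
O  = 0:N;
wt = arrayfun(@(o) nchoosek(N,o), O) .* P.^O .* (1-P).^(N-O);  % Equation 53
X2 = sum(wt .* (O - E).^2 / (E*(1 - P)));                      % Equation 54; equals 1
t1 = O       .* log((O + eps)     / E);      % eps guards against 0*log(0)
t2 = (N - O) .* log((N - O + eps) / (N - E));
LL = 2 * sum(wt .* (t1 + t2));                                 % approximately 1.386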

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero because of the 1 − P_i factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above .85, the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

I have been wondering about the relative merits of X 2

and log likelihood for many years I have always appreci-ated the simplicity and intuitive nature of X 2 but statisti-cians seem to prefer likelihood I do not doubt the argu-ments of King-Smith et al (1994) and Kontsevich andTyler (1999) who advocate using mean likelihood in a


Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53-54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1-16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance. (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39-43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
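For concreteness, here is a two-line sketch of the inverse-variance weighting just mentioned (the numbers are hypothetical, chosen only for illustration):

% Hypothetical thresholds and variances from three runs of the same condition.
th = [1.02 0.95 1.10];   v = [0.004 0.009 0.006];
w      = 1 ./ v;                       % inverse-variance weights
thAve  = sum(w .* th) / sum(w);        % weighted threshold estimate
varAve = 1 / sum(w);                   % variance of the weighted estimate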

Wichmann and Hill, and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels, with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

Psychometric    λ = .01        λ = .0001      λ = .01        λ = .0001
Function        No Lapse       No Lapse       Lapse          Lapse
                L      χ²      L      χ²      L      χ²      L      χ²
Probit          1.05   1.05    1.00   0.99    1.05   1.05    0.36   0.34
Logit           1.76   1.76    1.66   1.66    1.76   1.76    0.84   0.72
Weibull         1.01   1.01    0.97   0.97    1.01   1.01    0.26   0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

\mathrm{probit\ (erf,\ Equation\ 3B):}\quad P(z) = \tfrac{1}{2}\bigl[1 + \mathrm{erf}(z/\sqrt{2})\bigr],   (58A)

\mathrm{logit:}\quad P(z) = \frac{1}{1 + \exp(-z)},   (58B)

\mathrm{Weibull:}\quad P(z) = 1 - \exp\bigl[-\exp(z)\bigr],   (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
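A compact sketch of the three fitting functions of Equation 58, with the symmetric lapse/guess compensation (γ = λ) described above, may help make the parameterization concrete. This is my own condensation in the spirit of Matlab Code 3, not the Appendix listing; p1 and p2 are the threshold and slope parameters:

% Equation 58 fitting functions with symmetric lapse/guess rate (gamma = lambda).
F = {@(z) (1 + erf(z/sqrt(2)))/2, ...     % probit  (Equation 58A)
     @(z) 1 ./ (1 + exp(-z)),     ...     % logit   (Equation 58B)
     @(z) 1 - exp(-exp(z))};              % Weibull (Equation 58C)
pObs = @(y, p1, p2, lam, k) lam + (1 - 2*lam) .* F{k}(p2*(y - p1));
pObs([0 1 6], 1, 1, 0.01, 1)              % example: probit at the three test levels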

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and to fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope b of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods; I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
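The arithmetic behind that standard-error claim, spelled out as a check I added using the values quoted above:

beta = 5.5;  N = 25;                 % Strasburger's estimated slope; trials per run
varT = 1.75 / (N * beta^2);          % asymptotic 10AFC threshold variance, 0.058/N
seT  = sqrt(varT)                    % about 0.048 log units, i.e., roughly a 5% SE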

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2-4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).
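For readers who want to see what a monotonizing step can look like, here is a pool-adjacent-violators sketch of my own (not necessarily the same routine as Kaernbach's, nor as the Appendix script): decreasing adjacent blocks are repeatedly merged into their trial-weighted average until the PF is nondecreasing.

p = [0.2 0.5 0.4 0.7 0.6 0.9];    % hypothetical proportions correct per level
n = [4   8   6   8   4   2];      % hypothetical trials per level
val = p;  wt = n;  len = ones(size(p));     % start with one block per level
i = 1;
while i < numel(val)
    if val(i) > val(i+1)                    % violation: merge blocks i and i+1
        val(i) = (wt(i)*val(i) + wt(i+1)*val(i+1)) / (wt(i) + wt(i+1));
        wt(i)  = wt(i) + wt(i+1);
        len(i) = len(i) + len(i+1);
        val(i+1) = [];  wt(i+1) = [];  len(i+1) = [];
        i = max(i - 1, 1);                  % back up; the merge may expose a new violation
    else
        i = i + 1;
    end
end
pMono = repelem(val, len);                  % monotonized PF, expanded back to all levels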

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.
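As an illustration of what such an extrapolation step can look like (my own sketch, not Kaernbach's code), untested levels at the ends of the range simply inherit the nearest tested probability:

p  = [NaN NaN 0.40 0.55 0.80 NaN];     % hypothetical single-run PF; NaN = untested level
lo = find(~isnan(p), 1, 'first');
hi = find(~isnan(p), 1, 'last');
p(1:lo-1)   = p(lo);                   % extrapolate downward from the lowest tested level
p(hi+1:end) = p(hi);                   % extrapolate upward from the highest tested level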

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.
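The two types of averaging can be written compactly. This sketch (my own, with hypothetical numbers) contrasts pooling the raw trials across runs with averaging the per-run PFs:

cor = [1 2 0; 2 1 1];   tot = [2 2 1; 2 2 2];    % correct and total trials: 2 runs x 3 levels
pPooled = sum(cor,1) ./ sum(tot,1);              % pool trials first, then form the PF (solid curve)
pfRun   = cor ./ (tot + eps);                    % PF of each individual run
tested  = tot > 0;                               % which levels each run actually visited
pAvePF  = sum(pfRun .* tested, 1) ./ sum(tested, 1);   % average the per-run PFs (dashed curve)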

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing, there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)] / [1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.
3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.
4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.
5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.
6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.
7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log-log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.
8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.0004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a), because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared, using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c t^b. The optimum variance of 1.64/(N b²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(N b²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).
2. The equality of the 2AFC and yes/no threshold variances leads to a paradox, whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.
3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.
4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.
5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).
6. The simple up-down (Brownian) staircase, with threshold estimated by averaging levels, was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about four reversals, are thrown out) than to average an even number of reversals. Both of these findings were surprising.
7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.
8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.
9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.
10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman-Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection, because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination, because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).
3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.
4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.
5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


1452 KLEINA

PP

EN

DIX

Mat

lab

Pro

gram

s A

vail

able

Fro

m k

lein

sp

ecta

cle

berk

eley

ed

u

Mat

lab

Cod

e 1

Wei

bull

Fu

ncti

on

Pro

babi

lity

dcent a

nd L

ogndashL

og d

cent Slo

pe

Cod

e fo

r G

ener

atin

g F

igur

e 1

rsquogam

ma

5 5

15

erf

(1

sqrt

(2))

gam

ma

(low

er a

sym

ptot

e) f

or z

-sco

re 5

1

beta

5 2

inc

5 0

1

beta

is P

F s

lope

x5

inc

2in

c3

th

e st

imul

us r

ange

wit

h li

near

abs

ciss

ap

5 g

amm

a1(1

gam

ma)

(1

exp(

x^(

beta

)))

W

eibu

ll f

unct

ion

subp

lot(

32

1)p

lot(

xp)

gri

dte

xt(

19

2rsquo(a

)rsquo) y

labe

l(rsquoW

eibu

ll w

ith

beta

5 2

rsquo)z

5 s

qrt(

2)e

rfin

v(2

p1)

conv

erti

ng p

roba

bili

ty to

z-s

core

subp

lot(

32

3)p

lot(

xz)

gri

dte

xt(

13

6rsquo(b

)rsquo) y

labe

l(rsquoz

-sco

re o

f W

eibu

ll (

dacutersquo

1)rsquo)

dpri

me

5 z

11

dp

rim

e 5

z(h

it)

z (fa

lse

alar

m)

difd

5 d

iff(

log(

dpri

me)

)in

c

deri

vativ

e of

log

dcentxd

5 in

cin

c2

9999

xva

lues

at m

idpo

int o

f pr

evio

us x

valu

essu

bplo

t(3

25)

plo

t(xd

dif

dx

d)g

rid

m

ulti

plic

atio

n by

xd

is to

mak

e lo

g ab

scis

saax

is([

0 3

5 2

])t

ext(

31

9rsquo(

c)rsquo)

yla

bel(

rsquologndash

log

slop

e of

dacutersquorsquo

) x

labe

l(rsquos

tim

ulus

in th

resh

old

unit

srsquo)

y 5

2

inc

1

do s

imil

ar p

lots

for

a lo

g ab

scis

sa (

y )p

5 g

amm

a1(1

gam

ma)

(1

exp(

exp(

beta

y))

)

Wei

bull

fun

ctio

n as

a f

unct

ion

of y

subp

lot(

32

2)p

lot(

yp

0p(

201)

rsquorsquo)

grid

tex

t(1

89

2rsquo(d

)rsquo)z

5 s

qrt(

2)e

rfin

v(2

p1)

su

bplo

t(3

24)

plo

t(y

z)g

rid

text

(1

83

6rsquo(e

)rsquo)dp

rim

e5

z1

1di

fd5

dif

f(lo

g(dp

rim

e))

inc

subp

lot(

32

6)p

lot(

y[d

ifd(

1) d

ifd]

)gr

id x

labe

l(rsquos

tim

ulus

in n

atur

al lo

g un

itsrsquo

) te

xt(

16

19

rsquo(f)rsquo)

Mat

lab

Cod

e 2

Goo

dne

ss-o

f-F

it

Lik

elih

ood

vs C

hi S

quar

e

Rel

evan

t to

Wic

hm

ann

and

Hill

(20

01a)

and

Fig

ure

4

clea

r c

lf

clea

r al

lfa

c5

[1

cum

prod

(12

0)]

cr

eate

fac

tori

al f

unct

ion

p5

[5

10

19

99]

ex

amin

e a

wid

e ra

nge

of p

roba

bili

ties

Nal

l5 [

1 2

4 8

16]

nu

mbe

r of

tri

als

at e

ach

leve

lfo

r iN

5 1

5

iter

ate

over

then

num

ber

of tr

ials

N5

Nal

l(iN

) E

5 N

p

E

is th

e ex

pect

ed n

umbe

r as

fun

ctio

n of

pli

k5

0 X

25

0

in

itia

lize

the

like

liho

od a

nd X

2

for

Obs

5 0

N

sum

ove

r al

l pos

sibl

e co

rrec

t res

pons

esN

m5

NO

bs

nu

mbe

r of

inco

rrec

t res

pons

esw

eigh

t5 f

ac(1

1N

)fa

c(11

Obs

)fa

c(11

Nm

)p

^Obs

(1

p)^

Nm

bino

mia

l wei

ght

lik

5 li

k 2

wei

ght

(Obs

log

(E(

Obs

1ep

s))1

Nm

log

((N

E)

(Nm

1ep

s)))

Equ

atio

n56

X2

5 X

21w

eigh

t(

Obs

E)

^2

(E

(1p)

)

calc

ulat

e X

2 E

quat

ion

52en

d

end

sum

mat

ion

loop

MEASURING THE PSYCHOMETRIC FUNCTION 1453L

L2(

iN

)5

lik

X2a

ll(i

N

)5

X2

st

ore

the

like

liho

ods

and

chi s

quar

esen

d

end

Nlo

op

the

rest

is f

or p

lott

ing

plot

(pL

L2

rsquokrsquop

X2a

ll(2

)rsquo

krsquo)

grid

pl

ot th

e lo

g li

keli

hood

s an

d X

2

for

in5

14

tex

t(9

35 L

L2(

ine

nd6

) n

um2s

tr(N

all(

in))

)en

dte

xt(

913

12

rsquoN5

16rsquo

)te

xt(

659

5rsquoX

2 fo

r al

l Nrsquo)

xlab

el(rsquoP

(pr

edic

ted

prob

abil

ity)

rsquo)yl

abel

(rsquoCon

trib

utio

n to

log

like

liho

od (

D)

from

eac

h le

velrsquo)

Mat

lab

Cod

e 3

Dow

nw

ard

Slop

e B

ias

Due

to

Lap

ses

Rel

evan

t to

Wic

hm

ann

and

Hil

l (20

01a)

and

Tab

le2

func

tion

sse

5 p

robi

tML

2(pa

ram

s i

ilap

se)

fu

ncti

on c

alle

d by

mai

n pr

ogra

mst

imul

us5

[0

1 6]

N5

[50

50

50]

Obs

5 [

25 4

2 50

]

psyc

hom

etri

c fu

ncti

on in

form

atio

nla

pseA

ll5

[0

1 0

001

01

000

1]l

apse

5 la

pseA

ll(i

laps

e)

ch

oose

the

laps

e ra

te l

ambd

aif

ilap

segt

2O

bs(3

)5

Obs

(3)

1en

d

prod

uce

a la

pse

for

last

2 c

olum

nszz

5 (

stim

ulus

para

ms(

1))

para

ms(

2)

z -

scor

e of

psy

chom

etri

c fu

ncti

onii

5 m

od(i

3)

in

dex

for

sele

ctin

g ps

ycho

met

ric

func

tion

if ii

55

0

prob

5 (

11er

f(zz

sqr

t(2)

))2

prob

it (

cum

ulat

ive

norm

al)

else

if ii

55

1 p

rob

5 1

(11

exp(

zz))

logi

tel

se

prob

5 1

exp(

exp(

zz))

Wei

bull

end

Exp

5 N

(l

apse

1pr

ob(

12

laps

e))

ex

pect

ed n

umbe

r co

rrec

tif

ilt3

chi

sq5

2

(Obs

lo

g(E

xpO

bs)

1(N

Obs

)l

og((

NE

xp1

eps)

(N

Obs

1ep

s)))

log

like

liho

odel

se c

hisq

5 (

Obs

Exp

)^2

E

xp(

NE

xp1

eps)

X2

eps

is s

mal

lest

Mat

lab

num

ber

end

sse

5 s

um(c

hisq

)

sum

ove

r th

e th

ree

leve

ls

M

AIN

PR

OG

RA

M

clea

rcl

fty

pe5

[rsquop

robi

t max

lik

rsquorsquolo

git m

axli

k rsquorsquo

Wei

bull

max

lik

rsquo

head

ings

for

Tab

le2

rsquopro

bit c

hisq

rsquorsquolo

git c

hisq

rsquorsquoW

eibu

ll c

hisq

rsquo]

disp

(rsquo g

amm

a5

01

00

01

01

0001

rsquo)

type

top

of T

able

2di

sp(rsquo

laps

e5

no

no

yes

ye

srsquo)

for

i5 0

5

iter

ate

over

fun

ctio

n ty

pefo

r il

apse

5 1

4

iter

ate

over

laps

e ca

tego

ries

para

ms

5 f

min

s(rsquop

robi

tML

2rsquo[

1 3

][]

[]

iil

apse

)

fmin

s fi

nds

min

imum

of

chi-

squa

resl

ope(

ilap

se)

5 p

aram

s(2)

slop

e is

2nd

par

amet

eren

ddi

sp([

type

(i1

1)

num

2str

(slo

pe)]

)

prin

t res

ult

end

APPENDIX (Continued)

Matlab Code 4
Brownian Staircase Enumerations for Slope Bias

Monotonize
flag = 1; ii = 0;                                       % initialize; flag = 1 means nonmonotonicity is detected
while flag == 1                                         % keep going if there is nonmonotonic data
    flag = 0;                                           % reset monotonic flag
    ii = ii + 1;                                        % increase the nonmonotonic spread
    for i2 = ii+1:ntot                                  % loop over all PF levels looking for nonmonotonicity
        prob = cor./(tot + eps);                        % calculate probabilities
        if (prob(i2-ii) > prob(i2)) & (tot(i2) > 0)     % check for nonmonotonicity
            cor(i2-ii:i2) = mean(cor(i2-ii:i2))*ones(1, ii+1);   % replace nonmonotonic correct by average
            tot(i2-ii:i2) = mean(tot(i2-ii:i2))*ones(1, ii+1);   % replace nonmonotonic total by average
            flag = 1;                                   % set flag that nonmonotonicity is found
        end
    end
end

Note that in line 6 the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Extrapolate
ii = 1;
while tot(ii) == 0, ii = ii + 1; end                    % count the low levels with no trials
if ii > 1                                               % skip lowest level
    tot(1:ii-1) = ones(1, ii-1)*.00001;                 % fill in with very low number of trials
    cor(1:ii-1) = prob(ii)*tot(1:ii-1);                 % fill in with probability of lowest tested level
end
ii = nbits2;
while tot(ii) == 0, ii = ii - 1; end                    % do the same extrapolation at high end
if ii < nbits2
    tot(ii+1:nbits2) = ones(1, nbits2-ii)*.00001;
    cor(ii+1:nbits2) = prob(ii)*tot(ii+1:nbits2);
end

Averaging (stairave)
prob1 = cor./(tot + eps);                               % calculate PF
probAll(iopt,:) = probAll(iopt,:) + prob1;              % sum the PFs to be averaged
probTotAll(iopt,:) = probTotAll(iopt,:) + (tot > 0);    % count the number of staircases with a given level
corAll(iopt,:) = corAll(iopt,:) + cor;                  % sum the number correct over all staircases
totAll(iopt,:) = totAll(iopt,:) + tot;                  % sum the total number of trials over all staircases

Main Program
nbits = 10; n2 = 2^nbits - 1; nbits2 = 2*nbits - 1;     % nbits is number of staircase steps
zero3 = zeros(3, nbits2);                               % the three rows are for the three iopt values
for ishift = 0:1                                        % ishift = 0 means no shift (1 means shift)
    totAll = zero3; corAll = zero3; probAll = zero3; probTotAll = zero3;   % initialize arrays
    for i = 0:n2                                        % loop over all possible staircases
        a = bitget(i, [1:nbits]);                       % a(k) = 1 is hit, a(k) = 0 is miss
        step = 1 - 2*a(1:end-1);                        % -1 step (down) for hit, +1 step (up) for miss
        aCum = [0 cumsum(step)];                        % accumulate steps to get stimulus level
        if ishift == 0
            ave = 0;                                    % for the no-shift option
        else
            ave = round(mean(aCum));                    % for shift option, ave is the amount of shift
        end
        aCum = aCum - ave + nbits;                      % do the shift
        cor = zeros(1, nbits2); tot = cor;              % initialize for each staircase
        for i2 = 1:nbits
            cor(aCum(i2)) = cor(aCum(i2)) + a(i2);      % count number correct at each level
            tot(aCum(i2)) = tot(aCum(i2)) + 1;          % count number of trials at each level
        end
        iopt = 1; stairave                              % two types of averaging (iopt is index in script above)
        iopt = 2; monotonize2; stairave                 % monotonize PF and then do averaging
        iopt = 3; extrapolate; stairave                 % extrapolate and then do averaging
    end
    prob = corAll./(totAll + eps);                      % get average from pooled data
    probave = probAll./(probTotAll + eps);              % get average from individual PFs
    x = [1:nbits2];                                     % x axis for plotting
    subplot(3, 2, 1 + ishift)                           % plot the top pair of panels
    plot(x, prob(1,:), x, totAll(1,:)/max(totAll(1,:)), ':', x, probave(1,:), '--'); grid
    if ishift == 0
        title('no shift'); ylabel('no data manipulation'); text(5, .95, '(a)')
    else
        title('shift'); text(5, .95, '(d)')
    end
    subplot(3, 2, 3 + ishift); plot(x, prob(2,:), x, probave(2,:), '--'); grid   % plot middle pair of panels
    if ishift == 0
        ylabel('monotonize psychometric fn'); text(5, .63, '(b)')
    else
        text(5, .95, '(e)')
    end
    subplot(3, 2, 5 + ishift)                           % plot bottom pair of panels
    plot(x, prob(3,:), x, probave(3,:), '--', [nbits nbits], [.45 .55]); grid
    tt = 'cf'; text(5, .95, ['(' tt(ishift+1) ')'])
    if ishift == 0, ylabel('extrapolate psychometric fn'); end
    xlabel('stimulus levels')
end                                                     % end loop of whether to shift or not

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


distinctions. The last column classifies the articles according to the adaptive versus constant stimuli distinction. In order to open up the full variety of PFs for discussion and to enable a deeper understanding of the relationship among different types of PFs, I will now clarify the interrelationship of the various categories of PFs and their connection to the signal detection approach.

Lack of adaptive method for yes/no tasks with controlled false alarm rate. In Section II, I will bring up a number of advantages of the yes/no task in comparison with the 2AFC task (see also Kaernbach, 1990). Given those yes/no advantages, one may wonder why the 2AFC method is so popular. The usual answer is that 2AFC has no response bias. However, as has been discussed, the yes/no method allows an unbiased d′ to be calculated, and the 2AFC method does allow an interval bias that affects d′. Another reason for the prevalence of 2AFC experiments is that a multitude of adaptive methods are available for 2AFC but barely any are available for an objective yes/no task in which the false alarm rate is measured so that d′ can be calculated. In Table 1, with one exception, the rows with adaptive methods are associated with the forced choice method. The exception is Leek (2001), who discusses adaptive yes/no methods in which no blank trials are presented. In that case the false alarm rate is not measured, so d′ cannot be calculated. This type of yes/no method does not belong in the "objective" category of concern for the present paper. The "1990" entries in Table 1 refer to Kaernbach's (1990) description of a staircase method for an objective yes/no task in which an equal number of blanks are intermixed with the signal trials. Since Kaernbach could have used that method for Kaernbach (2001b), I placed the "1990" entry in his slot.

Kaernbach's (1990) yes/no staircase rules are simple. The signal level changes according to a rule such as the following: Move down one level for correct responses, a hit <yes | signal> or a correct rejection <no | blank>; move up three levels for wrong responses, a miss <no | signal> or a false alarm <yes | blank>. This rule, similar to the one-down/three-up rule used in 2AFC, places the trials so that the average of the hit rate and correct rejection rate is 75%.

What is now needed is a mechanism to get the observer to establish an optimal criterion that equalizes the number of "yes" and "no" responses (the ROC negative diagonal). This situation is identical to the problem of getting 2AFC observers to equalize the number of responses to Intervals 1 and 2. The quadratic bias in d′ is the same in both cases. The simplest way to get subjects to equalize their responses is to give them feedback about any bias in their responses. With equal "yes" and "no" responses, the 75% correct corresponds to a z score of 0.674 and a d′ = 2z = 1.349. If the subject does not have equal numbers of "yes" and "no" responses, then the d′ would be calculated by d′ = z_hit − z_false alarm. I do hope that Kaernbach's clever yes/no objective staircase will be explored by others. One should be able to enhance it with ratings and multiple stimuli (Klein, 2002).
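As a quick numerical check of the bias-free calculation just described, the following minimal Matlab sketch (my own illustration, not part of the original appendix; the hit and false-alarm rates are arbitrary) computes d′ = z_hit − z_false alarm for a symmetric and for a biased criterion:

% d' from hit and false-alarm rates in an objective yes/no task: d' = zhit - zfa
Phit = .75;   Pfa = .25;                     % symmetric criterion (ROC negative diagonal)
zhit = sqrt(2)*erfinv(2*Phit - 1);           % z score of the hit rate (0.674)
zfa  = sqrt(2)*erfinv(2*Pfa - 1);            % z score of the false-alarm rate (-0.674)
dprime = zhit - zfa                          % = 1.349, the value quoted in the text

Phit2 = .8413; Pfa2 = .3635;                 % a biased criterion with the same sensitivity
dprime2 = sqrt(2)*erfinv(2*Phit2 - 1) - sqrt(2)*erfinv(2*Pfa2 - 1)   % again about 1.35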

Forced choice detection. As can be seen in the second column of Table 1, a popular category in this special issue is the forced choice method for detection. This method is used by Strasburger (2001b), Kaernbach (2001a), Linschoten et al. (2001), and Wichmann and Hill (2001a, 2001b). These researchers use PFs based on probability ordinates. The connection between P and d′ is different from the yes/no case given by Equation 5. For 2AFC, signal detection theory provides a simple connection between d′ and probability correct: d′(x) = √2 z(x). In the preceding section I discussed the option of averaging the probabilities and then taking the z score (high-threshold approach) or averaging the z scores and then calculating the probability (signal detection approach). The signal detection method is better because it has a stronger empirical basis and avoids bias.

For an m-AFC task with m > 2, the connection between d′ and P is more complicated than it is for m = 2. The connection, given by a fairly simple integral (Green & Swets, 1966; Macmillan & Creelman, 1991), has been tabulated by Hacker and Ratcliff (1979) and by Macmillan and Creelman (1991). One problem with these tables is that they are based on the assumption of unity ROC slope. It is known that there are many cases in which the ROC slope is not unity (Green & Swets, 1966), so these tables connecting

Table 1
Classification of the 10 Articles in This Special Issue According to Three Distinctions:
Forced Choice Versus Yes/No, Detection Versus Discrimination, Adaptive Versus Constant Stimuli

                                      Detection                   Discrimination              Adaptive or
Source                          m-AFC        Yes/No          m-AFC             Yes/No       Constant Stimuli
Leek (2001)                     General      loose γ         ✓                 loose γ            A
Wichmann & Hill (2001a)         2AFC         (✓)             (✓)               (✓)                C
Wichmann & Hill (2001b)         2AFC         (✓)             (✓)               (✓)                C
Linschoten et al. (2001)        2AFC                                                              A
Strasburger (2001a)             General                      ✓                 ✓                  C
Strasburger (2001b)             10AFC                                                             A
Kaernbach (2001a)               General                      ✓                                    A
Kaernbach (2001b)               General      (1990)          ✓                 (1990)             A
Miller & Ulrich (2001)                       for γ = 0       ✓ (for 0%-100%)   ✓                  C
Klein (present)                 General      ✓               ✓                 ✓                  Both


d′ and probability correct should be treated cautiously. W. P. Banks and I (unpublished) investigated this topic and found that near the P = 50% correct point the dependence of d′ on ROC slope is minimal. Away from the 50% point the dependence can be strong.

Yes/No detection. Linschoten et al. (2001) compare three methods (limits, staircase, 2AFC likelihood) for measuring thresholds with a small number of trials. In the method of limits, one starts with a subthreshold stimulus and gradually increases the strength. On each trial the observer says "yes" or "no" with respect to whether or not the stimulus is detected. Although this is a classic yes/no detection method, it will not be discussed in this article because there is no control of response bias. That is, blanks were not intermixed with signals.

In Table 1, the Wichmann and Hill (2001a, 2001b) articles are marked with check marks in parentheses because, although these authors write only about 2AFC, they mention that all their methods are equally applicable to yes/no or m-AFC of any type (detection/discrimination). And their Matlab implementations are fully general. All the special issue articles reporting experimental data used a forced choice method. This bias in favor of the forced choice methodology is found not only in this special issue; it is widespread in the psychophysics community. Given the advantages of the yes/no method (see discussion in Section II), I hope that once yes/no adaptive methods are accepted, they will become the method of choice.

m-AFC discrimination. It is common practice to represent 2AFC results as a plot of percent correct, averaged over all trials, versus stimulus strength. This PF goes from 50% to 100%. The asymmetry between the lower and upper asymptotes introduces some inefficiency in threshold estimation, as will be discussed. Another problem with the standard 50% to 100% plot is that a bias in choice of interval will produce an underestimate of d′, as has been shown earlier. Researchers often use the 2AFC method because they believe that it avoids biased threshold estimates. It is therefore surprising that the relatively simple correction for 2AFC bias discussed earlier is rarely done.

The issue of bias in m-AFC also occurs when m > 2. In Strasburger's 10AFC letter discrimination task it is common for subjects to have biases for responding with particular letters when guessing. Any imbalance in the response bias for different letters will result in a reduction of d′, as it did in 2AFC.

There are several benefits of viewing the 2AFC discrimination data in terms of a PF going from 0% to 100%. Not only does it provide a simple way of viewing and calculating the interval bias, it also enables new methods for estimating the PF parameters, such as those proposed by Miller and Ulrich (2001), as will be discussed in Section III. In my original comments on the Miller and Ulrich paper, I pointed out that because their nonparametric procedure has uniform weighting of the different PF levels, their method does not apply to 2AFC tasks. The asymmetric binomial error bars near the 50% and 100% levels cause the uniform weighting of the Miller and Ulrich approach to be nonoptimal. However, I now realize that the 2AFC discrimination task can be fit by a cumulative normal going from 0% to 100%. Because of that insight I have marked the discrimination forced choice column of Table 1 for the Miller and Ulrich (2001) paper, with the proviso that the PF goes from 0% to 100%. Owing to the popularity of 2AFC, this modification greatly expands the relevance of their nonparametric approach.

Yes/No discrimination. Kaernbach (2001b) and Miller and Ulrich (2001) offer theoretical articles that examine properties of PFs that go from P = 0% to 100% (P = p in Equation 1). In both cases the PF is the cumulative normal (Equation 3). Although these PFs with γ = 0 could be for a yes/no detection task with a zero false alarm rate (not plausible) or a forced choice detection task with an infinite number of alternatives (not plausible either), I suspect that the authors had in mind a yes/no discrimination task (Table 1, col. 5). A typical discrimination task in vision is contrast discrimination, in which the observer responds to whether the presented contrast is greater than or less than a memorized reference. Feedback reinforces the stability of the reference. In a typical discrimination task the reference is one exemplar from a continuum of stimulus strengths. If the reference is at a special zero point rather than being an element of a smooth continuum, the task is no longer a simple discrimination task. Zero contrast would be an example of a special reference. Klein (1985) discusses several examples which illustrate how a natural zero can complicate the analysis. One might wonder how to connect the PF from the detection regime, in which the reference is zero contrast, to the discrimination regime, in which the reference (pedestal) is at a high contrast. The d′ function to be introduced in Equation 20 does a reasonably good job of fitting data across the full range of pedestal strength, going from detection to discrimination.

The connection of the discrimination PF to the signal detection PF is the same as that for yes/no detection given in Equation 5: d′(x) = z(x) − z(0), where z is the z score of the probability of a "greater than" judgment. In a discrimination task the bias, z(0), is the z score of the probability of saying "greater" when the reference stimulus is presented. The stimulus strength x that gives z(x) = 0 is the point of subjective equality (x = PSE). If the cumulative normal PF of Equation 3 is used, then the z score is linearly proportional to the stimulus strength: z(x) = (x − PSE)/threshold, where threshold is defined to be the point at which d′ = 1. I will come back to these distinctions between detection and discrimination PFs after presenting more groundwork regarding thresholds, log abscissas, and PF shapes (see Equations 14 and 17).

Definition of Threshold
Threshold is often defined as the stimulus strength that produces a probability correct halfway up the PF. If humans operated according to a high-threshold assumption, this definition of threshold would be stable across different experimental methods. However, as I discussed following Equation 2, high-threshold theory has been discredited.


According to the more successful signal detection theory (Green & Swets, 1966), the d′ at the midpoint of the PF changes according to the number of alternatives in a forced choice method and according to the false alarm rate in a yes/no method. This variability of d′ with method is a good reason not to define threshold as the halfway point of the PF.

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1 for simplicity. This definition applies to both yes/no and m-AFC tasks and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = 15.87%, corresponding to a z score of z = −1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.

Strasburger's Suggestion on Specifying Slope: A Logarithmic Abscissa

In many of the articles in this special issue, a logarithmic abscissa such as decibels is used. Many shapes of PFs have been used with a log abscissa. Most delightful but frustrating is Strasburger's (2001a) paper on the PF maximum slope. He compares the Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′, using a logarithmic abscissa. The present section is the outcome of my struggles with a number of issues raised by Strasburger (2001a) and my attempt to clarify them.

I will typically express stimulus strength x in threshold units:

x_t = x / α,    (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units y as well as in linear units x:

y_t = log_e(x_t) = y − Y,    (9)

where y = log_e(x) is the natural log of the stimulus and Y = log_e(α) is the threshold on the log abscissa.

The slope of the psychometric function P(y_t) with a logarithmic abscissa is

slope_log(y_t) = dP(y_t)/dy_t    (10A)
              = (1 − γ) dp(y_t)/dy_t,    (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log–log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log–log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log–log slope is b for a Weibull PF (Equation 12) and β for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log–log slope using a d′ ordinate.

A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per log_e) are not familiar. Then it dawned on me that there is a simple connection between slope with a log_e abscissa, slope_log = dP(y_t)/dy_t, and slope with a linear abscissa, slope_lin = dP(x_t)/dx_t, namely,

slope_lin = dP(x_t)/dx_t = [dP(y_t)/dy_t] × [dy_t/dx_t] = slope_log / x_t,    (11)

because dy_t/dx_t = [d log_e(x_t)]/dx_t = 1/x_t. At threshold, x_t = 1 (y_t = 0), so at that point Strasburger's slope with a logarithmic axis is identical to my familiar slope plotted in threshold units on a linear axis. The simple connection between slope on the log and linear abscissas converted me to being a strong supporter of using a natural log abscissa.

The Weibull and cumulative normal psychometric functions. To provide a background for Strasburger's article, I will discuss three PFs (Weibull, cumulative normal, and d′) as well as their close connections. A common parameterization of the PF is given by the Weibull function,

p_weib(x_t) = 1 − k^(x_t^b),    (12)

where p_weib(x_t), the PF that goes from 0% to 100%, is related to P_weib(x_t), the probability of a correct response, by Equation 1; b is the slope; and k controls the definition of threshold: p_weib(1) = 1 − k is the percent correct at threshold (x_t = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

p_weib(y_t) = 1 − k^[exp(b y_t)].    (13)

Panel a of Figure 1 is a plot of Equation 12 (Weibull function as a function of x on a linear abscissa) for the case b = 2 and k = exp(−1) = 0.368. The choice b = 2 makes the Weibull an upside-down Gaussian. Panel d is the same function, this time plotted as a function of y, corresponding to a natural log abscissa. With this choice of k, the point of maximum slope as a function of y_t is the threshold point (y_t = 0). The point of maximum slope at y_t = 0 is marked with an asterisk in panel d. The same point in panel a, at x_t = 1, is not the point of maximum slope on a linear abscissa, because of Equation 11. When plotted as a d′ function [a z-score transform of P(x_t)] in panel b, the Weibull accelerates below threshold and decelerates above threshold,



in agreement with a wide range of experimental data. The acceleration and deceleration are most clearly seen by the slope of d′ in the log–log coordinates of panel e. The log–log d′ slope is plotted in panels c and f. At x_t = 0, the log–log slope is 2, corresponding to our choice of b = 2. The slope falls surprisingly rapidly as stimulus strength increases, and the slope is near 1 at x_t = 2. If we had chosen b = 4 for the Weibull function, then the log–log d′ slope would have gone from 4 at x_t = 0 to near 2 at x_t = 2. Equation 20 will present a d′ function that captures this behavior.

Another PF commonly used is based on the cumulative normal (Equation 3):

p_Φ(y_t) = Φ(y_t / σ)    (14A)

or

z(y_t) = y_t / σ = (y − Y)/σ,    (14B)

where σ is the standard error of the underlying Gaussian function (its connection to b will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to −∞, needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissa.

Two examples of Strasburger's maximum slope (his Equations 7 and 17) are

β′ = b exp(−1) = 0.368 b    (15)

for the Weibull function (Equation 13) and

β′ = 1/(σ √(2π)) = 0.399/σ    (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(−1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (y_t = 0). Note that Equations 12–16 are missing the (1 − γ) factor that is present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.

An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = 0.5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(x_t = 1) = 75%. At 10% above threshold, P(x_t = 1.1) ≈ .75 + 0.1 β′ = .7899, which is quite close to the exact value of .7896. This example shows how β′, defined with a natural log abscissa y_t, is relevant to a linear abscissa x_t.

Now that the logarithmic abscissa has been introduced, this is a good place to stop and point out that the log abscissa is fine for detection but not for discrimination, since the x = 0 point is often in the middle of the range. However, the cumulative normal PF is especially relevant to discrimination, where the PF goes from 0% to 100%, and would be written as

p_Φ(x) = P_Φ(x) = Φ[(x − PSE)/α]    (17A)

or

z_Φ(x) = (x − PSE)/α,    (17B)

where x = PSE is the point of subjective equality. The parameter α is the threshold, since d′(α) = z_Φ(α) − z_Φ(0) = 1. The threshold α is also one standard deviation of the Gaussian probability density function (pdf). The reciprocal of α is the PF slope. I tend to use the letter σ for a unitless standard deviation, as occurs for the logarithmic variable y; I use α for a standard deviation that has the units of the stimulus x (like percent contrast). The comparison of Equation 14B for detection and Equation 17B for discrimination is useful. Although Equation 14 is used in most of the articles in this special issue, which are concerned with detection tasks, the techniques that I will be discussing are also relevant to discrimination tasks, for which Equation 17 is used.

Threshold and the Weibull Function
To illustrate how a d′ = 1 definition of threshold works for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = 15.87%, corresponding to a z score of z_FA = −1, as in Figure 1. The z score at threshold (d′ = 1) is z_Th = z_FA + 1 = 0, corresponding to a probability of P(1) = 50%. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)]/[1 − P(0)] = 0.5943. The Weibull function becomes

P_weib(x_t) = 1 − (1 − 0.1587) × 0.5943^(x_t^b).    (18A)

If I had defined d′ = 2 to be threshold, then z_Th = z_FA + 2 = 1, leading to k = (1 − 0.8413)/(1 − 0.1587) = 0.1886, giving

P_weib(x_t) = 1 − (1 − 0.1587) × 0.1886^(x_t^b).    (18B)

As another example, suppose the false alarm rate in a yes/no task is at 50%, not uncommon in a signal detection experiment with blanks and test stimuli intermixed. Then threshold at d′ = 1 would occur at 84.13%, and the PF would be

P_weib(x_t) = 1 − 0.5 × 0.3173^(x_t^b),    (18C)

with P_weib(0) = 50% and P_weib(1) = 84.13%. This case corresponds to defining d′ on the ROC vertical intercept.

For 2AFC, the connection between d′ and z is z = d′/√2. Thus z_Th = 2^(−0.5) = 0.7071, corresponding to P_weib(1) = 76.02%, leading to k = 0.4795 in Equation 12. These connections will be clarified when an explicit form (Equation 20) is given for the d′ function d′(x_t).
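The k values quoted above follow directly from k = [1 − P(1)]/[1 − P(0)]. The short Matlab sketch below (my own illustration, not part of the original appendix) reproduces them for the three yes/no cases and the 2AFC case:

% k = (1 - P(threshold))/(1 - P(0)) for several definitions of threshold
PfromZ = inline('(1 + erf(z/sqrt(2)))/2');     % cumulative normal: probability from z score
kA = (1 - PfromZ(-1 + 1))/(1 - .1587)          % yes/no, Pfa = .1587, d' = 1  -> 0.5943 (Eq. 18A)
kB = (1 - PfromZ(-1 + 2))/(1 - .1587)          % yes/no, Pfa = .1587, d' = 2  -> 0.1886 (Eq. 18B)
kC = (1 - PfromZ( 0 + 1))/(1 - .5)             % yes/no, Pfa = .50,   d' = 1  -> 0.3173 (Eq. 18C)
k2 = (1 - PfromZ(1/sqrt(2)))/(1 - .5)          % 2AFC,   chance .50,  d' = 1  -> 0.4795 (Eq. 22)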



Complications With Strasburger's Advocacy of Maximum Slope

Here I will mention three complications implicated in Strasburger's suggestion of defining slope at the point of maximum slope on a logarithmic abscissa.

1. As can be seen in Equation 11, the slopes on linear and logarithmic abscissas are related by a factor of 1/x_t. Because of this extra factor, the maximum slope occurs at a lower stimulus strength on linear as compared with log axes. This makes the notion of maximum slope less fundamental. However, since we are usually interested in the percent error of threshold estimates, it turns out that a logarithmic abscissa is the most relevant axis, supporting Strasburger's log abscissa definition.

2. The maximum slope is not necessarily at threshold (as Strasburger points out). For the Weibull functions defined in Equation 13 with k = exp(−1) and the cumulative normal function in Equation 14, the maximum slope (on a log axis) does occur at threshold. However, the two points are decoupled in the generalized Weibull function defined in Equation 12. For the Quick version of the Weibull function (Equation 13 with k = 0.5, placing threshold halfway up the PF), the threshold is below the point of maximum slope; the derivative of P(x_t) at threshold is

β′_thresh = dP(x_t)/dx_t = (1 − γ) b log_e(2)/2 = 0.347 (1 − γ) b,

which is slightly different from the maximum slope as given by Equation 15. Similarly, when threshold is defined at d′ = 1, the threshold is not at the point of maximum slope. People who fit psychometric functions would probably prefer reporting slopes at the detection threshold rather than at the point of maximum slope.

3. One of the most important considerations in selecting a point at which to measure threshold is the question of how to minimize the bias and standard error of the threshold estimate. The goal of adaptive methods is to place trials at one level. In order to avoid a threshold bias due to an improper estimate of PF slope, the test level should be at the defined threshold. The variance of the threshold estimate when the data are concentrated at a single level (as with adaptive procedures) is given by Gourevitch and Galanter (1967) as

var(Y) = [P(1 − P)/N] / [dP(y_t)/dy_t]².    (19)

If the binomial error factor of P(1 − P)/N were not present, the optimal placement of trials for measuring threshold would be at the point of maximum slope. However, the presence of the P(1 − P) factor shifts the optimal point to a higher level. Wichmann and Hill (2001b) consider how trial placement affects threshold variance for the method of constant stimuli. This topic will be considered in Section II.

Connecting the Weibull and d′ Functions
Figures 5–7 of Strasburger (2001a) compare the log–log d′ slope, β, with the Weibull PF slope, b. Strasburger's connection of β = 0.88b is problematic, since the d′ version of the Weibull PF does not have a fixed log–log slope. An improved d′ representation of the Weibull function is the topic of the present section. A useful parameterization for d′, which I shall refer to as the Stromeyer–Foley function, was introduced by Stromeyer and Klein (1974) and used extensively by Foley (1994):

d′(x_t) = x_t^β / [a + (1 − a) x_t^(β + w − 1)],    (20)

where x_t is the stimulus strength in threshold units, as in Equation 8. The factors with a in the denominator are present so that d′ equals unity at threshold (x_t = 1, or x = α). At low x_t, d′ ≈ x_t^β / a. The exponent β (the log–log slope of d′ at very low contrasts) controls the amount of facilitation near threshold. At high x_t, Equation 20 becomes d′ ≈ x_t^(1 − w) / (1 − a). The parameter w is the log–log slope of the test threshold versus pedestal contrast function (the tvc or "dipper" function) at strong pedestals. One must be a bit cautious, however, because in yes/no procedures the tvc slope can be decoupled from w if the signal detection ROC curve has nonunity slope (Stromeyer & Klein, 1974). The parameter a controls the point at which the d′ function begins to saturate. Typical values of these unitless parameters are β = 2, w = 0.5, and a = 0.6 (Yu, Klein, & Levi, 2001). The function in Equation 20 accelerates at low contrast (log–log slope of β) and decelerates at high contrasts (log–log slope of 1 − w), in general agreement with a broad range of data.

For 2AFC, z = d′/√2, so from Equation 3 the connection between d′ and probability correct is

P(x_t) = 0.5 + 0.5 erf[d′(x_t)/2].    (21)

To establish the connection between the PFs specified in Equations 12 and 20–21, one must first have the two agree at threshold. For 2AFC with d′ = 1, the threshold at P = 76.02% can be enforced by choosing k = 0.4795 (see the discussion following Equation 18C), so that Equations 1 and 12 become

P(x_t) = 1 − 0.5 × 0.4795^(x_t^b).    (22)

Modifying k as in Equation 22 leaves the Weibull shape unchanged and shifts only the definition of threshold. If β = 1.06b, w = 1 − 0.39b, and a = 0.614, then for all values of b the Weibull and d′ functions (Equations 21 and 22 vs. Equation 12) differ by less than 0.0004 for all stimulus strengths. A match done only at very low stimulus strengths would give β = b; the value β = 1.06b is a compromise for getting an overall optimal fit.
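The closeness of this match is easy to check numerically. The sketch below is my own (not part of the original appendix); the stimulus grid and the choice b = 2 are arbitrary, and the parameter conversion is the one just quoted from the text:

b = 2;  beta = 1.06*b;  w = 1 - 0.39*b;  a = 0.614;    % "Weibull parameters" given in the text
xt = logspace(-1, .5, 200);                            % stimulus strength in threshold units
Pweib = 1 - 0.5*0.4795.^(xt.^b);                       % Equation 22: 2AFC Weibull, d' = 1 at xt = 1
dp = xt.^beta ./ (a + (1 - a)*xt.^(beta + w - 1));     % Equation 20: Stromeyer-Foley d'
Pdp = 0.5 + 0.5*erf(dp/2);                             % Equation 21: 2AFC probability correct
maxDiff = max(abs(Pweib - Pdp))                        % the text reports less than about .0004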

Strasburger (2001a) is concerned with the same issue. In his Table 1 he reports d′ log–log slopes of [0.8847,



1.8379, 3.131, 4.421] for b = [1, 2, 3.5, 5]. If the d′ slope is taken to be a measure of the d′ exponent β, then the ratio β/b is [0.8847, 0.9190, 0.8946, 0.8842] for the four b values. Our value of β/b = 1.06 differs substantially from 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a). Pelli (1985) and Strasburger (2001a) used d′ functions with a = 1 in Equation 20. With a = 0.614, our d′ function (Equation 20) starts saturating near threshold, in agreement with experimental data. The saturation lowers the effective value of β. I was very surprised to find that the Stromeyer–Foley d′ function did such a good job in matching the Weibull function across the whole range of b and x. For a long time I had thought a different fit would be needed for each value of b.

For the same Weibull function in a yes/no method (false alarm rate = 50%), the parameters β and w are identical to the 2AFC case above. Only the value of a shifts, from a = 0.614 (2AFC) to a = 0.54 (yes/no). In order to make x_t = 1 at d′ = 1, k becomes 0.3173 (see Equation 18C) instead of 0.4795 (2AFC). As with 2AFC, the difference between the Weibull and d′ functions is less than 0.004 (<0.4%) for all stimulus strengths and all values of b, and for both a linear and a logarithmic abscissa. These values mean that for both the yes/no and the 2AFC methods, the d′ function (Equation 20) corresponding to a Weibull function with b = 2 has a low-contrast log–log slope of β = 2.12 and a high-contrast log–log slope of 1 − w = 0.78. The high-contrast tvc (test contrast vs. pedestal contrast discrimination function, sometimes called the "dipper" function) log–log slope would be w = 0.22.

I also carried out a similar analysis asking what cumulative normal σ is best matched to the Weibull. I found σ = 1.1720/b. However, the match is poor, with errors of 4% (rather than <0.4%, as in the preceding analysis). The poor fit of the two functions makes the fitting procedure sensitive to the weighting used in the fitting. If one uses a weighting factor based on the binomial variance P(1 − P)/N, where P is the PF, one will find a quite different value for σ than if one ignored the weighting or if one chose the weighting on the basis of the data rather than the fitting function.

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section, inspired by Linschoten et al. (2001), is an examination of various experimental methods for gathering data to estimate thresholds. Linschoten et al. compare three 2AFC methods for measuring smell and taste thresholds: a maximum-likelihood adaptive method, an up–down staircase method, and an ascending method of limits. The special aspect of their experiments was that in measuring smell and taste, each 2AFC trial takes between 40 and 60 sec (Lew Harvey, personal communication) because of the long duration needed for the nostrils or mouth to recover their sensitivity. The purpose of the Linschoten et al. paper is to examine which of the three methods is most accurate when the number of trials is low. Linschoten et al. conclude that the 2AFC method is a good method for this task. My prior experience with forced choice tasks and low numbers of trials (Manny & Klein, 1985; McKee et al., 1985) made me wonder whether there might be better methods to use in circumstances in which the number of trials is limited. This topic is considered in Section II.

Compare 2AFC to Yes/No
The commonly accepted framework for comparing 2AFC and yes/no threshold estimates is signal detection theory. Initially I will make the common assumption that the d′ function is a power function (Equation 20 with a = 1):

d′ = x_t^b = exp(b y_t).    (23)

Later in this section, the experimentally more accurate form, Equation 20 with a ≠ 1, will be used. The variance of the threshold estimate will now be calculated for 2AFC and yes/no for the simplest case of trials placed at a single stimulus level y, the goal of most adaptive methods. The PF slope relates ordinate variance to abscissa variance (from Gourevitch & Galanter, 1967):

var(Y) = var(d′) / (dd′/dy_t)²,    (24)

where Y = y − y_t is the threshold estimate for a natural log abscissa, as given in Equation 9. This formula for var(Y) is called the asymptotic formula, in that it becomes exact in the limit of a large number of total trials N. Wichmann and Hill (2001b) show that for the method of constant stimuli a proper choice of testing levels brings one to the asymptotic region for N ≥ 120 (the smallest value that they explored). Improper choice of testing levels requires a larger N in order to get to the asymptotic region. Equation 24 is based on having the data at a single level, as occurs asymptotically in adaptive procedures. In addition, the definition of threshold must be at the point being tested. If levels are tested away from threshold, there will be an additional contribution to the threshold variance (Finney, 1971; McKee et al., 1985) if the slope is a variable parameter in the fitting procedure. Equation 24 reaches its asymptotic value more rapidly for adaptive methods with focused placement of trials than with the constant stimulus method when slope is a free parameter.

Equation 24 needs analytic expressions for the derivative and the variance of d′. The derivative is obtained from Equation 23:

dd′/dy_t = b d′.    (25)

The variance of d′ for both yes/no (d′ = z_hit + z_correct rejection) and 2AFC (d′ = √2 z_ave) is

var(d′) = 2 var(z) = 4π exp(z²) var(P).    (26)



For the yes/no case, Equation 26 is based on a unity slope ROC curve with the subject's criterion at the negative diagonal of the ROC curve, where the hit rate equals the correct rejection rate. This would be the most efficient operating point. The variance for a nonunity slope ROC curve was considered by Klein, Stromeyer, and Ganz (1974). From binomial statistics,

var(P) = P(1 − P)/n,    (27)

with n = N for 2AFC and n = N/2 for yes/no, since for this binary yes/no task the number of signal and blank trials is half the total number of trials.

Putting all of these factors together gives

var(Y) = 4π exp(z²) P(1 − P) / (n b² d′²)    (28)

for both the yes/no and the 2AFC cases. The only difference between the two cases is the definition of n and the definition of d′ (d′_2AFC = √2 z, and d′_YN = 2z for a symmetric criterion). When this is expressed in terms of N (number of trials) and z (z score of probability correct), we get

var(Y) = 2π exp(z²) P(1 − P) / (N b² z²)    (29)

for both 2AFC and yes/no. That is, when the percent corrects are all equal (P_2AFC = P_hit = P_correct rejection), the threshold variances for 2AFC and yes/no are equal. The minimum value of var(Y) is 1.64/(Nb²), occurring at z = 1.57 (P = 94%). This value of 94% correct is the optimum value at which to test for both 2AFC and yes/no tasks, independent of b.
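Equation 29 makes the 94% claim easy to verify numerically. The sketch below is my own check (the values of N and b are arbitrary and cancel out of the location of the minimum):

b = 2; N = 100;                                        % slope and number of trials (illustrative)
P = .55:.0001:.99;                                     % percent correct levels to examine
z = sqrt(2)*erfinv(2*P - 1);                           % z score of probability correct
varY = 2*pi*exp(z.^2).*P.*(1 - P)./(N*b^2*z.^2);       % Equation 29
[vmin, imin] = min(varY);
[P(imin)  z(imin)  vmin*N*b^2]                         % about .94, 1.57, and 1.64, as in the text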

This result, that the threshold variance is the same for 2AFC and yes/no methods, seems to disagree with Figure 4 of McKee et al. (1985), where the standard errors of the threshold estimates are plotted for 2AFC and yes/no as a function of the number of trials. The 2AFC standard error (their Figure 4) is twice that of yes/no, corresponding to a factor of four in variance. However, the McKee et al. analysis is misleading, since all they (we) did for the yes/no task was to use Equation 2 to expand the PF to a 0% to 100% range, thereby doubling the PF slope. That rescaling is not the way to convert a 2AFC detection task to the yes/no task of the same detectability. A signal detection methodology is needed for that conversion, as was done in the present analysis.

One surprise is that the optimal testing probability is independent of the PF slope b. This finding is a general property of PFs in which the dependence on the slope parameter has the form x^b. The 94% level corresponds to d′_YN = 3.15 and d′_2AFC = 2.23. For the commonly used P = 75% (z = 0.675, d′_YN = 1.35, d′_2AFC = 0.954), var(Y) = 4.08/(Nb²), which is about 2.5 times the optimal value obtained by placing the stimuli at 94%. Green (1990) had previously pointed out that the 94% and 92% levels were the optimal points to test for the cumulative normal and logit functions, respectively, because of the minimum threshold estimate variability when one is testing at that point on the PF. Other PFs can have optimal points at slightly lower levels, but these are still above the levels to which adaptive methods are normally directed.

It is also useful to compare the 2AFC and yes/no methods at the same d′, rather than at their optimal points. In that case, for d′ values fixed at [1, 2, 3], the ratio of the 2AFC variance to the yes/no variance is [1.82, 1.36, 0.79]. At low d′ there is an advantage for the 2AFC method, and that reverses at high d′, as is expected from the optimal d′ being higher for yes/no than for 2AFC. In general, one should run the yes/no method at d′ levels above 2.0 (84% correct in the blank and signal intervals corresponds to d′ = 2).

The equality of the optimal var(Y) for 2AFC and yes/no methods is not only surprising, it seems paradoxical, as can be seen by the following gedankenexperiment. Suppose subjects close their eyes on the first interval of the 2AFC trial and base their judgments on just the second interval, as if they were doing a yes/no task, since blanks and stimuli are randomly intermixed. Instead of saying "yes" or "no," they would respond "Interval 2" or "Interval 1," respectively, so that the 2AFC task would become a yes/no task. The paradox is that one would expect the variance to be worse than for the eyes-open 2AFC case, since one is ignoring information. However, Equation 29 shows that the 2AFC and yes/no methods have identical optimal variance. I suspect that the resolution of this paradox is that in the yes/no method with a stable criterion, the optimal variance occurs at a higher d′ value (3.15) than the 2AFC value (2.23). The d′ power law function that I chose does not saturate at large d′ levels, giving a benefit to the yes/no method.

In order to check whether the paradox still holds with a more realistic PF (a d′ function that saturates at large d′), I now compare the 2AFC and yes/no variance using a Weibull function with a slope b. In order to connect 2AFC and yes/no, I need to use signal detection theory and the Stromeyer–Foley d′ function (Equation 20) with the "Weibull parameters" β = 1.06b, a = 0.614, and w = 1 − 0.39b. Following Equation 22, I pointed out that with these parameter values the difference between the 2AFC Weibull function and the 2AFC d′ function (converted to a probability ordinate) is less than 0.0004. After the derivative, Equation 25, is appropriately modified, we find that the optimal 2AFC variance is 3.42/(Nb²). For the yes/no case, the optimal variance is 4.02/(Nb²). This result resolves the paradox. The optimal 2AFC variance is about 18% lower than the yes/no variance, so closing one's eyes in one of the intervals does indeed hurt.

We shouldn't forget that if one counts stimulus presentations N_p, rather than trials N, the optimal 2AFC variance becomes 6.84/(N_p b²), as opposed to 4.02/(N_p b²) for yes/no. Thus, for the Linschoten experiments, with each stimulus presentation being slow, the yes/no methodology is more accurate for a given number of stimulus presentations, even at low d′ values. Kaernbach (1990) compares 2AFC



and yes/no adaptive methods with simulations and actual experiments. His abscissa is number of presentations, and his results are compatible with our findings.

The objective yes/no method that I have been discussing is inefficient, since 50% of the trials are blanks. Greater efficiency is possible if several points on the PF are measured (method of constant stimuli), with the number of blank trials the same as at the other levels. For example, if five levels of stimuli were measured, including blanks, then the blanks would be only 20% of the total, rather than 50%. For the 2AFC paradigm, the subject would still have to look at 50% of the presentations being blanks. The yes/no efficiency can be further improved by having the subject respond to the multiple stimuli with multiple ratings rather than a binary yes/no response. This rating scale method of constant stimuli is my favorite method, and I have been using it for 20 years (Klein & Stromeyer, 1980). The multiple ratings allow one to measure the psychometric function shape for d′s as large as 4 (well into the dipper region, where stimuli are slightly above threshold). It also allows one to measure the ROC slope (presence of multiplicative noise). Simulations of its benefits will be presented in Klein (2002).

Insight into how the number of responses affects threshold variance can be obtained by using the cumulative normal PF (Equation 17) considered by Kaernbach (2001b) and Miller and Ulrich (2001), with γ = 0. Typically this is the case of yes/no discrimination, but it can also be 2AFC discrimination with an ordinate going from 0% to 100%, as I have discussed in connection with Figure 3. For a cumulative normal PF with a binary response and for a single stimulus level being tested, the threshold variance is (Equation 19) as follows:

var(thresh) = var(P) / (dP/dy)² = 2π σ² P(1 − P) exp(z²) / N,    (30)

from Equation 27 and forthcoming Equations 40 and 41. The optimal level to test is at P = 0.5, so the optimal variance becomes

var(thresh) = (π/2) × σ²/N.    (31)

If, instead of a binary response, the observer gives an analog response (many, many rating categories), then the variance is the familiar connection between standard deviation and standard error:

var(thresh) = σ²/N.    (32)

With four response categories, the variance is 1.19σ²/N, which is closer to the analog than to the binary case. If stimuli with multiple strengths are intermixed (method of constant stimuli), then the benefit of going from a binary response to multiple responses is greater than that indicated in Equations 31–32 (Klein, 2002).

A brief list of other problems with the 2AFC method applied to detection comprises the following:

1. 2AFC requires the subject to memorize and then compare the results of two subjective responses. It is cognitively simpler to report each response as it arrives. Subjects without much experience can easily make mistakes (lapses), simply becoming confused about the order of the stimuli. Klein (1985) discusses memory load issues relevant to 2AFC.

2. The 2AFC methodology makes various types of analysis and modeling more difficult, because it requires that one average over all criteria. The response variable in 2AFC is the Interval 2 activation minus the Interval 1 activation (see the abscissa in Figure 2). This variable goes from minus infinity to plus infinity, whereas the yes/no variable goes from zero to infinity. It therefore seems more difficult to model the effects of probability summation and uncertainty for 2AFC than for yes/no.

3. Models that relate psychophysical performance to underlying processes require information about how the noise varies with signal strength (multiplicative noise). The rating scale method of constant stimuli not only measures d′ as a function of contrast, it also is able to provide an estimate of how the variance of the signal distribution (the ROC slope) increases with contrast. The 2AFC method lacks that information.

4. Both the yes/no and 2AFC methods are supposed to eliminate the effects of bias. Earlier I pointed out that in my recent 2AFC discrimination experiments, when the stimulus is near threshold, I find I have a strong bias in favor of responding "Stimulus 2." Until I learned to compensate for this bias (not typically done by naive subjects), my d′ was underestimated. As I have discussed, it is possible to compensate for this bias, but that is rarely done.

Improving the Brownian (Up–Down) Staircase
Adaptive methods with a goal of placing trials at an optimal point on the PF are efficient. Leek (2001) provides a nice overview of these methods. There are two general categories of adaptive methods: those based on maximum (or mean) likelihood, and those based on simpler staircase rules. The present section is concerned with the older staircase methods, which I will call Brownian methods. I used the name "Brownian" because of the up–down staircase's similarity to one-dimensional Brownian motion, in which the direction of each step is random. The familiar up–down staircase is Brownian because at each trial the decision about whether to go up or down is decided by a random factor governed by probability P(x). I want to make this connection to Brownian motion because I suspect that there are results in the physics literature that are relevant to our psychophysical staircases. I hope these connections get made.

A Brownian staircase has a minimal memory of the run history: it needs to remember the previous step (including direction) and the previous number of reversals or number of trials (used for deciding when to stop and when to decrease step size). A typical Brownian rule is to decrease signal strength by one step (like 1 dB) after a correct response and to increase it three steps after an incorrect response



(a one-down/three-up rule). I was pleased to learn recently that this rule works nicely for an objective yes/no task (Kaernbach, 1990) as well as for 2AFC, as discussed earlier.

In recent years, likelihood methods (Pentland, 1980; Watson & Pelli, 1983) have become popular and have somewhat displaced Brownian methods. There is a widespread feeling that likelihood methods have substantially smaller threshold variance than do staircases in which thresholds are obtained by simply averaging the levels tested. Given the importance of this issue for informing researchers about how best to measure thresholds, it is surprising that there have been so few simulations comparing staircases and optimal methods. Pentland's (1980) results, showing very large threshold variance for a staircase method, are so dramatically different from the results of my simulations that I am skeptical. The best previous simulations of which I am aware are those of Green (1990), who compared the threshold variance from several 2AFC staircase methods with the ideal variance (Equation 24). He found that as long as the staircase rule placed trials above 80% correct, the ratio of observed to ideal threshold variance was about 1.4. This value is the square of the inverse of Green's finding that the ratio of the expected to the observed sigma is about .85.

In my own simulations (Klein, 2002), I found that as long as the final step size was less than 0.2σ, the threshold variance from simply averaging over levels was close to the optimal value. Thus the staircase threshold variance is about the same as the variance found by other optimal methods. One especially interesting finding was that the common method of averaging an even number of reversal points gave a variance that was about 10% higher than averaging over all levels. This result made me wonder how the common practice of averaging reversals ever got started. Since Green (1990) averaged over just the reversal points, I suspect that his results are an overestimate of the staircase variance. In addition, since Green specifies step size in decibels, it is difficult for me to determine whether his final step size was small enough. Since the threshold variance is inversely related to slope, it is important to relate step size to slope rather than to decibels. The most natural reference unit is σ, the standard deviation associated with the cumulative normal. In summary, I suggest that this general feeling of distrust of the simple Brownian staircase is misguided and that simple staircases are often as good as likelihood methods. In the section Summary of Advantages of Brownian Staircase, I point out several advantages of the simple staircase over the likelihood methods.
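A minimal simulation in the spirit of those comparisons is sketched below. It is my own illustration (not the simulations of Klein, 2002, or Green, 1990): a 2AFC one-down/three-up Brownian staircase run on a Weibull observer (Equation 22), with threshold estimated by averaging all tested levels after a short warm-up. The run length, step size, and starting level are arbitrary choices.

b = 2; nTrials = 100; nRuns = 200; step = .05;         % illustrative values; final step well under .2*sigma
thresh = zeros(1, nRuns);
for r = 1:nRuns
    y = .3;                                            % starting log level; the true d' = 1 threshold is y = 0
    levels = zeros(1, nTrials);
    for t = 1:nTrials
        levels(t) = y;
        Pc = 1 - .5*.4795.^exp(b*y);                   % 2AFC Weibull (Equation 22) with x_t = exp(y)
        if rand < Pc, y = y - step;                    % one level down after a correct response
        else          y = y + 3*step; end              % three levels up after an incorrect response
    end
    thresh(r) = mean(levels(21:end));                  % average all levels, skipping 20 warm-up trials
end
[mean(thresh) std(thresh)]                             % bias and precision of the averaged-level estimate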

A qualification should be made to the claim above that averaging levels of a simple staircase is as good as a fancy likelihood method. If the staircase method uses a too-small step size at the beginning, and if the starting point is far from threshold, then the simple staircase can be inefficient in reaching the threshold region. However, at the end of that run the astute experimenter should notice the large discrepancy between the starting point and the threshold, and the run should be redone with an improved starting point or with a larger initial step size that will be reduced as the staircase proceeds.

Increase number of 2AFC response categories. Independent of the type of adaptive method used, the 2AFC task has a flaw whereby a series of lucky guesses at low stimulus levels can erroneously drive adaptive methods to levels that are too low. Some adaptive methods with a small number of trials are unable to fully recover from a string of lucky guesses early in the run. One solution is to allow the subject to have more than two responses. For reasons that I have never understood, people like to have only two response categories in a 2AFC task. Just as I am reluctant to take part in 2AFC experiments, I am even more reluctant to take part in two-response experiments. The reason is simple: I always feel that I have within me more than one bit of information after looking at a stimulus. I usually have at least four or more subjective states (2 bits) for a yes/no task: HN, LN, LY, HY, where H and L are high and low confidence and N and Y are did not and did see it. For a 2AFC task my internal states are H1, L1, L2, H2, for high and low confidence on Intervals 1 and 2. Kaernbach (2001a), in this special issue, does an excellent job of justifying the low-confidence response category. He provides solid arguments, simulations, and experimental data on how a low-confidence category can minimize the "lucky guess" problem.

Kaernbach (2001a) groups the L1 and L2 categories together into a "don't know" category and argues persuasively that the inclusion of this third response can increase the precision and accuracy of threshold estimates while also leaving the subject a happier person. He calls it the "unforced choice" method. Most of his article concerns the understandable fear of introducing a response bias, exactly what 2AFC was invented to avoid. He has developed an intelligent rule for what stimulus level should follow a "don't know" response. Kaernbach shows, with both simulations and real data, that with his method this potential response bias problem has only a second-order effect on threshold estimates (similar to the 2AFC interval bias discussed around Equation 7), and its advantages outweigh the disadvantages. I urge anyone using the 2AFC method to read it. I conjecture that Kaernbach's method is especially useful in trimming the lower tail of the distribution of threshold estimates and also in reducing the interval bias of 2AFC tasks, since the low-confidence rating would tend to be used for trials on which there is the most uncertainty.

Although Kaernbach has managed to convince me of the usefulness of his three-response method, he might have trouble convincing others because of the response bias question. I should like to offer a solution that might quell the response bias fears. Rather than using 2AFC with three responses, one can increase the number of responses to four (H1, L1, L2, H2), as discussed two paragraphs above. The L rating can be used when subjects feel they are guessing.

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A 5-up/1-down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct


of 5/6 = 83.33%. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low-confidence rating is that when one is below threshold, giving a low-confidence rating will prevent one from long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.
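The equilibrium point of such rules, and the +2 average shift under guessing, follow from a one-line expectation. A tiny Matlab sketch (my own check; the response labels and step sizes are the ones suggested in the text):

% Equilibrium of an up/down rule: expected level change -d*P + u*(1-P) = 0
u = 5; d = 1;                                 % levels up after a wrong response, down after a correct one
Pequil = u/(u + d)                            % = 5/6 = .8333, the convergence point quoted above
% Average shift when guessing (P = .5), if low-confidence responses get the middle step sizes:
shifts = [-1 1 3 5];                          % step sizes for HC, LC, LI, HI
guessShift = mean(shifts([2 3]))              % low-confidence correct/incorrect equally likely: +2 levels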

Another reason not to be fearful of the multiple-response 2AFC method is that after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations, discussed earlier in this section, and Green's (1990) reveal that averaging over the levels (excluding a number of initial levels to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging-levels method disagree, that run could be thrown out.

Summary of Advantages of Brownian Staircase
Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia, I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since they give the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue, or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on one-step-ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape.

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope β or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low spatial frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates for mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data-gathering group argued that we should use a data-gathering methodology that would allow estimation of the PF slope of each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules, Goodness-of-Fit
There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation: Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits
The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals, in order to avoid the starting level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that, in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

\mathrm{pdf}(i) = \frac{P(i) - P(i-1)}{s(i) - s(i-1)},    (33)

where s(i) is the stimulus intensity at the ith level, and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i - 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

\mu_r = \sum_i \mathrm{pdf}(i)\,\frac{s(i)^{r+1} - s(i-1)^{r+1}}{r+1}.    (34)

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

f[s(i)] - f[s(i-1)] = f'(s_{\mathrm{mid}})\,[s(i) - s(i-1)],    (35)

where f(s) = s^{r+1} and f'(s_mid) = (r + 1)s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i-1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

\mu_r = \sum_i \mathrm{pdf}(i)\, s_{\mathrm{mid}}^{\,r}\, \Delta s,    (36)

where Δs = s(i) - s(i-1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

\mu_0 = \sum_i \mathrm{pdf}(i)\, \Delta s = P(i_{\max}) - P(i_{\min}),    (37)

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P(i_min), must be zero, as Miller and Ulrich (2001) point out.



The next simplest SK moment is for r = 1:

\mu_1 = \sum_i \mathrm{pdf}(i)\,\frac{s(i) + s(i-1)}{2} = \sum_i [P(i) - P(i-1)]\,\frac{s(i) + s(i-1)}{2},    (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or of PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: if this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method of constant stimuli data in which the PF goes from 0% to 100%. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, -.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to these unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than 33% and P2 can be any number greater than 67%. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide possible set of alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the

center of the PF is given by Equation 38 to be μ₁ = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ₁ = 2.97. However, given the data, with 49 of 98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ₁ = 4. Thus we see that, because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
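The numbers in this example are easy to verify with a few lines of code (my own check, not the author's); the monotonized probabilities are taken from the text above.

    s     = 1:5;                              % stimulus levels
    Praw  = [0 1 1 .5 1];                     % observed probabilities
    Pmono = [0 .51 .51 .51 1];                % trial-weighted monotonized values
    mids  = (s(2:end) + s(1:end-1))/2;        % interval midpoints
    mu1   = @(P) sum(diff(P) .* mids);        % first moment, Equation 38
    mu1(Praw)                                 % 2.00
    mu1(Pmono)                                % 2.97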

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one uses for the ordinate the percentage of trials judged to be in Interval 2 and uses the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis, while also removing a possible bias.
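Here is a sketch of that recoding (my own illustration; the simulated observer and the variable names are hypothetical): the ordinate becomes the percentage of "Interval 2" responses and the abscissa becomes the Interval 2 minus Interval 1 signal strength, so the resulting PF runs from 0% to 100%.

    nT = 600;
    levels = 0:0.1:0.5;                        % discrete signal strengths
    s1 = levels(randi(numel(levels), 1, nT));  % Interval 1 strength on each trial
    s2 = levels(randi(numel(levels), 1, nT));  % Interval 2 strength on each trial
    d  = s2 - s1;                              % signed difference (abscissa)
    pI2 = 0.5*(1 + erf(d/(0.2*sqrt(2))));      % simulated observer, sigma = 0.2
    respI2 = rand(1, nT) < pI2;                % true when "Interval 2" is chosen
    [du, ~, bin] = unique(d);                  % pool trials with the same difference
    p2 = accumarray(bin(:), double(respI2(:))) ./ accumarray(bin(:), 1);
    plot(du, 100*p2, 'o-')
    xlabel('Interval 2 minus Interval 1 signal strength')
    ylabel('Percentage of "Interval 2" responses')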

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being .01 and .99. The probabilities at the sample points are P(i) = .01, .1222,


.50, .8778, and .99, corresponding to z scores of z(i) = -2.328, -1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ₁, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

\mathrm{var}(\mathrm{PSE}_i) = \frac{\mathrm{var}(P_i)}{\left(dP_i/dx_i\right)^2},    (39)

where var(P) is given by Equation 27 and the PF slope is

\frac{dP_i}{dx_i} = \frac{dP_i}{dz_i} \times \frac{dz_i}{dx_i},    (40)

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

\frac{dP_i}{dz_i} = \frac{\exp(-z_i^2/2)}{\sqrt{2\pi}},    (41)

from Equation 3A. By combining these equations, one gets

\mathrm{var}(\mathrm{PSE}_i) = \frac{2\pi\sigma^2\, P_i\,(1 - P_i)\exp(z_i^2)}{N_i}.    (42)

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

\frac{1}{\mathrm{var}_{\mathrm{tot}}} = \sum_i \frac{1}{\mathrm{var}_i}.    (43)

Finally, upon plugging in the five values of z_i, ranging from -2.328 to +2.328, and P_i, ranging from .01 to .99, the standard error of the PF location is

SE = \sqrt{\mathrm{var}_{\mathrm{tot}}} = 0.284.    (44)

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method, and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal numbers of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
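As a check, the analytic calculation of Equations 39–44 can be reproduced in a few lines (my own sketch, not the author's Appendix code):

    z = [-2.328 -1.164 0 1.164 2.328];        % sample points in z-score units
    P = 0.5*(1 + erf(z/sqrt(2)));             % .01 .1222 .50 .8778 .99
    N = 8; sigma = 1;                         % 8 trials per level, unit-slope PF
    varPSE = 2*pi*sigma^2 * P.*(1-P) .* exp(z.^2) / N;   % Equation 42
    varTot = 1/sum(1./varPSE);                % Equation 43
    SE = sqrt(varTot)                         % 0.284, Equation 44
    varTot*40                                 % 3.24, as in Equation 45 below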

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

\mathrm{var}_{\mathrm{tot}} = SE^2 = \frac{3.24}{N_{\mathrm{tot}}}.    (45)

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that, of the five stimulus levels, the two at P = .01 and .99 are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = .5). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot


claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit
Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

X^2 = \sum_i \frac{[p(\mathrm{data}_i) - P_i]^2}{\mathrm{var}(P_i)},    (46)

where p(data_i) and P_i are the percent correct of the datum and of the PF at level i. The binomial var(P_i) is our old friend

\mathrm{var}(P_i) = \frac{P_i\,(1 - P_i)}{N_i},    (47)

where N_i is the number of trials at level i. The X² statistic is of interest because the binomial distribution is asymptotically Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

X^2 = \sum_i \mathrm{residual}_i^2,    (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^{0.5}:

\mathrm{residual}_i = \frac{p(\mathrm{data}_i) - P_i}{SE_i}.    (49)

The second method is to use the normalized log-likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2\sum_i \left\{ N_i\, p(\mathrm{data}_i)\,\ln\!\left[\frac{p(\mathrm{data}_i)}{P_i}\right] + N_i\,[1 - p(\mathrm{data}_i)]\,\ln\!\left[\frac{1 - p(\mathrm{data}_i)}{1 - P_i}\right] \right\}.    (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, and γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

\mathrm{prob\_chisq} = \mathrm{Gamma}(df/2,\; \chi^2/2),    (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^{0.5}. As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z score.
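These checks are easy to reproduce (my own sketch); Matlab's regularized incomplete gamma function gammainc gives the chi-square probability of Equation 51 when called as gammainc(χ²/2, df/2).

    df = 18;
    gammainc(18/2, df/2)         % 0.54: a chi-square equal to df is near the median
    gammainc((18 + 6)/2, df/2)   % 0.845: one standard deviation, sqrt(2*df) = 6, above the mean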

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials at each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities at each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = .50 to 1.00 and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

X_i^2 = \frac{[p(\mathrm{data}_i) - P_i]^2}{P_i\,(1 - P_i)/N_i} = \frac{(O_i - E_i)^2}{E_i\,(1 - P_i)},    (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible


outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

\mathrm{wt}_i = \binom{N_i}{O_i} P_i^{O_i}\,(1 - P_i)^{N_i - O_i}.    (53)

The expected value ⟨X_i²⟩ is

\langle X_i^2\rangle = \sum_{O_i} \mathrm{wt}_i\,\frac{(O_i - E_i)^2}{E_i\,(1 - P_i)}.    (54)

A bit of algebra, based on Σ_{O_i} wt_i = 1, gives a surprising result:

\langle X_i^2\rangle = 1.    (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P_i. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

\langle LL_i\rangle = \sum_{O_i} \mathrm{wt}_i\, 2\left[ O_i \ln\!\left(\frac{O_i}{E_i}\right) + (N_i - O_i)\ln\!\left(\frac{N_i - O_i}{N_i - E_i}\right)\right].    (56)

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = .5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

\langle LL_i\rangle = 2\,[0.25 \times 2\ln(2) + 0.25 \times 2\ln(2)] = 2\ln(2) = 1.386.    (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to

the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.
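The calculation behind Equations 53–57 is short enough to spell out for this case (my own sketch, paralleling but not reproducing Matlab Code 2): with N_i = 2 trials at a level where P_i = .5, the weighted sums give exactly 1 for X² and 2 ln(2) = 1.386 for the deviance.

    N = 2; P = 0.5; E = N*P;                                        % two trials at P = .5
    O = 0:N;                                                        % possible numbers correct
    wt = arrayfun(@(o) nchoosek(N, o), O) .* P.^O .* (1-P).^(N-O);  % Equation 53
    X2 = (O - E).^2 / (E*(1-P));                                    % Equation 52, per outcome
    xlogx = @(a, b) a .* log(max(a, eps) ./ b);                     % 0*log(0) treated as 0
    LL = 2*(xlogx(O, E) + xlogx(N - O, N - E));                     % Equation 56, per outcome
    sum(wt .* X2)                                                   % = 1      (Equation 55)
    sum(wt .* LL)                                                   % = 1.386  (Equation 57)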

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the 1 - P_i factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85 the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to the log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse-variance weighting. There are three ways to calculate threshold variance. (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method 1 by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance, based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with the threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
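For reference, the inverse-variance weighting alluded to here has the same form as Equation 43: if run k yields threshold estimate T_k with variance σ_k², then

\hat{T} = \frac{\sum_k T_k/\sigma_k^2}{\sum_k 1/\sigma_k^2}, \qquad \mathrm{var}(\hat{T}) = \frac{1}{\sum_k 1/\sigma_k^2},

which is the minimum-variance linear combination when the individual estimates are unbiased and independent.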

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels, with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                        No Lapse         No Lapse         Lapse            Lapse
                        λ = .01          λ = .0001        λ = .01          λ = .0001
Psychometric Function   L      χ²        L      χ²        L      χ²        L      χ²
Probit                  1.05   1.05      1.00   0.99      1.05   1.05      0.36   0.34
Logit                   1.76   1.76      1.66   1.66      1.76   1.76      0.84   0.72
Weibull                 1.01   1.01      0.97   0.97      1.01   1.01      0.26   0.26

Note: The columns are for different lapse rates (λ) and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values of each pair correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

P_{\mathrm{probit}} = 0.5 + 0.5\,\mathrm{erf}(z/\sqrt{2})  (as in Equation 3B),    (58A)

P_{\mathrm{logit}} = \frac{1}{1 + \exp(-z)},    (58B)

P_{\mathrm{Weibull}} = 1 - \exp[-\exp(z)]  (as in Equation 13),    (58C)

with z = p2(y - p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that: (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
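For concreteness, here is a sketch (mine, not the author's Matlab Code 3) of the fitting functions of Equations 58A–58C combined with the symmetric lapse/guess rate (γ = λ) described above; the parameter values at the end are purely illustrative.

    probit  = @(z) 0.5 + 0.5*erf(z/sqrt(2));           % Equation 58A
    logit   = @(z) 1 ./ (1 + exp(-z));                 % Equation 58B
    weibull = @(z) 1 - exp(-exp(z));                   % Equation 58C
    withLapse = @(F, z, lam) lam + (1 - 2*lam).*F(z);  % Equation 1A with gamma = lambda
    y  = [0 1 6];                                      % the three stimulus levels used for Table 2
    p1 = 0.8; p2 = 1.0; lam = 0.01;                    % illustrative threshold, slope, lapse
    withLapse(probit, p2*(y - p1), lam)                % predicted probabilities at the three levels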

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and to fix the lapse parameter to zero or a small number. Normally, the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods
In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope β of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods; I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method, using a Weibull function with β = 3.5, to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled, so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = .0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing, there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = .01, .1222, .50, .8778, and .99. Suppose that, because of binomial variability, the middle point was .70 rather than .50. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability caused the middle point to be .30 rather than .50. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary, since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total numbers correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large


upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = .50. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
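As a quick illustration of these two corrections, the following lines (with assumed example values, not data from the paper) convert a raw probability correct into both the corrected probability p and the signal detection measure d′, using the erfinv-based z-score transform that also appears in the Appendix code.

% Two ways to compensate for the lower asymptote P(0) (illustrative values).
P0 = 0.1587;                            % lower asymptote (false alarm rate), gamma
P  = 0.50;                              % raw probability correct at some level x
z  = @(q) sqrt(2) * erfinv(2*q - 1);    % probability -> z-score
p_corrected = (P - P0) / (1 - P0);      % Equation 2: correction in probability space
dprime      = z(P) - z(P0);             % Equation 5: correction in z-score space
% Here dprime = 0 - (-1) = 1, the d' = 1 definition of threshold used in the text.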

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.
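A minimal sketch of such a staircase trial loop is given below. The one-down/three-up rule and the intermixing of blanks follow the description above; the simulated observer, criterion, step size, starting level, and nonnegativity safeguard are assumptions added for illustration.

% Objective yes/no up-down staircase sketched with an assumed simulated observer.
nTrials = 100;
level = 10;                                  % starting signal level (arbitrary units)
dprimeOf = @(lev) lev / 8;                   % assumed d'-versus-level for the simulation
levels = zeros(1, nTrials);
for t = 1:nTrials
    levels(t) = level;
    isSignal = rand < 0.5;                           % randomly intermix blanks and signals
    resp = randn + isSignal * dprimeOf(level);       % equal-variance SDT observer
    saysYes = resp > 0.5;                            % fixed criterion
    correct = (saysYes == isSignal);                 % hit or correct rejection
    if correct
        level = level - 1;                   % one step down for every correct answer
    else
        level = level + 3;                   % three steps up for every wrong answer
    end
    level = max(level, 0);                   % added safeguard: keep the level nonnegative
end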

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.0004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared, using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(N b²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(N b²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about the fourth reversal, are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.
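A minimal sketch of that estimator, under the stated rule of discarding trials up to about the fourth reversal and then averaging all remaining levels (the function and variable names are assumptions made here):

% Threshold from a Brownian staircase: average the tested levels after
% discarding the trials up to about the fourth reversal (sketch).
function thresh = staircaseThresholdSketch(levels)
% levels: vector of stimulus levels visited on successive trials
steps = sign(diff(levels));               % +1 for an upward step, -1 for a downward step
revIdx = find(diff(steps) ~= 0) + 1;      % approximate trial indices of the reversals
if numel(revIdx) >= 4
    firstKept = revIdx(4) + 1;            % discard everything up to the fourth reversal
else
    firstKept = 1;                        % too few reversals: keep the whole run
end
thresh = mean(levels(firstKept:end));     % average of all remaining levels
end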

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method to use should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials was unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptote. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
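For reference, the core Spearman–Kärber computation is the mean of the distribution whose cumulative distribution function is the (monotonized) PF; Miller and Ulrich's full method goes further, but a minimal sketch with assumed example data is:

% Spearman-Karber threshold estimate from a PF running from 0 to 1 (sketch).
x = [1 2 3 4 5 6];                        % stimulus levels (assumed example)
p = [0.02 0.10 0.35 0.70 0.92 1.00];      % monotonized PF values at those levels
dp = diff(p);                             % probability mass between adjacent levels
mid = (x(1:end-1) + x(2:end)) / 2;        % midpoints of adjacent levels
threshSK = sum(dp .* mid) / sum(dp);      % mean of the implied distribution = SK threshold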

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).
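For concreteness, the per-level contributions being contrasted can be written as below for a single level with illustrative numbers; these mirror the expressions used in Matlab Code 2 of the Appendix, with eps guarding the logarithms.

% Per-level goodness-of-fit contributions: Pearson X^2 vs. likelihood deviance.
p = 0.8;  N = 2;  Obs = 1;                % predicted probability, trials, number correct
E = N * p;                                % expected number correct
X2 = (Obs - E)^2 / (E * (1 - p));         % chi-square contribution at this level
dev = 2 * (Obs * log((Obs + eps) / (E + eps)) + ...
      (N - Obs) * log((N - Obs + eps) / (N - E + eps)));   % likelihood (deviance) contribution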

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.
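A minimal sketch of that multiple-start strategy, using an assumed data set and a Weibull model with a fixed lower asymptote and lapse rate (none of these numbers are Strasburger's data):

% Guard against local minima: refit with several initial slope guesses and keep
% the fit with the best likelihood (illustrative 2AFC data, fixed lapse rate).
x = [1 2 3 4 5 6];  N = 40 * ones(1, 6);  Obs = [22 25 30 36 38 39];
gam = 0.5;  lapse = 0.01;                 % lower asymptote and assumed lapse rate
PF = @(x, a, b) gam + (1 - gam - lapse) * (1 - exp(-(x ./ a).^b));
negLL = @(q) -sum(Obs .* log(PF(x, abs(q(1)), abs(q(2))) + eps) + ...
    (N - Obs) .* log(1 - PF(x, abs(q(1)), abs(q(2))) + eps));
bestFval = Inf;
for b0 = [0.5 1 2 4 8]                    % range of initial slope guesses
    [q, fval] = fminsearch(negLL, [3, b0]);   % fit [threshold, slope]; abs() keeps them positive
    if fval < bestFval, bestFval = fval; qBest = abs(q); end
end
disp(qBest)                               % threshold and slope of the best of the fits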

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.

Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.

Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.

Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.

Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.

Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.

Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.

Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.

Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.

Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.

Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.

Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.

Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.

King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.

Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America, 2, 1560-1585.

Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.

Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.

Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.

Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.

Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.

Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.

Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.

McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.

Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.

Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.

Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.

Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.

Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein.spectacle.berkeley.edu

Matlab Code 1
Weibull Function: Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)

Matlab Code 2
Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)

Matlab Code 3
Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)

Matlab Code 4
Brownian Staircase Enumerations for Slope Bias (scripts for monotonizing, extrapolating, and averaging, plus the main program)

Note on the monotonizing script: the denominator used in forming the probabilities is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note on the main program: it has one tricky step that requires knowledge of Matlab, a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


d′ and probability correct should be treated cautiously. W. P. Banks and I (unpublished) investigated this topic and found that near the P = 50% correct point the dependence of d′ on ROC slope is minimal. Away from the 50% point, the dependence can be strong.

Yes/No detection. Linschoten et al. (2001) compare three methods (limits, staircase, 2AFC likelihood) for measuring thresholds with a small number of trials. In the method of limits, one starts with a subthreshold stimulus and gradually increases the strength. On each trial, the observer says "yes" or "no" with respect to whether or not the stimulus is detected. Although this is a classic yes/no detection method, it will not be discussed in this article, because there is no control of response bias. That is, blanks were not intermixed with signals.

In Table 1, the Wichmann and Hill (2001a, 2001b) articles are marked with check marks in parentheses because, although these authors write only about 2AFC, they mention that all their methods are equally applicable for yes/no or m-AFC of any type (detection/discrimination). And their Matlab implementations are fully general. All the special issue articles reporting experimental data used a forced choice method. This bias in favor of the forced choice methodology is found not only in this special issue; it is widespread in the psychophysics community. Given the advantages of the yes/no method (see discussion in Section II), I hope that once yes/no adaptive methods are accepted, they will become the method of choice.

m-AFC discrimination. It is common practice to represent 2AFC results as a plot of percent correct, averaged over all trials, versus stimulus strength. This PF goes from 50% to 100%. The asymmetry between the lower and upper asymptotes introduces some inefficiency in threshold estimation, as will be discussed. Another problem with the standard 50% to 100% plot is that a bias in choice of interval will produce an underestimate of d′, as has been shown earlier. Researchers often use the 2AFC method because they believe that it avoids biased threshold estimates. It is therefore surprising that the relatively simple correction for 2AFC bias discussed earlier is rarely done.

The issue of bias in m-AFC also occurs when m > 2. In Strasburger's 10AFC letter discrimination task, it is common for subjects to have biases for responding with particular letters when guessing. Any imbalance in the response bias for different letters will result in a reduction of d′, as it did in 2AFC.

There are several benefits of viewing the 2AFC discrimination data in terms of a PF going from 0% to 100%. Not only does it provide a simple way of viewing and calculating the interval bias, it also enables new methods for estimating the PF parameters, such as those proposed by Miller and Ulrich (2001), as will be discussed in Section III. In my original comments on the Miller and Ulrich paper, I pointed out that because their nonparametric procedure has uniform weighting of the different PF levels, their method does not apply to 2AFC tasks. The asymmetric binomial error bars near the 50% and 100% levels cause the uniform weighting of the Miller and Ulrich approach to be nonoptimal. However, I now realize that the 2AFC discrimination task can be fit by a cumulative normal going from 0% to 100%. Because of that insight, I have marked the discrimination forced choice column of Table 1 for the Miller and Ulrich (2001) paper, with the proviso that the PF goes from 0% to 100%. Owing to the popularity of 2AFC, this modification greatly expands the relevance of their nonparametric approach.

Yes/No discrimination. Kaernbach (2001b) and Miller and Ulrich (2001) offer theoretical articles that examine properties of PFs that go from P = 0% to 100% (P = p in Equation 1). In both cases the PF is the cumulative normal (Equation 3). Although these PFs with γ = 0 could be for a yes/no detection task with a zero false alarm rate (not plausible) or a forced choice detection task with an infinite number of alternatives (not plausible either), I suspect that the authors had in mind a yes/no discrimination task (Table 1, col. 5). A typical discrimination task in vision is contrast discrimination, in which the observer responds to whether the presented contrast is greater than or less than a memorized reference. Feedback reinforces the stability of the reference. In a typical discrimination task, the reference is one exemplar from a continuum of stimulus strengths. If the reference is at a special zero point rather than being an element of a smooth continuum, the task is no longer a simple discrimination task. Zero contrast would be an example of a special reference. Klein (1985) discusses several examples which illustrate how a natural zero can complicate the analysis. One might wonder how to connect the PF from the detection regime, in which the reference is zero contrast, to the discrimination regime, in which the reference (pedestal) is at a high contrast. The d′ function to be introduced in Equation 20 does a reasonably good job of fitting data across the full range of pedestal strength, going from detection to discrimination.

The connection of the discrimination PF to the signal detection PF is the same as that for yes/no detection, given in Equation 5: d′(x) = z(x) − z(0), where z is the z score of the probability of a "greater than" judgment. In a discrimination task, the bias z(0) is the z score of the probability of saying "greater" when the reference stimulus is presented. The stimulus strength x that gives z(x) = 0 is the point of subjective equality (x = PSE). If the cumulative normal PF of Equation 3 is used, then the z score is linearly proportional to the stimulus strength: z(x) = (x − PSE)/threshold, where threshold is defined to be the point at which d′ = 1. I will come back to these distinctions between detection and discrimination PFs after presenting more groundwork regarding thresholds, log abscissas, and PF shapes (see Equations 14 and 17).

Definition of Threshold
Threshold is often defined as the stimulus strength that produces a probability correct halfway up the PF. If humans operated according to a high-threshold assumption, this definition of threshold would be stable across different experimental methods. However, as I discussed following Equation 2, high-threshold theory has been discredited.


According to the more successful signal detection theory (Green & Swets, 1966), the d′ at the midpoint of the PF changes according to the number of alternatives in a forced choice method and according to the false alarm rate in a yes/no method. This variability of d′ with method is a good reason not to define threshold as the halfway point of the PF.

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1, for simplicity. This definition applies to both yes/no and m-AFC tasks and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = 15.87%, corresponding to a z score of z = −1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.
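These numbers are easy to verify with the same erf/erfinv conversions used in the Appendix code; the following lines are a small check added here, not part of the original listings.

% Check of the worked example: false alarm rate 15.87% (z = -1), thresholds at d' = 1 and 2.
z = @(P) sqrt(2) * erfinv(2 * P - 1);     % probability -> z-score
prob = @(zz) .5 + .5 * erf(zz / sqrt(2)); % z-score -> probability
zFA = z(0.1587);                          % approximately -1
hitAtDprime1 = prob(zFA + 1);             % d' = 1: hit rate of about 0.50
hitAtDprime2 = prob(zFA + 2);             % d' = 2: hit rate of about 0.8413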

Strasburger's Suggestion on Specifying Slope: A Logarithmic Abscissa
In many of the articles in this special issue, a logarithmic abscissa such as decibels is used. Many shapes of PFs have been used with a log abscissa. Most delightful, but frustrating, is Strasburger's (2001a) paper on the PF maximum slope. He compares the Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′, using a logarithmic abscissa. The present section is the outcome of my struggles with a number of issues raised by Strasburger (2001a) and my attempt to clarify them.

I will typically express stimulus strength x in threshold units,

x_t = x/α,   (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units y as well as in linear units x,

y_t = log_e(x_t) = y − Y,   (9)

where y = log_e(x) is the natural log of the stimulus and Y = log_e(α) is the threshold on the log abscissa.

The slope of the psychometric function P(y_t) with a logarithmic abscissa is

slope_log(y_t) = dP(y_t)/dy_t   (10A)
              = (1 − γ) dp(y_t)/dy_t,   (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log–log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log–log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log–log slope is β for a Weibull PF (Equation 12) and b for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log–log slope using a d′ ordinate.

A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per log_e) are not familiar. Then it dawned on me that there is a simple connection between slope with a log_e abscissa, slope_log = dP(y_t)/dy_t, and slope with a linear abscissa, slope_lin = dP(x_t)/dx_t, namely,

slope_lin = dP(x_t)/dx_t = [dP(y_t)/dy_t] × dy_t/dx_t = slope_log/x_t,   (11)

because dy_t/dx_t = d log_e(x_t)/dx_t = 1/x_t. At threshold, x_t = 1 (y_t = 0), so at that point Strasburger's slope with a logarithmic axis is identical to my familiar slope plotted in threshold units on a linear axis. The simple connection between slope on the log and linear abscissas converted me to being a strong supporter of using a natural log abscissa.

The Weibull and cumulative normal psychometric functions. To provide a background for Strasburger's article, I will discuss three PFs (Weibull, cumulative normal, and d′) as well as their close connections. A common parameterization of the PF is given by the Weibull function,

p_weib(x_t) = 1 − k^(x_t^β),   (12)

where p_weib(x_t), the PF that goes from 0% to 100%, is related to P_weib(x_t), the probability of a correct response, by Equation 1; β is the slope; and k controls the definition of threshold: p_weib(1) = 1 − k is the percent correct at threshold (x_t = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

p_weib(y_t) = 1 − k^[exp(β y_t)].   (13)

Panel a of Figure 1 is a plot of Equation 12 (Weibull function as a function of x on a linear abscissa) for the case β = 2 and k = exp(−1) = 0.368. The choice β = 2 makes the Weibull an upside-down Gaussian. Panel d is the same function, this time plotted as a function of y, corresponding to a natural log abscissa. With this choice of k, the point of maximum slope as a function of y_t is the threshold point (y_t = 0). The point of maximum slope at y_t = 0 is marked with an asterisk in panel d. The same point in panel a, at x_t = 1, is not the point of maximum slope on a linear abscissa, because of Equation 11. When plotted as a d′ function [a z-score transform of P(x_t)] in panel b, the Weibull accelerates below threshold and decelerates above threshold, in agreement with a wide range of experimental data. The acceleration and deceleration are most clearly seen by the slope of d′ in the log–log coordinates of panel e. The log–log d′ slope is plotted in panels c and f. At x_t = 0, the log–log slope is 2, corresponding to our choice of β = 2. The slope falls surprisingly rapidly as stimulus strength increases, and the slope is near 1 at x_t = 2. If we had chosen β = 4 for the Weibull function, then the log–log d′ slope would have gone from 4 at x_t = 0 to near 2 at x_t = 2. Equation 20 will present a d′ function that captures this behavior.

Another PF commonly used is based on the cumulative normal (Equation 3),

p_F(y_t) = Φ(y_t/σ)   (14A)

or

z(y_t) = y_t/σ = (y − Y)/σ,   (14B)

where σ is the standard error of the underlying Gaussian function (its connection to β will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to −∞, needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissa.

Two examples of Strasburger's maximum slope (his Equations 7 and 17) are

β′ = β exp(−1) = 0.368β   (15)

for the Weibull function (Equation 13) and

β′ = 1/(σ√(2π)) = 0.399/σ   (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(−1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (y_t = 0). Note that Equations 12–16 are missing the (1 − γ) factor that is present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.

An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = .5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(x_t = 1) = 75%. At 10% above threshold, P(x_t = 1.1) ≈ 75% + 0.1β′ = 78.99%, which is quite close to the exact value of 78.96%. This example shows how β′, defined with a natural log abscissa y_t, is relevant to a linear abscissa x_t.

Now that the logarithmic abscissa has been introduced, this is a good place to stop and point out that the log abscissa is fine for detection but not for discrimination, since the x = 0 point is often in the middle of the range. However, the cumulative normal PF is especially relevant to discrimination, where the PF goes from 0% to 100%, and would be written as

p_F(x) = P_F(x) = Φ[(x − PSE)/α]   (17A)

or

z_F(x) = (x − PSE)/α,   (17B)

where x = PSE is the point of subjective equality. The parameter α is the threshold, since d′(α) = z_F(α) − z_F(0) = 1. The threshold α is also the standard deviation of the Gaussian probability density function (pdf). The reciprocal of α is the PF slope. I tend to use the letter σ for a unitless standard deviation, as occurs for the logarithmic variable y. I use α for a standard deviation that has the units of the stimulus x (like percent contrast). The comparison of Equation 14B for detection and Equation 17B for discrimination is useful. Although Equation 14 is used in most of the articles in this special issue, which are concerned with detection tasks, the techniques that I will be discussing are also relevant to discrimination tasks, for which Equation 17 is used.

Threshold and the Weibull Function
To illustrate how a d′ = 1 definition of threshold works for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = 15.87%, corresponding to a z score of z_FA = −1, as in Figure 1. The z score at threshold (d′ = 1) is z_Th = z_FA + 1 = 0, corresponding to a probability of P(1) = 50%. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)]/[1 − P(0)] = 0.5943. The Weibull function becomes

P_weib(x_t) = 1 − (1 − 0.1587) × 0.5943^(x_t^β).   (18A)

If I had defined d′ = 2 to be threshold, then z_Th = z_FA + 2 = 1, leading to k = (1 − 0.8413)/(1 − 0.1587) = 0.1886, giving

P_weib(x_t) = 1 − (1 − 0.1587) × 0.1886^(x_t^β).   (18B)

As another example, suppose the false alarm rate in a yes/no task is at 50%, not uncommon in a signal detection experiment with blanks and test stimuli intermixed. Then threshold at d′ = 1 would occur at 84.13%, and the PF would be

P_weib(x_t) = 1 − 0.5 × 0.3173^(x_t^β),   (18C)

with P_weib(0) = 50% and P_weib(1) = 84.13%. This case corresponds to defining d′ on the ROC vertical intercept.

For 2AFC, the connection between d′ and z is z = d′/2^0.5. Thus z_Th = 2^(−0.5) = 0.7071, corresponding to P_weib(1) = 76.02%, leading to k = 0.4795 in Equation 12. These connections will be clarified when an explicit form (Equation 20) is given for the d′ function d′(x_t).


Complications With Strasburger's Advocacy of Maximum Slope
Here I will mention three complications implicated in Strasburger's suggestion of defining slope at the point of maximum slope on a logarithmic abscissa.

1. As can be seen in Equation 11, the slopes on linear and logarithmic abscissas are related by a factor of 1/x_t. Because of this extra factor, the maximum slope occurs at a lower stimulus strength on linear as compared with log axes. This makes the notion of maximum slope less fundamental. However, since we are usually interested in the percent error of threshold estimates, it turns out that a logarithmic abscissa is the most relevant axis, supporting Strasburger's log abscissa definition.

2. The maximum slope is not necessarily at threshold (as Strasburger points out). For the Weibull functions defined in Equation 13 with k = exp(−1) and the cumulative normal function in Equation 14, the maximum slope (on a log axis) does occur at threshold. However, the two points are decoupled in the generalized Weibull function defined in Equation 12. For the Quick version of the Weibull function (Equation 13 with k = 0.5, placing threshold halfway up the PF), the threshold is below the point of maximum slope; the derivative of P(x_t) at threshold is

β′_thresh = dP(x_t)/dx_t = (1 − γ) β log_e(2)/2 = 0.347(1 − γ)β,

which is slightly different from the maximum slope as given by Equation 15. Similarly, when threshold is defined at d′ = 1, the threshold is not at the point of maximum slope. People who fit psychometric functions would probably prefer reporting slopes at the detection threshold rather than at the point of maximum slope.

3. One of the most important considerations in selecting a point at which to measure threshold is the question of how to minimize the bias and standard error of the threshold estimate. The goal of adaptive methods is to place trials at one level. In order to avoid a threshold bias due to an improper estimate of PF slope, the test level should be at the defined threshold. The variance of the threshold estimate when the data are concentrated at a single level (as with adaptive procedures) is given by Gourevitch and Galanter (1967) as

var(Y) = P(1 − P) / {N [dP(y_t)/dy_t]²}.   (19)

If the binomial error factor of P(1 − P)/N were not present, the optimal placement of trials for measuring threshold would be at the point of maximum slope. However, the presence of the P(1 − P) factor shifts the optimal point to a higher level. Wichmann and Hill (2001b) consider how trial placement affects threshold variance for the method of constant stimuli. This topic will be considered in Section II.
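To see how the P(1 − P) factor pushes the best test level above the point of maximum slope, Equation 19 can be evaluated numerically; the sketch below does so for a 2AFC cumulative normal PF with σ = 1 (an assumed example, not a calculation from the paper).

% Evaluate Equation 19 across test levels for a 2AFC cumulative normal PF.
sigma = 1;  gam = 0.5;  N = 100;
yt = -1:0.01:3;                              % test level in natural-log threshold units
P = gam + (1 - gam) * (.5 + .5 * erf(yt / (sigma * sqrt(2))));           % the PF
dPdy = (1 - gam) * exp(-yt.^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi)); % its derivative
varY = P .* (1 - P) ./ (N * dPdy.^2);        % Equation 19
[~, iBest] = min(varY);
bestLevel = yt(iBest);                       % lies above y_t = 0, the point of maximum slope
bestP = P(iBest);                            % probability correct at the variance-minimizing level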

Connecting the Weibull and d′ Functions
Figures 5–7 of Strasburger (2001a) compare the log–log d′ slope, b, with the Weibull PF slope, β. Strasburger's connection of b = 0.88β is problematic, since the d′ version of the Weibull PF does not have a fixed log–log slope. An improved d′ representation of the Weibull function is the topic of the present section. A useful parameterization for d′, which I shall refer to as the Stromeyer–Foley function, was introduced by Stromeyer and Klein (1974) and used extensively by Foley (1994):

d′(x_t) = x_t^b / [a + (1 − a) x_t^(b+w−1)],   (20)

where x_t is the stimulus strength in threshold units, as in Equation 8. The factors with a in the denominator are present so that d′ equals unity at threshold (x_t = 1, or x = α). At low x_t, d′ ≈ x_t^b/a. The exponent b (the log–log slope of d′ at very low contrasts) controls the amount of facilitation near threshold. At high x_t, Equation 20 becomes d′ ≈ x_t^(1−w)/(1 − a). The parameter w is the log–log slope of the test threshold versus pedestal contrast function (the tvc or "dipper" function) at strong pedestals. One must be a bit cautious, however, because in yes/no procedures the tvc slope can be decoupled from w if the signal detection ROC curve has nonunity slope (Stromeyer & Klein, 1974). The parameter a controls the point at which the d′ function begins to saturate. Typical values of these unitless parameters are b = 2, w = 0.5, and a = 0.6 (Yu, Klein, & Levi, 2001). The function in Equation 20 accelerates at low contrast (log–log slope of b) and decelerates at high contrasts (log–log slope of 1 − w), in general agreement with a broad range of data.

For 2AFC z 5 dcentIuml2 so from Equation 3 the connec-tion between dcent and probability correct is

(21)

To establish the connection between the PFs specified inEquations12 and 20ndash21 one must first have the two agreeat threshold For 2AFC with the dcent 5 1 the threshold atP 5 7602 can be enforced by choosing k 5 04795 (seethe discussion following Equation 18C) so that Equa-tions 1 and 12 become

(22)

Modifying k as in Equation 22 leaves the Weibull shapeunchanged and shifts only the definition of threshold Ifb 5 106b w 5 1 039b and a 5 0614 then for allvalues of b the Weibull and dcent functions (Equations 21and 22 vs Equation 12) differ by less than 00004 for allstimulus strengths At very low stimulus strengths b 5b The value b 5 106b is a compromise for getting anoverall optimal fit

Strasburger (2001a) is concerned with the same issueIn his Table 1 he reports dcent logndashlog slopes of [8847

P xtxt( ) = 1 1 5 4795( )

b

P xd x

tt( ) = +

cent( )eacute

eumlecircecirc

5 5erf2

cent( ) =d xx

a a xt

tb

tb w+ 1 + 1( )

var( )( )

YP P

N

dP y

dy

t

t

=( )eacute

eumlecircecirc

12

cent =( )

= =b g b g bthresh

dP x

dx

t

t

e( )log ( )

( ) 12

2347 1

MEASURING THE PSYCHOMETRIC FUNCTION 1433

18379 3131 4421] for b 5 [1 2 35 5] If the dcent slopeis taken to be a measure of the dcent exponent b then theratio bb is [08847 09190 08946 08842] for the fourb values Our value of bb 5 106 differs substantiallyfrom 08 (Pelli 1985) and 088 (Strasburger 2001a) Pelli(1985) and Strasburger (2001a) used dcent functions with a 51 in Equation 20 With a 5 0614 our dcent function (Equa-tion 20) starts saturating near threshold in agreement withexperimental data The saturation lowers the effective valueof b I was very surprised to find that the StromeyerndashFoley dcent function did such a good job in matching theWeibull function across the whole range of b and x Fora long time I had thought a different fit would be neededfor each value of b

For the same Weibull function in a yesno method (falsealarm rate 5 50) the parameters b and w are identicalto the 2AFC case above Only the value of a shifts froma 5 0614 (2AFC) to a 5 054 (yesno) In order to makext 5 1 at dcent 5 1 k becomes 03173 (see Equation 18C)instead of 04795 (2AFC) As with 2AFC the differencebetween the Weibull and dcent function is less than 0004(lt04) for all stimulus strengths and all values of b andfor both a linear and a logarithmic abscissa These valuesmean that for both the yes no and the 2AFC methods thedcent function (Equation 20) corresponding to a Weibullfunction with b 5 2 has a low-contrast logndashlog slope ofb 5 212 and a high-contrast logndashlog slope of 1 w 5078 The high-contrast tvc (test contrast vs pedestal con-trast discrimination function sometimes called the ldquodip-perrdquo function) logndashlog slope would be w 5 022

I also carried out a similar analysis asking what cumu-lative normal s is best matched to the Weibull I founds 5 11720b However the match is poor with errors of4 (rather than lt004 in the preceding analysis) Thepoor fit of the two functions makes the fitting proceduresensitive to the weighting used in the fitting procedure Ifone uses a weighting factor of [P(1 P)N] where P is thePF one will find a quite different value for s than if oneignored the weighting or if one chose the weighting on thebasis of the data rather than the fitting function

II COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING

THE PSYCHOMETRIC FUNCTION

This section inspired by Linschoten et al (2001) is anexamination of various experimental methods for gather-ing data to estimate thresholds Linschoten et al comparethree 2AFC methods for measuring smell and taste thresh-olds a maximum-likelihood adaptive method an upndashdownstaircase method and an ascending method of limits Thespecial aspect of their experiments was that in measuringsmell and taste each 2AFC trial takes between 40 and60 sec (Lew Harvey personal communication) because ofthe long duration needed for the nostrils or mouth to re-cover their sensitivity The purpose of the Linschoten et alpaper is to examine which of the three methods is mostaccurate when the number of trials is low Linschoten et al

conclude that the 2AFC method is a good method for thistask My prior experience with forced choice tasks and lownumber of trials (Manny amp Klein 1985 McKee et al1985) made me wonder whether there might be better meth-ods to use in circumstances in which the number of trialsis limited This topic is considered in Section II

Compare 2AFC to YesNoThe commonly accepted framework for comparing

2AFC and yesno threshold estimates is signal detectiontheory Initially I will make the common assumption thatthe dcent function is a power function (Equation 20 witha 5 1)

(23)

Later in this section the experimentally more accurateform Equation 20 with a 1 will be used The varianceof the threshold estimate will now be calculated for 2AFCand yesno for the simplest case of trials placed at a sin-gle stimulus level y the goal of most adaptive methodsThe PF slope relates ordinate variance to abscissa vari-ance (from Gourevitch amp Galanter 1967)

(24)

where Y 5 y yt is the threshold estimate for a natural logabscissa as given in Equation 9 This formula for var(Y )is called the asymptotic formula in that it becomes exactin the limit of large number of total trials N Wichmannand Hill (2001b) show that for the method of constantstimuli a proper choice of testing levels brings one to theasymptotic region for N 120 (the smallest value thatthey explored) Improper choice of testing levels requiresa larger N in order to get to the asymptotic region Equa-tion 24 is based on having the data at a single level as oc-curs asymptotically in adaptive procedures In addition thedefinition of threshold must be at the point being testedIf levels are tested away from threshold there will be anadditional contribution to the threshold variance (Finney1971 McKee et al 1985) if the slope is a variable param-eter in the fitting procedure Equation 24 reaches its as-ymptotic value more rapidly for adaptive methods with fo-cused placement of trials than with the constant stimulusmethod when slope is a free parameter

Equation 24 needs analytic expressions for the deriva-tive and the variance of dcent The derivative is obtained fromEquation 23

(25)

The variance of dcent for both yesno (dcent 5 zhit 1 zcorrect rejection)and 2AFC (dcent 5 Iuml2zave) is

(26)var var( ) exp var( ) cent( ) = = ( )d z z P2 4 2p

dddy

bdt

cent = cent

var( )var( )

Yd

dddyt

= cent

centaeligegraveccedil

oumloslashdivide

2

cent = = ( )d c bytb

texp

1434 KLEIN

For the yes no case Equation 26 is based on a unityslope ROC curve with the subjectrsquos criterion at the neg-ative diagonal of the ROC curve where the hit rate equalsthe correct rejection rate This would be the most effi-cient operating point The variance for a nonunity slopeROC curve was considered by Klein Stromeyer andGanz (1974) From binomial statistics

(27)

with n 5 N for 2AFC and n 5 N2 for yesno since forthis binary yesno task the number of signal and blanktrials is half the total number of trials

Putting all of these factors together gives

(28)

for both the yesno and the 2AFC cases The only dif-ference between the two cases is the definition of n andthe definition of dcent (dcent2AFC 5 205z and dcentYN 5 2z for asymmetric criterion) When this is expressed in terms ofN (number of trials) and z (z score of probability cor-rect) we get

(29)

for both 2AFC and yesno That is when the percent cor-rects are all equal (P2AFC 5 Phit 5 Pcorrect rejection) thethreshold variance for 2AFC and yesno are equal Theminimum value of var(Y ) is 164(Nb2) occurring at z 5157 (P 5 94) This value of 94 correct is the optimumvalue at which to test for both 2AFC and yesno tasks in-dependent of b

This result that the threshold variance is the same for2AFC and yesno methods seems to disagree with Figure 4of McKee et al (1985) where the standard errors of thethreshold estimates are plotted for 2AFC and yesno asa function of the number of trials The 2AFC standarderror (their Figure 4) is twice that of yesno correspond-ing to a factor of four in variance However the McKeeet al analysis is misleading since all they (we) did for theyesno task was to use Equation 2 to expand the PF to a 0to 100 range thereby doubling the PF slope Thatrescaling is not the way to convert a 2AFC detection taskto the yesno task of the same detectability A signal de-tection methodology is needed for that conversion aswas done in the present analysis

One surprise is that the optimal testing probability isindependent of the PF slope b This finding is a generalproperty of PFs in which the dependence on the slopeparameter has the form xb The 94 level corresponds todcentYN 5 315 and dcent2AFC 5 223 For the commonly usedP 5 75 (z 5 0765 dcentYN 5 135 dcent2AFC 5 0954)var(Y ) 5 408(Nb2) which is about 25 times the opti-mal value obtained by placing the stimuli at 94 Green(1990) had previously pointed out that the 94 and 92levels were the optimal points to test for the cumulative

normal and logit functions respectively because of theminimum threshold estimate variability when one is test-ing at that point on the PF Other PFs can have optimalpoints at slightly lower levels but these are still above thelevels to which adaptive methods are normally directed

It is also useful to compare the 2AFC and yesno at thesame dcent rather than at their optimal points In that case fordcent values fixed at [1 2 3] the ratio of the 2AFC varianceto the yesno variance is [182 136 079] At low dcent thereis an advantage for the 2AFC method and that reverses athigh dcent as is expected from the optimal dcent being higher foryesno than for 2AFC In general one should run the yesno method at dcent levels above 20 (84 correct in the blankand signal intervals corresponds to dcent 5 2)

The equality of the optimal var(Y ) for 2AFC and yesno methods is not only surprising it seems paradoxicalmdashas can be seen by the following gedankenexperiment Sup-pose subjects close their eyes on the first interval of the2AFC trial and base their judgments on just the secondinterval as if they were doing a yesno task since blanksand stimuli are randomly intermixed Instead of sayingldquoyesrdquo or ldquonordquo they would respond ldquoInterval 2rdquo or ldquoInter-val 1rdquo respectively so that the 2AFC task would become ayesno task The paradox is that one would expect the vari-ance to be worse than for the eyes open 2AFC case sinceone is ignoring information However Equation29 showsthat the 2AFC and yesno methods have identical optimalvariance I suspect that the resolution of this paradox is thatin the yesno method with a stable criterion the optimalvariance occurs at a higher dcent value (315) than the 2AFCvalue (223) The dcent power law function that I chose doesnot saturate at large dcent levels giving a benefit to the yesno method

In order to check whether the paradox still holds witha more realistic PF (a dcent function that saturates at largedcent ) I now compare the 2AFC and yesno variance usinga Weibull function with a slope b In order to connect2AFC and yesno I need to use signal detection theoryand the StromeyerndashFoley dcent function (Equation 20) withthe ldquoWeibull parametersrdquo b 5 106b a 5 0614 and w 51 39b Following Equation 22 I pointed out that withthese parameter values the difference between the 2AFCWeibull function and the 2AFC dcent function (converted toa probability ordinate) is less than 0004 After the de-rivative Equation 25 is appropriately modified we findthat the optimal 2AFC variance is 342(Nb2) For the yesno case the optimal variance is 402(Nb2) This result re-solves the paradox The optimal 2AFC variance is about18 lower than the yesno variance so closing onersquos eyesin one of the intervals does indeed hurt

We shouldnrsquot forget that if one counts stimulus presen-tations Np rather than trials N the optimal 2AFC variancebecomes 684(Npb2) as opposed to 402(Npb2) for yesnoThus for the Linschoten experiments with each stimuluspresentation being slow the yesno methodology is moreaccurate for a given number of stimulus presentationseven at low dcent values Kaernbach (1990) compares 2AFC

var( ) exp( )

( )Y z

P P

N bz= ( )2

122

p

var( ) exp( )

( )Y z

P P

n bd= ( ) cent

412

2p

var( )( )

PP P

n= 1

MEASURING THE PSYCHOMETRIC FUNCTION 1435

and yesno adaptive methods with simulations and ac-tual experiments His abscissa is number of presenta-tions and his results are compatible with our findings

The objective yesno method that I have been discussingis inefficient since 50 of the trials are blanks Greaterefficiency is possible if several points on the PF are mea-sured (method of constant stimuli) with the number of blanktrials the same as at the other levels For example if fivelevels of stimuli were measured including blanks then theblanks would be only 20 of the total rather than 50 Forthe 2AFC paradigm the subject would still have to look at50 of the presentations being blanks The yesno eff i-ciency can be further improved by having the subjectrespond to the multiple stimuli with multiple ratings ratherthan a binary yesno response This rating scale methodof constant stimuli is my favorite method and I have beenusing it for 20 years (Klein amp Stromeyer 1980) The mul-tiple ratings allow one to measure the psychometric func-tion shape for dcents as large as 4 (well into the dipper regionwhere stimuli are slightly above threshold) It also allowsone to measure the ROC slope (presence of multiplicativenoise) Simulations of its benefits will be presented inKlein (2002)

Insight into how the number of responses affects thresh-old variance can be obtained by using the cumulative nor-mal PF (Equation 17) considered by Kaernbach (2001b)and Miller and Ulrich (2001) with g 5 0 Typically this isthe case of yesno discrimination but it can also be 2AFCdiscrimination with an ordinate going from 0 to 100as I have discussed in connection with Figure 3 For a cu-mulative normal PF with a binary response and for a sin-gle stimulus level being tested the threshold variance is(Equation 19) as follows

(30)

from Equation 27 and forthcoming Equations 40 and 41The optimal level to test is at P 5 05 so the optimal vari-ance becomes

(31)

If instead of a binary response the observer gives an ana-log response (many many rating categories) then the vari-ance is the familiar connection between standard deviationand standard error

(32)

With four response categories the variance is 119s2Nwhich is closer to the analog than to the binary case Ifstimuli with multiple strengths are intermixed (methodof constant stimuli) then the benefit of going from a bi-nary response to multiple responses is greater than that in-dicated in Equations 31ndash32 (Klein 2002)

A brief list of other problems with the 2AFC method ap-plied to detection comprises the following

1 2AFC requires the subject to memorize and thencompare the results of two subjective responses It is cog-nitively simpler to report each response as it arrives Sub-jects without much experience can easily make mistakes(lapses) simply becoming confused about the order of thestimuli Klein (1985) discusses memory load issues rele-vant to 2AFC

2 The 2AFC methodology makes various types ofanalysis and modeling more difficult because it requiresthat one average over all criteria The response variable in2AFC is the Interval 2 activation minus the Interval 1 ac-tivation (see the abscissa in Figure 2) This variable goesfrom minus infinity to plus infinity whereas the yesnovariable goes from zero to infinity It therefore seems moredifficult to model the effects of probability summation anduncertainty for 2AFC than for yesno

3 Models that relate psychophysical performance to un-derlying processes require information about how the noisevaries with signal strength (multiplicative noise) The rat-ing scale method of constant stimuli not only measures dcentas a function of contrast it also is able to provide an esti-mate of how the variance of the signal distribution (theROC slope) increases with contrast The 2AFC methodlacks that information

4 Both the yesno and 2AFC methods are supposed toeliminate the effects of bias Earlier I pointed out that in myrecent 2AFC discrimination experiments when the stim-ulus is near threshold I find I have a strong bias in favor ofresponding ldquoStimulus 2rdquo Until I learned to compensatefor this bias (not typically done by naiumlve subjects) my dcentwas underestimated As I have discussed it is possible tocompensate for this bias but that is rarely done

Improving the Brownian (Up ndashDown) StaircaseAdaptive methods with a goal of placing trials at an

optimal point on the PF are efficient Leek (2001) pro-vides a nice overview of these methods There are twogeneral categories of adaptive methods those based onmaximum (or mean) likelihood and those based on sim-pler staircase rules The present section is concernedwith the older staircase methods which I will callBrownian methods I used the name ldquoBrownianrdquo becauseof the upndashdown staircasersquos similarity to one-dimensionalBrownian motion in which the direction of each step israndom The familiar upndashdown staircase is Brownianbecause at each trial the decision about whether to go upor down is decided by a random factor governed by prob-ability P(x) I want to make this connection to Brownianmotion because I suspect that there are results in thephysics literature that are relevant to our psychophysicalstaircases I hope these connections get made

A Brownian staircase has a minimal memory of the runhistory it needs to remember the previous step (includingdirection) and the previous number of reversals or numberof trials (used for when to stop and for when to decreasestep size) A typical Brownian rule is to decrease signalstrength by one step (like 1 dB) after a correct responseand to increase it three steps after an incorrect response (a

var( thresh) = s 2

N

var( thresh) = timesp s2

2

N

var(var( )

( )exp thresh) =aeligegraveccedil

oumloslashdivide

= ( )P

N dPdy

P P zN2

22

2 1p s

1436 KLEIN

one downndashthree up rule) I was pleased to learn recentlythat his rule works nicely for an objective yesno task(Kaernbach 1990) as well as 2AFC as discussed earlier

In recent years likelihood methods (Pentland 1980Watson amp Pelli 1983) have become popular and havesomewhat displaced Brownian methods There is a wide-spread feeling that likelihood methods have substantiallysmaller threshold variance than do staircases in whichthresholds are obtained by simply averaging the levelstested Given the importance of this issue for informingresearchers about how best to measure thresholds it issurprising that there have been so few simulations com-paring staircases and optimal methods Pentlandrsquos (1980)results showing very large threshold variance for a stair-case method are so dramatically different from the re-sults of my simulations that I am skeptical The best pre-vious simulations of which I am aware are those of Green(1990) who compared the threshold variance from sev-eral 2AFC staircase methods with the ideal variance(Equation 24) He found that as long as the staircase ruleplaced trials above 80 correct the ratio of observed toideal threshold variance was about 14 This value is thesquare of the inverse of Greenrsquos finding that the ratio ofthe expected to the observed sigma is about 85

In my own simulations (Klein 2002) I found that aslong as the final step size was less than 02s the thresh-old variance from simply averaging over levels was closeto the optimal value Thus the staircase threshold varianceis about the same as the variance found by other optimalmethods One especially interesting finding was that thecommon method of averaging an even number of rever-sal points gave a variance that was about 10 higher thanaveraging over all levels This result made me wonderhow the common practice of averaging reversals ever gotstarted Since Green (1990) averaged over just the reversalpoints I suspect that his results are an overestimate of thestaircase variance In addition since Green specifies stepsize in decibels it is difficult for me to determine whetherhis final step size was small enough Since the thresholdvariance is inversely related to slope it is important to re-late step size to slope rather than to decibels The most nat-ural reference unit is s the standard deviation associatedwith the cumulative normal In summary I suggest that thisgeneral feeling of distrust of the simple Brownian staircaseis misguided and that simple staircases are often as goodas likelihood methods In the section Summary of Advan-tages of Brownian Staircase I point out several advantagesof the simple staircase over the likelihood methods

A qualification should be made to the claim above thataveraging levels of a simple staircase is as good as a fancylikelihood method If the staircase method uses a too smallstep size at the beginning and if the starting point is farfrom threshold then the simple staircase can be inefficientin reaching the threshold region However at the end ofthat run the astute experimenter should notice the largediscrepancy between the starting point and the thresholdand the run should be redone with an improved startingpoint or with a larger initial step size that will be reducedas the staircase proceeds

Increase number of 2AFC response categories In-dependent of the type of adaptive method used the 2AFCtask has a flaw whereby a series of lucky guesses at lowstimulus levels can erroneously drive adaptive methods tolevels that are too low Some adaptive methods with a smallnumber of trials are unable to fully recover from a string oflucky guesses early in the run One solution is to allow thesubject to have more than two responses For reasons thatI have never understood people like to have only two re-sponse categories in a 2AFC task Just as I am reluctant totake part in 2AFC experiments I am even more reluctantto take part in two response experiments The reason issimple I always feel that I have within me more than onebit of information after looking at a stimulus I usually haveat least four or more subjective states (2 bits) for a yesnotask HN LN LY HY where H and L are high and lowconfidence and N and Y are did not and did see it For a2AFC my internal states are H1 L1 L2 H2 for high andlow confidence on Intervals 1 and 2 Kaernbach (2001a) inthis special issue does an excellent job of justifying thelow-confidence response category He provides solid ar-guments simulations and experimental data on how alow-confidence category can minimize the ldquolucky guessrdquoproblem

Kaernbach (2001a) groups the L1 and L2 categoriestogether into a ldquodonrsquot knowrdquo category and argues per-suasively that the inclusion of this third response can in-crease the precision and accuracy of threshold estimateswhile also leaving the subject a happier person He callsit the ldquounforced choicerdquo method Most of his article con-cerns the understandable fear of introducing a responsebias exactly what 2AFC was invented to avoid He has de-veloped an intelligent rule for what stimulus level shouldfollow a ldquodonrsquot knowrdquo response Kaernbach shows withboth simulations and real data that with his method thispotential response bias problem has only a second-ordereffect on threshold estimates (similar to the 2AFC inter-val bias discussed around Equation 7) and its advantagesoutweigh the disadvantages I urge anyone using the 2AFCmethod to read it I conjecture that Kaernbachrsquos method isespecially useful in trimming the lower tail of the distri-bution of threshold estimates and also in reducing the in-terval bias of 2AFC tasks since the low confidence ratingwould tend to be used for trials on which there is the mostuncertainty

Although Kaernbach has managed to convince me ofthe usefulness of his three-response method he might havetrouble convincing others because of the response biasquestion I should like to offer a solution that might quellthe response bias fears Rather than using 2AFC with threeresponses one can increase the number of responses tofourmdashH1 L1 L2 H2mdashas discussed two paragraphsabove The L rating can be used when subjects feel theyare guessing

To illustrate how this staircase would work considerthe case in which we would like the 2AFC staircase to con-verge to about 84 correct A 5 upndash1 down staircase(5 levels up for a wrong response and 1 level down for acorrect response) leads to an average probability correct

MEASURING THE PSYCHOMETRIC FUNCTION 1437

of 56 5 8333 If one had zero information about thetrial and guessed then one would be correct half the timeand the staircase would shift an average of (5 1)2 512 levels Thus in his scheme with a single ldquodonrsquot knowrdquoresponse Kaernbach would have the staircase move up2 levels following a ldquodonrsquot knowrdquo response One mightworry that continued use of the ldquodonrsquot knowrdquo responsewould continuously increase the level so that one wouldget an upward bias But Kaernbach points out that theldquodonrsquot knowrdquo response replaces incorrect responses to agreater extent than it replaces correct responses so thatthe 12 level shift is actually a more negative shift than the15 shift associated with a wrong response For my sug-gestion of four responses the step shifts would be ndash1 1113 and 15 levels for responses HC LC LI and HIwhere H and L stand for high and low confidence and Cand I stand for correct and incorrect A slightly more con-servative set of responses would be ndash1 0 14 15 inwhich case the low-confidence correct judgment leaves thelevel unchanged In all these examples including Kaern-bachrsquos three-response method the low-confidence re-sponse has an average level shift of 12 when one is guess-ing The most important feature of the low confidencerating is that when one is below threshold giving a lowconfidence rating will prevent one from long strings oflucky guesses that bring one to an inappropriately lowlevel This is the situation that can be difficult to overcomeif it occurs in the early phase of a staircase when the stepsize can be large The use of the confidence rating wouldbe especially important for short runs where the extra bitof information from the subject is useful for minimizingerroneous estimates of threshold

Another reason not to be fearful of the multiple-response2AFC method is that after the run is finished one can es-timate threshold by using a maximum likelihood proce-dure rather than by averaging the run locations Standardlikelihood analysis would have to be modified for Kaern-bachrsquos three-response method in order to deal with theldquodonrsquot knowrdquo category For the suggested four-responsemethod one could simply ignore the confidence ratings forthe likelihood analysis My simulations discussed earlierin this section and Greenrsquos (1990) reveal that averagingover the levels (excluding a number of initial levels toavoid the starting point bias) has about as good a preci-sion and accuracy as the best of the likelihood methodsIf the threshold estimates from the likelihood method andfrom the averaging levels method disagree that run couldbe thrown out

Summary of advantages of Brownian staircaseSince there is no substantial variance advantage to thefancier likelihood methods one can pay more attention toa number of advantages associated with simple staircases

1 At the beginning of a run likelihood methods mayhave too little inertia (the sequential stimulus levels jumparound easily) unless a ldquostiff rdquo prior is carefully chosenBy inertia I refer to how difficult it is for the next level todeviate from the present level At the beginning of a runone would like to familiarize the subject with the stimu-

lus by presenting a series of stimuli that gradually increasein difficulty This natural feature of the Brownian stair-case is usually missing in QUEST-like methods Theslightly inefficient first few trials of a simple staircasemay actually be a benefit since it gives the subject somepractice with slightly visible stimuli

2 Toward the end of a maximum likelihood run one canhave the opposite problem of excessive inertia wherebyit is difficult to have substantial shifts of test contrastThis can be a problem if nonstationary effects are presentFor example thresholds can increase in mid-run owingto fatigue or to adaptation if a suprathreshold pedestal ormask is present In an old-fashioned Brownian staircasewith moderate inertia a quick look at the staircase trackshows the change of threshold in mid-run That observa-tion can provide an important clue to possible problemswith the current procedures A method that reveals non-stationarity is especially important for difficult subjectssuch as infants or patients in whom fatigue or inatten-tion can be an important factor

3 Several researchers have told me that the simplicity ofthe staircase rules is itself an advantage It makes writingprograms easier It is nicer for teaching students It alsosimplifies the uncertain business of guessing likelihoodpriors and communicating the methodology when writ-ing articles

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learnmore about the PF than just the single threshold valueEstimating the PF slope is the first step in this directionTo estimate slope one must test the PF at more than onepoint Methods are available to do this (Kaernbach 2001bKontsevich amp Tyler 1999) and one could even adapt sim-ple Brownian staircase procedures to give better slope es-timates with minimum harm to threshold estimates (seethe comments in Strasburger 2001b)

Probably the best method for getting both thresholds andslopes is the Y (Psi) method of Kontsevich and Tyler(1999) The Y method is not totally novel it borrows tech-niques from many previous researchersmdashas Kontsevichand Tyler point out in a delightful paragraph that both clar-ifies their method and states its ancestry This paragraphsummarizes what is probably the most accurate and preciseapproach to estimating thresholds and slope so I reproduceit here

To summarize the Y method is a combination of solutionsknown from the literature The method updates the poste-rior probability distribution across the sampled space ofthe psychometric functions based on Bayesrsquo rule (Hall1968 Watson amp Pelli 1983) The space of the psychome-tric functions is two-dimensional (King-Smith amp Rose1997 Watson amp Pelli 1983) Evaluation of the psycho-metric function is based on computing the mean of theposterior probability distribution (Emerson 1986 King-Smith et al 1994) The termination rule is based on thenumber of trials as the most practical option (King-Smith

1438 KLEIN

et al 1994 Watson amp Pelli 1983) The placement of eachnew trial is based on one-step ahead minimum search(King-Smith 1984) of the expected entropy cost function(Pelli 1987)

A simplified rough description of the Y trial placementfor 2AFC runs is that initially trials are placed at about the90 correct level to establish a solid threshold Oncethreshold has been roughly established trials are placed atabout the 90 and the 70 levels to pin down the slopeThe precise levels depend on the assumed PF shape

Before one starts measuring slopes however one mustfirst be convinced that knowledge of slope can be impor-tant It is so for several reasons

1 Parametric fitting procedures either require knowl-edge of slope b or must have data that adequately con-strain the slope estimate Part of the discussion that fol-lows will be on how to obtain reliable threshold estimateswhen slope is not well known A tight knowledge of slopecan minimize the error of the threshold estimate

2 Strasburger (2001a 2001b) gives good argumentsfor the desirability of knowing the shape of the PF He isinterested in differences between foveal and peripheralvisual processing and he uses the PF slope as an indica-tion of differences

3 Slopes are different for different stimuli and thiscan lead to misleading results if slope is ignored It mayhelp to describe my experience with the Modelfest pro-ject (Carney et al 2000) to make this issue concreteThis continuing project involves a dozen laboratoriesaround the world We have measured detection thresh-olds for a wide variety of visual stimuli The data areused for developing an improved model of spatial visionSome of the stimuli were simple easily memorized lowspatial frequency patterns that tend to have shallowerslopes because a stimulus-known-exactly template canbe used efficiently and with minimal uncertainty On theother hand tiny Modelfest stimuli can have a good dealof spatial uncertainty and are therefore expected to pro-duce PFs with steeper slopes Because of the slope changethe pattern of thresholds will depend on the level atwhich threshold is measured When this rich dataset isanalyzed with f ilter models that make estimates formechanism bandwidths and mechanism pooling thesedifferent slope-dependent threshold patterns will lead todifferent estimates for the model parameters Several ofus in the Modelfest data gathering group argued that weshould use a data-gathering methodology that wouldallow estimation of PF slope of each of the 45 stimuliHowever I must admit that when I was a subject my pa-tience wore thin and I too chose the quicker runs Al-though I fully understand the attraction of short runs fo-cused on thresholds and ignoring slope I still think thereare good reasons for spreading trials out For one thinginterspersing trials that are slightly visible helps the sub-ject maintain memory of the test pattern

4 Slopes can directly resolve differences between mod-els In my own research for example my colleagues and

I are able to use the PF slope to distinguish long-range fa-cilitation that produces a low-level excitation or noise re-duction (typical with a cross-orientation surround) from ahigher level uncertainty reduction (typical with a same-orientation surround)

Throwing-Out Rules Goodness-of-FitThere are occasions when a staircase goes sour Most

often this happens when the subjectrsquos sensitivity changesin the middle of a run possibly because of fatigue It istherefore good to have a well-specified rule that allowsnonstationary runs to be thrown out The throwing-out rulecould be based on whether the PF slope estimate is toolow or too high Alternatively a standard approach is tolook at the chi-square goodness-of-f it statistic whichwould involve an assumption about the shape of the un-derlying PF I have a commentary in Section III regard-ing biases in the chi-square metric that were found byWichmann and Hill (2001a) Biases make it difficult touse standard chi-square tables The trouble with this ap-proach of basing the goodness-of-fit just on the cumulateddata is that it ignores the sequential history of the run TheBrownian upndashdown staircase makes the sequential historyvividly visible with a plot of sequentially tested levelsWhen fatigue or adaptation sets in one sees the tested lev-els shift from one stable point to another Leek Hanna andMarshall (1991) developed a method of assessing the non-stationarity of a psychometric function over the course ofits measurement using two interleaved tracks EarlierHall (1983) also suggested a technique to address the sameproblem I encourage further research into the develop-ment of a non-stationarity statistic that can be used to dis-card bad runs This statistic would take the sequential his-tory into account as well as the PF shape and the chi-square

The development of a ldquothrowing-outrdquo rule is the ex-treme case of the notion of weighted averaging One im-portant reason for developing good estimators for good-ness-of-fit and good estimators for threshold variance isto be able to optimally combine thresholds from repeatedexperiments I shall return to this issue in connectionwith the Wichmann and Hill (2001a) analysis of bias ingoodness-of-fit metrics

Final Thought The Method Should Depend on the Situation

Also important is that the method chosen should be ap-propriate for the task For example suppose one is usingwell-experienced observers for testing and retesting sim-ilar conditions in which one is looking for small changesin threshold or one is interested in the PF shape In suchcases one has a good estimate of the expected thresholdand the signal detection method of constant stimuli withblanks and with ratings might be best On the other handfor clinical testing in which one is highly uncertain aboutan individualrsquos threshold and criterion stability a 2AFClikelihood or staircase method (with a confidence rating)might be best

MEASURING THE PSYCHOMETRIC FUNCTION 1439

III DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPEPLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with whatto do with the data after they have been collected Wehave here the standard Bayesian situation Given the PFit is easy to generate samples of data with confidence in-tervals But we have the inverse situation in which we aregiven the data and need to estimate properties of the PFand get confidence limits for those estimates The pair ofarticles in this special issue by Wichmann and Hill (2001a2001b) as well as the articles by King-Smith et al (1994)Treutwein (1995) and Treutwein and Strasburger (1999)do an outstanding job of introducing the issues and providemany important insights For example Emerson (1986)and King-Smith et al emphasize the advantages of meanlikelihood over maximum likelihood The speed of mod-ern computers makes it just as easy to calculate meanlikelihood as it is to calculate maximum likelihood As an-other example Treutwein and Strasburger (1999) demon-strate the power of using Bayesian priors for estimatingparameters by showing how one can estimate all four pa-rameters of Equation 1A in fewer than 100 trials This im-portant finding deserves further examination

In this section I will focus on four items First I will dis-cuss the relative merits of parametric versus nonparamet-ric methods for estimating the PF properties The paperby Miller and Ulrich (2001) provides a detailed analysis ofa nonparametric method for obtaining all the moments ofthe PF I will discuss some advantages and limitations oftheir method Second is the question of goodness-of-fitWichmann and Hill (2001a) show that when the numberof trials is small the goodness-of-fit can be very biasedI will examine the cause of that bias Third is the issue dis-cussed by Wichmann and Hill (2001a) that of how lapseseffect slope estimates This topic is relevant to Strasburg-errsquos (2001b) data and analysis Finally there is the possiblycontroversial topic of an upward slope bias in adaptivemethods that is relevant to the papers of Kaernbach (2001b)and Strasburger (2001b)

Parametric Versus Nonparametric FitsThe main split between methods for estimating pa-

rameters of the psychometric function is the distinctionbetween parametric and nonparametric methods In aparametric method one uses a PF that involves severalparameters and uses the data to estimate the parametersFor example one might know the shape of the PF and es-timate just the threshold parameter McKee et al (1985)show how slope uncertainty affects threshold uncertaintyin probit analysis The two papers by Wichmann and Hill(2001a 2001b) are an excellent introduction to many is-sues in parametric fitting An example of a nonparamet-ric method for estimating threshold would be to averagethe levels of a Brownian staircase after excluding the firstfour or so reversals in order to avoid the starting level biasBefore I served as guest editor of this special symposium

issue I believed that if one had prior knowledge of the exactPF shape a parametric method for estimating threshold wasprobably better than nonparametric methods After read-ing the articles presented here as well as carrying out myrecent simulations I no longer believe this I think thatin the proper situation nonparametric methods are as goodas parametric methods Furthermore there are situationsin which the shape of the PF is unknown and nonparamet-ric methods can be better

Miller and Ulrichrsquos Article on the Nonparametric SpearmanndashKaumlrber Method of Analysis

The standard way to extract information from psycho-metric data is to assume a functional shape such as a Wei-bull cumulative normal logit dcent and then vary the pa-rameters until likelihood is maximized or chi-square isminimized Miller and Ulrich (2001) propose using theSpearmanndashKaumlrber (SK) nonparametric method whichdoes not require a functional shape They start with a prob-ability density function defined by them as the derivativeof the PF

(33)

where s(i) is the stimulus intensity at the ith level andP(i) is the probability correct at the that level The notationpdf(i) is meant to indicate that it is the pdf for the intervalbetween i ndash 1 and i Miller and Ulrich give the r th momentof this distribution as their Equation 3

(34)

The next four equations are my attempt to clarify their ap-proach I do this because when applicable it is a finemethod that deserves more attention Equation 34 prob-ably came from the mean value theorem of calculus

(35)

where f (s) 5 sr11 and f cent(smid) 5 (r 1 1)smidr is the deriv-

ative of f with respect to s somewhere within the intervalbetween s(i 1) and s(i) For a slowly varying functionsmid would be near the midpoint Given Equation35 Equa-tion 34 becomes

(36)

where Ds 5 s(i) s(i 1) Equation 36 is what one wouldmean by the rth moment of the distribution The case r 50 is important since it gives the area under the pdf

(37)

by using the definition in Equation 33 The summation inEquation 37 gives the area under the pdf For it to be aproper pdf its area must be unity which means that theprobability at the highest level P(imax) must be unity andthe probability at the lowest level Pmin must be zero asMiller and Ulrich (2001) point out

m0 = =aringpdf( ) ( ) ( ) max mini s P i P ii

D

mrr

i

i s s= aring pdf mid( ) D

f s i f s i f s s i s i( ) ( ) ( ) ( ) ( ) [ ] [ ] = cent [ ]1 1mid

mrr r

ri s i s i=

+ [ ]aring + +11

11 1pdf( ) ( ) ( )

pdf( )( ) ( )

( ) ( )i

P i P i

s i s i= 1

1

1440 KLEIN

The next simplest SK moment is for r 5 1

(38)

from Equation 33 This first moment provides an esti-mate of the centroid of the distribution corresponding tothe center of the psychometric function It could be usedas an estimate of threshold for detection or PSE for dis-crimination An alternate estimator would be the medianof the pdf corresponding to the 50 point of the PF (theproper definition of PSE for discrimination)

Miller and Ulrich (2001) use Monte Carlo simulationsto claim that the SK method does a better job of extract-ing properties of the psychometric function than do para-metric methods One must be a bit careful here since intheir parametric fitting they generate the data by usingone PF and then fit it with other PFs But still it is a sur-prising claim if this simple method is so good why havepeople not been using it for years There are two possibleanswers to this question One is that people never thoughtof using it before Miller and Ulrich The other possibil-ity is that it has problems I have concluded that the an-swer is a combination of both factors I think that there isan important place for the SK approach but its limitationsmust be understood That is my next topic

The most important limitation of the SK nonparamet-ric method is that it has only been shown to work well formethod of constant stimulus data in which the PF goesfrom 0 to 1 I question whether it can be applied with thesame confidence to 2AFC and to adaptive procedures withunequal numbers of trials at each level The reason for thislimitation is that the SK method does not offer a simpleway to weight the different samples Let us consider twosituations with unequal weighting adaptive methods and2AFC detection tasks

Adaptive methods place trials near threshold but withvery uneven distribution of number of trials at differentlevels The SK method gives equal weighting to sparselypopulated levels that can be highly variable thus con-tributing to a greater variance of parameter estimates Letus illustrate this point for an extreme case with five levelss 5 [1 2 3 4 5] Suppose the adaptive method placed [11 1 98 1] trials at the five levels and the number correctwere [0 1 1 49 1] giving probabilities [0 1 1 5 1] andpdf 5 [1 0 5 5] The following PEST-like (Taylor ampCreelman 1967) rules could lead to this unusual dataMove down a level if the presently tested level has greaterthan P1 correct move up three levels if the presentlytested level has less than P2 correct P1 can be anynumber less than 33 and P2 can be any number greaterthan 67 The example data would be produced by start-ing at Level 5 and getting four in a row correct followedby an error followed by a wide set possible of alternat-ing responses for which the PEST rules would leave thefourth level as the level to be tested The SK estimate of the

center of the PF is given by Equation 38 to be micro1 5 200Since a pdf with a negative probability is distasteful tosome one can monotonize the PF according to the proce-dure of Miller and Ulrich (which does include the numberof trials at each level) The new probabilities are [0 5151 51 1] with pdf 5 [51 0 0 49] The new estimate ofthreshold is micro1 5 297 However given the data with 4998correct at Level 4 an estimate that includes a weightingaccording to the number of trials at each level would placethe centroid of the distribution very close to micro1 5 4 Thuswe see that because the SK procedure ignores the numberof trials at each level it produces erroneous estimates ofthe PF properties This example with shifting thresholdswas chosen to dramatize the problems of unequal weight-ing and the effect of monotonizing

A similar problem can occur even with the method ofconstant stimuli with equal number of trials at each levelbut with unequal binomial weighting as occurs in 2AFCdata The unequal weighting occurs because of the 50lower asymptote The binomial statistics error bar of dataat 55 correct is much larger than for data at 95 cor-rect Parametric procedures such as probit analysis giveless weighting to the lower data This is why the most ef-ficient placement of trials in 2AFC is above 90 correctas I have discussed in Section II Thus it is expected thatfor 2AFC the SK method would give threshold estimatesless precise than the estimates given by a parametricmethod that includes weighting

I originally thought that the SK method would fail on alltypes of 2AFC data However in the discussion followingEquation 7 I pointed out that for 2AFC discriminationtasks there is a way of extending the data to the 0ndash100range that produces weightings similar to those for a yesnotask Rather than use the standard Equation 2 to deal withthe 50 lower asymptote one should use the method dis-cussed in connection with Figure 3 in which one can usefor the ordinate the percent of correct responses in Inter-val 2 and use the signal strength in Interval 2 minus thatin Interval 1 for the abscissa That procedure produces aPF that goes from 0 to 100 thereby making it suit-able for SK analysis while also removing a possible bias

Now let us examine the SK method as applied to datafor which it is expected to work such as discriminationPFs that go from 0 to 100 These are tasks in which thelocation of the psychometric function is the PSE and theslope is the threshold

A potentially important claim by Miller and Ulrich(2001) is that the SK method performs better than probitanalysis This is a surprising claim since their data aregenerated by a probit PF one would expect probit analy-sis to do well especially since probit analysis does espe-cially well with discrimination PFs that go from 0 to 100To help clarify this issue Miller and I did a number ofsimulations We focused on the case of N 5 40 with 8 tri-als at each of 5 levels The cumulative normal PF (Equa-tion 14) was used with equally spaced z scores and with thelowest and highest probabilities being 1 and 99 Theprobabilities at the sample points are P(i) 5 1 1222

m1

2 21

2

11

2

=

= [ ] +

aring

aring

pdf( )( ) ( )

( ) ( )( ) ( )

is i s i

P i P is i s i

MEASURING THE PSYCHOMETRIC FUNCTION 1441

50 8778 and 99 corresponding to z scores ofz(i) 5 ndash2328 1164 0 1164 and 2328 Nonmonot-onicity is eliminated as Miller and Ulrich suggest (seethe Appendix for Matlab Code 4 for monotonizing) I didMonte Carlo simulations to obtain the standard deviationof the location of the PF micro1 given by Equation38 I foundthat both the SK method and the parametric probit method(assuming a fixed unity slope) gave a standard deviationof between 028 and 029 Miller and Ulrichrsquos result re-ported in their Table 17 gives a standard error of 0071However their Gaussian had a standard deviation of 025whereas my z-score units correspond to a standard devi-ation of unity Thus their standard error must be multipliedby 4 giving a standard error of 0284 in excellent agree-ment with my value So at least in this instance the SKmethod did not do better than the parametric method (withslope fixed) and the surprise mentioned at the beginningof this paragraph did not materialize I did not examineother examples where nonparametric methods might dobetter than the matched parametric method However thesimulations of McKee et al (1985) give some insight intowhy probit analysis with floating slope can do poorly evenon data generated from probit PFs McKee et al showthat when the number of trials is low and if stimulus lev-els near both 0 and 100 are not tested the slope can bevery flat and the predicted threshold can be quite distantIf care is not taken to place a bound on these probit esti-mates one would expect to find deviant threshold esti-mates The SK method on the other hand has built-inbounds for threshold estimates so outliers are avoidedThese bounds could explain why SK can do better thanprobit When the PF slope is uncertain nonparametricmethods might work better than parametric methods adecided advantage for the SK method

It is instructive to also do an analytic calculation of thevariance based on an analysis similar to what was doneearlier If only the ith level of the five had been testedthe variance of the PF location (the PSE in a discrimina-tion task) would have been as follows (Gourevitch ampGalanter 1967)

(39)

where var(P) is given by Equation 27 and PF slope is

(40)

where for the cumulative normal dzi dxi 5 1s where sis the standard deviation of the pdf and

(41)

from Equation 3A By combining these equations one gets

(42)

Note that if all trials were placed at the optimal stimuluslocation of z 5 0 (P 5 5) Equation 42 would becomes2p 2Ni as discussed earlier When all five levels arecombined the total variance is smaller than each individ-ual variance as given by the variance summation formula

(43)

Finally upon plugging in the five values of zi rangingfrom ndash2328 to 12328 and Pi ranging from 1 to 99the standard error of the PF location is

(44)

This value is precisely the value obtained in our MonteCarlo simulations of the nonparametric SK method andalso by the slope fixed parametric probit analysis Thistriple confirmation shows that the SK method is excel-lent in the realm where it applies (equal number of trialsat levels that span 0 to 100) It also confirms our sim-ple analytic formula

The data presented in Miller and Ulrichrsquos Table 17 giveus a fine opportunity to ask about the efficiency of themethod of constant stimuli that they use For the N 5 40example discussed here the variance is

(45)

This value is more than twice the variance of p (2 Ntot)that is obtained for the same PF using adaptive methodsincluding the simple upndashdown staircase that place trialsnear the optimal point

The large variance for the Miller and Ulrich stimulusplacement is not surprising given that of the five stimu-lus levels the two at P 5 1 and 99 are quite inefficientThe inefficient placement of trials is a common problemfor the method of constant stimuli as opposed to adaptivemethods However as I mentioned earlier the method ofconstant stimuli combined with multiple response ratingsin a signal detection framework may be the best of allmethods for detection studies for revealing properties of thesystem near threshold while maintaining a high efficiency

A curious question comes up about the optimal placementof trials for measuring the location of the psychometric func-tion With the original function the optimal placement is atthe steepest part of the function (at P 5 0) However withthe SK method when one is determining the location ofthe pdf one might think that the optimal placement of sam-ples should occur at the points where the pdf is steepestThis contradiction is resolved when one realizes that onlythe original samples are independent The pdf samples areobtained by taking differences of adjacent PF values soneighboring samples are anticorrelated Thus one cannot

var tottot

= =SEN

2 3 24

SE = =var tot12 0 2844

var var tot = aring1 1i

var( )exp

PSE i i i

i

i

P Pz

N= ( ) ( )

2 12

2

ps

dP

dz

zi

i

iaeligegraveccedil

oumloslashdivide

=( )2 2

2

exp

p

dP

dx

dP

dz

dz

dxi

i

i

i

i

i

= times

var( )var

PSE i

i

i

i

P

dP

dx

=( )

aeligegraveccedil

oumloslashdivide

2

1442 KLEIN

claim that for estimating the PSE the optimal placementof trials is at the steep portion of the pdf

Wichmann and Hillrsquos Biased Goodness-of-FitWhenever I fit a psychometric function to data I also

calculate the goodness-of-fit The second half of Wich-mann and Hill (2001a) is devoted to that topic and I havesome comments on their findings There are two standardmethods for calculating the goodness-of-fit (Press Teu-kolsky Vetterling amp Flannery 1992) The first method isto calculate the X 2 statistic as given by

(46)

where p (datai) and Pi are the percent correct of the datumand the PF at level i The binomial var(Pi ) is our old friend

(47)

where Ni is the number of trials at level i The X 2 is of in-terest because the binomial distribution is asymptoticallya Gaussian so that the X 2 statistic asymptotically has a chi-square distribution Alternatively one can write X 2 as

(48)

where the residuals are the difference between the pre-dicted and observed probabilities divided by the standarderror of the predicted probabilities SEi 5 [var(Pi )]05

(49)

The second method is to use the normalized log-likelihood that Wichmann and Hill call the deviance D Iwill use the letters LL to remind the reader that I am deal-ing with log likelihood

(50)

Similar to the X 2 statistic LL asymptotically also has a chi-square distribution

In order to calculate the goodness-of-fit from the chi-square statistic one usually makes the assumption that weare dealing with a linear model for Pi A linear modelmeans that the various parameters controlling the shape ofthe psychometric function (like a b g in Equation 1 forthe Weibull function) should contribute linearly HoweverEquations 1 8 and 12 show that only g contributes linearlyEven though the PF function is not linear in its parametersone typically makes the linearity assumption anyway inorder to use the chi-square goodness-of-fit tables The pur-pose of this section is to examine the validity of the linear-ity assumption

The chi-square goodness-of-fit distribution for a linearmodel is tabulated in most statistics books It is given bythe Gamma function (Press et al 1992)

(51)

where the degrees of freedom df is the number of datapoints minus the number of parameters The Gamma func-tion is a standard function (called Gammainc in Matlab)In order to check that the Gamma function is functioningproperly I often check that for a reasonably large numberof degrees of freedom (like df gt10) the chi-square distri-bution is close to Gaussian with a mean df and the stan-dard deviation (2df )05 As an example for df 5 18 wehave Gamma(9 9) 5 054 which is very close to the as-ymptotic value of 05 To check the standard deviation wehave Gamma [182 (18 1 6)2] 5 845 which is indeedclose to the value of 0841 expected for a unity z-score

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials at each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities at each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = .5 to 1.0 and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

X_i^2 = \frac{(O_i - E_i)^2}{E_i\,(1 - P_i)} \qquad (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

wt_i = \frac{N_i!}{O_i!\,(N_i - O_i)!}\;P_i^{O_i}\,(1 - P_i)^{N_i - O_i} \qquad (53)

The expected value ⟨X_i²⟩ is

\langle X_i^2 \rangle = \sum_{O_i} wt_i\,\frac{(O_i - E_i)^2}{E_i\,(1 - P_i)} \qquad (54)

A bit of algebra, based on \sum_{O_i} wt_i = 1 together with the binomial variance \sum_{O_i} wt_i (O_i - E_i)^2 = N_i P_i (1 - P_i) = E_i (1 - P_i), gives a surprising result:

\langle X_i^2 \rangle = 1 \qquad (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P_i. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

\langle LL_i \rangle = 2\sum_{O_i} wt_i \left[ O_i \ln\!\left(\frac{O_i}{E_i}\right) + (N_i - O_i)\ln\!\left(\frac{N_i - O_i}{N_i - E_i}\right) \right] \qquad (56)

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i (near .5) the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.
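The enumeration itself is compact. A minimal sketch of that calculation (my own illustration rather than the Appendix listing; the variable names are assumptions) is:

    % Expected per-level contributions <X2_i> (Equation 54) and <LL_i> (Equation 56),
    % computed by summing Equation 53 weights over all binomial outcomes.
    P  = .5:.01:.99;            % probability correct at the level
    Ni = 2;                     % trials at the level (try 1, 2, 4, 8, 16)
    Ei = Ni*P;                  % expected number correct
    X2 = zeros(size(P)); LL = zeros(size(P));
    for Oi = 0:Ni               % all possible numbers correct
      wt = nchoosek(Ni,Oi) * P.^Oi .* (1-P).^(Ni-Oi);      % Equation 53
      X2 = X2 + wt .* (Oi-Ei).^2 ./ (Ei.*(1-P));           % Equation 54
      LL = LL + 2*wt .* (Oi*log((Oi+eps)./Ei) + ...
             (Ni-Oi)*log((Ni-Oi+eps)./(Ni-Ei)));           % Equation 56
    end
    plot(P, X2, P, LL); xlabel('P_i'); ylabel('contribution per level')

For N_i = 2 this should trace the curve labeled 2 in Figure 4, while X2 stays at unity for every P_i.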

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

\langle LL_i \rangle = 2\,[\,0.25 \times 2\ln(2) + 0.25 \times 2\ln(2)\,] = 2\ln(2) = 1.386 \qquad (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero (because of the 1 − P_i factor in Equation 53) except for O_i = N_i, and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85 the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance. (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method 1 by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
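To illustrate the third approach, here is a minimal parametric-bootstrap sketch (my own; the PF form, levels, and trial counts are arbitrary assumptions, not taken from any of the papers): binomial data are simulated from the PF fitted to a run, the threshold is refit, and the spread of the refits estimates the threshold variance.

    x = 0:.5:3;  N = 20*ones(size(x));  beta = 2;      % assumed levels, trials, and slope
    pf  = @(th) 1 - exp(-exp(beta*(x - th)));          % assumed PF with threshold th
    nll = @(th,O) -sum(O.*log(pf(th)+eps) + (N-O).*log(1-pf(th)+eps));
    thFit = 1.5;  P = pf(thFit);                       % PF fitted to the run being assessed
    nBoot = 1000;  thHat = zeros(1,nBoot);
    for b = 1:nBoot
      O = sum(rand(20,numel(x)) < repmat(P,20,1), 1);  % binomial resample of the data
      thHat(b) = fminsearch(@(th) nll(th,O), thFit);   % refit threshold only
    end
    bootSE = std(thHat)                                % bootstrap SE of the threshold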

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used (probit [cumulative normal], logit, and Weibull), corresponding to the rows of Table 2, and two error metrics were used (chi-square minimization and likelihood maximization), corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                         λ = .01         λ = .0001       λ = .01         λ = .0001
Psychometric             No Lapse        No Lapse        Lapse           Lapse
Function                 L      χ²       L      χ²       L      χ²       L      χ²
Probit                   1.05   1.05     1.00   0.99     1.05   1.05     0.36   0.34
Logit                    1.76   1.76     1.66   1.66     1.76   1.76     0.84   0.72
Weibull                  1.01   1.01     0.97   0.97     1.01   1.01     0.26   0.26

Note: The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

P = .5 + .5\,\mathrm{erf}\!\left(z/\sqrt{2}\right) \quad (\text{probit, as in Equation 3B}) \qquad (58A)

P = \frac{1}{1 + \exp(-z)} \quad (\text{logit}) \qquad (58B)

P = 1 - \exp\!\left[-\exp(z)\right] \quad (\text{Weibull}) \qquad (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) when there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter; (2) when the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50); (3) when λ = .0001, the PF slope was reduced dramatically when a single lapse was present; and (4) for all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
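For concreteness, here is a minimal sketch of this type of fit (my own illustration, not the author's Matlab Code 3): the chi-square version of the probit fit of Equation 58A, with the lapse rate fixed and made symmetric.

    y = [0 1 6];  N = [50 50 50];  Obs = [25 42 49];      % one lapse at the top level
    lam = .0001;                                          % fixed (too small) lapse rate
    probit = @(p) .5 + .5*erf(p(2)*(y - p(1))/sqrt(2));   % Equation 58A, z = p2*(y - p1)
    expect = @(p) N.*(lam + (1 - 2*lam)*probit(p));       % symmetric lapses (gamma = lambda)
    chisq  = @(p) sum((Obs - expect(p)).^2 ./ ...
                      (expect(p).*(1 - expect(p)./N)));   % binomial X2, as in Equation 46
    best  = fminsearch(chisq, [1 .3]);                    % initial slope near .3 (see text)
    slope = best(2)

Swapping the probit line for the logit or Weibull forms of Equations 58B and 58C gives the corresponding rows of Table 2.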

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and to fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than on the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope, b, of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic. Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods; I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = .0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than the PFs of each run being averaged, the total numbers correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
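To make the enumeration concrete, here is a minimal sketch of the no-manipulation and shift-only cases (panels a and d of Figure 5). It is my own illustration rather than Matlab Code 4, and rounding the shift to the nearest step is an added assumption of this sketch.

    % Enumerate every 10-trial Brownian staircase for a flat PF at P = .5 and pool
    % the raw data, once without and once with shifting each run by its mean level.
    ntrials = 10;  nlev = 21;  start = 11;             % staircases on levels 1..21
    corTot = zeros(2,nlev);  totTot = zeros(2,nlev);   % row 1: no shift, row 2: shift
    for run = 0:2^ntrials-1                            % every possible response sequence
      resp = bitget(run, 1:ntrials);                   % 1 = correct, 0 = incorrect
      lev = zeros(1,ntrials);  lev(1) = start;
      for t = 1:ntrials-1                              % one step down if correct, up if not
        lev(t+1) = lev(t) - (2*resp(t) - 1);
      end
      wt = 0.5^ntrials;                                % every sequence is equally likely
      shift = round(mean(lev) - start);                % nonparametric threshold, rounded
      for t = 1:ntrials
        corTot(1,lev(t)) = corTot(1,lev(t)) + wt*resp(t);
        totTot(1,lev(t)) = totTot(1,lev(t)) + wt;
        corTot(2,lev(t)-shift) = corTot(2,lev(t)-shift) + wt*resp(t);
        totTot(2,lev(t)-shift) = totTot(2,lev(t)-shift) + wt;
      end
    end
    PF = corTot./(totTot + eps);                       % pooled (data-averaged) PFs
    plot(1:nlev, PF(1,:), 1:nlev, PF(2,:))
    xlabel('stimulus level'); ylabel('probability correct')

The pooled PF without the shift stays flat at .5, whereas the shifted version steepens around the mean level, which is the pattern reported in the text.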

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure. The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on the left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs

1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:

1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level rather than on the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log-log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is .0004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a), because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs

1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001a) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up-down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals of initial trials are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs

1. Miller and Ulrich's (2001) nonparametric Spearman-Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection, because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and they prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = .0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that, if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications: The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

1452 KLEINA

PP

EN

DIX

Mat

lab

Pro

gram

s A

vail

able

Fro

m k

lein

sp

ecta

cle

berk

eley

ed

u

Mat

lab

Cod

e 1

Wei

bull

Fu

ncti

on

Pro

babi

lity

dcent a

nd L

ogndashL

og d

cent Slo

pe

Cod

e fo

r G

ener

atin

g F

igur

e 1

rsquogam

ma

5 5

15

erf

(1

sqrt

(2))

gam

ma

(low

er a

sym

ptot

e) f

or z

-sco

re 5

1

beta

5 2

inc

5 0

1

beta

is P

F s

lope

x5

inc

2in

c3

th

e st

imul

us r

ange

wit

h li

near

abs

ciss

ap

5 g

amm

a1(1

gam

ma)

(1

exp(

x^(

beta

)))

W

eibu

ll f

unct

ion

subp

lot(

32

1)p

lot(

xp)

gri

dte

xt(

19

2rsquo(a

)rsquo) y

labe

l(rsquoW

eibu

ll w

ith

beta

5 2

rsquo)z

5 s

qrt(

2)e

rfin

v(2

p1)

conv

erti

ng p

roba

bili

ty to

z-s

core

subp

lot(

32

3)p

lot(

xz)

gri

dte

xt(

13

6rsquo(b

)rsquo) y

labe

l(rsquoz

-sco

re o

f W

eibu

ll (

dacutersquo

1)rsquo)

dpri

me

5 z

11

dp

rim

e 5

z(h

it)

z (fa

lse

alar

m)

difd

5 d

iff(

log(

dpri

me)

)in

c

deri

vativ

e of

log

dcentxd

5 in

cin

c2

9999

xva

lues

at m

idpo

int o

f pr

evio

us x

valu

essu

bplo

t(3

25)

plo

t(xd

dif

dx

d)g

rid

m

ulti

plic

atio

n by

xd

is to

mak

e lo

g ab

scis

saax

is([

0 3

5 2

])t

ext(

31

9rsquo(

c)rsquo)

yla

bel(

rsquologndash

log

slop

e of

dacutersquorsquo

) x

labe

l(rsquos

tim

ulus

in th

resh

old

unit

srsquo)

y 5

2

inc

1

do s

imil

ar p

lots

for

a lo

g ab

scis

sa (

y )p

5 g

amm

a1(1

gam

ma)

(1

exp(

exp(

beta

y))

)

Wei

bull

fun

ctio

n as

a f

unct

ion

of y

subp

lot(

32

2)p

lot(

yp

0p(

201)

rsquorsquo)

grid

tex

t(1

89

2rsquo(d

)rsquo)z

5 s

qrt(

2)e

rfin

v(2

p1)

su

bplo

t(3

24)

plo

t(y

z)g

rid

text

(1

83

6rsquo(e

)rsquo)dp

rim

e5

z1

1di

fd5

dif

f(lo

g(dp

rim

e))

inc

subp

lot(

32

6)p

lot(

y[d

ifd(

1) d

ifd]

)gr

id x

labe

l(rsquos

tim

ulus

in n

atur

al lo

g un

itsrsquo

) te

xt(

16

19

rsquo(f)rsquo)

Mat

lab

Cod

e 2

Goo

dne

ss-o

f-F

it

Lik

elih

ood

vs C

hi S

quar

e

Rel

evan

t to

Wic

hm

ann

and

Hill

(20

01a)

and

Fig

ure

4

clea

r c

lf

clea

r al

lfa

c5

[1

cum

prod

(12

0)]

cr

eate

fac

tori

al f

unct

ion

p5

[5

10

19

99]

ex

amin

e a

wid

e ra

nge

of p

roba

bili

ties

Nal

l5 [

1 2

4 8

16]

nu

mbe

r of

tri

als

at e

ach

leve

lfo

r iN

5 1

5

iter

ate

over

then

num

ber

of tr

ials

N5

Nal

l(iN

) E

5 N

p

E

is th

e ex

pect

ed n

umbe

r as

fun

ctio

n of

pli

k5

0 X

25

0

in

itia

lize

the

like

liho

od a

nd X

2

for

Obs

5 0

N

sum

ove

r al

l pos

sibl

e co

rrec

t res

pons

esN

m5

NO

bs

nu

mbe

r of

inco

rrec

t res

pons

esw

eigh

t5 f

ac(1

1N

)fa

c(11

Obs

)fa

c(11

Nm

)p

^Obs

(1

p)^

Nm

bino

mia

l wei

ght

lik

5 li

k 2

wei

ght

(Obs

log

(E(

Obs

1ep

s))1

Nm

log

((N

E)

(Nm

1ep

s)))

Equ

atio

n56

X2

5 X

21w

eigh

t(

Obs

E)

^2

(E

(1p)

)

calc

ulat

e X

2 E

quat

ion

52en

d

end

sum

mat

ion

loop

MEASURING THE PSYCHOMETRIC FUNCTION 1453L

L2(

iN

)5

lik

X2a

ll(i

N

)5

X2

st

ore

the

like

liho

ods

and

chi s

quar

esen

d

end

Nlo

op

the

rest

is f

or p

lott

ing

plot

(pL

L2

rsquokrsquop

X2a

ll(2

)rsquo

krsquo)

grid

pl

ot th

e lo

g li

keli

hood

s an

d X

2

for

in5

14

tex

t(9

35 L

L2(

ine

nd6

) n

um2s

tr(N

all(

in))

)en

dte

xt(

913

12

rsquoN5

16rsquo

)te

xt(

659

5rsquoX

2 fo

r al

l Nrsquo)

xlab

el(rsquoP

(pr

edic

ted

prob

abil

ity)

rsquo)yl

abel

(rsquoCon

trib

utio

n to

log

like

liho

od (

D)

from

eac

h le

velrsquo)

Matlab Code 3. Downward Slope Bias Due to Lapses
(Relevant to Wichmann and Hill, 2001a, and Table 2)

function sse = probitML2(params, i, ilapse)      % function called by main program
stimulus = [0 1 6];  N = [50 50 50];  Obs = [25 42 50];   % psychometric function information
lapseAll = [.01 .001 .01 .0001];                 % lapse rates for the four Table 2 columns
lapse = lapseAll(ilapse);                        % choose the lapse rate lambda
if ilapse > 2,  Obs(3) = Obs(3) - 1;  end        % produce a lapse for last 2 columns
zz = (stimulus - params(1))*params(2);           % z score of psychometric function
ii = mod(i,3);                                   % index for selecting psychometric function
if ii == 0
  prob = (1 + erf(zz/sqrt(2)))/2;                % probit (cumulative normal)
elseif ii == 1
  prob = 1./(1 + exp(-zz));                      % logit
else
  prob = 1 - exp(-exp(zz));                      % Weibull
end
Exp = N.*(lapse + prob*(1 - 2*lapse));           % expected number correct
if i < 3
  chisq = -2*(Obs.*log(Exp./Obs) + (N-Obs).*log((N-Exp+eps)./(N-Obs+eps)));  % log likelihood
else
  chisq = (Obs-Exp).^2.*N./(Exp.*(N-Exp+eps));   % X2 (eps is smallest Matlab number)
end
sse = sum(chisq);                                % sum over the three levels

% MAIN PROGRAM
clear; clf
type = ['probit maxlik  '; 'logit maxlik   '; 'Weibull maxlik ';   % headings for Table 2
        'probit chisq   '; 'logit chisq    '; 'Weibull chisq  '];
disp('  gamma =  .01  .001  .01  .0001')         % type top of Table 2
disp('  lapse =   no    no   yes    yes')
for i = 0:5                                      % iterate over function type
  for ilapse = 1:4                               % iterate over lapse categories
    params = fmins('probitML2', [1 3], [], [], i, ilapse);   % fmins finds minimum of chi-square
    slope(ilapse) = params(2);                   % slope is 2nd parameter
  end
  disp([type(i+1,:) num2str(slope)])             % print result
end

APPENDIX (Continued)

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias

% Monotonize
flag = 1;  ii = 0;                 % initialize; flag = 1 means nonmonotonicity is detected
while flag == 1                    % keep going if there is nonmonotonic data
  flag = 0;                        % reset monotonic flag
  ii = ii + 1;                     % increase the nonmonotonic spread
  for i2 = ii+1:ntot               % loop over all PF levels looking for nonmonotonicity
    prob = cor./(tot+eps);         % calculate probabilities
    if (prob(i2-ii) > prob(i2)) & (tot(i2) > 0)           % check for nonmonotonicity
      cor(i2-ii:i2) = mean(cor(i2-ii:i2))*ones(1,ii+1);   % replace nonmonotonic correct by average
      tot(i2-ii:i2) = mean(tot(i2-ii:i2))*ones(1,ii+1);   % replace nonmonotonic total by average
      flag = 1;                    % set flag that nonmonotonicity is found
    end
  end
end

Note that in line 6 the denominator is tot+eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

% Extrapolate
ii = 1;
while tot(ii) == 0,  ii = ii + 1;  end       % count the low levels with no trials
if ii > 1                                    % skip lowest level
  tot(1:ii-1) = ones(1,ii-1)*.00001;         % fill in with very low number of trials
  cor(1:ii-1) = prob(ii)*tot(1:ii-1);        % fill in with probability of lowest tested level
end
ii = nbits2;
while tot(ii) == 0,  ii = ii - 1;  end       % do the same extrapolation at high end
if ii < nbits2
  tot(ii+1:nbits2) = ones(1,nbits2-ii)*.00001;
  cor(ii+1:nbits2) = prob(ii)*tot(ii+1:nbits2);
end

% Averaging
prob1 = cor./(tot+eps);                              % calculate PF
probAll(iopt,:) = probAll(iopt,:) + prob1;           % sum the PFs to be averaged
probTotAll(iopt,:) = probTotAll(iopt,:) + (tot>0);   % count the number of staircases with a given level
corAll(iopt,:) = corAll(iopt,:) + cor;               % sum the number correct over all staircases
totAll(iopt,:) = totAll(iopt,:) + tot;               % sum the total number of trials over all staircases

% Main Program
nbits = 10;  n2 = 2^nbits - 1;  nbits2 = 2*nbits + 1;   % nbits is number of staircase steps
zero3 = zeros(3,nbits2);            % the three rows are for the three iopt values
for ishift = 0:1                    % ishift = 0 means no shift (1 means shift)
  totAll = zero3;  corAll = zero3;  probAll = zero3;  probTotAll = zero3;   % initialize arrays
  for i = 0:n2                      % loop over all possible staircases
    a = bitget(i,[1:nbits]);        % a(k) = 1 is hit, a(k) = 0 is miss
    step = 1 - 2*a(1:end-1);        % -1 step down for hit, +1 step up for miss
    aCum = [0 cumsum(step)];        % accumulate steps to get stimulus level
    if ishift == 0,  ave = 0;                  % for the no-shift option
    else  ave = round(mean(aCum));  end        % for shift option, ave is the amount of shift
    aCum = aCum - ave + nbits;      % do the shift
    cor = zeros(1,nbits2);  tot = cor;         % initialize for each staircase
    for i2 = 1:nbits
      cor(aCum(i2)) = cor(aCum(i2)) + a(i2);   % count number correct at each level
      tot(aCum(i2)) = tot(aCum(i2)) + 1;       % count number of trials at each level
    end
    % (stairave, monotonize, and extrapolate below refer to the Averaging,
    %  Monotonize, and Extrapolate segments listed above)
    iopt = 1;  stairave                  % two types of averaging (iopt is index in script above)
    iopt = 2;  monotonize;  stairave     % monotonize PF and then do averaging
    iopt = 3;  extrapolate;  stairave    % extrapolate and then do averaging
  end
  prob = corAll./(totAll+eps);              % get average from pooled data
  probave = probAll./(probTotAll+eps);      % get average from individual PFs
  x = [1:nbits2];                           % x axis for plotting
  subplot(3,2,1+ishift)                     % plot the top pair of panels
  plot(x, prob(1,:), x, totAll(1,:)/max(totAll(1,:)), '--', x, probave(1,:), ':'); grid
  if ishift == 0
    title('no shift');  ylabel('no data manipulation');  text(5,.95,'(a)')
  else
    title('shift');  text(5,.95,'(d)')
  end
  subplot(3,2,3+ishift);  plot(x, prob(2,:), x, probave(2,:), ':'); grid   % plot middle pair of panels
  if ishift == 0
    ylabel('monotonize psychometric fn');  text(5,.63,'(b)')
  else
    text(5,.95,'(e)')
  end
  subplot(3,2,5+ishift)                     % plot bottom pair of panels
  plot(x, prob(3,:), x, probave(3,:), ':', [nbits nbits], [.45 .55]); grid
  tt = 'cf';  text(5,.95,['(' tt(ishift+1) ')'])
  if ishift == 0,  ylabel('extrapolate psychometric fn');  end
  xlabel('stimulus levels')
end    % end loop of whether to shift or not

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i,[1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


According to the more successful signal detection theory(Green amp Swets 1966) the dcent at the midpoint of the PFchanges according to the number of alternatives in aforced choice method and according to the false alarm ratein a yesno method This variability of dcent with method is agood reason not to define threshold as the halfway pointof the PF

A definition of threshold that is relatively independent of the method used for its measurement is to define threshold as the stimulus strength that gives a fixed value of d′. The stimulus strength that gives d′ = 1 (76% correct for 2AFC) is a common definition of threshold. Although I will show that higher d′ levels have the advantage of giving more precise threshold estimates, unless otherwise stated I will take threshold to be at d′ = 1 for simplicity. This definition applies to both yes/no and m-AFC tasks, and to both detection and discrimination tasks.

As an example, consider the case shown in Figure 1, where the lower asymptote (false alarm rate) in a yes/no detection task is γ = .1587, corresponding to a z score of z = −1. If threshold is defined to be at d′ = 1.0, then from Equation 5 the z score for threshold is z = 0, corresponding to a hit rate of 50% (not quite halfway up the PF). This example with a 50% hit rate corresponds to defining d′ along the horizontal ROC axis. If threshold had been defined to be d′ = 2 in Figure 1, then the probability correct at threshold would be 84.13%. This example, in which both the hit rate and correct rejection rate are equal (both are 84.13%), corresponds to the ROC negative diagonal.
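A few lines of Matlab make this bookkeeping concrete. The sketch below is my own illustration (not code from the article): it converts the Figure 1 false alarm rate into a z score and then into the hit rate at threshold for the two candidate threshold definitions.

% Sketch: hit rate at threshold for a yes/no task, given the false alarm rate
% of Figure 1 and a d' criterion for threshold.
Pfa = 0.1587;                         % lower asymptote (false alarm rate)
zFA = sqrt(2)*erfinv(2*Pfa - 1);      % z score of the false alarm rate (-1 here)
dThresh = [1 2];                      % two candidate threshold definitions
zHit = zFA + dThresh;                 % z score of the hit rate at threshold
PHit = 0.5*(1 + erf(zHit/sqrt(2)))    % 0.50 for d' = 1 and 0.8413 for d' = 2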

Strasburger's Suggestion on Specifying Slope
A Logarithmic Abscissa

In many of the articles in this special issue a logarithmicabscissa such as decibels is used Many shapes of PFs havebeen used with a log abscissa Most delightful but frus-trating is Strasburgerrsquos (2001a) paper on the PF maximumslope He compares the Weibull logistic Quick cumula-tive normal hyperbolic tangent and signal detection dcentusing a logarithmic abscissa The present section is the out-come of my struggles with a number of issues raised byStrasburger (2001a) and my attempt to clarify them

I will typically express stimulus strength x in threshold units,

x_t = x / α,     (8)

where α is the threshold. Stimulus strength will be expressed in natural logarithmic units, y, as well as in linear units, x:

y_t = log_e(x_t) = y − Y,     (9)

where y = log_e(x) is the natural log of the stimulus and Y = log_e(α) is the threshold on the log abscissa.

The slope of the psychometric function P(y_t) with a logarithmic abscissa is

slope_log(y_t) = dP(y_t)/dy_t     (10A)
              = (1 − γ) dp(y_t)/dy_t,     (10B)

and the maximum slope is called β′ by Strasburger (2001a). A very different definition of slope is sometimes used by psychophysicists: the log–log slope of the d′ function, a slope that is constant at low stimulus strengths for many PFs. The log–log d′ slope of the Weibull function is shown in the bottom pair of panels in Figure 1. The low-contrast log–log slope is β for a Weibull PF (Equation 12) and b for a d′ PF (Equation 20). Strasburger (2001a) shows how the maximum slope using a probability ordinate is connected to the log–log slope using a d′ ordinate.

A frustrating aspect of Strasburger's article is that the slope units of P (probability correct per log_e) are not familiar. Then it dawned on me that there is a simple connection between slope with a log_e abscissa, slope_log = dP(y_t)/dy_t, and slope with a linear abscissa, slope_lin = dP(x_t)/dx_t, namely

slope_lin = dP(x_t)/dx_t = [dP(y_t)/dy_t] (dy_t/dx_t) = slope_log / x_t,     (11)

because dy_t/dx_t = [d log_e(x_t)]/(dx_t) = 1/x_t. At threshold, x_t = 1 (y_t = 0), so at that point Strasburger's slope with a logarithmic axis is identical to my familiar slope plotted in threshold units on a linear axis. The simple connection between slope on the log and linear abscissas converted me to being a strong supporter of using a natural log abscissa.

The Weibull and cumulative normal psychometricfunctions To provide a background for Strasburgerrsquos ar-ticle I will discuss three PFs Weibull cumulative nor-mal and dcent as well as their close connections A com-mon parameterization of the PF is given by the Weibullfunction

p_weib(x_t) = 1 − k^(x_t^β),     (12)

where p_weib(x_t), the PF that goes from 0% to 100%, is related to P_weib(x_t), the probability of a correct response, by Equation 1; β is the slope; and k controls the definition of threshold: p_weib(1) = 1 − k is the percent correct at threshold (x_t = 1). One reason for the Weibull's popularity is that it does a good job of fitting actual data. In terms of logarithmic units, the Weibull function (Equation 12) becomes

p_weib(y_t) = 1 − k^[exp(β y_t)].     (13)
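As a check on Equations 11–13, the following Matlab fragment (my own sketch, with β = 2 and k = exp(−1) as in Figure 1) evaluates the Weibull on both abscissas and verifies numerically that at threshold (x_t = 1) the two slopes coincide.

% Sketch: Equations 12 and 13, plus a numerical check of Equation 11.
beta = 2;  k = exp(-1);  h = 1e-5;  xt = 1;
pLin = @(x) 1 - k.^(x.^beta);                 % Equation 12 (linear abscissa)
pLog = @(y) 1 - k.^(exp(beta*y));             % Equation 13 (log abscissa, y = log x)
slopeLin = (pLin(xt+h) - pLin(xt-h))/(2*h);   % dp/dx at threshold
slopeLog = (pLog(log(xt)+h) - pLog(log(xt)-h))/(2*h);   % dp/dy at threshold
[slopeLin  slopeLog/xt]                       % equal, as Equation 11 requires

Both numbers equal β exp(−1) ≈ 0.736, which is the maximum slope given below in Equation 15 for this choice of k.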

Panel a of Figure 1 is a plot of Equation 12 (Weibull func-tion as a function of x on a linear abscissa) for the case b 52 and k 5 exp( 1) 5 0368 The choice b 5 2 makes theWeibull an upside-down Gaussian Panel d is the samefunction this time plotted as a function of y correspondingto a natural log abscissa With this choice of k the point ofmaximum slope as a function of yt is the threshold point( yt 5 0) The point of maximum slope at yt 5 0 is markedwith an asterisk in panel d The same point in panel a atxt 5 1 is not the point of maximum slope on a linear ab-scissa because of Equation 11 When plotted as a dcent func-tion [a z-score transform of P(xt)] in panel b the Weibullaccelerates below threshold and decelerates above thresh-


old in agreement with a wide range of experimental dataThe acceleration and deceleration are most clearly seen bythe slope of dcent in the logndashlog coordinates of panel e Thelogndashlog dcent slope is plotted in panels c and f At xt 5 0 thelogndashlog slope is 2 corresponding to our choice of b 5 2The slope falls surprisingly rapidly as stimulus strength in-creases and the slope is near 1 at xt 5 2 If we had chosenb 5 4 for the Weibull function then the logndashlog dcent slopewould have gone from 4 at xt 5 0 to near 2 at xt 5 2Equation 20 will present a dcent function that captures thisbehavior

Another PF commonly used is based on the cumulativenormal (Equation 3)

p_F(y_t) = Φ(y_t / σ)     (14A)

or

z(y_t) = y_t / σ = (y − Y)/σ,     (14B)

where σ is the standard error of the underlying Gaussian function (its connection to β will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to −∞, as needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissa.

Two examples of Strasburgerrsquos maximum slope (hisEquations 7 and 17) are

β′ = β exp(−1) = 0.368 β     (15)

for the Weibull function (Equation 13) and

β′ = 1/(σ √(2π)) = 0.399/σ     (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(−1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (y_t = 0). Note that Equations 12–16 are missing the (1 − γ) factor that is present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.

An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = .5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(x_t = 1) = .75. At 10% above threshold, P(x_t = 1.1) ≈ .75 + 0.1 β′ = .7899, which is quite close to the exact value of .7896. This example shows how β′, defined with a natural log abscissa y_t, is relevant to a linear abscissa x_t.

Now that the logarithmic abscissa has been introducedthis is a good place to stop and point out that the log ab-scissa is fine for detection but not for discrimination since

the x 5 0 point is often in the middle of the range How-ever the cumulative normal PF is especially relevant to dis-crimination where the PF goes from 0 to 100 andwould be written as

p_F(x) = P_F(x) = Φ[(x − PSE)/α]     (17A)

or

z_F(x) = (x − PSE)/α,     (17B)

where x = PSE is the point of subjective equality. The parameter α is the threshold, since d′(α) = z_F(α) − z_F(0) = 1. The threshold α is also the standard deviation of the Gaussian probability density function (pdf). The reciprocal of α is the PF slope. I tend to use the letter σ for a unitless standard deviation, as occurs for the logarithmic variable y; I use α for a standard deviation that has the units of the stimulus x (like percent contrast). The comparison of Equation 14B for detection and Equation 17B for discrimination is useful. Although Equation 14 is used in most of the articles in this special issue, which are concerned with detection tasks, the techniques that I will be discussing are also relevant to discrimination tasks, for which Equation 17 is used.

Threshold and the Weibull Function
To illustrate how a d′ = 1 definition of threshold works

for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = .1587, corresponding to a z score of z_FA = −1, as in Figure 1. The z score at threshold (d′ = 1) is z_Th = z_FA + 1 = 0, corresponding to a probability of P(1) = 50%. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)]/[1 − P(0)] = 0.5943. The Weibull function becomes

P_weib(x_t) = 1 − (1 − 0.1587) × 0.5943^(x_t^β).     (18A)

If I had defined d′ = 2 to be threshold, then z_Th = z_FA + 2 = 1, leading to k = (1 − 0.8413)/(1 − 0.1587) = 0.1886, giving

P_weib(x_t) = 1 − (1 − 0.1587) × 0.1886^(x_t^β).     (18B)

As another example, suppose the false alarm rate in a yes/no task is at 50%, not uncommon in a signal detection experiment with blanks and test stimuli intermixed. Then threshold at d′ = 1 would occur at 84.13%, and the PF would be

P_weib(x_t) = 1 − 0.5 × 0.3173^(x_t^β),     (18C)

with P_weib(0) = 50% and P_weib(1) = 84.13%. This case corresponds to defining d′ on the ROC vertical intercept.

For 2AFC, the connection between d′ and z is z = d′/2^0.5. Thus z_Th = 2^−0.5 = 0.7071, corresponding to P_weib(1) = .7602 and leading to k = 0.4795 in Equation 12. These connections will be clarified when an explicit form (Equation 20) is given for the d′ function d′(x_t).
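The numbers in this paragraph are easy to reproduce; the following lines (my own check, not the author's code) verify the 2AFC threshold bookkeeping.

% Sketch: 2AFC percent correct and Weibull k when threshold is defined at d' = 1.
Phi = @(z) 0.5*(1 + erf(z/sqrt(2)));   % cumulative normal
zTh = 1/sqrt(2);                       % z at threshold, since z = d'/sqrt(2) and d' = 1
PTh = Phi(zTh)                         % 0.7602, the 2AFC probability correct at threshold
gamma = 0.5;
k = (1 - PTh)/(1 - gamma)              % 0.4795, the k of Equation 12 for this threshold definition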


Complications With Strasburger's Advocacy

Here I will mention three complications implicated inStrasburgerrsquos suggestion of defining slope at the point ofmaximum slope on a logarithmic abscissa

1 As can be seen in Equation 11 the slopes on linearand logarithmic abscissas are related by a factor of 1xtBecause of this extra factor the maximum slope occursat a lower stimulus strength on linear as compared withlog axes This makes the notion of maximum slope lessfundamental However since we are usually interested inthe percent error of threshold estimates it turns out thata logarithmic abscissa is the most relevant axis support-ing Strasburgerrsquos log abscissa definition

2. The maximum slope is not necessarily at threshold (as Strasburger points out). For the Weibull functions defined in Equation 13 with k = exp(−1) and the cumulative normal function in Equation 14, the maximum slope (on a log axis) does occur at threshold. However, the two points are decoupled in the generalized Weibull function defined in Equation 12. For the Quick version of the Weibull function (Equation 13 with k = 0.5, placing threshold halfway up the PF), the threshold is below the point of maximum slope; the derivative of P(x_t) at threshold is

dP(x_t)/dx_t |_thresh = (1 − γ) β log_e(2)/2 = 0.347 (1 − γ) β,

which is slightly different from the maximum slope as given by Equation 15. Similarly, when threshold is defined at d′ = 1, the threshold is not at the point of maximum slope. People who fit psychometric functions would probably prefer reporting slopes at the detection threshold rather than at the point of maximum slope.

3 One of the most important considerations in selectinga point at which to measure threshold is the question ofhow to minimize the bias and standard error of the thresh-old estimate The goal of adaptive methods is to place tri-als at one level In order to avoid a threshold bias due to aimproper estimate of PF slope the test level should be atthe defined threshold The variance of the threshold esti-mate when the data are concentrated as a single level (aswith adaptive procedures) is given by Gourevitch and Gal-anter (1967) as

var(Y) = [P(1 − P)/N] / [dP(y_t)/dy_t]².     (19)

If the binomial error factor of P(1 P)N were not presentthe optimal placement of trials for measuring thresholdwould be at the point of maximum slope However thepresence of the P(1 P) factor shifts the optimal point toa higher level Wichmann and Hill (2001b) consider howtrial placement affects threshold variance for the methodof constant stimuli This topic will be considered in Sec-tion II

Connecting the Weibull and d′ Functions
Figures 5–7 of Strasburger (2001a) compare the log–

log d′ slope b with the Weibull PF slope β. Strasburger's connection of b = 0.88β is problematic, since the d′ version of the Weibull PF does not have a fixed log–log slope. An improved d′ representation of the Weibull function is the topic of the present section. A useful parameterization for d′, which I shall refer to as the Stromeyer–Foley function, was introduced by Stromeyer and Klein (1974) and used extensively by Foley (1994):

d′(x_t) = x_t^b / [a + (1 − a) x_t^(b+w−1)],     (20)

where x_t is the stimulus strength in threshold units, as in Equation 8. The factors with a in the denominator are present so that d′ equals unity at threshold (x_t = 1, or x = α). At low x_t, d′ ≈ x_t^b / a. The exponent b (the log–log slope of d′ at very low contrasts) controls the amount of facilitation near threshold. At high x_t, Equation 20 becomes d′ ≈ x_t^(1−w) / (1 − a). The parameter w is the log–log slope of the test threshold versus pedestal contrast function (the tvc or "dipper" function) at strong pedestals. One must be a bit cautious, however, because in yes/no procedures the tvc slope can be decoupled from w if the signal detection ROC curve has nonunity slope (Stromeyer & Klein, 1974). The parameter a controls the point at which the d′ function begins to saturate. Typical values of these unitless parameters are b = 2, w = 0.5, and a = 0.6 (Yu, Klein, & Levi, 2001). The function in Equation 20 accelerates at low contrast (log–log slope of b) and decelerates at high contrasts (log–log slope of 1 − w), in general agreement with a broad range of data.

For 2AFC, z = d′/√2, so from Equation 3 the connection between d′ and probability correct is

P(x_t) = .5 + .5 erf[d′(x_t)/2].     (21)

To establish the connection between the PFs specified in Equation 12 and Equations 20–21, one must first have the two agree at threshold. For 2AFC with d′ = 1, the threshold at P = .7602 can be enforced by choosing k = 0.4795 (see the discussion following Equation 18C), so that Equations 1 and 12 become

P(x_t) = 1 − .5 × 0.4795^(x_t^β).     (22)

Modifying k as in Equation 22 leaves the Weibull shape unchanged and shifts only the definition of threshold. If b = 1.06β, w = 1 − 0.39β, and a = 0.614, then for all values of β the Weibull and d′ functions (Equations 21 and 22 vs. Equation 12) differ by less than 0.0004 for all stimulus strengths. At very low stimulus strengths, b = β. The value b = 1.06β is a compromise for getting an overall optimal fit.
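The quality of this match is easy to confirm. The sketch below (my own check, using the parameter mapping just quoted for the 2AFC case with β = 2) compares the Weibull of Equation 22 with the Stromeyer–Foley route of Equations 20 and 21.

% Sketch: compare the 2AFC Weibull (Equation 22) with the Stromeyer-Foley
% d' function (Equations 20-21) under the parameter mapping quoted above.
beta = 2;                                       % Weibull slope
b = 1.06*beta;  w = 1 - 0.39*beta;  a = 0.614;  % mapping for the 2AFC case
xt = 0.25:0.01:4;                               % stimulus strength in threshold units
Pweib = 1 - 0.5*0.4795.^(xt.^beta);             % Equation 22
dprim = xt.^b ./ (a + (1-a)*xt.^(b+w-1));       % Equation 20
P2afc = 0.5 + 0.5*erf(dprim/2);                 % Equation 21
max(abs(P2afc - Pweib))                         % stays below about .0004, as claimed above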

Strasburger (2001a) is concerned with the same issueIn his Table 1 he reports dcent logndashlog slopes of [8847


18379 3131 4421] for b 5 [1 2 35 5] If the dcent slopeis taken to be a measure of the dcent exponent b then theratio bb is [08847 09190 08946 08842] for the fourb values Our value of bb 5 106 differs substantiallyfrom 08 (Pelli 1985) and 088 (Strasburger 2001a) Pelli(1985) and Strasburger (2001a) used dcent functions with a 51 in Equation 20 With a 5 0614 our dcent function (Equa-tion 20) starts saturating near threshold in agreement withexperimental data The saturation lowers the effective valueof b I was very surprised to find that the StromeyerndashFoley dcent function did such a good job in matching theWeibull function across the whole range of b and x Fora long time I had thought a different fit would be neededfor each value of b

For the same Weibull function in a yesno method (falsealarm rate 5 50) the parameters b and w are identicalto the 2AFC case above Only the value of a shifts froma 5 0614 (2AFC) to a 5 054 (yesno) In order to makext 5 1 at dcent 5 1 k becomes 03173 (see Equation 18C)instead of 04795 (2AFC) As with 2AFC the differencebetween the Weibull and dcent function is less than 0004(lt04) for all stimulus strengths and all values of b andfor both a linear and a logarithmic abscissa These valuesmean that for both the yes no and the 2AFC methods thedcent function (Equation 20) corresponding to a Weibullfunction with b 5 2 has a low-contrast logndashlog slope ofb 5 212 and a high-contrast logndashlog slope of 1 w 5078 The high-contrast tvc (test contrast vs pedestal con-trast discrimination function sometimes called the ldquodip-perrdquo function) logndashlog slope would be w 5 022

I also carried out a similar analysis asking what cumu-lative normal s is best matched to the Weibull I founds 5 11720b However the match is poor with errors of4 (rather than lt004 in the preceding analysis) Thepoor fit of the two functions makes the fitting proceduresensitive to the weighting used in the fitting procedure Ifone uses a weighting factor of [P(1 P)N] where P is thePF one will find a quite different value for s than if oneignored the weighting or if one chose the weighting on thebasis of the data rather than the fitting function

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section inspired by Linschoten et al (2001) is anexamination of various experimental methods for gather-ing data to estimate thresholds Linschoten et al comparethree 2AFC methods for measuring smell and taste thresh-olds a maximum-likelihood adaptive method an upndashdownstaircase method and an ascending method of limits Thespecial aspect of their experiments was that in measuringsmell and taste each 2AFC trial takes between 40 and60 sec (Lew Harvey personal communication) because ofthe long duration needed for the nostrils or mouth to re-cover their sensitivity The purpose of the Linschoten et alpaper is to examine which of the three methods is mostaccurate when the number of trials is low Linschoten et al

conclude that the 2AFC method is a good method for thistask My prior experience with forced choice tasks and lownumber of trials (Manny amp Klein 1985 McKee et al1985) made me wonder whether there might be better meth-ods to use in circumstances in which the number of trialsis limited This topic is considered in Section II

Compare 2AFC to Yes/No
The commonly accepted framework for comparing

2AFC and yes/no threshold estimates is signal detection theory. Initially I will make the common assumption that the d′ function is a power function (Equation 20 with a = 1):

d′ = x_t^b = exp(b y_t).     (23)

Later in this section, the experimentally more accurate form, Equation 20 with a ≠ 1, will be used. The variance of the threshold estimate will now be calculated for 2AFC and yes/no for the simplest case of trials placed at a single stimulus level y, the goal of most adaptive methods. The PF slope relates ordinate variance to abscissa variance (from Gourevitch & Galanter, 1967):

var(Y) = var(d′) / (dd′/dy_t)²,     (24)

where Y = y − y_t is the threshold estimate for a natural log abscissa, as given in Equation 9. This formula for var(Y) is called the asymptotic formula, in that it becomes exact in the limit of a large number of total trials N. Wichmann and Hill (2001b) show that, for the method of constant stimuli, a proper choice of testing levels brings one to the asymptotic region for N = 120 (the smallest value that they explored). Improper choice of testing levels requires a larger N in order to get to the asymptotic region. Equation 24 is based on having the data at a single level, as occurs asymptotically in adaptive procedures. In addition, the definition of threshold must be at the point being tested. If levels are tested away from threshold, there will be an additional contribution to the threshold variance (Finney, 1971; McKee et al., 1985) if the slope is a variable parameter in the fitting procedure. Equation 24 reaches its asymptotic value more rapidly for adaptive methods, with focused placement of trials, than with the constant stimulus method when slope is a free parameter.

Equation 24 needs analytic expressions for the deriva-tive and the variance of dcent The derivative is obtained fromEquation 23

dd′/dy_t = b d′.     (25)

The variance of d′ for both yes/no (d′ = z_hit + z_correct rejection) and 2AFC (d′ = √2 z_ave) is

var(d′) = 2 var(z) = 4π exp(z²) var(P).     (26)
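The exp(z²) factor in Equation 26 is simply error propagation through the z transform: since dP/dz = exp(−z²/2)/√(2π), one has var(z) = var(P)(dz/dP)² = 2π exp(z²) var(P), and d′ is either the sum of two such z scores (yes/no) or √2 times a single one (2AFC), so in both cases its variance is doubled.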


For the yes no case Equation 26 is based on a unityslope ROC curve with the subjectrsquos criterion at the neg-ative diagonal of the ROC curve where the hit rate equalsthe correct rejection rate This would be the most effi-cient operating point The variance for a nonunity slopeROC curve was considered by Klein Stromeyer andGanz (1974) From binomial statistics

var(P) = P(1 − P)/n,     (27)

with n = N for 2AFC and n = N/2 for yes/no, since for this binary yes/no task the number of signal and blank trials is half the total number of trials.

Putting all of these factors together gives

var(Y) = 4π exp(z²) P(1 − P) / (n b² d′²)     (28)

for both the yes/no and the 2AFC cases. The only difference between the two cases is the definition of n and the definition of d′ (d′_2AFC = 2^0.5 z and d′_YN = 2z for a symmetric criterion). When this is expressed in terms of N (number of trials) and z (z score of probability correct), we get

var(Y) = 2π exp(z²) P(1 − P) / (N b² z²)     (29)

for both 2AFC and yes/no. That is, when the percent corrects are all equal (P_2AFC = P_hit = P_correct rejection), the threshold variances for 2AFC and yes/no are equal. The minimum value of var(Y) is 1.64/(N b²), occurring at z = 1.57 (P = 94%). This value of 94% correct is the optimum value at which to test for both 2AFC and yes/no tasks, independent of b.
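A brute-force check of this optimum (my own sketch) takes only a few lines: evaluate Equation 29 on a grid of z values and locate its minimum.

% Sketch: minimize the threshold variance of Equation 29 over the testing level.
N = 100;  b = 2;                                 % arbitrary trial count and slope
z = 0.1:0.001:2.5;                               % z score of the tested level
P = 0.5*(1 + erf(z/sqrt(2)));                    % probability correct at that level
varY = 2*pi*exp(z.^2).*P.*(1-P)./(N*b^2*z.^2);   % Equation 29
[vmin, imin] = min(varY);
[z(imin)  P(imin)  vmin*N*b^2]                   % about 1.57, 0.94, and 1.64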

This result that the threshold variance is the same for2AFC and yesno methods seems to disagree with Figure 4of McKee et al (1985) where the standard errors of thethreshold estimates are plotted for 2AFC and yesno asa function of the number of trials The 2AFC standarderror (their Figure 4) is twice that of yesno correspond-ing to a factor of four in variance However the McKeeet al analysis is misleading since all they (we) did for theyesno task was to use Equation 2 to expand the PF to a 0to 100 range thereby doubling the PF slope Thatrescaling is not the way to convert a 2AFC detection taskto the yesno task of the same detectability A signal de-tection methodology is needed for that conversion aswas done in the present analysis

One surprise is that the optimal testing probability isindependent of the PF slope b This finding is a generalproperty of PFs in which the dependence on the slopeparameter has the form xb The 94 level corresponds todcentYN 5 315 and dcent2AFC 5 223 For the commonly usedP 5 75 (z 5 0765 dcentYN 5 135 dcent2AFC 5 0954)var(Y ) 5 408(Nb2) which is about 25 times the opti-mal value obtained by placing the stimuli at 94 Green(1990) had previously pointed out that the 94 and 92levels were the optimal points to test for the cumulative

normal and logit functions respectively because of theminimum threshold estimate variability when one is test-ing at that point on the PF Other PFs can have optimalpoints at slightly lower levels but these are still above thelevels to which adaptive methods are normally directed

It is also useful to compare the 2AFC and yesno at thesame dcent rather than at their optimal points In that case fordcent values fixed at [1 2 3] the ratio of the 2AFC varianceto the yesno variance is [182 136 079] At low dcent thereis an advantage for the 2AFC method and that reverses athigh dcent as is expected from the optimal dcent being higher foryesno than for 2AFC In general one should run the yesno method at dcent levels above 20 (84 correct in the blankand signal intervals corresponds to dcent 5 2)

The equality of the optimal var(Y ) for 2AFC and yesno methods is not only surprising it seems paradoxicalmdashas can be seen by the following gedankenexperiment Sup-pose subjects close their eyes on the first interval of the2AFC trial and base their judgments on just the secondinterval as if they were doing a yesno task since blanksand stimuli are randomly intermixed Instead of sayingldquoyesrdquo or ldquonordquo they would respond ldquoInterval 2rdquo or ldquoInter-val 1rdquo respectively so that the 2AFC task would become ayesno task The paradox is that one would expect the vari-ance to be worse than for the eyes open 2AFC case sinceone is ignoring information However Equation29 showsthat the 2AFC and yesno methods have identical optimalvariance I suspect that the resolution of this paradox is thatin the yesno method with a stable criterion the optimalvariance occurs at a higher dcent value (315) than the 2AFCvalue (223) The dcent power law function that I chose doesnot saturate at large dcent levels giving a benefit to the yesno method

In order to check whether the paradox still holds with a more realistic PF (a d′ function that saturates at large d′), I now compare the 2AFC and yes/no variance using a Weibull function with a slope β. In order to connect 2AFC and yes/no, I need to use signal detection theory and the Stromeyer–Foley d′ function (Equation 20) with the "Weibull parameters" b = 1.06β, a = 0.614, and w = 1 − 0.39β. Following Equation 22, I pointed out that with these parameter values the difference between the 2AFC Weibull function and the 2AFC d′ function (converted to a probability ordinate) is less than 0.004. After the derivative, Equation 25, is appropriately modified, we find that the optimal 2AFC variance is 3.42/(Nβ²). For the yes/no case, the optimal variance is 4.02/(Nβ²). This result resolves the paradox. The optimal 2AFC variance is about 18% lower than the yes/no variance, so closing one's eyes in one of the intervals does indeed hurt.

We shouldn't forget that if one counts stimulus presentations, N_p, rather than trials, N, the optimal 2AFC variance becomes 6.84/(N_p β²), as opposed to 4.02/(N_p β²) for yes/no. Thus, for the Linschoten experiments, with each stimulus presentation being slow, the yes/no methodology is more accurate for a given number of stimulus presentations, even at low d′ values. Kaernbach (1990) compares 2AFC


and yesno adaptive methods with simulations and ac-tual experiments His abscissa is number of presenta-tions and his results are compatible with our findings

The objective yesno method that I have been discussingis inefficient since 50 of the trials are blanks Greaterefficiency is possible if several points on the PF are mea-sured (method of constant stimuli) with the number of blanktrials the same as at the other levels For example if fivelevels of stimuli were measured including blanks then theblanks would be only 20 of the total rather than 50 Forthe 2AFC paradigm the subject would still have to look at50 of the presentations being blanks The yesno eff i-ciency can be further improved by having the subjectrespond to the multiple stimuli with multiple ratings ratherthan a binary yesno response This rating scale methodof constant stimuli is my favorite method and I have beenusing it for 20 years (Klein amp Stromeyer 1980) The mul-tiple ratings allow one to measure the psychometric func-tion shape for dcents as large as 4 (well into the dipper regionwhere stimuli are slightly above threshold) It also allowsone to measure the ROC slope (presence of multiplicativenoise) Simulations of its benefits will be presented inKlein (2002)

Insight into how the number of responses affects threshold variance can be obtained by using the cumulative normal PF (Equation 17) considered by Kaernbach (2001b) and Miller and Ulrich (2001), with γ = 0. Typically this is the case of yes/no discrimination, but it can also be 2AFC discrimination with an ordinate going from 0% to 100%, as I have discussed in connection with Figure 3. For a cumulative normal PF with a binary response, and for a single stimulus level being tested, the threshold variance is (Equation 19) as follows:

var(thresh) = var(P) / (dP/dy)² = 2π σ² P(1 − P) exp(z²) / N,     (30)

from Equation 27 and forthcoming Equations 40 and 41. The optimal level to test is at P = 0.5, so the optimal variance becomes

var(thresh) = (π/2) × σ²/N.     (31)

If, instead of a binary response, the observer gives an analog response (many, many rating categories), then the variance is the familiar connection between standard deviation and standard error:

var(thresh) = σ²/N.     (32)

With four response categories, the variance is 1.19σ²/N, which is closer to the analog than to the binary case. If stimuli with multiple strengths are intermixed (method of constant stimuli), then the benefit of going from a binary response to multiple responses is greater than that indicated in Equations 31–32 (Klein, 2002).
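For the binary-response case, the optimum in Equation 31 can be confirmed directly; the sketch below (my own check) evaluates Equation 30 over a range of testing levels.

% Sketch: Equation 30 as a function of the tested level, and its minimum (Equation 31).
sigma = 1;  N = 100;
z = -1.5:0.01:1.5;                                  % tested level in z units
P = 0.5*(1 + erf(z/sqrt(2)));
varBinary = 2*pi*sigma^2*P.*(1-P).*exp(z.^2)/N;     % Equation 30
min(varBinary)*N/sigma^2                            % pi/2 = 1.57, reached at z = 0 (P = .5)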

A brief list of other problems with the 2AFC method ap-plied to detection comprises the following

1 2AFC requires the subject to memorize and thencompare the results of two subjective responses It is cog-nitively simpler to report each response as it arrives Sub-jects without much experience can easily make mistakes(lapses) simply becoming confused about the order of thestimuli Klein (1985) discusses memory load issues rele-vant to 2AFC

2 The 2AFC methodology makes various types ofanalysis and modeling more difficult because it requiresthat one average over all criteria The response variable in2AFC is the Interval 2 activation minus the Interval 1 ac-tivation (see the abscissa in Figure 2) This variable goesfrom minus infinity to plus infinity whereas the yesnovariable goes from zero to infinity It therefore seems moredifficult to model the effects of probability summation anduncertainty for 2AFC than for yesno

3 Models that relate psychophysical performance to un-derlying processes require information about how the noisevaries with signal strength (multiplicative noise) The rat-ing scale method of constant stimuli not only measures dcentas a function of contrast it also is able to provide an esti-mate of how the variance of the signal distribution (theROC slope) increases with contrast The 2AFC methodlacks that information

4 Both the yesno and 2AFC methods are supposed toeliminate the effects of bias Earlier I pointed out that in myrecent 2AFC discrimination experiments when the stim-ulus is near threshold I find I have a strong bias in favor ofresponding ldquoStimulus 2rdquo Until I learned to compensatefor this bias (not typically done by naiumlve subjects) my dcentwas underestimated As I have discussed it is possible tocompensate for this bias but that is rarely done

Improving the Brownian (Up–Down) Staircase
Adaptive methods with a goal of placing trials at an

optimal point on the PF are efficient Leek (2001) pro-vides a nice overview of these methods There are twogeneral categories of adaptive methods those based onmaximum (or mean) likelihood and those based on sim-pler staircase rules The present section is concernedwith the older staircase methods which I will callBrownian methods I used the name ldquoBrownianrdquo becauseof the upndashdown staircasersquos similarity to one-dimensionalBrownian motion in which the direction of each step israndom The familiar upndashdown staircase is Brownianbecause at each trial the decision about whether to go upor down is decided by a random factor governed by prob-ability P(x) I want to make this connection to Brownianmotion because I suspect that there are results in thephysics literature that are relevant to our psychophysicalstaircases I hope these connections get made

A Brownian staircase has a minimal memory of the runhistory it needs to remember the previous step (includingdirection) and the previous number of reversals or numberof trials (used for when to stop and for when to decreasestep size) A typical Brownian rule is to decrease signalstrength by one step (like 1 dB) after a correct responseand to increase it three steps after an incorrect response (a


one downndashthree up rule) I was pleased to learn recentlythat his rule works nicely for an objective yesno task(Kaernbach 1990) as well as 2AFC as discussed earlier

In recent years likelihood methods (Pentland 1980Watson amp Pelli 1983) have become popular and havesomewhat displaced Brownian methods There is a wide-spread feeling that likelihood methods have substantiallysmaller threshold variance than do staircases in whichthresholds are obtained by simply averaging the levelstested Given the importance of this issue for informingresearchers about how best to measure thresholds it issurprising that there have been so few simulations com-paring staircases and optimal methods Pentlandrsquos (1980)results showing very large threshold variance for a stair-case method are so dramatically different from the re-sults of my simulations that I am skeptical The best pre-vious simulations of which I am aware are those of Green(1990) who compared the threshold variance from sev-eral 2AFC staircase methods with the ideal variance(Equation 24) He found that as long as the staircase ruleplaced trials above 80 correct the ratio of observed toideal threshold variance was about 14 This value is thesquare of the inverse of Greenrsquos finding that the ratio ofthe expected to the observed sigma is about 85

In my own simulations (Klein, 2002), I found that as long as the final step size was less than 0.2σ, the threshold variance from simply averaging over levels was close to the optimal value. Thus the staircase threshold variance is about the same as the variance found by other optimal methods. One especially interesting finding was that the common method of averaging an even number of reversal points gave a variance that was about 10% higher than averaging over all levels. This result made me wonder how the common practice of averaging reversals ever got started. Since Green (1990) averaged over just the reversal points, I suspect that his results are an overestimate of the staircase variance. In addition, since Green specifies step size in decibels, it is difficult for me to determine whether his final step size was small enough. Since the threshold variance is inversely related to slope, it is important to relate step size to slope rather than to decibels. The most natural reference unit is σ, the standard deviation associated with the cumulative normal. In summary, I suggest that this general feeling of distrust of the simple Brownian staircase is misguided and that simple staircases are often as good as likelihood methods. In the section Summary of Advantages of Brownian Staircase, I point out several advantages of the simple staircase over the likelihood methods.
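For readers who want to try this themselves, the following is a minimal Brownian staircase simulation in the spirit of these comparisons. It is my own sketch, assuming a 2AFC cumulative normal PF with σ = 1 and a one-down–three-up rule; the threshold is estimated by averaging all tested levels after discarding the early trials.

% Sketch: a one-down/three-up 2AFC Brownian staircase with threshold estimated
% by averaging the tested levels (early trials discarded).
sigma = 1;  ntrials = 200;  stepSize = 0.1*sigma;
level = 2*sigma;  levels = zeros(1,ntrials);
for t = 1:ntrials
  Pc = 0.5 + 0.25*(1 + erf(level/(sigma*sqrt(2))));   % 2AFC PF: 0.5 + 0.5*Phi(level/sigma)
  if rand < Pc,  level = level - stepSize;            % one step down after a correct response
  else           level = level + 3*stepSize;          % three steps up after an incorrect response
  end
  levels(t) = level;
end
threshEstimate = mean(levels(21:end))                 % converges near the 75% point (level = 0)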

A qualification should be made to the claim above thataveraging levels of a simple staircase is as good as a fancylikelihood method If the staircase method uses a too smallstep size at the beginning and if the starting point is farfrom threshold then the simple staircase can be inefficientin reaching the threshold region However at the end ofthat run the astute experimenter should notice the largediscrepancy between the starting point and the thresholdand the run should be redone with an improved startingpoint or with a larger initial step size that will be reducedas the staircase proceeds

Increase number of 2AFC response categories In-dependent of the type of adaptive method used the 2AFCtask has a flaw whereby a series of lucky guesses at lowstimulus levels can erroneously drive adaptive methods tolevels that are too low Some adaptive methods with a smallnumber of trials are unable to fully recover from a string oflucky guesses early in the run One solution is to allow thesubject to have more than two responses For reasons thatI have never understood people like to have only two re-sponse categories in a 2AFC task Just as I am reluctant totake part in 2AFC experiments I am even more reluctantto take part in two response experiments The reason issimple I always feel that I have within me more than onebit of information after looking at a stimulus I usually haveat least four or more subjective states (2 bits) for a yesnotask HN LN LY HY where H and L are high and lowconfidence and N and Y are did not and did see it For a2AFC my internal states are H1 L1 L2 H2 for high andlow confidence on Intervals 1 and 2 Kaernbach (2001a) inthis special issue does an excellent job of justifying thelow-confidence response category He provides solid ar-guments simulations and experimental data on how alow-confidence category can minimize the ldquolucky guessrdquoproblem

Kaernbach (2001a) groups the L1 and L2 categoriestogether into a ldquodonrsquot knowrdquo category and argues per-suasively that the inclusion of this third response can in-crease the precision and accuracy of threshold estimateswhile also leaving the subject a happier person He callsit the ldquounforced choicerdquo method Most of his article con-cerns the understandable fear of introducing a responsebias exactly what 2AFC was invented to avoid He has de-veloped an intelligent rule for what stimulus level shouldfollow a ldquodonrsquot knowrdquo response Kaernbach shows withboth simulations and real data that with his method thispotential response bias problem has only a second-ordereffect on threshold estimates (similar to the 2AFC inter-val bias discussed around Equation 7) and its advantagesoutweigh the disadvantages I urge anyone using the 2AFCmethod to read it I conjecture that Kaernbachrsquos method isespecially useful in trimming the lower tail of the distri-bution of threshold estimates and also in reducing the in-terval bias of 2AFC tasks since the low confidence ratingwould tend to be used for trials on which there is the mostuncertainty

Although Kaernbach has managed to convince me ofthe usefulness of his three-response method he might havetrouble convincing others because of the response biasquestion I should like to offer a solution that might quellthe response bias fears Rather than using 2AFC with threeresponses one can increase the number of responses tofourmdashH1 L1 L2 H2mdashas discussed two paragraphsabove The L rating can be used when subjects feel theyare guessing

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A 5-up–1-down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct


of 5/6 = 83.33%. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low-confidence rating is that when one is below threshold, giving a low-confidence rating will prevent one from long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.
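In code, the whole rule is a lookup table. The fragment below is a hypothetical sketch (the variable names are mine) of the four-response update, together with a check that pure guessing with a low-confidence rating shifts the staircase by +2 levels on average.

% Sketch: level update for the four-response staircase (HC, LC, LI, HI).
shift = [-1 +1 +3 +5];                 % step sizes suggested in the text
% level = level + shift(response);     % response is 1-4 for HC, LC, LI, HI
pGuess = [0 .5 .5 0];                  % guessing: low confidence, correct half the time
averageShift = pGuess*shift'           % = +2, matching the text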

Another reason not to be fearful of the multiple-response2AFC method is that after the run is finished one can es-timate threshold by using a maximum likelihood proce-dure rather than by averaging the run locations Standardlikelihood analysis would have to be modified for Kaern-bachrsquos three-response method in order to deal with theldquodonrsquot knowrdquo category For the suggested four-responsemethod one could simply ignore the confidence ratings forthe likelihood analysis My simulations discussed earlierin this section and Greenrsquos (1990) reveal that averagingover the levels (excluding a number of initial levels toavoid the starting point bias) has about as good a preci-sion and accuracy as the best of the likelihood methodsIf the threshold estimates from the likelihood method andfrom the averaging levels method disagree that run couldbe thrown out

Summary of advantages of Brownian staircaseSince there is no substantial variance advantage to thefancier likelihood methods one can pay more attention toa number of advantages associated with simple staircases

1 At the beginning of a run likelihood methods mayhave too little inertia (the sequential stimulus levels jumparound easily) unless a ldquostiff rdquo prior is carefully chosenBy inertia I refer to how difficult it is for the next level todeviate from the present level At the beginning of a runone would like to familiarize the subject with the stimu-

lus by presenting a series of stimuli that gradually increasein difficulty This natural feature of the Brownian stair-case is usually missing in QUEST-like methods Theslightly inefficient first few trials of a simple staircasemay actually be a benefit since it gives the subject somepractice with slightly visible stimuli

2 Toward the end of a maximum likelihood run one canhave the opposite problem of excessive inertia wherebyit is difficult to have substantial shifts of test contrastThis can be a problem if nonstationary effects are presentFor example thresholds can increase in mid-run owingto fatigue or to adaptation if a suprathreshold pedestal ormask is present In an old-fashioned Brownian staircasewith moderate inertia a quick look at the staircase trackshows the change of threshold in mid-run That observa-tion can provide an important clue to possible problemswith the current procedures A method that reveals non-stationarity is especially important for difficult subjectssuch as infants or patients in whom fatigue or inatten-tion can be an important factor

3 Several researchers have told me that the simplicity ofthe staircase rules is itself an advantage It makes writingprograms easier It is nicer for teaching students It alsosimplifies the uncertain business of guessing likelihoodpriors and communicating the methodology when writ-ing articles

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learnmore about the PF than just the single threshold valueEstimating the PF slope is the first step in this directionTo estimate slope one must test the PF at more than onepoint Methods are available to do this (Kaernbach 2001bKontsevich amp Tyler 1999) and one could even adapt sim-ple Brownian staircase procedures to give better slope es-timates with minimum harm to threshold estimates (seethe comments in Strasburger 2001b)

Probably the best method for getting both thresholds andslopes is the Y (Psi) method of Kontsevich and Tyler(1999) The Y method is not totally novel it borrows tech-niques from many previous researchersmdashas Kontsevichand Tyler point out in a delightful paragraph that both clar-ifies their method and states its ancestry This paragraphsummarizes what is probably the most accurate and preciseapproach to estimating thresholds and slope so I reproduceit here

To summarize the Y method is a combination of solutionsknown from the literature The method updates the poste-rior probability distribution across the sampled space ofthe psychometric functions based on Bayesrsquo rule (Hall1968 Watson amp Pelli 1983) The space of the psychome-tric functions is two-dimensional (King-Smith amp Rose1997 Watson amp Pelli 1983) Evaluation of the psycho-metric function is based on computing the mean of theposterior probability distribution (Emerson 1986 King-Smith et al 1994) The termination rule is based on thenumber of trials as the most practical option (King-Smith


et al 1994 Watson amp Pelli 1983) The placement of eachnew trial is based on one-step ahead minimum search(King-Smith 1984) of the expected entropy cost function(Pelli 1987)

A simplified rough description of the Y trial placementfor 2AFC runs is that initially trials are placed at about the90 correct level to establish a solid threshold Oncethreshold has been roughly established trials are placed atabout the 90 and the 70 levels to pin down the slopeThe precise levels depend on the assumed PF shape

Before one starts measuring slopes however one mustfirst be convinced that knowledge of slope can be impor-tant It is so for several reasons

1 Parametric fitting procedures either require knowl-edge of slope b or must have data that adequately con-strain the slope estimate Part of the discussion that fol-lows will be on how to obtain reliable threshold estimateswhen slope is not well known A tight knowledge of slopecan minimize the error of the threshold estimate

2 Strasburger (2001a 2001b) gives good argumentsfor the desirability of knowing the shape of the PF He isinterested in differences between foveal and peripheralvisual processing and he uses the PF slope as an indica-tion of differences

3 Slopes are different for different stimuli and thiscan lead to misleading results if slope is ignored It mayhelp to describe my experience with the Modelfest pro-ject (Carney et al 2000) to make this issue concreteThis continuing project involves a dozen laboratoriesaround the world We have measured detection thresh-olds for a wide variety of visual stimuli The data areused for developing an improved model of spatial visionSome of the stimuli were simple easily memorized lowspatial frequency patterns that tend to have shallowerslopes because a stimulus-known-exactly template canbe used efficiently and with minimal uncertainty On theother hand tiny Modelfest stimuli can have a good dealof spatial uncertainty and are therefore expected to pro-duce PFs with steeper slopes Because of the slope changethe pattern of thresholds will depend on the level atwhich threshold is measured When this rich dataset isanalyzed with f ilter models that make estimates formechanism bandwidths and mechanism pooling thesedifferent slope-dependent threshold patterns will lead todifferent estimates for the model parameters Several ofus in the Modelfest data gathering group argued that weshould use a data-gathering methodology that wouldallow estimation of PF slope of each of the 45 stimuliHowever I must admit that when I was a subject my pa-tience wore thin and I too chose the quicker runs Al-though I fully understand the attraction of short runs fo-cused on thresholds and ignoring slope I still think thereare good reasons for spreading trials out For one thinginterspersing trials that are slightly visible helps the sub-ject maintain memory of the test pattern

4 Slopes can directly resolve differences between mod-els In my own research for example my colleagues and

I are able to use the PF slope to distinguish long-range fa-cilitation that produces a low-level excitation or noise re-duction (typical with a cross-orientation surround) from ahigher level uncertainty reduction (typical with a same-orientation surround)

Throwing-Out Rules, Goodness-of-Fit
There are occasions when a staircase goes sour. Most

often this happens when the subjectrsquos sensitivity changesin the middle of a run possibly because of fatigue It istherefore good to have a well-specified rule that allowsnonstationary runs to be thrown out The throwing-out rulecould be based on whether the PF slope estimate is toolow or too high Alternatively a standard approach is tolook at the chi-square goodness-of-f it statistic whichwould involve an assumption about the shape of the un-derlying PF I have a commentary in Section III regard-ing biases in the chi-square metric that were found byWichmann and Hill (2001a) Biases make it difficult touse standard chi-square tables The trouble with this ap-proach of basing the goodness-of-fit just on the cumulateddata is that it ignores the sequential history of the run TheBrownian upndashdown staircase makes the sequential historyvividly visible with a plot of sequentially tested levelsWhen fatigue or adaptation sets in one sees the tested lev-els shift from one stable point to another Leek Hanna andMarshall (1991) developed a method of assessing the non-stationarity of a psychometric function over the course ofits measurement using two interleaved tracks EarlierHall (1983) also suggested a technique to address the sameproblem I encourage further research into the develop-ment of a non-stationarity statistic that can be used to dis-card bad runs This statistic would take the sequential his-tory into account as well as the PF shape and the chi-square

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation. Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section, I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods, which is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits
The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals, in order to avoid the starting level bias.

Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that, in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′ function, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF,

pdf(i) = [P(i) - P(i-1)] / [s(i) - s(i-1)],    (33)

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i - 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

m_r = Σ_i pdf(i) [s(i)^(r+1) - s(i-1)^(r+1)] / (r + 1).    (34)

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

f[s(i)] - f[s(i-1)] = f′(s_mid) [s(i) - s(i-1)],    (35)

where f(s) = s^(r+1) and f′(s_mid) = (r + 1) s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i-1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

m_r = Σ_i pdf(i) s_mid^r Δs,    (36)

where Δs = s(i) - s(i-1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

m_0 = Σ_i pdf(i) Δs(i) = P(i_max) - P(i_min),    (37)

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P(i_min), must be zero, as Miller and Ulrich (2001) point out.


The next simplest SK moment is for r = 1:

m_1 = Σ_i pdf(i) [s(i)^2 - s(i-1)^2] / 2 = Σ_i [P(i) - P(i-1)] [s(i) + s(i-1)] / 2,    (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or of PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).
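As a minimal numerical illustration of Equations 33, 37, and 38, the sketch below (my own, not the Miller and Ulrich code; the levels and probabilities are made up) computes the SK pdf, its area, and its centroid:

```matlab
% Spearman-Karber pdf, area, and centroid from levels s(i) and probabilities P(i)
s = [1 2 3 4 5];                                     % stimulus levels (hypothetical)
P = [0 .12 .50 .88 1];                               % probability correct at each level (hypothetical)
pdfSK = diff(P) ./ diff(s);                          % Equation 33: pdf for each interval (i-1, i)
m0 = sum(pdfSK .* diff(s));                          % Equation 37: area = P(max) - P(min) = 1
m1 = sum(diff(P) .* (s(1:end-1) + s(2:end)) / 2);    % Equation 38: centroid (here 3)
fprintf('area m0 = %.2f, centroid m1 = %.2f\n', m0, m1)
```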

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: if this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method of constant stimuli data in which the PF goes from 0% to 100%. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels, and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, -.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to these unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than 33%, and P2 can be any number greater than 67%. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide set of possible alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the

center of the PF is given by Equation 38 to be μ1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ1 = 2.97. However, given the data, with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ1 = 4. Thus we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
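The numbers in this example are easy to reproduce. The sketch below is my own illustration; it uses a simple weighted pool-adjacent-violators loop rather than Miller and Ulrich's own monotonizing code, and it returns the three estimates quoted above (2.00, 2.97, and roughly 4):

```matlab
% Reproduce the adaptive-placement example: 5 levels, very unequal numbers of trials
s = 1:5;  N = [1 1 1 98 1];  C = [0 1 1 49 1];                   % trials and number correct
centroid = @(P) sum(diff(P) .* (s(1:end-1) + s(2:end)) / 2);     % Equation 38
fprintf('raw SK centroid = %.2f\n', centroid(C ./ N))            % 2.00
% weighted monotonizing: pool adjacent levels until the proportions are nondecreasing
c = C;  n = N;  len = ones(size(N));
i = 1;
while i < numel(c)
    if c(i)/n(i) > c(i+1)/n(i+1)                       % violation: pool levels i and i+1
        c(i) = c(i) + c(i+1);  n(i) = n(i) + n(i+1);  len(i) = len(i) + len(i+1);
        c(i+1) = [];  n(i+1) = [];  len(i+1) = [];
        i = max(i - 1, 1);                             % back up; pooling may create a new violation
    else
        i = i + 1;
    end
end
Pmono = repelem(c ./ n, len);                          % expand pooled blocks back to all 5 levels
fprintf('monotonized SK centroid = %.2f\n', centroid(Pmono))          % 2.97
fprintf('trial-weighted mean level = %.2f\n', sum(s .* N) / sum(N))   % about 4
```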

A similar problem can occur even with the method of constant stimuli with equal numbers of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.
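A one-line calculation makes the weighting argument concrete (the 20 trials per level below are an arbitrary choice of mine, not a value from the text):

```matlab
% Binomial standard error of an observed proportion, sqrt(P*(1-P)/N), at two 2AFC levels
N = 20;  P = [.55 .95];
SE = sqrt(P .* (1 - P) / N)       % roughly .11 at 55% correct versus .05 at 95% correct
```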

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%-100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being 1% and 99%. The probabilities at the sample points are P(i) = 1%, 12.22%,


50%, 87.78%, and 99%, corresponding to z scores of z(i) = -2.328, -1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between .28 and .29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of .071. However, their Gaussian had a standard deviation of .25, whereas my z-score units correspond to a standard deviation of unity. Thus, their standard error must be multiplied by 4, giving a standard error of .284, in excellent agreement with my value. So at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.
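A stripped-down Monte Carlo along these lines is sketched below. It is not the simulation code that Miller and I used: in particular, it omits the monotonizing step (which has only a small effect on the spread) and estimates the location with Equation 38 directly.

```matlab
% Monte Carlo spread of the SK location estimate for 5 levels, 8 trials per level
z = (-2:2) * 1.164;                              % the five levels, in sd units
P = 0.5 + 0.5 * erf(z / sqrt(2));                % .01, .1222, .50, .8778, .99
Ntrials = 8;  Nruns = 10000;
loc = zeros(Nruns, 1);
for r = 1:Nruns
    Pobs = sum(rand(Ntrials, numel(z)) < repmat(P, Ntrials, 1)) / Ntrials;  % binomial data
    loc(r) = sum(diff(Pobs) .* (z(1:end-1) + z(2:end)) / 2);                % Equation 38
end
fprintf('SD of the SK location estimate = %.3f\n', std(loc))  % near the .28-.29 quoted above
```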

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

var(PSE_i) = var(P_i) / (dP_i/dx_i)^2,    (39)

where var(P) is given by Equation 27 and the PF slope is

dP_i/dx_i = (dP_i/dz_i) (dz_i/dx_i),    (40)

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

dP_i/dz_i = exp(-z_i^2 / 2) / (2π)^0.5,    (41)

from Equation 3A. By combining these equations, one gets

var(PSE_i) = 2π σ^2 P_i (1 - P_i) exp(z_i^2) / N_i.    (42)

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ^2 π / (2 N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

1 / var_tot = Σ_i (1 / var_i).    (43)

Finally, upon plugging in the five values of z_i, ranging from -2.328 to +2.328, and P_i, ranging from 1% to 99%, the standard error of the PF location is

SE = (var_tot)^0.5 = 0.284.    (44)

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method, and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal numbers of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
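The analytic calculation of Equations 39-43 takes only a few lines; the sketch below (mine, for the same five levels and 8 trials per level) reproduces the .284 value of Equation 44.

```matlab
% Analytic standard error of the PF location (Equations 42-44)
z = (-2:2) * 1.164;  N = 8;  sigma = 1;
P = 0.5 + 0.5 * erf(z / sqrt(2));
varPSE = 2 * pi * sigma^2 * P .* (1 - P) .* exp(z.^2) / N;   % Equation 42, one value per level
varTot = 1 / sum(1 ./ varPSE);                               % Equation 43
fprintf('SE of the PF location = %.3f\n', sqrt(varTot))      % 0.284, as in Equation 44
```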

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

var_tot = SE^2 = 3.24 / N_tot.    (45)

This value is more than twice the variance of π/(2 N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that, of the five stimulus levels, the two at P = 1% and 99% are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = 50%). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot


claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit
Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X^2 statistic, as given by

X^2 = Σ_i [p(data_i) - P_i]^2 / var(P_i),    (46)

where p(data_i) and P_i are the percent correct of the datum and of the PF at level i. The binomial var(P_i) is our old friend

var(P_i) = P_i (1 - P_i) / N_i,    (47)

where N_i is the number of trials at level i. The X^2 statistic is of interest because the binomial distribution is asymptotically a Gaussian, so that the X^2 statistic asymptotically has a chi-square distribution. Alternatively, one can write X^2 as

X^2 = Σ_i residual_i^2,    (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

residual_i = [p(data_i) - P_i] / SE_i.    (49)

The second method is to use the normalized log likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2 Σ_i N_i { p(data_i) ln[p(data_i) / P_i] + [1 - p(data_i)] ln[(1 - p(data_i)) / (1 - P_i)] }.    (50)

Similar to the X^2 statistic, LL asymptotically also has a chi-square distribution.
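For readers who want to compute both statistics, the sketch below evaluates Equations 46-47 and 50 for a single data set; the trial counts, observed proportions, and fitted values are made-up assumptions chosen only to show the computation.

```matlab
% X^2 (Equations 46-47) and the log likelihood deviance LL (Equation 50) for one data set
N    = [20 20 20 20];                  % trials per level (hypothetical)
pObs = [.55 .65 .90 .95];              % observed proportion correct (hypothetical)
Pfit = [.60 .72 .85 .93];              % fitted PF values at the same levels (hypothetical)
X2 = sum((pObs - Pfit).^2 ./ (Pfit .* (1 - Pfit) ./ N));
LL = 2 * sum(N .* (pObs .* log(pObs ./ Pfit) + ...
                   (1 - pObs) .* log((1 - pObs) ./ (1 - Pfit))));
fprintf('X2 = %.2f, deviance LL = %.2f\n', X2, LL)
```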

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, and γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

prob_chisq = Gamma(df/2, χ^2/2),    (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that, for a reasonably large number of degrees of freedom (like df > 10), the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2 df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z score.
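In current Matlab these checks are one-liners; gammainc(x, a) is the regularized (lower) incomplete gamma function, so the arguments of Equation 51 appear in the order (χ²/2, df/2):

```matlab
% Sanity checks on Equation 51 for df = 18
df = 18;
gammainc(df/2, df/2)            % Gamma(9, 9)  = 0.54, close to the asymptotic 0.5
gammainc((df + 6)/2, df/2)      % Gamma(9, 12) = 0.845, close to a unity z score (0.841)
```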

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X^2 and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X^2 for a PF ranging from P = 50% to 100% and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

X_i^2 = N_i [p(data_i) - P_i]^2 / [P_i (1 - P_i)] = (O_i - E_i)^2 / [E_i (1 - P_i)],    (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i^2 is obtained by doing a weighted sum over all possible


outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

wt_i = [N_i! / (O_i! (N_i - O_i)!)] P_i^O_i (1 - P_i)^(N_i - O_i).    (53)

The expected value <X_i^2> is

<X_i^2> = Σ_{O_i} wt_i (O_i - E_i)^2 / [E_i (1 - P_i)].    (54)

A bit of algebra based on Σ_{O_i} wt_i = 1 gives a surprising result:

<X_i^2> = 1.    (55)

That is, each term of the X^2 sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X^2. It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

<LL_i> = 2 Σ_{O_i} wt_i { O_i ln(O_i / E_i) + (N_i - O_i) ln[(N_i - O_i) / (N_i - E_i)] }.    (56)

Unfortunately, there is no simple way to do the summations as was done for X^2, so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

<LL_i> = 2 [.25 (2 ln 2) + .25 (2 ln 2)] = 2 ln 2 = 1.386.    (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the 1 - P_i factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85, the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.
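The two limiting cases just described can be verified directly with the kind of enumeration used in Matlab Code 2. The short sketch below (not the Appendix code itself) computes the expected per-level contributions <X_i^2> and <LL_i> for any N_i and P_i; with N_i = 2 and P_i = .5 it returns 1.000 and 1.386, in agreement with Equations 55 and 57.

```matlab
% Expected per-level contribution to X^2 and to the deviance LL, by enumeration
Ni = 2;  Pi = 0.5;  Ei = Ni * Pi;
meanX2 = 0;  meanLL = 0;
for Oi = 0:Ni
    wt = nchoosek(Ni, Oi) * Pi^Oi * (1 - Pi)^(Ni - Oi);          % Equation 53
    meanX2 = meanX2 + wt * (Oi - Ei)^2 / (Ei * (1 - Pi));        % Equations 52 and 54
    term1 = 0;  term2 = 0;                                       % 0*log(0) is taken as 0
    if Oi > 0,  term1 = Oi * log(Oi / Ei);  end
    if Oi < Ni, term2 = (Ni - Oi) * log((Ni - Oi) / (Ni - Ei));  end
    meanLL = meanLL + wt * 2 * (term1 + term2);                  % Equation 56
end
fprintf('<X2> = %.3f, <LL> = %.3f\n', meanX2, meanLL)            % 1.000 and 1.386
```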

I have been wondering about the relative merits of X^2 and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X^2, but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a


Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X^2 or log likelihood deviance) at each level. In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than the sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X^2 metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53-54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1-16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X^2. The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X^2 statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X^2. First, they note that the maximum likelihood parameters will not minimize X^2. This is not a bothersome objection, since if one does a goodness-of-fit with X^2, one would also do the parameter search by minimizing X^2 rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X^2, can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use chi-square for embedded models (Press et al., 1992), and I cannot see that X^2 should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance. (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39-43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method 1 by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. They show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a nonzero lapse parameter of λ = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels, with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                   λ = .01         λ = .0001       λ = .01         λ = .0001
Psychometric       No Lapse        No Lapse        Lapse           Lapse
Function           L       χ²      L       χ²      L       χ²      L       χ²
Probit             1.05    1.05    1.00    0.99    1.05    1.05    0.36    0.34
Logit              1.76    1.76    1.66    1.66    1.76    1.76    0.84    0.72
Weibull            1.01    1.01    0.97    0.97    1.01    1.01    0.26    0.26

Note. The columns are for different lapse rates (λ) and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values of each pair correspond to likelihood maximization (L) and chi-square (χ²) minimization, respectively.

maximization, corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

P(z) = .5 + .5 erf(z / 2^0.5)    (probit, as in Equation 3B),    (58A)

P(z) = 1 / [1 + exp(-z)]    (logit),    (58B)

P(z) = 1 - exp[-exp(z)]    (Weibull),    (58C)

with z = p2 (y - p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) when there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter; (2) when the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50); (3) when λ = .0001, the PF slope was reduced dramatically when a single lapse was present; and (4) for all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns, the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
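A bare-bones version of this kind of fit is sketched below. It is not Matlab Code 3: it fits only the probit case by likelihood maximization, fixes the lapse rate, uses the symmetric asymptotes (γ = λ) described above, and starts the search at the slope value of 0.3 mentioned in the text.

```matlab
% Maximum likelihood fit of threshold p1 and slope p2 with a fixed, symmetric lapse rate
x = [0 1 6];  N = [50 50 50];  C = [25 42 49];          % levels, trials, number correct (lapse case)
lapse = .01;                                            % fixed lapse parameter lambda
probit = @(z) 0.5 + 0.5 * erf(z / sqrt(2));             % Equation 58A
PF = @(p) lapse + (1 - 2*lapse) * probit(p(2) * (x - p(1)));     % gamma = lambda (symmetric)
negLL = @(p) -sum(C .* log(PF(p)) + (N - C) .* log(1 - PF(p)));
pBest = fminsearch(negLL, [0 0.3]);                     % search over [threshold, slope]
fprintf('threshold = %.2f, slope = %.2f\n', pBest(1), pBest(2))
```

Rerunning the same sketch with lapse = .0001 should reproduce the drop in the fitted slope shown in the fourth pair of columns of Table 2.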

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally, the weighting is based solely on the expected analytic PF rather than on the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods
In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope, b, of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method, using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = .0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(N β²) = .058/N (Klein, 2001). For a run of 25 trials, the variance is var = .058/25 = .0023. The standard error is SE = sqrt(var) ≈ .05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
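The arithmetic in that variance estimate is easy to verify with a two-line check (only the β and N values quoted above are used):

```matlab
% Check of the threshold variance quoted above for Strasburger's parameters
beta = 5.5;  N = 25;
varThresh = 1.75 / (N * beta^2);     % asymptotic 10AFC threshold variance, 1.75/(N*beta^2)
SE = sqrt(varThresh)                 % about 0.048 log units, i.e., roughly a 5% standard error
```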

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing, there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2-4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted, and then, rather than averaging the PFs of each run, the total numbers correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large


upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair, I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias: the PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from P = 20% to 80% as the stimulus goes from Level 8 to Level 12. There was no dependence on the type of averaging.
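The shift-induced steepening is easy to demonstrate with a Monte Carlo stand-in for the enumeration of Matlab Code 4. The sketch below is mine: it uses random 10-trial Brownian runs rather than a full enumeration, pools the raw data across runs, and shifts each run by its mean tested level.

```matlab
% Shift-induced slope bias: flat PF at P = .5, 10-trial Brownian staircases starting at level 10
nRuns = 20000;  nTrials = 10;  start = 10;  levels = 1:20;
corrTot = zeros(2, numel(levels));  totTot = zeros(2, numel(levels));  % row 1: no shift, row 2: shift
for r = 1:nRuns
    lev = zeros(1, nTrials);  lev(1) = start;
    resp = rand(1, nTrials) < 0.5;                     % flat PF: 50% correct at every level
    for t = 1:nTrials-1
        lev(t+1) = lev(t) - (2*resp(t) - 1);           % down one step if correct, up one if not
    end
    shifted = round(lev - mean(lev)) + start;          % align this run on its mean tested level
    for k = 1:nTrials
        corrTot(1, lev(k))     = corrTot(1, lev(k))     + resp(k);
        totTot(1, lev(k))      = totTot(1, lev(k))      + 1;
        corrTot(2, shifted(k)) = corrTot(2, shifted(k)) + resp(k);
        totTot(2, shifted(k))  = totTot(2, shifted(k))  + 1;
    end
end
PFnoShift = corrTot(1,:) ./ max(totTot(1,:), 1);       % stays near .5 at every tested level
PFshift   = corrTot(2,:) ./ max(totTot(2,:), 1);       % rises steeply through the starting level
```

Plotting PFnoShift and PFshift against the levels should show the flat and steepened PFs in the spirit of panels a and d of Figure 5.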

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on the left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)] / [1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) - z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but that objective yes/no adaptive methods are not common. By "objective," I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and


signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log-log slope of the d′ function, β. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs

1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox, whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001a) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, roughly the first four reversals, are thrown out) than to average an even number of reversals. Both of these findings were surprising.
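For concreteness, the two estimators being compared can be written in a few lines of Matlab ('levels' is the vector of tested levels from an up–down run, such as the objective yes/no sketch given earlier; the number of discarded trials and the number of reversals kept are arbitrary choices for the example):

% Two ways to read a threshold off a finished up-down staircase record.
nDiscard = 20;                                        % throw out the early trials (arbitrary)
thresholdAllLevels = mean(levels(nDiscard+1:end));    % estimator 1: average all tested levels

steps = diff(levels);                                 % signed step taken after each trial
revTrials = find(sign(steps(2:end)) ~= sign(steps(1:end-1))) + 1;  % trials where direction reversed
revTrials = revTrials(end-7:end);                     % keep the last 8 reversals (assumes at least 8)
thresholdReversals = mean(levels(revTrials));         % estimator 2: average an even number of reversals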

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.
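For illustration, here is one simple form such a bootstrap can take, written as a self-contained Matlab function. The psychometric function shape, the stimulus levels, the grid-search fit, and all numerical values are my own assumptions for the sketch, not a procedure prescribed by Foster and Bischof (1991) or Wichmann and Hill (2001b).

function thresholdSE = bootstrapThresholdSketch
% Parametric bootstrap of a 2AFC threshold estimate (all numbers are illustrative).
levels = [0.5 0.7 1.0 1.4 2.0];           % stimulus levels in threshold units (assumed)
N = 40 * ones(1, 5);                      % trials per level (assumed, equal here)
obs = binoSample(weib(levels, 1), N);     % simulated counts correct from a true threshold of 1

alphaGrid = 0.5:0.01:2;                   % coarse grid of candidate thresholds
aHat = gridFit(obs, N, levels, alphaGrid);% threshold estimate from the "data"

nBoot = 400; bootA = zeros(1, nBoot);
pFit = weib(levels, aHat);
for i = 1:nBoot
    bootA(i) = gridFit(binoSample(pFit, N), N, levels, alphaGrid);  % refit each resampled data set
end
thresholdSE = std(bootA);                 % bootstrap standard error of the threshold
end

function p = weib(x, alpha)
p = 1 - 0.5 * exp(-(x / alpha).^2);       % 2AFC Weibull with slope beta = 2 (assumed shape)
end

function k = binoSample(p, N)
k = sum(rand(max(N), numel(p)) < repmat(p, max(N), 1));   % binomial draws (equal N assumed)
end

function aBest = gridFit(k, N, x, grid)
bestLL = -inf; aBest = grid(1);
for a = grid                               % maximum-likelihood grid search over threshold only
    p = 1 - 0.5 * exp(-(x / a).^2);
    ll = sum(k .* log(p) + (N - k) .* log(1 - p));
    if ll > bestLL, bestLL = ll; aBest = a; end
end
end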

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs

1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection, because of the asymmetry of the upper and lower asymptote. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
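To make the method concrete, a minimal Matlab sketch of the basic Spearman–Kärber estimate is given below. The data are invented, the tails are pinned at 0 and 1 with an arbitrary one-unit extension of the abscissa, and no trial-count weighting is applied, which is exactly the limitation noted above.

% Spearman-Karber estimate of the 50% point of a 0-to-100% psychometric function.
x = [-2 -1 0 1 2];                 % signal strength (e.g., Interval 2 minus Interval 1)
P = [0.05 0.20 0.55 0.80 0.97];    % observed proportion of "Interval 2" responses (invented)

P = cummax(P);                     % monotonize, as required by the SK method
P = [0, P, 1];                     % pin the tails at 0 and 1
x = [x(1) - 1, x, x(end) + 1];     % extend the abscissa for the pinned tails (arbitrary extension)

mid = (x(1:end-1) + x(2:end)) / 2; % midpoints of successive levels
dP  = diff(P);                     % probability mass assigned to each midpoint
thresholdSK = sum(mid .* dP);      % SK estimate of the mean (50% point)
varSK = sum(mid.^2 .* dP) - thresholdSK^2;   % SK estimate of the underlying variance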

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and they prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications: The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.

Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.

Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.

Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.

Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.

Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.

Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.

Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.

Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.

Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.

Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.

Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.

Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.

King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.

Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America, 2, 1560-1585.

Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.

Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.

Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.

Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.

Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.

Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.

Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.

McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.

Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.

Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.

Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.

Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.

Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

The four programs are described briefly below.

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (code for generating Figure 1). Computes the Weibull function with β = 2 and lower asymptote γ = .5 − .5 erf(1/√2) ≈ .1587 (the value corresponding to a z score of −1), converts probability to a z score [z = √2 erfinv(2p − 1)] and to d′ = z + 1 [d′ = z(hit) − z(false alarm)], and plots the PF, its z score, and the log–log slope of d′ for both a linear and a natural log abscissa (panels a–f of Figure 1).

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square (relevant to Wichmann and Hill, 2001a, and Figure 4). For N = 1, 2, 4, 8, and 16 trials per level and a wide range of predicted probabilities, sums over all possible numbers of correct responses with binomial weights to obtain the expected contribution of each level to the log-likelihood deviance (Equation 56) and to X² (Equation 52), and plots both against the predicted probability.

Matlab Code 3. Downward Slope Bias Due to Lapses (relevant to Wichmann and Hill, 2001a, and Table 2). Fits probit, logit, and Weibull psychometric functions, by maximum likelihood and by chi-square minimization (using fmins), to a data set with 50 trials at each of three levels, with and without a lapse introduced into the data and for several values of the lapse parameter λ, and prints the resulting slope estimates (the entries of Table 2).

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias. Enumerates all 2^10 possible 10-trial up–down staircases (using bitget to convert each integer into a hit/miss sequence), with and without re-centering each staircase on its own mean level (the threshold shift), and pools the resulting data in three ways before estimating the PF: simple averaging across staircases, monotonizing each staircase's PF first, and extrapolating each PF to untested levels first (the manipulations discussed in connection with Kaernbach, 2001b).

Note that in line 6 of the monotonization routine the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


old, in agreement with a wide range of experimental data. The acceleration and deceleration are most clearly seen by the slope of d′ in the log–log coordinates of panel e. The log–log d′ slope is plotted in panels c and f. At x_t = 0 the log–log slope is 2, corresponding to our choice of β = 2. The slope falls surprisingly rapidly as stimulus strength increases, and the slope is near 1 at x_t = 2. If we had chosen β = 4 for the Weibull function, then the log–log d′ slope would have gone from 4 at x_t = 0 to near 2 at x_t = 2. Equation 20 will present a d′ function that captures this behavior.

Another PF commonly used is based on the cumulative normal (Equation 3):

p(y_t) = Φ(y_t / σ)    (14A)

or

z(y_t) = y_t / σ = (y − Y) / σ,    (14B)

where σ is the standard error of the underlying Gaussian function (its connection to β will be clarified later). Note that the cumulative normal PF in Equation 14 should be used only with a logarithmic abscissa, because y goes to −∞, needed for the 0% lower asymptote, whereas the Weibull function can be used with both the linear (Equation 12) and the log (Equation 13) abscissa.

Two examples of Strasburger's maximum slope (his Equations 7 and 17) are

β′ = β exp(−1) = 0.368 β    (15)

for the Weibull function (Equation 13) and

β′ = 1 / (σ √(2π)) = 0.399 / σ    (16)

for the cumulative normal (Equation 14). In Equation 15, I set k in Equation 12 to be k = exp(−1). This choice amounts to defining threshold so that the maximum slope occurs at threshold (y_t = 0). Note that Equations 12–16 are missing the (1 − γ) factor that is present in Strasburger's Equations 7 and 17, because we are dealing with p rather than P. Equations 15 and 16 provide a clear, simple, and close connection between the slopes of the Weibull and cumulative normal functions, so I am grateful to Strasburger for that insight.

An example might help in showing how Equation 16 works. Consider a 2AFC task (γ = 0.5), assuming a cumulative normal PF with σ = 1.0. According to Equation 16, β′ = 0.399. At threshold (assumed for now to be the point of maximum slope), P(x_t = 1) = 75%. At 10% above threshold, P(x_t = 1.1) ≈ .75 + .1 × β′ = .7899, which is quite close to the exact value of .7896. This example shows how β′, defined with a natural log abscissa y_t, is relevant to a linear abscissa x_t.

Now that the logarithmic abscissa has been introduced, this is a good place to stop and point out that the log abscissa is fine for detection, but not for discrimination, since the x = 0 point is often in the middle of the range. However, the cumulative normal PF is especially relevant to discrimination, where the PF goes from 0% to 100%, and would be written as

p_F(x) = P_F(x) = Φ[(x − PSE) / α]    (17A)

or

z_F(x) = (x − PSE) / α,    (17B)

where x = PSE is the point of subjective equality. The parameter α is the threshold, since d′(α) = z_F(α) − z_F(0) = 1. The threshold α is also the standard deviation of the Gaussian probability density function (pdf). The reciprocal of α is the PF slope. I tend to use the letter σ for a unitless standard deviation, as occurs for the logarithmic variable y; I use α for a standard deviation that has the units of the stimulus x (like percent contrast). The comparison of Equation 14B for detection and Equation 17B for discrimination is useful. Although Equation 14 is used in most of the articles in this special issue, which are concerned with detection tasks, the techniques that I will be discussing are also relevant to discrimination tasks, for which Equation 17 is used.

Threshold and the Weibull Function
To illustrate how a d′ = 1 definition of threshold works for a Weibull function, let us start with a yes/no task in which the false alarm rate is P(0) = 15.87%, corresponding to a z score of z_FA = −1, as in Figure 1. The z score at threshold (d′ = 1) is z_Th = z_FA + 1 = 0, corresponding to a probability of P(1) = 50%. From Equations 1 and 12, the k value for this definition of threshold is given by k = [1 − P(1)] / [1 − P(0)] = 0.5943. The Weibull function becomes

P_weib(x_t) = 1 − (1 − 0.1587) × 0.5943^(x_t^β).    (18A)

If I had defined d′ = 2 to be threshold, then z_Th = z_FA + 2 = 1, leading to k = (1 − 0.8413) / (1 − 0.1587) = 0.1886, giving

P_weib(x_t) = 1 − (1 − 0.1587) × 0.1886^(x_t^β).    (18B)

As another example, suppose the false alarm rate in a yes/no task is at 50%, not uncommon in a signal detection experiment with blanks and test stimuli intermixed. Then threshold at d′ = 1 would occur at 84.13%, and the PF would be

P_weib(x_t) = 1 − 0.5 × 0.3173^(x_t^β),    (18C)

with P_weib(0) = 50% and P_weib(1) = 84.13%. This case corresponds to defining d′ on the ROC vertical intercept.

For 2AFC, the connection between d′ and z is z = d′/√2. Thus, z_Th = 2^−0.5 = 0.7071, corresponding to P_weib(1) = 76.02%, leading to k = 0.4795 in Equation 12. These connections will be clarified when an explicit form (Equation 20) is given for the d′ function, d′(x_t).
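These k values follow directly from the relation k = [1 − P(1)]/[1 − P(0)] and the z-score definitions just given; the short Matlab check below reproduces them (the helper names zOf and pOf are mine, not from the article).

% Check of the k values in Equations 18A-18C and the 2AFC case used for Equation 22.
zOf = @(P) sqrt(2) * erfinv(2 * P - 1);      % probability -> z score
pOf = @(z) 0.5 * (1 + erf(z / sqrt(2)));     % z score -> probability

pFA    = [0.1587, 0.1587, 0.50, 0.50];       % false alarm rates / lower asymptotes
dTh    = [1, 2, 1, 1];                       % d' defining threshold in each case
twoAFC = [false, false, false, true];        % last case is 2AFC, where z = d'/sqrt(2)

for i = 1:4
    if twoAFC(i)
        zTh = dTh(i) / sqrt(2);              % 2AFC: threshold z score
        gamma = 0.5;                         % 2AFC lower asymptote
    else
        zTh = zOf(pFA(i)) + dTh(i);          % yes/no: z at threshold = z(false alarm) + d'
        gamma = pFA(i);                      % yes/no lower asymptote is the false alarm rate
    end
    PTh = pOf(zTh);                          % probability at threshold
    k   = (1 - PTh) / (1 - gamma);           % k = [1 - P(1)] / [1 - P(0)]
    fprintf('P(0) = %.4f, d''_th = %g: P(1) = %.4f, k = %.4f\n', gamma, dTh(i), PTh, k);
end
% prints k = 0.5943, 0.1886, 0.3173, and 0.4795, matching the text.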



Complications With Strasburger's Advocacy of Maximum Slope
Here I will mention three complications implicated in Strasburger's suggestion of defining slope at the point of maximum slope on a logarithmic abscissa.

1. As can be seen in Equation 11, the slopes on linear and logarithmic abscissas are related by a factor of 1/x_t. Because of this extra factor, the maximum slope occurs at a lower stimulus strength on linear as compared with log axes. This makes the notion of maximum slope less fundamental. However, since we are usually interested in the percent error of threshold estimates, it turns out that a logarithmic abscissa is the most relevant axis, supporting Strasburger's log abscissa definition.

2. The maximum slope is not necessarily at threshold (as Strasburger points out). For the Weibull function defined in Equation 13 with k = exp(−1) and the cumulative normal function in Equation 14, the maximum slope (on a log axis) does occur at threshold. However, the two points are decoupled in the generalized Weibull function defined in Equation 12. For the Quick version of the Weibull function (Equation 13 with k = 0.5, placing threshold halfway up the PF), the threshold is below the point of maximum slope; the derivative of P(x_t) at threshold is

β′_thresh = dP(x_t)/dx_t = (1 − γ) β log_e(2) / 2 = 0.347 (1 − γ) β,

which is slightly different from the maximum slope as given by Equation 15. Similarly, when threshold is defined at d′ = 1, the threshold is not at the point of maximum slope. People who fit psychometric functions would probably prefer reporting slopes at the detection threshold, rather than at the point of maximum slope.

3. One of the most important considerations in selecting a point at which to measure threshold is the question of how to minimize the bias and standard error of the threshold estimate. The goal of adaptive methods is to place trials at one level. In order to avoid a threshold bias due to an improper estimate of PF slope, the test level should be at the defined threshold. The variance of the threshold estimate when the data are concentrated at a single level (as with adaptive procedures) is given by Gourevitch and Galanter (1967) as

var(Y) = P(1 − P) / {N [dP(y_t)/dy_t]²}.    (19)

If the binomial error factor of P(1 − P)/N were not present, the optimal placement of trials for measuring threshold would be at the point of maximum slope. However, the presence of the P(1 − P) factor shifts the optimal point to a higher level. Wichmann and Hill (2001b) consider how trial placement affects threshold variance for the method of constant stimuli. This topic will be considered in Section II.

Connecting the Weibull and d′ Functions
Figures 5–7 of Strasburger (2001a) compare the log–log d′ slope, b, with the Weibull PF slope, β. Strasburger's connection of b = 0.88β is problematic, since the d′ version of the Weibull PF does not have a fixed log–log slope. An improved d′ representation of the Weibull function is the topic of the present section. A useful parameterization for d′, which I shall refer to as the Stromeyer–Foley function, was introduced by Stromeyer and Klein (1974) and used extensively by Foley (1994):

d′(x_t) = x_t^b / [a + (1 − a) x_t^(b + w − 1)],    (20)

where x_t is the stimulus strength in threshold units, as in Equation 8. The factors with a in the denominator are present so that d′ equals unity at threshold (x_t = 1, or x = α). At low x_t, d′ ≈ x_t^b / a. The exponent b (the log–log slope of d′ at very low contrasts) controls the amount of facilitation near threshold. At high x_t, Equation 20 becomes d′ ≈ x_t^(1−w) / (1 − a). The parameter w is the log–log slope of the test threshold versus pedestal contrast function (the tvc or "dipper" function) at strong pedestals. One must be a bit cautious, however, because in yes/no procedures the tvc slope can be decoupled from w if the signal detection ROC curve has nonunity slope (Stromeyer & Klein, 1974). The parameter a controls the point at which the d′ function begins to saturate. Typical values of these unitless parameters are b = 2, w = 0.5, and a = 0.6 (Yu, Klein, & Levi, 2001). The function in Equation 20 accelerates at low contrast (log–log slope of b) and decelerates at high contrast (log–log slope of 1 − w), in general agreement with a broad range of data.

For 2AFC, z = d′/√2, so from Equation 3 the connection between d′ and probability correct is

P(x_t) = 0.5 + 0.5 erf[d′(x_t)/2].    (21)

To establish the connection between the PFs specified in Equations 12 and 20–21, one must first have the two agree at threshold. For 2AFC with d′ = 1, the threshold at P = 76.02% can be enforced by choosing k = 0.4795 (see the discussion following Equation 18C), so that Equations 1 and 12 become

P(x_t) = 1 − (1 − 0.5) × 0.4795^(x_t^β).    (22)

Modifying k as in Equation 22 leaves the Weibull shape unchanged and shifts only the definition of threshold. If b = 1.06β, w = 1 − 0.39β, and a = 0.614, then for all values of β the Weibull and d′ functions (Equations 21 and 22 vs. Equation 12) differ by less than 0.0004 for all stimulus strengths. At very low stimulus strengths, b = β. The value b = 1.06β is a compromise for getting an overall optimal fit.
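The quality of this match is easy to check numerically. The following Matlab sketch (my own check, not code from the article) evaluates Equations 20–22 with the parameter mapping just quoted:

% Compare the 2AFC Weibull (Equation 22) with the Stromeyer-Foley d' function
% (Equations 20 and 21), using b = 1.06*beta, w = 1 - 0.39*beta, a = 0.614.
beta = 2;                                   % Weibull slope; other values can be tried
b = 1.06 * beta; w = 1 - 0.39 * beta; a = 0.614;

xt = 0.05:0.05:4;                           % stimulus strength in threshold units
Pweib = 1 - 0.5 * 0.4795 .^ (xt .^ beta);   % Equation 22 (2AFC Weibull, threshold at d' = 1)

dprime = xt .^ b ./ (a + (1 - a) * xt .^ (b + w - 1));   % Equation 20
Pdprime = 0.5 + 0.5 * erf(dprime / 2);      % Equation 21 (2AFC)

maxDiff = max(abs(Pweib - Pdprime));        % should come out of order a few times 1e-4
fprintf('beta = %g: max |difference| = %.5f\n', beta, maxDiff);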

Strasburger (2001a) is concerned with the same issue. In his Table 1, he reports d′ log–log slopes of [0.8847, 1.8379, 3.131, 4.421] for β = [1, 2, 3.5, 5]. If the d′ slope is taken to be a measure of the d′ exponent b, then the ratio b/β is [0.8847, 0.9190, 0.8946, 0.8842] for the four β values. Our value of b/β = 1.06 differs substantially from 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a). Pelli (1985) and Strasburger (2001a) used d′ functions with a = 1 in Equation 20. With a = 0.614, our d′ function (Equation 20) starts saturating near threshold, in agreement with experimental data. The saturation lowers the effective value of b. I was very surprised to find that the Stromeyer–Foley d′ function did such a good job in matching the Weibull function across the whole range of β and x. For a long time I had thought a different fit would be needed for each value of β.

For the same Weibull function in a yes/no method (false alarm rate = 50%), the parameters b and w are identical to the 2AFC case above. Only the value of a shifts, from a = 0.614 (2AFC) to a = 0.54 (yes/no). In order to make x_t = 1 at d′ = 1, k becomes 0.3173 (see Equation 18C) instead of 0.4795 (2AFC). As with 2AFC, the difference between the Weibull and d′ function is less than 0.0004 (<0.04%) for all stimulus strengths and all values of β, and for both a linear and a logarithmic abscissa. These values mean that for both the yes/no and the 2AFC methods, the d′ function (Equation 20) corresponding to a Weibull function with β = 2 has a low-contrast log–log slope of b = 2.12 and a high-contrast log–log slope of 1 − w = 0.78. The high-contrast tvc (test contrast vs. pedestal contrast discrimination function, sometimes called the "dipper" function) log–log slope would be w = 0.22.

I also carried out a similar analysis, asking what cumulative normal σ is best matched to the Weibull. I found σ = 1.1720/β. However, the match is poor, with errors of 4% (rather than <0.04%, as in the preceding analysis). The poor fit of the two functions makes the fitting procedure sensitive to the weighting used in the fitting procedure. If one uses a weighting factor of [P(1 − P)/N], where P is the PF, one will find a quite different value for σ than if one ignored the weighting, or if one chose the weighting on the basis of the data rather than the fitting function.

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section, inspired by Linschoten et al. (2001), is an examination of various experimental methods for gathering data to estimate thresholds. Linschoten et al. compare three 2AFC methods for measuring smell and taste thresholds: a maximum-likelihood adaptive method, an up–down staircase method, and an ascending method of limits. The special aspect of their experiments was that, in measuring smell and taste, each 2AFC trial takes between 40 and 60 sec (Lew Harvey, personal communication), because of the long duration needed for the nostrils or mouth to recover their sensitivity. The purpose of the Linschoten et al. paper is to examine which of the three methods is most accurate when the number of trials is low. Linschoten et al. conclude that the 2AFC method is a good method for this task. My prior experience with forced choice tasks and low numbers of trials (Manny & Klein, 1985; McKee et al., 1985) made me wonder whether there might be better methods to use in circumstances in which the number of trials is limited. This topic is considered in Section II.

Compare 2AFC to Yes/No
The commonly accepted framework for comparing 2AFC and yes/no threshold estimates is signal detection theory. Initially, I will make the common assumption that the d′ function is a power function (Equation 20 with a = 1):

d′ = c_t^b = exp(b y_t).    (23)

Later in this section, the experimentally more accurate form, Equation 20 with a ≠ 1, will be used. The variance of the threshold estimate will now be calculated for 2AFC and yes/no for the simplest case of trials placed at a single stimulus level, y, the goal of most adaptive methods. The PF slope relates ordinate variance to abscissa variance (from Gourevitch & Galanter, 1967):

var(Y) = var(d′) / (dd′/dy_t)²,    (24)

where Y = y − y_t is the threshold estimate for a natural log abscissa, as given in Equation 9. This formula for var(Y) is called the asymptotic formula, in that it becomes exact in the limit of a large number of total trials, N. Wichmann and Hill (2001b) show that, for the method of constant stimuli, a proper choice of testing levels brings one to the asymptotic region for N = 120 (the smallest value that they explored). Improper choice of testing levels requires a larger N in order to get to the asymptotic region. Equation 24 is based on having the data at a single level, as occurs asymptotically in adaptive procedures. In addition, the definition of threshold must be at the point being tested. If levels are tested away from threshold, there will be an additional contribution to the threshold variance (Finney, 1971; McKee et al., 1985) if the slope is a variable parameter in the fitting procedure. Equation 24 reaches its asymptotic value more rapidly for adaptive methods, with their focused placement of trials, than with the constant stimulus method when slope is a free parameter.

Equation 24 needs analytic expressions for the derivative and the variance of d′. The derivative is obtained from Equation 23:

dd′/dy_t = b d′.    (25)

The variance of d′ for both yes/no (d′ = z_hit + z_correct rejection) and 2AFC (d′ = √2 z_ave) is

var(d′) = 2 var(z) = 4π exp(z²) var(P).    (26)



For the yes/no case, Equation 26 is based on a unity-slope ROC curve with the subject's criterion at the negative diagonal of the ROC curve, where the hit rate equals the correct rejection rate. This would be the most efficient operating point. The variance for a nonunity-slope ROC curve was considered by Klein, Stromeyer, and Ganz (1974). From binomial statistics,

var(P) = P(1 − P) / n,    (27)

with n = N for 2AFC and n = N/2 for yes/no, since for this binary yes/no task the number of signal and blank trials is half the total number of trials.

Putting all of these factors together gives

var(Y) = 4π exp(z²) P(1 − P) / [n (b d′)²]    (28)

for both the yes/no and the 2AFC cases. The only difference between the two cases is the definition of n and the definition of d′ (d′_2AFC = 2^0.5 z and d′_YN = 2z for a symmetric criterion). When this is expressed in terms of N (number of trials) and z (z score of probability correct), we get

var(Y) = 2π exp(z²) P(1 − P) / [N (bz)²]    (29)

for both 2AFC and yes/no. That is, when the percent corrects are all equal (P_2AFC = P_hit = P_correct rejection), the threshold variances for 2AFC and yes/no are equal. The minimum value of var(Y) is 1.64/(Nb²), occurring at z = 1.57 (P = 94%). This value of 94% correct is the optimum value at which to test for both 2AFC and yes/no tasks, independent of b.
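This minimum is easy to verify numerically from Equation 29; the short Matlab check below (added here, with an arbitrary grid of z values) does so:

% Numerical check of Equation 29: var(Y) * N * b^2 as a function of the z score tested.
z = 0.1:0.001:3;
P = 0.5 * (1 + erf(z / sqrt(2)));            % probability correct corresponding to z
v = 2 * pi * exp(z .^ 2) .* P .* (1 - P) ./ z .^ 2;   % var(Y) in units of 1/(N b^2)
[vMin, i] = min(v);
fprintf('minimum var(Y) = %.2f/(N b^2) at z = %.2f (P = %.0f%%)\n', vMin, z(i), 100 * P(i));
% prints approximately: minimum var(Y) = 1.64/(N b^2) at z = 1.57 (P = 94%)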

This result, that the threshold variance is the same for 2AFC and yes/no methods, seems to disagree with Figure 4 of McKee et al. (1985), where the standard errors of the threshold estimates are plotted for 2AFC and yes/no as a function of the number of trials. The 2AFC standard error (their Figure 4) is twice that of yes/no, corresponding to a factor of four in variance. However, the McKee et al. analysis is misleading, since all they (we) did for the yes/no task was to use Equation 2 to expand the PF to a 0% to 100% range, thereby doubling the PF slope. That rescaling is not the way to convert a 2AFC detection task to the yes/no task of the same detectability. A signal detection methodology is needed for that conversion, as was done in the present analysis.

One surprise is that the optimal testing probability is independent of the PF slope, b. This finding is a general property of PFs in which the dependence on the slope parameter has the form x^b. The 94% level corresponds to d′_YN = 3.15 and d′_2AFC = 2.23. For the commonly used P = 75% (z = 0.6745, d′_YN = 1.35, d′_2AFC = 0.954), var(Y) = 4.08/(Nb²), which is about 2.5 times the optimal value obtained by placing the stimuli at 94%. Green (1990) had previously pointed out that the 94% and 92% levels were the optimal points to test for the cumulative normal and logit functions, respectively, because of the minimum threshold estimate variability when one is testing at that point on the PF. Other PFs can have optimal points at slightly lower levels, but these are still above the levels to which adaptive methods are normally directed.

It is also useful to compare the 2AFC and yes/no methods at the same d′, rather than at their optimal points. In that case, for d′ values fixed at [1, 2, 3], the ratio of the yes/no variance to the 2AFC variance is [1.82, 1.36, 0.79]. At low d′ there is an advantage for the 2AFC method, and that reverses at high d′, as is expected from the optimal d′ being higher for yes/no than for 2AFC. In general, one should run the yes/no method at d′ levels above 2.0 (84% correct in the blank and signal intervals corresponds to d′ = 2).

The equality of the optimal var(Y) for 2AFC and yes/no methods is not only surprising, it seems paradoxical, as can be seen by the following gedankenexperiment. Suppose subjects close their eyes on the first interval of the 2AFC trial and base their judgments on just the second interval, as if they were doing a yes/no task, since blanks and stimuli are randomly intermixed. Instead of saying "yes" or "no," they would respond "Interval 2" or "Interval 1," respectively, so that the 2AFC task would become a yes/no task. The paradox is that one would expect the variance to be worse than for the eyes-open 2AFC case, since one is ignoring information. However, Equation 29 shows that the 2AFC and yes/no methods have identical optimal variance. I suspect that the resolution of this paradox is that in the yes/no method, with a stable criterion, the optimal variance occurs at a higher d′ value (3.15) than the 2AFC value (2.23). The d′ power law function that I chose does not saturate at large d′ levels, giving a benefit to the yes/no method.

In order to check whether the paradox still holds with a more realistic PF (a d′ function that saturates at large d′), I now compare the 2AFC and yes/no variance using a Weibull function with a slope β. In order to connect 2AFC and yes/no, I need to use signal detection theory and the Stromeyer–Foley d′ function (Equation 20) with the "Weibull parameters" b = 1.06β, a = 0.614, and w = 1 − 0.39β. Following Equation 22, I pointed out that with these parameter values the difference between the 2AFC Weibull function and the 2AFC d′ function (converted to a probability ordinate) is less than 0.0004. After the derivative, Equation 25, is appropriately modified, we find that the optimal 2AFC variance is 3.42/(Nb²). For the yes/no case, the optimal variance is 4.02/(Nb²). This result resolves the paradox: The optimal 2AFC variance is about 18% lower than the yes/no variance, so closing one's eyes in one of the intervals does indeed hurt.

We shouldn't forget that if one counts stimulus presentations, N_p, rather than trials, N, the optimal 2AFC variance becomes 6.84/(N_p b²), as opposed to 4.02/(N_p b²) for yes/no. Thus, for the Linschoten experiments, with each stimulus presentation being slow, the yes/no methodology is more accurate for a given number of stimulus presentations, even at low d′ values. Kaernbach (1990) compares 2AFC and yes/no adaptive methods with simulations and actual experiments. His abscissa is number of presentations, and his results are compatible with our findings.

The objective yes/no method that I have been discussing is inefficient, since 50% of the trials are blanks. Greater efficiency is possible if several points on the PF are measured (method of constant stimuli), with the number of blank trials the same as at the other levels. For example, if five levels of stimuli were measured, including blanks, then the blanks would be only 20% of the total, rather than 50%. For the 2AFC paradigm, the subject would still have to look at 50% of the presentations being blanks. The yes/no efficiency can be further improved by having the subject respond to the multiple stimuli with multiple ratings, rather than a binary yes/no response. This rating scale method of constant stimuli is my favorite method, and I have been using it for 20 years (Klein & Stromeyer, 1980). The multiple ratings allow one to measure the psychometric function shape for d′s as large as 4 (well into the dipper region, where stimuli are slightly above threshold). It also allows one to measure the ROC slope (presence of multiplicative noise). Simulations of its benefits will be presented in Klein (2002).

Insight into how the number of responses affects threshold variance can be obtained by using the cumulative normal PF (Equation 17), considered by Kaernbach (2001b) and Miller and Ulrich (2001), with γ = 0. Typically, this is the case of yes/no discrimination, but it can also be 2AFC discrimination, with an ordinate going from 0% to 100%, as I have discussed in connection with Figure 3. For a cumulative normal PF with a binary response and for a single stimulus level being tested, the threshold variance is (Equation 19) as follows:

var(thresh) = var(P) / (dP/dy)² = 2π σ² P(1 − P) exp(z²) / N,    (30)

from Equation 27 and forthcoming Equations 40 and 41. The optimal level to test is at P = 0.5, so the optimal variance becomes

var(thresh) = (π/2) × σ²/N.    (31)
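The step from Equation 30 to Equation 31 is simply the evaluation at that optimum (spelled out here for completeness): at P = 0.5 the z score is 0, so

var(thresh) = 2π σ² P(1 − P) exp(z²)/N = 2π σ² × (1/4) × 1/N = (π/2) σ²/N.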

If, instead of a binary response, the observer gives an analog response (many, many rating categories), then the variance is the familiar connection between standard deviation and standard error:

var(thresh) = σ²/N.    (32)

With four response categories, the variance is 1.19σ²/N, which is closer to the analog than to the binary case. If stimuli with multiple strengths are intermixed (method of constant stimuli), then the benefit of going from a binary response to multiple responses is greater than that indicated in Equations 31–32 (Klein, 2002).

A brief list of other problems with the 2AFC method applied to detection comprises the following:

1. 2AFC requires the subject to memorize and then compare the results of two subjective responses. It is cognitively simpler to report each response as it arrives. Subjects without much experience can easily make mistakes (lapses), simply becoming confused about the order of the stimuli. Klein (1985) discusses memory load issues relevant to 2AFC.

2. The 2AFC methodology makes various types of analysis and modeling more difficult, because it requires that one average over all criteria. The response variable in 2AFC is the Interval 2 activation minus the Interval 1 activation (see the abscissa in Figure 2). This variable goes from minus infinity to plus infinity, whereas the yes/no variable goes from zero to infinity. It therefore seems more difficult to model the effects of probability summation and uncertainty for 2AFC than for yes/no.

3. Models that relate psychophysical performance to underlying processes require information about how the noise varies with signal strength (multiplicative noise). The rating scale method of constant stimuli not only measures d′ as a function of contrast, it also is able to provide an estimate of how the variance of the signal distribution (the ROC slope) increases with contrast. The 2AFC method lacks that information.

4. Both the yes/no and 2AFC methods are supposed to eliminate the effects of bias. Earlier I pointed out that in my recent 2AFC discrimination experiments, when the stimulus is near threshold, I find I have a strong bias in favor of responding "Stimulus 2." Until I learned to compensate for this bias (not typically done by naïve subjects), my d′ was underestimated. As I have discussed, it is possible to compensate for this bias, but that is rarely done.

Improving the Brownian (Up–Down) Staircase
Adaptive methods with a goal of placing trials at an optimal point on the PF are efficient. Leek (2001) provides a nice overview of these methods. There are two general categories of adaptive methods: those based on maximum (or mean) likelihood, and those based on simpler staircase rules. The present section is concerned with the older staircase methods, which I will call Brownian methods. I used the name "Brownian" because of the up–down staircase's similarity to one-dimensional Brownian motion, in which the direction of each step is random. The familiar up–down staircase is Brownian because at each trial the decision about whether to go up or down is decided by a random factor governed by probability P(x). I want to make this connection to Brownian motion because I suspect that there are results in the physics literature that are relevant to our psychophysical staircases. I hope these connections get made.

A Brownian staircase has a minimal memory of the run history: It needs to remember the previous step (including direction) and the previous number of reversals or number of trials (used for when to stop and for when to decrease step size). A typical Brownian rule is to decrease the signal strength by one step (like 1 dB) after a correct response and to increase it three steps after an incorrect response (a one down–three up rule). I was pleased to learn recently that this rule works nicely for an objective yes/no task (Kaernbach, 1990), as well as for 2AFC, as discussed earlier.

In recent years likelihood methods (Pentland 1980Watson amp Pelli 1983) have become popular and havesomewhat displaced Brownian methods There is a wide-spread feeling that likelihood methods have substantiallysmaller threshold variance than do staircases in whichthresholds are obtained by simply averaging the levelstested Given the importance of this issue for informingresearchers about how best to measure thresholds it issurprising that there have been so few simulations com-paring staircases and optimal methods Pentlandrsquos (1980)results showing very large threshold variance for a stair-case method are so dramatically different from the re-sults of my simulations that I am skeptical The best pre-vious simulations of which I am aware are those of Green(1990) who compared the threshold variance from sev-eral 2AFC staircase methods with the ideal variance(Equation 24) He found that as long as the staircase ruleplaced trials above 80 correct the ratio of observed toideal threshold variance was about 14 This value is thesquare of the inverse of Greenrsquos finding that the ratio ofthe expected to the observed sigma is about 85

In my own simulations (Klein 2002) I found that aslong as the final step size was less than 02s the thresh-old variance from simply averaging over levels was closeto the optimal value Thus the staircase threshold varianceis about the same as the variance found by other optimalmethods One especially interesting finding was that thecommon method of averaging an even number of rever-sal points gave a variance that was about 10 higher thanaveraging over all levels This result made me wonderhow the common practice of averaging reversals ever gotstarted Since Green (1990) averaged over just the reversalpoints I suspect that his results are an overestimate of thestaircase variance In addition since Green specifies stepsize in decibels it is difficult for me to determine whetherhis final step size was small enough Since the thresholdvariance is inversely related to slope it is important to re-late step size to slope rather than to decibels The most nat-ural reference unit is s the standard deviation associatedwith the cumulative normal In summary I suggest that thisgeneral feeling of distrust of the simple Brownian staircaseis misguided and that simple staircases are often as goodas likelihood methods In the section Summary of Advan-tages of Brownian Staircase I point out several advantagesof the simple staircase over the likelihood methods

A qualification should be made to the claim above that averaging levels of a simple staircase is as good as a fancy likelihood method. If the staircase method uses a too small step size at the beginning, and if the starting point is far from threshold, then the simple staircase can be inefficient in reaching the threshold region. However, at the end of that run, the astute experimenter should notice the large discrepancy between the starting point and the threshold, and the run should be redone with an improved starting point or with a larger initial step size that will be reduced as the staircase proceeds.

Increase number of 2AFC response categories. Independent of the type of adaptive method used, the 2AFC task has a flaw whereby a series of lucky guesses at low stimulus levels can erroneously drive adaptive methods to levels that are too low. Some adaptive methods with a small number of trials are unable to fully recover from a string of lucky guesses early in the run. One solution is to allow the subject to have more than two responses. For reasons that I have never understood, people like to have only two response categories in a 2AFC task. Just as I am reluctant to take part in 2AFC experiments, I am even more reluctant to take part in two response experiments. The reason is simple: I always feel that I have within me more than one bit of information after looking at a stimulus. I usually have at least four or more subjective states (2 bits) for a yes/no task: HN, LN, LY, HY, where H and L are high and low confidence, and N and Y are did not and did see it. For a 2AFC, my internal states are H1, L1, L2, H2 for high and low confidence on Intervals 1 and 2. Kaernbach (2001a), in this special issue, does an excellent job of justifying the low-confidence response category. He provides solid arguments, simulations, and experimental data on how a low-confidence category can minimize the "lucky guess" problem.

Kaernbach (2001a) groups the L1 and L2 categories together into a "don't know" category and argues persuasively that the inclusion of this third response can increase the precision and accuracy of threshold estimates while also leaving the subject a happier person. He calls it the "unforced choice" method. Most of his article concerns the understandable fear of introducing a response bias, exactly what 2AFC was invented to avoid. He has developed an intelligent rule for what stimulus level should follow a "don't know" response. Kaernbach shows, with both simulations and real data, that with his method this potential response bias problem has only a second-order effect on threshold estimates (similar to the 2AFC interval bias discussed around Equation 7), and its advantages outweigh the disadvantages. I urge anyone using the 2AFC method to read it. I conjecture that Kaernbach's method is especially useful in trimming the lower tail of the distribution of threshold estimates and also in reducing the interval bias of 2AFC tasks, since the low confidence rating would tend to be used for trials on which there is the most uncertainty.

Although Kaernbach has managed to convince me of the usefulness of his three-response method, he might have trouble convincing others because of the response bias question. I should like to offer a solution that might quell the response bias fears. Rather than using 2AFC with three responses, one can increase the number of responses to four (H1, L1, L2, H2), as discussed two paragraphs above. The L rating can be used when subjects feel they are guessing.

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A 5 up–1 down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct


of 5/6 = 83.33%. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence, and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low confidence rating is that when one is below threshold, giving a low confidence rating will prevent one from long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.
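A sketch of how the four-response shifts could be wired into the staircase update (the mapping is the one just suggested; coding it as a struct is my own choice):

% Level shifts for the four responses (high/low confidence, correct/incorrect).
shift.HC = -1;   % high-confidence correct: one level down
shift.LC = +1;   % low-confidence correct
shift.LI = +3;   % low-confidence incorrect
shift.HI = +5;   % high-confidence incorrect: five levels up
% Staircase update for one trial:  level = level + shift.(response);
% When the subject is purely guessing, half the low-confidence trials are
% correct and half are incorrect, so the mean shift is
meanGuessShift = (shift.LC + shift.LI) / 2    % = +2 levels, as in the text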

Another reason not to be fearful of the multiple-response 2AFC method is that after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations, discussed earlier in this section, and Green's (1990) reveal that averaging over the levels (excluding a number of initial levels to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging levels method disagree, that run could be thrown out.

Summary of Advantages of Brownian Staircase

Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since it gives the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (Psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on one-step ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape.

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope β or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low spatial frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates for mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data gathering group argued that we should use a data-gathering methodology that would allow estimation of PF slope for each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin, and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules, Goodness-of-Fit

There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a non-stationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions, in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation. Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section, I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits

The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals in order to avoid the starting level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

pdf(i) = [P(i) − P(i − 1)] / [s(i) − s(i − 1)],    (33)

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

μ_r = Σ_i pdf(i) [s(i)^(r+1) − s(i − 1)^(r+1)] / (r + 1).    (34)

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

f [s(i)] − f [s(i − 1)] = f ′(s_mid) [s(i) − s(i − 1)],    (35)

where f (s) = s^(r+1) and f ′(s_mid) = (r + 1) s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

μ_r = Σ_i pdf(i) s_mid^r Δs,    (36)

where Δs = s(i) − s(i − 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

μ_0 = Σ_i pdf(i) Δs(i) = P(i_max) − P(i_min),    (37)

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity, and the probability at the lowest level, P(i_min), must be zero, as Miller and Ulrich (2001) point out.

The next simplest SK moment is for r = 1:

μ_1 = (1/2) Σ_i pdf(i) [s(i)² − s(i − 1)²] = (1/2) Σ_i [P(i) − P(i − 1)] [s(i) + s(i − 1)],    (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).
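The SK moments are easy to compute directly from the sampled (s, P) values. The following sketch (my own illustration, with made-up sample values, not Miller and Ulrich's implementation) evaluates Equations 33, 37, and 38; it assumes the P values have already been monotonized and span 0 to 1:

% Spearman-Karber pdf and moments from sampled levels s and probabilities P.
s  = [1 2 3 4 5];                   % stimulus levels (example values)
P  = [0 .1222 .5 .8778 1];          % probability correct at each level (example values)
ds    = diff(s);
pdfSK = diff(P) ./ ds;              % Equation 33
mu0   = sum(pdfSK .* ds)            % Equation 37: equals P(end) - P(1) = 1
mu1   = 0.5 * sum(diff(P) .* (s(2:end) + s(1:end-1)))   % Equation 38: centroid (PSE) = 3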

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: If this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method of constant stimulus data in which the PF goes from 0 to 1. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with very uneven distribution of number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels, and the number correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, −.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to these unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than .33, and P2 can be any number greater than .67. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide possible set of alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the

center of the PF is given by Equation 38 to be μ_1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ_1 = 2.97. However, given the data with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ_1 = 4. Thus we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
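The numbers in this example are easy to check. The sketch below (my own check, not the Appendix's Matlab Code 4; the pooling is hard-coded for this particular example rather than being a general monotonizing routine) computes μ_1 before and after an N-weighted pooling of the non-monotonic stretch:

% Extreme example: 5 levels, N = [1 1 1 98 1] trials, correct = [0 1 1 49 1].
s   = 1:5;
N   = [1 1 1 98 1];
P   = [0 1 1 .5 1];                              % raw probabilities
mu1 = @(P) 0.5 * sum(diff(P) .* (s(2:end) + s(1:end-1)));   % Equation 38
mu1(P)                                           % = 2.00, raw SK centroid
Pm      = P;
Pm(2:4) = sum(P(2:4) .* N(2:4)) / sum(N(2:4));   % N-weighted pooling of levels 2-4 -> .51
mu1(Pm)                                          % = 2.97, after monotonizing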

A similar problem can occur even with the method of constant stimuli with equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures, such as probit analysis, give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis, while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: Since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being 1% and 99%. The probabilities at the sample points are P(i) = 1%, 12.22%,


50%, 87.78%, and 99%, corresponding to z scores of z(i) = −2.328, −1.164, 0, +1.164, and +2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ_1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus, their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

var(PSE_i) = var(P_i) / (dP_i /dx_i)²,    (39)

where var(P_i) is given by Equation 27, and the PF slope is

dP_i /dx_i = dP_i /dz_i × dz_i /dx_i,    (40)

where, for the cumulative normal, dz_i /dx_i = 1/σ, where σ is the standard deviation of the pdf, and

dP_i /dz_i = exp(−z_i²/2) / (2π)^(1/2),    (41)

from Equation 3A. By combining these equations, one gets

var(PSE_i) = 2πσ² exp(z_i²) P_i (1 − P_i) / N_i.    (42)

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula

1/var_tot = Σ_i 1/var_i.    (43)

Finally, upon plugging in the five values of z_i, ranging from −2.328 to +2.328, and P_i, ranging from 1% to 99%, the standard error of the PF location is

SE = (var_tot)^(1/2) = 0.284.    (44)

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

var_tot = SE² = 3.24/N_tot.    (45)

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.
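The "triple confirmation" above is quick to reproduce numerically. This sketch (my own arithmetic check with σ = 1; it is not one of the Appendix programs) evaluates Equations 42–45 for the five levels and should print SE ≈ 0.284:

% Analytic PSE variance for 8 trials at each of 5 levels (Equations 42-45).
z      = [-2.328 -1.164 0 1.164 2.328];
P      = 0.5 * (1 + erf(z / sqrt(2)));      % = [.01 .1222 .5 .8778 .99]
Ni     = 8 * ones(size(z));
sigma  = 1;
varPSE = 2*pi*sigma^2 .* exp(z.^2) .* P .* (1 - P) ./ Ni;   % Equation 42
varTot = 1 / sum(1 ./ varPSE);              % Equation 43
SE     = sqrt(varTot)                       % Equation 44: about 0.284
varTot * sum(Ni)                            % Equation 45: about 3.24 (= var_tot * N_tot)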

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that of the five stimulus levels, the two at P = 1% and 99% are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = 50%). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot


claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

X² = Σ_i [p(data_i) − P_i]² / var(P_i),    (46)

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend:

var(P_i) = P_i (1 − P_i) / N_i,    (47)

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

X² = Σ_i residual_i²,    (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^(1/2):

residual_i = [p(data_i) − P_i] / SE_i.    (49)

The second method is to use the normalized log-likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2 Σ_i N_i { p(data_i) ln[ p(data_i)/P_i ] + [1 − p(data_i)] ln[ (1 − p(data_i))/(1 − P_i) ] }.    (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

prob_chisq = Gamma(df/2, χ²/2),    (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^(1/2). As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z-score.
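These two sanity checks can be done at the command line. The lines below are my own illustration (note that MATLAB's gammainc takes the chi-square term first and the degrees-of-freedom term second, i.e., gammainc(chi2/2, df/2)):

% Chi-square cumulative probabilities, Gamma(df/2, chi2/2), via gammainc.
gammainc(18/2, 18/2)         % = 0.54  (chi-square = df = 18)
gammainc((18 + 6)/2, 18/2)   % = 0.845 (chi-square one standard deviation, sqrt(2*18) = 6, above df)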

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [0.52, 0.85]; in Figure 8a, with a strong downward bias, the interval is [0.72, 0.99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100% and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi square from one level in the sum of Equation 46:

X²_i = [p(data_i) − P_i]² / [P_i (1 − P_i)/N_i] = (O_i − E_i)² / [E_i (1 − P_i)],    (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X²_i is obtained by doing a weighted sum over all possible


outcomes for O_i. The weighting wt_i is the expected probability from binomial statistics:

wt_i = [N_i! / (O_i! (N_i − O_i)!)] P_i^O_i (1 − P_i)^(N_i − O_i).    (53)

The expected value ⟨X²_i⟩ is

⟨X²_i⟩ = Σ_{O_i} wt_i (O_i − E_i)² / [E_i (1 − P_i)].    (54)

A bit of algebra based on Σ_{O_i} wt_i = 1 gives a surprising result:

⟨X²_i⟩ = 1.    (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

⟨LL_i⟩ = 2 Σ_{O_i} wt_i { O_i ln(O_i /E_i) + (N_i − O_i) ln[(N_i − O_i)/(N_i − E_i)] }.    (56)

Unfortunately, there is no simple way to do the summations, as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

⟨LL_i⟩ = 2 [0.25 × 2 ln(2) + 0.25 × 2 ln(2)] = 2 ln(2) = 1.386.    (57)
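The enumeration behind Figure 4 takes only a few lines. The sketch below is my own version (it is not the Appendix's Matlab Code 2); it computes the expected per-level contributions of Equations 54 and 56 for any N_i and P_i, and reproduces Equation 57 for N_i = 2, P_i = .5:

% Expected per-level contributions to X^2 (Eq. 54) and to the deviance (Eq. 56),
% found by summing over all binomial outcomes O = 0..N.
N = 2;  P = 0.5;  E = N * P;
O  = 0:N;
wt = arrayfun(@(o) nchoosek(N, o), O) .* P.^O .* (1 - P).^(N - O);   % Eq. 53
X2mean = sum(wt .* (O - E).^2 / (E * (1 - P)))     % Eq. 55: equals 1 for any N and P
t1 = O .* log(max(O, 1) ./ E);                     % 0*log(0) treated as 0
t2 = (N - O) .* log(max(N - O, 1) ./ (N - E));
LLmean = sum(wt .* 2 .* (t1 + t2))                 % Eq. 57: 2*log(2) = 1.386 here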

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the (1 − P_i) factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85, the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a


Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = 0.01 (first and third pairs of data columns) and λ = 0.0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                        λ = 0.01        λ = 0.0001      λ = 0.01        λ = 0.0001
Psychometric            No Lapse        No Lapse        Lapse           Lapse
Function                L      χ²       L      χ²       L      χ²       L      χ²
Probit                  1.05   1.05     1.00   0.99     1.05   1.05     0.36   0.34
Logit                   1.76   1.76     1.66   1.66     1.76   1.76     0.84   0.72
Weibull                 1.01   1.01     0.97   0.97     1.01   1.01     0.26   0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

P_probit = .5 + .5 erf(z/√2)  (probit; erf, Equation 3B),    (58A)

P_logit = 1/[1 + exp(−z)]  (logit),    (58B)

P_Weibull = 1 − exp[−exp(z)]  (Weibull),    (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) when there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter; (2) when the lapse parameter was set to λ = 0.01, the PF slope did not change when there was a lapse (49 out of 50); (3) when λ = 0.0001, the PF slope was reduced dramatically when a single lapse was present; (4) for all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = 0.0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
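For readers who want to reproduce the kind of fit behind Table 2, the sketch below is a minimal stand-in (it is not the Appendix's Matlab Code 3; the choice of the probit row, the starting guesses, and the use of fminsearch are my own):

% Likelihood fit of the probit PF to the 3-level data with a fixed lapse rate.
y   = [0 1 6];                    % stimulus levels
Nc  = [25 42 49];                 % number correct (49/50 at the top level = one lapse)
Nt  = [50 50 50];                 % trials per level
lam = 0.0001;                     % lapse parameter, with gamma = lambda (symmetric)
probit = @(z) 0.5 + 0.5 * erf(z / sqrt(2));                     % Equation 58A
PF     = @(p, y) lam + (1 - 2*lam) * probit(p(2) * (y - p(1))); % p = [threshold slope]
negLL  = @(p) -sum(Nc .* log(PF(p, y)) + (Nt - Nc) .* log(1 - PF(p, y)));
pfit   = fminsearch(negLL, [0.5 0.3])   % slope started at 0.3 (see the local-minima note below)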

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns, the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in the Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002), and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope β of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination, with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method, using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled, so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 0.1 (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
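A quick check of this arithmetic (my own, assuming threshold is expressed in natural log units so that a standard error of 0.05 corresponds to roughly 5% in contrast):

% Asymptotic threshold variance for the 10AFC task (formula as quoted above).
beta = 5.5;  N = 25;
varThresh = 1.75 / (N * beta^2)   % about 0.0023
SE = sqrt(varThresh)              % about 0.05, i.e., roughly a 5% standard error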

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing, there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.
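
A tiny enumeration makes the steepening concrete. The sketch below (an illustrative calculation, not the enumeration code of Matlab Code 4) aligns two perturbed versions of the Miller and Ulrich five-point PF on their own interpolated 50% points and then averages them; the aligned average rises faster around threshold than the true PF does:

levels = 1:5;
Ptrue  = [1 12.22 50 87.78 99];          % five-point PF (percent correct)
runA   = [1 12.22 70 87.78 99];          % middle point jittered up
runB   = [1 12.22 30 87.78 99];          % middle point jittered down
thA = interp1(runA, levels, 50);         % 50% point of run A (between Levels 2 and 3)
thB = interp1(runB, levels, 50);         % 50% point of run B (between Levels 3 and 4)
rel = -0.5:0.5:0.5;                      % levels relative to each run's own threshold
PA  = interp1(levels - thA, runA, rel);  % run A after the shift
PB  = interp1(levels - thB, runB, rel);  % run B after the shift
Pavg = (PA + PB)/2;                      % average of the two shifted PFs
Ptru = interp1(levels - 3, Ptrue, rel);  % true PF centered on its true threshold
disp([rel; Pavg; Ptru])                  % the shifted average is steeper than the truth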

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).
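
Kaernbach's own routine is described in his paper, and the monotonizing script used for the enumerations later in this section appears in Matlab Code 4 of the Appendix. As a rough illustration of the idea, the following minimal sketch (not the Appendix routine; the counts are made up) uses a trial-weighted pool-adjacent-violators scheme that merges neighboring levels until the estimated PF is nondecreasing:

ncorr = [2 5 3 9 10];                    % hypothetical correct counts per level
ntot  = [10 10 10 10 10];                % hypothetical trials per level
c = ncorr; n = ntot;                     % working copies, one entry per block
idx = num2cell(1:numel(ntot));           % which levels belong to each block
k = 1;
while k < numel(c)
    if c(k)/n(k) > c(k+1)/n(k+1)         % violation: merge blocks k and k+1
        c(k) = c(k)+c(k+1);  n(k) = n(k)+n(k+1);
        idx{k} = [idx{k} idx{k+1}];
        c(k+1) = [];  n(k+1) = [];  idx(k+1) = [];
        k = max(k-1,1);                  % a merge can expose a new upstream violation
    else
        k = k+1;
    end
end
pmono = zeros(1, numel(ntot));
for ib = 1:numel(c), pmono(idx{ib}) = c(ib)/n(ib); end
disp(pmono)                              % monotonized PF: 0.2 0.4 0.4 0.9 1.0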

Second, the data are extrapolated with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue in connection with my simulations that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to biased slope estimates. In Strasburger's case there is the shifting step. One might think this is special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.

The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that, with no shift and no monotonizing, there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that, when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but that objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log-log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.0004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up-down (Brownian) staircase, with threshold estimated by averaging levels, was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about the fourth reversal, are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman-Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials was unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull function: probability, d′, and log-log d′ slope (code for generating Figure 1).

Matlab Code 2. Goodness-of-fit: likelihood vs. chi square (relevant to Wichmann and Hill, 2001a, and Figure 4).

Matlab Code 3. Downward slope bias due to lapses (relevant to Wichmann and Hill, 2001a, and Table 2).

Matlab Code 4. Brownian staircase enumerations for slope bias, in four parts: the monotonizing script, the extrapolating script, the averaging script, and the main program. Note that in the monotonizing script the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note. The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


Complications With Strasburger's Advocacy of Maximum Slope

Here I will mention three complications implicated in Strasburger's suggestion of defining slope at the point of maximum slope on a logarithmic abscissa.

1. As can be seen in Equation 11, the slopes on linear and logarithmic abscissas are related by a factor of 1/x_t. Because of this extra factor, the maximum slope occurs at a lower stimulus strength on linear as compared with log axes. This makes the notion of maximum slope less fundamental. However, since we are usually interested in the percent error of threshold estimates, it turns out that a logarithmic abscissa is the most relevant axis, supporting Strasburger's log abscissa definition.

2. The maximum slope is not necessarily at threshold (as Strasburger points out). For the Weibull functions defined in Equation 13 with k = exp(−1) and the cumulative normal function in Equation 14, the maximum slope (on a log axis) does occur at threshold. However, the two points are decoupled in the generalized Weibull function defined in Equation 12. For the Quick version of the Weibull function (Equation 13 with k = 0.5, placing threshold halfway up the PF), the threshold is below the point of maximum slope; the derivative of P(x_t) at threshold is

dP(x_t)/dx_t |_thresh = (1 − γ) β log_e(2)/2 = 0.347(1 − γ)β,

which is slightly different from the maximum slope as given by Equation 15. Similarly, when threshold is defined at d′ = 1, the threshold is not at the point of maximum slope. People who fit psychometric functions would probably prefer reporting slopes at the detection threshold rather than at the point of maximum slope.

3. One of the most important considerations in selecting a point at which to measure threshold is the question of how to minimize the bias and standard error of the threshold estimate. The goal of adaptive methods is to place trials at one level. In order to avoid a threshold bias due to an improper estimate of PF slope, the test level should be at the defined threshold. The variance of the threshold estimate when the data are concentrated at a single level (as with adaptive procedures) is given by Gourevitch and Galanter (1967) as

var(Y) = [P(1 − P)/N] / [dP(y_t)/dy_t]².  (19)

If the binomial error factor of P(1 − P)/N were not present, the optimal placement of trials for measuring threshold would be at the point of maximum slope. However, the presence of the P(1 − P) factor shifts the optimal point to a higher level. Wichmann and Hill (2001b) consider how trial placement affects threshold variance for the method of constant stimuli. This topic will be considered in Section II.
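
The effect of the P(1 − P) factor is easy to see numerically. The sketch below (illustrative only; it assumes the standard 2AFC Weibull with γ = .5, β = 2, and k = exp(−1) on a natural log abscissa) evaluates Equation 19 on a grid and compares the level of maximum slope with the level of minimum threshold variance:

beta = 2;  gamma = 0.5;  N = 100;
y  = -2:0.001:1;                                    % stimulus in natural log units
P  = gamma + (1-gamma)*(1 - exp(-exp(beta*y)));     % 2AFC Weibull, k = exp(-1)
dP = (1-gamma)*beta*exp(beta*y).*exp(-exp(beta*y)); % dP/dy
varY = P.*(1-P)./(N*dP.^2);                         % Equation 19
[~, iSlope] = max(dP);                              % level of maximum slope
[~, iVar]   = min(varY);                            % level of minimum threshold variance
fprintf('max slope at P = %.3f;  min variance at P = %.3f\n', P(iSlope), P(iVar))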

Connecting the Weibull and d′ Functions

Figures 5-7 of Strasburger (2001a) compare the log-log d′ slope, b, with the Weibull PF slope, β. Strasburger's connection of b = 0.88β is problematic, since the d′ version of the Weibull PF does not have a fixed log-log slope. An improved d′ representation of the Weibull function is the topic of the present section. A useful parameterization for d′, which I shall refer to as the Stromeyer-Foley function, was introduced by Stromeyer and Klein (1974) and used extensively by Foley (1994):

d′(x_t) = x_t^b / [a + (1 − a) x_t^(b+w−1)],  (20)

where x_t is the stimulus strength in threshold units, as in Equation 8. The factors with a in the denominator are present so that d′ equals unity at threshold (x_t = 1, or x = α). At low x_t, d′ ≈ x_t^b/a. The exponent b (the log-log slope of d′ at very low contrasts) controls the amount of facilitation near threshold. At high x_t, Equation 20 becomes d′ ≈ x_t^(1−w)/(1 − a). The parameter w is the log-log slope of the test threshold versus pedestal contrast function (the tvc or "dipper" function) at strong pedestals. One must be a bit cautious, however, because in yes/no procedures the tvc slope can be decoupled from w if the signal detection ROC curve has nonunity slope (Stromeyer & Klein, 1974). The parameter a controls the point at which the d′ function begins to saturate. Typical values of these unitless parameters are b = 2, w = 0.5, and a = 0.6 (Yu, Klein, & Levi, 2001). The function in Equation 20 accelerates at low contrast (log-log slope of b) and decelerates at high contrasts (log-log slope of 1 − w), in general agreement with a broad range of data.

For 2AFC, z = d′/√2, so from Equation 3 the connection between d′ and probability correct is

P(x_t) = .5 + .5 erf[d′(x_t)/2].  (21)

To establish the connection between the PFs specified in Equations 12 and 20-21, one must first have the two agree at threshold. For 2AFC with d′ = 1, the threshold at P = 76.02% can be enforced by choosing k = 0.4795 (see the discussion following Equation 18C), so that Equations 1 and 12 become

P(x_t) = 1 − .5 (0.4795)^(x_t^β).  (22)

Modifying k as in Equation 22 leaves the Weibull shape unchanged and shifts only the definition of threshold. If b = 1.06β, w = 1 − 0.39β, and a = 0.614, then for all values of β the Weibull and d′ functions (Equations 21 and 22 vs. Equation 12) differ by less than 0.0004 for all stimulus strengths. At very low stimulus strengths, b = β. The value b = 1.06β is a compromise for getting an overall optimal fit.
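
Because the match is claimed for all β, it is easy to verify numerically. The sketch below (illustrative only; Equation 20 is used in the form reconstructed above, with the parameter rules b = 1.06β, w = 1 − 0.39β, a = 0.614) computes both PFs over a wide range of stimulus strengths and prints the largest discrepancy for each β:

for beta = [1 2 3.5 5]
    b = 1.06*beta;  w = 1 - 0.39*beta;  a = 0.614;  % parameter rules quoted above
    xt = 10.^(-1:0.001:0.7);                        % stimulus in threshold units
    Pweib = 1 - 0.5*0.4795.^(xt.^beta);             % Equation 22
    dprime = xt.^b ./ (a + (1-a)*xt.^(b+w-1));      % Equation 20 (as reconstructed)
    Pdprime = 0.5 + 0.5*erf(dprime/2);              % Equation 21
    fprintf('beta = %3.1f:  max |difference| = %.5f\n', beta, max(abs(Pweib - Pdprime)))
end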

Strasburger (2001a) is concerned with the same issue. In his Table 1, he reports d′ log-log slopes of [0.8847, 1.8379, 3.131, 4.421] for β = [1, 2, 3.5, 5]. If the d′ slope is taken to be a measure of the d′ exponent b, then the ratio b/β is [0.8847, 0.9190, 0.8946, 0.8842] for the four β values. Our value of b/β = 1.06 differs substantially from 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a). Pelli (1985) and Strasburger (2001a) used d′ functions with a = 1 in Equation 20. With a = 0.614, our d′ function (Equation 20) starts saturating near threshold, in agreement with experimental data. The saturation lowers the effective value of b. I was very surprised to find that the Stromeyer-Foley d′ function did such a good job in matching the Weibull function across the whole range of β and x. For a long time I had thought a different fit would be needed for each value of β.

For the same Weibull function in a yes/no method (false alarm rate = 50%), the parameters b and w are identical to the 2AFC case above. Only the value of a shifts, from a = 0.614 (2AFC) to a = 0.54 (yes/no). In order to make x_t = 1 at d′ = 1, k becomes 0.3173 (see Equation 18C) instead of 0.4795 (2AFC). As with 2AFC, the difference between the Weibull and d′ function is less than 0.0004 (<0.04%) for all stimulus strengths and all values of β, and for both a linear and a logarithmic abscissa. These values mean that for both the yes/no and the 2AFC methods, the d′ function (Equation 20) corresponding to a Weibull function with β = 2 has a low-contrast log-log slope of b = 2.12 and a high-contrast log-log slope of 1 − w = 0.78. The high-contrast tvc (test contrast vs. pedestal contrast discrimination function, sometimes called the "dipper" function) log-log slope would be w = 0.22.

I also carried out a similar analysis asking what cumulative normal σ is best matched to the Weibull. I found σ = 1.1720/β. However, the match is poor, with errors of 4% (rather than <0.04% in the preceding analysis). The poor fit of the two functions makes the fitting procedure sensitive to the weighting used in the fitting procedure. If one uses a weighting factor of [P(1 − P)/N], where P is the PF, one will find a quite different value for σ than if one ignored the weighting or if one chose the weighting on the basis of the data rather than the fitting function.

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section, inspired by Linschoten et al. (2001), is an examination of various experimental methods for gathering data to estimate thresholds. Linschoten et al. compare three 2AFC methods for measuring smell and taste thresholds: a maximum-likelihood adaptive method, an up-down staircase method, and an ascending method of limits. The special aspect of their experiments was that in measuring smell and taste, each 2AFC trial takes between 40 and 60 sec (Lew Harvey, personal communication) because of the long duration needed for the nostrils or mouth to recover their sensitivity. The purpose of the Linschoten et al. paper is to examine which of the three methods is most accurate when the number of trials is low. Linschoten et al. conclude that the 2AFC method is a good method for this task. My prior experience with forced choice tasks and low numbers of trials (Manny & Klein, 1985; McKee et al., 1985) made me wonder whether there might be better methods to use in circumstances in which the number of trials is limited. This topic is considered in Section II.

Compare 2AFC to Yes/No

The commonly accepted framework for comparing 2AFC and yes/no threshold estimates is signal detection theory. Initially, I will make the common assumption that the d′ function is a power function (Equation 20 with a = 1):

d′ = c_t^b = exp(by_t).  (23)

Later in this section, the experimentally more accurate form, Equation 20 with a ≠ 1, will be used. The variance of the threshold estimate will now be calculated for 2AFC and yes/no for the simplest case of trials placed at a single stimulus level, y, the goal of most adaptive methods. The PF slope relates ordinate variance to abscissa variance (from Gourevitch & Galanter, 1967):

var(Y) = var(d′) / (dd′/dy_t)²,  (24)

where Y = y − y_t is the threshold estimate for a natural log abscissa, as given in Equation 9. This formula for var(Y) is called the asymptotic formula, in that it becomes exact in the limit of a large number of total trials, N. Wichmann and Hill (2001b) show that, for the method of constant stimuli, a proper choice of testing levels brings one to the asymptotic region for N = 120 (the smallest value that they explored). Improper choice of testing levels requires a larger N in order to get to the asymptotic region. Equation 24 is based on having the data at a single level, as occurs asymptotically in adaptive procedures. In addition, the definition of threshold must be at the point being tested. If levels are tested away from threshold, there will be an additional contribution to the threshold variance (Finney, 1971; McKee et al., 1985) if the slope is a variable parameter in the fitting procedure. Equation 24 reaches its asymptotic value more rapidly for adaptive methods, with their focused placement of trials, than with the constant stimulus method when slope is a free parameter.

Equation 24 needs analytic expressions for the derivative and the variance of d′. The derivative is obtained from Equation 23:

dd′/dy_t = bd′.  (25)

The variance of d′ for both yes/no (d′ = z_hit + z_correct rejection) and 2AFC (d′ = √2·z_ave) is

var(d′) = 2var(z) = 4π exp(z²) var(P).  (26)


For the yes/no case, Equation 26 is based on a unity-slope ROC curve with the subject's criterion at the negative diagonal of the ROC curve, where the hit rate equals the correct rejection rate. This would be the most efficient operating point. The variance for a nonunity-slope ROC curve was considered by Klein, Stromeyer, and Ganz (1974). From binomial statistics,

var(P) = P(1 − P)/n,  (27)

with n = N for 2AFC and n = N/2 for yes/no, since for this binary yes/no task the number of signal and blank trials is half the total number of trials.

Putting all of these factors together gives

var(Y) = 4π exp(z²) P(1 − P)/(nb²d′²)  (28)

for both the yes/no and the 2AFC cases. The only difference between the two cases is the definition of n and the definition of d′ (d′_2AFC = √2·z and d′_YN = 2z for a symmetric criterion). When this is expressed in terms of N (number of trials) and z (z score of probability correct), we get

var(Y) = 2π exp(z²) P(1 − P)/(Nb²z²)  (29)

for both 2AFC and yes/no. That is, when the percent corrects are all equal (P_2AFC = P_hit = P_correct rejection), the threshold variances for 2AFC and yes/no are equal. The minimum value of var(Y) is 1.64/(Nb²), occurring at z = 1.57 (P = 94%). This value of 94% correct is the optimum value at which to test for both 2AFC and yes/no tasks, independent of b.

This result, that the threshold variance is the same for 2AFC and yes/no methods, seems to disagree with Figure 4 of McKee et al. (1985), where the standard errors of the threshold estimates are plotted for 2AFC and yes/no as a function of the number of trials. The 2AFC standard error (their Figure 4) is twice that of yes/no, corresponding to a factor of four in variance. However, the McKee et al. analysis is misleading, since all they (we) did for the yes/no task was to use Equation 2 to expand the PF to a 0% to 100% range, thereby doubling the PF slope. That rescaling is not the way to convert a 2AFC detection task to the yes/no task of the same detectability. A signal detection methodology is needed for that conversion, as was done in the present analysis.

One surprise is that the optimal testing probability is independent of the PF slope, b. This finding is a general property of PFs in which the dependence on the slope parameter has the form x^b. The 94% level corresponds to d′_YN = 3.15 and d′_2AFC = 2.23. For the commonly used P = 75% (z = 0.6745, d′_YN = 1.35, d′_2AFC = 0.954), var(Y) = 4.08/(Nb²), which is about 2.5 times the optimal value obtained by placing the stimuli at 94%. Green (1990) had previously pointed out that the 94% and 92% levels were the optimal points to test for the cumulative normal and logit functions, respectively, because of the minimum threshold estimate variability when one is testing at that point on the PF. Other PFs can have optimal points at slightly lower levels, but these are still above the levels to which adaptive methods are normally directed.
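
Equation 29 is simple enough to explore directly. The sketch below (illustrative only; the values of N and b are arbitrary, since Nb²var(Y) is what matters) evaluates the variance on a grid of testing levels, locates the optimum, and also evaluates the commonly used 75% point:

N = 100;  b = 2;                                  % number of trials and d' exponent
z = 0.1:0.001:2.5;                                % z score of probability correct
P = 0.5*(1 + erf(z/sqrt(2)));                     % probability correct
varY = 2*pi*exp(z.^2).*P.*(1-P)./(N*b^2*z.^2);    % Equation 29
[vmin, imin] = min(varY);
fprintf('optimum: z = %.2f, P = %.2f, N*b^2*var = %.2f\n', z(imin), P(imin), N*b^2*vmin)
z75 = sqrt(2)*erfinv(2*0.75 - 1);                 % z score for P = 75%
v75 = 2*pi*exp(z75^2)*0.75*0.25/(N*b^2*z75^2);
fprintf('at P = 75%%:  N*b^2*var = %.2f\n', N*b^2*v75)
% The optimum lands near z = 1.57 (P = .94) with N*b^2*var = 1.64; at 75% it is 4.08.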

It is also useful to compare the 2AFC and yes/no methods at the same d′, rather than at their optimal points. In that case, for d′ values fixed at [1, 2, 3], the ratio of the 2AFC variance to the yes/no variance is [1.82, 1.36, 0.79]. At low d′ there is an advantage for the 2AFC method, and that reverses at high d′, as is expected from the optimal d′ being higher for yes/no than for 2AFC. In general, one should run the yes/no method at d′ levels above 2.0 (84% correct in the blank and signal intervals corresponds to d′ = 2).

The equality of the optimal var(Y) for 2AFC and yes/no methods is not only surprising, it seems paradoxical, as can be seen by the following gedankenexperiment. Suppose subjects close their eyes on the first interval of the 2AFC trial and base their judgments on just the second interval, as if they were doing a yes/no task, since blanks and stimuli are randomly intermixed. Instead of saying "yes" or "no," they would respond "Interval 2" or "Interval 1," respectively, so that the 2AFC task would become a yes/no task. The paradox is that one would expect the variance to be worse than for the eyes-open 2AFC case, since one is ignoring information. However, Equation 29 shows that the 2AFC and yes/no methods have identical optimal variance. I suspect that the resolution of this paradox is that in the yes/no method with a stable criterion, the optimal variance occurs at a higher d′ value (3.15) than the 2AFC value (2.23). The d′ power law function that I chose does not saturate at large d′ levels, giving a benefit to the yes/no method.

In order to check whether the paradox still holds with a more realistic PF (a d′ function that saturates at large d′), I now compare the 2AFC and yes/no variance using a Weibull function with a slope b. In order to connect 2AFC and yes/no, I need to use signal detection theory and the Stromeyer–Foley d′ function (Equation 20) with the "Weibull parameters" β = 1.06b, a = 0.614, and w = 1.39b. Following Equation 22, I pointed out that with these parameter values the difference between the 2AFC Weibull function and the 2AFC d′ function (converted to a probability ordinate) is less than .0004. After the derivative Equation 25 is appropriately modified, we find that the optimal 2AFC variance is 3.42/(Nb²). For the yes/no case, the optimal variance is 4.02/(Nb²). This result resolves the paradox. The optimal 2AFC variance is about 18% lower than the yes/no variance, so closing one's eyes in one of the intervals does indeed hurt.

We shouldn't forget that if one counts stimulus presentations, N_p, rather than trials, N, the optimal 2AFC variance becomes 6.84/(N_p b²), as opposed to 4.02/(N_p b²) for yes/no. Thus, for the Linschoten experiments, with each stimulus presentation being slow, the yes/no methodology is more accurate for a given number of stimulus presentations, even at low d′ values. Kaernbach (1990) compares 2AFC


and yes/no adaptive methods with simulations and actual experiments. His abscissa is number of presentations, and his results are compatible with our findings.

The objective yes/no method that I have been discussing is inefficient, since 50% of the trials are blanks. Greater efficiency is possible if several points on the PF are measured (method of constant stimuli), with the number of blank trials the same as at the other levels. For example, if five levels of stimuli were measured, including blanks, then the blanks would be only 20% of the total, rather than 50%. For the 2AFC paradigm, the subject would still have to look at 50% of the presentations being blanks. The yes/no efficiency can be further improved by having the subject respond to the multiple stimuli with multiple ratings rather than a binary yes/no response. This rating scale method of constant stimuli is my favorite method, and I have been using it for 20 years (Klein & Stromeyer, 1980). The multiple ratings allow one to measure the psychometric function shape for d′s as large as 4 (well into the dipper region where stimuli are slightly above threshold). It also allows one to measure the ROC slope (presence of multiplicative noise). Simulations of its benefits will be presented in Klein (2002).

Insight into how the number of responses affects threshold variance can be obtained by using the cumulative normal PF (Equation 17) considered by Kaernbach (2001b) and Miller and Ulrich (2001), with γ = 0. Typically this is the case of yes/no discrimination, but it can also be 2AFC discrimination with an ordinate going from 0% to 100%, as I have discussed in connection with Figure 3. For a cumulative normal PF with a binary response and for a single stimulus level being tested, the threshold variance is (Equation 19) as follows:

\[
\mathrm{var(thresh)} = \frac{\mathrm{var}(P)}{(dP/dy)^2} = \frac{2\pi\,\sigma^2 P(1-P)\exp(z^2)}{N}, \tag{30}
\]

from Equation 27 and forthcoming Equations 40 and 41. The optimal level to test is at P = .5, so the optimal variance becomes

\[
\mathrm{var(thresh)} = \frac{\pi}{2}\times\frac{\sigma^2}{N}. \tag{31}
\]

If, instead of a binary response, the observer gives an analog response (many, many rating categories), then the variance is the familiar connection between standard deviation and standard error:

\[
\mathrm{var(thresh)} = \frac{\sigma^2}{N}. \tag{32}
\]

With four response categories, the variance is 1.19σ²/N, which is closer to the analog than to the binary case. If stimuli with multiple strengths are intermixed (method of constant stimuli), then the benefit of going from a binary response to multiple responses is greater than that indicated in Equations 31–32 (Klein, 2002).

A brief list of other problems with the 2AFC method applied to detection comprises the following.

1. 2AFC requires the subject to memorize and then compare the results of two subjective responses. It is cognitively simpler to report each response as it arrives. Subjects without much experience can easily make mistakes (lapses), simply becoming confused about the order of the stimuli. Klein (1985) discusses memory load issues relevant to 2AFC.

2. The 2AFC methodology makes various types of analysis and modeling more difficult, because it requires that one average over all criteria. The response variable in 2AFC is the Interval 2 activation minus the Interval 1 activation (see the abscissa in Figure 2). This variable goes from minus infinity to plus infinity, whereas the yes/no variable goes from zero to infinity. It therefore seems more difficult to model the effects of probability summation and uncertainty for 2AFC than for yes/no.

3. Models that relate psychophysical performance to underlying processes require information about how the noise varies with signal strength (multiplicative noise). The rating scale method of constant stimuli not only measures d′ as a function of contrast, it also is able to provide an estimate of how the variance of the signal distribution (the ROC slope) increases with contrast. The 2AFC method lacks that information.

4. Both the yes/no and 2AFC methods are supposed to eliminate the effects of bias. Earlier I pointed out that in my recent 2AFC discrimination experiments, when the stimulus is near threshold, I find I have a strong bias in favor of responding "Stimulus 2." Until I learned to compensate for this bias (not typically done by naïve subjects), my d′ was underestimated. As I have discussed, it is possible to compensate for this bias, but that is rarely done.

Improving the Brownian (Up–Down) Staircase

Adaptive methods with a goal of placing trials at an optimal point on the PF are efficient. Leek (2001) provides a nice overview of these methods. There are two general categories of adaptive methods: those based on maximum (or mean) likelihood and those based on simpler staircase rules. The present section is concerned with the older staircase methods, which I will call Brownian methods. I used the name "Brownian" because of the up–down staircase's similarity to one-dimensional Brownian motion, in which the direction of each step is random. The familiar up–down staircase is Brownian because at each trial the decision about whether to go up or down is decided by a random factor governed by probability P(x). I want to make this connection to Brownian motion because I suspect that there are results in the physics literature that are relevant to our psychophysical staircases. I hope these connections get made.

A Brownian staircase has a minimal memory of the run history: it needs to remember the previous step (including direction) and the previous number of reversals or number of trials (used for when to stop and for when to decrease step size). A typical Brownian rule is to decrease signal strength by one step (like 1 dB) after a correct response and to increase it three steps after an incorrect response (a


one-down–three-up rule). I was pleased to learn recently that this rule works nicely for an objective yes/no task (Kaernbach, 1990), as well as 2AFC, as discussed earlier.

In recent years, likelihood methods (Pentland, 1980; Watson & Pelli, 1983) have become popular and have somewhat displaced Brownian methods. There is a widespread feeling that likelihood methods have substantially smaller threshold variance than do staircases in which thresholds are obtained by simply averaging the levels tested. Given the importance of this issue for informing researchers about how best to measure thresholds, it is surprising that there have been so few simulations comparing staircases and optimal methods. Pentland's (1980) results, showing very large threshold variance for a staircase method, are so dramatically different from the results of my simulations that I am skeptical. The best previous simulations of which I am aware are those of Green (1990), who compared the threshold variance from several 2AFC staircase methods with the ideal variance (Equation 24). He found that as long as the staircase rule placed trials above 80% correct, the ratio of observed to ideal threshold variance was about 1.4. This value is the square of the inverse of Green's finding that the ratio of the expected to the observed sigma is about .85.

In my own simulations (Klein, 2002), I found that as long as the final step size was less than 0.2σ, the threshold variance from simply averaging over levels was close to the optimal value. Thus the staircase threshold variance is about the same as the variance found by other optimal methods. One especially interesting finding was that the common method of averaging an even number of reversal points gave a variance that was about 10% higher than averaging over all levels. This result made me wonder how the common practice of averaging reversals ever got started. Since Green (1990) averaged over just the reversal points, I suspect that his results are an overestimate of the staircase variance. In addition, since Green specifies step size in decibels, it is difficult for me to determine whether his final step size was small enough. Since the threshold variance is inversely related to slope, it is important to relate step size to slope rather than to decibels. The most natural reference unit is σ, the standard deviation associated with the cumulative normal. In summary, I suggest that this general feeling of distrust of the simple Brownian staircase is misguided and that simple staircases are often as good as likelihood methods. In the section Summary of Advantages of Brownian Staircase, I point out several advantages of the simple staircase over the likelihood methods.

A qualification should be made to the claim above that averaging levels of a simple staircase is as good as a fancy likelihood method. If the staircase method uses a too-small step size at the beginning and if the starting point is far from threshold, then the simple staircase can be inefficient in reaching the threshold region. However, at the end of that run the astute experimenter should notice the large discrepancy between the starting point and the threshold, and the run should be redone with an improved starting point or with a larger initial step size that will be reduced as the staircase proceeds.

Increase number of 2AFC response categories. Independent of the type of adaptive method used, the 2AFC task has a flaw whereby a series of lucky guesses at low stimulus levels can erroneously drive adaptive methods to levels that are too low. Some adaptive methods with a small number of trials are unable to fully recover from a string of lucky guesses early in the run. One solution is to allow the subject to have more than two responses. For reasons that I have never understood, people like to have only two response categories in a 2AFC task. Just as I am reluctant to take part in 2AFC experiments, I am even more reluctant to take part in two-response experiments. The reason is simple: I always feel that I have within me more than one bit of information after looking at a stimulus. I usually have at least four or more subjective states (2 bits) for a yes/no task: HN, LN, LY, HY, where H and L are high and low confidence and N and Y are did not and did see it. For 2AFC, my internal states are H1, L1, L2, H2, for high and low confidence on Intervals 1 and 2. Kaernbach (2001a), in this special issue, does an excellent job of justifying the low-confidence response category. He provides solid arguments, simulations, and experimental data on how a low-confidence category can minimize the "lucky guess" problem.

Kaernbach (2001a) groups the L1 and L2 categories together into a "don't know" category and argues persuasively that the inclusion of this third response can increase the precision and accuracy of threshold estimates while also leaving the subject a happier person. He calls it the "unforced choice" method. Most of his article concerns the understandable fear of introducing a response bias, exactly what 2AFC was invented to avoid. He has developed an intelligent rule for what stimulus level should follow a "don't know" response. Kaernbach shows, with both simulations and real data, that with his method this potential response bias problem has only a second-order effect on threshold estimates (similar to the 2AFC interval bias discussed around Equation 7), and its advantages outweigh the disadvantages. I urge anyone using the 2AFC method to read it. I conjecture that Kaernbach's method is especially useful in trimming the lower tail of the distribution of threshold estimates and also in reducing the interval bias of 2AFC tasks, since the low-confidence rating would tend to be used for trials on which there is the most uncertainty.

Although Kaernbach has managed to convince me of the usefulness of his three-response method, he might have trouble convincing others because of the response bias question. I should like to offer a solution that might quell the response bias fears. Rather than using 2AFC with three responses, one can increase the number of responses to four (H1, L1, L2, H2), as discussed two paragraphs above. The L rating can be used when subjects feel they are guessing.

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A 5-up–1-down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct


of 5/6 = .8333. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low-confidence rating is that when one is below threshold, giving a low-confidence rating will prevent one from long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.
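As a rough illustration of the convergence claim, the following hypothetical Matlab simulation sketch (not from the article, and using the plain binary 5-up–1-down rule without the confidence ratings) tracks the tested level for an assumed 2AFC observer whose PF is a cumulative normal with threshold 0 and σ = 1:

Phi = @(x) 0.5*(1 + erf(x/sqrt(2)));         % standard normal CDF
pcorrect = @(x) 0.5 + 0.5*Phi(x);            % assumed 2AFC PF of the simulated observer
step = 0.1;  x = 2;  ntrials = 2000;  levels = zeros(1, ntrials);
for t = 1:ntrials
    levels(t) = x;
    if rand < pcorrect(x)
        x = x - step;                        % correct: 1 level down
    else
        x = x + 5*step;                      % wrong: 5 levels up
    end
end
xbar = mean(levels(101:end));                % average level, early trials discarded
fprintf('P(correct) at the mean tested level = %.3f (5/6 = .833)\n', pcorrect(xbar));

The run hovers near the level at which probability correct is 5/6, illustrating why this rule targets roughly 83%–84% correct.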

Another reason not to be fearful of the multiple-response 2AFC method is that after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations discussed earlier in this section, and Green's (1990), reveal that averaging over the levels (excluding a number of initial levels to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging-levels method disagree, that run could be thrown out.

Summary of advantages of Brownian staircase. Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since it gives the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (Psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith


et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on a one-step-ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape.

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope b or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low spatial frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates for mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data-gathering group argued that we should use a data-gathering methodology that would allow estimation of the PF slope of each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin, and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules, Goodness-of-Fit

There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation: Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits

The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals, in order to avoid the starting level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that, in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′ function, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

\[
\mathrm{pdf}(i) = \frac{P(i)-P(i-1)}{s(i)-s(i-1)}, \tag{33}
\]

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

\[
\mu_r = \frac{1}{r+1}\sum_i \mathrm{pdf}(i)\,\bigl[s(i)^{r+1}-s(i-1)^{r+1}\bigr]. \tag{34}
\]

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

\[
f[s(i)] - f[s(i-1)] = f'(s_{\mathrm{mid}})\,[s(i)-s(i-1)], \tag{35}
\]

where f(s) = s^(r+1) and f′(s_mid) = (r + 1)s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

\[
\mu_r = \sum_i \mathrm{pdf}(i)\, s_{\mathrm{mid}}^{\,r}\, \Delta s, \tag{36}
\]

where Δs = s(i) − s(i − 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

\[
\mu_0 = \sum_i \mathrm{pdf}(i)\,\Delta s(i) = P(i_{\max}) - P(i_{\min}), \tag{37}
\]

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P_min, must be zero, as Miller and Ulrich (2001) point out.


The next simplest SK moment is for r = 1:

\[
\mu_1 = \sum_i \mathrm{pdf}(i)\,\frac{s(i)^2 - s(i-1)^2}{2} = \sum_i \bigl[P(i)-P(i-1)\bigr]\,\frac{s(i)+s(i-1)}{2}, \tag{38}
\]

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: if this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method of constant stimuli data in which the PF goes from 0 to 1. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels, and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, −.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to these unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than .33, and P2 can be any number greater than .67. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide possible set of alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the center of the PF is given by Equation 38 to be μ1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ1 = 2.97. However, given the data, with 49 of 98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ1 = 4. Thus we see that, because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
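The two SK estimates above can be verified with a few lines of Matlab (a sketch, not the article's Appendix code), applying Equation 38 to the raw and to the monotonized probabilities:

s     = [1 2 3 4 5];
Praw  = [0 1 1 .5 1];                  % raw probabilities at the five levels
Pmono = [0 .51 .51 .51 1];             % after monotonizing (levels 2-4 pooled)
mu1   = @(P) sum( diff(P)./diff(s) .* (s(2:end) + s(1:end-1))/2 .* diff(s) );
mu1(Praw)                              % returns 2.00
mu1(Pmono)                             % returns 2.97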

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis, while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being .01 and .99. The probabilities at the sample points are P(i) = .01, .1222,


.50, .8778, and .99, corresponding to z scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly even on data generated from probit PFs. McKee et al. show that when the number of trials is low and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

\[
\mathrm{var}(\mathrm{PSE}_i) = \frac{\mathrm{var}(P_i)}{\bigl(dP_i/dx_i\bigr)^2}, \tag{39}
\]

where var(P) is given by Equation 27, and the PF slope is

\[
\frac{dP_i}{dx_i} = \frac{dP_i}{dz_i}\times\frac{dz_i}{dx_i}, \tag{40}
\]

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

\[
\frac{dP_i}{dz_i} = \frac{\exp(-z_i^2/2)}{\sqrt{2\pi}}, \tag{41}
\]

from Equation 3A. By combining these equations, one gets

\[
\mathrm{var}(\mathrm{PSE}_i) = \frac{2\pi\sigma^2 P_i(1-P_i)\exp(z_i^2)}{N_i}. \tag{42}
\]

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

\[
\frac{1}{\mathrm{var}_{\mathrm{tot}}} = \sum_i \frac{1}{\mathrm{var}_i}. \tag{43}
\]

Finally, upon plugging in the five values of z_i, ranging from −2.328 to +2.328, and P_i, ranging from .01 to .99, the standard error of the PF location is

\[
\mathrm{SE} = \mathrm{var}_{\mathrm{tot}}^{1/2} = 0.284. \tag{44}
\]

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method, and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal numbers of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
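The analytic chain of Equations 42–44 is short enough to be typed directly into Matlab (a sketch, not from the article's Appendix):

zi = [-2.328 -1.164 0 1.164 2.328];
Pi = 0.5*(1 + erf(zi/sqrt(2)));                                % .01, .1222, .50, .8778, .99
Ni = 8;  sigma = 1;
vari   = 2*pi*sigma^2 .* Pi .* (1 - Pi) .* exp(zi.^2) ./ Ni;   % Equation 42
vartot = 1 / sum(1 ./ vari);                                   % Equation 43
SE     = sqrt(vartot)                                          % Equation 44: about 0.284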

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

\[
\mathrm{var}_{\mathrm{tot}} = \mathrm{SE}^2 = \frac{3.24}{N_{\mathrm{tot}}}. \tag{45}
\]

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that of the five stimulus levels, the two at P = .01 and .99 are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = .5). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus one cannot


claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

\[
X^2 = \sum_i \frac{\bigl[p(\mathrm{data}_i) - P_i\bigr]^2}{\mathrm{var}(P_i)}, \tag{46}
\]

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend:

\[
\mathrm{var}(P_i) = \frac{P_i(1-P_i)}{N_i}, \tag{47}
\]

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

\[
X^2 = \sum_i \mathrm{residual}_i^2, \tag{48}
\]

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

\[
\mathrm{residual}_i = \frac{p(\mathrm{data}_i) - P_i}{\mathrm{SE}_i}. \tag{49}
\]

The second method is to use the normalized log-likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

\[
LL = 2\sum_i N_i\left\{ p(\mathrm{data}_i)\ln\!\left[\frac{p(\mathrm{data}_i)}{P_i}\right] + \bigl[1-p(\mathrm{data}_i)\bigr]\ln\!\left[\frac{1-p(\mathrm{data}_i)}{1-P_i}\right] \right\}. \tag{50}
\]

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.
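Both statistics are easy to compute from the fitted probabilities and the observed proportions. A minimal Matlab sketch (the data values below are made-up placeholders, not from the article):

P      = [0.55 0.75 0.95];            % fitted probabilities at each level (assumed example)
p_data = [0.60 0.70 1.00];            % observed proportions correct (assumed example)
N      = [20 20 20];                  % trials per level
X2 = sum( (p_data - P).^2 ./ (P.*(1 - P)./N) );                % Equation 46
xlogx = @(x) x .* log(max(x, realmin));                        % convention 0*log(0) = 0
LL = 2*sum( N .* ( xlogx(p_data) - p_data.*log(P) ...
                 + xlogx(1 - p_data) - (1 - p_data).*log(1 - P) ) );   % Equation 50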

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

\[
\mathrm{prob\_chisq} = \mathrm{Gamma}\!\left(\frac{df}{2},\,\frac{\chi^2}{2}\right), \tag{51}
\]

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z score.
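Those two checks take one line each in Matlab; note that Matlab's gammainc takes its arguments in the order gammainc(x, a), so the Gamma(df/2, χ²/2) of Equation 51 is written as below (a sketch, not the article's code):

df   = 18;
chi2 = [df, df + sqrt(2*df)];            % chi-square at its mean and one SD above it
p    = gammainc(chi2/2, df/2)            % cumulative probability: about [.54 .845]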

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = .50 to 1.00 and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

\[
X_i^2 = \frac{\bigl[p(\mathrm{data}_i) - P_i\bigr]^2}{P_i(1-P_i)/N_i} = \frac{(O_i - E_i)^2}{E_i(1-P_i)}, \tag{52}
\]

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X²_i is obtained by doing a weighted sum over all possible


outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

\[
\mathrm{wt}_{O_i} = \binom{N_i}{O_i} P_i^{\,O_i}\,(1-P_i)^{N_i-O_i}. \tag{53}
\]

The expected value ⟨X²_i⟩ is

\[
\langle X_i^2\rangle = \sum_{O_i} \mathrm{wt}_{O_i}\,\frac{(O_i - E_i)^2}{E_i(1-P_i)}. \tag{54}
\]

A bit of algebra, based on Σ_{O_i} wt_{O_i} = 1, gives a surprising result:

\[
\langle X_i^2\rangle = 1. \tag{55}
\]

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

\[
\langle LL_i\rangle = \sum_{O_i} \mathrm{wt}_{O_i}\, 2\left[ O_i\ln\!\left(\frac{O_i}{E_i}\right) + (N_i - O_i)\ln\!\left(\frac{N_i-O_i}{N_i-E_i}\right) \right]. \tag{56}
\]

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

\[
\langle LL_i\rangle = 2\,[\,0.25\times 2\ln(2) + 0.25\times 2\ln(2)\,] = 2\ln(2) = 1.386. \tag{57}
\]

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero because of the (1 − P_i) factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above .85 the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.
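The same enumeration can be carried out in a few lines of Matlab (a sketch in the spirit of the Appendix's Matlab Code 2, not a copy of it), reproducing the N_i = 2 curve of Figure 4:

Ni = 2;  Pl = 0.5:0.05:0.95;                  % trials per level and probability levels
safelog = @(x) log(max(x, realmin));          % so that 0*log(0) contributes 0
EX2 = zeros(size(Pl));  ELL = zeros(size(Pl));
for k = 1:length(Pl)
    P = Pl(k);  E = Ni*P;  O = 0:Ni;
    wt  = arrayfun(@(o) nchoosek(Ni, o), O) .* P.^O .* (1 - P).^(Ni - O);   % Equation 53
    X2i = (O - E).^2 ./ (E*(1 - P));                                        % Equation 52
    LLi = 2*( O.*safelog(O./E).*(O > 0) + (Ni - O).*safelog((Ni - O)./(Ni - E)).*(O < Ni) );
    EX2(k) = sum(wt.*X2i);                    % equals 1 at every level (Equation 55)
    ELL(k) = sum(wt.*LLi);                    % 1.386 at P = .5, well below 1 near P = .95
end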

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a


Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance) at each level. In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level (N = 1, 2, 4, 8, 16). For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method 1 by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used (probit [cumulative normal], logit, and Weibull), corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                      λ = .01         λ = .0001       λ = .01         λ = .0001
Psychometric          No Lapse        No Lapse        Lapse           Lapse
Function             L      χ²       L      χ²       L      χ²       L      χ²
Probit             1.05   1.05     1.00   0.99     1.05   1.05     0.36   0.34
Logit              1.76   1.76     1.66   1.66     1.76   1.76     0.84   0.72
Weibull            1.01   1.01     0.97   0.97     1.01   1.01     0.26   0.26

Note: The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

maximization, corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = 0.01 (first and third pairs of data columns) and λ = 0.0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

probit (cf. Equation 3B):  P(z) = [1 + erf(z/√2)]/2,   (58A)

logit:  P(z) = 1/[1 + exp(−z)],   (58B)

Weibull:  P(z) = 1 − exp[−exp(z)],   (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = 0.01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = 0.0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = 0.0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
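A minimal sketch of that multiple-starting-point strategy is given below. It is not the Appendix code: it uses fminsearch (the current form of the older fmins call), a Weibull-shaped PF with a fixed symmetric lapse rate, and the three-level data set of Table 2 with a single lapse.

% Minimal sketch (not the Appendix code): chi-square fit of a Weibull-shaped
% discrimination PF with a fixed, symmetric lapse rate, restarted from a
% range of initial slope guesses so that the global minimum is found.
x     = [0 1 6];            % stimulus levels
N     = [50 50 50];         % trials per level
Obs   = [25 42 49];         % correct responses (49 = one lapse at the top)
lapse = 0.0001;             % fixed lapse parameter (gamma = lambda)

pf    = @(p) lapse + (1 - 2*lapse)*(1 - exp(-exp(p(2)*(x - p(1)))));
chisq = @(p) sum((Obs - N.*pf(p)).^2 ./ (N.*pf(p).*(1 - pf(p)) + eps));

best = Inf;
for slope0 = 0.3:0.3:3                       % range of initial slope guesses
    [p, err] = fminsearch(chisq, [1 slope0]);
    if err < best, best = err; pBest = p; end
end
fprintf('threshold = %.2f  slope = %.2f  chi-square = %.2f\n', ...
    pBest(1), pBest(2), best);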

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.
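As an illustration of the second approach, here is a minimal, standalone sketch (one possible reading of the observed-plus-expected weighting; all numbers are illustrative):

% Minimal sketch of one reading of the second approach: the binomial error
% used to weight each level combines the observed and the expected (fitted)
% proportions, so a level containing lapses acquires extra variance and
% therefore less weight.  All numbers are illustrative.
N      = [50 50 50];                    % trials per level
pObs   = [25 42 49] ./ N;               % observed proportions correct
pFit   = [0.26 0.80 0.9999];            % fitted PF at the same levels
varLev = (pObs.*(1-pObs) + pFit.*(1-pFit)) ./ (2*N);  % combined binomial error
w      = 1 ./ (varLev + eps);                         % per-level weights
chisqW = sum(w .* (pObs - pFit).^2);                  % weighted error measure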

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope β of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 0.1 (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a fourfold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that because of binomial variability the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.
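The arithmetic behind this argument can be checked with a few lines of Matlab (illustrative only):

% Minimal check of the argument above: perturbing the middle point of the
% five-point PF of Miller and Ulrich (2001) moves the interpolated 50%
% threshold toward Levels 3-4 (middle point low) or Levels 2-3 (middle
% point high); realigning runs on these shifted thresholds then pools the
% perturbed middle points near 50% and steepens the average PF.
levels = 1:5;
Ptrue  = [0.01 0.1222 0.50 0.8778 0.99];
for Pmid = [0.30 0.50 0.70]
    P = Ptrue;  P(3) = Pmid;                 % binomial perturbation of level 3
    t = interp1(P, levels, 0.5);             % linearly interpolated threshold
    fprintf('middle point %.2f -> 50%% threshold at level %.2f\n', Pmid, t);
end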

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue in connection with my simulations that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than average the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
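The following minimal sketch (a simplified version of the pooling idea, not Kaernbach's or the Appendix routine) shows one way to carry out the monotonizing step:

% Minimal sketch of monotonizing: whenever the estimated proportion correct
% decreases with level, the two offending levels are pooled by averaging
% their correct counts and their trial counts, and the check is repeated
% until the PF estimate is nondecreasing.  Numbers are illustrative.
cor = [2 5 3 8 9];                  % correct responses per level
tot = [4 8 7 10 10];                % trials per level
P   = cor ./ (tot + eps);
i   = 2;
while i <= numel(P)
    if P(i) < P(i-1)                % monotonicity violated
        cor(i-1:i) = mean(cor(i-1:i));     % pool the two levels
        tot(i-1:i) = mean(tot(i-1:i));
        P = cor ./ (tot + eps);
        i = max(i-1, 2);            % recheck the earlier neighbor
    else
        i = i + 1;
    end
end
disp(P)                             % now a nondecreasing PF estimate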

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
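For readers who wish to reproduce the enumeration, here is a minimal sketch of its core (simplified from Matlab Code 4; the flat 50% generating PF makes every 10-trial sequence equally likely):

% Minimal sketch of the enumeration: each integer from 0 to 2^nbits-1
% encodes one possible 10-trial Brownian staircase (bit = 1 for correct,
% 0 for incorrect); cumulative +/-1 steps give the level at which each
% trial is placed.
nbits = 10;
cor = zeros(1, 2*nbits + 1);  tot = cor;      % pooled over all staircases
for i = 0:2^nbits - 1
    a     = bitget(i, 1:nbits);               % 1 = correct, 0 = incorrect
    step  = 1 - 2*a(1:end-1);                 % down after correct, up after error
    level = nbits + 1 + [0 cumsum(step)];     % start in the middle of the range
    for k = 1:nbits
        cor(level(k)) = cor(level(k)) + a(k);
        tot(level(k)) = tot(level(k)) + 1;
    end
end
P = cor ./ (tot + eps);    % pooled PF: 0.5 at every tested level (cf. panel a)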

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

[Figure 5 panels. Left column: no shift; right column: shift. Top row: no data manipulation; middle row: monotonize psychometric function; bottom row: extrapolate psychometric function. Abscissa: stimulus levels; ordinate: probability correct.]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.

The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs

1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.

1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). (A minimal simulation sketch of these rules is given at the end of this list, following Item 8.) The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.
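As promised in Item 4, here is a minimal simulation sketch of Kaernbach's (1990) objective yes/no staircase rules (the simulated observer, its d′ function, and all parameter values are illustrative, not part of the original procedure):

% Minimal simulation sketch of the objective yes/no staircase rules in
% Item 4: one step down after a correct response, three steps up after an
% error, with blanks and signals randomly intermixed.  The simulated
% observer, its d' function, and all parameter values are illustrative.
nTrials = 200;  level = 10;  track = zeros(1, nTrials);
dprimeOfLevel = @(L) max(L, 0).^2 / 25;       % illustrative d' vs. level
for t = 1:nTrials
    track(t) = level;
    isSignal = rand < 0.5;                    % random blank/signal mix
    dp   = dprimeOfLevel(level);
    pYes = 0.5*(1 + erf((isSignal*dp - dp/2)/sqrt(2)));  % criterion at dp/2
    saysYes = rand < pYes;
    if saysYes == isSignal, level = level - 1;   % hit or correct rejection
    else,                   level = level + 3;   % miss or false alarm
    end
end
fprintf('mean level over the last half of the run: %.1f\n', ...
    mean(track(end/2:end)));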

II. Experimental Methods for Measuring PFs

1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^β. The optimum variance of 1.64/(Nβ²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled to 3.28/(Nβ²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox, whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four initial reversals are thrown out) rather than average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases with their moderate inertia have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs

1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.

Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.

Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.

Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.

Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.

Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.

Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.

Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.

Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.

Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.

Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.

Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.

Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.

King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.

Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.

Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.

Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.

Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.

Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.

Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.

Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.

Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.

McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.

Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.

Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.

Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.

Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficiency estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.

Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)

Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias (scripts for monotonizing, extrapolating, and averaging, plus the main program)

Note: In line 6 of the monotonizing script the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


1.8379, 3.131, 4.421] for β = [1, 2, 3.5, 5]. If the d′ slope is taken to be a measure of the d′ exponent b, then the ratio b/β is [0.8847, 0.9190, 0.8946, 0.8842] for the four β values. Our value of b/β = 1.06 differs substantially from 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a). Pelli (1985) and Strasburger (2001a) used d′ functions with a = 1 in Equation 20. With a = 0.614, our d′ function (Equation 20) starts saturating near threshold, in agreement with experimental data. The saturation lowers the effective value of b. I was very surprised to find that the Stromeyer–Foley d′ function did such a good job in matching the Weibull function across the whole range of β and x. For a long time I had thought a different fit would be needed for each value of β.

For the same Weibull function in a yes/no method (false alarm rate = 50%), the parameters β and w are identical to the 2AFC case above. Only the value of a shifts, from a = 0.614 (2AFC) to a = 0.54 (yes/no). In order to make x_t = 1 at d′ = 1, k becomes 0.3173 (see Equation 18C) instead of 0.4795 (2AFC). As with 2AFC, the difference between the Weibull and d′ function is less than 0.004 (<0.4%) for all stimulus strengths and all values of β and for both a linear and a logarithmic abscissa. These values mean that for both the yes/no and the 2AFC methods, the d′ function (Equation 20) corresponding to a Weibull function with β = 2 has a low-contrast log–log slope of b = 2.12 and a high-contrast log–log slope of 1 − w = 0.78. The high-contrast tvc (test contrast vs. pedestal contrast discrimination function, sometimes called the "dipper" function) log–log slope would be w = 0.22.

I also carried out a similar analysis, asking what cumulative normal σ is best matched to the Weibull. I found σ = 1.1720/β. However, the match is poor, with errors of 4% (rather than <0.4% in the preceding analysis). The poor fit of the two functions makes the fitting procedure sensitive to the weighting used in the fitting procedure. If one uses a weighting factor of [P(1 − P)/N], where P is the PF, one will find a quite different value for σ than if one ignored the weighting or if one chose the weighting on the basis of the data rather than the fitting function.

II. COMPARING EXPERIMENTAL TECHNIQUES FOR MEASURING THE PSYCHOMETRIC FUNCTION

This section, inspired by Linschoten et al. (2001), is an examination of various experimental methods for gathering data to estimate thresholds. Linschoten et al. compare three 2AFC methods for measuring smell and taste thresholds: a maximum-likelihood adaptive method, an up–down staircase method, and an ascending method of limits. The special aspect of their experiments was that in measuring smell and taste each 2AFC trial takes between 40 and 60 sec (Lew Harvey, personal communication), because of the long duration needed for the nostrils or mouth to recover their sensitivity. The purpose of the Linschoten et al. paper is to examine which of the three methods is most accurate when the number of trials is low. Linschoten et al. conclude that the 2AFC method is a good method for this task. My prior experience with forced choice tasks and low numbers of trials (Manny & Klein, 1985; McKee et al., 1985) made me wonder whether there might be better methods to use in circumstances in which the number of trials is limited. This topic is considered in Section II.

Compare 2AFC to Yes/No

The commonly accepted framework for comparing 2AFC and yes/no threshold estimates is signal detection theory. Initially I will make the common assumption that the d′ function is a power function (Equation 20 with a = 1):

d′ = cx_t^β = c exp(βy_t).   (23)

Later in this section, the experimentally more accurate form, Equation 20 with a ≠ 1, will be used. The variance of the threshold estimate will now be calculated for 2AFC and yes/no for the simplest case of trials placed at a single stimulus level y, the goal of most adaptive methods. The PF slope relates ordinate variance to abscissa variance (from Gourevitch & Galanter, 1967):

var(Y) = var(d′)/(dd′/dy_t)²,   (24)

where Y = y − y_t is the threshold estimate for a natural log abscissa, as given in Equation 9. This formula for var(Y) is called the asymptotic formula, in that it becomes exact in the limit of a large number of total trials N. Wichmann and Hill (2001b) show that for the method of constant stimuli a proper choice of testing levels brings one to the asymptotic region for N ≥ 120 (the smallest value that they explored). Improper choice of testing levels requires a larger N in order to get to the asymptotic region. Equation 24 is based on having the data at a single level, as occurs asymptotically in adaptive procedures. In addition, the definition of threshold must be at the point being tested. If levels are tested away from threshold, there will be an additional contribution to the threshold variance (Finney, 1971; McKee et al., 1985) if the slope is a variable parameter in the fitting procedure. Equation 24 reaches its asymptotic value more rapidly for adaptive methods with focused placement of trials than with the constant stimulus method when slope is a free parameter.

Equation 24 needs analytic expressions for the deriva-tive and the variance of dcent The derivative is obtained fromEquation 23

(25)

The variance of dcent for both yesno (dcent 5 zhit 1 zcorrect rejection)and 2AFC (dcent 5 Iuml2zave) is

(26)var var( ) exp var( ) cent( ) = = ( )d z z P2 4 2p

dddy

bdt

cent = cent

var( )var( )

Yd

dddyt

= cent

centaeligegraveccedil

oumloslashdivide

2

cent = = ( )d c bytb

texp

1434 KLEIN

For the yes no case Equation 26 is based on a unityslope ROC curve with the subjectrsquos criterion at the neg-ative diagonal of the ROC curve where the hit rate equalsthe correct rejection rate This would be the most effi-cient operating point The variance for a nonunity slopeROC curve was considered by Klein Stromeyer andGanz (1974) From binomial statistics

(27)

with n 5 N for 2AFC and n 5 N2 for yesno since forthis binary yesno task the number of signal and blanktrials is half the total number of trials

Putting all of these factors together gives

(28)

for both the yesno and the 2AFC cases The only dif-ference between the two cases is the definition of n andthe definition of dcent (dcent2AFC 5 205z and dcentYN 5 2z for asymmetric criterion) When this is expressed in terms ofN (number of trials) and z (z score of probability cor-rect) we get

(29)

for both 2AFC and yesno That is when the percent cor-rects are all equal (P2AFC 5 Phit 5 Pcorrect rejection) thethreshold variance for 2AFC and yesno are equal Theminimum value of var(Y ) is 164(Nb2) occurring at z 5157 (P 5 94) This value of 94 correct is the optimumvalue at which to test for both 2AFC and yesno tasks in-dependent of b

This result that the threshold variance is the same for2AFC and yesno methods seems to disagree with Figure 4of McKee et al (1985) where the standard errors of thethreshold estimates are plotted for 2AFC and yesno asa function of the number of trials The 2AFC standarderror (their Figure 4) is twice that of yesno correspond-ing to a factor of four in variance However the McKeeet al analysis is misleading since all they (we) did for theyesno task was to use Equation 2 to expand the PF to a 0to 100 range thereby doubling the PF slope Thatrescaling is not the way to convert a 2AFC detection taskto the yesno task of the same detectability A signal de-tection methodology is needed for that conversion aswas done in the present analysis

One surprise is that the optimal testing probability isindependent of the PF slope b This finding is a generalproperty of PFs in which the dependence on the slopeparameter has the form xb The 94 level corresponds todcentYN 5 315 and dcent2AFC 5 223 For the commonly usedP 5 75 (z 5 0765 dcentYN 5 135 dcent2AFC 5 0954)var(Y ) 5 408(Nb2) which is about 25 times the opti-mal value obtained by placing the stimuli at 94 Green(1990) had previously pointed out that the 94 and 92levels were the optimal points to test for the cumulative

normal and logit functions respectively because of theminimum threshold estimate variability when one is test-ing at that point on the PF Other PFs can have optimalpoints at slightly lower levels but these are still above thelevels to which adaptive methods are normally directed

It is also useful to compare the 2AFC and yesno at thesame dcent rather than at their optimal points In that case fordcent values fixed at [1 2 3] the ratio of the 2AFC varianceto the yesno variance is [182 136 079] At low dcent thereis an advantage for the 2AFC method and that reverses athigh dcent as is expected from the optimal dcent being higher foryesno than for 2AFC In general one should run the yesno method at dcent levels above 20 (84 correct in the blankand signal intervals corresponds to dcent 5 2)

The equality of the optimal var(Y ) for 2AFC and yesno methods is not only surprising it seems paradoxicalmdashas can be seen by the following gedankenexperiment Sup-pose subjects close their eyes on the first interval of the2AFC trial and base their judgments on just the secondinterval as if they were doing a yesno task since blanksand stimuli are randomly intermixed Instead of sayingldquoyesrdquo or ldquonordquo they would respond ldquoInterval 2rdquo or ldquoInter-val 1rdquo respectively so that the 2AFC task would become ayesno task The paradox is that one would expect the vari-ance to be worse than for the eyes open 2AFC case sinceone is ignoring information However Equation29 showsthat the 2AFC and yesno methods have identical optimalvariance I suspect that the resolution of this paradox is thatin the yesno method with a stable criterion the optimalvariance occurs at a higher dcent value (315) than the 2AFCvalue (223) The dcent power law function that I chose doesnot saturate at large dcent levels giving a benefit to the yesno method

In order to check whether the paradox still holds witha more realistic PF (a dcent function that saturates at largedcent ) I now compare the 2AFC and yesno variance usinga Weibull function with a slope b In order to connect2AFC and yesno I need to use signal detection theoryand the StromeyerndashFoley dcent function (Equation 20) withthe ldquoWeibull parametersrdquo b 5 106b a 5 0614 and w 51 39b Following Equation 22 I pointed out that withthese parameter values the difference between the 2AFCWeibull function and the 2AFC dcent function (converted toa probability ordinate) is less than 0004 After the de-rivative Equation 25 is appropriately modified we findthat the optimal 2AFC variance is 342(Nb2) For the yesno case the optimal variance is 402(Nb2) This result re-solves the paradox The optimal 2AFC variance is about18 lower than the yesno variance so closing onersquos eyesin one of the intervals does indeed hurt

We shouldnrsquot forget that if one counts stimulus presen-tations Np rather than trials N the optimal 2AFC variancebecomes 684(Npb2) as opposed to 402(Npb2) for yesnoThus for the Linschoten experiments with each stimuluspresentation being slow the yesno methodology is moreaccurate for a given number of stimulus presentationseven at low dcent values Kaernbach (1990) compares 2AFC

var( ) exp( )

( )Y z

P P

N bz= ( )2

122

p

var( ) exp( )

( )Y z

P P

n bd= ( ) cent

412

2p

var( )( )

PP P

n= 1

MEASURING THE PSYCHOMETRIC FUNCTION 1435

and yesno adaptive methods with simulations and ac-tual experiments His abscissa is number of presenta-tions and his results are compatible with our findings

The objective yesno method that I have been discussingis inefficient since 50 of the trials are blanks Greaterefficiency is possible if several points on the PF are mea-sured (method of constant stimuli) with the number of blanktrials the same as at the other levels For example if fivelevels of stimuli were measured including blanks then theblanks would be only 20 of the total rather than 50 Forthe 2AFC paradigm the subject would still have to look at50 of the presentations being blanks The yesno eff i-ciency can be further improved by having the subjectrespond to the multiple stimuli with multiple ratings ratherthan a binary yesno response This rating scale methodof constant stimuli is my favorite method and I have beenusing it for 20 years (Klein amp Stromeyer 1980) The mul-tiple ratings allow one to measure the psychometric func-tion shape for dcents as large as 4 (well into the dipper regionwhere stimuli are slightly above threshold) It also allowsone to measure the ROC slope (presence of multiplicativenoise) Simulations of its benefits will be presented inKlein (2002)

Insight into how the number of responses affects thresh-old variance can be obtained by using the cumulative nor-mal PF (Equation 17) considered by Kaernbach (2001b)and Miller and Ulrich (2001) with g 5 0 Typically this isthe case of yesno discrimination but it can also be 2AFCdiscrimination with an ordinate going from 0 to 100as I have discussed in connection with Figure 3 For a cu-mulative normal PF with a binary response and for a sin-gle stimulus level being tested the threshold variance is(Equation 19) as follows

(30)

from Equation 27 and forthcoming Equations 40 and 41The optimal level to test is at P 5 05 so the optimal vari-ance becomes

(31)

If instead of a binary response the observer gives an ana-log response (many many rating categories) then the vari-ance is the familiar connection between standard deviationand standard error

(32)

With four response categories the variance is 119s2Nwhich is closer to the analog than to the binary case Ifstimuli with multiple strengths are intermixed (methodof constant stimuli) then the benefit of going from a bi-nary response to multiple responses is greater than that in-dicated in Equations 31ndash32 (Klein 2002)

A brief list of other problems with the 2AFC method ap-plied to detection comprises the following

1 2AFC requires the subject to memorize and thencompare the results of two subjective responses It is cog-nitively simpler to report each response as it arrives Sub-jects without much experience can easily make mistakes(lapses) simply becoming confused about the order of thestimuli Klein (1985) discusses memory load issues rele-vant to 2AFC

2 The 2AFC methodology makes various types ofanalysis and modeling more difficult because it requiresthat one average over all criteria The response variable in2AFC is the Interval 2 activation minus the Interval 1 ac-tivation (see the abscissa in Figure 2) This variable goesfrom minus infinity to plus infinity whereas the yesnovariable goes from zero to infinity It therefore seems moredifficult to model the effects of probability summation anduncertainty for 2AFC than for yesno

3 Models that relate psychophysical performance to un-derlying processes require information about how the noisevaries with signal strength (multiplicative noise) The rat-ing scale method of constant stimuli not only measures dcentas a function of contrast it also is able to provide an esti-mate of how the variance of the signal distribution (theROC slope) increases with contrast The 2AFC methodlacks that information

4 Both the yesno and 2AFC methods are supposed toeliminate the effects of bias Earlier I pointed out that in myrecent 2AFC discrimination experiments when the stim-ulus is near threshold I find I have a strong bias in favor ofresponding ldquoStimulus 2rdquo Until I learned to compensatefor this bias (not typically done by naiumlve subjects) my dcentwas underestimated As I have discussed it is possible tocompensate for this bias but that is rarely done

Improving the Brownian (Up ndashDown) StaircaseAdaptive methods with a goal of placing trials at an

optimal point on the PF are efficient Leek (2001) pro-vides a nice overview of these methods There are twogeneral categories of adaptive methods those based onmaximum (or mean) likelihood and those based on sim-pler staircase rules The present section is concernedwith the older staircase methods which I will callBrownian methods I used the name ldquoBrownianrdquo becauseof the upndashdown staircasersquos similarity to one-dimensionalBrownian motion in which the direction of each step israndom The familiar upndashdown staircase is Brownianbecause at each trial the decision about whether to go upor down is decided by a random factor governed by prob-ability P(x) I want to make this connection to Brownianmotion because I suspect that there are results in thephysics literature that are relevant to our psychophysicalstaircases I hope these connections get made

A Brownian staircase has a minimal memory of the runhistory it needs to remember the previous step (includingdirection) and the previous number of reversals or numberof trials (used for when to stop and for when to decreasestep size) A typical Brownian rule is to decrease signalstrength by one step (like 1 dB) after a correct responseand to increase it three steps after an incorrect response (a

var( thresh) = s 2

N

var( thresh) = timesp s2

2

N

var(var( )

( )exp thresh) =aeligegraveccedil

oumloslashdivide

= ( )P

N dPdy

P P zN2

22

2 1p s

1436 KLEIN

one downndashthree up rule) I was pleased to learn recentlythat his rule works nicely for an objective yesno task(Kaernbach 1990) as well as 2AFC as discussed earlier

In recent years likelihood methods (Pentland 1980Watson amp Pelli 1983) have become popular and havesomewhat displaced Brownian methods There is a wide-spread feeling that likelihood methods have substantiallysmaller threshold variance than do staircases in whichthresholds are obtained by simply averaging the levelstested Given the importance of this issue for informingresearchers about how best to measure thresholds it issurprising that there have been so few simulations com-paring staircases and optimal methods Pentlandrsquos (1980)results showing very large threshold variance for a stair-case method are so dramatically different from the re-sults of my simulations that I am skeptical The best pre-vious simulations of which I am aware are those of Green(1990) who compared the threshold variance from sev-eral 2AFC staircase methods with the ideal variance(Equation 24) He found that as long as the staircase ruleplaced trials above 80 correct the ratio of observed toideal threshold variance was about 14 This value is thesquare of the inverse of Greenrsquos finding that the ratio ofthe expected to the observed sigma is about 85

In my own simulations (Klein 2002) I found that aslong as the final step size was less than 02s the thresh-old variance from simply averaging over levels was closeto the optimal value Thus the staircase threshold varianceis about the same as the variance found by other optimalmethods One especially interesting finding was that thecommon method of averaging an even number of rever-sal points gave a variance that was about 10 higher thanaveraging over all levels This result made me wonderhow the common practice of averaging reversals ever gotstarted Since Green (1990) averaged over just the reversalpoints I suspect that his results are an overestimate of thestaircase variance In addition since Green specifies stepsize in decibels it is difficult for me to determine whetherhis final step size was small enough Since the thresholdvariance is inversely related to slope it is important to re-late step size to slope rather than to decibels The most nat-ural reference unit is s the standard deviation associatedwith the cumulative normal In summary I suggest that thisgeneral feeling of distrust of the simple Brownian staircaseis misguided and that simple staircases are often as goodas likelihood methods In the section Summary of Advan-tages of Brownian Staircase I point out several advantagesof the simple staircase over the likelihood methods

A qualification should be made to the claim above thataveraging levels of a simple staircase is as good as a fancylikelihood method If the staircase method uses a too smallstep size at the beginning and if the starting point is farfrom threshold then the simple staircase can be inefficientin reaching the threshold region However at the end ofthat run the astute experimenter should notice the largediscrepancy between the starting point and the thresholdand the run should be redone with an improved startingpoint or with a larger initial step size that will be reducedas the staircase proceeds

Increase number of 2AFC response categories In-dependent of the type of adaptive method used the 2AFCtask has a flaw whereby a series of lucky guesses at lowstimulus levels can erroneously drive adaptive methods tolevels that are too low Some adaptive methods with a smallnumber of trials are unable to fully recover from a string oflucky guesses early in the run One solution is to allow thesubject to have more than two responses For reasons thatI have never understood people like to have only two re-sponse categories in a 2AFC task Just as I am reluctant totake part in 2AFC experiments I am even more reluctantto take part in two response experiments The reason issimple I always feel that I have within me more than onebit of information after looking at a stimulus I usually haveat least four or more subjective states (2 bits) for a yesnotask HN LN LY HY where H and L are high and lowconfidence and N and Y are did not and did see it For a2AFC my internal states are H1 L1 L2 H2 for high andlow confidence on Intervals 1 and 2 Kaernbach (2001a) inthis special issue does an excellent job of justifying thelow-confidence response category He provides solid ar-guments simulations and experimental data on how alow-confidence category can minimize the ldquolucky guessrdquoproblem

Kaernbach (2001a) groups the L1 and L2 categoriestogether into a ldquodonrsquot knowrdquo category and argues per-suasively that the inclusion of this third response can in-crease the precision and accuracy of threshold estimateswhile also leaving the subject a happier person He callsit the ldquounforced choicerdquo method Most of his article con-cerns the understandable fear of introducing a responsebias exactly what 2AFC was invented to avoid He has de-veloped an intelligent rule for what stimulus level shouldfollow a ldquodonrsquot knowrdquo response Kaernbach shows withboth simulations and real data that with his method thispotential response bias problem has only a second-ordereffect on threshold estimates (similar to the 2AFC inter-val bias discussed around Equation 7) and its advantagesoutweigh the disadvantages I urge anyone using the 2AFCmethod to read it I conjecture that Kaernbachrsquos method isespecially useful in trimming the lower tail of the distri-bution of threshold estimates and also in reducing the in-terval bias of 2AFC tasks since the low confidence ratingwould tend to be used for trials on which there is the mostuncertainty

Although Kaernbach has managed to convince me ofthe usefulness of his three-response method he might havetrouble convincing others because of the response biasquestion I should like to offer a solution that might quellthe response bias fears Rather than using 2AFC with threeresponses one can increase the number of responses tofourmdashH1 L1 L2 H2mdashas discussed two paragraphsabove The L rating can be used when subjects feel theyare guessing

To illustrate how this staircase would work considerthe case in which we would like the 2AFC staircase to con-verge to about 84 correct A 5 upndash1 down staircase(5 levels up for a wrong response and 1 level down for acorrect response) leads to an average probability correct

MEASURING THE PSYCHOMETRIC FUNCTION 1437

of 56 5 8333 If one had zero information about thetrial and guessed then one would be correct half the timeand the staircase would shift an average of (5 1)2 512 levels Thus in his scheme with a single ldquodonrsquot knowrdquoresponse Kaernbach would have the staircase move up2 levels following a ldquodonrsquot knowrdquo response One mightworry that continued use of the ldquodonrsquot knowrdquo responsewould continuously increase the level so that one wouldget an upward bias But Kaernbach points out that theldquodonrsquot knowrdquo response replaces incorrect responses to agreater extent than it replaces correct responses so thatthe 12 level shift is actually a more negative shift than the15 shift associated with a wrong response For my sug-gestion of four responses the step shifts would be ndash1 1113 and 15 levels for responses HC LC LI and HIwhere H and L stand for high and low confidence and Cand I stand for correct and incorrect A slightly more con-servative set of responses would be ndash1 0 14 15 inwhich case the low-confidence correct judgment leaves thelevel unchanged In all these examples including Kaern-bachrsquos three-response method the low-confidence re-sponse has an average level shift of 12 when one is guess-ing The most important feature of the low confidencerating is that when one is below threshold giving a lowconfidence rating will prevent one from long strings oflucky guesses that bring one to an inappropriately lowlevel This is the situation that can be difficult to overcomeif it occurs in the early phase of a staircase when the stepsize can be large The use of the confidence rating wouldbe especially important for short runs where the extra bitof information from the subject is useful for minimizingerroneous estimates of threshold

Another reason not to be fearful of the multiple-response2AFC method is that after the run is finished one can es-timate threshold by using a maximum likelihood proce-dure rather than by averaging the run locations Standardlikelihood analysis would have to be modified for Kaern-bachrsquos three-response method in order to deal with theldquodonrsquot knowrdquo category For the suggested four-responsemethod one could simply ignore the confidence ratings forthe likelihood analysis My simulations discussed earlierin this section and Greenrsquos (1990) reveal that averagingover the levels (excluding a number of initial levels toavoid the starting point bias) has about as good a preci-sion and accuracy as the best of the likelihood methodsIf the threshold estimates from the likelihood method andfrom the averaging levels method disagree that run couldbe thrown out

Summary of advantages of Brownian staircaseSince there is no substantial variance advantage to thefancier likelihood methods one can pay more attention toa number of advantages associated with simple staircases

1 At the beginning of a run likelihood methods mayhave too little inertia (the sequential stimulus levels jumparound easily) unless a ldquostiff rdquo prior is carefully chosenBy inertia I refer to how difficult it is for the next level todeviate from the present level At the beginning of a runone would like to familiarize the subject with the stimu-

lus by presenting a series of stimuli that gradually increasein difficulty This natural feature of the Brownian stair-case is usually missing in QUEST-like methods Theslightly inefficient first few trials of a simple staircasemay actually be a benefit since it gives the subject somepractice with slightly visible stimuli

2 Toward the end of a maximum likelihood run one canhave the opposite problem of excessive inertia wherebyit is difficult to have substantial shifts of test contrastThis can be a problem if nonstationary effects are presentFor example thresholds can increase in mid-run owingto fatigue or to adaptation if a suprathreshold pedestal ormask is present In an old-fashioned Brownian staircasewith moderate inertia a quick look at the staircase trackshows the change of threshold in mid-run That observa-tion can provide an important clue to possible problemswith the current procedures A method that reveals non-stationarity is especially important for difficult subjectssuch as infants or patients in whom fatigue or inatten-tion can be an important factor

3 Several researchers have told me that the simplicity ofthe staircase rules is itself an advantage It makes writingprograms easier It is nicer for teaching students It alsosimplifies the uncertain business of guessing likelihoodpriors and communicating the methodology when writ-ing articles

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learnmore about the PF than just the single threshold valueEstimating the PF slope is the first step in this directionTo estimate slope one must test the PF at more than onepoint Methods are available to do this (Kaernbach 2001bKontsevich amp Tyler 1999) and one could even adapt sim-ple Brownian staircase procedures to give better slope es-timates with minimum harm to threshold estimates (seethe comments in Strasburger 2001b)

Probably the best method for getting both thresholds andslopes is the Y (Psi) method of Kontsevich and Tyler(1999) The Y method is not totally novel it borrows tech-niques from many previous researchersmdashas Kontsevichand Tyler point out in a delightful paragraph that both clar-ifies their method and states its ancestry This paragraphsummarizes what is probably the most accurate and preciseapproach to estimating thresholds and slope so I reproduceit here

To summarize the Y method is a combination of solutionsknown from the literature The method updates the poste-rior probability distribution across the sampled space ofthe psychometric functions based on Bayesrsquo rule (Hall1968 Watson amp Pelli 1983) The space of the psychome-tric functions is two-dimensional (King-Smith amp Rose1997 Watson amp Pelli 1983) Evaluation of the psycho-metric function is based on computing the mean of theposterior probability distribution (Emerson 1986 King-Smith et al 1994) The termination rule is based on thenumber of trials as the most practical option (King-Smith

1438 KLEIN

et al 1994 Watson amp Pelli 1983) The placement of eachnew trial is based on one-step ahead minimum search(King-Smith 1984) of the expected entropy cost function(Pelli 1987)

A simplified rough description of the Y trial placementfor 2AFC runs is that initially trials are placed at about the90 correct level to establish a solid threshold Oncethreshold has been roughly established trials are placed atabout the 90 and the 70 levels to pin down the slopeThe precise levels depend on the assumed PF shape

Before one starts measuring slopes however one mustfirst be convinced that knowledge of slope can be impor-tant It is so for several reasons

1 Parametric fitting procedures either require knowl-edge of slope b or must have data that adequately con-strain the slope estimate Part of the discussion that fol-lows will be on how to obtain reliable threshold estimateswhen slope is not well known A tight knowledge of slopecan minimize the error of the threshold estimate

2 Strasburger (2001a 2001b) gives good argumentsfor the desirability of knowing the shape of the PF He isinterested in differences between foveal and peripheralvisual processing and he uses the PF slope as an indica-tion of differences

3 Slopes are different for different stimuli and thiscan lead to misleading results if slope is ignored It mayhelp to describe my experience with the Modelfest pro-ject (Carney et al 2000) to make this issue concreteThis continuing project involves a dozen laboratoriesaround the world We have measured detection thresh-olds for a wide variety of visual stimuli The data areused for developing an improved model of spatial visionSome of the stimuli were simple easily memorized lowspatial frequency patterns that tend to have shallowerslopes because a stimulus-known-exactly template canbe used efficiently and with minimal uncertainty On theother hand tiny Modelfest stimuli can have a good dealof spatial uncertainty and are therefore expected to pro-duce PFs with steeper slopes Because of the slope changethe pattern of thresholds will depend on the level atwhich threshold is measured When this rich dataset isanalyzed with f ilter models that make estimates formechanism bandwidths and mechanism pooling thesedifferent slope-dependent threshold patterns will lead todifferent estimates for the model parameters Several ofus in the Modelfest data gathering group argued that weshould use a data-gathering methodology that wouldallow estimation of PF slope of each of the 45 stimuliHowever I must admit that when I was a subject my pa-tience wore thin and I too chose the quicker runs Al-though I fully understand the attraction of short runs fo-cused on thresholds and ignoring slope I still think thereare good reasons for spreading trials out For one thinginterspersing trials that are slightly visible helps the sub-ject maintain memory of the test pattern

4 Slopes can directly resolve differences between mod-els In my own research for example my colleagues and

I are able to use the PF slope to distinguish long-range fa-cilitation that produces a low-level excitation or noise re-duction (typical with a cross-orientation surround) from ahigher level uncertainty reduction (typical with a same-orientation surround)

Throwing-Out Rules Goodness-of-FitThere are occasions when a staircase goes sour Most

often this happens when the subjectrsquos sensitivity changesin the middle of a run possibly because of fatigue It istherefore good to have a well-specified rule that allowsnonstationary runs to be thrown out The throwing-out rulecould be based on whether the PF slope estimate is toolow or too high Alternatively a standard approach is tolook at the chi-square goodness-of-f it statistic whichwould involve an assumption about the shape of the un-derlying PF I have a commentary in Section III regard-ing biases in the chi-square metric that were found byWichmann and Hill (2001a) Biases make it difficult touse standard chi-square tables The trouble with this ap-proach of basing the goodness-of-fit just on the cumulateddata is that it ignores the sequential history of the run TheBrownian upndashdown staircase makes the sequential historyvividly visible with a plot of sequentially tested levelsWhen fatigue or adaptation sets in one sees the tested lev-els shift from one stable point to another Leek Hanna andMarshall (1991) developed a method of assessing the non-stationarity of a psychometric function over the course ofits measurement using two interleaved tracks EarlierHall (1983) also suggested a technique to address the sameproblem I encourage further research into the develop-ment of a non-stationarity statistic that can be used to dis-card bad runs This statistic would take the sequential his-tory into account as well as the PF shape and the chi-square

The development of a ldquothrowing-outrdquo rule is the ex-treme case of the notion of weighted averaging One im-portant reason for developing good estimators for good-ness-of-fit and good estimators for threshold variance isto be able to optimally combine thresholds from repeatedexperiments I shall return to this issue in connectionwith the Wichmann and Hill (2001a) analysis of bias ingoodness-of-fit metrics

Final Thought The Method Should Depend on the Situation

Also important is that the method chosen should be ap-propriate for the task For example suppose one is usingwell-experienced observers for testing and retesting sim-ilar conditions in which one is looking for small changesin threshold or one is interested in the PF shape In suchcases one has a good estimate of the expected thresholdand the signal detection method of constant stimuli withblanks and with ratings might be best On the other handfor clinical testing in which one is highly uncertain aboutan individualrsquos threshold and criterion stability a 2AFClikelihood or staircase method (with a confidence rating)might be best

MEASURING THE PSYCHOMETRIC FUNCTION 1439

III DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPEPLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with whatto do with the data after they have been collected Wehave here the standard Bayesian situation Given the PFit is easy to generate samples of data with confidence in-tervals But we have the inverse situation in which we aregiven the data and need to estimate properties of the PFand get confidence limits for those estimates The pair ofarticles in this special issue by Wichmann and Hill (2001a2001b) as well as the articles by King-Smith et al (1994)Treutwein (1995) and Treutwein and Strasburger (1999)do an outstanding job of introducing the issues and providemany important insights For example Emerson (1986)and King-Smith et al emphasize the advantages of meanlikelihood over maximum likelihood The speed of mod-ern computers makes it just as easy to calculate meanlikelihood as it is to calculate maximum likelihood As an-other example Treutwein and Strasburger (1999) demon-strate the power of using Bayesian priors for estimatingparameters by showing how one can estimate all four pa-rameters of Equation 1A in fewer than 100 trials This im-portant finding deserves further examination

In this section I will focus on four items First I will dis-cuss the relative merits of parametric versus nonparamet-ric methods for estimating the PF properties The paperby Miller and Ulrich (2001) provides a detailed analysis ofa nonparametric method for obtaining all the moments ofthe PF I will discuss some advantages and limitations oftheir method Second is the question of goodness-of-fitWichmann and Hill (2001a) show that when the numberof trials is small the goodness-of-fit can be very biasedI will examine the cause of that bias Third is the issue dis-cussed by Wichmann and Hill (2001a) that of how lapseseffect slope estimates This topic is relevant to Strasburg-errsquos (2001b) data and analysis Finally there is the possiblycontroversial topic of an upward slope bias in adaptivemethods that is relevant to the papers of Kaernbach (2001b)and Strasburger (2001b)

Parametric Versus Nonparametric FitsThe main split between methods for estimating pa-

rameters of the psychometric function is the distinctionbetween parametric and nonparametric methods In aparametric method one uses a PF that involves severalparameters and uses the data to estimate the parametersFor example one might know the shape of the PF and es-timate just the threshold parameter McKee et al (1985)show how slope uncertainty affects threshold uncertaintyin probit analysis The two papers by Wichmann and Hill(2001a 2001b) are an excellent introduction to many is-sues in parametric fitting An example of a nonparamet-ric method for estimating threshold would be to averagethe levels of a Brownian staircase after excluding the firstfour or so reversals in order to avoid the starting level biasBefore I served as guest editor of this special symposium

issue I believed that if one had prior knowledge of the exactPF shape a parametric method for estimating threshold wasprobably better than nonparametric methods After read-ing the articles presented here as well as carrying out myrecent simulations I no longer believe this I think thatin the proper situation nonparametric methods are as goodas parametric methods Furthermore there are situationsin which the shape of the PF is unknown and nonparamet-ric methods can be better

Miller and Ulrichrsquos Article on the Nonparametric SpearmanndashKaumlrber Method of Analysis

The standard way to extract information from psycho-metric data is to assume a functional shape such as a Wei-bull cumulative normal logit dcent and then vary the pa-rameters until likelihood is maximized or chi-square isminimized Miller and Ulrich (2001) propose using theSpearmanndashKaumlrber (SK) nonparametric method whichdoes not require a functional shape They start with a prob-ability density function defined by them as the derivativeof the PF

(33)

where s(i) is the stimulus intensity at the ith level andP(i) is the probability correct at the that level The notationpdf(i) is meant to indicate that it is the pdf for the intervalbetween i ndash 1 and i Miller and Ulrich give the r th momentof this distribution as their Equation 3

(34)

The next four equations are my attempt to clarify their ap-proach I do this because when applicable it is a finemethod that deserves more attention Equation 34 prob-ably came from the mean value theorem of calculus

(35)

where f (s) 5 sr11 and f cent(smid) 5 (r 1 1)smidr is the deriv-

ative of f with respect to s somewhere within the intervalbetween s(i 1) and s(i) For a slowly varying functionsmid would be near the midpoint Given Equation35 Equa-tion 34 becomes

(36)

where Ds 5 s(i) s(i 1) Equation 36 is what one wouldmean by the rth moment of the distribution The case r 50 is important since it gives the area under the pdf

(37)

by using the definition in Equation 33 The summation inEquation 37 gives the area under the pdf For it to be aproper pdf its area must be unity which means that theprobability at the highest level P(imax) must be unity andthe probability at the lowest level Pmin must be zero asMiller and Ulrich (2001) point out

m0 = =aringpdf( ) ( ) ( ) max mini s P i P ii

D

mrr

i

i s s= aring pdf mid( ) D

f s i f s i f s s i s i( ) ( ) ( ) ( ) ( ) [ ] [ ] = cent [ ]1 1mid

mrr r

ri s i s i=

+ [ ]aring + +11

11 1pdf( ) ( ) ( )

pdf( )( ) ( )

( ) ( )i

P i P i

s i s i= 1

1

1440 KLEIN

The next simplest SK moment is for r 5 1

(38)

from Equation 33 This first moment provides an esti-mate of the centroid of the distribution corresponding tothe center of the psychometric function It could be usedas an estimate of threshold for detection or PSE for dis-crimination An alternate estimator would be the medianof the pdf corresponding to the 50 point of the PF (theproper definition of PSE for discrimination)

Miller and Ulrich (2001) use Monte Carlo simulationsto claim that the SK method does a better job of extract-ing properties of the psychometric function than do para-metric methods One must be a bit careful here since intheir parametric fitting they generate the data by usingone PF and then fit it with other PFs But still it is a sur-prising claim if this simple method is so good why havepeople not been using it for years There are two possibleanswers to this question One is that people never thoughtof using it before Miller and Ulrich The other possibil-ity is that it has problems I have concluded that the an-swer is a combination of both factors I think that there isan important place for the SK approach but its limitationsmust be understood That is my next topic

The most important limitation of the SK nonparamet-ric method is that it has only been shown to work well formethod of constant stimulus data in which the PF goesfrom 0 to 1 I question whether it can be applied with thesame confidence to 2AFC and to adaptive procedures withunequal numbers of trials at each level The reason for thislimitation is that the SK method does not offer a simpleway to weight the different samples Let us consider twosituations with unequal weighting adaptive methods and2AFC detection tasks

Adaptive methods place trials near threshold but withvery uneven distribution of number of trials at differentlevels The SK method gives equal weighting to sparselypopulated levels that can be highly variable thus con-tributing to a greater variance of parameter estimates Letus illustrate this point for an extreme case with five levelss 5 [1 2 3 4 5] Suppose the adaptive method placed [11 1 98 1] trials at the five levels and the number correctwere [0 1 1 49 1] giving probabilities [0 1 1 5 1] andpdf 5 [1 0 5 5] The following PEST-like (Taylor ampCreelman 1967) rules could lead to this unusual dataMove down a level if the presently tested level has greaterthan P1 correct move up three levels if the presentlytested level has less than P2 correct P1 can be anynumber less than 33 and P2 can be any number greaterthan 67 The example data would be produced by start-ing at Level 5 and getting four in a row correct followedby an error followed by a wide set possible of alternat-ing responses for which the PEST rules would leave thefourth level as the level to be tested The SK estimate of the

center of the PF is given by Equation 38 to be micro1 5 200Since a pdf with a negative probability is distasteful tosome one can monotonize the PF according to the proce-dure of Miller and Ulrich (which does include the numberof trials at each level) The new probabilities are [0 5151 51 1] with pdf 5 [51 0 0 49] The new estimate ofthreshold is micro1 5 297 However given the data with 4998correct at Level 4 an estimate that includes a weightingaccording to the number of trials at each level would placethe centroid of the distribution very close to micro1 5 4 Thuswe see that because the SK procedure ignores the numberof trials at each level it produces erroneous estimates ofthe PF properties This example with shifting thresholdswas chosen to dramatize the problems of unequal weight-ing and the effect of monotonizing

A similar problem can occur even with the method ofconstant stimuli with equal number of trials at each levelbut with unequal binomial weighting as occurs in 2AFCdata The unequal weighting occurs because of the 50lower asymptote The binomial statistics error bar of dataat 55 correct is much larger than for data at 95 cor-rect Parametric procedures such as probit analysis giveless weighting to the lower data This is why the most ef-ficient placement of trials in 2AFC is above 90 correctas I have discussed in Section II Thus it is expected thatfor 2AFC the SK method would give threshold estimatesless precise than the estimates given by a parametricmethod that includes weighting

I originally thought that the SK method would fail on alltypes of 2AFC data However in the discussion followingEquation 7 I pointed out that for 2AFC discriminationtasks there is a way of extending the data to the 0ndash100range that produces weightings similar to those for a yesnotask Rather than use the standard Equation 2 to deal withthe 50 lower asymptote one should use the method dis-cussed in connection with Figure 3 in which one can usefor the ordinate the percent of correct responses in Inter-val 2 and use the signal strength in Interval 2 minus thatin Interval 1 for the abscissa That procedure produces aPF that goes from 0 to 100 thereby making it suit-able for SK analysis while also removing a possible bias

Now let us examine the SK method as applied to datafor which it is expected to work such as discriminationPFs that go from 0 to 100 These are tasks in which thelocation of the psychometric function is the PSE and theslope is the threshold

A potentially important claim by Miller and Ulrich(2001) is that the SK method performs better than probitanalysis This is a surprising claim since their data aregenerated by a probit PF one would expect probit analy-sis to do well especially since probit analysis does espe-cially well with discrimination PFs that go from 0 to 100To help clarify this issue Miller and I did a number ofsimulations We focused on the case of N 5 40 with 8 tri-als at each of 5 levels The cumulative normal PF (Equa-tion 14) was used with equally spaced z scores and with thelowest and highest probabilities being 1 and 99 Theprobabilities at the sample points are P(i) 5 1 1222

m1

2 21

2

11

2

=

= [ ] +

aring

aring

pdf( )( ) ( )

( ) ( )( ) ( )

is i s i

P i P is i s i

MEASURING THE PSYCHOMETRIC FUNCTION 1441

50 8778 and 99 corresponding to z scores ofz(i) 5 ndash2328 1164 0 1164 and 2328 Nonmonot-onicity is eliminated as Miller and Ulrich suggest (seethe Appendix for Matlab Code 4 for monotonizing) I didMonte Carlo simulations to obtain the standard deviationof the location of the PF micro1 given by Equation38 I foundthat both the SK method and the parametric probit method(assuming a fixed unity slope) gave a standard deviationof between 028 and 029 Miller and Ulrichrsquos result re-ported in their Table 17 gives a standard error of 0071However their Gaussian had a standard deviation of 025whereas my z-score units correspond to a standard devi-ation of unity Thus their standard error must be multipliedby 4 giving a standard error of 0284 in excellent agree-ment with my value So at least in this instance the SKmethod did not do better than the parametric method (withslope fixed) and the surprise mentioned at the beginningof this paragraph did not materialize I did not examineother examples where nonparametric methods might dobetter than the matched parametric method However thesimulations of McKee et al (1985) give some insight intowhy probit analysis with floating slope can do poorly evenon data generated from probit PFs McKee et al showthat when the number of trials is low and if stimulus lev-els near both 0 and 100 are not tested the slope can bevery flat and the predicted threshold can be quite distantIf care is not taken to place a bound on these probit esti-mates one would expect to find deviant threshold esti-mates The SK method on the other hand has built-inbounds for threshold estimates so outliers are avoidedThese bounds could explain why SK can do better thanprobit When the PF slope is uncertain nonparametricmethods might work better than parametric methods adecided advantage for the SK method

It is instructive to also do an analytic calculation of thevariance based on an analysis similar to what was doneearlier If only the ith level of the five had been testedthe variance of the PF location (the PSE in a discrimina-tion task) would have been as follows (Gourevitch ampGalanter 1967)

(39)

where var(P) is given by Equation 27 and PF slope is

(40)

where for the cumulative normal dzi dxi 5 1s where sis the standard deviation of the pdf and

(41)

from Equation 3A By combining these equations one gets

(42)

Note that if all trials were placed at the optimal stimuluslocation of z 5 0 (P 5 5) Equation 42 would becomes2p 2Ni as discussed earlier When all five levels arecombined the total variance is smaller than each individ-ual variance as given by the variance summation formula

(43)

Finally upon plugging in the five values of zi rangingfrom ndash2328 to 12328 and Pi ranging from 1 to 99the standard error of the PF location is

(44)

This value is precisely the value obtained in our MonteCarlo simulations of the nonparametric SK method andalso by the slope fixed parametric probit analysis Thistriple confirmation shows that the SK method is excel-lent in the realm where it applies (equal number of trialsat levels that span 0 to 100) It also confirms our sim-ple analytic formula

The data presented in Miller and Ulrichrsquos Table 17 giveus a fine opportunity to ask about the efficiency of themethod of constant stimuli that they use For the N 5 40example discussed here the variance is

(45)

This value is more than twice the variance of p (2 Ntot)that is obtained for the same PF using adaptive methodsincluding the simple upndashdown staircase that place trialsnear the optimal point

The large variance for the Miller and Ulrich stimulusplacement is not surprising given that of the five stimu-lus levels the two at P 5 1 and 99 are quite inefficientThe inefficient placement of trials is a common problemfor the method of constant stimuli as opposed to adaptivemethods However as I mentioned earlier the method ofconstant stimuli combined with multiple response ratingsin a signal detection framework may be the best of allmethods for detection studies for revealing properties of thesystem near threshold while maintaining a high efficiency

A curious question comes up about the optimal placementof trials for measuring the location of the psychometric func-tion With the original function the optimal placement is atthe steepest part of the function (at P 5 0) However withthe SK method when one is determining the location ofthe pdf one might think that the optimal placement of sam-ples should occur at the points where the pdf is steepestThis contradiction is resolved when one realizes that onlythe original samples are independent The pdf samples areobtained by taking differences of adjacent PF values soneighboring samples are anticorrelated Thus one cannot

var tottot

= =SEN

2 3 24

SE = =var tot12 0 2844

var var tot = aring1 1i

var( )exp

PSE i i i

i

i

P Pz

N= ( ) ( )

2 12

2

ps

dP

dz

zi

i

iaeligegraveccedil

oumloslashdivide

=( )2 2

2

exp

p

dP

dx

dP

dz

dz

dxi

i

i

i

i

i

= times

var( )var

PSE i

i

i

i

P

dP

dx

=( )

aeligegraveccedil

oumloslashdivide

2

1442 KLEIN

claim that for estimating the PSE the optimal placementof trials is at the steep portion of the pdf

Wichmann and Hillrsquos Biased Goodness-of-FitWhenever I fit a psychometric function to data I also

calculate the goodness-of-fit The second half of Wich-mann and Hill (2001a) is devoted to that topic and I havesome comments on their findings There are two standardmethods for calculating the goodness-of-fit (Press Teu-kolsky Vetterling amp Flannery 1992) The first method isto calculate the X 2 statistic as given by

(46)

where p (datai) and Pi are the percent correct of the datumand the PF at level i The binomial var(Pi ) is our old friend

(47)

where Ni is the number of trials at level i The X 2 is of in-terest because the binomial distribution is asymptoticallya Gaussian so that the X 2 statistic asymptotically has a chi-square distribution Alternatively one can write X 2 as

(48)

where the residuals are the difference between the pre-dicted and observed probabilities divided by the standarderror of the predicted probabilities SEi 5 [var(Pi )]05

(49)

The second method is to use the normalized log-likelihood that Wichmann and Hill call the deviance D Iwill use the letters LL to remind the reader that I am deal-ing with log likelihood

(50)

Similar to the X 2 statistic LL asymptotically also has a chi-square distribution

In order to calculate the goodness-of-fit from the chi-square statistic one usually makes the assumption that weare dealing with a linear model for Pi A linear modelmeans that the various parameters controlling the shape ofthe psychometric function (like a b g in Equation 1 forthe Weibull function) should contribute linearly HoweverEquations 1 8 and 12 show that only g contributes linearlyEven though the PF function is not linear in its parametersone typically makes the linearity assumption anyway inorder to use the chi-square goodness-of-fit tables The pur-pose of this section is to examine the validity of the linear-ity assumption

The chi-square goodness-of-fit distribution for a linearmodel is tabulated in most statistics books It is given bythe Gamma function (Press et al 1992)

(51)

where the degrees of freedom df is the number of datapoints minus the number of parameters The Gamma func-tion is a standard function (called Gammainc in Matlab)In order to check that the Gamma function is functioningproperly I often check that for a reasonably large numberof degrees of freedom (like df gt10) the chi-square distri-bution is close to Gaussian with a mean df and the stan-dard deviation (2df )05 As an example for df 5 18 wehave Gamma(9 9) 5 054 which is very close to the as-ymptotic value of 05 To check the standard deviation wehave Gamma [182 (18 1 6)2] 5 845 which is indeedclose to the value of 0841 expected for a unity z-score

With a sparse number of trials or with probabilities nearunity or with nonlinear models (all of which occur in fit-ting psychometric functions) one might worry about thevalidity of using the chi-square distribution (from standardtables or Equation 51) Monte Carlo simulations would bemore trustworthy That is what Wichmann and Hill (2001a)use for the log likelihood deviance LL In Monte Carlosimulations one replicates the experimental conditionsand uses different randomly generated data sets based onbinomial statistics Wichmann and Hill do 10000 runs foreach condition They calculate LL for each run and thenplot a histogram of the Monte Carlo simulations for com-parison with the expected chi-square given by Equa-tion 50 based on the chi-square distribution

Their most surprising finding is the strong bias displayedin their Figures 7 and 8 Their solid line is the chi-squaredistribution from Equation 51 Both Figures 7a and 8a arefor a total of 120 trials per run with 60 levels and 2 trialsfor each level In Figure 7a with a strong upward bias thetrials span the probability interval [052 085] in Fig-ure 8a with a strong downward bias the interval is [072099] That relatively innocuous shift from testing at lowerprobabilities to testing at higher probabilities made a dra-matic shift in the bias of the LL distribution relative to thechi-square distribution The shift perplexed me so I de-cided to investigate both X 2 and likelihood for differentnumbers of trials and different probabilities for each level

Matlab Code 2 in the Appendix is the program that cal-culates LL and X 2 for a PF ranging from P 5 50 to100 and for 1 2 4 8 or 16 trials at each level TheMatlab code calculates Equations 46 and 50 by enumer-ating all possible outcomes rather than by doing MonteCarlo simulations Consider the contribution to chi squarefrom one level in the sum of Equation 46

(52)

where Oi 5 Ni p(datai) and Ei 5 Ni Pi The mean value ofX 2

i is obtained by doing a weighted sum over all possible

Xp P

P P

N

O E

E Pi

i i

i i

i

i i

i i

2

2 2

1 1=

[ ]=

( )( )

( )

( )

data

prob_chisq Gamma= ( )df 2 22c

LL N pp

P

N pp

P

i ii

i

i

i ii

i

=

+ [ ] eacute

eumlecirc

aring2

11

( )ln( )

( ) ln( )

datadata

datadata

1

residualdata

ii i

i

P P

SE=

( )

X ii

2 2= aring residual

var PP P

Ni

i i

i( ) =

( )1

Xp P

P

i i

ii

2

2

=( )[ ]

( )aringdata

var

MEASURING THE PSYCHOMETRIC FUNCTION 1443

outcomes for Oi The weighting wti is the expected prob-ability from binomial statistics

(53)

The expected value lt X 2i gt is

(54)

A bit of algebra based on aringOiwti 5 1 gives a surprising

result

(55)

That is each term of the X 2 sum has an expectation valueof unity independent of Ni and independent of P Havingeach deviant contribute unity to the sum in Equation 46is desired since Ei 5NiPi is the exact value rather than afitted value based on the sampled data In Figure 4 thehorizontal line is the contribution to chi square for anyprobability level I had wrongly thought that one neededNi to be fairly big before each term contributed a unityamount on the average to X 2 It happens with any Ni

If we try the same calculation for each term of thesummation in Equation 50 for LL we have

(56)

Unfortunately there is no simple way to do the summationsas was done for X 2 so I resort to the program of MatlabCode 2 The output of the program is shown in Figure 4The five curves are for Ni 5 1 2 4 8 and 16 as indicatedIt can be seen that for low values of Pi near 5 the contri-bution of each term is greater than 1 and that for high val-ues (near 1) the contribution is less than 1 As Ni increasesthe contribution of most levels gets closer to 1 as expectedThere are however disturbing deviations from unity at highlevels of Pi An examination of the case Ni 5 2 is usefulsince it is the case for which Wichmann and Hill (2001a)showed strong biases in their Figures 7 and 8 (left-handpanels) This examination will show why Figure 4 has theshape shown

I first examine the case where the levels are biased to-ward the lower asymptote For Ni 5 2 and Pi 5 05 theweights in Equation 53 are 14 12 and 14 for Oi 5 0 1or 2 and Ei 5 Ni Pi 5 1 Equation 56 simplifies since thefactors with Oi 5 0 and Oi 5 1 vanish from the first termand the factors with Oi 5 2 and Oi 5 1 vanish from the sec-ond term The Oi 5 1 terms vanish because log(11) 5 0Equation 56 becomes

(57)

Figure 4 shows that this is precisely the contribution ofeach term at Pi 5 5 Thus if the PF levels were skewed to

the low side as in Wichmann and Hillrsquos Figure 7 (left panel)the present analysis predicts that the LL statistic wouldbe biased to the high side For the 60 levels in Figure 7(left panel) their deviance statistic was biased about 20too high which is compatible with our calculation

Now I consider the case where levels are biased near theupper asymptote For Pi 5 1 and any value of Ni theweights are zero because of the 1 Pi factor in Equa-tion 53 except for Oi 5 Ni and Ei 5 Ni Equation 56 van-ishes either because of the weight or because of the logterm in agreement with Figure 4 Figure 4 shows that forlevels of Pi 85 the contribution to chi-square is less thanunity Thus if the PF levels were skewed to the high sideas in Wichmann and Hillrsquos Figure 8 (left panel) we wouldexpect the LL statistic to be biased below unity as theyfound

I have been wondering about the relative merits of X 2

and log likelihood for many years I have always appreci-ated the simplicity and intuitive nature of X 2 but statisti-cians seem to prefer likelihood I do not doubt the argu-ments of King-Smith et al (1994) and Kontsevich andTyler (1999) who advocate using mean likelihood in a

lt gt = +[ ]= =

LLi 2 0 25 2 2 2 2 2

2 2 1 386

ln( ) ln( )

ln( )

lt gt =aeligegraveccedil

oumloslashdivide

+ ( ) aeligegraveccedil

oumloslashdivide

eacute

eumlecircaringLLi O

Oi

i

ii i

i i

i i

wt OO

EN O

N O

N Ei

i

2 ln ln

lt gt =X i2 1

lt gt =( )

( )aringX wtO E

E Pi iO

i i

i ii

2

2

1

wtN

O N OP Pi

i

i i i

iO

i

N Oi i i=

( )times ( )

1

5 6 7 8 9 10

02

04

06

08

1

12

14

1

2

4

8

N = 16

X for all N

Psychometric function level (p )

Log

likel

ihoo

d (D

) at

eac

h le

vel

2

i

Figure 4 The bias in goodness-of-fit for each level of 2AFCdata The abscissa is the probability correct at each level Pi Theordinate is the contribution to the goodness-of-fit metric (X 2 orlog likelihood deviance) In linear regression with Gaussiannoise each term of the chi-square summation is expected to givea unity contribution if the true rather than sampled parametersare used in the fit so that the expected value of chi square equalsthe number of data points With binomial noise the horizontaldashed line indicates that the X 2 metric (Equation 52) contributesexactly unity independent of the probability and independent ofthe number of trials at each level This contribution was calcu-lated by a weighted sum (Equations 53ndash54) over all possible out-comes of data rather than by doing Monte Carlo simulationsThe program that generated the data is included in the Appen-dix as Matlab Code 2 The contribution to log likelihood deviance(specified by Equation 56) is shown by the five curves labeled1ndash16 Each curve is for a specific number of trials at each levelFor example for the curve labeled 2 with 2 trials at each leveltested the contribution to deviance was greater than unity forprobability levels less than about 85 correct and was less thanunity for higher probabilities This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a Figures 7a and 8a)


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X^2. The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X^2 statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X^2. First, they note that the maximum likelihood parameters will not minimize X^2. This is not a bothersome objection, since if one does a goodness-of-fit with X^2, one would also do the parameter search by minimizing X^2 rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X^2, can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use chi-square for embedded models (Press et al., 1992), and I cannot see that X^2 should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance. (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39-43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
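As a concrete illustration of the inverse-variance weighting mentioned above, here is a minimal sketch; the thresholds and variances are hypothetical numbers of my own, not data from any of the studies discussed here.

% Sketch: combining repeated threshold estimates by inverse-variance weighting.
% th and v are hypothetical per-run thresholds and variances (e.g., v from a bootstrap).
th = [1.12 0.95 1.30];            % threshold estimates from three runs
v  = [0.010 0.025 0.040];         % estimated variance of each estimate
w  = 1 ./ v;                      % inverse-variance weights
thBar  = sum(w .* th) / sum(w);   % combined threshold estimate
varBar = 1 / sum(w);              % variance of the combined estimate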

Wichmann and Hill, and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of lambda = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100% rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                           No Lapse          No Lapse          Lapse             Lapse
                           lambda = .01      lambda = .0001    lambda = .01      lambda = .0001
Psychometric Function      L      chi^2      L      chi^2      L      chi^2      L      chi^2
Probit                     1.05   1.05       1.00   0.99       1.05   1.05       0.36   0.34
Logit                      1.76   1.76       1.66   1.66       1.76   1.76       0.84   0.72
Weibull                    1.01   1.01       0.97   0.97       1.01   1.01       0.26   0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and chi-square minimization, respectively.

maximization, corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: lambda = .01 (first and third pairs of data columns) and lambda = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting gamma = lambda in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

with z = p2(y - p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to lambda = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When lambda = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with lambda = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used lambda = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
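A minimal sketch of such a multistart search is given below. It mirrors the Table 2 setup (three levels, 50 trials each, one lapse at the top level, symmetric lapse parameter) but uses my own variable names and a plain chi-square error; it is an illustration, not a transcription of Matlab Code 3.

% Sketch: chi-square fit of a probit PF with a fixed (too small) lapse rate,
% restarted from several initial slopes to find the global minimum.
x = [0 1 6];  N = [50 50 50];  Obs = [25 42 49];     % one lapse at the top level
lapse = 0.0001;                                       % fixed lapse parameter
probitPF  = @(p) 0.5 + 0.5*erf(p(2)*(x - p(1))/sqrt(2));        % probit, z = p2*(x - p1)
expectedN = @(p) N .* (lapse + probitPF(p)*(1 - 2*lapse));      % symmetric lapse (gamma = lambda)
chisq = @(p) sum((Obs - expectedN(p)).^2 ./ (expectedN(p) .* (N - expectedN(p)) ./ N + eps));
best = Inf;
for slope0 = 0.1:0.2:2.1                              % range of initial slope guesses
    [p, err] = fminsearch(chisq, [0.5 slope0]);
    if err < best,  best = err;  pBest = p;  end
end
% pBest(2) is the best-fitting slope over the starting points tried.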

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of lambda. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d' function (d' = x_t^b). (I was pleased that they called the d' function the "psychometric function.") They found that the estimated slope b of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.

P = .5 + .5 erf(z/sqrt(2))    (probit; Equation 3B)    (58A)

P = 1/[1 + exp(-z)]    (logit)    (58B)

P = 1 - exp[-exp(z)]    (Weibull)    (58C)

Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with beta = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of gamma = 10% (because of the 10AFC task) and a lapse rate of lambda = .0001 (a 99.99% correct upper asymptote). The PF slope was found to be beta = 5.5. Since this value of beta is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(N beta^2) = 0.058/N (Klein, 2001). For a run of 25 trials the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var), about 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of beta could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that because of binomial variability the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2-4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure. (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than average the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
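For readers who want to experiment with the first of these steps, here is a compact weighted pool-adjacent-violators monotonizer. It is my own sketch of the idea, not Kaernbach's routine or the monotonizing script of Matlab Code 4, and the variable names are illustrative.

function p = monotonizePF(cor, tot)
% Sketch of a weighted pool-adjacent-violators monotonizer: given the number
% correct (cor) and number of trials (tot) at each tested level, return a
% nondecreasing psychometric function.
c = cor(:)';  t = tot(:)';  len = ones(size(c));   % each block starts as one level
i = 1;
while i < numel(c)
    if c(i)/t(i) > c(i+1)/t(i+1)                   % violation: merge blocks i and i+1
        c(i) = c(i) + c(i+1);  t(i) = t(i) + t(i+1);  len(i) = len(i) + len(i+1);
        c(i+1) = [];  t(i+1) = [];  len(i+1) = [];
        i = max(i-1, 1);                           % the merged block may violate to its left
    else
        i = i + 1;                                 % already monotone here; move on
    end
end
p = repelem(c ./ t, len);                          % expand block probabilities to levels
end

For example, monotonizePF([4 1 3], [5 5 5]) returns [0.5 0.5 0.6]: the decreasing pair is pooled, and the result is nondecreasing.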

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
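The core of the enumeration can be sketched in a few lines. The fragment below is my own illustration, not the main program of Matlab Code 4: it visits every possible 10-trial Brownian run by reading the bits of an integer (as described in the Appendix note on bitget) and pools the per-level counts. With the flat P = 50% PF every response sequence is equally likely, so unweighted counting is appropriate.

% Sketch: enumerate all 2^10 Brownian staircases (bit = 1 means correct),
% track the stimulus level trial by trial, and pool correct/total counts.
nbits = 10;  nlev = 2*nbits;                 % trials per run and number of levels
corAll = zeros(1, nlev);  totAll = zeros(1, nlev);
for i = 0:2^nbits - 1
    a = bitget(i, 1:nbits);                  % the responses of this staircase
    step = 1 - 2*a(1:end-1);                 % down one level after a hit, up after a miss
    level = nbits + [0 cumsum(step)];        % level at which each trial is presented
    for k = 1:nbits
        corAll(level(k)) = corAll(level(k)) + a(k);   % count hits at this level
        totAll(level(k)) = totAll(level(k)) + 1;      % count trials at this level
    end
end
pooledPF = corAll ./ (totAll + eps);         % PF from pooling the raw counts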

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot-dashed line with right-left symmetry that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is plus or minus 10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from P = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Psi method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)]/[1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter gamma. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d'(x) = z(x) - z(0). With this method, the response measure d' is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers. (A short sketch of both compensations is given at the end of this section.)

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d' linearly, the interval bias in 2AFC affects d' quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d' and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100% rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods but that objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d' is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d' level rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, beta', to the low stimulus strength log-log slope of the d' function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d'. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d' function given in Equation 20. For all values of the Weibull beta parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d' exponent b and the Weibull exponent beta is b/beta = 1.06. This ratio differs from b/beta of about 0.8 (Pelli, 1985) and b/beta = 0.88 (Strasburger, 2001a) because our d' function saturates at moderate d' values, just as the Weibull saturates and as experimental data saturate.
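As promised in Item 1.2, here is a minimal sketch of the two compensations, applied to a single hypothetical yes/no data point. The numbers and names are mine; the z-score conversion uses the same erf/erfinv relation as the appendix code.

% Sketch: the two ways of compensating for the lower asymptote, applied to
% a hypothetical yes/no data point with a 10% false alarm rate.
Px = 0.60;   P0 = 0.10;                     % hit rate at level x and false alarm rate
pCorrected = (Px - P0) / (1 - P0);          % probability-space correction (Equation 2)
zscore = @(P) sqrt(2) * erfinv(2*P - 1);    % inverse cumulative normal
dprime  = zscore(Px) - zscore(P0);          % z-score compensation (Equation 5)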

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d' function given by d' = c_t^b. The optimum variance of 1.64/(N b^2) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled to 3.28/(N b^2). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d' PF with saturation was used. In that case the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up-down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about four reversals, are thrown out) rather than to average an even number of reversals. Both of these findings were surprising. (A sketch of this estimator is given at the end of this section.)

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.
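As noted in Item 6, threshold can be taken as the mean of the tested levels after the early trials are discarded. A sketch with a hypothetical level sequence follows; it is an illustration, not code from the Appendix.

% Sketch: threshold from a single Brownian run, estimated by averaging all
% tested levels after the trials preceding the fourth reversal are discarded.
levels = [8 7 6 5 6 5 4 5 4 5 6 5 4 5 6 5];          % stimulus level on each trial (hypothetical)
rev = find(diff(sign(diff(levels))) ~= 0) + 1;        % trial indices of the reversals
threshold = mean(levels(rev(4):end));                 % average the levels from the 4th reversal on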

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman-Karber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods in which the number of trials was unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptote. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X^2 distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (lambda = .0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d' for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. Y. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Karber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function, Probability, d', and Log-Log d' Slope: Code for Generating Figure 1.

Matlab Code 2. Goodness-of-Fit, Likelihood vs. Chi Square: Relevant to Wichmann and Hill (2001a) and Figure 4.

Matlab Code 3. Downward Slope Bias Due to Lapses: Relevant to Wichmann and Hill (2001a) and Table 2.

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias: Monotonize, Extrapolate, and Averaging scripts, plus the Main Program.

Note on the monotonizing script of Matlab Code 4: the denominator in the probability calculation is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note on the main program of Matlab Code 4: it has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


For the yes/no case, Equation 26 is based on a unity slope ROC curve with the subject's criterion at the negative diagonal of the ROC curve, where the hit rate equals the correct rejection rate. This would be the most efficient operating point. The variance for a nonunity slope ROC curve was considered by Klein, Stromeyer, and Ganz (1974). From binomial statistics,

var(P) = P(1 - P)/n,    (27)

with n = N for 2AFC and n = N/2 for yes/no, since for this binary yes/no task the number of signal and blank trials is half the total number of trials.

Putting all of these factors together gives

(28)

for both the yes/no and the 2AFC cases. The only difference between the two cases is the definition of n and the definition of d' (d'_2AFC = 2^0.5 z and d'_YN = 2z for a symmetric criterion). When this is expressed in terms of N (number of trials) and z (z score of probability correct), we get

var(Y) = 2 pi P(1 - P) exp(z^2) / (N b^2 z^2)    (29)

for both 2AFC and yesno That is when the percent cor-rects are all equal (P2AFC 5 Phit 5 Pcorrect rejection) thethreshold variance for 2AFC and yesno are equal Theminimum value of var(Y ) is 164(Nb2) occurring at z 5157 (P 5 94) This value of 94 correct is the optimumvalue at which to test for both 2AFC and yesno tasks in-dependent of b

This result that the threshold variance is the same for2AFC and yesno methods seems to disagree with Figure 4of McKee et al (1985) where the standard errors of thethreshold estimates are plotted for 2AFC and yesno asa function of the number of trials The 2AFC standarderror (their Figure 4) is twice that of yesno correspond-ing to a factor of four in variance However the McKeeet al analysis is misleading since all they (we) did for theyesno task was to use Equation 2 to expand the PF to a 0to 100 range thereby doubling the PF slope Thatrescaling is not the way to convert a 2AFC detection taskto the yesno task of the same detectability A signal de-tection methodology is needed for that conversion aswas done in the present analysis

One surprise is that the optimal testing probability isindependent of the PF slope b This finding is a generalproperty of PFs in which the dependence on the slopeparameter has the form xb The 94 level corresponds todcentYN 5 315 and dcent2AFC 5 223 For the commonly usedP 5 75 (z 5 0765 dcentYN 5 135 dcent2AFC 5 0954)var(Y ) 5 408(Nb2) which is about 25 times the opti-mal value obtained by placing the stimuli at 94 Green(1990) had previously pointed out that the 94 and 92levels were the optimal points to test for the cumulative

normal and logit functions respectively because of theminimum threshold estimate variability when one is test-ing at that point on the PF Other PFs can have optimalpoints at slightly lower levels but these are still above thelevels to which adaptive methods are normally directed

It is also useful to compare the 2AFC and yesno at thesame dcent rather than at their optimal points In that case fordcent values fixed at [1 2 3] the ratio of the 2AFC varianceto the yesno variance is [182 136 079] At low dcent thereis an advantage for the 2AFC method and that reverses athigh dcent as is expected from the optimal dcent being higher foryesno than for 2AFC In general one should run the yesno method at dcent levels above 20 (84 correct in the blankand signal intervals corresponds to dcent 5 2)

The equality of the optimal var(Y ) for 2AFC and yesno methods is not only surprising it seems paradoxicalmdashas can be seen by the following gedankenexperiment Sup-pose subjects close their eyes on the first interval of the2AFC trial and base their judgments on just the secondinterval as if they were doing a yesno task since blanksand stimuli are randomly intermixed Instead of sayingldquoyesrdquo or ldquonordquo they would respond ldquoInterval 2rdquo or ldquoInter-val 1rdquo respectively so that the 2AFC task would become ayesno task The paradox is that one would expect the vari-ance to be worse than for the eyes open 2AFC case sinceone is ignoring information However Equation29 showsthat the 2AFC and yesno methods have identical optimalvariance I suspect that the resolution of this paradox is thatin the yesno method with a stable criterion the optimalvariance occurs at a higher dcent value (315) than the 2AFCvalue (223) The dcent power law function that I chose doesnot saturate at large dcent levels giving a benefit to the yesno method

In order to check whether the paradox still holds witha more realistic PF (a dcent function that saturates at largedcent ) I now compare the 2AFC and yesno variance usinga Weibull function with a slope b In order to connect2AFC and yesno I need to use signal detection theoryand the StromeyerndashFoley dcent function (Equation 20) withthe ldquoWeibull parametersrdquo b 5 106b a 5 0614 and w 51 39b Following Equation 22 I pointed out that withthese parameter values the difference between the 2AFCWeibull function and the 2AFC dcent function (converted toa probability ordinate) is less than 0004 After the de-rivative Equation 25 is appropriately modified we findthat the optimal 2AFC variance is 342(Nb2) For the yesno case the optimal variance is 402(Nb2) This result re-solves the paradox The optimal 2AFC variance is about18 lower than the yesno variance so closing onersquos eyesin one of the intervals does indeed hurt

We shouldnrsquot forget that if one counts stimulus presen-tations Np rather than trials N the optimal 2AFC variancebecomes 684(Npb2) as opposed to 402(Npb2) for yesnoThus for the Linschoten experiments with each stimuluspresentation being slow the yesno methodology is moreaccurate for a given number of stimulus presentationseven at low dcent values Kaernbach (1990) compares 2AFC

var( ) exp( )

( )Y z

P P

N bz= ( )2

122

p

var( ) exp( )

( )Y z

P P

n bd= ( ) cent

412

2p

var( )( )

PP P

n= 1

MEASURING THE PSYCHOMETRIC FUNCTION 1435

and yesno adaptive methods with simulations and ac-tual experiments His abscissa is number of presenta-tions and his results are compatible with our findings

The objective yesno method that I have been discussingis inefficient since 50 of the trials are blanks Greaterefficiency is possible if several points on the PF are mea-sured (method of constant stimuli) with the number of blanktrials the same as at the other levels For example if fivelevels of stimuli were measured including blanks then theblanks would be only 20 of the total rather than 50 Forthe 2AFC paradigm the subject would still have to look at50 of the presentations being blanks The yesno eff i-ciency can be further improved by having the subjectrespond to the multiple stimuli with multiple ratings ratherthan a binary yesno response This rating scale methodof constant stimuli is my favorite method and I have beenusing it for 20 years (Klein amp Stromeyer 1980) The mul-tiple ratings allow one to measure the psychometric func-tion shape for dcents as large as 4 (well into the dipper regionwhere stimuli are slightly above threshold) It also allowsone to measure the ROC slope (presence of multiplicativenoise) Simulations of its benefits will be presented inKlein (2002)

Insight into how the number of responses affects thresh-old variance can be obtained by using the cumulative nor-mal PF (Equation 17) considered by Kaernbach (2001b)and Miller and Ulrich (2001) with g 5 0 Typically this isthe case of yesno discrimination but it can also be 2AFCdiscrimination with an ordinate going from 0 to 100as I have discussed in connection with Figure 3 For a cu-mulative normal PF with a binary response and for a sin-gle stimulus level being tested the threshold variance is(Equation 19) as follows

(30)

from Equation 27 and forthcoming Equations 40 and 41The optimal level to test is at P 5 05 so the optimal vari-ance becomes

(31)

If instead of a binary response the observer gives an ana-log response (many many rating categories) then the vari-ance is the familiar connection between standard deviationand standard error

(32)

With four response categories the variance is 119s2Nwhich is closer to the analog than to the binary case Ifstimuli with multiple strengths are intermixed (methodof constant stimuli) then the benefit of going from a bi-nary response to multiple responses is greater than that in-dicated in Equations 31ndash32 (Klein 2002)

A brief list of other problems with the 2AFC method ap-plied to detection comprises the following

1 2AFC requires the subject to memorize and thencompare the results of two subjective responses It is cog-nitively simpler to report each response as it arrives Sub-jects without much experience can easily make mistakes(lapses) simply becoming confused about the order of thestimuli Klein (1985) discusses memory load issues rele-vant to 2AFC

2 The 2AFC methodology makes various types ofanalysis and modeling more difficult because it requiresthat one average over all criteria The response variable in2AFC is the Interval 2 activation minus the Interval 1 ac-tivation (see the abscissa in Figure 2) This variable goesfrom minus infinity to plus infinity whereas the yesnovariable goes from zero to infinity It therefore seems moredifficult to model the effects of probability summation anduncertainty for 2AFC than for yesno

3 Models that relate psychophysical performance to un-derlying processes require information about how the noisevaries with signal strength (multiplicative noise) The rat-ing scale method of constant stimuli not only measures dcentas a function of contrast it also is able to provide an esti-mate of how the variance of the signal distribution (theROC slope) increases with contrast The 2AFC methodlacks that information

4 Both the yesno and 2AFC methods are supposed toeliminate the effects of bias Earlier I pointed out that in myrecent 2AFC discrimination experiments when the stim-ulus is near threshold I find I have a strong bias in favor ofresponding ldquoStimulus 2rdquo Until I learned to compensatefor this bias (not typically done by naiumlve subjects) my dcentwas underestimated As I have discussed it is possible tocompensate for this bias but that is rarely done

Improving the Brownian (Up ndashDown) StaircaseAdaptive methods with a goal of placing trials at an

optimal point on the PF are efficient Leek (2001) pro-vides a nice overview of these methods There are twogeneral categories of adaptive methods those based onmaximum (or mean) likelihood and those based on sim-pler staircase rules The present section is concernedwith the older staircase methods which I will callBrownian methods I used the name ldquoBrownianrdquo becauseof the upndashdown staircasersquos similarity to one-dimensionalBrownian motion in which the direction of each step israndom The familiar upndashdown staircase is Brownianbecause at each trial the decision about whether to go upor down is decided by a random factor governed by prob-ability P(x) I want to make this connection to Brownianmotion because I suspect that there are results in thephysics literature that are relevant to our psychophysicalstaircases I hope these connections get made

A Brownian staircase has a minimal memory of the runhistory it needs to remember the previous step (includingdirection) and the previous number of reversals or numberof trials (used for when to stop and for when to decreasestep size) A typical Brownian rule is to decrease signalstrength by one step (like 1 dB) after a correct responseand to increase it three steps after an incorrect response (a

var( thresh) = s 2

N

var( thresh) = timesp s2

2

N

var(var( )

( )exp thresh) =aeligegraveccedil

oumloslashdivide

= ( )P

N dPdy

P P zN2

22

2 1p s

1436 KLEIN

one downndashthree up rule) I was pleased to learn recentlythat his rule works nicely for an objective yesno task(Kaernbach 1990) as well as 2AFC as discussed earlier

In recent years likelihood methods (Pentland 1980Watson amp Pelli 1983) have become popular and havesomewhat displaced Brownian methods There is a wide-spread feeling that likelihood methods have substantiallysmaller threshold variance than do staircases in whichthresholds are obtained by simply averaging the levelstested Given the importance of this issue for informingresearchers about how best to measure thresholds it issurprising that there have been so few simulations com-paring staircases and optimal methods Pentlandrsquos (1980)results showing very large threshold variance for a stair-case method are so dramatically different from the re-sults of my simulations that I am skeptical The best pre-vious simulations of which I am aware are those of Green(1990) who compared the threshold variance from sev-eral 2AFC staircase methods with the ideal variance(Equation 24) He found that as long as the staircase ruleplaced trials above 80 correct the ratio of observed toideal threshold variance was about 14 This value is thesquare of the inverse of Greenrsquos finding that the ratio ofthe expected to the observed sigma is about 85

In my own simulations (Klein 2002) I found that aslong as the final step size was less than 02s the thresh-old variance from simply averaging over levels was closeto the optimal value Thus the staircase threshold varianceis about the same as the variance found by other optimalmethods One especially interesting finding was that thecommon method of averaging an even number of rever-sal points gave a variance that was about 10 higher thanaveraging over all levels This result made me wonderhow the common practice of averaging reversals ever gotstarted Since Green (1990) averaged over just the reversalpoints I suspect that his results are an overestimate of thestaircase variance In addition since Green specifies stepsize in decibels it is difficult for me to determine whetherhis final step size was small enough Since the thresholdvariance is inversely related to slope it is important to re-late step size to slope rather than to decibels The most nat-ural reference unit is s the standard deviation associatedwith the cumulative normal In summary I suggest that thisgeneral feeling of distrust of the simple Brownian staircaseis misguided and that simple staircases are often as goodas likelihood methods In the section Summary of Advan-tages of Brownian Staircase I point out several advantagesof the simple staircase over the likelihood methods

A qualification should be made to the claim above thataveraging levels of a simple staircase is as good as a fancylikelihood method If the staircase method uses a too smallstep size at the beginning and if the starting point is farfrom threshold then the simple staircase can be inefficientin reaching the threshold region However at the end ofthat run the astute experimenter should notice the largediscrepancy between the starting point and the thresholdand the run should be redone with an improved startingpoint or with a larger initial step size that will be reducedas the staircase proceeds

Increase number of 2AFC response categories In-dependent of the type of adaptive method used the 2AFCtask has a flaw whereby a series of lucky guesses at lowstimulus levels can erroneously drive adaptive methods tolevels that are too low Some adaptive methods with a smallnumber of trials are unable to fully recover from a string oflucky guesses early in the run One solution is to allow thesubject to have more than two responses For reasons thatI have never understood people like to have only two re-sponse categories in a 2AFC task Just as I am reluctant totake part in 2AFC experiments I am even more reluctantto take part in two response experiments The reason issimple I always feel that I have within me more than onebit of information after looking at a stimulus I usually haveat least four or more subjective states (2 bits) for a yesnotask HN LN LY HY where H and L are high and lowconfidence and N and Y are did not and did see it For a2AFC my internal states are H1 L1 L2 H2 for high andlow confidence on Intervals 1 and 2 Kaernbach (2001a) inthis special issue does an excellent job of justifying thelow-confidence response category He provides solid ar-guments simulations and experimental data on how alow-confidence category can minimize the ldquolucky guessrdquoproblem

Kaernbach (2001a) groups the L1 and L2 categoriestogether into a ldquodonrsquot knowrdquo category and argues per-suasively that the inclusion of this third response can in-crease the precision and accuracy of threshold estimateswhile also leaving the subject a happier person He callsit the ldquounforced choicerdquo method Most of his article con-cerns the understandable fear of introducing a responsebias exactly what 2AFC was invented to avoid He has de-veloped an intelligent rule for what stimulus level shouldfollow a ldquodonrsquot knowrdquo response Kaernbach shows withboth simulations and real data that with his method thispotential response bias problem has only a second-ordereffect on threshold estimates (similar to the 2AFC inter-val bias discussed around Equation 7) and its advantagesoutweigh the disadvantages I urge anyone using the 2AFCmethod to read it I conjecture that Kaernbachrsquos method isespecially useful in trimming the lower tail of the distri-bution of threshold estimates and also in reducing the in-terval bias of 2AFC tasks since the low confidence ratingwould tend to be used for trials on which there is the mostuncertainty

Although Kaernbach has managed to convince me ofthe usefulness of his three-response method he might havetrouble convincing others because of the response biasquestion I should like to offer a solution that might quellthe response bias fears Rather than using 2AFC with threeresponses one can increase the number of responses tofourmdashH1 L1 L2 H2mdashas discussed two paragraphsabove The L rating can be used when subjects feel theyare guessing

To illustrate how this staircase would work considerthe case in which we would like the 2AFC staircase to con-verge to about 84 correct A 5 upndash1 down staircase(5 levels up for a wrong response and 1 level down for acorrect response) leads to an average probability correct

MEASURING THE PSYCHOMETRIC FUNCTION 1437

of 56 5 8333 If one had zero information about thetrial and guessed then one would be correct half the timeand the staircase would shift an average of (5 1)2 512 levels Thus in his scheme with a single ldquodonrsquot knowrdquoresponse Kaernbach would have the staircase move up2 levels following a ldquodonrsquot knowrdquo response One mightworry that continued use of the ldquodonrsquot knowrdquo responsewould continuously increase the level so that one wouldget an upward bias But Kaernbach points out that theldquodonrsquot knowrdquo response replaces incorrect responses to agreater extent than it replaces correct responses so thatthe 12 level shift is actually a more negative shift than the15 shift associated with a wrong response For my sug-gestion of four responses the step shifts would be ndash1 1113 and 15 levels for responses HC LC LI and HIwhere H and L stand for high and low confidence and Cand I stand for correct and incorrect A slightly more con-servative set of responses would be ndash1 0 14 15 inwhich case the low-confidence correct judgment leaves thelevel unchanged In all these examples including Kaern-bachrsquos three-response method the low-confidence re-sponse has an average level shift of 12 when one is guess-ing The most important feature of the low confidencerating is that when one is below threshold giving a lowconfidence rating will prevent one from long strings oflucky guesses that bring one to an inappropriately lowlevel This is the situation that can be difficult to overcomeif it occurs in the early phase of a staircase when the stepsize can be large The use of the confidence rating wouldbe especially important for short runs where the extra bitof information from the subject is useful for minimizingerroneous estimates of threshold

Another reason not to be fearful of the multiple-response2AFC method is that after the run is finished one can es-timate threshold by using a maximum likelihood proce-dure rather than by averaging the run locations Standardlikelihood analysis would have to be modified for Kaern-bachrsquos three-response method in order to deal with theldquodonrsquot knowrdquo category For the suggested four-responsemethod one could simply ignore the confidence ratings forthe likelihood analysis My simulations discussed earlierin this section and Greenrsquos (1990) reveal that averagingover the levels (excluding a number of initial levels toavoid the starting point bias) has about as good a preci-sion and accuracy as the best of the likelihood methodsIf the threshold estimates from the likelihood method andfrom the averaging levels method disagree that run couldbe thrown out

Summary of advantages of Brownian staircaseSince there is no substantial variance advantage to thefancier likelihood methods one can pay more attention toa number of advantages associated with simple staircases

1 At the beginning of a run likelihood methods mayhave too little inertia (the sequential stimulus levels jumparound easily) unless a ldquostiff rdquo prior is carefully chosenBy inertia I refer to how difficult it is for the next level todeviate from the present level At the beginning of a runone would like to familiarize the subject with the stimu-

lus by presenting a series of stimuli that gradually increasein difficulty This natural feature of the Brownian stair-case is usually missing in QUEST-like methods Theslightly inefficient first few trials of a simple staircasemay actually be a benefit since it gives the subject somepractice with slightly visible stimuli

2 Toward the end of a maximum likelihood run one canhave the opposite problem of excessive inertia wherebyit is difficult to have substantial shifts of test contrastThis can be a problem if nonstationary effects are presentFor example thresholds can increase in mid-run owingto fatigue or to adaptation if a suprathreshold pedestal ormask is present In an old-fashioned Brownian staircasewith moderate inertia a quick look at the staircase trackshows the change of threshold in mid-run That observa-tion can provide an important clue to possible problemswith the current procedures A method that reveals non-stationarity is especially important for difficult subjectssuch as infants or patients in whom fatigue or inatten-tion can be an important factor

3 Several researchers have told me that the simplicity ofthe staircase rules is itself an advantage It makes writingprograms easier It is nicer for teaching students It alsosimplifies the uncertain business of guessing likelihoodpriors and communicating the methodology when writ-ing articles

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learnmore about the PF than just the single threshold valueEstimating the PF slope is the first step in this directionTo estimate slope one must test the PF at more than onepoint Methods are available to do this (Kaernbach 2001bKontsevich amp Tyler 1999) and one could even adapt sim-ple Brownian staircase procedures to give better slope es-timates with minimum harm to threshold estimates (seethe comments in Strasburger 2001b)

Probably the best method for getting both thresholds andslopes is the Y (Psi) method of Kontsevich and Tyler(1999) The Y method is not totally novel it borrows tech-niques from many previous researchersmdashas Kontsevichand Tyler point out in a delightful paragraph that both clar-ifies their method and states its ancestry This paragraphsummarizes what is probably the most accurate and preciseapproach to estimating thresholds and slope so I reproduceit here

To summarize the Y method is a combination of solutionsknown from the literature The method updates the poste-rior probability distribution across the sampled space ofthe psychometric functions based on Bayesrsquo rule (Hall1968 Watson amp Pelli 1983) The space of the psychome-tric functions is two-dimensional (King-Smith amp Rose1997 Watson amp Pelli 1983) Evaluation of the psycho-metric function is based on computing the mean of theposterior probability distribution (Emerson 1986 King-Smith et al 1994) The termination rule is based on thenumber of trials as the most practical option (King-Smith

1438 KLEIN

et al 1994 Watson amp Pelli 1983) The placement of eachnew trial is based on one-step ahead minimum search(King-Smith 1984) of the expected entropy cost function(Pelli 1987)

A simplified rough description of the Y trial placementfor 2AFC runs is that initially trials are placed at about the90 correct level to establish a solid threshold Oncethreshold has been roughly established trials are placed atabout the 90 and the 70 levels to pin down the slopeThe precise levels depend on the assumed PF shape

Before one starts measuring slopes however one mustfirst be convinced that knowledge of slope can be impor-tant It is so for several reasons

1 Parametric fitting procedures either require knowl-edge of slope b or must have data that adequately con-strain the slope estimate Part of the discussion that fol-lows will be on how to obtain reliable threshold estimateswhen slope is not well known A tight knowledge of slopecan minimize the error of the threshold estimate

2 Strasburger (2001a 2001b) gives good argumentsfor the desirability of knowing the shape of the PF He isinterested in differences between foveal and peripheralvisual processing and he uses the PF slope as an indica-tion of differences

3 Slopes are different for different stimuli and thiscan lead to misleading results if slope is ignored It mayhelp to describe my experience with the Modelfest pro-ject (Carney et al 2000) to make this issue concreteThis continuing project involves a dozen laboratoriesaround the world We have measured detection thresh-olds for a wide variety of visual stimuli The data areused for developing an improved model of spatial visionSome of the stimuli were simple easily memorized lowspatial frequency patterns that tend to have shallowerslopes because a stimulus-known-exactly template canbe used efficiently and with minimal uncertainty On theother hand tiny Modelfest stimuli can have a good dealof spatial uncertainty and are therefore expected to pro-duce PFs with steeper slopes Because of the slope changethe pattern of thresholds will depend on the level atwhich threshold is measured When this rich dataset isanalyzed with f ilter models that make estimates formechanism bandwidths and mechanism pooling thesedifferent slope-dependent threshold patterns will lead todifferent estimates for the model parameters Several ofus in the Modelfest data gathering group argued that weshould use a data-gathering methodology that wouldallow estimation of PF slope of each of the 45 stimuliHowever I must admit that when I was a subject my pa-tience wore thin and I too chose the quicker runs Al-though I fully understand the attraction of short runs fo-cused on thresholds and ignoring slope I still think thereare good reasons for spreading trials out For one thinginterspersing trials that are slightly visible helps the sub-ject maintain memory of the test pattern

4 Slopes can directly resolve differences between mod-els In my own research for example my colleagues and

I are able to use the PF slope to distinguish long-range fa-cilitation that produces a low-level excitation or noise re-duction (typical with a cross-orientation surround) from ahigher level uncertainty reduction (typical with a same-orientation surround)

Throwing-Out Rules Goodness-of-FitThere are occasions when a staircase goes sour Most

often this happens when the subjectrsquos sensitivity changesin the middle of a run possibly because of fatigue It istherefore good to have a well-specified rule that allowsnonstationary runs to be thrown out The throwing-out rulecould be based on whether the PF slope estimate is toolow or too high Alternatively a standard approach is tolook at the chi-square goodness-of-f it statistic whichwould involve an assumption about the shape of the un-derlying PF I have a commentary in Section III regard-ing biases in the chi-square metric that were found byWichmann and Hill (2001a) Biases make it difficult touse standard chi-square tables The trouble with this ap-proach of basing the goodness-of-fit just on the cumulateddata is that it ignores the sequential history of the run TheBrownian upndashdown staircase makes the sequential historyvividly visible with a plot of sequentially tested levelsWhen fatigue or adaptation sets in one sees the tested lev-els shift from one stable point to another Leek Hanna andMarshall (1991) developed a method of assessing the non-stationarity of a psychometric function over the course ofits measurement using two interleaved tracks EarlierHall (1983) also suggested a technique to address the sameproblem I encourage further research into the develop-ment of a non-stationarity statistic that can be used to dis-card bad runs This statistic would take the sequential his-tory into account as well as the PF shape and the chi-square

The development of a ldquothrowing-outrdquo rule is the ex-treme case of the notion of weighted averaging One im-portant reason for developing good estimators for good-ness-of-fit and good estimators for threshold variance isto be able to optimally combine thresholds from repeatedexperiments I shall return to this issue in connectionwith the Wichmann and Hill (2001a) analysis of bias ingoodness-of-fit metrics

Final Thought The Method Should Depend on the Situation

Also important is that the method chosen should be ap-propriate for the task For example suppose one is usingwell-experienced observers for testing and retesting sim-ilar conditions in which one is looking for small changesin threshold or one is interested in the PF shape In suchcases one has a good estimate of the expected thresholdand the signal detection method of constant stimuli withblanks and with ratings might be best On the other handfor clinical testing in which one is highly uncertain aboutan individualrsquos threshold and criterion stability a 2AFClikelihood or staircase method (with a confidence rating)might be best

MEASURING THE PSYCHOMETRIC FUNCTION 1439

III DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPEPLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with whatto do with the data after they have been collected Wehave here the standard Bayesian situation Given the PFit is easy to generate samples of data with confidence in-tervals But we have the inverse situation in which we aregiven the data and need to estimate properties of the PFand get confidence limits for those estimates The pair ofarticles in this special issue by Wichmann and Hill (2001a2001b) as well as the articles by King-Smith et al (1994)Treutwein (1995) and Treutwein and Strasburger (1999)do an outstanding job of introducing the issues and providemany important insights For example Emerson (1986)and King-Smith et al emphasize the advantages of meanlikelihood over maximum likelihood The speed of mod-ern computers makes it just as easy to calculate meanlikelihood as it is to calculate maximum likelihood As an-other example Treutwein and Strasburger (1999) demon-strate the power of using Bayesian priors for estimatingparameters by showing how one can estimate all four pa-rameters of Equation 1A in fewer than 100 trials This im-portant finding deserves further examination

In this section I will focus on four items First I will dis-cuss the relative merits of parametric versus nonparamet-ric methods for estimating the PF properties The paperby Miller and Ulrich (2001) provides a detailed analysis ofa nonparametric method for obtaining all the moments ofthe PF I will discuss some advantages and limitations oftheir method Second is the question of goodness-of-fitWichmann and Hill (2001a) show that when the numberof trials is small the goodness-of-fit can be very biasedI will examine the cause of that bias Third is the issue dis-cussed by Wichmann and Hill (2001a) that of how lapseseffect slope estimates This topic is relevant to Strasburg-errsquos (2001b) data and analysis Finally there is the possiblycontroversial topic of an upward slope bias in adaptivemethods that is relevant to the papers of Kaernbach (2001b)and Strasburger (2001b)

Parametric Versus Nonparametric FitsThe main split between methods for estimating pa-

rameters of the psychometric function is the distinctionbetween parametric and nonparametric methods In aparametric method one uses a PF that involves severalparameters and uses the data to estimate the parametersFor example one might know the shape of the PF and es-timate just the threshold parameter McKee et al (1985)show how slope uncertainty affects threshold uncertaintyin probit analysis The two papers by Wichmann and Hill(2001a 2001b) are an excellent introduction to many is-sues in parametric fitting An example of a nonparamet-ric method for estimating threshold would be to averagethe levels of a Brownian staircase after excluding the firstfour or so reversals in order to avoid the starting level biasBefore I served as guest editor of this special symposium

issue I believed that if one had prior knowledge of the exactPF shape a parametric method for estimating threshold wasprobably better than nonparametric methods After read-ing the articles presented here as well as carrying out myrecent simulations I no longer believe this I think thatin the proper situation nonparametric methods are as goodas parametric methods Furthermore there are situationsin which the shape of the PF is unknown and nonparamet-ric methods can be better

Miller and Ulrichrsquos Article on the Nonparametric SpearmanndashKaumlrber Method of Analysis

The standard way to extract information from psycho-metric data is to assume a functional shape such as a Wei-bull cumulative normal logit dcent and then vary the pa-rameters until likelihood is maximized or chi-square isminimized Miller and Ulrich (2001) propose using theSpearmanndashKaumlrber (SK) nonparametric method whichdoes not require a functional shape They start with a prob-ability density function defined by them as the derivativeof the PF

(33)

where s(i) is the stimulus intensity at the ith level andP(i) is the probability correct at the that level The notationpdf(i) is meant to indicate that it is the pdf for the intervalbetween i ndash 1 and i Miller and Ulrich give the r th momentof this distribution as their Equation 3

(34)

The next four equations are my attempt to clarify their ap-proach I do this because when applicable it is a finemethod that deserves more attention Equation 34 prob-ably came from the mean value theorem of calculus

(35)

where f (s) 5 sr11 and f cent(smid) 5 (r 1 1)smidr is the deriv-

ative of f with respect to s somewhere within the intervalbetween s(i 1) and s(i) For a slowly varying functionsmid would be near the midpoint Given Equation35 Equa-tion 34 becomes

(36)

where Ds 5 s(i) s(i 1) Equation 36 is what one wouldmean by the rth moment of the distribution The case r 50 is important since it gives the area under the pdf

(37)

by using the definition in Equation 33 The summation inEquation 37 gives the area under the pdf For it to be aproper pdf its area must be unity which means that theprobability at the highest level P(imax) must be unity andthe probability at the lowest level Pmin must be zero asMiller and Ulrich (2001) point out

m0 = =aringpdf( ) ( ) ( ) max mini s P i P ii

D

mrr

i

i s s= aring pdf mid( ) D

f s i f s i f s s i s i( ) ( ) ( ) ( ) ( ) [ ] [ ] = cent [ ]1 1mid

mrr r

ri s i s i=

+ [ ]aring + +11

11 1pdf( ) ( ) ( )

pdf( )( ) ( )

( ) ( )i

P i P i

s i s i= 1

1

1440 KLEIN

The next simplest SK moment is for r 5 1

(38)

from Equation 33 This first moment provides an esti-mate of the centroid of the distribution corresponding tothe center of the psychometric function It could be usedas an estimate of threshold for detection or PSE for dis-crimination An alternate estimator would be the medianof the pdf corresponding to the 50 point of the PF (theproper definition of PSE for discrimination)

Miller and Ulrich (2001) use Monte Carlo simulationsto claim that the SK method does a better job of extract-ing properties of the psychometric function than do para-metric methods One must be a bit careful here since intheir parametric fitting they generate the data by usingone PF and then fit it with other PFs But still it is a sur-prising claim if this simple method is so good why havepeople not been using it for years There are two possibleanswers to this question One is that people never thoughtof using it before Miller and Ulrich The other possibil-ity is that it has problems I have concluded that the an-swer is a combination of both factors I think that there isan important place for the SK approach but its limitationsmust be understood That is my next topic

The most important limitation of the SK nonparamet-ric method is that it has only been shown to work well formethod of constant stimulus data in which the PF goesfrom 0 to 1 I question whether it can be applied with thesame confidence to 2AFC and to adaptive procedures withunequal numbers of trials at each level The reason for thislimitation is that the SK method does not offer a simpleway to weight the different samples Let us consider twosituations with unequal weighting adaptive methods and2AFC detection tasks

Adaptive methods place trials near threshold but withvery uneven distribution of number of trials at differentlevels The SK method gives equal weighting to sparselypopulated levels that can be highly variable thus con-tributing to a greater variance of parameter estimates Letus illustrate this point for an extreme case with five levelss 5 [1 2 3 4 5] Suppose the adaptive method placed [11 1 98 1] trials at the five levels and the number correctwere [0 1 1 49 1] giving probabilities [0 1 1 5 1] andpdf 5 [1 0 5 5] The following PEST-like (Taylor ampCreelman 1967) rules could lead to this unusual dataMove down a level if the presently tested level has greaterthan P1 correct move up three levels if the presentlytested level has less than P2 correct P1 can be anynumber less than 33 and P2 can be any number greaterthan 67 The example data would be produced by start-ing at Level 5 and getting four in a row correct followedby an error followed by a wide set possible of alternat-ing responses for which the PEST rules would leave thefourth level as the level to be tested The SK estimate of the

center of the PF is given by Equation 38 to be micro1 5 200Since a pdf with a negative probability is distasteful tosome one can monotonize the PF according to the proce-dure of Miller and Ulrich (which does include the numberof trials at each level) The new probabilities are [0 5151 51 1] with pdf 5 [51 0 0 49] The new estimate ofthreshold is micro1 5 297 However given the data with 4998correct at Level 4 an estimate that includes a weightingaccording to the number of trials at each level would placethe centroid of the distribution very close to micro1 5 4 Thuswe see that because the SK procedure ignores the numberof trials at each level it produces erroneous estimates ofthe PF properties This example with shifting thresholdswas chosen to dramatize the problems of unequal weight-ing and the effect of monotonizing

A similar problem can occur even with the method ofconstant stimuli with equal number of trials at each levelbut with unequal binomial weighting as occurs in 2AFCdata The unequal weighting occurs because of the 50lower asymptote The binomial statistics error bar of dataat 55 correct is much larger than for data at 95 cor-rect Parametric procedures such as probit analysis giveless weighting to the lower data This is why the most ef-ficient placement of trials in 2AFC is above 90 correctas I have discussed in Section II Thus it is expected thatfor 2AFC the SK method would give threshold estimatesless precise than the estimates given by a parametricmethod that includes weighting

I originally thought that the SK method would fail on alltypes of 2AFC data However in the discussion followingEquation 7 I pointed out that for 2AFC discriminationtasks there is a way of extending the data to the 0ndash100range that produces weightings similar to those for a yesnotask Rather than use the standard Equation 2 to deal withthe 50 lower asymptote one should use the method dis-cussed in connection with Figure 3 in which one can usefor the ordinate the percent of correct responses in Inter-val 2 and use the signal strength in Interval 2 minus thatin Interval 1 for the abscissa That procedure produces aPF that goes from 0 to 100 thereby making it suit-able for SK analysis while also removing a possible bias

Now let us examine the SK method as applied to datafor which it is expected to work such as discriminationPFs that go from 0 to 100 These are tasks in which thelocation of the psychometric function is the PSE and theslope is the threshold

A potentially important claim by Miller and Ulrich(2001) is that the SK method performs better than probitanalysis This is a surprising claim since their data aregenerated by a probit PF one would expect probit analy-sis to do well especially since probit analysis does espe-cially well with discrimination PFs that go from 0 to 100To help clarify this issue Miller and I did a number ofsimulations We focused on the case of N 5 40 with 8 tri-als at each of 5 levels The cumulative normal PF (Equa-tion 14) was used with equally spaced z scores and with thelowest and highest probabilities being 1 and 99 Theprobabilities at the sample points are P(i) 5 1 1222

m1

2 21

2

11

2

=

= [ ] +

aring

aring

pdf( )( ) ( )

( ) ( )( ) ( )

is i s i

P i P is i s i

MEASURING THE PSYCHOMETRIC FUNCTION 1441

50 8778 and 99 corresponding to z scores ofz(i) 5 ndash2328 1164 0 1164 and 2328 Nonmonot-onicity is eliminated as Miller and Ulrich suggest (seethe Appendix for Matlab Code 4 for monotonizing) I didMonte Carlo simulations to obtain the standard deviationof the location of the PF micro1 given by Equation38 I foundthat both the SK method and the parametric probit method(assuming a fixed unity slope) gave a standard deviationof between 028 and 029 Miller and Ulrichrsquos result re-ported in their Table 17 gives a standard error of 0071However their Gaussian had a standard deviation of 025whereas my z-score units correspond to a standard devi-ation of unity Thus their standard error must be multipliedby 4 giving a standard error of 0284 in excellent agree-ment with my value So at least in this instance the SKmethod did not do better than the parametric method (withslope fixed) and the surprise mentioned at the beginningof this paragraph did not materialize I did not examineother examples where nonparametric methods might dobetter than the matched parametric method However thesimulations of McKee et al (1985) give some insight intowhy probit analysis with floating slope can do poorly evenon data generated from probit PFs McKee et al showthat when the number of trials is low and if stimulus lev-els near both 0 and 100 are not tested the slope can bevery flat and the predicted threshold can be quite distantIf care is not taken to place a bound on these probit esti-mates one would expect to find deviant threshold esti-mates The SK method on the other hand has built-inbounds for threshold estimates so outliers are avoidedThese bounds could explain why SK can do better thanprobit When the PF slope is uncertain nonparametricmethods might work better than parametric methods adecided advantage for the SK method

It is instructive to also do an analytic calculation of thevariance based on an analysis similar to what was doneearlier If only the ith level of the five had been testedthe variance of the PF location (the PSE in a discrimina-tion task) would have been as follows (Gourevitch ampGalanter 1967)

(39)

where var(P) is given by Equation 27 and PF slope is

(40)

where for the cumulative normal dzi dxi 5 1s where sis the standard deviation of the pdf and

(41)

from Equation 3A By combining these equations one gets

(42)

Note that if all trials were placed at the optimal stimuluslocation of z 5 0 (P 5 5) Equation 42 would becomes2p 2Ni as discussed earlier When all five levels arecombined the total variance is smaller than each individ-ual variance as given by the variance summation formula

(43)

Finally upon plugging in the five values of zi rangingfrom ndash2328 to 12328 and Pi ranging from 1 to 99the standard error of the PF location is

(44)

This value is precisely the value obtained in our MonteCarlo simulations of the nonparametric SK method andalso by the slope fixed parametric probit analysis Thistriple confirmation shows that the SK method is excel-lent in the realm where it applies (equal number of trialsat levels that span 0 to 100) It also confirms our sim-ple analytic formula

The data presented in Miller and Ulrichrsquos Table 17 giveus a fine opportunity to ask about the efficiency of themethod of constant stimuli that they use For the N 5 40example discussed here the variance is

(45)

This value is more than twice the variance of p (2 Ntot)that is obtained for the same PF using adaptive methodsincluding the simple upndashdown staircase that place trialsnear the optimal point

The large variance for the Miller and Ulrich stimulusplacement is not surprising given that of the five stimu-lus levels the two at P 5 1 and 99 are quite inefficientThe inefficient placement of trials is a common problemfor the method of constant stimuli as opposed to adaptivemethods However as I mentioned earlier the method ofconstant stimuli combined with multiple response ratingsin a signal detection framework may be the best of allmethods for detection studies for revealing properties of thesystem near threshold while maintaining a high efficiency

A curious question comes up about the optimal placementof trials for measuring the location of the psychometric func-tion With the original function the optimal placement is atthe steepest part of the function (at P 5 0) However withthe SK method when one is determining the location ofthe pdf one might think that the optimal placement of sam-ples should occur at the points where the pdf is steepestThis contradiction is resolved when one realizes that onlythe original samples are independent The pdf samples areobtained by taking differences of adjacent PF values soneighboring samples are anticorrelated Thus one cannot

var tottot

= =SEN

2 3 24

SE = =var tot12 0 2844

var var tot = aring1 1i

var( )exp

PSE i i i

i

i

P Pz

N= ( ) ( )

2 12

2

ps

dP

dz

zi

i

iaeligegraveccedil

oumloslashdivide

=( )2 2

2

exp

p

dP

dx

dP

dz

dz

dxi

i

i

i

i

i

= times

var( )var

PSE i

i

i

i

P

dP

dx

=( )

aeligegraveccedil

oumloslashdivide

2

1442 KLEIN

claim that for estimating the PSE the optimal placementof trials is at the steep portion of the pdf

Wichmann and Hillrsquos Biased Goodness-of-FitWhenever I fit a psychometric function to data I also

calculate the goodness-of-fit The second half of Wich-mann and Hill (2001a) is devoted to that topic and I havesome comments on their findings There are two standardmethods for calculating the goodness-of-fit (Press Teu-kolsky Vetterling amp Flannery 1992) The first method isto calculate the X 2 statistic as given by

(46)

where p (datai) and Pi are the percent correct of the datumand the PF at level i The binomial var(Pi ) is our old friend

(47)

where Ni is the number of trials at level i The X 2 is of in-terest because the binomial distribution is asymptoticallya Gaussian so that the X 2 statistic asymptotically has a chi-square distribution Alternatively one can write X 2 as

(48)

where the residuals are the difference between the pre-dicted and observed probabilities divided by the standarderror of the predicted probabilities SEi 5 [var(Pi )]05

(49)

The second method is to use the normalized log-likelihood that Wichmann and Hill call the deviance D Iwill use the letters LL to remind the reader that I am deal-ing with log likelihood

(50)

Similar to the X 2 statistic LL asymptotically also has a chi-square distribution

In order to calculate the goodness-of-fit from the chi-square statistic one usually makes the assumption that weare dealing with a linear model for Pi A linear modelmeans that the various parameters controlling the shape ofthe psychometric function (like a b g in Equation 1 forthe Weibull function) should contribute linearly HoweverEquations 1 8 and 12 show that only g contributes linearlyEven though the PF function is not linear in its parametersone typically makes the linearity assumption anyway inorder to use the chi-square goodness-of-fit tables The pur-pose of this section is to examine the validity of the linear-ity assumption

The chi-square goodness-of-fit distribution for a linearmodel is tabulated in most statistics books It is given bythe Gamma function (Press et al 1992)

(51)

where the degrees of freedom df is the number of datapoints minus the number of parameters The Gamma func-tion is a standard function (called Gammainc in Matlab)In order to check that the Gamma function is functioningproperly I often check that for a reasonably large numberof degrees of freedom (like df gt10) the chi-square distri-bution is close to Gaussian with a mean df and the stan-dard deviation (2df )05 As an example for df 5 18 wehave Gamma(9 9) 5 054 which is very close to the as-ymptotic value of 05 To check the standard deviation wehave Gamma [182 (18 1 6)2] 5 845 which is indeedclose to the value of 0841 expected for a unity z-score

With a sparse number of trials or with probabilities nearunity or with nonlinear models (all of which occur in fit-ting psychometric functions) one might worry about thevalidity of using the chi-square distribution (from standardtables or Equation 51) Monte Carlo simulations would bemore trustworthy That is what Wichmann and Hill (2001a)use for the log likelihood deviance LL In Monte Carlosimulations one replicates the experimental conditionsand uses different randomly generated data sets based onbinomial statistics Wichmann and Hill do 10000 runs foreach condition They calculate LL for each run and thenplot a histogram of the Monte Carlo simulations for com-parison with the expected chi-square given by Equa-tion 50 based on the chi-square distribution

Their most surprising finding is the strong bias displayedin their Figures 7 and 8 Their solid line is the chi-squaredistribution from Equation 51 Both Figures 7a and 8a arefor a total of 120 trials per run with 60 levels and 2 trialsfor each level In Figure 7a with a strong upward bias thetrials span the probability interval [052 085] in Fig-ure 8a with a strong downward bias the interval is [072099] That relatively innocuous shift from testing at lowerprobabilities to testing at higher probabilities made a dra-matic shift in the bias of the LL distribution relative to thechi-square distribution The shift perplexed me so I de-cided to investigate both X 2 and likelihood for differentnumbers of trials and different probabilities for each level

Matlab Code 2 in the Appendix is the program that cal-culates LL and X 2 for a PF ranging from P 5 50 to100 and for 1 2 4 8 or 16 trials at each level TheMatlab code calculates Equations 46 and 50 by enumer-ating all possible outcomes rather than by doing MonteCarlo simulations Consider the contribution to chi squarefrom one level in the sum of Equation 46

(52)

where Oi 5 Ni p(datai) and Ei 5 Ni Pi The mean value ofX 2

i is obtained by doing a weighted sum over all possible

Xp P

P P

N

O E

E Pi

i i

i i

i

i i

i i

2

2 2

1 1=

[ ]=

( )( )

( )

( )

data

prob_chisq Gamma= ( )df 2 22c

LL N pp

P

N pp

P

i ii

i

i

i ii

i

=

+ [ ] eacute

eumlecirc

aring2

11

( )ln( )

( ) ln( )

datadata

datadata

1

residualdata

ii i

i

P P

SE=

( )

X ii

2 2= aring residual

var PP P

Ni

i i

i( ) =

( )1

Xp P

P

i i

ii

2

2

=( )[ ]

( )aringdata

var

MEASURING THE PSYCHOMETRIC FUNCTION 1443

outcomes for Oi The weighting wti is the expected prob-ability from binomial statistics

(53)

The expected value lt X 2i gt is

(54)

A bit of algebra based on aringOiwti 5 1 gives a surprising

result

(55)

That is each term of the X 2 sum has an expectation valueof unity independent of Ni and independent of P Havingeach deviant contribute unity to the sum in Equation 46is desired since Ei 5NiPi is the exact value rather than afitted value based on the sampled data In Figure 4 thehorizontal line is the contribution to chi square for anyprobability level I had wrongly thought that one neededNi to be fairly big before each term contributed a unityamount on the average to X 2 It happens with any Ni

If we try the same calculation for each term of thesummation in Equation 50 for LL we have

(56)

Unfortunately there is no simple way to do the summationsas was done for X 2 so I resort to the program of MatlabCode 2 The output of the program is shown in Figure 4The five curves are for Ni 5 1 2 4 8 and 16 as indicatedIt can be seen that for low values of Pi near 5 the contri-bution of each term is greater than 1 and that for high val-ues (near 1) the contribution is less than 1 As Ni increasesthe contribution of most levels gets closer to 1 as expectedThere are however disturbing deviations from unity at highlevels of Pi An examination of the case Ni 5 2 is usefulsince it is the case for which Wichmann and Hill (2001a)showed strong biases in their Figures 7 and 8 (left-handpanels) This examination will show why Figure 4 has theshape shown

I first examine the case where the levels are biased to-ward the lower asymptote For Ni 5 2 and Pi 5 05 theweights in Equation 53 are 14 12 and 14 for Oi 5 0 1or 2 and Ei 5 Ni Pi 5 1 Equation 56 simplifies since thefactors with Oi 5 0 and Oi 5 1 vanish from the first termand the factors with Oi 5 2 and Oi 5 1 vanish from the sec-ond term The Oi 5 1 terms vanish because log(11) 5 0Equation 56 becomes

(57)

Figure 4 shows that this is precisely the contribution ofeach term at Pi 5 5 Thus if the PF levels were skewed to

the low side as in Wichmann and Hillrsquos Figure 7 (left panel)the present analysis predicts that the LL statistic wouldbe biased to the high side For the 60 levels in Figure 7(left panel) their deviance statistic was biased about 20too high which is compatible with our calculation

Now I consider the case where levels are biased near theupper asymptote For Pi 5 1 and any value of Ni theweights are zero because of the 1 Pi factor in Equa-tion 53 except for Oi 5 Ni and Ei 5 Ni Equation 56 van-ishes either because of the weight or because of the logterm in agreement with Figure 4 Figure 4 shows that forlevels of Pi 85 the contribution to chi-square is less thanunity Thus if the PF levels were skewed to the high sideas in Wichmann and Hillrsquos Figure 8 (left panel) we wouldexpect the LL statistic to be biased below unity as theyfound

I have been wondering about the relative merits of X 2

and log likelihood for many years I have always appreci-ated the simplicity and intuitive nature of X 2 but statisti-cians seem to prefer likelihood I do not doubt the argu-ments of King-Smith et al (1994) and Kontsevich andTyler (1999) who advocate using mean likelihood in a

lt gt = +[ ]= =

LLi 2 0 25 2 2 2 2 2

2 2 1 386

ln( ) ln( )

ln( )

lt gt =aeligegraveccedil

oumloslashdivide

+ ( ) aeligegraveccedil

oumloslashdivide

eacute

eumlecircaringLLi O

Oi

i

ii i

i i

i i

wt OO

EN O

N O

N Ei

i

2 ln ln

lt gt =X i2 1

lt gt =( )

( )aringX wtO E

E Pi iO

i i

i ii

2

2

1

wtN

O N OP Pi

i

i i i

iO

i

N Oi i i=

( )times ( )

1

5 6 7 8 9 10

02

04

06

08

1

12

14

1

2

4

8

N = 16

X for all N

Psychometric function level (p )

Log

likel

ihoo

d (D

) at

eac

h le

vel

2

i

Figure 4 The bias in goodness-of-fit for each level of 2AFCdata The abscissa is the probability correct at each level Pi Theordinate is the contribution to the goodness-of-fit metric (X 2 orlog likelihood deviance) In linear regression with Gaussiannoise each term of the chi-square summation is expected to givea unity contribution if the true rather than sampled parametersare used in the fit so that the expected value of chi square equalsthe number of data points With binomial noise the horizontaldashed line indicates that the X 2 metric (Equation 52) contributesexactly unity independent of the probability and independent ofthe number of trials at each level This contribution was calcu-lated by a weighted sum (Equations 53ndash54) over all possible out-comes of data rather than by doing Monte Carlo simulationsThe program that generated the data is included in the Appen-dix as Matlab Code 2 The contribution to log likelihood deviance(specified by Equation 56) is shown by the five curves labeled1ndash16 Each curve is for a specific number of trials at each levelFor example for the curve labeled 2 with 2 trials at each leveltested the contribution to deviance was greater than unity forprobability levels less than about 85 correct and was less thanunity for higher probabilities This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a Figures 7a and 8a)

1444 KLEIN

Bayesian framework for adaptive methods in order to mea-sure the PF However for goodness-of-fit it is not clearthat the log likelihood method is better than X2 The com-mon argument in favor of likelihood is that it makes senseeven if there is only one trial at each level However myanalysis shows that the X 2 statistic is less biased than loglikelihood when there is a low number of trials per levelThis surprised me Wichmann and Hill (2001a) offer twomore arguments in favor of log likelihood over X 2 Firstthey note that the maximum likelihood parameters will notminimize X 2 This is not a bothersome objection sinceif one does a goodness-of-fit with X 2 one would also dothe parameter search by minimizing X 2 rather than max-imizing likelihood Second Wichmann and Hill claim thatlikelihood but not X 2 can be used to assess the signifi-cance of added parameters in embedded models I doubtthat claim since it is standard to use c2 for embedded mod-els (Press et al 1992) and I cannot see that X 2 shouldbe any different

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39-43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
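A minimal Matlab sketch of the inverse-variance weighting described above, combined with the reduced chi-square inflation of method (2), follows; the threshold estimates, standard errors, and reduced chi-squares are hypothetical numbers, not data from this paper.

th      = [1.12 0.98 1.25];          % hypothetical threshold estimates from three runs
se      = [0.05 0.04 0.09];          % their standard errors (e.g., from Equations 39-43)
redChi2 = [1.8 0.9 1.2];             % reduced chi-square of each run's fit
seAdj = se .* sqrt(max(redChi2, 1)); % method (2): inflate errors when reduced chi-square > 1
w     = 1 ./ seAdj.^2;               % inverse-variance weights
thAvg = sum(w .* th) / sum(w);       % weighted mean threshold
seAvg = 1 / sqrt(sum(w));            % standard error of the weighted mean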

Wichmann and Hill, and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity I decided to have just three levels, with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                       λ = .01          λ = .0001        λ = .01          λ = .0001
Psychometric           No Lapse         No Lapse         Lapse            Lapse
Function               L       χ²       L       χ²       L       χ²       L       χ²
Probit                 1.05    1.05     1.00    0.99     1.05    1.05     0.36    0.34
Logit                  1.76    1.76     1.66    1.66     1.76    1.76     0.84    0.72
Weibull                1.01    1.01     0.97    0.97     1.01    1.01     0.26    0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details of how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

P_probit(z) = [1 + erf(z / sqrt(2))] / 2,        (58A)

P_logit(z) = 1 / [1 + exp(-z)],                  (58B)

P_Weibull(z) = 1 - exp[-exp(z)],                 (58C)

with z = p2(y - p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
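The following Matlab fragment sketches the kind of fit just described: a symmetric Weibull (γ = λ, the Equation 58C form) with a fixed lapse rate, fit by chi-square minimization, with several starting slopes to guard against local minima. It is a simplified stand-in for Matlab Code 3, not that code itself (fminsearch is used in place of the older fmins), and the data are the no-lapse Table 2 example.

x = [0 1 6];  N = [50 50 50];  Obs = [25 42 50];   % levels, trials, number correct (Table 2)
lambda = 0.01;                                     % fixed lapse (and guess) rate
pf  = @(p) lambda + (1-2*lambda) .* (1 - exp(-exp(p(2)*(x - p(1)))));  % symmetric Weibull
chi = @(p) sum( (Obs - N.*pf(p)).^2 ./ (N.*pf(p).*(1-pf(p)) + eps) );  % chi-square (cf. Equation 52)
best = Inf;
for s0 = [0.3 1 3]                                 % several initial slope guesses
    [p, v] = fminsearch(chi, [1 s0]);              % p = [threshold p1, slope p2]
    if v < best, best = v; pFit = p; end           % keep the best minimum found
end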

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope, b, of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
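The variance arithmetic in the preceding paragraph can be checked in a few Matlab lines; β and N are the values quoted above, and 1.75/(Nβ²) is the asymptotic 10AFC expression cited there.

beta = 5.5;  N = 25;               % Strasburger's fitted slope and trials per run
varThresh = 1.75/(N*beta^2);       % asymptotic threshold variance, about 0.0023
seThresh  = sqrt(varThresh);       % about 0.048 log units, i.e., roughly a 5% standard error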

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that because of binomial variability the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2-4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue in connection with my simulations that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
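As a sketch of the monotonizing idea in the first step, the fragment below does a simple trial-weighted pool-adjacent-violators pass. It is neither Kaernbach's routine nor the Appendix's Matlab Code 4, and the counts are hypothetical; each pooled block then supplies the common probability for the levels it absorbed.

tot = [4 6 8 6 3];  cor = [1 4 3 5 3];     % hypothetical trials and correct counts per level
pc = cor;  pt = tot;                        % working copies; each entry starts as its own block
i = 1;
while i < numel(pc)
    if pc(i)/pt(i) > pc(i+1)/pt(i+1)        % monotonicity violated: pool the two blocks
        pc(i) = pc(i) + pc(i+1);  pc(i+1) = [];
        pt(i) = pt(i) + pt(i+1);  pt(i+1) = [];
        i = max(i-1, 1);                    % a new violation may now appear upstream
    else
        i = i + 1;
    end
end
pMono = pc ./ pt;                           % monotone (nondecreasing) block probabilities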

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.


Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)] / [1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) - z(0). With this method the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
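A short Matlab illustration of the two compensations in Items 1.1 and 1.2 (the probabilities are hypothetical):

Px = 0.80;  P0 = 0.50;                  % probability correct at level x and at zero signal
pCorrected = (Px - P0) / (1 - P0);      % Item 1.1: compensation in probability space (Equation 2)
z = @(P) sqrt(2) * erfinv(2*P - 1);     % z-score (inverse cumulative normal)
dprime = z(Px) - z(P0);                 % Item 1.2: compensation in z-score space (Equation 5)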

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections) and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.
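A small Matlab sketch of the staircase rule in Item 4 follows; the observer is replaced here by fixed, hypothetical response probabilities, which are my assumptions rather than values from the paper.

level = 10;  nTrials = 40;  history = zeros(1, nTrials);
for t = 1:nTrials
    isBlank = rand < 0.5;                                  % randomly intermix blanks and signal trials
    if isBlank, pCorrect = 0.95; else pCorrect = 0.70; end % stand-in observer (hypothetical)
    if rand < pCorrect
        level = level - 1;                                 % one step down for every correct answer
    else
        level = level + 3;                                 % three steps up for every wrong answer
    end
    history(t) = level;                                    % track the run
end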

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log-log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.0004 on a probability ordinate going from 50% to 100%. For this good fit the connection between the d′ exponent b and the Weibull exponent β is β = 1.06b. This ratio differs from β ≈ 0.8b (Pelli, 1985) and β = 0.88b (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).
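A worked numerical instance of Item II.1 (the slope and trial count are hypothetical choices, inserted only to show the arithmetic):

b = 2;  N = 100;                         % hypothetical d′ exponent and number of trials
varOpt = 1.64/(N*b^2);                   % optimal threshold variance when N counts trials, = 0.0041
seOpt  = sqrt(varOpt);                   % about 0.064 in threshold (natural-log) units
var2AFCperPresentation = 3.28/(N*b^2);   % doubled when N counts presentations rather than trials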

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox, whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up-down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals of initial trials are thrown out) rather than average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman-Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
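For concreteness, the Spearman-Kärber threshold estimate named in Item III.1 can be written in a few Matlab lines (the data are hypothetical, and the PF values must already be monotonic and span 0 to 1):

x  = 0:5;  P = [0 .05 .30 .70 .95 1];     % stimulus levels and monotonized PF values
dP = diff(P);                              % probability mass assigned to each interval
xm = (x(1:end-1) + x(2:end)) / 2;          % interval midpoints
thresholdSK = sum(dP .* xm);               % SK estimate: mean of the implied density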

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.

Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.

Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.

Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.

Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.

Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.

Gourevitch, V., & Galanter, E. (1967). A significance test for one parameter isosensitivity functions. Psychometrika, 32, 25-33.

Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.

Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.

Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.

Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.

Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.

Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.

King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.

Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.

Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.

Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.

Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.

Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.

Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.

Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.

Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.

McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.

Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.

Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.

Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.

Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.

Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein.spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log-Log d′ Slope (Code for Generating Figure 1)

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)

Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias (scripts for monotonizing, extrapolating, and averaging, plus the main program)

Note (monotonizing script). The denominator in the probability calculation is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note (main program). The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(The program listings themselves are not reproduced here; they can be obtained from the address given above.)

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


and yes/no adaptive methods with simulations and actual experiments. His abscissa is number of presentations, and his results are compatible with our findings.

The objective yes/no method that I have been discussing is inefficient, since 50% of the trials are blanks. Greater efficiency is possible if several points on the PF are measured (method of constant stimuli), with the number of blank trials the same as at the other levels. For example, if five levels of stimuli were measured, including blanks, then the blanks would be only 20% of the total, rather than 50%. For the 2AFC paradigm the subject would still have to look at 50% of the presentations being blanks. The yes/no efficiency can be further improved by having the subject respond to the multiple stimuli with multiple ratings, rather than a binary yes/no response. This rating scale method of constant stimuli is my favorite method, and I have been using it for 20 years (Klein & Stromeyer, 1980). The multiple ratings allow one to measure the psychometric function shape for d′s as large as 4 (well into the dipper region, where stimuli are slightly above threshold). It also allows one to measure the ROC slope (presence of multiplicative noise). Simulations of its benefits will be presented in Klein (2002).

Insight into how the number of responses affects thresh-old variance can be obtained by using the cumulative nor-mal PF (Equation 17) considered by Kaernbach (2001b)and Miller and Ulrich (2001) with g 5 0 Typically this isthe case of yesno discrimination but it can also be 2AFCdiscrimination with an ordinate going from 0 to 100as I have discussed in connection with Figure 3 For a cu-mulative normal PF with a binary response and for a sin-gle stimulus level being tested the threshold variance is(Equation 19) as follows

\mathrm{var(thresh)} = \frac{\mathrm{var}(P)}{(dP/dy)^2} = \frac{2\pi\sigma^2 \exp(z^2)\, P(1-P)}{N}, \qquad (30)

from Equation 27 and forthcoming Equations 40 and 41. The optimal level to test is at P = .5, so the optimal variance becomes

\mathrm{var(thresh)} = \frac{\pi}{2} \times \frac{\sigma^2}{N}. \qquad (31)

If, instead of a binary response, the observer gives an analog response (many, many rating categories), then the variance is the familiar connection between standard deviation and standard error:

\mathrm{var(thresh)} = \frac{\sigma^2}{N}. \qquad (32)

With four response categories, the variance is 1.19σ²/N, which is closer to the analog than to the binary case. If stimuli with multiple strengths are intermixed (method of constant stimuli), then the benefit of going from a binary response to multiple responses is greater than that indicated in Equations 31–32 (Klein, 2002).
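As a quick numerical check of Equations 30–32, the following Matlab fragment (a minimal sketch of my own, not part of the Appendix code; the variable names are illustrative) evaluates Equation 30 over a range of test levels and confirms that the binary-response variance is smallest at P = .5, where it equals πσ²/(2N), about 1.57 times the analog-response variance σ²/N:

% Sketch: threshold variance for a cumulative normal PF (Equations 30-32)
sigma = 1;                           % spread of the underlying cumulative normal
N     = 100;                         % number of trials at the single test level
z     = -2:.01:2;                    % z score of the test level
P     = .5*(1 + erf(z/sqrt(2)));     % probability correct (gamma = 0)
varBinary = 2*pi*sigma^2*exp(z.^2).*P.*(1-P)/N;   % Equation 30
[varOpt, iOpt] = min(varBinary);     % minimum is at z = 0, i.e., P = .5
fprintf('optimal z = %.2f, var = %.4f (Equation 31 gives %.4f)\n', ...
        z(iOpt), varOpt, pi*sigma^2/(2*N));
fprintf('analog-response variance (Equation 32) = %.4f\n', sigma^2/N);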

A brief list of other problems with the 2AFC method applied to detection comprises the following.

1. 2AFC requires the subject to memorize and then compare the results of two subjective responses. It is cognitively simpler to report each response as it arrives. Subjects without much experience can easily make mistakes (lapses), simply becoming confused about the order of the stimuli. Klein (1985) discusses memory load issues relevant to 2AFC.

2. The 2AFC methodology makes various types of analysis and modeling more difficult, because it requires that one average over all criteria. The response variable in 2AFC is the Interval 2 activation minus the Interval 1 activation (see the abscissa in Figure 2). This variable goes from minus infinity to plus infinity, whereas the yes/no variable goes from zero to infinity. It therefore seems more difficult to model the effects of probability summation and uncertainty for 2AFC than for yes/no.

3. Models that relate psychophysical performance to underlying processes require information about how the noise varies with signal strength (multiplicative noise). The rating scale method of constant stimuli not only measures d′ as a function of contrast, it also is able to provide an estimate of how the variance of the signal distribution (the ROC slope) increases with contrast. The 2AFC method lacks that information.

4. Both the yes/no and 2AFC methods are supposed to eliminate the effects of bias. Earlier I pointed out that in my recent 2AFC discrimination experiments, when the stimulus is near threshold, I find I have a strong bias in favor of responding "Stimulus 2." Until I learned to compensate for this bias (not typically done by naïve subjects), my d′ was underestimated. As I have discussed, it is possible to compensate for this bias, but that is rarely done.

Improving the Brownian (Up–Down) Staircase

Adaptive methods with a goal of placing trials at an optimal point on the PF are efficient. Leek (2001) provides a nice overview of these methods. There are two general categories of adaptive methods: those based on maximum (or mean) likelihood, and those based on simpler staircase rules. The present section is concerned with the older staircase methods, which I will call Brownian methods. I used the name "Brownian" because of the up–down staircase's similarity to one-dimensional Brownian motion, in which the direction of each step is random. The familiar up–down staircase is Brownian because at each trial the decision about whether to go up or down is decided by a random factor governed by probability P(x). I want to make this connection to Brownian motion because I suspect that there are results in the physics literature that are relevant to our psychophysical staircases. I hope these connections get made.

A Brownian staircase has a minimal memory of the run history: it needs to remember the previous step (including direction) and the previous number of reversals or number of trials (used for when to stop and for when to decrease step size). A typical Brownian rule is to decrease signal strength by one step (like 1 dB) after a correct response and to increase it three steps after an incorrect response (a one-down–three-up rule).



I was pleased to learn recently that this rule works nicely for an objective yes/no task (Kaernbach, 1990), as well as 2AFC, as discussed earlier.
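The one-down–three-up rule just described is easy to simulate; the sketch below (my own illustration, not the Appendix code) assumes a cumulative normal 2AFC PF and averages the tested levels, skipping the early trials. The rule's equilibrium is where P(x)·1 = [1 − P(x)]·3, that is, at 75% correct.

% Sketch: one-down/three-up Brownian staircase on an assumed 2AFC PF
Phi    = @(z) .5*(1 + erf(z/sqrt(2)));
thresh = 0;  sigma = 1;                        % assumed true PF parameters
PF     = @(x) .5 + .5*Phi((x - thresh)/sigma); % 2AFC probability correct
nTrials = 200;  step = .1*sigma;               % final step size below .2*sigma
x = 2*sigma;                                   % start at a clearly visible level
levels = zeros(1, nTrials);
for t = 1:nTrials
    levels(t) = x;
    if rand < PF(x)
        x = x - step;                          % one step down after a correct response
    else
        x = x + 3*step;                        % three steps up after an incorrect response
    end
end
threshEst = mean(levels(21:end));              % average tested levels, skipping the start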

In recent years, likelihood methods (Pentland, 1980; Watson & Pelli, 1983) have become popular and have somewhat displaced Brownian methods. There is a widespread feeling that likelihood methods have substantially smaller threshold variance than do staircases in which thresholds are obtained by simply averaging the levels tested. Given the importance of this issue for informing researchers about how best to measure thresholds, it is surprising that there have been so few simulations comparing staircases and optimal methods. Pentland's (1980) results, showing very large threshold variance for a staircase method, are so dramatically different from the results of my simulations that I am skeptical. The best previous simulations of which I am aware are those of Green (1990), who compared the threshold variance from several 2AFC staircase methods with the ideal variance (Equation 24). He found that as long as the staircase rule placed trials above 80% correct, the ratio of observed to ideal threshold variance was about 1.4. This value is the square of the inverse of Green's finding that the ratio of the expected to the observed sigma is about 0.85.

In my own simulations (Klein, 2002), I found that as long as the final step size was less than 0.2σ, the threshold variance from simply averaging over levels was close to the optimal value. Thus, the staircase threshold variance is about the same as the variance found by other optimal methods. One especially interesting finding was that the common method of averaging an even number of reversal points gave a variance that was about 10% higher than averaging over all levels. This result made me wonder how the common practice of averaging reversals ever got started. Since Green (1990) averaged over just the reversal points, I suspect that his results are an overestimate of the staircase variance. In addition, since Green specifies step size in decibels, it is difficult for me to determine whether his final step size was small enough. Since the threshold variance is inversely related to slope, it is important to relate step size to slope rather than to decibels. The most natural reference unit is σ, the standard deviation associated with the cumulative normal. In summary, I suggest that this general feeling of distrust of the simple Brownian staircase is misguided and that simple staircases are often as good as likelihood methods. In the section Summary of Advantages of Brownian Staircase, I point out several advantages of the simple staircase over the likelihood methods.

A qualification should be made to the claim above that averaging levels of a simple staircase is as good as a fancy likelihood method. If the staircase method uses too small a step size at the beginning, and if the starting point is far from threshold, then the simple staircase can be inefficient in reaching the threshold region. However, at the end of that run the astute experimenter should notice the large discrepancy between the starting point and the threshold, and the run should be redone with an improved starting point or with a larger initial step size that will be reduced as the staircase proceeds.

Increase number of 2AFC response categories. Independent of the type of adaptive method used, the 2AFC task has a flaw whereby a series of lucky guesses at low stimulus levels can erroneously drive adaptive methods to levels that are too low. Some adaptive methods with a small number of trials are unable to fully recover from a string of lucky guesses early in the run. One solution is to allow the subject to have more than two responses. For reasons that I have never understood, people like to have only two response categories in a 2AFC task. Just as I am reluctant to take part in 2AFC experiments, I am even more reluctant to take part in two-response experiments. The reason is simple: I always feel that I have within me more than one bit of information after looking at a stimulus. I usually have at least four or more subjective states (2 bits) for a yes/no task: HN, LN, LY, HY, where H and L are high and low confidence and N and Y are did not and did see it. For a 2AFC task, my internal states are H1, L1, L2, H2, for high and low confidence on Intervals 1 and 2. Kaernbach (2001a), in this special issue, does an excellent job of justifying the low-confidence response category. He provides solid arguments, simulations, and experimental data on how a low-confidence category can minimize the "lucky guess" problem.

Kaernbach (2001a) groups the L1 and L2 categories together into a "don't know" category and argues persuasively that the inclusion of this third response can increase the precision and accuracy of threshold estimates while also leaving the subject a happier person. He calls it the "unforced choice" method. Most of his article concerns the understandable fear of introducing a response bias, exactly what 2AFC was invented to avoid. He has developed an intelligent rule for what stimulus level should follow a "don't know" response. Kaernbach shows, with both simulations and real data, that with his method this potential response bias problem has only a second-order effect on threshold estimates (similar to the 2AFC interval bias discussed around Equation 7), and its advantages outweigh the disadvantages. I urge anyone using the 2AFC method to read it. I conjecture that Kaernbach's method is especially useful in trimming the lower tail of the distribution of threshold estimates and also in reducing the interval bias of 2AFC tasks, since the low-confidence rating would tend to be used for trials on which there is the most uncertainty.

Although Kaernbach has managed to convince me of the usefulness of his three-response method, he might have trouble convincing others because of the response bias question. I should like to offer a solution that might quell the response bias fears. Rather than using 2AFC with three responses, one can increase the number of responses to four (H1, L1, L2, H2), as discussed two paragraphs above. The L rating can be used when subjects feel they are guessing.

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A 5-up–1-down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct


of 5/6 = 83.33%. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low-confidence rating is that when one is below threshold, giving a low-confidence rating will prevent one from long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.
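The arithmetic behind these step-shift rules is easy to verify with a few lines of Matlab (a sketch of mine; the values come straight from the rules above):

% Sketch: equilibrium and guessing drift for the unforced-choice staircase rules
pConverge     = 5/(5 + 1);             % 5-up/1-down equilibrium: P = 5/6 = .8333
driftGuess2   = .5*(-1) + .5*(+5);     % binary responses while guessing: +2 levels
driftDontKnow = +2;                    % Kaernbach's shift after a "don't know"
driftGuess4   = .5*(+1) + .5*(+3);     % four responses (LC and LI equally likely): +2
driftGuess4b  = .5*( 0) + .5*(+4);     % conservative set (-1, 0, +4, +5): also +2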

Another reason not to be fearful of the multiple-response 2AFC method is that after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations discussed earlier in this section and Green's (1990) reveal that averaging over the levels (excluding a number of initial levels to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging-levels method disagree, that run could be thrown out.

Summary of advantages of Brownian staircase. Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since it gives the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (Psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith et al., 1994; Watson & Pelli, 1983).


The placement of each new trial is based on one-step-ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape.

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope, β, or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low spatial frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates for mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data-gathering group argued that we should use a data-gathering methodology that would allow estimation of the PF slope of each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin, and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules, Goodness-of-Fit

There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation. Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits

The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals in order to avoid the starting level bias.

Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that, in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′ function, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

\mathrm{pdf}(i) = \frac{P(i) - P(i-1)}{s(i) - s(i-1)}, \qquad (33)

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

\mu_r = \frac{1}{r+1}\sum_i \mathrm{pdf}(i)\left[s(i)^{r+1} - s(i-1)^{r+1}\right]. \qquad (34)

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

f[s(i)] - f[s(i-1)] = f'(s_{\mathrm{mid}})\,[s(i) - s(i-1)], \qquad (35)

where f(s) = s^(r+1) and f′(s_mid) = (r + 1)s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

\mu_r = \sum_i \mathrm{pdf}(i)\, s_{\mathrm{mid}}^{\,r}\,\Delta s, \qquad (36)

where Δs = s(i) − s(i − 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

\mu_0 = \sum_i \mathrm{pdf}(i)\,\Delta s = P(i_{\max}) - P(i_{\min}), \qquad (37)

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P_min, must be zero, as Miller and Ulrich (2001) point out.



The next simplest SK moment is for r = 1:

\mu_1 = \sum_i \mathrm{pdf}(i)\,\frac{s(i)^2 - s(i-1)^2}{2} = \sum_i \left[P(i) - P(i-1)\right]\frac{s(i) + s(i-1)}{2}, \qquad (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: If this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method of constant stimuli data in which the PF goes from 0 to 1. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, −.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to these unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than 33%, and P2 can be any number greater than 67%. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide set of possible alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the

center of the PF is given by Equation 38 to be μ₁ = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ₁ = 2.97. However, given the data, with 49 of 98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ₁ = 4. Thus, we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
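The numbers in this example are easy to reproduce with the SK formulas (Equations 33 and 38); the following is a small sketch of my own, not Miller and Ulrich's code:

% Sketch: Spearman-Karber centroid (Equation 38) for the extreme adaptive-run example
s   = [1 2 3 4 5];                       % stimulus levels
P   = [0 1 1 .5 1];                      % raw probabilities correct
pdf = diff(P)./diff(s);                  % Equation 33: [1 0 -.5 .5]
mu1 = sum(pdf .* (s(2:end) + s(1:end-1))/2 .* diff(s))           % Equation 38: 2.00

Pmono   = [0 .51 .51 .51 1];             % monotonized by pooling the [1 1 98] trials
pdfMono = diff(Pmono)./diff(s);          % [.51 0 0 .49]
mu1Mono = sum(pdfMono .* (s(2:end) + s(1:end-1))/2 .* diff(s))   % 2.97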

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim; since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being .01 and .99. The probabilities at the sample points are P(i) = .01, .1222,



.50, .8778, and .99, corresponding to z scores of z(i) = −2.328, −1.164, 0, +1.164, and +2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ₁, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus, their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly even on data generated from probit PFs. McKee et al. show that when the number of trials is low and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

\mathrm{var(PSE}_i) = \frac{\mathrm{var}(P_i)}{(dP_i/dx_i)^2}, \qquad (39)

where var(P) is given by Equation 27 and the PF slope is

\frac{dP_i}{dx_i} = \frac{dP_i}{dz_i}\times\frac{dz_i}{dx_i}, \qquad (40)

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

\left(\frac{dP_i}{dz_i}\right)^2 = \frac{\exp(-z_i^2)}{2\pi}, \qquad (41)

from Equation 3A. By combining these equations, one gets

\mathrm{var(PSE}_i) = \frac{2\pi\sigma^2\exp(z_i^2)\,P_i(1-P_i)}{N_i}. \qquad (42)

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become πσ²/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

\frac{1}{\mathrm{var}_{\mathrm{tot}}} = \sum_i \frac{1}{\mathrm{var}_i}. \qquad (43)

Finally, upon plugging in the five values of z_i, ranging from −2.328 to +2.328, and P_i, ranging from .01 to .99, the standard error of the PF location is

\mathrm{SE} = \sqrt{\mathrm{var}_{\mathrm{tot}}} = 0.284. \qquad (44)

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
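Equations 42–44 can be checked directly with a few lines of Matlab (a sketch of mine for the 5-level, 8-trials-per-level example; σ is taken as 1):

% Sketch: analytic standard error of the PSE for the 5-level example (Equations 42-44)
Phi   = @(z) .5*(1 + erf(z/sqrt(2)));
z     = [-2.328 -1.164 0 1.164 2.328];       % z scores of the five levels
P     = Phi(z);                              % approximately .01 .1222 .50 .8778 .99
Ni    = 8;  sigma = 1;
vari  = 2*pi*sigma^2*exp(z.^2).*P.*(1-P)/Ni; % Equation 42, one variance per level
varTot = 1/sum(1./vari);                     % Equation 43
SE     = sqrt(varTot)                        % Equation 44: about 0.284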

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

\mathrm{var}_{\mathrm{tot}} = \mathrm{SE}^2 = \frac{3.24}{N_{\mathrm{tot}}}. \qquad (45)

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that, of the five stimulus levels, the two at P = .01 and .99 are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = 50%). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot



claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

X^2 = \sum_i \frac{\left[p(\mathrm{data}_i) - P_i\right]^2}{\mathrm{var}(P_i)}, \qquad (46)

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend

\mathrm{var}(P_i) = \frac{P_i(1 - P_i)}{N_i}, \qquad (47)

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

X^2 = \sum_i \mathrm{residual}_i^2, \qquad (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

\mathrm{residual}_i = \frac{p(\mathrm{data}_i) - P_i}{\mathrm{SE}_i}. \qquad (49)

The second method is to use the normalized log likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2\sum_i N_i\left\{ p(\mathrm{data}_i)\ln\!\left[\frac{p(\mathrm{data}_i)}{P_i}\right] + \left[1 - p(\mathrm{data}_i)\right]\ln\!\left[\frac{1 - p(\mathrm{data}_i)}{1 - P_i}\right]\right\}. \qquad (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.
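For concreteness, the two statistics of Equations 46–50 can be computed as in the sketch below (my own illustration; the data values are made up, and P_i would normally come from the fitted PF):

% Sketch: X^2 (Equations 46-47) and deviance LL (Equation 50) for illustrative data
Ni    = [50 50 50];                         % trials per level
pData = [26 40 48]./Ni;                     % observed proportions correct
Pi    = [.50 .75 .95];                      % fitted PF values at the same levels
X2 = sum((pData - Pi).^2 ./ (Pi.*(1-Pi)./Ni));
safeLog = @(r) log(max(r, realmin));        % guards the 0*log(0) = 0 convention
LL = 2*sum(Ni .* (pData.*safeLog(pData./Pi) + ...
                  (1-pData).*safeLog((1-pData)./(1-Pi))));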

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

\mathrm{prob\_chisq} = \mathrm{Gamma}\!\left(\frac{df}{2}, \frac{\chi^2}{2}\right), \qquad (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z score.
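In Matlab the same check reads as follows (a minimal sketch; note that gammainc takes the χ²/2 argument first and df/2 second):

% Sketch: checking Equation 51 with Matlab's regularized incomplete gamma function
df = 18;
pMean  = gammainc(df/2, df/2)                  % Gamma(9, 9), about .54 (asymptotically .5)
pOneSD = gammainc((df + sqrt(2*df))/2, df/2)   % Gamma(9, 12), about .845 (Gaussian gives .841)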

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square distribution (Equation 51).

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100% and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

X_i^2 = \frac{\left[p(\mathrm{data}_i) - P_i\right]^2 N_i}{P_i(1 - P_i)} = \frac{(O_i - E_i)^2}{E_i(1 - P_i)}, \qquad (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X²_i is obtained by doing a weighted sum over all possible



outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

\mathrm{wt}_i = \frac{N_i!}{O_i!\,(N_i - O_i)!}\, P_i^{O_i}\,(1 - P_i)^{N_i - O_i}. \qquad (53)

The expected value ⟨X²_i⟩ is

\langle X_i^2\rangle = \sum_{O_i} \mathrm{wt}_i\,\frac{(O_i - E_i)^2}{E_i(1 - P_i)}. \qquad (54)

A bit of algebra, based on Σ_{O_i} wt_i = 1, gives a surprising result:

\langle X_i^2\rangle = 1. \qquad (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviate contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

\langle LL_i\rangle = 2\sum_{O_i} \mathrm{wt}_i\left[ O_i\ln\!\left(\frac{O_i}{E_i}\right) + (N_i - O_i)\ln\!\left(\frac{N_i - O_i}{N_i - E_i}\right)\right]. \qquad (56)

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

\langle LL_i\rangle = 2\left[0.25\times 2\ln(2) + 0.25\times 2\ln(2)\right] = 2\ln(2) = 1.386. \qquad (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.
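The expected contributions can be checked for any single level by enumerating the binomial outcomes, as in this sketch (my condensed version of the idea behind Matlab Code 2; it assumes P_i < 1 so that all the logarithms are defined):

% Sketch: expected X^2 and deviance contributions for one level (Equations 52-56)
Ni = 2;  Pi = .5;  Ei = Ni*Pi;
Oi = 0:Ni;                                                  % all possible numbers correct
wt = arrayfun(@(O) nchoosek(Ni, O), Oi) .* Pi.^Oi .* (1-Pi).^(Ni-Oi);   % Equation 53
meanX2 = sum(wt .* (Oi - Ei).^2 ./ (Ei*(1 - Pi)));          % Equations 54-55: exactly 1
LLterms = zeros(size(Oi));
for k = 1:numel(Oi)
    O = Oi(k);  M = Ni - O;  t = 0;
    if O > 0, t = t + O*log(O/Ei);        end               % 0*log(0) treated as 0
    if M > 0, t = t + M*log(M/(Ni - Ei)); end
    LLterms(k) = 2*t;
end
meanLL = sum(wt .* LLterms);                                % Equation 56: 2*ln(2) = 1.386 here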

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the (1 − P_i) factor in Equation 53, except for O_i = N_i, and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85 the contribution to the deviance is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a Bayesian framework for adaptive methods in order to measure the PF.



Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.

Wichmann and Hill, and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a nonzero lapse parameter of λ = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                     λ = .01           λ = .0001         λ = .01           λ = .0001
Psychometric         No Lapse          No Lapse          Lapse             Lapse
Function             L       χ²        L       χ²        L       χ²        L       χ²
Probit             1.05    1.05      1.00    0.99      1.05    1.05      0.36    0.34
Logit              1.76    1.76      1.66    1.66      1.76    1.76      0.84    0.72
Weibull            1.01    1.01      0.97    0.97      1.01    1.01      0.26    0.26

Note: The columns are for different lapse rates (λ) and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) when there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter; (2) when the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50); (3) when λ = .0001, the PF slope was reduced dramatically when a single lapse was present; (4) for all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
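The kind of fit reported in Table 2 can be sketched in a few lines of Matlab (a minimal stand-in for the idea behind Matlab Code 3, not the actual program; it uses the probit shape with a symmetric lapse, and the starting guess is illustrative):

% Sketch: maximum likelihood probit fit with a symmetric lapse parameter (gamma = lambda)
y  = [0 1 6];  N = [50 50 50];  k = [25 42 49];   % levels, trials, number correct (one lapse)
lambda = .01;                                     % fixed lapse rate for this fit
Phi = @(z) .5*(1 + erf(z/sqrt(2)));
PF  = @(p, y) lambda + (1 - 2*lambda)*Phi(p(2)*(y - p(1)));
negLL = @(p) -sum(k.*log(PF(p, y)) + (N - k).*log(1 - PF(p, y)));
pHat = fminsearch(negLL, [0 .3]);                 % [threshold slope] starting guess
threshHat = pHat(1);  slopeHat = pHat(2);         % compare the slope with the probit row of Table 2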

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.
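As a concrete illustration of the first of these approaches, here is a hedged sketch of fitting with a prior on λ; the particular prior, the 6% upper bound on λ, and the logistic reparameterization are my own assumptions for the example, not a prescription taken from Treutwein and Strasburger (1999) or Wichmann and Hill (2001a).

stim = [0 1 6]; N = [50 50 50]; Obs = [25 42 49];          % the same toy data as above
lam = @(q) 0.06 ./ (1 + exp(-q));                          % keep lambda in (0, .06) so the search can run unconstrained
pfl = @(p,x) lam(p(3)) + (1-2*lam(p(3))).*(1 + erf(p(2).*(x-p(1))./sqrt(2)))./2;
nll = @(p) -sum(Obs.*log(pfl(p,stim)) + (N-Obs).*log(1-pfl(p,stim)));
pen = @(p) nll(p) - 19*log(1 - lam(p(3)));                 % Beta(1,20)-like log prior that favors small lambda
est = fminsearch(pen, [1 1 0]);                            % start: threshold 1, slope 1, lambda about .03
fprintf('threshold = %.2f, slope = %.2f, lambda = %.3f\n', est(1), est(2), lam(est(3)));

The prior term keeps λ from wandering to implausibly large values when, as here, there are too few data to pin it down on their own.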

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope β of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods; I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method, using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are fewer than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 0.10 (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
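The variance arithmetic is easy to check; the following lines simply restate the numbers given in the preceding paragraph:

beta = 5.5; N = 25;                      % Strasburger's estimated slope and trials per run
varThresh = 1.75/(N*beta^2);             % asymptotic 10AFC threshold variance, = 0.058/N for this slope
fprintf('var = %.4f, SE = %.3f (about the 5%% standard error quoted above)\n', varThresh, sqrt(varThresh));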

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing, there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that because of binomial variability the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2-4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach, this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).
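Kaernbach's own monotonizing routine is the one reproduced in the article's Matlab Code 4. As a generic illustration of what monotonizing means, the following sketch (my own; the function name and the trial-count weighting are my assumptions) pools adjacent violating levels until the empirical PF is nondecreasing.

function pm = monotonizeSketch(cor, tot)
% Make an empirical PF nondecreasing by pooling adjacent violating levels,
% weighting each level by its number of trials (a generic pool-adjacent-violators pass).
    p = cor(:)' ./ max(tot(:)', eps);             % raw proportions correct
    w = max(tot(:)', eps);                        % weights (trial counts)
    blockVal = p; blockW = w; blockLen = ones(size(p));
    k = 1;                                        % number of pooled blocks built so far
    for i = 2:numel(p)
        k = k + 1;
        blockVal(k) = p(i); blockW(k) = w(i); blockLen(k) = 1;
        while k > 1 && blockVal(k) < blockVal(k-1)
            % merge the two offending blocks into one weighted average
            blockVal(k-1) = (blockW(k-1)*blockVal(k-1) + blockW(k)*blockVal(k)) / (blockW(k-1) + blockW(k));
            blockW(k-1) = blockW(k-1) + blockW(k);
            blockLen(k-1) = blockLen(k-1) + blockLen(k);
            k = k - 1;
        end
    end
    pm = repelem(blockVal(1:k), blockLen(1:k));   % expand the pooled blocks back to one value per level
end

For example, monotonizeSketch([3 2 5], [5 5 5]) returns [0.50 0.50 1.00]: the violating second level has been pooled with the first.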

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
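The distinction between the two kinds of averaging matters whenever different runs contribute different numbers of trials at a level, as a two-line example shows (the numbers are mine, chosen only for illustration):

cor = [1 10]; tot = [2 10];              % two runs at the same level: 1/2 correct and 10/10 correct
pooled = sum(cor)/sum(tot)               % pool the trials first, then compute the PF: 11/12 = 0.92
averaged = mean(cor./tot)                % compute each run's PF first, then average: (0.5 + 1.0)/2 = 0.75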

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
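The enumeration itself is compact. The following sketch is my own condensation with my own variable names; it keeps only the shifting step and the pooled (raw-response) type of averaging, omitting the monotonizing and extrapolating options of Matlab Code 4.

nbits = 10; nlev = 2*nbits + 1;                            % 10-trial staircases on 21 possible levels
corAll = zeros(2, nlev); totAll = zeros(2, nlev);          % row 1: no shift; row 2: shift
for i = 0:2^nbits - 1                                      % enumerate every possible response sequence
    a = bitget(i, 1:nbits);                                % 1 = correct, 0 = incorrect
    step = 1 - 2*a(1:end-1);                               % one level down after a correct, up after an incorrect
    lev = nbits + [0 cumsum(step)];                        % levels visited, starting at the middle level
    for ishift = 0:1
        L = lev - ishift*(round(mean(lev)) - nbits);       % if shifting, align this run to its mean level
        for t = 1:nbits
            corAll(ishift+1, L(t)) = corAll(ishift+1, L(t)) + a(t);
            totAll(ishift+1, L(t)) = totAll(ishift+1, L(t)) + 1;
        end
    end
end
probPooled = corAll ./ max(totAll, 1);
probPooled(totAll == 0) = NaN;                             % leave untested levels out of the plot
plot(1:nlev, probPooled');                                 % row 1 is flat at .5; row 2 shows a spurious slope

Because every response sequence is equally likely under a flat PF at 50%, the unshifted pooled PF is exactly flat at the tested levels, whereas aligning each run to its own mean level before pooling manufactures a slope where none exists.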

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.


Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair, I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing, there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs

1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.

1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)]/[1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) - z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
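A worked example of the two compensations (the numbers are mine, chosen only for illustration), for a yes/no condition with a 20% false alarm rate and an 84% hit rate at level x:

P0 = 0.20; Px = 0.84;                          % false alarm rate and proportion "yes" at stimulus level x
p_corrected = (Px - P0) / (1 - P0)             % Item 1.1: correction in probability space, = 0.80
zf = @(P) sqrt(2)*erfinv(2*P - 1);             % inverse cumulative normal (z-score)
dprime = zf(Px) - zf(P0)                       % Item 1.2: compensation in z-score space, about 1.84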

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.
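A hedged sketch of how such a staircase runs: the one-down/three-up rule below is the rule just quoted, but the observer model (a d′ that grows linearly with level, unit noise, and a fixed criterion of 1) and the step unit are placeholder assumptions of mine for illustration only.

rng(1); nTrials = 200; level = 10; track = zeros(1, nTrials);
for t = 1:nTrials
    isSignal = rand < 0.5;                            % randomly intermix blanks and signal trials
    dprime = level/2;                                 % placeholder transducer: d' grows with signal level
    saysYes = (isSignal*dprime + randn) > 1;          % "yes" when the internal response exceeds the criterion
    correct = (saysYes == isSignal);                  % hits and correct rejections both count as correct
    level = max(level - correct + 3*(~correct), 0);   % one step down when correct, three steps up when wrong
    track(t) = level;
end
plot(track); xlabel('trial'); ylabel('signal level (steps)');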

5. Threshold should be defined on the basis of a fixed d′ level rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log-log slope of the d′ function, β. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.0004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a), because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs

1. The 2AFC and objective yes/no threshold variances were compared, using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001a) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase, with threshold estimated by averaging levels, was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about four reversals, are thrown out) than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs

1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptote. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
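For readers unfamiliar with the Spearman–Kärber idea, here is a minimal sketch of the threshold part of the computation; Miller and Ulrich's full method also yields slope and higher moments, and normalizing by P(end) - P(1) is my own assumption for PFs that do not quite span 0% to 100%. The example PF is the five-point function used earlier in this commentary.

x = [1 2 3 4 5];                              % stimulus levels
P = [.01 .1222 .50 .8778 .99];                % proportions correct for a 0%-to-100% discrimination PF
mass = diff(P) / (P(end) - P(1));             % probability mass assigned to each interval between levels
thresholdSK = sum((x(1:end-1) + x(2:end))/2 .* mass)   % SK threshold: the mean of that distribution, here 3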

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and they prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications: The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1
Weibull Function: Probability, d′, and Log-Log d′ Slope
Code for Generating Figure 1

Matlab Code 2
Goodness-of-Fit: Likelihood vs. Chi Square
Relevant to Wichmann and Hill (2001a) and Figure 4

Matlab Code 3
Downward Slope Bias Due to Lapses
Relevant to Wichmann and Hill (2001a) and Table 2

Matlab Code 4
Brownian Staircase Enumerations for Slope Bias
(Scripts for monotonizing, extrapolating, and averaging, plus the main program.)

Note: In line 6 of the monotonizing script, the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


one down–three up rule). I was pleased to learn recently that his rule works nicely for an objective yes/no task (Kaernbach, 1990), as well as for 2AFC, as discussed earlier.

In recent years, likelihood methods (Pentland, 1980; Watson & Pelli, 1983) have become popular and have somewhat displaced Brownian methods. There is a widespread feeling that likelihood methods have substantially smaller threshold variance than do staircases in which thresholds are obtained by simply averaging the levels tested. Given the importance of this issue for informing researchers about how best to measure thresholds, it is surprising that there have been so few simulations comparing staircases and optimal methods. Pentland's (1980) results, showing very large threshold variance for a staircase method, are so dramatically different from the results of my simulations that I am skeptical. The best previous simulations of which I am aware are those of Green (1990), who compared the threshold variance from several 2AFC staircase methods with the ideal variance (Equation 24). He found that as long as the staircase rule placed trials above 80% correct, the ratio of observed to ideal threshold variance was about 1.4. This value is the square of the inverse of Green's finding that the ratio of the expected to the observed sigma is about .85.

In my own simulations (Klein, 2002), I found that as long as the final step size was less than 0.2σ, the threshold variance from simply averaging over levels was close to the optimal value. Thus, the staircase threshold variance is about the same as the variance found by other optimal methods. One especially interesting finding was that the common method of averaging an even number of reversal points gave a variance that was about 10% higher than averaging over all levels. This result made me wonder how the common practice of averaging reversals ever got started. Since Green (1990) averaged over just the reversal points, I suspect that his results are an overestimate of the staircase variance. In addition, since Green specifies step size in decibels, it is difficult for me to determine whether his final step size was small enough. Since the threshold variance is inversely related to slope, it is important to relate step size to slope rather than to decibels. The most natural reference unit is σ, the standard deviation associated with the cumulative normal. In summary, I suggest that this general feeling of distrust of the simple Brownian staircase is misguided and that simple staircases are often as good as likelihood methods. In the section Summary of Advantages of Brownian Staircase, I point out several advantages of the simple staircase over the likelihood methods.
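The comparison is easy to try for oneself. The following is a toy simulation of my own (it is not the Klein, 2002, simulation, and the observer model, run length, and warm-up choices are my assumptions); it simply prints the standard deviation of the threshold estimates obtained by averaging all post-warm-up levels and by averaging an even number of reversal levels, so the two can be compared directly.

rng(2); nRuns = 2000; nTrials = 60; stepSize = 0.2;      % final step of 0.2*sigma, as suggested above
Pyes = @(x) 0.5*(1 + erf(x/sqrt(2)));                    % cumulative-normal observer, threshold 0, sigma 1
estAll = zeros(1, nRuns); estRev = zeros(1, nRuns);
for r = 1:nRuns
    level = 1; lev = zeros(1, nTrials); resp = false(1, nTrials);
    for t = 1:nTrials
        lev(t) = level;
        resp(t) = rand < Pyes(level);                    % "yes"/"no" response on this trial
        level = level - stepSize*(2*resp(t) - 1);        % simple up-down: down after "yes", up after "no"
    end
    estAll(r) = mean(lev(11:end));                       % (i) average all levels after 10 warm-up trials
    rev = find(diff(resp) ~= 0) + 1;                     % trials at which the track reverses direction
    rev = rev(rev > 10); rev = rev(1:2*floor(numel(rev)/2));   % an even number of reversals, after warm-up
    estRev(r) = mean(lev(rev));                          % (ii) average the reversal levels
end
fprintf('SD of threshold estimates: all levels %.3f, reversals %.3f\n', std(estAll), std(estRev));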

A qualification should be made to the claim above that averaging levels of a simple staircase is as good as a fancy likelihood method. If the staircase method uses a too-small step size at the beginning and if the starting point is far from threshold, then the simple staircase can be inefficient in reaching the threshold region. However, at the end of that run, the astute experimenter should notice the large discrepancy between the starting point and the threshold, and the run should be redone with an improved starting point or with a larger initial step size that will be reduced as the staircase proceeds.

Increase number of 2AFC response categories. Independent of the type of adaptive method used, the 2AFC task has a flaw whereby a series of lucky guesses at low stimulus levels can erroneously drive adaptive methods to levels that are too low. Some adaptive methods with a small number of trials are unable to fully recover from a string of lucky guesses early in the run. One solution is to allow the subject to have more than two responses. For reasons that I have never understood, people like to have only two response categories in a 2AFC task. Just as I am reluctant to take part in 2AFC experiments, I am even more reluctant to take part in two-response experiments. The reason is simple: I always feel that I have within me more than one bit of information after looking at a stimulus. I usually have at least four or more subjective states (2 bits) for a yes/no task: HN, LN, LY, HY, where H and L are high and low confidence, and N and Y are did not and did see it. For a 2AFC task, my internal states are H1, L1, L2, H2, for high and low confidence on Intervals 1 and 2. Kaernbach (2001a), in this special issue, does an excellent job of justifying the low-confidence response category. He provides solid arguments, simulations, and experimental data on how a low-confidence category can minimize the "lucky guess" problem.

Kaernbach (2001a) groups the L1 and L2 categories together into a "don't know" category and argues persuasively that the inclusion of this third response can increase the precision and accuracy of threshold estimates while also leaving the subject a happier person. He calls it the "unforced choice" method. Most of his article concerns the understandable fear of introducing a response bias, exactly what 2AFC was invented to avoid. He has developed an intelligent rule for what stimulus level should follow a "don't know" response. Kaernbach shows, with both simulations and real data, that with his method this potential response bias problem has only a second-order effect on threshold estimates (similar to the 2AFC interval bias discussed around Equation 7), and its advantages outweigh the disadvantages. I urge anyone using the 2AFC method to read it. I conjecture that Kaernbach's method is especially useful in trimming the lower tail of the distribution of threshold estimates and also in reducing the interval bias of 2AFC tasks, since the low confidence rating would tend to be used for trials on which there is the most uncertainty.

Although Kaernbach has managed to convince me of the usefulness of his three-response method, he might have trouble convincing others because of the response bias question. I should like to offer a solution that might quell the response bias fears. Rather than using 2AFC with three responses, one can increase the number of responses to four (H1, L1, L2, H2), as discussed two paragraphs above. The L rating can be used when subjects feel they are guessing.

To illustrate how this staircase would work, consider the case in which we would like the 2AFC staircase to converge to about 84% correct. A 5-up–1-down staircase (5 levels up for a wrong response and 1 level down for a correct response) leads to an average probability correct of 5/6 = 83.33%. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 - 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be -1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence, and C and I stand for correct and incorrect. A slightly more conservative set of responses would be -1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low-confidence rating is that when one is below threshold, giving a low-confidence rating will prevent one from long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.
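The bookkeeping in the preceding paragraph reduces to a few lines of arithmetic (these lines only restate the numbers given above):

pConverge = 5/(5 + 1)                                      % 1-down/5-up equilibrium: 5/6 = 83.3% correct
shifts4 = [-1 1 3 5];  guessShift4 = mean(shifts4(2:3))    % low-confidence shifts when guessing: (+1 +3)/2 = +2
shifts4b = [-1 0 4 5]; guessShift4b = mean(shifts4b(2:3))  % conservative set: (0 +4)/2 = +2
kaernbachShift = (5 - 1)/2                                 % "don't know" shift in the three-response scheme: +2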

Another reason not to be fearful of the multiple-response 2AFC method is that after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations, discussed earlier in this section, and Green's (1990) reveal that averaging over the levels (excluding a number of initial levels to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging-levels method disagree, that run could be thrown out.

Summary of advantages of Brownian staircases. Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since it gives the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (Psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on one-step-ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape.
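To make the preceding description concrete, here is a minimal Ψ-like sketch (my own stripped-down illustration, not Kontsevich and Tyler's program): a two-dimensional grid over threshold and slope, a Bayesian posterior update after each trial, and one-step-ahead placement that minimizes the expected posterior entropy. The Weibull parameterization, the grid ranges, and the simulated observer are all assumptions chosen only for illustration.

```matlab
% Minimal Psi-like procedure for a 2AFC Weibull observer (illustration only).
aGrid = linspace(0.5, 2.0, 41);        % candidate log10 thresholds (assumed range)
bGrid = linspace(0.5, 6.0, 21);        % candidate slopes (assumed range)
xGrid = linspace(0.3, 2.2, 40);        % stimulus levels, log10 contrast (assumed)
[A, B] = ndgrid(aGrid, bGrid);
post = ones(size(A)); post = post / sum(post(:));        % flat prior
pc = @(a, b, x) 1 - 0.5*exp(-10.^(b.*(x - a)));          % P(correct | a, b, x)
aTrue = 1.2; bTrue = 3.5;              % simulated observer (assumed)
for t = 1:30
    H = zeros(size(xGrid));
    for k = 1:numel(xGrid)
        P = pc(A, B, xGrid(k));
        pCor  = sum(sum(post .* P));                      % predicted P(correct)
        postC = post .* P / pCor;                         % posterior if correct
        postI = post .* (1 - P) / (1 - pCor);             % posterior if incorrect
        HC = -sum(postC(:) .* log(postC(:) + eps));
        HI = -sum(postI(:) .* log(postI(:) + eps));
        H(k) = pCor*HC + (1 - pCor)*HI;                   % expected entropy
    end
    [~, kBest] = min(H);                                  % one-step-ahead placement
    x = xGrid(kBest);
    resp = rand < pc(aTrue, bTrue, x);                    % simulated response
    if resp, post = post .* pc(A, B, x);
    else,    post = post .* (1 - pc(A, B, x));
    end
    post = post / sum(post(:));
end
threshEst = sum(sum(post .* A));       % posterior mean threshold
slopeEst  = sum(sum(post .* B));       % posterior mean slope
```

In runs of a few dozen trials, the placement tends to alternate between a higher and a lower percent-correct level once threshold is roughly established, consistent with the rough description above.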

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of the slope, β, or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low spatial frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates for mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data-gathering group argued that we should use a data-gathering methodology that would allow estimation of the PF slope for each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin, and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules, Goodness-of-Fit

There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions, in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.

III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation. Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section, I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods, which is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits

The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals in order to avoid the starting level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′ function, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

$$\mathrm{pdf}(i) = \frac{P(i) - P(i-1)}{s(i) - s(i-1)}, \qquad (33)$$

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

$$m_r = \sum_i \mathrm{pdf}(i)\,\frac{s(i)^{r+1} - s(i-1)^{r+1}}{r+1}. \qquad (34)$$

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus,

$$f[s(i)] - f[s(i-1)] = f'(s_{\mathrm{mid}})\,[s(i) - s(i-1)], \qquad (35)$$

where f(s) = s^(r+1) and f′(s_mid) = (r + 1) s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

$$m_r = \sum_i \mathrm{pdf}(i)\, s_{\mathrm{mid}}^{\,r}\,\Delta s, \qquad (36)$$

where Δs = s(i) − s(i − 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

$$m_0 = \sum_i \mathrm{pdf}(i)\,\Delta s = P(i_{\max}) - P(i_{\min}), \qquad (37)$$

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P(i_min), must be zero, as Miller and Ulrich (2001) point out.

The next simplest SK moment is for r = 1:

$$m_1 = \sum_i \mathrm{pdf}(i)\,\frac{s(i)^2 - s(i-1)^2}{2} = \sum_i \bigl[P(i) - P(i-1)\bigr]\,\frac{s(i) + s(i-1)}{2}, \qquad (38)$$

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).
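For data from the method of constant stimuli, Equations 33–38 take only a few lines of code. The following is my own illustration (not the Appendix code); the five-level cumulative normal example matches the one used later in this section, and the unit level spacing is an assumption.

```matlab
% Spearman-Karber sketch: pdf (Equation 33), area (Equation 37),
% and first moment (Equation 38) from constant-stimuli data.
s = [1 2 3 4 5];                        % stimulus levels
P = [0.01 0.1222 0.50 0.8778 0.99];     % probability correct at each level
pdfSK = diff(P) ./ diff(s);             % Equation 33
m0 = sum(pdfSK .* diff(s));             % Equation 37: P(imax) - P(imin) = 0.98
m1 = sum(diff(P) .* (s(2:end) + s(1:end-1)) / 2);   % Equation 38
% Because P spans .01 to .99 rather than 0 to 1, the area m0 is 0.98, not 1;
% dividing m1 by m0 recovers the centroid (PSE) of 3 for this symmetric example.
PSE = m1 / m0;
```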

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: if this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method of constant stimulus data in which the PF goes from 0 to 1. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels, and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, −.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to these unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than 33%, and P2 can be any number greater than 67%. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide possible set of alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the center of the PF is given by Equation 38 to be μ1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ1 = 2.97. However, given the data with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ1 = 4. Thus, we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial-statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.
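To put numbers on this, using the binomial variance that appears below as Equation 47 and an assumed 100 trials at each level:

$$SE(.55) = \sqrt{\frac{.55 \times .45}{100}} \approx .050, \qquad SE(.95) = \sqrt{\frac{.95 \times .05}{100}} \approx .022,$$

so the error bar at 55% correct is more than twice the error bar at 95% correct.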

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being .01 and .99. The probabilities at the sample points are P(i) = .01, .1222, .50, .8778, and .99, corresponding to z scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus, their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So at least in this instance the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

$$\mathrm{var}(\mathrm{PSE}_i) = \frac{\mathrm{var}(P_i)}{\left(dP_i/dx_i\right)^2}, \qquad (39)$$

where var(P) is given by Equation 27 and the PF slope is

$$\frac{dP_i}{dx_i} = \frac{dP_i}{dz_i}\times\frac{dz_i}{dx_i}, \qquad (40)$$

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

$$\frac{dP_i}{dz_i} = \frac{\exp\!\left(-z_i^2/2\right)}{\sqrt{2\pi}}, \qquad (41)$$

from Equation 3A. By combining these equations, one gets

$$\mathrm{var}(\mathrm{PSE}_i) = \frac{2\pi\sigma^2 P_i\left(1 - P_i\right)\exp\!\left(z_i^2\right)}{N_i}. \qquad (42)$$

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula

$$\frac{1}{\mathrm{var}_{\mathrm{tot}}} = \sum_i \frac{1}{\mathrm{var}_i}. \qquad (43)$$

Finally, upon plugging in the five values of z_i, ranging from −2.328 to +2.328, and P_i, ranging from .01 to .99, the standard error of the PF location is

$$SE = \mathrm{var}_{\mathrm{tot}}^{1/2} = 0.284. \qquad (44)$$

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
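These numbers are easy to check. The snippet below is my own verification of Equations 42–44 for the five-level example, with σ = 1 and N_i = 8 trials per level:

```matlab
% Numeric check of Equations 42-44 for the five-level example (sigma = 1).
z  = [-2.328 -1.164 0 1.164 2.328];    % z scores of the five levels
P  = [0.01 0.1222 0.50 0.8778 0.99];   % probabilities at those levels
Ni = 8; sigma = 1;                     % 8 trials per level, N = 40 in all
varPSE = 2*pi*sigma^2 .* P .* (1 - P) .* exp(z.^2) ./ Ni;   % Equation 42
varTot = 1 / sum(1 ./ varPSE);                              % Equation 43
SE = sqrt(varTot)                      % Equation 44: approximately 0.284
```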

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

$$\mathrm{var}_{\mathrm{tot}} = SE^2 = \frac{3.24}{N_{\mathrm{tot}}}. \qquad (45)$$

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that of the five stimulus levels, the two at P = .01 and .99 are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = .5). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

$$X^2 = \sum_i \frac{\left[p(\mathrm{data}_i) - P_i\right]^2}{\mathrm{var}(P_i)}, \qquad (46)$$

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend

$$\mathrm{var}(P_i) = \frac{P_i\left(1 - P_i\right)}{N_i}, \qquad (47)$$

where N_i is the number of trials at level i. The X² statistic is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

$$X^2 = \sum_i \mathrm{residual}_i^2, \qquad (48)$$

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

$$\mathrm{residual}_i = \frac{p(\mathrm{data}_i) - P_i}{SE_i}. \qquad (49)$$

The second method is to use the normalized log likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

$$LL = 2\sum_i N_i\left\{ p(\mathrm{data}_i)\,\ln\!\left[\frac{p(\mathrm{data}_i)}{P_i}\right] + \left[1 - p(\mathrm{data}_i)\right]\ln\!\left[\frac{1 - p(\mathrm{data}_i)}{1 - P_i}\right]\right\}. \qquad (50)$$

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, and γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF is not linear in its parameters, one typically makes the linearity assumption anyway in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

$$\mathrm{prob\_chisq} = \mathrm{Gamma}\!\left(\frac{df}{2},\,\frac{\chi^2}{2}\right), \qquad (51)$$

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to a Gaussian with mean df and standard deviation (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = 0.54, which is very close to the asymptotic value of 0.5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = 0.845, which is indeed close to the value of 0.841 expected for a unity z-score.
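The quantities in Equations 46–51 take only a few lines in Matlab. The sketch below is my own illustration (the data values and the number of fitted parameters are assumptions, not taken from any of the articles):

```matlab
% Goodness-of-fit sketch: X^2 (Equations 46-47), deviance LL (Equation 50),
% and the chi-square probability of Equation 51 via Matlab's gammainc.
Ni   = [20 20 20 20];                  % trials per level (assumed)
pDat = [0.55 0.65 0.85 0.95];          % observed proportions correct (assumed)
Pfit = [0.55 0.70 0.82 0.93];          % fitted PF at the same levels (assumed)
X2 = sum((pDat - Pfit).^2 ./ (Pfit .* (1 - Pfit) ./ Ni));
t1 = pDat       .* log(pDat ./ Pfit);             t1(pDat == 0) = 0;  % 0*log(0) -> 0
t2 = (1 - pDat) .* log((1 - pDat) ./ (1 - Pfit)); t2(pDat == 1) = 0;
LL = 2 * sum(Ni .* (t1 + t2));
df = numel(Ni) - 2;                    % say, threshold and slope were fit
probChisq = gammainc(X2/2, df/2);      % Equation 51 (regularized incomplete gamma)
% Sanity checks from the text: gammainc(9, 9) is about 0.54 and
% gammainc(12, 9) is about 0.845.
```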

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [0.52, 0.85]; in Figure 8a, with a strong downward bias, the interval is [0.72, 0.99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100% and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

$$X_i^2 = \frac{\left[p(\mathrm{data}_i) - P_i\right]^2 N_i}{P_i\left(1 - P_i\right)} = \frac{\left(O_i - E_i\right)^2}{E_i\left(1 - P_i\right)}, \qquad (52)$$

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

$$\mathrm{wt}_i = \binom{N_i}{O_i} P_i^{\,O_i}\left(1 - P_i\right)^{N_i - O_i}. \qquad (53)$$

The expected value ⟨X_i²⟩ is

$$\left\langle X_i^2\right\rangle = \sum_{O_i} \mathrm{wt}_i\,\frac{\left(O_i - E_i\right)^2}{E_i\left(1 - P_i\right)}. \qquad (54)$$

A bit of algebra, based on Σ_{O_i} wt_i = 1, gives a surprising result:

$$\left\langle X_i^2\right\rangle = 1. \qquad (55)$$

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

$$\left\langle LL_i\right\rangle = 2\sum_{O_i} \mathrm{wt}_i\left[O_i \ln\!\left(\frac{O_i}{E_i}\right) + \left(N_i - O_i\right)\ln\!\left(\frac{N_i - O_i}{N_i - E_i}\right)\right]. \qquad (56)$$

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i (near .5) the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.
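The enumeration is simple enough to sketch here. This is my own condensed version in the spirit of Matlab Code 2, not the Appendix code itself:

```matlab
% Expected per-level contributions to X^2 and to the deviance LL
% (Equations 53-56), found by enumerating all binomial outcomes.
Plev  = 0.5:0.01:0.99;                 % PF levels to examine
Nlist = [1 2 4 8 16];
meanX2 = zeros(numel(Nlist), numel(Plev));
meanLL = zeros(numel(Nlist), numel(Plev));
for n = 1:numel(Nlist)
    N = Nlist(n);
    O = (0:N)';                        % all possible numbers correct
    binom = exp(gammaln(N+1) - gammaln(O+1) - gammaln(N-O+1));
    for k = 1:numel(Plev)
        P = Plev(k);  E = N*P;
        wt = binom .* P.^O .* (1-P).^(N-O);               % Equation 53
        meanX2(n,k) = sum(wt .* (O - E).^2 / (E*(1-P)));  % Equations 54-55
        t1 = O .* log(O/E);              t1(O == 0) = 0;  % 0*log(0) -> 0
        t2 = (N-O) .* log((N-O)/(N-E));  t2(O == N) = 0;
        meanLL(n,k) = 2 * sum(wt .* (t1 + t2));           % Equation 56
    end
end
% meanX2 is 1 everywhere (Equation 55); meanLL(2, Plev == 0.5) is 2*log(2) = 1.386.
```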

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

$$\left\langle LL_i\right\rangle = 2\left[0.25 \times 2\ln(2) + 0.25 \times 2\ln(2)\right] = 2\ln(2) = 1.386. \qquad (57)$$

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the 1 − P_i factor in Equation 53, except for O_i = N_i, and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85 the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance, based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

Psychometric    λ = .01        λ = .0001      λ = .01        λ = .0001
Function        No Lapse       No Lapse       Lapse          Lapse
                L      χ²      L      χ²      L      χ²      L      χ²
Probit          1.05   1.05    1.00   0.99    1.05   1.05    0.36   0.34
Logit           1.76   1.76    1.66   1.66    1.76   1.76    0.84   0.72
Weibull         1.01   1.01    0.97   0.97    1.01   1.01    0.26   0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

maximization, corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

$$P = 0.5 + 0.5\,\mathrm{erf}\!\left(\frac{z}{\sqrt{2}}\right)\quad(\text{probit; Equation 3B}), \qquad (58\mathrm{A})$$

$$P = \frac{1}{1 + \exp(-z)}\quad(\text{logit}), \qquad (58\mathrm{B})$$

$$P = 1 - \exp\!\left[-\exp(z)\right]\quad(\text{Weibull; Equation 13}), \qquad (58\mathrm{C})$$

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
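The probit row of Table 2 is easy to explore with a few lines of code. The sketch below is a simplified illustration of my own (not Matlab Code 3 from the Appendix): it fits the probit PF by maximum likelihood, with the lapse rate fixed at .01 or .0001 and γ = λ so the PF is symmetric, and shows the slope collapse produced by a single lapse.

```matlab
% Effect of a fixed lapse parameter on the fitted probit slope (Table 2, probit row).
x = [0 1 6];  Ntrials = [50 50 50];
nCorrNoLapse = [25 42 50];             % "No Lapse" data from the text
nCorrLapse   = [25 42 49];             % one lapse at the highest level
% gamma = lambda, so the PF runs from lam to 1-lam (symmetric, as in the text)
probitPF = @(p, lam, x) lam + (1 - 2*lam) * 0.5 .* (1 + erf(p(2)*(x - p(1))/sqrt(2)));
negLL = @(p, lam, nCorr) -sum( nCorr .* log(probitPF(p, lam, x)) + ...
        (Ntrials - nCorr) .* log(1 - probitPF(p, lam, x)) );
for lam = [0.01 0.0001]
    for data = {nCorrNoLapse, nCorrLapse}
        % Starting guess [threshold slope]; as discussed below, the poorly
        % fitting case is sensitive to the starting point, so try several.
        pFit = fminsearch(@(p) negLL(p, lam, data{1}), [0.5 1]);
        fprintf('lambda = %.4f  lapse = %d  slope = %.2f\n', ...
                lam, isequal(data{1}, nCorrLapse), pFit(2));
    end
end
```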

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds) with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally, the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope β of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic. Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = .0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials, one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = .01, .1222, .50, .8778, and .99. Suppose that because of binomial variability the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary, since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted, and then, rather than average the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
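A trial-weighted monotonizer is the only nontrivial piece of the first step. The version below is one way to do it (a pool-adjacent-violators pass); it is my own illustration and may differ in detail from the monotonizing script in Matlab Code 4 of the Appendix.

```matlab
function Pm = monotonize(P, N)
% Weighted pool-adjacent-violators: return a non-decreasing copy of the
% psychometric function P, with each level weighted by its number of trials N.
vals = []; wts = []; len = [];
for k = 1:numel(P)
    vals(end+1) = P(k);  wts(end+1) = N(k);  len(end+1) = 1;
    while numel(vals) > 1 && vals(end-1) > vals(end)      % merge violating blocks
        vals(end-1) = (wts(end-1)*vals(end-1) + wts(end)*vals(end)) ...
                      / (wts(end-1) + wts(end));
        wts(end-1) = wts(end-1) + wts(end);
        len(end-1) = len(end-1) + len(end);
        vals(end) = [];  wts(end) = [];  len(end) = [];
    end
end
Pm = repelem(vals, len);               % expand blocks back to one entry per level
end
```

For example, monotonize([0 1 1 .5 1], [1 1 1 98 1]) returns [0 .51 .51 .51 1], the monotonized probabilities of the Spearman–Kärber example given earlier.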

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF, but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
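The enumeration itself can be condensed to a few dozen lines. The sketch below is my own abbreviated version of the idea, not Matlab Code 4: it enumerates all 2^10 response sequences of a 10-trial, one-step up–down staircase on a flat PF (P = 50% everywhere, so every sequence is equally likely) and averages the per-run PFs with and without the threshold-aligning shift.

```matlab
% All possible 10-trial Brownian staircases on a flat PF (P = 0.5),
% averaging per-run PFs with and without shifting levels to align thresholds.
nTrials = 10; nLev = 21; start = 11;          % levels 1..21, start in the middle
avgPF = zeros(2, nLev); nRuns = zeros(2, nLev);
for seq = 0:2^nTrials - 1
    resp = bitget(seq, 1:nTrials);            % 1 = correct, 0 = wrong
    lev = zeros(1, nTrials); lev(1) = start;
    for t = 1:nTrials-1
        lev(t+1) = lev(t) - 1 + 2*(resp(t) == 0);   % down if correct, up if wrong
    end
    for shifted = 0:1
        L = lev;
        if shifted, L = round(L - mean(L) + start); end   % align the run's mean level
        for k = unique(L)                     % per-run PF, then accumulate
            avgPF(shifted+1, k) = avgPF(shifted+1, k) + mean(resp(L == k));
            nRuns(shifted+1, k) = nRuns(shifted+1, k) + 1;
        end
    end
end
avgPF = avgPF ./ max(nRuns, 1);               % untested levels are left at zero
% Row 1 (no shift) stays near 0.5 at every level; row 2 (shift) rises with
% level, the spurious slope discussed in the text.
```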

The output of the Matlab program is shown in Figure 5The left and right columns of panels show the PF for thecase of no shifting and with shifting respectively Theshifting is accomplished by estimating threshold as themean of all the levels a standard nonparametric methodfor estimating threshold That mean is used as the amountby which the levels are shifted before the data from all runsare pooled The top pair of panels is for the case of aver-aging the raw data the middle panels show the results of av-eraging following the monotonizing procedure The monot-

[Figure 5 appears here: six panels (a-f) plotting probability correct and trial frequency against stimulus levels; columns labeled "no shift" and "shift"; rows labeled "no data manipulation," "monotonize psychometric function," and "extrapolate psychometric function."]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = .5 was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.

The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at .5.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at .5. The introduction of a shift produces a dramatic slope bias, going from PF = .20 to .80 as the stimulus goes from Level 8 to Level 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs

1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:

1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)] / [1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
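As a concrete illustration of the two compensations, here is a short Matlab sketch of my own (the numbers are arbitrary):

  gam = 0.2;                   % lower asymptote P(0), e.g., a false-alarm rate
  Px  = 0.8;                   % raw probability correct (hit rate) at level x

  % 1.1 Compensation in probability space (Equation 2, high-threshold correction)
  p_corrected = (Px - gam) / (1 - gam);       % = 0.75

  % 1.2 Compensation in z-score space (Equation 5, signal detection d')
  zfun   = @(p) sqrt(2) * erfinv(2*p - 1);    % probability -> z score
  dprime = zfun(Px) - zfun(gam);              % = z(x) - z(0), about 1.68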

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percentage of correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.
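A minimal Matlab sketch of this replotting, using made-up toy data and my own variable names (not code from the paper):

  sig1  = [0 .1 .2  0 .1 .2  0 .1 .2];     % signal strength shown in Interval 1
  sig2  = [.1 0 .1 .2 .2  0 .2 .1  0];     % signal strength shown in Interval 2
  resp2 = [1  0  1  1  1  0  1  0  0];     % 1 if the observer chose Interval 2

  dSig = sig2 - sig1;                      % new abscissa (can be negative)
  [x, ~, bin] = unique(dSig);              % one bin per abscissa value
  P2 = accumarray(bin(:), resp2(:), [], @mean);   % proportion "Interval 2" per bin
  plot(x, 100*P2, 'o-');
  xlabel('Interval 2 minus Interval 1 signal strength');
  ylabel('Percent choosing Interval 2');   % this PF spans 0% to 100%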

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections) and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer. A sketch of these staircase rules is given below.
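The rules are simple enough to state in a few lines of Matlab. The following is my own toy simulation (the transducer d′ = level/5, the unbiased criterion, and the level units are illustrative assumptions, not Kaernbach's 1990 implementation):

  nTrials = 200;  level = 10;  stepDown = 1;  stepUp = 3;
  levels = zeros(1, nTrials);
  for t = 1:nTrials
      levels(t) = level;
      isSignal = rand < 0.5;                 % randomly intermix blanks and signals
      dprime = max(level, 0) / 5;            % assumed toy transducer: d' grows with level
      criterion = dprime / 2;                % unbiased criterion midway between distributions
      resp = (isSignal*dprime + randn) > criterion;   % "yes" response
      correct = (resp == isSignal);          % hit or correct rejection
      if correct
          level = level - stepDown;          % one step down for every correct answer
      else
          level = level + stepUp;            % three steps up for every wrong answer
      end
  end
  threshEst = mean(levels(21:end));          % average levels, skipping the early trials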

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log–log slope of the d′ function, β. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is .0004 on a probability ordinate going from .50 to 1.00. For this good fit, the d′ exponent and the Weibull exponent stand in a fixed ratio of 1.06. This ratio differs from the value of about 0.8 of Pelli (1985) and the 0.88 of Strasburger (2001a), because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs

1. The 2AFC and objective yes/no threshold variances were compared, using signal detection assumptions. For a fixed total percentage correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c·t^β. The optimum variance of 1.64/(Nβ²) occurs at P = .94, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nβ²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).
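A quick numerical illustration of these formulas (my own numbers, in Matlab):

  b = 2;  N = 100;                     % assumed d' exponent and number of trials
  varYesNo  = 1.64 / (N * b^2)         % optimal variance when N counts trials
  varTwoAFC = 3.28 / (N * b^2)         % 2AFC variance when N counts stimulus presentations
  % The square roots give threshold SDs of about 0.064 and 0.091, respectively.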

2. The equality of the 2AFC and yes/no threshold variances leads to a paradox, whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about the fourth reversal, are thrown out) than to average an even number of reversals. Both of these findings were surprising.
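A minimal Matlab sketch of this averaging-levels estimate (toy staircase data and my own reversal-finding code, not the simulations referred to above):

  levels = [10 9 8 7 8 7 6 7 8 7 6 5 6 5 6 5];           % toy tested levels
  reversal = find(diff(sign(diff(levels))) ~= 0) + 1;    % trial indices of reversals
  if numel(reversal) >= 4
      start = reversal(4);             % discard trials before the fourth reversal
  else
      start = 1;                       % short run: keep everything
  end
  threshEst = mean(levels(start:end)); % average all remaining tested levels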

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs

1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection, because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and they prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = .0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)
Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi-Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)
Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)
Matlab Code 4. Brownian Staircase Enumerations for Slope Bias (Monotonize, Extrapolate, and Averaging scripts, plus the Main Program)

[Code listings omitted.]

Note (Monotonize script, Matlab Code 4): In line 6 the denominator is tot + eps, where eps is the smallest floating-point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note (Main Program, Matlab Code 4): The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11, the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


of 5/6 = .8333. If one had zero information about the trial and guessed, then one would be correct half the time, and the staircase would shift an average of (5 − 1)/2 = +2 levels. Thus, in his scheme with a single "don't know" response, Kaernbach would have the staircase move up 2 levels following a "don't know" response. One might worry that continued use of the "don't know" response would continuously increase the level, so that one would get an upward bias. But Kaernbach points out that the "don't know" response replaces incorrect responses to a greater extent than it replaces correct responses, so that the +2 level shift is actually a more negative shift than the +5 shift associated with a wrong response. For my suggestion of four responses, the step shifts would be −1, +1, +3, and +5 levels for responses HC, LC, LI, and HI, where H and L stand for high and low confidence and C and I stand for correct and incorrect. A slightly more conservative set of responses would be −1, 0, +4, +5, in which case the low-confidence correct judgment leaves the level unchanged. In all these examples, including Kaernbach's three-response method, the low-confidence response has an average level shift of +2 when one is guessing. The most important feature of the low confidence rating is that when one is below threshold, giving a low confidence rating will prevent one from long strings of lucky guesses that bring one to an inappropriately low level. This is the situation that can be difficult to overcome if it occurs in the early phase of a staircase, when the step size can be large. The use of the confidence rating would be especially important for short runs, where the extra bit of information from the subject is useful for minimizing erroneous estimates of threshold.

Another reason not to be fearful of the multiple-response 2AFC method is that after the run is finished, one can estimate threshold by using a maximum likelihood procedure rather than by averaging the run locations. Standard likelihood analysis would have to be modified for Kaernbach's three-response method in order to deal with the "don't know" category. For the suggested four-response method, one could simply ignore the confidence ratings for the likelihood analysis. My simulations, discussed earlier in this section, and Green's (1990) reveal that averaging over the levels (excluding a number of initial levels to avoid the starting point bias) has about as good a precision and accuracy as the best of the likelihood methods. If the threshold estimates from the likelihood method and from the averaging-levels method disagree, that run could be thrown out.

Summary of advantages of the Brownian staircase. Since there is no substantial variance advantage to the fancier likelihood methods, one can pay more attention to a number of advantages associated with simple staircases.

1. At the beginning of a run, likelihood methods may have too little inertia (the sequential stimulus levels jump around easily) unless a "stiff" prior is carefully chosen. By inertia I refer to how difficult it is for the next level to deviate from the present level. At the beginning of a run, one would like to familiarize the subject with the stimulus by presenting a series of stimuli that gradually increase in difficulty. This natural feature of the Brownian staircase is usually missing in QUEST-like methods. The slightly inefficient first few trials of a simple staircase may actually be a benefit, since they give the subject some practice with slightly visible stimuli.

2. Toward the end of a maximum likelihood run, one can have the opposite problem of excessive inertia, whereby it is difficult to have substantial shifts of test contrast. This can be a problem if nonstationary effects are present. For example, thresholds can increase in mid-run owing to fatigue or to adaptation if a suprathreshold pedestal or mask is present. In an old-fashioned Brownian staircase with moderate inertia, a quick look at the staircase track shows the change of threshold in mid-run. That observation can provide an important clue to possible problems with the current procedures. A method that reveals nonstationarity is especially important for difficult subjects, such as infants or patients, in whom fatigue or inattention can be an important factor.

3. Several researchers have told me that the simplicity of the staircase rules is itself an advantage. It makes writing programs easier. It is nicer for teaching students. It also simplifies the uncertain business of guessing likelihood priors and communicating the methodology when writing articles.

Methods That Can Be Used to Estimate Slope as Well as Threshold

There are good reasons why one might want to learn more about the PF than just the single threshold value. Estimating the PF slope is the first step in this direction. To estimate slope, one must test the PF at more than one point. Methods are available to do this (Kaernbach, 2001b; Kontsevich & Tyler, 1999), and one could even adapt simple Brownian staircase procedures to give better slope estimates with minimum harm to threshold estimates (see the comments in Strasburger, 2001b).

Probably the best method for getting both thresholds and slopes is the Ψ (Psi) method of Kontsevich and Tyler (1999). The Ψ method is not totally novel; it borrows techniques from many previous researchers, as Kontsevich and Tyler point out in a delightful paragraph that both clarifies their method and states its ancestry. This paragraph summarizes what is probably the most accurate and precise approach to estimating thresholds and slope, so I reproduce it here:

To summarize, the Ψ method is a combination of solutions known from the literature. The method updates the posterior probability distribution across the sampled space of the psychometric functions based on Bayes' rule (Hall, 1968; Watson & Pelli, 1983). The space of the psychometric functions is two-dimensional (King-Smith & Rose, 1997; Watson & Pelli, 1983). Evaluation of the psychometric function is based on computing the mean of the posterior probability distribution (Emerson, 1986; King-Smith et al., 1994). The termination rule is based on the number of trials as the most practical option (King-Smith et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on one-step ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape. A condensed sketch of the core Ψ loop is given below.
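The following Matlab sketch condenses the core Ψ loop under simplifying assumptions (a 2AFC Weibull in log units, no lapse term, my own grid ranges and simulated observer); it is meant to illustrate the structure of the method, not to reproduce Kontsevich and Tyler's implementation:

  xGrid = linspace(-1, 1, 41);                  % candidate test levels (log units)
  [T, B] = ndgrid(linspace(-0.8, 0.8, 41), linspace(0.5, 8, 30));  % thresholds, slopes
  prior = ones(size(T));  prior = prior / sum(prior(:));           % flat prior
  pf = @(x, t, b) 0.5 + 0.5 * (1 - exp(-10.^(b .* (x - t))));      % 2AFC Weibull

  tTrue = 0.1;  bTrue = 3;  nTrials = 50;       % simulated observer
  for trial = 1:nTrials
      H = zeros(size(xGrid));                   % expected posterior entropy per level
      for ix = 1:numel(xGrid)
          pC = pf(xGrid(ix), T, B);             % P(correct | parameters) over the grid
          pCorrect = sum(prior(:) .* pC(:));    % predictive probability of "correct"
          postC = prior .* pC;        postC = postC / sum(postC(:));
          postI = prior .* (1 - pC);  postI = postI / sum(postI(:));
          Hc = -sum(postC(:) .* log(postC(:) + eps));
          Hi = -sum(postI(:) .* log(postI(:) + eps));
          H(ix) = pCorrect * Hc + (1 - pCorrect) * Hi;   % one-step-ahead expected entropy
      end
      [~, ix] = min(H);  x = xGrid(ix);         % place the trial at minimum expected entropy
      correct = rand < pf(x, tTrue, bTrue);     % simulated 2AFC response
      if correct
          like = pf(x, T, B);                   % likelihood of a correct response
      else
          like = 1 - pf(x, T, B);
      end
      prior = prior .* like;  prior = prior / sum(prior(:));   % Bayes update
  end
  threshEst = sum(prior(:) .* T(:));            % estimates are posterior means,
  slopeEst  = sum(prior(:) .* B(:));            % not posterior maxima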

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope β or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low spatial frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates of mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data-gathering group argued that we should use a data-gathering methodology that would allow estimation of the PF slope for each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin, and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules: Goodness-of-Fit
There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation

Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions, in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation: Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section, I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits
The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals in order to avoid the starting-level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that, in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

pdf(i) = [P(i) − P(i − 1)] / [s(i) − s(i − 1)],    (33)

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

μ_r = Σ_i pdf(i) [s(i)^(r+1) − s(i − 1)^(r+1)] / (r + 1).    (34)

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

f [s(i)] − f [s(i − 1)] = f ′(s_mid) [s(i) − s(i − 1)],    (35)

where f (s) = s^(r+1) and f ′(s_mid) = (r + 1) s_mid^r is the derivative of f with respect to s, somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

μ_r = Σ_i pdf(i) s_mid^r Δs,    (36)

where Δs = s(i) − s(i − 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

μ_0 = Σ_i pdf(i) Δs(i) = P(i_max) − P(i_min),    (37)

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P(i_min), must be zero, as Miller and Ulrich (2001) point out.



The next simplest SK moment is for r = 1:

μ_1 = Σ_i pdf(i) [s(i)^2 − s(i − 1)^2] / 2 = Σ_i [P(i) − P(i − 1)] [s(i) + s(i − 1)] / 2,    (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: If this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method-of-constant-stimuli data in which the PF goes from 0 to 1. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold but withvery uneven distribution of number of trials at differentlevels The SK method gives equal weighting to sparselypopulated levels that can be highly variable thus con-tributing to a greater variance of parameter estimates Letus illustrate this point for an extreme case with five levelss 5 [1 2 3 4 5] Suppose the adaptive method placed [11 1 98 1] trials at the five levels and the number correctwere [0 1 1 49 1] giving probabilities [0 1 1 5 1] andpdf 5 [1 0 5 5] The following PEST-like (Taylor ampCreelman 1967) rules could lead to this unusual dataMove down a level if the presently tested level has greaterthan P1 correct move up three levels if the presentlytested level has less than P2 correct P1 can be anynumber less than 33 and P2 can be any number greaterthan 67 The example data would be produced by start-ing at Level 5 and getting four in a row correct followedby an error followed by a wide set possible of alternat-ing responses for which the PEST rules would leave thefourth level as the level to be tested The SK estimate of the

center of the PF is given by Equation 38 to be μ_1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is μ_1 = 2.97. However, given the data with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ_1 = 4. Thus we see that, because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
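For concreteness, here is a minimal Matlab sketch (not code from the Appendix; the variable names are only illustrative) of the Equation 38 centroid calculation for this example, before and after monotonizing:

s    = [1 2 3 4 5];                    % stimulus levels
P    = [0 1 1 .5 1];                   % proportion correct at each level
ds   = diff(s);                        % level spacing, Delta s
pdf  = diff(P) ./ ds;                  % Equation 33: discrete pdf
smid = (s(1:end-1) + s(2:end)) / 2;    % midpoints between adjacent levels
mu1  = sum(pdf .* smid .* ds)          % Equation 38: centroid = 2.00

Pmono   = [0 .51 .51 .51 1];           % monotonized probabilities quoted above
mu1mono = sum(diff(Pmono)./ds .* smid .* ds)   % centroid = 2.97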

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis, while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being 1% and 99%. The probabilities at the sample points are P(i) = 1%, 12.22%,



50%, 87.78%, and 99%, corresponding to z scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ_1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result reported in their Table 17 gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

var(PSE_i) = var(P_i) / (dP_i/dx_i)²,  (39)

where var(P_i) is given by Equation 27, and the PF slope is

dP_i/dx_i = (dP_i/dz_i) × (dz_i/dx_i),  (40)

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

dP_i/dz_i = exp(−z_i²/2) / √(2π),  (41)

from Equation 3A. By combining these equations, one gets

var(PSE_i) = 2πσ² P_i (1 − P_i) exp(z_i²) / N_i.  (42)

Note that, if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

1/var_tot = Σ_i 1/var(PSE_i).  (43)

Finally, upon plugging in the five values of z_i ranging from −2.328 to +2.328 and P_i ranging from 1% to 99%, the standard error of the PF location is

SE = var_tot^(1/2) = 0.284.  (44)

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method, and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
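A minimal Matlab sketch of the Equations 39–44 calculation (written for this commentary rather than taken from the Appendix; it assumes σ = 1, as in the z-score units above) is

z  = [-2.328 -1.164 0 1.164 2.328];    % z scores of the five levels
P  = .5*(1 + erf(z/sqrt(2)));          % cumulative normal PF (sigma = 1)
Ni = 8;                                % trials per level (N = 40 total)
vi = 2*pi*P.*(1 - P).*exp(z.^2)/Ni;    % Equation 42, with sigma = 1
SE = sqrt(1/sum(1./vi))                % Equations 43-44: about 0.284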

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

var_tot = SE² = 3.24/N_tot.  (45)

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that, of the five stimulus levels, the two at P = 1% and 99% are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = 50%). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot


claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

X² = Σ_i [p(data_i) − P_i]² / var(P_i),  (46)

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend

var(P_i) = P_i (1 − P_i) / N_i,  (47)

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

X² = Σ_i residual_i²,  (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

residual_i = [p(data_i) − P_i] / SE_i.  (49)

The second method is to use the normalized log likelihood that Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

LL = 2 Σ_i N_i { p(data_i) ln[p(data_i)/P_i] + [1 − p(data_i)] ln[(1 − p(data_i))/(1 − P_i)] }.  (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

prob_chisq = Gamma(df/2, χ²/2),  (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that, for a reasonably large number of degrees of freedom (like df > 10), the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = 0.54, which is very close to the asymptotic value of 0.5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = 0.845, which is indeed close to the value of 0.841 expected for a unity z-score.
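For readers who want to repeat that check, a minimal Matlab sketch is below. Note that Matlab's gammainc takes its arguments in the order (χ²/2, df/2), the reverse of the Gamma(df/2, χ²/2) notation of Equation 51.

df = 18;
gammainc(df/2, df/2)                    % Gamma(9, 9): about 0.54, near 0.5
gammainc((df + sqrt(2*df))/2, df/2)     % one SD above the mean: about 0.845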

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expectation for the Equation 50 statistic based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [0.52, 0.85]; in Figure 8a, with a strong downward bias, the interval is [0.72, 0.99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100% and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi square from one level in the sum of Equation 46:

X_i² = [p(data_i) − P_i]² N_i / [P_i (1 − P_i)] = (O_i − E_i)² / [E_i (1 − P_i)],  (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible


outcomes for O_i. The weighting, wt(O_i), is the expected probability from binomial statistics:

wt(O_i) = {N_i! / [O_i! (N_i − O_i)!]} P_i^(O_i) (1 − P_i)^(N_i − O_i).  (53)

The expected value ⟨X_i²⟩ is

⟨X_i²⟩ = Σ_(O_i) wt(O_i) (O_i − E_i)² / [E_i (1 − P_i)].  (54)

A bit of algebra, based on Σ_(O_i) wt(O_i) = 1, gives a surprising result:

⟨X_i²⟩ = 1.  (55)

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

⟨LL_i⟩ = 2 Σ_(O_i) wt(O_i) { O_i ln(O_i/E_i) + (N_i − O_i) ln[(N_i − O_i)/(N_i − E_i)] }.  (56)

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i (near .5) the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.
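A minimal sketch of that enumeration (a stand-in for the fuller Matlab Code 2 in the Appendix; names are illustrative), for the case N_i = 2 and P_i = .5 analyzed next, is

Ni = 2;  Pi = 0.5;                     % trials and true probability at one level
Oi = 0:Ni;                             % all possible numbers correct
wt = arrayfun(@(o) nchoosek(Ni,o), Oi) .* Pi.^Oi .* (1-Pi).^(Ni-Oi);  % Eq. 53
Ei = Ni*Pi;
t1 = Oi .* log(Oi./Ei);                 t1(Oi == 0)  = 0;   % 0*log(0) terms -> 0
t2 = (Ni-Oi) .* log((Ni-Oi)./(Ni-Ei));  t2(Oi == Ni) = 0;
LLmean = 2*sum(wt .* (t1 + t2))         % Equation 56: 2*log(2) = 1.386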

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because ln(1/1) = 0. Equation 56 becomes

⟨LL_i⟩ = 2 [0.25 × 2 ln(2) + 0.25 × 2 ln(2)] = 2 ln(2) = 1.386.  (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the (1 − P_i) factor in Equation 53, except for O_i = N_i, and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about 85% the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a


Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom), if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
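As a small illustration of the inverse variance weighting mentioned above, a minimal Matlab sketch (the threshold and variance values here are invented purely for the example) is

t = [1.10 0.95 1.30];        % threshold estimates from three runs (made-up values)
v = [.010 .020 .080];        % their variance estimates, e.g., from bootstrap runs
w = 1 ./ v;                  % inverse-variance weights
tbar = sum(w.*t)/sum(w)      % combined threshold estimate
vbar = 1/sum(w)              % variance of the combined estimate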

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                        λ = .01         λ = .0001       λ = .01         λ = .0001
Psychometric            No Lapse        No Lapse        Lapse           Lapse
Function                L      χ²       L      χ²       L      χ²       L      χ²
Probit                  1.05   1.05     1.00   0.99     1.05   1.05     0.36   0.34
Logit                   1.76   1.76     1.66   1.66     1.76   1.76     0.84   0.72
Weibull                 1.01   1.01     0.97   0.97     1.01   1.01     0.26   0.26

Note—The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

maximization, corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

probit:  P(z) = .5 + .5 erf(z/√2)  (as in Equation 3B),  (58A)

logit:   P(z) = 1/[1 + exp(−z)],  (58B)

Weibull: P(z) = 1 − exp[−exp(z)],  (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
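A minimal Matlab sketch of the three functional forms of Equations 58A–58C, written as anonymous functions of z = p2(y − p1) (the parameter values shown are purely illustrative, not the fitted values of Table 2), is

probit  = @(z) .5 + .5*erf(z/sqrt(2));   % cumulative normal, Equation 58A
logit   = @(z) 1 ./ (1 + exp(-z));       % logistic, Equation 58B
weibull = @(z) 1 - exp(-exp(z));         % Weibull on a log abscissa, Equation 58C
p1 = 1;  p2 = .5;  y = [0 1 6];          % hypothetical threshold, slope, and levels
probit(p2*(y - p1))                      % PF values at the three test levels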

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good

(a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns, the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope, b, of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods; I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled, so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing, there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than average the PFs of each run, the total numbers correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to biased slope estimates. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large


upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.


Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.
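For readers who want the idea without the Appendix, here is a minimal sketch of one standard way to monotonize a PF with trial-count weighting (pool adjacent violators); the actual Matlab Code 4 may differ in detail. For the earlier five-level example it returns the [0, .51, .51, .51, 1] quoted above.

P   = [0 1 1 .5 1];                 % proportions correct at the five levels
N   = [1 1 1 98 1];                 % trials at each level (the weights)
val = P;  w = N;  len = ones(size(P));   % working blocks
i = 1;
while i < numel(val)
    if val(i) > val(i+1)            % adjacent violation of monotonicity
        val(i) = (w(i)*val(i) + w(i+1)*val(i+1)) / (w(i) + w(i+1));
        w(i)   = w(i) + w(i+1);
        len(i) = len(i) + len(i+1);
        val(i+1) = [];  w(i+1) = [];  len(i+1) = [];   % merge the two blocks
        i = max(i-1, 1);            % back up: earlier blocks may now violate
    else
        i = i + 1;
    end
end
Pmono = repelem(val, len)           % monotone PF expanded back to the five levels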

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair, I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that, with no shift and no monotonizing, there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that, when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)] / [1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective," I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and


signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.0004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a), because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.
3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.
5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).
6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals of initial trials are thrown out) rather than average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.
8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.
9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.
10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better,


than a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials was unequally distributed. It would also cause problems with 2AFC detection, because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with much fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that, if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.

Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.

Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.

Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.

Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.

Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.

Gourevitch, V., & Galanter, E. (1967). A significance test for one parameter isosensitivity functions. Psychometrika, 32, 25-33.

Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.

Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.

Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.

Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.

Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.

Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.

King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.

Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.

Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.

Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.

Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.

Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.

Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.

Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.

Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.

McKee, S. P., Klein, S. A., & Teller, D. Y. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.

Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.

Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.

Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.

Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.

Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)
The program plots the Weibull function with β = 2 (lower asymptote γ corresponding to a z-score of 1), converts the probabilities to z-scores (d′ − 1), and plots the log–log slope of d′, first on a linear abscissa in threshold units and then on a natural-log abscissa.

Matlab Code 2. Goodness-of-Fit: Likelihood Versus Chi-Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)
For N = 1, 2, 4, 8, and 16 trials per level and a wide range of probabilities, the program sums over all possible outcomes at each level, with binomial weights, to obtain the expected contribution of that level to X2 (Equation 52) and to the log likelihood deviance (Equation 56).

Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)
The program fits probit, logit, and Weibull functions to the three-level data sets of Table 2, by likelihood maximization and by chi-square minimization, for lapse parameters λ = .01 and λ = .0001, and reports the fitted slopes.

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias
The program enumerates all 2^10 possible 10-trial Brownian staircases (step down after a correct response, step up after an incorrect response) and accumulates psychometric functions with and without a threshold shift and with three types of averaging: simple staircase averaging, monotonizing before averaging, and extrapolating before averaging. In the monotonizing step the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use; because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note. The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


et al., 1994; Watson & Pelli, 1983). The placement of each new trial is based on a one-step-ahead minimum search (King-Smith, 1984) of the expected entropy cost function (Pelli, 1987).

A simplified, rough description of the Ψ trial placement for 2AFC runs is that initially trials are placed at about the 90% correct level to establish a solid threshold. Once threshold has been roughly established, trials are placed at about the 90% and the 70% levels to pin down the slope. The precise levels depend on the assumed PF shape.
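As a minimal sketch (my own illustration, not part of the Ψ algorithm itself), the following Matlab lines show where those two placement levels fall, assuming the 2AFC Weibull P = γ + (1 − γ)[1 − exp(−x^β)] with γ = .5, β = 2, and x in threshold units:

% Invert the assumed 2AFC Weibull to find the 70% and 90% correct levels.
gamma = .5; beta = 2;
Pc = [.70 .90];                                    % target percent correct
x  = (-log((1 - Pc)./(1 - gamma))).^(1/beta);      % invert P = gamma+(1-gamma)*(1-exp(-x.^beta))
fprintf('P = %.2f at x = %.2f\n', [Pc; x]);        % roughly x = 0.71 and 1.27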

Before one starts measuring slopes, however, one must first be convinced that knowledge of slope can be important. It is so for several reasons.

1. Parametric fitting procedures either require knowledge of slope β or must have data that adequately constrain the slope estimate. Part of the discussion that follows will be on how to obtain reliable threshold estimates when slope is not well known. A tight knowledge of slope can minimize the error of the threshold estimate.

2. Strasburger (2001a, 2001b) gives good arguments for the desirability of knowing the shape of the PF. He is interested in differences between foveal and peripheral visual processing, and he uses the PF slope as an indication of differences.

3. Slopes are different for different stimuli, and this can lead to misleading results if slope is ignored. It may help to describe my experience with the Modelfest project (Carney et al., 2000) to make this issue concrete. This continuing project involves a dozen laboratories around the world. We have measured detection thresholds for a wide variety of visual stimuli. The data are used for developing an improved model of spatial vision. Some of the stimuli were simple, easily memorized, low spatial frequency patterns that tend to have shallower slopes, because a stimulus-known-exactly template can be used efficiently and with minimal uncertainty. On the other hand, tiny Modelfest stimuli can have a good deal of spatial uncertainty and are therefore expected to produce PFs with steeper slopes. Because of the slope change, the pattern of thresholds will depend on the level at which threshold is measured. When this rich dataset is analyzed with filter models that make estimates for mechanism bandwidths and mechanism pooling, these different slope-dependent threshold patterns will lead to different estimates for the model parameters. Several of us in the Modelfest data-gathering group argued that we should use a data-gathering methodology that would allow estimation of PF slope of each of the 45 stimuli. However, I must admit that when I was a subject, my patience wore thin and I too chose the quicker runs. Although I fully understand the attraction of short runs focused on thresholds and ignoring slope, I still think there are good reasons for spreading trials out. For one thing, interspersing trials that are slightly visible helps the subject maintain memory of the test pattern.

4. Slopes can directly resolve differences between models. In my own research, for example, my colleagues and I are able to use the PF slope to distinguish long-range facilitation that produces a low-level excitation or noise reduction (typical with a cross-orientation surround) from a higher level uncertainty reduction (typical with a same-orientation surround).

Throwing-Out Rules, Goodness-of-Fit
There are occasions when a staircase goes sour. Most often this happens when the subject's sensitivity changes in the middle of a run, possibly because of fatigue. It is therefore good to have a well-specified rule that allows nonstationary runs to be thrown out. The throwing-out rule could be based on whether the PF slope estimate is too low or too high. Alternatively, a standard approach is to look at the chi-square goodness-of-fit statistic, which would involve an assumption about the shape of the underlying PF. I have a commentary in Section III regarding biases in the chi-square metric that were found by Wichmann and Hill (2001a). Biases make it difficult to use standard chi-square tables. The trouble with this approach of basing the goodness-of-fit just on the cumulated data is that it ignores the sequential history of the run. The Brownian up–down staircase makes the sequential history vividly visible with a plot of sequentially tested levels. When fatigue or adaptation sets in, one sees the tested levels shift from one stable point to another. Leek, Hanna, and Marshall (1991) developed a method of assessing the nonstationarity of a psychometric function over the course of its measurement, using two interleaved tracks. Earlier, Hall (1983) also suggested a technique to address the same problem. I encourage further research into the development of a nonstationarity statistic that can be used to discard bad runs. This statistic would take the sequential history into account, as well as the PF shape and the chi-square.
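For concreteness, here is a minimal Matlab sketch of the simple cumulative chi-square criterion mentioned above (my own illustration, not the sequential-history statistic being called for); the run data are hypothetical:

% Flag a run whose binomial X2 against the fitted PF exceeds the 95th percentile.
N    = [20 20 20 20];           % trials at each tested level
Obs  = [11 13 19 20];           % correct responses
Pfit = [.55 .70 .85 .95];       % best-fitting PF evaluated at those levels
df   = numel(N) - 2;            % two fitted parameters (threshold and slope)

X2 = sum((Obs - N.*Pfit).^2 ./ (N.*Pfit.*(1-Pfit)));    % the X2 statistic of Equation 46
if gammainc(X2/2, df/2) > 0.95                          % chi-square probability, as in Equation 51
    disp('poor fit: candidate for discarding or down-weighting');
end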

The development of a "throwing-out" rule is the extreme case of the notion of weighted averaging. One important reason for developing good estimators for goodness-of-fit and good estimators for threshold variance is to be able to optimally combine thresholds from repeated experiments. I shall return to this issue in connection with the Wichmann and Hill (2001a) analysis of bias in goodness-of-fit metrics.

Final Thought: The Method Should Depend on the Situation
Also important is that the method chosen should be appropriate for the task. For example, suppose one is using well-experienced observers for testing and retesting similar conditions, in which one is looking for small changes in threshold, or one is interested in the PF shape. In such cases, one has a good estimate of the expected threshold, and the signal detection method of constant stimuli with blanks and with ratings might be best. On the other hand, for clinical testing, in which one is highly uncertain about an individual's threshold and criterion stability, a 2AFC likelihood or staircase method (with a confidence rating) might be best.


III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation: Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits
The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase after excluding the first four or so reversals in order to avoid the starting-level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.
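A minimal Matlab sketch of the reversal-averaging estimator just mentioned is given below; the level sequence is hypothetical, and averaging the reversal levels themselves (after dropping the first four) is just one common convention:

% Nonparametric threshold from an up-down staircase: average levels at reversals
% after discarding the first four reversals.
levels = [8 7 6 5 4 5 4 3 4 5 4 3 4 5 4 5 6 5 4 5];   % tested levels, in order
d      = sign(diff(levels));                          % +1 for an upward step, -1 for downward
rev    = find(d(1:end-1) ~= d(2:end)) + 1;            % trials at which the direction reverses
use    = rev(5:end);                                  % assumes at least five reversals occurred
thresh = mean(levels(use));
fprintf('threshold estimate = %.2f from %d reversals\n', thresh, numel(use));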

Miller and Ulrich's Article on the Nonparametric Spearman–Kärber Method of Analysis
The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman–Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

\[
\mathrm{pdf}(i) = \frac{P(i) - P(i-1)}{s(i) - s(i-1)}, \tag{33}
\]

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i − 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

\[
\mu_r = \frac{1}{r+1} \sum_i \mathrm{pdf}(i)\,\bigl[s(i)^{\,r+1} - s(i-1)^{\,r+1}\bigr]. \tag{34}
\]

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus,

\[
f\bigl[s(i)\bigr] - f\bigl[s(i-1)\bigr] = f'(s_{\mathrm{mid}})\,\bigl[s(i) - s(i-1)\bigr], \tag{35}
\]

where f(s) = s^{r+1} and f′(s_mid) = (r + 1) s_mid^r is the derivative of f with respect to s somewhere within the interval between s(i − 1) and s(i). For a slowly varying function, s_mid would be near the midpoint. Given Equation 35, Equation 34 becomes

\[
\mu_r = \sum_i \mathrm{pdf}(i)\, s_{\mathrm{mid}}^{\,r}\, \Delta s, \tag{36}
\]

where Δs = s(i) − s(i − 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

\[
\mu_0 = \sum_i \mathrm{pdf}(i)\, \Delta s = P(i_{\max}) - P(i_{\min}), \tag{37}
\]

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(i_max), must be unity and the probability at the lowest level, P_min, must be zero, as Miller and Ulrich (2001) point out.



The next simplest SK moment is for r = 1:

\[
\mu_1 = \sum_i \mathrm{pdf}(i)\,\frac{s(i)^2 - s(i-1)^2}{2} = \sum_i \bigl[P(i) - P(i-1)\bigr]\,\frac{s(i) + s(i-1)}{2}, \tag{38}
\]

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).
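As a concrete illustration of these formulas, here is a minimal Matlab sketch of the SK quantities in Equations 33, 37, and 38; the stimulus levels and probabilities are hypothetical method-of-constant-stimuli data for a 0%–100% discrimination PF, not taken from Miller and Ulrich:

% Spearman-Karber pdf, area, and centroid from (s, P) data.
s = [-2 -1 0 1 2];                  % stimulus levels
P = [0 .15 .55 .90 1];              % probability correct at each level
pdfSK = diff(P) ./ diff(s);                              % Equation 33
area  = sum(pdfSK .* diff(s));                           % Equation 37: equals 1 here
mu1   = sum(diff(P) .* (s(2:end) + s(1:end-1)) / 2);     % Equation 38: the centroid (PSE)
fprintf('area = %.2f, centroid = %.2f\n', area, mu1);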

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: if this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method-of-constant-stimuli data in which the PF goes from 0% to 100%. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1, 2, 3, 4, 5]. Suppose the adaptive method placed [1, 1, 1, 98, 1] trials at the five levels and the numbers correct were [0, 1, 1, 49, 1], giving probabilities [0, 1, 1, .5, 1] and pdf = [1, 0, −.5, .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to this unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than 33% and P2 can be any number greater than 67%. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide set of possible alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the center of the PF is given by Equation 38 to be µ1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0, .51, .51, .51, 1], with pdf = [.51, 0, 0, .49]. The new estimate of threshold is µ1 = 2.97. However, given the data with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to µ1 = 4. Thus we see that, because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
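The arithmetic of this example can be checked directly with the numbers given above; the trial-weighted centroid at the end is my own choice of weighting, added only for illustration:

% Reproduce the SK estimates for the PEST-like example in the text.
s   = 1:5;
Ntr = [1 1 1 98 1];                                     % trials at each level
P   = [0 1 1 .5 1];                                     % observed proportion correct

mu1 = sum(diff(P) .* (s(2:end)+s(1:end-1))/2);          % SK centroid, Equation 38: 2.00

Pm      = P;                                            % monotonize levels 2-4 by their
Pm(2:4) = sum(P(2:4).*Ntr(2:4)) / sum(Ntr(2:4));        % trial-weighted average: 0.51
mu1mono = sum(diff(Pm) .* (s(2:end)+s(1:end-1))/2);     % monotonized SK centroid: 2.97

wcent = sum(s.*Ntr.*P) / sum(Ntr.*P);                   % one trial-weighted centroid: near 4
fprintf('SK = %.2f, monotonized SK = %.2f, trial-weighted = %.2f\n', mu1, mu1mono, wcent);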

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z-scores and with the lowest and highest probabilities being .01 and .99.



The probabilities at the sample points are P(i) = .01, .1222, .50, .8778, and .99, corresponding to z-scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, µ1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between .028 and .029. Miller and Ulrich's result, reported in their Table 17, gives a standard error of .0071. However, their Gaussian had a standard deviation of .25, whereas my z-score units correspond to a standard deviation of unity. Thus their standard error must be multiplied by 4, giving a standard error of .0284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

\[
\mathrm{var}(\mathrm{PSE}_i) = \frac{\mathrm{var}(P_i)}{\bigl(dP_i/dx_i\bigr)^2}, \tag{39}
\]

where var(P_i) is given by Equation 27 and the PF slope is

\[
\frac{dP_i}{dx_i} = \frac{dP_i}{dz_i} \times \frac{dz_i}{dx_i}, \tag{40}
\]

where for the cumulative normal dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

\[
\frac{dP_i}{dz_i} = \frac{\exp\bigl(-z_i^2/2\bigr)}{\sqrt{2\pi}}, \tag{41}
\]

from Equation 3A. By combining these equations, one gets

\[
\mathrm{var}(\mathrm{PSE}_i) = \frac{2\pi\sigma^2\,P_i(1 - P_i)\exp\bigl(z_i^2\bigr)}{N_i}. \tag{42}
\]

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

\[
\frac{1}{\mathrm{var}_{\mathrm{tot}}} = \sum_i \frac{1}{\mathrm{var}_i}. \tag{43}
\]

Finally, upon plugging in the five values of z_i, ranging from −2.328 to +2.328, and P_i, ranging from .01 to .99, the standard error of the PF location is

\[
\mathrm{SE} = \mathrm{var}_{\mathrm{tot}}^{1/2} = 0.284. \tag{44}
\]

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
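A minimal Matlab sketch of this analytic calculation, using only the five z-scores and probabilities given above (8 trials per level, σ = 1), is as follows:

% Analytic standard error of the PF location, Equations 42-44.
z  = [-2.328 -1.164 0 1.164 2.328];
P  = [.01 .1222 .5 .8778 .99];
Ni = 8; sigma = 1;

varPSE = 2*pi*sigma^2 .* P.*(1-P) .* exp(z.^2) ./ Ni;   % Equation 42, one value per level
varTot = 1 / sum(1./varPSE);                            % Equation 43
SE     = sqrt(varTot);                                  % Equation 44: about 0.284
fprintf('SE of the PF location = %.3f\n', SE);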

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

\[
\mathrm{var}_{\mathrm{tot}} = \mathrm{SE}^2 = \frac{3.24}{N_{\mathrm{tot}}}. \tag{45}
\]

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that, of the five stimulus levels, the two at P = .01 and .99 are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = .50). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated.



Thus one cannot claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit
Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X2 statistic, as given by

\[
X^2 = \sum_i \frac{\bigl[p(\mathrm{data}_i) - P_i\bigr]^2}{\mathrm{var}(P_i)}, \tag{46}
\]

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend

\[
\mathrm{var}(P_i) = \frac{P_i(1 - P_i)}{N_i}, \tag{47}
\]

where N_i is the number of trials at level i. The X2 is of interest because the binomial distribution is asymptotically a Gaussian, so that the X2 statistic asymptotically has a chi-square distribution. Alternatively, one can write X2 as

\[
X^2 = \sum_i \mathrm{residual}_i^2, \tag{48}
\]

where the residuals are the difference between the predicted and observed probabilities divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^{0.5}:

\[
\mathrm{residual}_i = \frac{p(\mathrm{data}_i) - P_i}{\mathrm{SE}_i}. \tag{49}
\]

The second method is to use the normalized log-likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

\[
LL = 2 \sum_i N_i \left\{ p(\mathrm{data}_i) \ln\!\left[\frac{p(\mathrm{data}_i)}{P_i}\right] + \bigl[1 - p(\mathrm{data}_i)\bigr] \ln\!\left[\frac{1 - p(\mathrm{data}_i)}{1 - P_i}\right] \right\}. \tag{50}
\]

Similar to the X2 statistic, LL asymptotically also has a chi-square distribution.
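For concreteness, here is a minimal Matlab sketch that evaluates both metrics for a small hypothetical 2AFC data set (the numbers are invented for illustration):

% Compute the X2 statistic (Equation 46) and the deviance (Equation 50).
N    = [10 10 10 10];           % trials per level
pdat = [.6 .7 .8 1.0];          % observed proportion correct
Pfit = [.55 .72 .86 .95];       % fitted PF at the same levels

X2 = sum((pdat - Pfit).^2 ./ (Pfit.*(1-Pfit)./N));                    % Equation 46
LL = 2*sum(N .* (pdat.*log((pdat+eps)./Pfit) + ...
                 (1-pdat).*log((1-pdat+eps)./(1-Pfit))));             % Equation 50
fprintf('X2 = %.2f, deviance LL = %.2f\n', X2, LL);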

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

\[
\mathrm{prob\_chisq} = \mathrm{Gamma}\bigl(df/2,\; \chi^2/2\bigr), \tag{51}
\]

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^{0.5}. As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z-score.
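These two checks can be reproduced with Matlab's regularized incomplete gamma function; note that gammainc takes the value as its first argument and the shape (df/2) as its second:

% Check the chi-square distribution values quoted above.
df = 18;
gammainc(18/2, df/2)         % Gamma(df/2, 18/2): about .54
gammainc((18+6)/2, df/2)     % Gamma(df/2, 24/2): about .845, vs. .841 for a unity z-score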

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X2 and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X2 for a PF ranging from P = .5 to 1.0 and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

\[
X_i^2 = \frac{\bigl[p(\mathrm{data}_i) - P_i\bigr]^2}{P_i(1 - P_i)/N_i} = \frac{(O_i - E_i)^2}{E_i(1 - P_i)}, \tag{52}
\]

where O_i = N_i p(data_i) and E_i = N_i P_i.



The mean value of X_i² is obtained by doing a weighted sum over all possible outcomes for O_i. The weighting, wt_{O_i}, is the expected probability from binomial statistics:

\[
\mathrm{wt}_{O_i} = \frac{N_i!}{O_i!\,(N_i - O_i)!}\; P_i^{\,O_i} (1 - P_i)^{\,N_i - O_i}. \tag{53}
\]

The expected value ⟨X_i²⟩ is

\[
\bigl\langle X_i^2 \bigr\rangle = \sum_{O_i} \mathrm{wt}_{O_i}\, \frac{(O_i - E_i)^2}{E_i(1 - P_i)}. \tag{54}
\]

A bit of algebra, based on Σ_{O_i} wt_{O_i} = 1, gives a surprising result:

\[
\bigl\langle X_i^2 \bigr\rangle = 1. \tag{55}
\]

That is, each term of the X2 sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X2. It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

\[
\bigl\langle LL_i \bigr\rangle = \sum_{O_i} \mathrm{wt}_{O_i}\, 2\!\left[ O_i \ln\!\frac{O_i}{E_i} + (N_i - O_i) \ln\!\frac{N_i - O_i}{N_i - E_i} \right]. \tag{56}
\]

Unfortunately, there is no simple way to do the summations as was done for X2, so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

\[
\bigl\langle LL_i \bigr\rangle = 2\bigl[0.25 \times 2\ln(2) + 0.25 \times 2\ln(2)\bigr] = 2\ln(2) = 1.386. \tag{57}
\]

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.
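This expectation can be verified by a direct enumeration in the spirit of Matlab Code 2; the minimal sketch below (my own version, not the appendix program) confirms that the expected X2 contribution is 1 and the expected deviance contribution is 2 ln(2) = 1.386 for N_i = 2 and P_i = .5:

% Enumerate all outcomes for Ni = 2, Pi = .5 and average the two metrics.
N = 2; P = .5; E = N*P;
O  = 0:N;                                                       % all possible numbers correct
wt = arrayfun(@(o) nchoosek(N,o), O) .* P.^O .* (1-P).^(N-O);   % binomial weights, Equation 53
X2terms = (O - E).^2 ./ (E*(1-P));                              % Equation 52
LLterms = 2*(O.*log((O+eps)/E) + (N-O).*log((N-O+eps)/(N-E)));  % Equation 56
fprintf('<X2> = %.3f, <LL> = %.3f\n', sum(wt.*X2terms), sum(wt.*LLterms));  % 1.000 and 1.386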

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the 1 − P_i factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i > .85, the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

I have been wondering about the relative merits of X2 and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X2, but statisticians seem to prefer likelihood.



Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X2 or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X2 metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to the log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X2. The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X2 statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X2. First, they note that the maximum likelihood parameters will not minimize X2. This is not a bothersome objection, since if one does a goodness-of-fit with X2, one would also do the parameter search by minimizing X2 rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X2, can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X2 should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.

Wichmann and Hill and Strasburger: Lapses and Bias Calculation
The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used (probit [cumulative normal], logit, and Weibull), corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                       λ = .01        λ = .0001      λ = .01        λ = .0001
Psychometric           No Lapse       No Lapse       Lapse          Lapse
Function               L      χ²      L      χ²      L      χ²      L      χ²
Probit                 1.05   1.05    1.00   0.99    1.05   1.05    0.36   0.34
Logit                  1.76   1.76    1.66   1.66    1.76   1.76    0.84   0.72
Weibull                1.01   1.01    0.97   0.97    1.01   1.01    0.26   0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

probit (cumulative normal, as in Equation 3B):   p(z) = [1 + erf(z/√2)]/2,   (58A)

logit:   p(z) = 1/[1 + exp(-z)],   (58B)

Weibull:   p(z) = 1 - exp[-exp(z)],   (58C)

with z = p2(y - p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = 0.01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = 0.0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = 0.0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
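A minimal sketch of one cell of these simulations, in the spirit of Matlab Code 3 of the Appendix; the use of fminsearch and the starting values are my own choices, and the commented-out lines swap in the probit or logit rows.

    % Sketch of a Table 2 fit: three levels, 50 trials each, symmetric lapse.
    y    = [0 1 6];  N = [50 50 50];  Obs = [25 42 49];    % 49 = one lapse at the top level
    lam  = 0.0001;                                         % the small lapse parameter
    base = @(z) 1 - exp(-exp(z));                          % Weibull form (Equation 58C)
    % base = @(z) (1 + erf(z/sqrt(2)))/2;                  % probit (Equation 58A)
    % base = @(z) 1./(1 + exp(-z));                        % logit  (Equation 58B)
    PF   = @(p) lam + (1 - 2*lam)*base(p(2)*(y - p(1)));   % gamma = lambda (symmetric)
    negLogLik = @(p) -sum( Obs.*log(PF(p)+eps) + (N-Obs).*log(1-PF(p)+eps) );
    chiSquare = @(p)  sum( (Obs - N.*PF(p)).^2 ./ (N.*PF(p).*(1-PF(p)) + eps) );
    p0   = [1 0.3];                              % [threshold slope] starting guess
    pML  = fminsearch(negLogLik, p0);            % likelihood maximization
    pX2  = fminsearch(chiSquare, p0);            % chi-square minimization
    slopes = [pML(2) pX2(2)]                     % compare with the Weibull row of Table 2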

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good

(a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
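A sketch of that multi-start strategy; the range of initial slope guesses is an arbitrary illustrative choice.

    % Sketch: guard against local minima by restarting the search from several
    % initial slope guesses and keeping the fit with the lowest chi-square.
    y = [0 1 6];  N = [50 50 50];  Obs = [25 42 49];  lam = 0.0001;
    PF  = @(p) lam + (1-2*lam)*(1 - exp(-exp(p(2)*(y - p(1)))));        % Weibull form
    obj = @(p) sum((Obs - N.*PF(p)).^2 ./ (N.*PF(p).*(1-PF(p)) + eps)); % chi-square
    bestVal = Inf;
    for s0 = 0.1:0.1:2                        % scan the initial slope guess
        [p, val] = fminsearch(obj, [1 s0]);   % restart the search from each guess
        if val < bestVal, bestVal = val; bestP = p; end
    end
    bestP                                     % global best over all restarts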

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.
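One way to realize the first approach is to let λ float but keep it constrained. The following sketch uses a reparameterization and a weak prior of my own choosing; it is not the specific prior of Treutwein and Strasburger (1999) or of Wichmann and Hill (2001a).

    % Sketch: fit threshold, slope, and a constrained lapse rate together.
    y = [0 1 6];  N = [50 50 50];  Obs = [25 42 49];
    lamOf = @(q) 0.06 ./ (1 + exp(-q));            % reparameterization keeps lambda in (0, 0.06)
    PF  = @(p) lamOf(p(3)) + (1 - 2*lamOf(p(3)))*(1 - exp(-exp(p(2)*(y - p(1)))));
    negLogPost = @(p) -sum(Obs.*log(PF(p)+eps) + (N-Obs).*log(1-PF(p)+eps)) ...
                      + 0.5*(p(3)/3)^2;            % weak prior pulling lambda toward 0.03
    pFit = fminsearch(negLogPost, [1 0.3 0]);      % [threshold slope transformed-lapse]
    lambdaHat = lamOf(pFit(3))                     % fitted lapse rate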

Biased Slope in Adaptive Methods
In the previous section, I examined how lapses produce

a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = xt^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope β of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
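For the record, the arithmetic behind those numbers (the β and N are the values quoted above):

    % Quick check of the quoted variance and standard error:
    beta = 5.5;  N = 25;
    varTh = 1.75/(N*beta^2)     % = 0.0023
    seTh  = sqrt(varTh)         % = 0.048, i.e., roughly a 5% standard error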

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that because of binomial variability the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2-4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach, this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue in connection with my simulations that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than average the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
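The two kinds of averaging are easy to state in code. In this sketch the per-run counts are made up, and levels with zero trials (the case that forces Kaernbach's extrapolation step) are ignored for simplicity.

    % Sketch: pooling raw counts versus averaging per-run PFs.
    cor = [2 5 8 9; 1 6 7 10];           % correct responses, one row per run (made up)
    tot = [4 8 10 10; 3 9 9 10];         % trials, one row per run (made up)
    pfPooled   = sum(cor,1) ./ (sum(tot,1) + eps);   % Strasburger-style: pool counts, then one PF
    pfEach     = cor ./ (tot + eps);                 % Kaernbach-style: PF for each run...
    pfAveraged = mean(pfEach, 1);                    % ...then average the PFs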

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

[Figure 5 appears here: six panels, (a)-(f). Left column (a-c), no shift; right column (d-f), shift. Rows: no data manipulation, monotonize psychometric function, extrapolate psychometric function. Ordinate: probability correct (and trial frequency); abscissa: stimulus levels, 0-20.]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel (a) shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel (b) shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel (c) shows the effect of extrapolating the data to untested levels. Panels (d), (e), and (f) are for the cases (a), (b), and (c) where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel (d), with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing, there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)]/[1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) - z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses); a simulation of these rules is sketched following this list. The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slopelin = slopelog/xt (Equation 11), where xt is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log-log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.0004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.
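The following sketch simulates the objective yes/no staircase rules quoted in Item 4 above; the observer's d′ function, the fixed criterion, and the step size are my own illustrative assumptions.

    % Sketch: Kaernbach's (1990) objective yes/no up-down rules with a hypothetical observer.
    nTrials = 200;  level = 10;  stepDown = 1;  stepUp = 3;
    levels = zeros(1, nTrials);
    for t = 1:nTrials
        levels(t) = level;
        isSignal = rand < 0.5;                 % randomly intermix blanks and signal trials
        dp       = 0.3*level*isSignal;         % hypothetical d' as a function of level
        sayYes   = randn < dp - 1;             % criterion fixed at z = 1
        correct  = (sayYes == isSignal);       % hits and correct rejections are correct
        if correct
            level = level - stepDown;          % one step down for every correct answer
        else
            level = level + stepUp;            % three steps up for every wrong answer
        end
        level = max(level, 0);
    end
    threshEst = mean(levels(21:end))           % average the levels after a warm-up period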

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared, using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^β. The optimum variance of 1.64/(Nβ²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nβ²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox, whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up-down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about four reversals, are thrown out) than to average an even number of reversals. Both of these findings were surprising. (A sketch of such a staircase follows this list.)

7. As long as the starting point is not too distant from threshold, Brownian staircases with their moderate inertia have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.
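As promised in Item 6 above, here is a minimal sketch of a simple up-down staircase whose threshold estimate is the mean of the tested levels; the logistic observer, step size, and number of discarded initial trials are illustrative assumptions.

    % Sketch: Brownian staircase, threshold = mean of levels after discarding early trials.
    nTrials = 60;  level = 5;  step = 1;  trueTh = 2;
    levels = zeros(1, nTrials);
    for t = 1:nTrials
        levels(t) = level;
        pCorrect = 1 ./ (1 + exp(-(level - trueTh)));   % discrimination PF running 0% to 100%
        if rand < pCorrect
            level = level - step;       % one step down after a correct response
        else
            level = level + step;       % one step up after an incorrect response
        end
    end
    threshEst = mean(levels(11:end))    % average the levels after the early trials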

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman-Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and they prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a). (A sketch of this calculation follows this list.)

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications: The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.
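As referenced in Item 2 above, the expected X² and the expected likelihood deviance at a single level can be computed exactly by summing over all possible binomial outcomes, in the spirit of Matlab Code 2 of the Appendix; the particular p and N here are arbitrary examples.

    % Sketch: exact expectation of X^2 and of the log-likelihood deviance at one level.
    p = 0.5;  N = 2;                                   % predicted probability, trials at this level
    k = 0:N;                                           % all possible numbers correct
    w = factorial(N)./(factorial(k).*factorial(N-k)) .* p.^k .* (1-p).^(N-k);   % binomial weights
    X2 = sum( w .* (k - N*p).^2 / (N*p*(1-p)) )        % expected X^2: exactly 1 at any level
    D  = sum( w .* 2 .* ( k.*log((k+eps)/(N*p)) + (N-k).*log((N-k+eps)/(N-N*p)) ) )
    % expected deviance: unlike X^2, it is biased away from 1 when N per level is small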

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From the Author

Matlab Code 1. Weibull Function: Probability, d′, and Log-Log d′ Slope (Code for Generating Figure 1)

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi-Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)

Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias (Monotonize, Extrapolate, Averaging, and Main Program scripts)

[Program listings omitted; only the listing titles and the two explanatory notes are reproduced here.]

Note to Matlab Code 4. In line 6 of the monotonizing script the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note. The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)

III. DATA-FITTING METHODS FOR ESTIMATING THRESHOLD AND SLOPE, PLUS GOODNESS-OF-FIT ESTIMATION

This final section of the paper is concerned with what to do with the data after they have been collected. We have here the standard Bayesian situation. Given the PF, it is easy to generate samples of data with confidence intervals. But we have the inverse situation, in which we are given the data and need to estimate properties of the PF and get confidence limits for those estimates. The pair of articles in this special issue by Wichmann and Hill (2001a, 2001b), as well as the articles by King-Smith et al. (1994), Treutwein (1995), and Treutwein and Strasburger (1999), do an outstanding job of introducing the issues and provide many important insights. For example, Emerson (1986) and King-Smith et al. emphasize the advantages of mean likelihood over maximum likelihood. The speed of modern computers makes it just as easy to calculate mean likelihood as it is to calculate maximum likelihood. As another example, Treutwein and Strasburger (1999) demonstrate the power of using Bayesian priors for estimating parameters by showing how one can estimate all four parameters of Equation 1A in fewer than 100 trials. This important finding deserves further examination.

In this section I will focus on four items. First, I will discuss the relative merits of parametric versus nonparametric methods for estimating the PF properties. The paper by Miller and Ulrich (2001) provides a detailed analysis of a nonparametric method for obtaining all the moments of the PF. I will discuss some advantages and limitations of their method. Second is the question of goodness-of-fit. Wichmann and Hill (2001a) show that when the number of trials is small, the goodness-of-fit can be very biased. I will examine the cause of that bias. Third is the issue, discussed by Wichmann and Hill (2001a), of how lapses affect slope estimates. This topic is relevant to Strasburger's (2001b) data and analysis. Finally, there is the possibly controversial topic of an upward slope bias in adaptive methods that is relevant to the papers of Kaernbach (2001b) and Strasburger (2001b).

Parametric Versus Nonparametric Fits
The main split between methods for estimating parameters of the psychometric function is the distinction between parametric and nonparametric methods. In a parametric method, one uses a PF that involves several parameters and uses the data to estimate the parameters. For example, one might know the shape of the PF and estimate just the threshold parameter. McKee et al. (1985) show how slope uncertainty affects threshold uncertainty in probit analysis. The two papers by Wichmann and Hill (2001a, 2001b) are an excellent introduction to many issues in parametric fitting. An example of a nonparametric method for estimating threshold would be to average the levels of a Brownian staircase, after excluding the first four or so reversals in order to avoid the starting level bias. Before I served as guest editor of this special symposium issue, I believed that if one had prior knowledge of the exact PF shape, a parametric method for estimating threshold was probably better than nonparametric methods. After reading the articles presented here, as well as carrying out my recent simulations, I no longer believe this. I think that in the proper situation, nonparametric methods are as good as parametric methods. Furthermore, there are situations in which the shape of the PF is unknown, and nonparametric methods can be better.

Miller and Ulrich's Article on the Nonparametric Spearman-Kärber Method of Analysis

The standard way to extract information from psychometric data is to assume a functional shape, such as a Weibull, cumulative normal, logit, or d′ function, and then vary the parameters until likelihood is maximized or chi-square is minimized. Miller and Ulrich (2001) propose using the Spearman-Kärber (SK) nonparametric method, which does not require a functional shape. They start with a probability density function, defined by them as the derivative of the PF:

pdf(i) = [P(i) - P(i - 1)] / [s(i) - s(i - 1)],   (33)

where s(i) is the stimulus intensity at the ith level and P(i) is the probability correct at that level. The notation pdf(i) is meant to indicate that it is the pdf for the interval between i - 1 and i. Miller and Ulrich give the rth moment of this distribution as their Equation 3:

μr = Σi pdf(i) [s(i)^(r+1) - s(i - 1)^(r+1)] / (r + 1).   (34)

The next four equations are my attempt to clarify their approach. I do this because, when applicable, it is a fine method that deserves more attention. Equation 34 probably came from the mean value theorem of calculus:

f [s(i)] - f [s(i - 1)] = f ′(smid) [s(i) - s(i - 1)],   (35)

where f(s) = s^(r+1) and f ′(smid) = (r + 1)smid^r is the derivative of f with respect to s somewhere within the interval between s(i - 1) and s(i). For a slowly varying function, smid would be near the midpoint. Given Equation 35, Equation 34 becomes

μr = Σi pdf(i) smid^r Δs,   (36)

where Δs = s(i) - s(i - 1). Equation 36 is what one would mean by the rth moment of the distribution. The case r = 0 is important, since it gives the area under the pdf:

μ0 = Σi pdf(i) Δs = P(imax) - P(imin),   (37)

by using the definition in Equation 33. The summation in Equation 37 gives the area under the pdf. For it to be a proper pdf, its area must be unity, which means that the probability at the highest level, P(imax), must be unity and the probability at the lowest level, P(imin), must be zero, as Miller and Ulrich (2001) point out.
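In code, the SK moments amount to a few lines; the five-point PF below is a made-up example that runs from 0 to 1, as the method requires.

    % Sketch: Spearman-Karber moments for a method-of-constant-stimuli PF.
    s   = [1 2 3 4 5];                   % stimulus levels (made up)
    P   = [0 .10 .45 .85 1];             % probability correct, running from 0 to 1
    pdf = diff(P) ./ diff(s);            % Equation 33, one value per interval
    mu0 = sum(pdf .* diff(s));           % Equation 37: area under the pdf (= 1 here)
    mu1 = sum(pdf .* diff(s.^2)) / 2;    % first moment (Equation 38, below): centroid/threshold
    mu2 = sum(pdf .* diff(s.^3)) / 3;    % Equation 34 with r = 2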


The next simplest SK moment is for r = 1:

μ1 = Σi pdf(i) [s(i)² - s(i - 1)²] / 2,   (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or of PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: if this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparamet-ric method is that it has only been shown to work well formethod of constant stimulus data in which the PF goesfrom 0 to 1 I question whether it can be applied with thesame confidence to 2AFC and to adaptive procedures withunequal numbers of trials at each level The reason for thislimitation is that the SK method does not offer a simpleway to weight the different samples Let us consider twosituations with unequal weighting adaptive methods and2AFC detection tasks

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1 2 3 4 5]. Suppose the adaptive method placed [1 1 1 98 1] trials at the five levels and the numbers correct were [0 1 1 49 1], giving probabilities [0 1 1 .5 1] and pdf = [1 0 −.5 .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to this unusual data: Move down a level if the presently tested level has greater than P1% correct; move up three levels if the presently tested level has less than P2% correct. P1 can be any number less than 33, and P2 can be any number greater than 67. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide possible set of alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the
center of the PF is given by Equation 38 to be μ1 = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0 .51 .51 .51 1], with pdf = [.51 0 0 .49]. The new estimate of threshold is μ1 = 2.97. However, given the data with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ1 = 4. Thus we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
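For readers who want to reproduce the monotonized probabilities above, here is a minimal trial-weighted pool-adjacent-violators sketch (my own compact version, not the Appendix's Matlab Code 4); calling monotonize([0 1 1 49 1], [1 1 1 98 1]) returns [0 .51 .51 .51 1]:

function Pmono = monotonize(C, N)
% C(i) = number correct at level i; N(i) = number of trials at level i.
% Pool adjacent violators, weighting by the number of trials, then expand
% back so that Pmono has one (nondecreasing) entry per original level.
g = num2cell(1:length(C));                % each level starts as its own group
while true
    P = cellfun(@(ix) sum(C(ix)) / sum(N(ix)), g);
    k = find(P(1:end-1) > P(2:end), 1);   % first adjacent violation
    if isempty(k), break, end
    g{k} = [g{k} g{k+1}];  g(k+1) = [];   % merge the two offending groups
end
Pmono = zeros(size(C));
for j = 1:length(g), Pmono(g{j}) = P(j); end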

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus, it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.
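A two-line check makes the point about the error bars (a sketch assuming N = 20 trials at each level; the numbers are my own illustration):

% Binomial standard errors of the observed proportion at 55% and 95% correct.
N = 20;  p = [0.55 0.95];
SE = sqrt(p .* (1 - p) / N)     % about 0.111 versus 0.049, so the low-P data deserve much less weight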

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%–100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: Since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used, with equally spaced z scores and with the lowest and highest probabilities being .01 and .99. The probabilities at the sample points are P(i) = .01, .1222,


.50, .8778, and .99, corresponding to z scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated, as Miller and Ulrich suggest (see the Appendix for Matlab Code 4 for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result reported in their Table 17 gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus, their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

\[ \mathrm{var}(\mathrm{PSE}_i) = \frac{\mathrm{var}(P_i)}{\left( dP_i / dx_i \right)^2}, \tag{39} \]

where var(P_i) is given by Equation 27, and the PF slope is

\[ \frac{dP_i}{dx_i} = \frac{dP_i}{dz_i} \times \frac{dz_i}{dx_i}, \tag{40} \]

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

\[ \frac{dP_i}{dz_i} = \frac{\exp\!\left( -z_i^2 / 2 \right)}{\sqrt{2\pi}}, \tag{41} \]

from Equation 3A. By combining these equations, one gets

\[ \mathrm{var}(\mathrm{PSE}_i) = \frac{2\pi \sigma^2 P_i \left( 1 - P_i \right) \exp\!\left( z_i^2 \right)}{N_i}. \tag{42} \]

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula:

\[ \frac{1}{\mathrm{var}_{\mathrm{tot}}} = \sum_i \frac{1}{\mathrm{var}_i}. \tag{43} \]

Finally, upon plugging in the five values of z_i, ranging from −2.328 to +2.328, and P_i, ranging from .01 to .99, the standard error of the PF location is

\[ \mathrm{SE} = \mathrm{var}_{\mathrm{tot}}^{1/2} = 0.284. \tag{44} \]

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method, and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal numbers of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
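A few lines of Matlab are enough to verify Equations 42–44 for this example (a minimal sketch with my own variable names; σ = 1 and N_i = 8 trials per level are the values assumed in the text):

% Analytic standard error of the PF location for the five-level example.
z  = [-2.328 -1.164 0 1.164 2.328];     % levels in z-score units (sigma = 1)
P  = 0.5 * (1 + erf(z / sqrt(2)));      % true probabilities, about [.01 .1222 .5 .8778 .99]
Ni = 8;  sigma = 1;
varPSE = 2*pi * sigma^2 .* P .* (1 - P) .* exp(z.^2) / Ni;   % Equation 42, level by level
varTot = 1 / sum(1 ./ varPSE);                               % Equation 43
SE = sqrt(varTot)                                            % Equation 44: about 0.284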

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

\[ \mathrm{var}_{\mathrm{tot}} = \mathrm{SE}^2 = \frac{3.24}{N_{\mathrm{tot}}}. \tag{45} \]

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up–down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that of the five stimulus levels, the two at P = .01 and .99 are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = 50%). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus, one cannot


claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

\[ X^2 = \sum_i \frac{\bigl[ p(\mathrm{data}_i) - P_i \bigr]^2}{\mathrm{var}(P_i)}, \tag{46} \]

where p(data_i) and P_i are the percent correct of the datum and of the PF at level i. The binomial var(P_i) is our old friend

\[ \mathrm{var}(P_i) = \frac{P_i \left( 1 - P_i \right)}{N_i}, \tag{47} \]

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

\[ X^2 = \sum_i \mathrm{residual}_i^2, \tag{48} \]

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^{0.5}:

\[ \mathrm{residual}_i = \frac{p(\mathrm{data}_i) - P_i}{\mathrm{SE}_i}. \tag{49} \]

The second method is to use the normalized log likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

\[ LL = 2 \sum_i \left\{ N_i\, p(\mathrm{data}_i) \ln\!\left[ \frac{p(\mathrm{data}_i)}{P_i} \right] + N_i \bigl[ 1 - p(\mathrm{data}_i) \bigr] \ln\!\left[ \frac{1 - p(\mathrm{data}_i)}{1 - P_i} \right] \right\}. \tag{50} \]

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.
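As a concrete illustration, both statistics take only a few lines of Matlab (a minimal sketch; the vectors p, P, and N and their values are my own example, not data from any of the papers):

% Goodness-of-fit statistics of Equations 46-50 for one data set.
p = [0.55 0.70 0.90];   % observed proportion correct at each level
P = [0.60 0.75 0.88];   % fitted (predicted) PF values
N = [20 20 20];         % number of trials at each level
X2 = sum( (p - P).^2 ./ (P .* (1 - P) ./ N) );     % Equations 46 and 47
% Deviance (Equation 50); terms with p = 0 or p = 1 contribute zero, so guard the logs:
t1 = N .* p .* log(max(p, eps) ./ P);
t2 = N .* (1 - p) .* log(max(1 - p, eps) ./ (1 - P));
LL = 2 * sum(t1 + t2);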

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, and γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

\[ \mathrm{prob\_chisq} = \mathrm{Gamma}\!\left( \frac{df}{2}, \frac{\chi^2}{2} \right), \tag{51} \]

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that, for a reasonably large number of degrees of freedom (like df > 10), the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^{0.5}. As an example, for df = 18 we have Gamma(9, 9) = .54, which is very close to the asymptotic value of .5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = .845, which is indeed close to the value of .841 expected for a unity z score.
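In Matlab the check looks like this (a minimal sketch; note that gammainc takes the integration limit as its first argument, so Gamma(df/2, χ²/2) in the notation above is gammainc(chi2/2, df/2)):

% Sanity checks on the chi-square probability for df = 18.
df = 18;
gammainc(18/2, df/2)          % about .54: evaluated at chi-square = df (the mean)
gammainc((18 + 6)/2, df/2)    % about .845: one standard deviation, sqrt(2*df) = 6, above the mean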

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square, given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials at each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [.52, .85]; in Figure 8a, with a strong downward bias, the interval is [.72, .99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = .5 to 1.0 and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

\[ X_i^2 = \frac{\bigl[ p(\mathrm{data}_i) - P_i \bigr]^2}{P_i \left( 1 - P_i \right) / N_i} = \frac{\left( O_i - E_i \right)^2}{E_i \left( 1 - P_i \right)}, \tag{52} \]

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible


outcomes for O_i. The weighting, wt_i, is the expected probability from binomial statistics:

\[ \mathrm{wt}_i = \frac{N_i!}{O_i!\,\left( N_i - O_i \right)!}\; P_i^{\,O_i} \left( 1 - P_i \right)^{N_i - O_i}. \tag{53} \]

The expected value <X_i²> is

\[ \left\langle X_i^2 \right\rangle = \sum_{O_i} \mathrm{wt}_i\, \frac{\left( O_i - E_i \right)^2}{E_i \left( 1 - P_i \right)}. \tag{54} \]

A bit of algebra, based on Σ_{O_i} wt_i = 1, gives a surprising result:

\[ \left\langle X_i^2 \right\rangle = 1. \tag{55} \]

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value, rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

\[ \left\langle LL_i \right\rangle = 2 \sum_{O_i} \mathrm{wt}_i \left[ O_i \ln\!\left( \frac{O_i}{E_i} \right) + \left( N_i - O_i \right) \ln\!\left( \frac{N_i - O_i}{N_i - E_i} \right) \right]. \tag{56} \]

Unfortunately, there is no simple way to do the summations, as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near .5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = .5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

\[ \left\langle LL_i \right\rangle = 2 \left[ 0.25 \times 2\ln(2) + 0.25 \times 2\ln(2) \right] = 2\ln(2) = 1.386. \tag{57} \]

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.
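The enumeration behind these numbers is short; here is a stripped-down Matlab sketch of the idea (my own version, not the Appendix's Matlab Code 2), evaluated at a single level with N_i = 2 trials and P_i = .5:

% Expected X2 and deviance contributions at one level, by enumerating all outcomes.
Ni = 2;  Pi = 0.5;
Oi = 0:Ni;                                             % all possible numbers correct
wt = arrayfun(@(k) nchoosek(Ni, k), Oi) .* Pi.^Oi .* (1 - Pi).^(Ni - Oi);   % Equation 53
Ei = Ni * Pi;
X2i = sum( wt .* (Oi - Ei).^2 / (Ei * (1 - Pi)) )      % Equations 52 and 54: exactly 1
t1 = Oi .* log(max(Oi, 1) / Ei);                       % Oi*ln(Oi/Ei), with the 0*ln(0) term set to 0
t2 = (Ni - Oi) .* log(max(Ni - Oi, 1) / (Ni - Ei));
LLi = 2 * sum( wt .* (t1 + t2) )                       % Equation 56: 2*ln(2) = 1.386 for this case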

Now I consider the case where the levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the 1 − P_i factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85, the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a


[Figure 4 appears here. Abscissa: psychometric function level, P_i (from .5 to 1.0). Ordinate: contribution to the goodness-of-fit metric at each level. The curves labeled 1, 2, 4, 8, and N = 16 give the log likelihood deviance contribution; the flat line marked X² applies for all N.]

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53–54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1–16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance. (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance, based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters, and that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses, even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task, with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in Table 2.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                      λ = .01         λ = .0001       λ = .01         λ = .0001
Psychometric          No Lapse        No Lapse        Lapse           Lapse
Function              L      χ²       L      χ²       L      χ²       L      χ²
Probit                1.05   1.05     1.00   0.99     1.05   1.05     0.36   0.34
Logit                 1.76   1.76     1.66   1.66     1.76   1.76     0.84   0.72
Weibull               1.01   1.01     0.97   0.97     1.01   1.01     0.26   0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

\[ P(z) = .5 + .5\,\mathrm{erf}\!\left( \frac{z}{\sqrt{2}} \right) \quad \text{(probit; Equation 3B)}, \tag{58A} \]
\[ P(z) = \frac{1}{1 + \exp(-z)} \quad \text{(logit)}, \tag{58B} \]
\[ P(z) = 1 - \exp\bigl[ -\exp(z) \bigr] \quad \text{(Weibull)}, \tag{58C} \]

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) when there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter; (2) when the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50); (3) when λ = .0001, the PF slope was reduced dramatically when a single lapse was present; (4) for all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameter. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
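To make the fitting procedure concrete, here is a minimal likelihood-maximization sketch in the spirit of (but not identical to) Matlab Code 3; the logit form of Equation 58B is used, the data are the three levels described above, and the fixed lapse rate and starting guess are the assumptions just discussed:

% Fit threshold p(1) and slope p(2) of a logit PF with a fixed, symmetric lapse rate.
y  = [0 1 6];              % stimulus levels
N  = [50 50 50];           % trials per level
C  = [25 42 49];           % number correct (49/50 at the top level is the single lapse)
lambda = 0.0001;           % fixed lapse rate; gamma = lambda keeps the PF symmetric
pf = @(p, y) lambda + (1 - 2*lambda) ./ (1 + exp(-p(2) * (y - p(1))));
negLL = @(p) -sum( C .* log(pf(p, y)) + (N - C) .* log(1 - pf(p, y)) );
pFit = fminsearch(negLL, [1 0.3]);      % initial slope guess of 0.3, as discussed above
fprintf('threshold = %.2f, slope = %.2f\n', pFit(1), pFit(2));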

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope, b, of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination, with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = .01, .1222, .50, .8778, and .99. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote in the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total numbers correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations, in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF, but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

[Figure 5 appears here. Six panels (a–f) plot probability correct against stimulus level (0 to 20): the left column (a–c) has no shift and the right column (d–f) includes the shift; the top row has no data manipulation, the middle row monotonizes the psychometric function, and the bottom row extrapolates the psychometric function. Panels a and d also show the frequency of trials at each level.]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair, I show an additional dot–dashed line with right–left symmetry that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope, rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.
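The shift-induced bias is easy to reproduce. The following Monte Carlo sketch is a stand-in for the full enumeration of Matlab Code 4 (my own simplified version, with assumed names and parameters): a Brownian staircase is run on a flat PF, each run is aligned to its own mean level, and the shifted data are pooled; the pooled PF then rises with stimulus level even though the true PF is flat at 50%.

% Monte Carlo illustration of the slope bias produced by shifting (aligning) each run.
rng(0);  nRuns = 5000;  nTrials = 10;  start = 10;
edges = -15:35;                                    % bins for the shifted levels
corrTot = zeros(size(edges));  nTot = zeros(size(edges));
for r = 1:nRuns
    lev = zeros(1, nTrials);  resp = zeros(1, nTrials);  lev(1) = start;
    for t = 1:nTrials
        resp(t) = rand < 0.5;                      % flat PF: 50% correct at every level
        if t < nTrials, lev(t+1) = lev(t) - (2*resp(t) - 1); end   % down if correct, up if not
    end
    shifted = round(lev - mean(lev) + start);      % align this run's threshold estimate
    for t = 1:nTrials
        k = find(edges == shifted(t));
        corrTot(k) = corrTot(k) + resp(t);  nTot(k) = nTot(k) + 1;
    end
end
PF = corrTot ./ max(nTot, 1);                      % pooled PF now shows a spurious positive slope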

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.
3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods but that objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.
5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.
6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.
7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a), because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).
2. The equality of the 2AFC and yes/no threshold variance leads to a paradox, whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.
4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.
5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).
6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals of initial trials are thrown out) rather than average an even number of reversals. Both of these findings were surprising.
7. As long as the starting point is not too distant from threshold, Brownian staircases with their moderate inertia have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.
8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.
9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.
10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).
3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4 One of the most intriguing findings in this specialissue was the quite high PF slopes for letter discriminationfound by Strasburger (2001b) The slopes might be highbecause of stimulus uncertainty in identifying small low-contrast peripheral letters There is also the possibility thatthese high slopes were due to methodological bias (Leeket al 1992) Kaernbach (2001b) presents a wonderfulanalysis of how an upward bias could be caused by trial

nonindependence of staircase procedures (not the methodStrasburger used) I did a number of simulations to sep-arate out the effects of threshold shift (one of the steps inthe Strasburger analysis) PF monotonization PF extrap-olation and type of averaging (the latter three used byKaernbach) My results indicated that for experimentssuch as Strasburgerrsquos the dominant cause of upward slopebias was the threshold shift Kaernbachrsquos extrapolationwhen coupled with monotonization can also lead to a strongupward slope bias Further experiments and analyses areencouraged since true steep slopes would be very impor-tant in allowing thresholds to be estimated with muchfewer trials than is presently assumed Although none ofour analyses makes a conclusive case that Strasburgerrsquosestimated slopes are biased the possibility of a bias sug-gests that one should be cautious before assuming that thehigh slopes are real

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications: The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log-Log d′ Slope; Code for Generating Figure 1
[Matlab listing]

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square; Relevant to Wichmann and Hill (2001a) and Figure 4
[Matlab listing]

Matlab Code 3. Downward Slope Bias Due to Lapses; Relevant to Wichmann and Hill (2001a) and Table 2
[Matlab listing]

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias
Monotonize
[Matlab listing]
Note that in line 6 the denominator is tot + eps, where eps is the smallest floating-point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Extrapolate
[Matlab listing]

Averaging
[Matlab listing]

Main Program
[Matlab listing]

Note—The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


The next simplest SK moment is for r = 1:

$\mu_1 = \sum_i \mathrm{pdf}(i)\,[s(i) + s(i-1)]/2 = \tfrac{1}{2}\sum_i [P(i) - P(i-1)]\,[s(i) + s(i-1)]$,  (38)

from Equation 33. This first moment provides an estimate of the centroid of the distribution, corresponding to the center of the psychometric function. It could be used as an estimate of threshold for detection or PSE for discrimination. An alternate estimator would be the median of the pdf, corresponding to the 50% point of the PF (the proper definition of PSE for discrimination).
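To make the estimator concrete, here is a minimal Matlab sketch of Equation 38. The levels s and probabilities P below are invented example values, and the PF is assumed to have already been extended so that it runs from 0 to 1.

% Sketch of the Spearman-Karber first moment (Equation 38) from PF samples.
s     = [1 2 3 4 5];                    % example stimulus levels (assumed)
P     = [0 .1 .5 .9 1];                 % example PF values running from 0 to 1
pdfSK = diff(P);                        % pdf(i) = P(i) - P(i-1)
mid   = (s(2:end) + s(1:end-1)) / 2;    % midpoints of adjacent levels
mu1   = sum(pdfSK .* mid)               % first moment: centroid (threshold/PSE)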

Miller and Ulrich (2001) use Monte Carlo simulations to claim that the SK method does a better job of extracting properties of the psychometric function than do parametric methods. One must be a bit careful here, since in their parametric fitting they generate the data by using one PF and then fit it with other PFs. But still, it is a surprising claim: If this simple method is so good, why have people not been using it for years? There are two possible answers to this question. One is that people never thought of using it before Miller and Ulrich. The other possibility is that it has problems. I have concluded that the answer is a combination of both factors. I think that there is an important place for the SK approach, but its limitations must be understood. That is my next topic.

The most important limitation of the SK nonparametric method is that it has only been shown to work well for method of constant stimuli data in which the PF goes from 0% to 100%. I question whether it can be applied with the same confidence to 2AFC and to adaptive procedures with unequal numbers of trials at each level. The reason for this limitation is that the SK method does not offer a simple way to weight the different samples. Let us consider two situations with unequal weighting: adaptive methods and 2AFC detection tasks.

Adaptive methods place trials near threshold, but with a very uneven distribution of the number of trials at different levels. The SK method gives equal weighting to sparsely populated levels that can be highly variable, thus contributing to a greater variance of parameter estimates. Let us illustrate this point for an extreme case with five levels, s = [1 2 3 4 5]. Suppose the adaptive method placed [1 1 1 98 1] trials at the five levels and the number correct were [0 1 1 49 1], giving probabilities [0 1 1 .5 1] and pdf = [1 0 −.5 .5]. The following PEST-like (Taylor & Creelman, 1967) rules could lead to this unusual data: Move down a level if the presently tested level has greater than P1 correct; move up three levels if the presently tested level has less than P2 correct. P1 can be any number less than 33% and P2 can be any number greater than 67%. The example data would be produced by starting at Level 5 and getting four in a row correct, followed by an error, followed by a wide set of possible alternating responses for which the PEST rules would leave the fourth level as the level to be tested. The SK estimate of the center of the PF is given by Equation 38 to be μ₁ = 2.00. Since a pdf with a negative probability is distasteful to some, one can monotonize the PF according to the procedure of Miller and Ulrich (which does include the number of trials at each level). The new probabilities are [0 .51 .51 .51 1] with pdf = [.51 0 0 .49]. The new estimate of threshold is μ₁ = 2.97. However, given the data with 49/98 correct at Level 4, an estimate that includes a weighting according to the number of trials at each level would place the centroid of the distribution very close to μ₁ = 4. Thus we see that because the SK procedure ignores the number of trials at each level, it produces erroneous estimates of the PF properties. This example with shifting thresholds was chosen to dramatize the problems of unequal weighting and the effect of monotonizing.
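The following Matlab fragment is a sketch that reproduces the numbers just quoted (the raw and monotonized probabilities are taken from the text); it is meant only to illustrate how ignoring the trial counts drives the estimate.

% Sketch of the worked SK example above.
s    = [1 2 3 4 5];                    % stimulus levels
P    = [0 1 1 .5 1];                   % observed proportions (from the text)
mid  = (s(2:end) + s(1:end-1)) / 2;    % midpoints: [1.5 2.5 3.5 4.5]
mu1  = sum(diff(P) .* mid)             % = 2.00, the raw SK estimate
Pmon = [0 .51 .51 .51 1];              % monotonized proportions (from the text)
mu1m = sum(diff(Pmon) .* mid)          % = 2.97, the SK estimate after monotonizing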

A similar problem can occur even with the method of constant stimuli with an equal number of trials at each level, but with unequal binomial weighting, as occurs in 2AFC data. The unequal weighting occurs because of the 50% lower asymptote. The binomial statistics error bar of data at 55% correct is much larger than for data at 95% correct. Parametric procedures such as probit analysis give less weighting to the lower data. This is why the most efficient placement of trials in 2AFC is above 90% correct, as I have discussed in Section II. Thus it is expected that for 2AFC the SK method would give threshold estimates less precise than the estimates given by a parametric method that includes weighting.

I originally thought that the SK method would fail on all types of 2AFC data. However, in the discussion following Equation 7, I pointed out that for 2AFC discrimination tasks there is a way of extending the data to the 0%-100% range that produces weightings similar to those for a yes/no task. Rather than use the standard Equation 2 to deal with the 50% lower asymptote, one should use the method discussed in connection with Figure 3, in which one can use for the ordinate the percent of correct responses in Interval 2 and use the signal strength in Interval 2 minus that in Interval 1 for the abscissa. That procedure produces a PF that goes from 0% to 100%, thereby making it suitable for SK analysis while also removing a possible bias.

Now let us examine the SK method as applied to data for which it is expected to work, such as discrimination PFs that go from 0% to 100%. These are tasks in which the location of the psychometric function is the PSE and the slope is the threshold.

A potentially important claim by Miller and Ulrich (2001) is that the SK method performs better than probit analysis. This is a surprising claim: Since their data are generated by a probit PF, one would expect probit analysis to do well, especially since probit analysis does especially well with discrimination PFs that go from 0% to 100%. To help clarify this issue, Miller and I did a number of simulations. We focused on the case of N = 40, with 8 trials at each of 5 levels. The cumulative normal PF (Equation 14) was used with equally spaced z scores and with the lowest and highest probabilities being 1% and 99%. The probabilities at the sample points are P(i) = 1%, 12.22%,


50%, 87.78%, and 99%, corresponding to z scores of z(i) = −2.328, −1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated as Miller and Ulrich suggest (see the Appendix, Matlab Code 4, for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ₁, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result reported in their Table 17 gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So at least in this instance the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly even on data generated from probit PFs. McKee et al. show that when the number of trials is low and stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

$\mathrm{var}(PSE_i) = \mathrm{var}(P_i) / (dP_i/dx_i)^2$,  (39)

where var(Pᵢ) is given by Equation 27, and the PF slope is

$dP_i/dx_i = (dP_i/dz_i)\,(dz_i/dx_i)$,  (40)

where, for the cumulative normal, dzᵢ/dxᵢ = 1/σ, where σ is the standard deviation of the pdf, and

$dP_i/dz_i = \exp(-z_i^2/2)/\sqrt{2\pi}$,  (41)

from Equation 3A. By combining these equations, one gets

$\mathrm{var}(PSE_i) = 2\pi\sigma^2 P_i(1 - P_i)\exp(z_i^2)/N_i$.  (42)

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = .5), Equation 42 would become σ²π/(2Nᵢ), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula

$1/\mathrm{var}_{tot} = \sum_i 1/\mathrm{var}_i$.  (43)

Finally, upon plugging in the five values of zᵢ, ranging from −2.328 to +2.328, and Pᵢ, ranging from 1% to 99%, the standard error of the PF location is

$SE = \mathrm{var}_{tot}^{1/2} = 0.284$.  (44)

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal numbers of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.
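As a check, the analytic chain of Equations 39-43 can be evaluated directly. The sketch below assumes a unit-slope cumulative normal (σ = 1) and 8 trials at each of the five z-score levels given above, and should return a standard error near 0.284.

% Sketch of the analytic variance calculation (Equations 39-44).
z   = [-2.328 -1.164 0 1.164 2.328];     % tested z-score levels
P   = 0.5 * (1 + erf(z / sqrt(2)));      % expected probabilities (~.01 to .99)
N   = 8 * ones(1, 5);                    % 8 trials per level (N = 40 total)
sig = 1;                                 % standard deviation of the pdf
varPSE = 2*pi*sig^2 .* P .* (1 - P) .* exp(z.^2) ./ N;   % Equation 42
varTot = 1 / sum(1 ./ varPSE);                           % Equation 43
SE     = sqrt(varTot)                                    % about 0.284 (Equation 44)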

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

$\mathrm{var}_{tot} = SE^2 = 3.24/N_{tot}$.  (45)

This value is more than twice the variance of π/(2 N_tot) that is obtained for the same PF using adaptive methods, including the simple up-down staircase, that place trials near the optimal point.

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that of the five stimulus levels, the two at P = 1% and 99% are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = 50%). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus one cannot


claim that for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit
Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

$X^2 = \sum_i [p(\mathrm{data}_i) - P_i]^2 / \mathrm{var}(P_i)$,  (46)

where p(dataᵢ) and Pᵢ are the percent correct of the datum and of the PF at level i. The binomial var(Pᵢ) is our old friend

$\mathrm{var}(P_i) = P_i(1 - P_i)/N_i$,  (47)

where Nᵢ is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

$X^2 = \sum_i \mathrm{residual}_i^2$,  (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SEᵢ = [var(Pᵢ)]^0.5:

$\mathrm{residual}_i = [p(\mathrm{data}_i) - P_i]/SE_i$.  (49)

The second method is to use the normalized log likelihood that Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

$LL = 2\sum_i N_i \{\, p(\mathrm{data}_i)\ln[p(\mathrm{data}_i)/P_i] + [1 - p(\mathrm{data}_i)]\ln[(1 - p(\mathrm{data}_i))/(1 - P_i)] \,\}$.  (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.
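For readers who want the two statistics side by side, a minimal Matlab sketch of Equations 46 and 50 follows. The trial counts, observed proportions, and fitted values are arbitrary example numbers, and eps is added inside the logarithms only to guard against log(0).

% Sketch: X^2 (Equation 46) and deviance LL (Equation 50) for binomial data.
N    = [50 50 50];                     % trials per level (example values)
pObs = [0.52 0.80 0.96];               % observed proportion correct (example)
Pfit = [0.50 0.84 0.98];               % fitted PF values (example)
X2 = sum((pObs - Pfit).^2 ./ (Pfit .* (1 - Pfit) ./ N))          % Equation 46
LL = 2 * sum(N .* (pObs .* log((pObs + eps) ./ Pfit) + ...
     (1 - pObs) .* log((1 - pObs + eps) ./ (1 - Pfit))))         % Equation 50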

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for Pᵢ. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

$\mathrm{prob\_chisq} = \mathrm{Gamma}(df/2,\ \chi^2/2)$,  (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = 0.54, which is very close to the asymptotic value of 0.5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = 0.845, which is indeed close to the value of 0.841 expected for a unity z-score.
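A hedged sketch of that check in Matlab, using the built-in (lower, normalized) incomplete gamma function, might look like this; the second line is the one-standard-deviation check described above.

% Sketch of the Gamma-function sanity check for df = 18.
df = 18;
pMean = gammainc(df/2, df/2)           % about 0.54, near the asymptotic 0.5
pSD   = gammainc((df + 6)/2, df/2)     % about 0.84, one standard deviation (2*df)^0.5 above the mean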

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [0.52, 0.85]; in Figure 8a, with a strong downward bias, the interval is [0.72, 0.99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100% and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi square from one level in the sum of Equation 46:

$X_i^2 = \dfrac{[p(\mathrm{data}_i) - P_i]^2}{P_i(1 - P_i)/N_i} = \dfrac{(O_i - E_i)^2}{E_i(1 - P_i)}$,  (52)

where Oᵢ = Nᵢ p(dataᵢ) and Eᵢ = NᵢPᵢ. The mean value of X²ᵢ is obtained by doing a weighted sum over all possible


outcomes for Oᵢ. The weighting, wtᵢ, is the expected probability from binomial statistics:

$wt_i = \dfrac{N_i!}{O_i!\,(N_i - O_i)!}\, P_i^{O_i}(1 - P_i)^{N_i - O_i}$.  (53)

The expected value ⟨X²ᵢ⟩ is

$\langle X_i^2\rangle = \sum_{O_i} wt_i\, \dfrac{(O_i - E_i)^2}{E_i(1 - P_i)}$.  (54)

A bit of algebra, based on $\sum_{O_i} wt_i = 1$, gives a surprising result:

$\langle X_i^2\rangle = 1$.  (55)

That is, each term of the X² sum has an expectation value of unity, independent of Nᵢ and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since Eᵢ = NᵢPᵢ is the exact value rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi square for any probability level. I had wrongly thought that one needed Nᵢ to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any Nᵢ.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

$\langle LL_i\rangle = 2\sum_{O_i} wt_i \Bigl[\, O_i \ln\!\bigl(\tfrac{O_i}{E_i}\bigr) + (N_i - O_i)\ln\!\bigl(\tfrac{N_i - O_i}{N_i - E_i}\bigr) \Bigr]$.  (56)

Unfortunately, there is no simple way to do the summations, as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for Nᵢ = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of Pᵢ, near .5, the contribution of each term is greater than 1 and that for high values (near 1) the contribution is less than 1. As Nᵢ increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of Pᵢ. An examination of the case Nᵢ = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For Nᵢ = 2 and Pᵢ = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for Oᵢ = 0, 1, or 2, and Eᵢ = NᵢPᵢ = 1. Equation 56 simplifies, since the factors with Oᵢ = 0 and Oᵢ = 1 vanish from the first term and the factors with Oᵢ = 2 and Oᵢ = 1 vanish from the second term. The Oᵢ = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

$\langle LL_i\rangle = 2\,[0.25 \times 2\ln(2) + 0.25 \times 2\ln(2)] = 2\ln(2) = 1.386$.  (57)

Figure 4 shows that this is precisely the contribution of each term at Pᵢ = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For Pᵢ = 1 and any value of Nᵢ, the weights are zero, because of the 1 − Pᵢ factor in Equation 53, except for Oᵢ = Nᵢ and Eᵢ = Nᵢ. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of Pᵢ > .85 the contribution to chi square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.
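The enumeration idea can be sketched in a few lines of Matlab for a single level. The code below uses the Nᵢ = 2, Pᵢ = 0.5 case discussed above (any N and P could be substituted) and should return ⟨X²ᵢ⟩ = 1 and ⟨LLᵢ⟩ = 2 ln 2 ≈ 1.386.

% Sketch: expected X^2 and deviance contributions at one level by enumeration.
N = 2;  P = 0.5;  E = N * P;                         % one level: 2 trials at P = .5
O  = 0:N;                                            % all possible numbers correct
wt = factorial(N) ./ (factorial(O) .* factorial(N - O)) ...
     .* P.^O .* (1 - P).^(N - O);                    % binomial weights (Equation 53)
X2terms = (O - E).^2 ./ (E * (1 - P));               % per-outcome X^2 (Equation 52)
LLterms = 2 * (O .* log((O + eps) / E) + ...
          (N - O) .* log((N - O + eps) / (N - E)));  % per-outcome deviance terms
meanX2 = sum(wt .* X2terms)                          % = 1 for any N, P (Equation 55)
meanLL = sum(wt .* LLterms)                          % = 2*log(2) = 1.386 here (Eq. 57)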

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a


Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, Pᵢ. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53-54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1-16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39-43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
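As a small illustration of the inverse-variance pooling mentioned above, the sketch below combines three hypothetical threshold estimates; the numbers are invented, and the variances could come from any of the three methods just listed.

% Sketch: inverse-variance weighted averaging of repeated threshold estimates.
th = [1.05 0.92 1.20];             % threshold estimates from three runs (invented)
v  = [0.010 0.015 0.040];          % their variance estimates (invented)
w  = (1 ./ v) / sum(1 ./ v);       % normalized inverse-variance weights
thAve  = sum(w .* th)              % weighted mean threshold
varAve = 1 / sum(1 ./ v)           % variance of the weighted mean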

Wichmann and Hill and Strasburger: Lapses and Bias Calculation
The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used—probit (cumulative normal), logit, and Weibull—corresponding to the rows of Table 2, and two error metrics were used: chi-square minimization and likelihood

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                        No Lapse         No Lapse         Lapse            Lapse
                        λ = .01          λ = .0001        λ = .01          λ = .0001
Psychometric Function   L      χ²        L      χ²        L      χ²        L      χ²
Probit                  1.05   1.05      1.00   0.99      1.05   1.05      0.36   0.34
Logit                   1.76   1.76      1.66   1.66      1.76   1.76      0.84   0.72
Weibull                 1.01   1.01      0.97   0.97      1.01   1.01      0.26   0.26

Note—The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

maximization, corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

$P = 0.5 + 0.5\,\mathrm{erf}(z/\sqrt{2})$   (probit, erf; Equation 3B)  (58A)
$P = 1/[1 + \exp(-z)]$   (logit)  (58B)
$P = 1 - \exp[-\exp(z)]$   (Weibull)  (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood or minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
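A sketch of the three fitting functions, together with the symmetric guess/lapse correction used for Table 2, is given below; the anonymous-function form and the helper name pf are my own shorthand, not the code of Matlab Code 3.

% Sketch of Equations 58A-58C with a symmetric guess/lapse correction.
probit  = @(z) 0.5 + 0.5 * erf(z / sqrt(2));          % Equation 58A
logit   = @(z) 1 ./ (1 + exp(-z));                    % Equation 58B
weibull = @(z) 1 - exp(-exp(z));                      % Equation 58C
% symmetric guess/lapse correction (gamma = lambda), as used for Table 2:
pf = @(F, y, p, lambda) lambda + (1 - 2*lambda) * F(p(2) * (y - p(1)));
pExpected = pf(probit, [0 1 6], [1 1], 0.01)  % expected proportions at the three
                                              % test levels for threshold 1, slope 1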

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case, the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
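One simple way to guard against such local minima is a multi-start search. The sketch below assumes a user-supplied error function (here called fitError, a hypothetical name) that returns chi-square or negative log likelihood for a given [threshold, slope] pair.

% Sketch: multi-start search over initial slope guesses.
% fitError is a hypothetical user-supplied function of p = [threshold slope].
slopeGuesses = [0.3 0.5 1 2 4];                  % spread of initial slope guesses
bestP = []; bestErr = Inf;
for s0 = slopeGuesses
    [p, err] = fminsearch(@fitError, [1 s0]);    % start threshold at 1 (arbitrary)
    if err < bestErr, bestErr = err; bestP = p; end
end
bestP                                            % parameters of the best restart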

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.

Biased Slope in Adaptive Methods
In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope, β, of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods; I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that because of binomial variability the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbachrsquos explanation of the slope bias Kaern-bach (2001b) examines staircases based on a cumulativenormal PF that goes from 0 to 100 The staircase ruleis Brownian with one step up or down for each incorrect orcorrect answer respectively Parametric and nonparamet-ric methods were used to analyze the data Here I will focuson the nonparametric methods used to generate Kaern-bachrsquos Figures 2ndash4 The analysis involves three steps

First the data are monotonized Kaernbach gives tworeasons for this procedure (1) Since the true psychome-tric function is expected to be monotonic it seems appro-priate to monotonize the data This also helps remove noisefrom nonparametric threshold or slope estimates (2) ldquoSec-ond and more importantly the monotonicity constraintimproves the estimates at the borders of the tested signalrange where only few tests occur and admits to estimatethe values of the PF at those signal levels that have not beentested (ie above or below the tested range) For thepresent approach this extrapolation is necessary sincethe averaging of the PF values can only be performed ifall runs yield PF estimates for all signal levels in questionrdquo(Kaernbach 2001b p 1390)

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue in connection with my simulations that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
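The distinction between the two types of averaging is easy to state in code. The sketch below mirrors the description above rather than either author's program:

% corr(r,i) and tot(r,i): number correct and number of trials for run r at level i
corr = [2 5 9 10; 1 6 8 10];          % two illustrative runs
tot  = [10 10 10 10; 10 10 10 10];

% Strasburger-style averaging: pool the raw counts across runs, then form one PF
pfPooled = sum(corr,1) ./ max(sum(tot,1), eps);

% Kaernbach-style averaging: form a PF for each run, then average the PFs
pfPerRun   = corr ./ max(tot, eps);
pfAveraged = mean(pfPerRun, 1);

% With equal trial counts the two agree; they differ once the runs have
% unequal numbers of trials at a level, as happens with staircase data.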

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to biased slope estimates. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
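To convey the flavor of the enumeration, here is a stripped-down sketch of the idea behind Matlab Code 4 (not a substitute for it): step through all 2^10 possible 10-trial Brownian staircases, which are equally likely when the PF is flat at P = 50%, and pool the counts at each level.

nbits = 10;                                  % trials per staircase
nlev  = 2*nbits + 1;                         % possible levels, starting at the middle
corr  = zeros(1, nlev);  tot = zeros(1, nlev);
for i = 0:2^nbits-1
    a = bitget(i, 1:nbits);                  % a(k) = 1 is a hit, a(k) = 0 is a miss
    step = 1 - 2*a(1:end-1);                 % one step down after a hit, up after a miss
    lev  = nbits + [0 cumsum(step)];         % stimulus level before each trial
    for k = 1:nbits                          % accumulate counts (all sequences equally likely)
        corr(lev(k)) = corr(lev(k)) + a(k);
        tot(lev(k))  = tot(lev(k))  + 1;
    end
end
pf = corr ./ max(tot, 1);                    % pooled PF: 0.5 at every visited level, as in Figure 5a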

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

[Figure 5 panels (a)-(f): probability correct (and trial frequency) versus stimulus levels (0-20), for the no-shift (left column) and shift (right column) conditions; rows show no data manipulation, the monotonized psychometric function, and the extrapolated psychometric function.]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)] / [1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) - z(0). With this method the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections); and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log-log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull β parameter, and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^b. The optimum variance of 1.64/(N b²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(N b²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox, whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up-down (Brownian) staircase, with threshold estimated by averaging levels, was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about the fourth reversal, are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman-Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficiency estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log-Log d′ Slope (Code for Generating Figure 1)

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)

Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias (Monotonize, Extrapolate, Averaging, and Main Program scripts)

Note on the Monotonize script of Matlab Code 4: The denominator used in calculating probabilities is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note on the Main Program of Matlab Code 4: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


50%, 87.78%, and 99%, corresponding to z scores of z(i) = -2.328, -1.164, 0, 1.164, and 2.328. Nonmonotonicity is eliminated as Miller and Ulrich suggest (see the Appendix, Matlab Code 4, for monotonizing). I did Monte Carlo simulations to obtain the standard deviation of the location of the PF, μ1, given by Equation 38. I found that both the SK method and the parametric probit method (assuming a fixed unity slope) gave a standard deviation of between 0.28 and 0.29. Miller and Ulrich's result, reported in their Table 17, gives a standard error of 0.071. However, their Gaussian had a standard deviation of 0.25, whereas my z-score units correspond to a standard deviation of unity. Thus their standard error must be multiplied by 4, giving a standard error of 0.284, in excellent agreement with my value. So, at least in this instance, the SK method did not do better than the parametric method (with slope fixed), and the surprise mentioned at the beginning of this paragraph did not materialize. I did not examine other examples where nonparametric methods might do better than the matched parametric method. However, the simulations of McKee et al. (1985) give some insight into why probit analysis with floating slope can do poorly, even on data generated from probit PFs. McKee et al. show that when the number of trials is low, and if stimulus levels near both 0% and 100% are not tested, the slope can be very flat and the predicted threshold can be quite distant. If care is not taken to place a bound on these probit estimates, one would expect to find deviant threshold estimates. The SK method, on the other hand, has built-in bounds for threshold estimates, so outliers are avoided. These bounds could explain why SK can do better than probit. When the PF slope is uncertain, nonparametric methods might work better than parametric methods, a decided advantage for the SK method.
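The SK location estimate itself is simple enough to sketch. The following is a minimal version written from the description of the Spearman-Kärber idea in this issue, not Miller and Ulrich's implementation; extending the PF to 0 and 1 one step beyond the tested range is a simplifying assumption.

% Spearman-Karber style location estimate (sketch)
x  = [-2.328 -1.164 0 1.164 2.328];          % stimulus levels (z-score units)
P  = [.01 .1222 .50 .8778 .99];              % monotonized psychometric function (0 to 1)
Pext = [0 P 1];                              % extend so the pdf sums to 1
xext = [x(1)-1.164, x, x(end)+1.164];        % extended levels (equal spacing assumed)
pdf  = diff(Pext);                           % probability mass between adjacent levels
xmid = (xext(1:end-1) + xext(2:end)) / 2;    % midpoints carrying that mass
PSE  = sum(xmid .* pdf)                      % = 0 for this symmetric example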

It is instructive to also do an analytic calculation of the variance, based on an analysis similar to what was done earlier. If only the ith level of the five had been tested, the variance of the PF location (the PSE in a discrimination task) would have been as follows (Gourevitch & Galanter, 1967):

\[ \mathrm{var}(\mathrm{PSE}_i) = \frac{\mathrm{var}(P_i)}{\left( dP_i / dx_i \right)^2}, \tag{39} \]

where var(P_i) is given by Equation 27 and the PF slope is

\[ \frac{dP_i}{dx_i} = \frac{dP_i}{dz_i} \times \frac{dz_i}{dx_i}, \tag{40} \]

where, for the cumulative normal, dz_i/dx_i = 1/σ, where σ is the standard deviation of the pdf, and

\[ \frac{dP_i}{dz_i} = \frac{\exp\!\left( -z_i^2 / 2 \right)}{\sqrt{2\pi}} \tag{41} \]

from Equation 3A. By combining these equations, one gets

\[ \mathrm{var}(\mathrm{PSE}_i) = \frac{2\pi\sigma^2 P_i \left( 1 - P_i \right) \exp\!\left( z_i^2 \right)}{N_i}. \tag{42} \]

Note that if all trials were placed at the optimal stimulus location of z = 0 (P = 50%), Equation 42 would become σ²π/(2N_i), as discussed earlier. When all five levels are combined, the total variance is smaller than each individual variance, as given by the variance summation formula

\[ \frac{1}{\mathrm{var}_{\mathrm{tot}}} = \sum_i \frac{1}{\mathrm{var}_i}. \tag{43} \]

Finally, upon plugging in the five values of z_i, ranging from -2.328 to +2.328, and P_i, ranging from 1% to 99%, the standard error of the PF location is

\[ \mathrm{SE} = \sqrt{\mathrm{var}_{\mathrm{tot}}} = 0.284. \tag{44} \]

This value is precisely the value obtained in our Monte Carlo simulations of the nonparametric SK method, and also by the slope-fixed parametric probit analysis. This triple confirmation shows that the SK method is excellent in the realm where it applies (equal number of trials at levels that span 0% to 100%). It also confirms our simple analytic formula.

The data presented in Miller and Ulrich's Table 17 give us a fine opportunity to ask about the efficiency of the method of constant stimuli that they use. For the N = 40 example discussed here, the variance is

\[ \mathrm{var}_{\mathrm{tot}} = \mathrm{SE}_{\mathrm{tot}}^2 = \frac{3.24}{N}. \tag{45} \]

This value is more than twice the variance of π/(2N_tot) that is obtained for the same PF using adaptive methods, including the simple up-down staircase, that place trials near the optimal point.
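As a numerical check on Equations 42-44 (a sketch of mine; splitting the N = 40 trials into 8 per level over the five equally weighted levels is an assumption), the 0.284 standard error can be reproduced in a few lines:

% Analytic PSE variance for the five-level example (Equations 42-44)
zi   = [-2.328 -1.164 0 1.164 2.328];     % z score of each level
Pi   = [.01 .1222 .50 .8778 .99];         % probability correct at each level
Ni   = 8;                                 % trials per level (40 total over 5 levels, assumed)
sig  = 1;                                 % z-score units, so sigma = 1
vari = 2*pi*sig^2 .* Pi .* (1-Pi) .* exp(zi.^2) ./ Ni;   % Equation 42, one variance per level
varTot = 1 / sum(1 ./ vari);              % Equation 43
SE = sqrt(varTot)                         % Equation 44: about 0.284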

The large variance for the Miller and Ulrich stimulus placement is not surprising, given that of the five stimulus levels, the two at P = 1% and 99% are quite inefficient. The inefficient placement of trials is a common problem for the method of constant stimuli, as opposed to adaptive methods. However, as I mentioned earlier, the method of constant stimuli combined with multiple response ratings in a signal detection framework may be the best of all methods for detection studies, for revealing properties of the system near threshold while maintaining a high efficiency.

A curious question comes up about the optimal placement of trials for measuring the location of the psychometric function. With the original function, the optimal placement is at the steepest part of the function (at P = 50%). However, with the SK method, when one is determining the location of the pdf, one might think that the optimal placement of samples should occur at the points where the pdf is steepest. This contradiction is resolved when one realizes that only the original samples are independent. The pdf samples are obtained by taking differences of adjacent PF values, so neighboring samples are anticorrelated. Thus one cannot


claim that, for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit
Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

\[ X^2 = \sum_i \frac{\left[ p(\mathrm{data}_i) - P_i \right]^2}{\mathrm{var}(P_i)}, \tag{46} \]

where p(data_i) and P_i are the percent correct of the datum and the PF at level i. The binomial var(P_i) is our old friend

\[ \mathrm{var}(P_i) = \frac{P_i \left( 1 - P_i \right)}{N_i}, \tag{47} \]

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

\[ X^2 = \sum_i \mathrm{residual}_i^2, \tag{48} \]

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^{0.5}:

\[ \mathrm{residual}_i = \frac{p(\mathrm{data}_i) - P_i}{\mathrm{SE}_i}. \tag{49} \]

The second method is to use the normalized log likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

\[ LL = 2 \sum_i N_i \left[ p(\mathrm{data}_i) \ln\!\frac{p(\mathrm{data}_i)}{P_i} + \left( 1 - p(\mathrm{data}_i) \right) \ln\!\frac{1 - p(\mathrm{data}_i)}{1 - P_i} \right]. \tag{50} \]

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

\[ \mathrm{prob\_chisq} = \mathrm{Gamma}\!\left( \frac{df}{2}, \frac{\chi^2}{2} \right), \tag{51} \]

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that, for a reasonably large number of degrees of freedom (like df > 10), the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^{0.5}. As an example, for df = 18 we have Gamma(9, 9) = 0.54, which is very close to the asymptotic value of 0.5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = 0.845, which is indeed close to the value of 0.841 expected for a unity z-score.

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [0.52, 0.85]; in Figure 8a, with a strong downward bias, the interval is [0.72, 0.99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X² and likelihood for different numbers of trials and different probabilities for each level.
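The two statistics themselves are easy to compute for any data set. The following sketch (simpler than the enumeration program described next, and not that program) evaluates Equations 46 and 50 for observed counts O_i out of N_i trials against predicted probabilities P_i:

% X^2 (Equation 46) and log likelihood deviance LL (Equation 50) for binomial data
Ni = [10 10 10 10];                 % trials at each level
Oi = [ 6  7  9 10];                 % number correct at each level
Pi = [.55 .75 .90 .97];             % predicted probability correct at each level
pd = Oi ./ Ni;                      % observed proportion correct, p(data_i)

X2 = sum( (pd - Pi).^2 ./ (Pi.*(1-Pi)./Ni) );               % Equation 46

% Equation 50; the eps terms keep 0*log(0) from producing NaN
LL = 2 * sum( Ni .* ( pd    .* log((pd+eps)   ./ Pi) + ...
                     (1-pd) .* log((1-pd+eps) ./ (1-Pi)) ) );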

Matlab Code 2 in the Appendix is the program that calculates LL and X² for a PF ranging from P = 50% to 100%, and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi square from one level in the sum of Equation 46:

\[ X_i^2 = \frac{\left[ p(\mathrm{data}_i) - P_i \right]^2 N_i}{P_i \left( 1 - P_i \right)} = \frac{\left( O_i - E_i \right)^2}{E_i \left( 1 - P_i \right)}, \tag{52} \]

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i² is obtained by doing a weighted sum over all possible


outcomes for O_i. The weighting wt_i is the expected probability from binomial statistics:

\[ \mathrm{wt}_i = \frac{N_i!}{O_i!\,\left( N_i - O_i \right)!}\, P_i^{O_i} \left( 1 - P_i \right)^{N_i - O_i}. \tag{53} \]

The expected value ⟨X_i²⟩ is

\[ \left\langle X_i^2 \right\rangle = \sum_{O_i} \mathrm{wt}_i \, \frac{\left( O_i - E_i \right)^2}{E_i \left( 1 - P_i \right)}. \tag{54} \]

A bit of algebra, based on Σ_{O_i} wt_i = 1, gives a surprising result:

\[ \left\langle X_i^2 \right\rangle = 1. \tag{55} \]

That is, each term of the X² sum has an expectation value of unity, independent of N_i and independent of P. Having each deviant contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on the average, to X². It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

\[ \left\langle LL_i \right\rangle = 2 \sum_{O_i} \mathrm{wt}_i \left[ O_i \ln\!\frac{O_i}{E_i} + \left( N_i - O_i \right) \ln\!\frac{N_i - O_i}{N_i - E_i} \right]. \tag{56} \]

Unfortunately, there is no simple way to do the summations as was done for X², so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i, near 0.5, the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term, and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because log(1/1) = 0. Equation 56 becomes

\[ \left\langle LL_i \right\rangle = 2 \left[ 0.25 \times 2 \ln(2) + 0.25 \times 2 \ln(2) \right] = 2 \ln(2) = 1.386. \tag{57} \]

Figure 4 shows that this is precisely the contribution of each term at P_i = 0.5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero, because of the 1 - P_i factor in Equation 53, except for O_i = N_i and E_i = N_i. Equation 56 vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about 85% the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.
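These expectations are easy to verify by direct enumeration. The sketch below is in the spirit of Matlab Code 2 but is not that program; it works through the N_i = 2, P_i = 0.5 case discussed above:

% Enumerate all outcomes O = 0..N for one level and form the weighted means
% of the X^2 and deviance contributions (Equations 52-56)
N = 2;  P = 0.5;  E = N*P;
O  = 0:N;
wt = arrayfun(@(o) nchoosek(N,o), O) .* P.^O .* (1-P).^(N-O);   % Equation 53

X2i = (O - E).^2 ./ (E*(1-P));                                   % Equation 52
% deviance terms, with the convention 0*log(0) = 0
t1  = O     .* log(max(O,   eps) ./ E);
t2  = (N-O) .* log(max(N-O, eps) ./ (N-E));
LLi = 2*(t1 + t2);

meanX2 = sum(wt .* X2i)     % = 1                     (Equation 55)
meanLL = sum(wt .* LLi)     % = 2*log(2) = 1.386      (Equation 57)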

I have been wondering about the relative merits of X² and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X², but statisticians seem to prefer likelihood. I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a

[Figure 4: contribution to log likelihood deviance (D) at each level versus psychometric function level (p), with curves for N = 1, 2, 4, 8, 16 and the X² value for all N.]

Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X² or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X² metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53-54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1-16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit, it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39-43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom), if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of

that run It would be nice if outliers were strongly corre-lated with high threshold variance I would not be sur-prised if further research showed that because the fittinginvolves nonlinear regression the optimal weightingmight be different from simply using the inverse varianceas the weighting function
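As a concrete illustration of the weighting idea, here is a minimal Matlab sketch of my own (not code from the Appendix) that averages several threshold estimates by their inverse variances; the numbers and variable names are hypothetical, and the variances could come from any of the three methods above.

    % Hypothetical numbers: thresholds from three runs and their variance estimates
    thresh = [1.02 0.95 1.10];                 % threshold estimates (e.g., log contrast)
    varTh  = [0.004 0.009 0.006];              % per-run threshold variances (e.g., from bootstrap)
    w      = 1 ./ varTh;                       % inverse-variance weights
    threshAve = sum(w .* thresh) / sum(w);     % weighted mean threshold
    seAve     = sqrt(1 / sum(w));              % standard error of the combined estimate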

Wichmann and Hill, and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = 0.01% (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used (probit [cumulative normal], logit, and Weibull), corresponding to the rows of Table 2, and two error metrics were used (chi-square minimization and likelihood maximization), corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                      λ = 0.01        λ = 0.0001      λ = 0.01        λ = 0.0001
  Psychometric        No Lapse        No Lapse        Lapse           Lapse
  Function            L       χ²      L       χ²      L       χ²      L       χ²
  Probit              1.05    1.05    1.00    0.99    1.05    1.05    0.36    0.34
  Logit               1.76    1.76    1.66    1.66    1.76    1.76    0.84    0.72
  Weibull              1.01    1.01    0.97    0.97    1.01    1.01    0.26    0.26

Note: The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values of each pair correspond to likelihood maximization (L) and χ² minimization, respectively.

In addition, two lapse parameters were used in the fit: λ = 0.01 (first and third pairs of data columns) and λ = 0.0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

    P_probit(z) = [1 + erf(z/sqrt(2))]/2,      (58A)

    P_logit(z) = 1/[1 + exp(-z)],              (58B)

    P_Weibull(z) = 1 - exp[-exp(z)],           (58C)

with z = p2(y - p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed the following: (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to λ = 0.01, the PF slope did not change when there was a lapse (49 out of 50). (3) When λ = 0.0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = 0.0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
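To make the role of the fixed lapse parameter concrete, the following minimal Matlab sketch (my own simplification, not the Appendix's Matlab Code 3; the logit shape is chosen for brevity) shows how a symmetric lapse enters the fitted probability and how the chi-square error for one candidate (p1, p2) would be computed.

    function sse = pfFitError(params, lapse)
    % Hypothetical helper: params(1) = threshold p1, params(2) = slope p2; lapse is fixed.
    x   = [0 1 6];   N = [50 50 50];   Obs = [25 42 49];   % the Table 2 data with one lapse
    z   = params(2) * (x - params(1));                     % z = p2*(y - p1)
    P   = 1 ./ (1 + exp(-z));                              % logit shape (Equation 58B)
    P   = lapse + (1 - 2*lapse) * P;                       % symmetric lapse: gamma = lambda
    Exp = N .* P;                                          % expected number correct
    sse = sum((Obs - Exp).^2 ./ (Exp .* (N - Exp) ./ N + eps));   % X² with binomial variance
    end
    % A search from one starting guess:  fminsearch(@(p) pfFitError(p, 0.01), [1 0.3])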

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
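A simple guard against such local minima is to repeat the search from a grid of starting slopes and keep the best result; this minimal sketch of mine (using the hypothetical pfFitError above) is one way to do it.

    bestSse = inf;
    for slope0 = 0.1:0.2:2.1                          % grid of initial slope guesses
        [fitP, sse] = fminsearch(@(p) pfFitError(p, 0.0001), [1 slope0]);
        if sse < bestSse                              % keep the fit with the lowest chi-square
            bestSse = sse;   bestParams = fitP;
        end
    end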

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.
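As an illustration of the first approach, a prior on λ can be implemented simply by adding a penalty term to the negative log likelihood. The sketch below is a generic penalized fit of my own, not the specific priors used by Treutwein and Strasburger (1999) or Wichmann and Hill (2001a); the range [0, 0.06] for the lapse rate is an assumed choice.

    function cost = penalizedNegLogLik(params, x, N, Obs)
    % params = [threshold, slope, lapse]; x, N, Obs are the levels, trials, and number correct.
    lapse = params(3);
    z     = params(2) * (x - params(1));
    P     = lapse + (1 - 2*lapse) ./ (1 + exp(-z));         % logit with symmetric lapse
    P     = min(max(P, eps), 1 - eps);                      % keep the log arguments legal
    negLL = -sum(Obs .* log(P) + (N - Obs) .* log(1 - P));  % binomial negative log likelihood
    cost  = negLL + 1e6 * (lapse < 0 || lapse > 0.06);      % crude penalty acts as the prior
    end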

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope β of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.01% (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials, the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) = 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
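The arithmetic of that variance estimate is easy to reproduce; a three-line Matlab check of my own, using the numbers just quoted, is:

    beta = 5.5;   N = 25;              % Strasburger's estimated slope and trials per run
    varT = 1.75 / (N * beta^2);        % asymptotic threshold variance, about 0.0023
    seT  = sqrt(varT);                 % standard error, about 0.048, i.e., about 5%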

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.
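The steepening can be seen numerically in a small Matlab sketch of my own: perturb the middle point of Miller and Ulrich's PF up or down, re-estimate threshold by linear interpolation at 50%, realign, and average the two realigned PFs. The common grid and interpolation scheme are my choices; the PF values come from the example just given.

    levels = 1:5;
    Ptrue  = [1 12.22 50 87.78 99]/100;            % Miller and Ulrich's five-point PF
    Pup    = Ptrue;   Pup(3)   = 0.70;             % middle point perturbed up by binomial noise
    Pdown  = Ptrue;   Pdown(3) = 0.30;             % middle point perturbed down
    xq     = -1.5:0.1:1.5;                         % common axis centered on each run's threshold
    alignPF = @(P) interp1(levels - interp1(P, levels, 0.5), P, xq, 'linear', NaN);
    Pave   = (alignPF(Pup) + alignPF(Pdown)) / 2;  % average of the two realigned PFs
    slopeTrue = (Ptrue(4) - Ptrue(2)) / 2;                        % about 0.38 per level
    slopeAve  = interp1(xq, Pave, 0.5) - interp1(xq, Pave, -0.5); % about 0.52 per level, steeper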

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach, this extrapolation is necessary, since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
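For readers who want to experiment with the monotonizing step, here is a generic pool-adjacent-violators monotonizer of my own; it is not Kaernbach's routine, whose Matlab version appears as part of Matlab Code 4 in the Appendix.

    function p = monotonizePF(cor, tot)
    % cor(i) = number correct at level i; tot(i) = number of trials at level i.
    % Returns a nondecreasing proportion-correct estimate by merging violating blocks.
    val = cor ./ max(tot, eps);   wt = max(tot, eps);   % block values and weights
    idx = num2cell(1:numel(val));                       % each block starts as one level
    b = 1;
    while b < numel(val)
        if val(b+1) < val(b)                            % violation: merge blocks b and b+1
            val(b) = (wt(b)*val(b) + wt(b+1)*val(b+1)) / (wt(b) + wt(b+1));
            wt(b)  = wt(b) + wt(b+1);
            idx{b} = [idx{b} idx{b+1}];
            val(b+1) = [];   wt(b+1) = [];   idx(b+1) = [];
            b = max(b-1, 1);                            % recheck against the previous block
        else
            b = b + 1;
        end
    end
    p = zeros(size(cor));
    for b = 1:numel(val), p(idx{b}) = val(b); end       % expand block values back to levels
    end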

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to biased slope estimates. In Strasburger's case, there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.
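The shift step itself is a one-liner; a minimal sketch in my own notation (not Matlab Code 4), for a single staircase run, is:

    % levels(t) is the stimulus level tested on trial t of one run, and
    % correct(t) is 1 for a correct response, 0 otherwise (hypothetical names).
    thresholdEst  = mean(levels);             % nonparametric threshold: mean of all tested levels
    shiftedLevels = levels - thresholdEst;    % realign this run before pooling across runs
    % Strasburger-style pooling then accumulates, at each shifted level, the number correct
    % and the number of trials over all runs and takes their ratio to form the PF.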

[Figure 5 appears here: six panels, (a)–(f), plotting probability correct against stimulus level; the left column is the no-shift case and the right column the shift case, with rows for no data manipulation, the monotonized PF, and the extrapolated PF.]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)]/[1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) - z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, and adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log–log slope of the d′ function, β. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the d′ exponent and the Weibull exponent are connected by a fixed ratio of 1.06. This ratio differs from the ratios of approximately 0.8 (Pelli, 1985) and 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^β. The optimum variance of 1.64/(Nβ²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nβ²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals of initial trials are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection, because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination, because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and they prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. Y. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)

Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias (scripts for monotonizing, extrapolating, and averaging, plus the main program)

[The four program listings are not reproduced here.]

Note to the monotonizing script of Matlab Code 4: in line 6 the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note to the main program of Matlab Code 4: the main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


claim that for estimating the PSE, the optimal placement of trials is at the steep portion of the pdf.

Wichmann and Hill's Biased Goodness-of-Fit

Whenever I fit a psychometric function to data, I also calculate the goodness-of-fit. The second half of Wichmann and Hill (2001a) is devoted to that topic, and I have some comments on their findings. There are two standard methods for calculating the goodness-of-fit (Press, Teukolsky, Vetterling, & Flannery, 1992). The first method is to calculate the X² statistic, as given by

    X² = Σ_i [p(data_i) - P_i]² / var(P_i),      (46)

where p(data_i) and P_i are the percent correct of the datum and of the PF at level i. The binomial var(P_i) is our old friend

    var(P_i) = P_i (1 - P_i) / N_i,              (47)

where N_i is the number of trials at level i. The X² is of interest because the binomial distribution is asymptotically a Gaussian, so that the X² statistic asymptotically has a chi-square distribution. Alternatively, one can write X² as

    X² = Σ_i residual_i²,                        (48)

where the residuals are the difference between the predicted and observed probabilities, divided by the standard error of the predicted probabilities, SE_i = [var(P_i)]^0.5:

    residual_i = [p(data_i) - P_i] / SE_i.       (49)

The second method is to use the normalized log-likelihood, which Wichmann and Hill call the deviance, D. I will use the letters LL to remind the reader that I am dealing with log likelihood:

    LL = 2 Σ_i N_i { p(data_i) ln[p(data_i)/P_i] + [1 - p(data_i)] ln[(1 - p(data_i))/(1 - P_i)] }.      (50)

Similar to the X² statistic, LL asymptotically also has a chi-square distribution.
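A compact way to compute both statistics for a fitted PF is sketched below. This is my own helper, not Matlab Code 2 from the Appendix; p is the observed proportion correct, P the fitted probability, and N the number of trials, all per level.

    function [X2, LL] = gofStats(p, P, N)
    % X2 is Equation 46 with binomial variance; LL is the deviance of Equation 50.
    X2 = sum((p - P).^2 ./ (P .* (1 - P) ./ N));
    LL = 2 * sum(N .* (p .* log((p + eps) ./ P) + (1 - p) .* log((1 - p + eps) ./ (1 - P))));
    end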

In order to calculate the goodness-of-fit from the chi-square statistic, one usually makes the assumption that we are dealing with a linear model for P_i. A linear model means that the various parameters controlling the shape of the psychometric function (like α, β, γ in Equation 1 for the Weibull function) should contribute linearly. However, Equations 1, 8, and 12 show that only γ contributes linearly. Even though the PF function is not linear in its parameters, one typically makes the linearity assumption anyway, in order to use the chi-square goodness-of-fit tables. The purpose of this section is to examine the validity of the linearity assumption.

The chi-square goodness-of-fit distribution for a linear model is tabulated in most statistics books. It is given by the Gamma function (Press et al., 1992):

    prob_chisq = Gamma(df/2, χ²/2),              (51)

where the degrees of freedom, df, is the number of data points minus the number of parameters. The Gamma function is a standard function (called Gammainc in Matlab). In order to check that the Gamma function is functioning properly, I often check that for a reasonably large number of degrees of freedom (like df > 10) the chi-square distribution is close to Gaussian, with a mean of df and a standard deviation of (2df)^0.5. As an example, for df = 18 we have Gamma(9, 9) = 0.54, which is very close to the asymptotic value of 0.5. To check the standard deviation, we have Gamma[18/2, (18 + 6)/2] = 0.845, which is indeed close to the value of 0.841 expected for a unity z-score.
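In Matlab, that check can be done directly with the incomplete gamma function; gammainc returns the regularized lower incomplete gamma, which is the chi-square cumulative probability when its arguments are χ²/2 and df/2. (The two lines below are my own check, not part of the Appendix code.)

    df = 18;
    gammainc(df/2, df/2)                    % about 0.54, close to the asymptotic value of 0.5
    gammainc((df + sqrt(2*df))/2, df/2)     % about 0.84, one standard deviation above the mean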

With a sparse number of trials, or with probabilities near unity, or with nonlinear models (all of which occur in fitting psychometric functions), one might worry about the validity of using the chi-square distribution (from standard tables or Equation 51). Monte Carlo simulations would be more trustworthy. That is what Wichmann and Hill (2001a) use for the log likelihood deviance, LL. In Monte Carlo simulations, one replicates the experimental conditions and uses different randomly generated data sets based on binomial statistics. Wichmann and Hill do 10,000 runs for each condition. They calculate LL for each run and then plot a histogram of the Monte Carlo simulations for comparison with the expected chi-square given by Equation 50, based on the chi-square distribution.

Their most surprising finding is the strong bias displayed in their Figures 7 and 8. Their solid line is the chi-square distribution from Equation 51. Both Figures 7a and 8a are for a total of 120 trials per run, with 60 levels and 2 trials for each level. In Figure 7a, with a strong upward bias, the trials span the probability interval [0.52, 0.85]; in Figure 8a, with a strong downward bias, the interval is [0.72, 0.99]. That relatively innocuous shift from testing at lower probabilities to testing at higher probabilities made a dramatic shift in the bias of the LL distribution relative to the chi-square distribution. The shift perplexed me, so I decided to investigate both X^2 and likelihood for different numbers of trials and different probabilities at each level.

Matlab Code 2 in the Appendix is the program that calculates LL and X^2 for a PF ranging from P = .5 to 1.0 and for 1, 2, 4, 8, or 16 trials at each level. The Matlab code calculates Equations 46 and 50 by enumerating all possible outcomes, rather than by doing Monte Carlo simulations. Consider the contribution to chi-square from one level in the sum of Equation 46:

X_i^2 = [p(data_i) - P_i]^2 N_i / [P_i (1 - P_i)] = (O_i - E_i)^2 / [E_i (1 - P_i)],    (52)

where O_i = N_i p(data_i) and E_i = N_i P_i. The mean value of X_i^2 is obtained by doing a weighted sum over all possible


outcomes for O_i. The weighting, wt_{O_i}, is the expected probability from binomial statistics:

wt_{O_i} = {N_i! / [O_i! (N_i - O_i)!]} P_i^{O_i} (1 - P_i)^{N_i - O_i}.    (53)

The expected value <X_i^2> is

<X_i^2> = \sum_{O_i} wt_{O_i} (O_i - E_i)^2 / [E_i (1 - P_i)].    (54)

A bit of algebra based on \sum_{O_i} wt_{O_i} = 1 gives a surprising result:

<X_i^2> = 1.    (55)

That is, each term of the X^2 sum has an expectation value of unity, independent of N_i and independent of P_i. Having each deviate contribute unity to the sum in Equation 46 is desired, since E_i = N_i P_i is the exact value rather than a fitted value based on the sampled data. In Figure 4, the horizontal line is the contribution to chi-square for any probability level. I had wrongly thought that one needed N_i to be fairly big before each term contributed a unity amount, on average, to X^2. It happens with any N_i.

If we try the same calculation for each term of the summation in Equation 50 for LL, we have

<LL_i> = 2 \sum_{O_i} wt_{O_i} { O_i ln(O_i / E_i) + (N_i - O_i) ln[(N_i - O_i) / (N_i - E_i)] }.    (56)

Unfortunately, there is no simple way to do the summations as was done for X^2, so I resort to the program of Matlab Code 2. The output of the program is shown in Figure 4. The five curves are for N_i = 1, 2, 4, 8, and 16, as indicated. It can be seen that for low values of P_i (near .5) the contribution of each term is greater than 1, and that for high values (near 1) the contribution is less than 1. As N_i increases, the contribution of most levels gets closer to 1, as expected. There are, however, disturbing deviations from unity at high levels of P_i. An examination of the case N_i = 2 is useful, since it is the case for which Wichmann and Hill (2001a) showed strong biases in their Figures 7 and 8 (left-hand panels). This examination will show why Figure 4 has the shape shown.

I first examine the case where the levels are biased toward the lower asymptote. For N_i = 2 and P_i = 0.5, the weights in Equation 53 are 1/4, 1/2, and 1/4 for O_i = 0, 1, or 2, and E_i = N_i P_i = 1. Equation 56 simplifies, since the factors with O_i = 0 and O_i = 1 vanish from the first term and the factors with O_i = 2 and O_i = 1 vanish from the second term. The O_i = 1 terms vanish because ln(1/1) = 0. Equation 56 becomes

<LL_i> = 2[0.25(2 ln 2) + 0.25(2 ln 2)] = 2 ln 2 = 1.386.    (57)

Figure 4 shows that this is precisely the contribution of each term at P_i = .5. Thus, if the PF levels were skewed to the low side, as in Wichmann and Hill's Figure 7 (left panel), the present analysis predicts that the LL statistic would be biased to the high side. For the 60 levels in Figure 7 (left panel), their deviance statistic was biased about 20% too high, which is compatible with our calculation.
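A compact enumeration in the spirit of Matlab Code 2 reproduces both Equation 55 and Equation 57 for this case; N and P below can be changed to any level of interest.

N = 2; P = .5; E = N*P;
O  = 0:N;                                                   % all possible numbers correct
wt = arrayfun(@(o) nchoosek(N,o), O).*P.^O.*(1-P).^(N-O);   % Equation 53
X2i = sum(wt.*(O-E).^2/(E*(1-P)))                           % Equation 54; equals 1 (Equation 55)
LLi = sum(wt.*2.*(O.*log((O+eps)/E) + ...
      (N-O).*log((N-O+eps)/(N-E))))                         % Equation 56; equals 2*log(2) = 1.386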

Now I consider the case where levels are biased near the upper asymptote. For P_i = 1 and any value of N_i, the weights are zero because of the 1 - P_i factor in Equation 53, except for O_i = N_i and E_i = N_i; Equation 56 then vanishes, either because of the weight or because of the log term, in agreement with Figure 4. Figure 4 shows that for levels of P_i above about .85 the contribution to chi-square is less than unity. Thus, if the PF levels were skewed to the high side, as in Wichmann and Hill's Figure 8 (left panel), we would expect the LL statistic to be biased below unity, as they found.

I have been wondering about the relative merits of X^2 and log likelihood for many years. I have always appreciated the simplicity and intuitive nature of X^2, but statisticians seem to prefer likelihood.



Figure 4. The bias in goodness-of-fit for each level of 2AFC data. The abscissa is the probability correct at each level, P_i. The ordinate is the contribution to the goodness-of-fit metric (X^2 or log likelihood deviance). In linear regression with Gaussian noise, each term of the chi-square summation is expected to give a unity contribution if the true rather than sampled parameters are used in the fit, so that the expected value of chi-square equals the number of data points. With binomial noise, the horizontal dashed line indicates that the X^2 metric (Equation 52) contributes exactly unity, independent of the probability and independent of the number of trials at each level. This contribution was calculated by a weighted sum (Equations 53-54) over all possible outcomes of data, rather than by doing Monte Carlo simulations. The program that generated the data is included in the Appendix as Matlab Code 2. The contribution to log likelihood deviance (specified by Equation 56) is shown by the five curves labeled 1-16. Each curve is for a specific number of trials at each level. For example, for the curve labeled 2, with 2 trials at each level tested, the contribution to deviance was greater than unity for probability levels less than about 85% correct and was less than unity for higher probabilities. This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a, Figures 7a and 8a).


I do not doubt the arguments of King-Smith et al. (1994) and Kontsevich and Tyler (1999), who advocate using mean likelihood in a Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit it is not clear that the log likelihood method is better than X^2. The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X^2 statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X^2. First, they note that the maximum likelihood parameters will not minimize X^2. This is not a bothersome objection, since if one does a goodness-of-fit with X^2, one would also do the parameter search by minimizing X^2 rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X^2, can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use chi-square for embedded models (Press et al., 1992), and I cannot see that X^2 should be any different.
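For concreteness, the standard embedded-model recipe can be sketched in a few lines of Matlab; the two deviance values below are hypothetical placeholders, and the same arithmetic could be applied to X^2 values.

D_reduced = 25.0;                   % deviance with the extra parameter fixed (hypothetical)
D_full    = 20.3;                   % deviance with the extra parameter free (hypothetical)
dD = D_reduced - D_full;            % improvement from freeing one parameter
p  = 1 - gammainc(dD/2, 1/2)        % upper tail of chi-square with 1 df; small p favors the added parameter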

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse-variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39-43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method 1 by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
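A minimal sketch of the inverse-variance weighting just described, with the variance of method 1 inflated by the reduced chi-square as in method 2; the thresholds, variances, and reduced chi-squares below are hypothetical.

thresh = [1.05 0.92 1.20];        % threshold estimates from three runs (hypothetical)
varTh  = [.004 .006 .005];        % asymptotic variances, e.g. from Equations 39-43 (hypothetical)
redChi = [0.8  1.0  2.5];         % reduced chi-square of each run (hypothetical)
varTh  = varTh.*max(redChi,1);    % method 2: scale up the variance of poor-fitting runs
w      = 1./varTh;
threshAvg = sum(w.*thresh)/sum(w) % inverse-variance weighted mean threshold
varAvg    = 1/sum(w)              % variance of the weighted mean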

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. They show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of lambda = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways. Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood maximization, corresponding to the paired entries in the table.

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                  lambda = .01    lambda = .0001   lambda = .01    lambda = .0001
Psychometric        No Lapse        No Lapse          Lapse            Lapse
Function           L     chi2      L     chi2       L     chi2      L     chi2
Probit            1.05   1.05     1.00   0.99      1.05   1.05     0.36   0.34
Logit             1.76   1.76     1.66   1.66      1.76   1.76     0.84   0.72
Weibull           1.01   1.01     0.97   0.97      1.01   1.01     0.26   0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values of each pair correspond to likelihood maximization (L) and chi-square minimization, respectively.

In addition, two lapse parameters were used in the fit: lambda = .01 (first and third pairs of data columns) and lambda = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting gamma = lambda in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

P = .5 + .5 erf(z / \sqrt{2})    (probit; Equation 3B),    (58A)
P = 1 / [1 + exp(-z)]            (logit),                  (58B)
P = 1 - exp[-exp(z)]             (Weibull; Equation 13),   (58C)

with z = p2(y - p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that: (1) When there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter. (2) When the lapse parameter was set to lambda = .01, the PF slope did not change when there was a lapse (49 out of 50). (3) When lambda = .0001, the PF slope was reduced dramatically when a single lapse was present. (4) For all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with lambda = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used lambda = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square), because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of lambda (a minimal sketch of that idea follows this paragraph). A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than on the data; adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF, with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.
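The sketch below illustrates the first approach only in spirit: threshold, slope, and lapse are all free, but the lapse rate is pulled toward zero by a beta(1,20) prior. The data set, the 2AFC Weibull parameterization, and the particular prior are illustrative assumptions, not the procedure of any of the papers discussed here.

x = [.5 1 2 4]; N = [40 40 40 40]; k = [22 29 36 39];   % hypothetical 2AFC data
lam = @(q) min(max(q(3),0), .1);                        % keep the lapse rate within [0, .1]
pf  = @(q) .5 + (.5 - lam(q)).*(1 - exp(-(x./exp(q(1))).^exp(q(2))));   % 2AFC Weibull with lapse
negLogPost = @(q) -( sum(k.*log(pf(q)+eps) + (N-k).*log(1-pf(q)+eps)) ...   % log likelihood
                     + 19*log(1 - lam(q)) );            % log of a beta(1,20) prior on lambda
qFit = fminsearch(negLogPost, [log(1) log(2) .02]);     % search over [log threshold, log slope, lambda]
thresh = exp(qFit(1)), slope = exp(qFit(2)), lapse = lam(qFit)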

Biased Slope in Adaptive Methods

In the previous section, I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope b of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods: I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method, using a Weibull function with beta = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled, so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of gamma = .1 (because of the 10AFC task) and a lapse rate of lambda = .0001 (a 99.99% correct upper asymptote). The PF slope was found to be beta = 5.5. Since this value of beta is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(N beta^2) = 0.058/N (Klein, 2001). For a run of 25 trials the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var), about 0.048. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of beta could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2-4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than averaging the PFs of each run, the total numbers correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias.


Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.


Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot-dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot-dashed line, with right-left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot-dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that, with no shift and no monotonizing, there is no slope bias: The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.
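The shift effect can also be seen with a few lines of Monte Carlo Matlab rather than the full enumeration of Matlab Code 4; this simplified sketch (run counts and level span are arbitrary choices) runs Brownian staircases on a flat PF, re-centers each run on its own mean level, and pools the data, whereupon the pooled PF steepens through 50%.

nRuns = 2000; nTrials = 10; span = -12:12;      % arbitrary choices for the sketch
cor = zeros(size(span)); tot = zeros(size(span));
for r = 1:nRuns
  resp  = rand(1,nTrials) < .5;                 % flat true PF: P(correct) = .5 at every level
  level = [0 cumsum(1 - 2*resp(1:end-1))];      % down 1 after a correct, up 1 after a wrong
  level = level - round(mean(level));           % the shift step: align each run on its mean level
  for t = 1:nTrials
    j = find(span == level(t));
    tot(j) = tot(j) + 1;
    cor(j) = cor(j) + resp(t);
  end
end
pooledPF = cor./max(tot,1)                      % steepens through 50% even though the true PF is flat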

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case, in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that, when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Psi method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) - P(0)] / [1 - P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter gamma. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) - z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but that objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up-down staircase with rules as simple as these: Randomly intermix blanks and


signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections) and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, beta′, to the low stimulus strength log-log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer-Foley d′ function given in Equation 20. For all values of the Weibull beta parameter and for all points on the PF, the maximum difference between the two functions is 0.0004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent beta is b/beta = 1.06. This ratio differs from b/beta of about 0.8 (Pelli, 1985) and b/beta = 0.88 (Strasburger, 2001a), because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(N b^2) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(N b^2). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variances leads to a paradox, whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up-down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about four reversals, are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels, even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman-Kärber (SK) analysis can be as good as, and often better


than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection, because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39-43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X^2 distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (lambda = .0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast, peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that, if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications: The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on a maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28(Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1
Weibull Function: Probability, d′, and Log-Log d′ Slope
Code for Generating Figure 1

gamma = .5+.5*erf(-1/sqrt(2));   % gamma (lower asymptote) for z-score = -1
beta = 2; inc = .01;             % beta is PF slope
x = inc/2:inc:3;                 % the stimulus range with linear abscissa
p = gamma+(1-gamma)*(1-exp(-x.^beta));   % Weibull function
subplot(3,2,1); plot(x,p); grid; text(1.9,.2,'(a)'); ylabel('Weibull with beta = 2')
z = sqrt(2)*erfinv(2*p-1);       % converting probability to z-score
subplot(3,2,3); plot(x,z); grid; text(1.3,.6,'(b)'); ylabel('z-score of Weibull (d''-1)')
dprime = z+1;                    % dprime = z(hit) - z(false alarm)
difd = diff(log(dprime))/inc;    % derivative of log d'
xd = inc:inc:2.9999;             % x values at midpoints of previous x values
subplot(3,2,5); plot(xd,difd.*xd); grid   % multiplication by xd is to make log abscissa
axis([0 3 .5 2]); text(.3,1.9,'(c)'); ylabel('log-log slope of d'''); xlabel('stimulus in threshold units')
y = -2:inc:1;                    % do similar plots for a log abscissa (y)
p = gamma+(1-gamma)*(1-exp(-exp(beta*y)));   % Weibull function as a function of y
subplot(3,2,2); plot(y,p,0,p(201),'*'); grid; text(-1.8,.92,'(d)')
z = sqrt(2)*erfinv(2*p-1);
subplot(3,2,4); plot(y,z); grid; text(-1.8,3.6,'(e)')
dprime = z+1; difd = diff(log(dprime))/inc;
subplot(3,2,6); plot(y,[difd(1) difd]); grid; xlabel('stimulus in natural log units'); text(-1.6,1.9,'(f)')

Matlab Code 2
Goodness-of-Fit: Likelihood vs. Chi Square
Relevant to Wichmann and Hill (2001a) and Figure 4

clear; clf; clear all
fac = [1 cumprod(1:20)];         % create factorial function
p = [.5:.01:.99];                % examine a wide range of probabilities
Nall = [1 2 4 8 16];             % number of trials at each level
for iN = 1:5                     % iterate over the number of trials
  N = Nall(iN); E = N*p;         % E is the expected number as a function of p
  lik = 0; X2 = 0;               % initialize the likelihood and X2
  for Obs = 0:N                  % sum over all possible correct responses
    Nm = N-Obs;                  % number of incorrect responses
    weight = fac(1+N)/fac(1+Obs)/fac(1+Nm)*p.^Obs.*(1-p).^Nm;   % binomial weight
    lik = lik - 2*weight.*(Obs*log(E/(Obs+eps))+Nm*log((N-E)/(Nm+eps)));   % Equation 56
    X2 = X2 + weight.*(Obs-E).^2./(E.*(1-p));   % calculate X2, Equation 52
  end                            % end summation loop
  LL2(iN,:) = lik; X2all(iN,:) = X2;   % store the likelihoods and chi squares
end                              % end N loop
% the rest is for plotting
plot(p,LL2,'k',p,X2all(2,:),'--k'); grid   % plot the log likelihoods and X2
for in = 1:4; text(.935,LL2(in,end-6),num2str(Nall(in))); end
text(.913,1.2,'N = 16')
text(.65,.95,'X2 for all N')
xlabel('P (predicted probability)'); ylabel('Contribution to log likelihood (D) from each level')

Matlab Code 3
Downward Slope Bias Due to Lapses
Relevant to Wichmann and Hill (2001a) and Table 2

function sse = probitML2(params, i, ilapse)   % function called by main program
stimulus = [0 1 6]; N = [50 50 50]; Obs = [25 42 50];   % psychometric function information
lapseAll = [.01 .0001 .01 .0001]; lapse = lapseAll(ilapse);   % choose the lapse rate lambda
if ilapse>2; Obs(3) = Obs(3)-1; end           % produce a lapse for last 2 columns
zz = (stimulus-params(1))*params(2);          % z-score of psychometric function
ii = mod(i,3);                                % index for selecting psychometric function
if ii==0; prob = (1+erf(zz/sqrt(2)))/2;       % probit (cumulative normal)
elseif ii==1; prob = 1./(1+exp(-zz));         % logit
else prob = 1-exp(-exp(zz));                  % Weibull
end
Exp = N.*(lapse+prob*(1-2*lapse));            % expected number correct
if i<3
  chisq = -2*(Obs.*log(Exp./Obs)+(N-Obs).*log((N-Exp+eps)./(N-Obs+eps)));   % log likelihood
else
  chisq = (Obs-Exp).^2./Exp.*N./(N-Exp+eps);  % X2; eps is smallest Matlab number
end
sse = sum(chisq);                             % sum over the three levels

% MAIN PROGRAM
clear; clf
type = ['probit maxlik '; 'logit  maxlik '; 'Weibull maxlik';
        'probit chisq  '; 'logit  chisq  '; 'Weibull chisq '];   % headings for Table 2
disp('  gamma =  .01     .0001    .01     .0001')   % type top of Table 2
disp('  lapse =  no      no       yes     yes')
for i = 0:5                                   % iterate over function type
  for ilapse = 1:4                            % iterate over lapse categories
    params = fmins('probitML2',[1 .3],[],[],i,ilapse);   % fmins finds minimum of chi-square
    slope(ilapse) = params(2);                % slope is 2nd parameter
  end
  disp([type(i+1,:) num2str(slope)])          % print result
end

1454 KLEINA

PP

EN

DIX

(Con

tinu

ed)

Mat

lab

Cod

e 4

Bro

wni

an S

tair

case

Enu

mer

atio

ns fo

r Sl

ope

Bia

s

Mon

oton

y

flag

5 1

ii5

0

in

itia

lize

fl

ag5

1 m

eans

non

mon

oton

icit

y is

det

ecte

dw

hile

fla

g5

5 1

keep

goi

ng if

ther

e is

non

mon

oton

ic d

ata

flag

5 0

rese

t mon

oton

ic f

lag

ii5

ii1

1

incr

ease

the

nonm

onot

onic

spr

ead

for

i25

ii1

1nt

ot

lo

op o

ver

all P

F le

vels

look

ing

for

nonm

onot

onic

ity

prob

5 c

or(

tot1

eps)

calc

ulat

e pr

obab

ilit

ies

if (

prob

(i2

ii)gt

prob

(i2)

) amp

(to

t(i2

)gt0)

chec

k fo

r no

nmon

oton

icit

yco

r(i2

iii

2)5

mea

n(co

r(i2

iii

2))

ones

(1ii

11)

repl

ace

nonm

onot

onic

cor

rect

by

aver

age

tot(

i2ii

i2)

5 m

ean(

tot(

i2ii

i2)

)on

es(1

ii1

1)

re

plac

e no

nmon

oton

ic to

tal b

y av

erag

efl

ag5

1

se

t fla

g th

at n

onm

onot

onic

ity

is f

ound

end

end

end

Not

e th

at in

line

6 th

e de

nom

inat

or is

tot1

eps

whe

re e

ps is

the

smal

lest

flo

atin

g po

int n

umbe

r th

at M

atla

b ca

n us

e B

ecau

se o

f th

is c

hoic

e if

ther

e ar

e ze

ro tr

ials

at a

leve

l th

e as

-so

ciat

ed p

roba

bili

ty w

ill b

e ze

ro

Ext

rapo

late

ii5

1 w

hile

tot(

ii)

55

0i

i5 ii

11

end

co

unt t

he lo

w le

vels

wit

h no

tri

als

if ii

gt1

sk

ip lo

wes

t lev

elto

t(1

ii1)

5 o

nes(

1ii

1)

0000

1

fill

in w

ith

very

low

num

ber

of t

rial

sco

r(1

ii1)

5 p

rob(

ii)

tot(

1ii

1)

fi

ll in

wit

h pr

obab

ilit

y of

low

est t

este

d le

vel

end

ii5

nbi

ts2

whi

le to

t(ii

)5

5 0

ii5

ii1

end

do

the

sam

e ex

trap

olat

ion

at h

igh

end

if ii

ltnb

its2

to

t(ii

11

nbit

s2)

5 o

nes(

1nb

its2

ii)

000

01

cor(

ii1

1nb

its2

)5

pro

b(ii

)to

t(ii

11

nbit

s2)

end

Ave

ragi

ng

prob

15

cor

(t

ot1

eps)

calc

ulat

e P

Fpr

obA

ll(i

opt

)5

pro

bAll

(iop

t)1

prob

1

sum

the

PF

s to

be

aver

aged

prob

Tot

All

(iop

t)

5 p

robT

otA

ll(i

opt

)1(t

otgt

0)

co

unt t

he n

umbe

r of

sta

irca

ses

wit

h a

give

n le

vel

corA

ll(i

opt

)5

cor

All

(iop

t)1

cor

sum

the

num

ber

corr

ect o

ver

all s

tair

case

sto

tAll

(iop

t)

5 to

tAll

(iop

t)

1to

t

sum

the

tota

l num

ber

of tr

ials

ove

r al

l sta

irca

ses

Mai

n P

rogr

am

nbit

s5

10

n25

2^n

bits

1nb

its2

5 2

nbi

ts1

nb

its

is n

umbe

r of

sta

irca

se s

teps

zero

35

zer

os(3

nbi

ts2)

the

thre

e ro

ws

are

for

the

thre

e io

pt v

alue

sfo

r is

hift

5 0

1

ishi

ft5

0 m

eans

no

shif

t (1

mea

ns s

hift

)to

tAll

5 z

ero3

cor

All

5 z

ero3

pro

bAll

5 z

ero3

pro

bTot

All

5 z

ero3

init

iali

ze a

rray

s

MEASURING THE PSYCHOMETRIC FUNCTION 1455fo

r i5

0n

2

loop

ove

r al

l pos

sibl

e st

airc

ases

a5

bit

get(

i[1

nbit

s])

a(

k )5

1 is

hit

a(k

)5

0 is

mis

sst

ep5

12

a(1

end

1)

1

step

dow

n fo

r hi

t 1

step

up

for

hit

aCum

5 [

0 cu

msu

m(s

tep)

]

accu

mul

ate

step

s to

get

sti

mul

us le

vel

if is

hift

55

0 a

ve5

0

fo

r th

e no

-shi

ft o

ptio

nel

se a

ve5

rou

nd(m

ean(

aCum

))e

nd

for

shif

t opt

ion

ave

is th

e am

ount

of

shif

taC

um5

aC

umav

e1nb

its

do

the

shif

tco

r5

zer

os(1

nbi

ts2)

tot

5 c

or

in

itia

lize

for

eac

h st

airc

ase

for

i25

1n

bits

co

r(aC

um(i

2))

5 c

or(a

Cum

(i2)

)1a(

i2)

co

unt n

umbe

r co

rrec

t at e

ach

leve

lto

t(aC

um(i

2))

5 to

t(aC

um(i

2))1

1

coun

t num

ber

tria

ls a

t eac

h le

vel

end

iopt

5 1

sta

irav

e

tw

o ty

pes

of a

vera

ging

(io

pt is

inde

x in

scr

ipt a

bove

)io

pt5

2m

onot

oniz

e2s

tair

ave

m

onot

oniz

e P

F a

nd th

en d

o av

erag

ing

iopt

5 3

ext

rapo

late

sta

irav

e

extr

apol

ate

and

then

do

aver

agin

gen

dpr

ob5

cor

All

(t

otA

ll1

eps)

get a

vera

ge f

rom

poo

led

data

prob

ave

5 p

robA

ll

(pro

bTot

All

1ep

s)

ge

t ave

rage

fro

m in

divi

dual

PF

sx

5 [

1nb

its2

]

x ax

is f

or p

lott

ing

subp

lot(

32

11is

hift

)

plot

the

top

pair

of

pane

lspl

ot(x

pro

b(1

)x

totA

ll(1

)

max

(tot

All

(1

))rsquo

rsquox

prob

ave(

1)

rsquorsquo)

grid

if is

hift

55

0ti

tle(

rsquono

shif

trsquo)y

labe

l(rsquon

o da

ta m

anip

ulat

ionrsquo

)te

xt(

59

5rsquo(a

)rsquo)el

se ti

tle(

rsquoshi

ftrsquo)

text

(5

95

rsquo(d)rsquo)

end

subp

lot(

32

31is

hift

)pl

ot(x

pro

b(2

)x

pro

bave

(2)

rsquorsquo)

grid

plot

mid

dle

pair

of

pane

lsif

ishi

ft5

5 0

yla

bel(

rsquomon

oton

ize

psyc

hom

etri

c fn

rsquo)te

xt(

56

3rsquo(b

)rsquo)

else

text

(5

95

rsquo(e)rsquo)

end

subp

lot(

32

51is

hift

)

plot

bot

tom

pai

r of

pan

els

plot

(xp

rob(

3)

xp

roba

ve(3

)rsquo

rsquo[nb

its

nbit

s][

45

55]

)gr

idtt

5 rsquoc

frsquot

ext(

59

5[rsquo(

rsquo tt(

ishi

ft1

1) rsquo)

rsquo])

if is

hift

55

0 y

labe

l(rsquoe

xtra

pola

te p

sych

omet

ric

fnrsquo)

end

xlab

el(rsquos

tim

ulus

leve

lsrsquo)

end

en

d lo

op o

f w

heth

er to

shi

ft o

r no

t

Not

emdashT

he m

ain

prog

ram

has

one

tric

ky s

tep

that

req

uire

s kn

owle

dge

of M

atla

b a

5 b

itge

t (i

[1n

bits

])

whe

rei i

s an

inte

ger

from

0 to

2nb

its

1 a

nd n

bits

5 1

0 in

the

pres

ent p

ro-

gram

The

bit

get c

omm

and

conv

erts

the

inte

ger

i to

a ve

ctor

of

bits

For

exa

mpl

e f

or i

5 1

1 th

e ou

tput

of

the

bitg

et c

omm

and

is 1

1 0

1 0

0 0

0 0

0 T

he n

umbe

ri 5

11

(bin

ary

is10

11)

corr

espo

nds

to th

e 10

tria

l sta

irca

se C

C I

C I

I I

I I

I w

here

C m

eans

cor

rect

and

I m

eans

inco

rrec

t

(Man

uscr

ipt r

ecei

ved

July

10

200

1ac

cept

ed f

or p

ubli

cati

on A

ugus

t 8 2

001

)

MEASURING THE PSYCHOMETRIC FUNCTION 1443

outcomes for Oi The weighting wti is the expected prob-ability from binomial statistics

(53)

The expected value lt X 2i gt is

(54)

A bit of algebra based on aringOiwti 5 1 gives a surprising

result

(55)

That is each term of the X 2 sum has an expectation valueof unity independent of Ni and independent of P Havingeach deviant contribute unity to the sum in Equation 46is desired since Ei 5NiPi is the exact value rather than afitted value based on the sampled data In Figure 4 thehorizontal line is the contribution to chi square for anyprobability level I had wrongly thought that one neededNi to be fairly big before each term contributed a unityamount on the average to X 2 It happens with any Ni

If we try the same calculation for each term of thesummation in Equation 50 for LL we have

(56)

Unfortunately there is no simple way to do the summationsas was done for X 2 so I resort to the program of MatlabCode 2 The output of the program is shown in Figure 4The five curves are for Ni 5 1 2 4 8 and 16 as indicatedIt can be seen that for low values of Pi near 5 the contri-bution of each term is greater than 1 and that for high val-ues (near 1) the contribution is less than 1 As Ni increasesthe contribution of most levels gets closer to 1 as expectedThere are however disturbing deviations from unity at highlevels of Pi An examination of the case Ni 5 2 is usefulsince it is the case for which Wichmann and Hill (2001a)showed strong biases in their Figures 7 and 8 (left-handpanels) This examination will show why Figure 4 has theshape shown

I first examine the case where the levels are biased to-ward the lower asymptote For Ni 5 2 and Pi 5 05 theweights in Equation 53 are 14 12 and 14 for Oi 5 0 1or 2 and Ei 5 Ni Pi 5 1 Equation 56 simplifies since thefactors with Oi 5 0 and Oi 5 1 vanish from the first termand the factors with Oi 5 2 and Oi 5 1 vanish from the sec-ond term The Oi 5 1 terms vanish because log(11) 5 0Equation 56 becomes

(57)

Figure 4 shows that this is precisely the contribution ofeach term at Pi 5 5 Thus if the PF levels were skewed to

the low side as in Wichmann and Hillrsquos Figure 7 (left panel)the present analysis predicts that the LL statistic wouldbe biased to the high side For the 60 levels in Figure 7(left panel) their deviance statistic was biased about 20too high which is compatible with our calculation

Now I consider the case where levels are biased near theupper asymptote For Pi 5 1 and any value of Ni theweights are zero because of the 1 Pi factor in Equa-tion 53 except for Oi 5 Ni and Ei 5 Ni Equation 56 van-ishes either because of the weight or because of the logterm in agreement with Figure 4 Figure 4 shows that forlevels of Pi 85 the contribution to chi-square is less thanunity Thus if the PF levels were skewed to the high sideas in Wichmann and Hillrsquos Figure 8 (left panel) we wouldexpect the LL statistic to be biased below unity as theyfound

I have been wondering about the relative merits of X 2

and log likelihood for many years I have always appreci-ated the simplicity and intuitive nature of X 2 but statisti-cians seem to prefer likelihood I do not doubt the argu-ments of King-Smith et al (1994) and Kontsevich andTyler (1999) who advocate using mean likelihood in a

lt gt = +[ ]= =

LLi 2 0 25 2 2 2 2 2

2 2 1 386

ln( ) ln( )

ln( )

lt gt =aeligegraveccedil

oumloslashdivide

+ ( ) aeligegraveccedil

oumloslashdivide

eacute

eumlecircaringLLi O

Oi

i

ii i

i i

i i

wt OO

EN O

N O

N Ei

i

2 ln ln

lt gt =X i2 1

lt gt =( )

( )aringX wtO E

E Pi iO

i i

i ii

2

2

1

wtN

O N OP Pi

i

i i i

iO

i

N Oi i i=

( )times ( )

1

5 6 7 8 9 10

02

04

06

08

1

12

14

1

2

4

8

N = 16

X for all N

Psychometric function level (p )

Log

likel

ihoo

d (D

) at

eac

h le

vel

2

i

Figure 4 The bias in goodness-of-fit for each level of 2AFCdata The abscissa is the probability correct at each level Pi Theordinate is the contribution to the goodness-of-fit metric (X 2 orlog likelihood deviance) In linear regression with Gaussiannoise each term of the chi-square summation is expected to givea unity contribution if the true rather than sampled parametersare used in the fit so that the expected value of chi square equalsthe number of data points With binomial noise the horizontaldashed line indicates that the X 2 metric (Equation 52) contributesexactly unity independent of the probability and independent ofthe number of trials at each level This contribution was calcu-lated by a weighted sum (Equations 53ndash54) over all possible out-comes of data rather than by doing Monte Carlo simulationsThe program that generated the data is included in the Appen-dix as Matlab Code 2 The contribution to log likelihood deviance(specified by Equation 56) is shown by the five curves labeled1ndash16 Each curve is for a specific number of trials at each levelFor example for the curve labeled 2 with 2 trials at each leveltested the contribution to deviance was greater than unity forprobability levels less than about 85 correct and was less thanunity for higher probabilities This finding explains the goodness-of-fit bias found by Wichmann and Hill (2001a Figures 7a and 8a)

1444 KLEIN

Bayesian framework for adaptive methods in order to mea-sure the PF However for goodness-of-fit it is not clearthat the log likelihood method is better than X2 The com-mon argument in favor of likelihood is that it makes senseeven if there is only one trial at each level However myanalysis shows that the X 2 statistic is less biased than loglikelihood when there is a low number of trials per levelThis surprised me Wichmann and Hill (2001a) offer twomore arguments in favor of log likelihood over X 2 Firstthey note that the maximum likelihood parameters will notminimize X 2 This is not a bothersome objection sinceif one does a goodness-of-fit with X 2 one would also dothe parameter search by minimizing X 2 rather than max-imizing likelihood Second Wichmann and Hill claim thatlikelihood but not X 2 can be used to assess the signifi-cance of added parameters in embedded models I doubtthat claim since it is standard to use c2 for embedded mod-els (Press et al 1992) and I cannot see that X 2 shouldbe any different

Finally I will mention why goodness-of-fit considera-tions are important First if one consistently finds thatonersquos data have a poorer goodness-of-fit than do simula-tions one should carefully examine the shape of the PF fit-ting function and carefully examine the experimentalmethodology for stability of the subjectsrsquo responses Sec-ond goodness-of-fit can have a role in weighting multiplemeasurements of the same threshold If one has a reliableestimate of threshold variance multiple measurements canbe averaged by using an inverse variance weighting Thereare three ways to calculate threshold variance (1) One canuse methods that depend only on the best-fitting PF (seethe discussion of inverse of Hessian matrix in Press et al1992) The asymptotic formula such as that derived inEquations 39ndash43 provides an example for when onlythreshold is a free parameter (2) One can multiply the vari-ance of method (1) by the reduced chi-square (chi-squaredivided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington 1969 Klein 1992)(3) Probably the best approach is to use bootstrap estimatesof threshold variance based on the data from each run(Foster amp Bischof 1991 Press et al 1992) The bootstrapestimator takes the goodness-of-fit into account in esti-mating threshold variance It would be useful to have moreresearch on how well the threshold variance of a given runcorrelates with threshold accuracy (goodness-of-fit) of

that run It would be nice if outliers were strongly corre-lated with high threshold variance I would not be sur-prised if further research showed that because the fittinginvolves nonlinear regression the optimal weightingmight be different from simply using the inverse varianceas the weighting function

Wichmann and Hill and Strasburger Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is con-cerned with the effect of lapses on the slope of the PFldquoLapsesrdquo also called ldquofinger errorsrdquo are errors to highlyvisible stimuli This topic is important because it is typi-cally ignored when one is fitting PFs Wichmann and Hill(2001a) show that improper treatment of lapses in para-metric PF fitting can cause sizeable errors in the valuesof estimated parameters Wichmann and Hill (2001a) showthat these considerations are especially important for es-timation of the PF slope Slope estimation is central toStrasburgerrsquos (2001b) article on letter discrimination Stras-burgerrsquos (2001b) Figures 4 5 9 and 10 show that thereare substantial lapses even at high stimulus levels Stras-burgerrsquos (2001b) maximum likelihood fitting programused a non-zero lapse parameter of l 5 001 (H Stras-burger personal communication September 17 2001)I was worried that this value for the lapse parameter wastoo small to avoid the slope bias so I carried out a num-ber of simulations similar to those of Wichmann and Hill(2001a) but with a choice of parameters relevant toStrasburgerrsquos situation Table 2 presents the results of thisstudy

Since Strasburger used a 10AFC task with the PF rang-ing from 10 to 100 correct I decided to use discrim-ination PFs going from 0 to 100 rather than the2AFC PF of Wichmann and Hill (2001a) For simplicity Idecided to have just three levels with 50 trials per level thesame as Wichmann and Hillrsquos example in their Figure 1The PF that I used had 25 42 and 50 correct responses atlevels placed at x 5 0 1 and 6 (columns labeled ldquoNoLapserdquo) A lapse was produced by having 49 instead of50 correct at the high level (columns labeled ldquoLapserdquo)The data were fit in six different ways Three PF shapeswere used probit (cumulative normal) logit and Wei-bull corresponding to the rows of Table 2 and two errormetrics were used chi-square minimization and likelihood

Table 2The Effect of Lapses on the Slope of Three Psychometric Functions

l 5 01 l 5 0001 l 5 01 l 5 0001

Psychometric No Lapse No Lapse Lapse Lapse

Function L c2 L c2 L c2 L c2

Probit 105 105 100 099 105 105 036 034Logit 176 176 166 166 176 176 084 072Weibull 101 101 097 097 101 101 026 026

NotemdashThe columns are for different lapse rates and for the presence or absence of lapses The 12 pairs ofentries in the cells are the slopes the left and right values correspond to likelihood maximization (L) and c2

minimization respectively

maximization corresponding to the paired entries in thetable In addition two lapse parameters were used in thefit l 5 001 (first and third pairs of data columns) andl 5 00001 (second and fourth pairs of columns) For thisdiscrimination task the lapses were made symmetric bysetting g 5 l in Equation 1A so that the PF would besymmetric around the midpoint

The Matlab program that produced Table 2 is includedas Matlab Code 3 in the Appendix The details on how thefitting was done and the precise definitions of the fittingfunctions are given in the Matlab code The functionsbeing fit are

with z 5 p2(y p1) The parameters p1 and p2 repre-senting the threshold and slope of the PF are the two freeparameters in the search The resulting slope parametersare reported in Table 2 The left entry of each pair is theresult of the likelihood maximization fit and the rightentry is that of the chi-square minimization The resultsshowed that (1) When there were no lapses (50 out of 50correct at the high level) the slope did not strongly de-pend on the lapse parameter (2) When the lapse param-eter was set to l 5 1 the PF slope did not change whenthere was a lapse (49 out of 50) (3) When l 5 001the PF slope was reduced dramatically when a singlelapse was present (4) For all cases except the fourth pairof columns there was no difference in the slope estimatebetween maximizing likelihood or minimizing chi-squareas would be expected since in these cases the PF fit thedata quite well (low chi-square) In the case of a lapsewith l 5 001 (fourth pair of columns) chi-square islarge (not shown) indicating a poor fit and there is an in-dication in the logit row that chi-square is more sensitiveto the outlier than is the likelihood function In generalchi-square minimization is more sensitive to outliersthan is likelihood maximization Our results are compat-ible with those of Wichmann and Hill (2001a) Given thisdiscussion one might worry that Strasburgerrsquos estimatedslopes would have a downward bias since he used l 500001 and he had substantial lapses However his slopeswere quite high It is surprising that the effect of lapsesdid not produce lower slopes

In the process of doing the parameter searches that wentinto Table 2 I encountered the problem of ldquolocal minimardquowhich often occurs in nonlinear regression but is not al-ways discussed The PFs depend nonlinearly on the pa-rameters so both chi-square minimization and likelihoodmaximization can be troubled by this problem wherebythe best-fitting parameters at the end of a search dependon the starting point of the search When the fit is good

(a low chi-square) as is true in the first three pairs of datacolumns of Table 2 the search was robust and relativelyinsensitive to the initial choice of parameter However inthe last pair of columns the fit was poor (large chi-square)because there was a lapse but the lapse parameter wastoo small In this case the fit was quite sensitive to choiceof initial parameters so a range of initial values had to beexplored to find the global minimum As can be seen inthe Matlab Code 3 in the Appendix I set the initial choiceof slope to be 03 since an initial guess in that region gavethe overall lowest value of chi-square

Wichmann and Hillrsquos (2001a) analysis and the discus-sion in this section raise the question of how one decideson the lapse rate For an experiment with a large numberof trials (many hundreds) with many trials at high stim-ulus strengths Wichmann and Hill (2001a Figure 5) showthat the lapse parameter should be allowed to vary whileone uses a standard fitting procedure However it is rareto have sufficient data to be able to estimate slope andlapse parameters as well as threshold One good approachis that of Treutwein and Strasburger (1999) and Wichmannand Hill (2001a) who use Bayesian priors to limit therange of l A second approach is to weight the data witha combination of binomial errors based on the observedas well as the expected data (Klein 2002) and fix thelapse parameter to zero or a small number Normally theweighting is based solely on the expected analytic PFrather than the data Adding in some variance due to theobserved data permits the data with lapses to receivelesser weighting A third approach proposed by Mannyand Klein (1985) is relevant to testing infants and clin-ical populations in both of which cases the lapse ratemight be high with the stimulus levels well separatedThe goal in these clinical studies is to place bounds onthreshold estimates Manny and Klein used a step-likePF with the assumption that the slope was steep enoughthat only one datum lay on the sloping part of the func-tion In the fitting procedure the lapse rate was given bythe average of the data above the step A maximum likeli-hood procedure with threshold as the only free parameter(the lapse rate was constrained as just mentioned) wasable to place statistical bounds on the threshold estimates

Biased Slope in Adaptive MethodsIn the previous section I examined how lapses produce

a downward slope bias Now I will examine factors thatcan produce an upward slope bias Leek Hanna andMarshall (1992) carried out a large number of 2AFC3AFC and 4AFC simulations of a variety of Brownianstaircases The PF that they used to generate the data andto fit the data was the power law dcent function (dcent 5 xt

b) (Iwas pleased that they called the dcent function the ldquopsycho-metric functionrdquo) They found that the estimated slopeb of the PF was biased high The bias was larger for runswith fewer trials and for PFs with shallower slopes ormore closely spaced levels Two of the papers in this spe-cial issue were centrally involved with this topic Stras-

probit erf Equation 3B

it

Weibull Equation

P z

Pz

P z

= +aeligegraveccedil

oumloslashdivide

=+

= [ ]

( )

logexp( )

exp exp( ) ( )

5 52

11

1 13

MEASURING THE PSYCHOMETRIC FUNCTION 1445

(58A)

(58B)

(58C)

1446 KLEIN

burger (2001b) used a 10AFC task for letter discrimina-tion and found slopes that were more than double theslopes in previous studies I worried that the high slopesmight be due to a bias caused by the methodology Kaern-bach (2001b) provides a mechanism that could explainthe slope bias found in adaptive procedures Here I willsummarize my thoughts on this topic My motivationgoes well beyond the particular issue of a slope bias as-sociated with adaptive methods I see it as an excellent casestudy that provides many lessons on how subtle seem-ingly innocuous methods can produce unwanted biases

Strasburgerrsquos steep slopes for character recognitionStrasburger (2001b) used a maximum likelihood adaptiveprocedure with about 30 trials per run The task was let-ter discrimination with 10 possible peripherally viewedletters Letter contrast was varied on the basis of a like-lihood method using a Weibull function with b 5 35 toobtain the next test contrast and to obtain the thresholdat the end of each run In order to estimate the slope thedata were shifted on a log axis so that thresholds werealigned The raw data were then pooled so that a large num-ber of trials were present at a number of levels Note thatthis procedure produces an extreme concentration of tri-als at threshold The bottom panels of Strasburgerrsquos Fig-ures 4 and 5 show that there are less than half the numberof trials in the two bins adjacent to the central bin with abin separation of only 001 log units (a 23 contrastchange) A Weibull function with threshold and slope asfree parameters was fit to the pooled data using a lowerasymptote of g 5 10 (because of the 10AFC task) anda lapse rate of l 5 001 (a 99990 correct upper as-ymptote) The PF slope was found to be b 5 55 Sincethis value of b is about twice that found regularly Stras-burgerrsquos finding implies at least a four-fold variance re-duction The asymptotic (large N ) variance for the 10AFCtask is 175(Nb 2) 5 0058N (Klein 2001) For a run of25 trials the variance is var 5 005825 5 00023 Thestandard error is SE 5 sqrt(var) 05 This means that injust 25 trials one can estimate threshold with a 5 stan-dard error a low value That low value is sufficiently re-markable that one should look for possible biases in theprocedure

Before considering a multiplicity of factors contribut-ing to a bias it should be noted that the large values of bcould be real for seeing the tiny stimuli used by Strasburger(2001b) With small stimuli and peripheral viewing thereis much spatial uncertainty and uncertainty is known toelevate slopes A question remains of whether the uncer-tainty is sufficient to account for the data

There are many possible causes of the upward slopebias My intuition was that the early step of shifting the PFto align thresholds was an important contributor to thebias As an example of how it would work consider thefive-point PF with equal level spacing used by Miller andUlrich (2001) P(i) 5 1 1222 50 8778 and99 Suppose that because of binomial variability themiddle point was 70 rather than 50 Then the thresh-

old would be shifted to between Levels 2 and 3 and theslope would be steepened Suppose on the other hand thatthe variability causes the middle point to be 30 ratherthan 50 Now the threshold would be shifted to betweenLevels 3 and 4 and the slope would again be steepenedSo any variability in the middle level causes steepeningVariability at the other levels has less effect on slope I willnow compare this and other candidates for slope bias

Kaernbachrsquos explanation of the slope bias Kaern-bach (2001b) examines staircases based on a cumulativenormal PF that goes from 0 to 100 The staircase ruleis Brownian with one step up or down for each incorrect orcorrect answer respectively Parametric and nonparamet-ric methods were used to analyze the data Here I will focuson the nonparametric methods used to generate Kaern-bachrsquos Figures 2ndash4 The analysis involves three steps

First the data are monotonized Kaernbach gives tworeasons for this procedure (1) Since the true psychome-tric function is expected to be monotonic it seems appro-priate to monotonize the data This also helps remove noisefrom nonparametric threshold or slope estimates (2) ldquoSec-ond and more importantly the monotonicity constraintimproves the estimates at the borders of the tested signalrange where only few tests occur and admits to estimatethe values of the PF at those signal levels that have not beentested (ie above or below the tested range) For thepresent approach this extrapolation is necessary sincethe averaging of the PF values can only be performed ifall runs yield PF estimates for all signal levels in questionrdquo(Kaernbach 2001b p 1390)

Second the data are extrapolated with the help of themonotonizing step as mentioned in the quote of the pre-ceding paragraph Kaernbach (2001b) believes it is es-sential to extrapolate However as I shall show it is pos-sible to do the simulations without the extrapolationstep I will argue in connection with my simulations thatthe extrapolation step may be the most critical step inKaernbachrsquos method for producing the bias he finds

Third a PF is calculated for each run The PFs are thenaveraged across runs This is different from Strasburgerrsquosmethod in which the data are shifted and then rather thanaverage the PFs of each run the total number correct andincorrect at each level are pooled Finally the PF is gen-erated on the basis of the pooled data

Simulations to clarify contributions to the slope biasA number of factors in Kaernbachrsquos and Strasburgerrsquos pro-cedures could contribute to biased slope estimate In Stras-burgerrsquos case there is the shifting step One might thinkthis special to Strasburger but in fact it occurs in manymethods both parametric and nonparametric In paramet-ric methods both slope and threshold are typically esti-mated A threshold estimate that is shifted away from itstrue value corresponds to Strasburgerrsquos shift Similarlyin the nonparametric method of Miller and Ulrich (2001)the distribution is implicitly allowed to shift in the processof estimating slope When Kaernbach (2001b) imple-mented Miller and Ulrichrsquos method he found a large up-

MEASURING THE PSYCHOMETRIC FUNCTION 1447

ward slope bias Strasburgerrsquos shift step was more explicitthan that of the other methods in which the shift is implicitStrasburgerrsquos method allowed the resulting slope to be vi-sualized as in his figures of the raw data after the shift

I investigated several possible contributions to the slopebias (1) shifting (2) monotonizing (3) extrapolating and(4) averaging the PF Rather than simulations I did enu-merations in which all possible staircases were analyzedwith each staircase consisting of 10 trials all starting atthe same point To simplify the analysis and to reduce thenumber of assumptions I chose a PF that was flat (slopeof zero) at P 5 50 This corresponds to the case of anormal PF but with a very small step size The Matlab pro-gram that does all the calculations and all the figures isincluded as Matlab Code 4 There are four parts to the code(1) the main program (2) the script for monotonizing

(3) the script for extrapolating (4) the script for either av-eraging the PFs or totaling up the raw responses The Mat-lab code in the Appendix shows that it is quite easy to finda method for averaging PFs even when there are levels thatare not tested Since the annotated programs are includedI will skip the details here and just offer a few comments

The output of the Matlab program is shown in Figure 5The left and right columns of panels show the PF for thecase of no shifting and with shifting respectively Theshifting is accomplished by estimating threshold as themean of all the levels a standard nonparametric methodfor estimating threshold That mean is used as the amountby which the levels are shifted before the data from all runsare pooled The top pair of panels is for the case of aver-aging the raw data the middle panels show the results of av-eraging following the monotonizing procedure The monot-

0 5 10 15 200

5

1no shift

(a)

0 5 10 15 204

5

6

7(b)

0 5 10 15 200

5

1(c)

stimulus levels

0 5 10 15 200

5

1shift

(d)

0 5 10 15 200

5

1(e)

0 5 10 15 200

5

1(f )

stimulus levels

no datamanipulation

frequency frequency

Pro

babi

lity

corr

ect

Extrapolate psychometric function

Monotonizepsychometric function

Figure 5 Slope bias for staircase data Psychometric functions are shown following four types of processingsteps shifting monotonizing extrapolating and type of averaging In all panels the solid line is for averaging theraw data of multiple runs before calculating the PF The dashed line is for first calculating the PF and then aver-aging the PF probabilities Panel a shows the initial PF is unchanged if there is no shifting maximizing or ex-trapolating For simplicity a flat PF fixed at P 5 50 was chosen The dotndashdashed curve is a histogram show-ing the percentage of trials at each stimulus level Panel b shows the effect of monotonizing the data Note that thisone panel has a reduced ordinate to better show the details of the data Panel c shows the effect of extrapolatingthe data to untested levels Panels d e and f are for the cases a b and c where in addition a shift operation is doneto align thresholds before the data are averaged across runs The results show that the shift operation is the mosteffective analysis step for producing a bias The combination of monotonizing extrapolating and averaging PFs(panel c) also produces a strong upward bias of slope

1448 KLEIN

onizing routine took some thought and I hope its presencein the Appendix (Matlab Code 4) will save others muchtrouble The lower pair of panels shows the results of aver-aging after both monotonizing and extrapolating

In each panel the solid curve shows the average PF ob-tained by combining the correct trials and the total trialsacross runs for each level and taking their ratio to get thePF The dashed curve shows the average PF resulting fromaveraging the PFs of each run The latter method is themore important one since it simulates single short runsIn the upper pair of panels the dashed and solid curvesare so close to each other that they look like a single curveIn the upper pair I show an additional dotndashdashed linewith rightndashleft symmetry that represents the total num-ber of trials The results in the plots are as follows

Placement of trials The effect of the shift on the num-ber of trials at each level can be seen by comparing thedotndashdashed lines in the top two panels The shift narrowsthe distribution significantly There are close to zero trialsmore than three steps from threshold in panel d with theshift even though the full range is 610 steps The curveshowing the total number of trials would be even sharperif I had used a PF with positive slope rather than a PFthat is constant at 50

Slope bias with no monotonizing The top left panelshows that with no shift and no monotonizing there is noslope bias The PF is flat at 50 The introduction of a shiftproduces a dramatic slope bias going from PF 5 20 to80 as the stimulus goes from Levels 8 to 12 There wasno dependence on the type of averaging

Effect of monotonizing on slope bias Monotonizingalone (middle panel on left) produced a small slope biasthat was larger with PF averaging than with data averagingNote that the ordinate has been expanded in this one casein order to better illustrate the extent of the bias The ex-treme amount of steepening (right panel) that is producedby shifting was not matched by monotonizing

Effect of extrapolating on slope bias The lower left panelshows the dramatic result that the combination of all threeof Kaernbachrsquos methodsmdashmonotonizing extrapolatingand averaging PFsmdashdoes produce a strong slope bias forthese Brownian staircases If one uses Strasburgerrsquos typeof averaging in which the data are pooled before the PF iscalculated then the slope bias is minimal This type of av-eraging is equivalent to a large increase in the number oftrials

These simulations show that it is easy to get an upwardslope bias from Brownian data with trials concentratedhear one point An important factor underlying this bias isthe strong correlation in staircase levels discussed byKaernbach (2001b) A factor that magnifies the slope biasis that when trials are concentrated at one point the slopeestimate has a large standard error allowing it to shift eas-ily Our simulations show that one can produce biased slopeestimates either by a procedure (possibly implicit) that shiftsthe threshold or by a procedure that extrapolates data tountested levels This topic is of more academic than prac-

tical interest since anyone interested in the PF slope shoulduse an adaptive algorithm like the Y method of Kontsevichand Tyler (1999) in which trials are placed at separatedlevels for estimating slope The slope bias that is producedby the seemingly innocuous extrapolation or shifting stepis a wonderful reminder of the care that must be takenwhen one employs psychometric methodologies

SUMMARY

Many topics have been in this paper so a summaryshould serve as a useful reminder of several highlightsItems that are surprising novel or important are italicized

I Types of PFs1 There are two methods for compensating for the PF

lower asymptote also called the correction for bias11 Do the compensation in probability space Equa-

tion 2 specifies p(x) 5 [P(x) P(0)] [1 P(0)] where P(0)the lower asymptote of the PF is often designated by theparameter g With this method threshold estimates vary asthe lower asymptote varies (a failure of the high-thresholdassumption)

12 Do the compensation in z-score space for yesnotasks Equation 5 specifies d cent(x) 5 z(x) z(0) With thismethod the response measure dcent is identical to the metricused in signal detection theory This way of looking at psy-chometric functions is not familiar to many researchers

2 2AFC is not immune to bias It is not uncommon forthe observer to be biased toward Interval 1 or 2 when thestimulus is weak Whereas the criterion bias for a yesnotask [z(0) in Equation 5] affects dcent linearly the intervalbias in 2AFC affects dcent quadratically and therefore it hasbeen thought small Using an example from Green andSwets (1966) I have shown that this 2AFC bias can havea substantial effect on dcent and on threshold estimates a dif-ferent conclusion from that of Green and Swets The in-terval bias can be eliminated by replotting the 2AFC dis-crimination PF using the percent correct Interval 2judgments on the ordinate and the Interval 2 minus In-terval 1 signal strength on the abscissa This 2AFC PFgoes from 0 to 100 rather than from 50 to 100

3 Three distinctions can be made regarding PFsyesno versus forced choice detection versus discrimi-nation adaptive versus constant stimuli In all of the ex-perimental papers in this special issue the use of forcedchoice adaptive methods reflects their widespread preva-lence

4 Why is 2AFC so popular given its shortcomings Acommon response is that 2AFC minimizes bias (but noteItem 2 above) A less appreciated reason is that many adap-tive methods are available for forced choice methods butthat objective yesno adaptive methods are not commonBy ldquoobjectiverdquo I mean a signal detection method with suf-ficient blank trials to fix the criterion Kaernbach (1990)proposed an objective yesno upndashdown staircase withrules as simple as these Randomly intermix blanks and

MEASURING THE PSYCHOMETRIC FUNCTION 1449

signal trials decrease the level of the signal by one step forevery correct answer (hits or correct rejections) and in-crease the level by three steps for every wrong answer(false alarms or misses) The minimum dcent is obtained whenthe numbers of ldquoyesrdquo and ldquonordquo responses are about equal(ROC negative diagonal) A bias of imbalanced ldquoyesrdquo andldquonordquo responses is similar to the 2AFC interval bias dis-cussed in Item 2 The appropriate balance can be achievedby giving bias feedback to the observer

5 Threshold should be defined on the basis of a fixeddcent level rather than the 50 point of the PF

6 The connection between PF slope on a natural log-arithm abscissa and slope on a linear abscissa is as fol-lows slopelin 5 slopelogxt (Equation 11) where x t is thestimulus strength in threshold units

7 Strasburgerrsquos (2001a) maximum PF slope has the ad-vantage that it relates PF slope using a probability ordinatebcent to the low stimulus strength logndashlog slope of the dcentfunction b It has the further advantage of relating theslopes of a wide variety of PFs Weibull logistic Quickcumulative normal hyperbolic tangent and signal detec-tion dcent The main difficulty with maximum slope is thatquite often threshold is defined at a point different fromthe maximum slope point Typically the point of maxi-mum slope is lower than the level that gives minimumthreshold variance

8 A surprisingly accurate connection was made be-tween the 2AFC Weibull function and the StromeyerndashFoley dcent function given in Equation 20 For all values ofthe Weibull b parameter and for all points on the PF themaximum difference between the two functions is 0004on a probability ordinate going from 50 to 100 For thisgood fit the connection between the dcent exponent b and theWeibull exponent b is bb 5 106 This ratio differs frombb 08 (Pelli 1985) and bb 5 088 (Strasburger2001a) because our dcent function saturates at moderate dcentvalues just as the Weibull saturates and as experimentaldata saturate

II Experimental Methods for Measuring PFs1 The 2AFC and objective yesno threshold vari-

ances were compared using signal detection assump-tions For a fixed total percent correct the yesno andthe 2AFC methods have identical threshold varianceswhen using a dcent function given by dcent 5 ct

b The optimumvariance of 164(Nb2) occurs at P 5 94 where N isthe number of trials If N is the number of stimulus pre-sentations then for the 2AFC task the variance wouldbe doubled to 328(Nb2) Counting stimulus presenta-tions rather than trials can be important for situations inwhich each presentation takes a long time as in the smelland taste experiments of Linschoten et al (2001)

2 The equality of the 2AFC and yesno thresholdvariance leads to a paradox whereby observers couldconvert the 2AFC experiment to an objective yesno ex-periment by closing their eyes in the first 2AFC intervalParadoxically the eyesrsquo shutting would leave the thresh-old variance unchanged This paradox was resolved when

a more realistic dcent PF with saturation was used In that casethe yesno variance was slightly larger than the 2AFC vari-ance for the same number of trials

3 Several disadvantages of the 2AFC method are that(a) 2AFC has an extra memory load (b) modeling proba-bility summation and uncertainty is more difficult thanfor yesno (c) multiplicative noise (ROC slope) is diffi-cult to measure in 2AFC (d) bias issues are present in2AFC as well as in objective yesno and (e) thresholdestimation is inefficient in comparison with methodswith lower asymptotes that are less than 50

4 Kaernbach (2001b) suggested that some 2AFCproblems could be alleviated by introducing an extraldquodonrsquot knowrdquo response category A better alternative is togive a high or low confidence response in addition tochoosing the interval I discussed how the confidencerating would modify the staircase rules

5 The efficiency of the objective yesno methodcould be increased by increasing the number of responsecategories (ratings) and increasing the number of stimu-lus levels (Klein 2002)

6 The simple upndashdown (Brownian) staircase withthreshold estimated by averaging levels was found to havenearly optimal efficiency in agreement with Green (1990)It is better to average all levels (after about four reversalsof initial trials are thrown out) rather than average an evennumber of reversals Both of these findings were surprising

7 As long as the starting point is not too distant fromthreshold Brownian staircases with their moderate iner-tia have advantages over the more prestigious adaptivelikelihood methods At the beginning of a run likelihoodmethods may have too little inertia and can jump to lowstimulus strengths before the observer is fully familiar withthe test target At the end of the run likelihood methodsmay have too much inertia and resist changing levels eventhough a nonstationary threshold has shifted

8 The PF slope is useful for several reasons It isneeded for estimating threshold variance for distinguish-ing between multiple vision models and for improved es-timation of threshold I have discussed PF slope as well asthreshold Adaptive methods are available for measuringslope as well as threshold

9 Improved goodness-of-fit rules are needed as abasis for throwing out bad runs and as a means for im-proving estimates of threshold variance The goodness-of-fit metric should look not only at the PF but also at thesequential history of the run so that mid-run shifts inthreshold can be determined The best way to estimatethreshold variance on the basis of a single run of data isto carry out bootstrap calculations (Foster amp Bischof 1991Wichmann amp Hill 2001b) Further research is needed inorder to determine the optimal method for weighting mul-tiple estimates of the same threshold

10 The method should depend on the situation

III Mathematical Methods for Analyzing PFs1 Miller and Ulrichrsquos (2001) nonparametric Spearmanndash

Kaumlrber (SK) analysis can be as good as and often better

1450 KLEIN

than a parametric analysis An important limitation of theSK analysis is that it does not include a weighting factorthat would emphasize levels with more trials This wouldbe relevant for staircase methods in which the number oftrials was unequally distributed It would also cause prob-lems with 2AFC detection because of the asymmetry ofthe upper and lower asymptote It need not be a problemfor 2AFC discrimination because as I have shown thistask can be analyzed better with a negative-going abscissathat allows the ordinate to go from 0 to 100 Equa-tions 39ndash43 provide an analytic method for calculating theoptimal variance of threshold estimates for the method ofconstant stimuli The analysis shows that the SK methodhad an optimal variance

2 Wichmann and Hill (2001a) carry out a large numberof simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments They found trial placementswhere their simulations were upwardly skewed in com-parison with the chi-square distribution based on linear re-gression They found other trial placements where theirchi-square simulations were downwardly skewed Thesepuzzling results rekindled my longstanding interest inwanting to know when to trust the chi-square distributionin nonlinear situations and prompted me to analyzerather than simulate the biases shown by Wichmann ampHill To my surprise I found that the X 2 distribution(Equation 52) had zero bias at any testing level even for1 or 2 trials per level The likelihood function (Equa-tion 56) on the other hand had a strong bias when thenumber of trials per level was low (Figure 4) The bias asa function of test level was precisely of the form that isable to account for the bias found by Wichmann and Hill(2001a)

3 Wichmann and Hill (2001a) present a detailed in-vestigation of the strong effect of lapses on estimates ofthreshold and slope This issue is relevant to the data ofStrasburger (2001b) because of the lapses shown in hisdata and because a very low lapse parameter (l 5 00001)was used in fitting the data In my simulations using PFparameters somewhat similar to Strasburgerrsquos situation Ifound that the effect of lapses would be expected to bestrong so that Strasburgerrsquos estimated slopes should belower than the actual slopes However Strasburgerrsquos slopeswere high so the mystery remains of why the lapses didnot have a stronger effect My simulations show that onepossibility is the local minimum problem whereby the slopeestimate in the presence of lapses is sensitive to the initialstarting guesses for slope in the search routine In order tofind the global minimum one needs to use a range of ini-tial slope guesses

4 One of the most intriguing findings in this specialissue was the quite high PF slopes for letter discriminationfound by Strasburger (2001b) The slopes might be highbecause of stimulus uncertainty in identifying small low-contrast peripheral letters There is also the possibility thatthese high slopes were due to methodological bias (Leeket al 1992) Kaernbach (2001b) presents a wonderfulanalysis of how an upward bias could be caused by trial

nonindependence of staircase procedures (not the methodStrasburger used) I did a number of simulations to sep-arate out the effects of threshold shift (one of the steps inthe Strasburger analysis) PF monotonization PF extrap-olation and type of averaging (the latter three used byKaernbach) My results indicated that for experimentssuch as Strasburgerrsquos the dominant cause of upward slopebias was the threshold shift Kaernbachrsquos extrapolationwhen coupled with monotonization can also lead to a strongupward slope bias Further experiments and analyses areencouraged since true steep slopes would be very impor-tant in allowing thresholds to be estimated with muchfewer trials than is presently assumed Although none ofour analyses makes a conclusive case that Strasburgerrsquosestimated slopes are biased the possibility of a bias sug-gests that one should be cautious before assuming that thehigh slopes are real

5 The slope bias story has two main messages The ob-vious one is that if one wants to measure the PF slope byusing adaptive methods one should use a method thatplaces trials at well separated levels The other messagehas broader implications The slope bias shows how sub-tle seemingly innocuous methods can produce unwantedeffects It is always a good idea to carry out Monte Carlosimulations of onersquos experimental procedures looking forthe unexpected

REFERENCES

Bevingt on P R (1969) Data reduction and error analysis for thephysical sciences New York McGraw-Hill

Car ney T Tyl er C W Wat son A B Makous W Beu t t er BChen C C Nor cia A M amp Kl ein S A (2000) Modelfest Yearone results and plans for future years In B E Rogowitz amp T NPapas (Eds) Human vision and electronic imaging V (Proceedingsof SPIE Vol 3959 pp 140-151) Bellingham WA SPIE Press

Emer son P L (1986) Observations on a maximum-likelihood andBayesian methods of forced-choice sequential threshold estimationPerception amp Psychophysics 39 151-153

Finn ey D J (1971) Probit analysis (3rd ed) Cambridge CambridgeUniversity Press

Fol ey J M (1994) Human luminance pattern-vision mechanismsMasking experiments require a new model Journal of the OpticalSociety of America A 11 1710-1719

Fost er D H amp Bischof W F (1991) Thresholds from psychometricfunctions Superiority of bootstrap to incremental and probit varianceestimators Psychological Bulletin 109 152-159

Gour evit ch V amp Ga l a nt er E (1967) A significance test for oneparameter isosensitivity functions Psychometrika 32 25-33

Gr een D M (1990) Stimulus selection in adaptive psychophysicalprocedures Journal of the Acoustical Society of America 87 2662-2674

Gr een D M amp Swet s J A (1966) Signal detection theory andpsychophysics Los Altos CA Peninsula Press

Ha cker M J amp Rat cl if f R (1979) A revised table of dcent for M-alternative forced choice Perception amp Psychophysics 26 168-170

Ha l l J L (1968) Maximum-likelihood sequential procedure for esti-mation of psychometric functions [Abstract] Journal of the Acousti-cal Society of America 44 370

Hal l J L (1983) A procedure for detecting variability of psychophys-ical thresholds Journal of the Acoustical Society of America 73 663-667

Kaer nbach C (1990) A single-interval adjustment-matrix (SIAM) pro-cedure for unbiased adaptive testing Journal of the Acoustical Soci-ety of America 88 2645-2655

MEASURING THE PSYCHOMETRIC FUNCTION 1451

Ka er nbach C (2001a) Adaptive threshold estimation with unforced-choice tasks Perception amp Psychophysics 63 1377-1388

Ka er nbach C (2001b) Slope bias of psychometric functions derivedfrom adaptive data Perception amp Psychophysics 63 1389-1398

King-Smit h P E (1984) Efficient threshold estimates from yesnoprocedures using few (about 10) trials American Journal of Optom-etry amp Physiological Optics 81 119

Kin g-Smit h P E Gr igsby S S Vin gr ys A J Ben es S C ampSu powit A (1994) Efficient and unbiased modifications of theQUEST threshold method Theory simulations experimental evalu-ation and practical implementation Vision Research 34 885-912

Kin g-Smit h P E amp Rose D (1997) Principles of an adaptive methodfor measuring the slope of the psychometric function Vision Re-search 37 1595-1604

Kl ein S A (1985) Double-judgment psychophysics Problems andsolutions Journal of the Optical Society of America 2 1560-1585

Kl ein S A (1992) An Excel macro for transformed and weighted av-eraging Behavior Research Methods Instruments amp Computers 2490-96

Kl ein S A (2002) Measuring the psychometric function Manuscriptin preparation

Kl ein S A amp St r omeyer C F III (1980) On inhibition betweenspatial frequency channels Adaptation to complex gratings VisionResearch 20 459-466

Kl ein S A St r omeyer C F III amp Ga nz L (1974) The simulta-neous spatial frequency shift A dissociation between the detectionand perception of gratings Vision Research 15 899-910

Kont sevich L L amp Tyl er C W (1999) Bayesian adaptive estimationof psychometric slope and threshold Vision Research 39 2729-2737

Leek M R (2001) Adaptive procedures in psychophysical researchPerception amp Psychophysics 63 1279-1292

Leek M R Han na T E amp Mar sha l l L (1991) An interleavedtracking procedure to monitor unstable psychometric functionsJournal of the Acoustical Society of America 90 1385-1397

Leek M R Hanna T E amp Mar sh al l L (1992) Estimation of psy-chometric functions from adaptive tracking procedures Perception ampPsychophysics 51 247-256

Linschot en M R Ha r vey L O Jr El l er P M amp Ja fek B W(2001) Fast and accurate measurement of taste and smell thresholdsusing a maximum-likelihood adaptive staircase procedure Percep-tion amp Psychophysics 63 1330-1347

Macmil l a n N A amp Cr eel man C D (1991) Detection theory Auserrsquos guide Cambridge Cambridge University Press

Man ny R E amp Kl ein S A (1985) A three-alternative tracking par-

adigm to measure Vernier acuity of older infants Vision Research25 1245-1252

McKee S P Kl ein S A amp Tel l er D A (1985) Statistical prop-erties of forced-choice psychometric functions Implications of pro-bit analysis Perception amp Psychophysics 37 286-298

Mil l er J amp Ul r ich R (2001) On the analysis of psychometric func-tions The SpearmanndashKaumlrber Method Perception amp Psychophysics63 1399-1420

Pel l i D G (1985) Uncertainty explains many aspects of visual con-trast detection and discrimination Journal of the Optical Society ofAmerica A 2 1508-1532

Pel l i D G (1987) The ideal psychometric procedures [Abstract] In-vestigative Ophthalmology amp Visual Science 28 (Suppl) 366

Pent l a nd A (1980) Maximum likelihood estimation The best PESTPerception amp Psychophysics 28 377-379

Pr ess W H Teukol sky S A Vet t er l ing W T amp Fl ann er y B P(1992) Numerical recipes in C The art of scientific computing (2nded) Cambridge Cambridge University Press

St r a sbu r ger H (2001a) Converting between measures of slope of the psychometric function Perception amp Psychophysics 63 1348-1355

St r asbu r ger H (2001b) Invariance of the psychometric function forcharacter recognition across the visual field Perception amp Psycho-physics 63 1356-1376

St r omeyer C F III amp Kl ein S A (1974) Spatial frequency channelsin human vision as asymmetric (edge) mechanisms Vision Research14 1409-1420

Tayl or M M amp Cr eel ma n C D (1967) PEST Efficiency esti-mates on probability functions Journal of the Acoustical Society ofAmerica 41 782-787

Tr eu t wein B (1995) Adaptive psychophysical procedures VisionResearch 35 2503-2522

Tr eu t wein B amp St r asbur ger H (1999) Fitting the psychometricfunction Perception amp Psychophysics 61 87-106

Wat son A B amp Pel l i D G (1983) QUEST A Bayesian adaptivepsychometric method Perception amp Psychophysics 33 113-120

Wichman n F A amp Hil l N J (2001a) The psychometric function IFitting sampling and goodness-of-fit Perception amp Psychophysics63 1293-1313

Wichma nn F A amp Hil l N J (2001b) The psychometric function IIBootstrap based confidence intervals and sampling Perception ampPsychophysics 63 1314-1329

Yu C Kl ein S A amp Levi D M (2001) Psychophysical measurementand modeling of iso- and cross-surround modulation of the foveal TvCfunction Manuscript submitted for publication

(Continued on next page)



Bayesian framework for adaptive methods in order to measure the PF. However, for goodness-of-fit it is not clear that the log likelihood method is better than X². The common argument in favor of likelihood is that it makes sense even if there is only one trial at each level. However, my analysis shows that the X² statistic is less biased than log likelihood when there is a low number of trials per level. This surprised me. Wichmann and Hill (2001a) offer two more arguments in favor of log likelihood over X². First, they note that the maximum likelihood parameters will not minimize X². This is not a bothersome objection, since if one does a goodness-of-fit with X², one would also do the parameter search by minimizing X² rather than maximizing likelihood. Second, Wichmann and Hill claim that likelihood, but not X², can be used to assess the significance of added parameters in embedded models. I doubt that claim, since it is standard to use χ² for embedded models (Press et al., 1992), and I cannot see that X² should be any different.

Finally, I will mention why goodness-of-fit considerations are important. First, if one consistently finds that one's data have a poorer goodness-of-fit than do simulations, one should carefully examine the shape of the PF fitting function and carefully examine the experimental methodology for stability of the subjects' responses. Second, goodness-of-fit can have a role in weighting multiple measurements of the same threshold. If one has a reliable estimate of threshold variance, multiple measurements can be averaged by using an inverse variance weighting. There are three ways to calculate threshold variance: (1) One can use methods that depend only on the best-fitting PF (see the discussion of the inverse of the Hessian matrix in Press et al., 1992). The asymptotic formula, such as that derived in Equations 39–43, provides an example for when only threshold is a free parameter. (2) One can multiply the variance of method (1) by the reduced chi-square (chi-square divided by the degrees of freedom) if the reduced chi-square is greater than one (Bevington, 1969; Klein, 1992). (3) Probably the best approach is to use bootstrap estimates of threshold variance based on the data from each run (Foster & Bischof, 1991; Press et al., 1992). The bootstrap estimator takes the goodness-of-fit into account in estimating threshold variance. It would be useful to have more research on how well the threshold variance of a given run correlates with threshold accuracy (goodness-of-fit) of that run. It would be nice if outliers were strongly correlated with high threshold variance. I would not be surprised if further research showed that, because the fitting involves nonlinear regression, the optimal weighting might be different from simply using the inverse variance as the weighting function.
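As a concrete illustration of method (2) above, here is a minimal Matlab sketch, not part of the original Appendix, of inverse-variance weighting of repeated threshold estimates, with each run's variance inflated by its reduced chi-square when that exceeds one. The variable names and the example numbers are illustrative only.

thresh = [1.12 1.05 1.21];      % threshold estimates from three runs (made-up values)
se     = [0.04 0.05 0.06];      % standard errors from the best-fitting PF (method 1)
chisq  = [0.8 1.0 2.5];         % chi-square of each fit
df     = 1;                     % degrees of freedom of each fit
v = se.^2 .* max(chisq/df, 1);  % method 2: inflate variance when reduced chi-square > 1
w = 1 ./ v;                     % inverse-variance weights
threshAve = sum(w .* thresh)/sum(w)
seAve = 1/sqrt(sum(w))

With these numbers the third run, whose fit is poor, receives less than a fifth of the weight of the first run.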

Wichmann and Hill and Strasburger: Lapses and Bias Calculation

The first half of Wichmann and Hill (2001a) is concerned with the effect of lapses on the slope of the PF. "Lapses," also called "finger errors," are errors to highly visible stimuli. This topic is important because it is typically ignored when one is fitting PFs. Wichmann and Hill (2001a) show that improper treatment of lapses in parametric PF fitting can cause sizeable errors in the values of estimated parameters. Wichmann and Hill (2001a) show that these considerations are especially important for estimation of the PF slope. Slope estimation is central to Strasburger's (2001b) article on letter discrimination. Strasburger's (2001b) Figures 4, 5, 9, and 10 show that there are substantial lapses even at high stimulus levels. Strasburger's (2001b) maximum likelihood fitting program used a non-zero lapse parameter of λ = .0001 (H. Strasburger, personal communication, September 17, 2001). I was worried that this value for the lapse parameter was too small to avoid the slope bias, so I carried out a number of simulations similar to those of Wichmann and Hill (2001a), but with a choice of parameters relevant to Strasburger's situation. Table 2 presents the results of this study.

Since Strasburger used a 10AFC task with the PF ranging from 10% to 100% correct, I decided to use discrimination PFs going from 0% to 100%, rather than the 2AFC PF of Wichmann and Hill (2001a). For simplicity, I decided to have just three levels with 50 trials per level, the same as Wichmann and Hill's example in their Figure 1. The PF that I used had 25, 42, and 50 correct responses at levels placed at x = 0, 1, and 6 (columns labeled "No Lapse"). A lapse was produced by having 49 instead of 50 correct at the high level (columns labeled "Lapse"). The data were fit in six different ways: Three PF shapes were used, probit (cumulative normal), logit, and Weibull, corresponding to the rows of Table 2, and two error metrics were used, chi-square minimization and likelihood

Table 2
The Effect of Lapses on the Slope of Three Psychometric Functions

                λ = .01      λ = .0001    λ = .01      λ = .0001
Psychometric    No Lapse     No Lapse     Lapse        Lapse
Function        L     χ²     L     χ²     L     χ²     L     χ²
Probit        1.05  1.05   1.00  0.99   1.05  1.05   0.36  0.34
Logit         1.76  1.76   1.66  1.66   1.76  1.76   0.84  0.72
Weibull       1.01  1.01   0.97  0.97   1.01  1.01   0.26  0.26

Note. The columns are for different lapse rates and for the presence or absence of lapses. The 12 pairs of entries in the cells are the slopes; the left and right values correspond to likelihood maximization (L) and χ² minimization, respectively.

maximization, corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = .01 (first and third pairs of data columns) and λ = .0001 (second and fourth pairs of columns). For this discrimination task, the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

P = [1 + erf(z/√2)]/2  (probit),   (58A)

P = 1/[1 + exp(−z)]  (logit),   (58B)

P = 1 − exp[−exp(z)]  (Weibull),   (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) when there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter; (2) when the lapse parameter was set to λ = .01, the PF slope did not change when there was a lapse (49 out of 50); (3) when λ = .0001, the PF slope was reduced dramatically when a single lapse was present; (4) for all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = .0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = .0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
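For readers who prefer code to equations, Equations 58A–58C translate directly into Matlab. This transcription is mine, not part of the Appendix; the function names are illustrative.

probit  = @(z) (1 + erf(z/sqrt(2)))/2;   % Equation 58A, cumulative normal
logit   = @(z) 1 ./ (1 + exp(-z));       % Equation 58B
weibull = @(z) 1 - exp(-exp(z));         % Equation 58C
zfun    = @(y, p) p(2)*(y - p(1));       % z = p2*(y - p1); p(1) = threshold, p(2) = slope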

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameter. However, in the last pair of columns the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in the Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
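A condensed sketch of this kind of fit, in the spirit of Matlab Code 3 in the Appendix (a lapse-symmetric Weibull, a chi-square error metric, and fminsearch in place of the older fmins; all names are illustrative):

stim = [0 1 6];  N = [50 50 50];  Obs = [25 42 49];     % 49 = one lapse at the top level
lam  = 0.0001;                                          % fixed lapse rate, made symmetric (gamma = lambda)
Exp  = @(p) N .* (lam + (1 - 2*lam)*(1 - exp(-exp(p(2)*(stim - p(1))))));  % expected number correct
X2   = @(p) sum((Obs - Exp(p)).^2 .* N ./ (Exp(p) .* (N - Exp(p)) + eps)); % Pearson chi-square
pHat = fminsearch(X2, [0.5 0.3])                        % [threshold slope]; note the 0.3 initial slope guess

Running the same minimization from several initial slope values is the practical remedy for the local-minimum problem just described.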

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure, the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.
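One literal reading of the second approach is sketched below. The exact recipe is in Klein (2002), which is still in preparation, so the equal blend of expected and observed binomial variances used here is an assumption of mine, not the published rule; the variables continue the fitting sketch above.

pObs   = Obs ./ N;                                         % observed proportion correct (from the sketch above)
pExp   = Exp(pHat) ./ N;                                   % expected proportion from the fitted PF
varMix = (pExp.*(1 - pExp) + pObs.*(1 - pObs)) ./ (2*N);   % assumed 50/50 blend of the two binomial variances
w      = 1 ./ (varMix + eps);                              % weights for a weighted least-squares refit

Because the observed variance at a level containing a lapse is no longer essentially zero, that level is down-weighted instead of dominating the fit.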

Biased Slope in Adaptive Methods
In the previous section, I examined how lapses produce

a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^β). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope β of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = .10 (because of the 10AFC task) and a lapse rate of λ = .0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
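The variance arithmetic in the preceding paragraph is easy to verify in two lines of Matlab (illustrative only; the 1.75/(Nβ²) formula is the one quoted above):

beta = 5.5;  N = 25;
varThresh = 1.75/(N*beta^2)      % 0.0023, the asymptotic threshold variance for the 10AFC task
seThresh  = sqrt(varThresh)      % 0.048, i.e., roughly a 5% standard error in 25 trials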

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary, since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue in connection with my simulations that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than average the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
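To make the first of these steps concrete, here is a minimal Matlab sketch of monotonization. It is a simplified, unweighted version; the Appendix's Matlab Code 4 weights each level by its number of trials, which I have omitted here, and the example values are made up.

prob = [0.2 0.5 0.4 0.6 0.9];                  % example PF values at increasing stimulus levels
flag = true;
while flag                                     % repeat until no decreasing pair remains
    flag = false;
    for i = 1:length(prob)-1
        if prob(i) > prob(i+1)                 % nonmonotonicity detected
            prob(i:i+1) = mean(prob(i:i+1));   % replace the offending pair by its average
            flag = true;
        end
    end
end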

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
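The core of the enumeration is small enough to sketch here, in a condensed form in the spirit of Matlab Code 4, for the no-shift, no-monotonizing case; the variable names are illustrative.

nbits = 10;  nlev = 2*nbits + 1;               % 10-trial staircases; possible levels 1..21
tot = zeros(1, nlev);  cor = zeros(1, nlev);
for i = 0:2^nbits - 1                          % every possible response sequence
    a    = bitget(i, 1:nbits);                 % 1 = correct (step down), 0 = incorrect (step up)
    step = 1 - 2*a(1:end-1);
    lev  = nbits + 1 + [0 cumsum(step)];       % level at which each trial is presented
    for k = 1:nbits
        cor(lev(k)) = cor(lev(k)) + a(k);      % correct responses accumulated at each level
        tot(lev(k)) = tot(lev(k)) + 1;         % trials accumulated at each level
    end
end
pf = cor ./ (tot + eps);                       % pooled PF: 0.5 at every tested level, as in Figure 5, panel a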

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

[Figure 5 appears here: six panels, (a)–(f), in three rows (no data manipulation; monotonize psychometric function; extrapolate psychometric function) and two columns (no shift; shift). Abscissa: stimulus levels; ordinate: probability correct.]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels, the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate, to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.
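The shift step itself is only a couple of lines. A minimal sketch, with made-up data and illustrative names, following the round-to-the-nearest-level convention of Matlab Code 4:

lev      = [11 10 9 10 9 8 9 10 9 8];      % levels visited by one run (made-up example)
thresh   = round(mean(lev));                % nonparametric threshold estimate, rounded to a level
levShift = lev - thresh + 11;               % align this run's threshold to the central level (11)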

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels, the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot–dashed line with right–left symmetry that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.
3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.
4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.
6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units. (A two-line numerical check of this relation is given at the end of this list.)
7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.
8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.
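The numerical check promised in Item 6, as a Matlab sketch with illustrative values (any PF would do):

pf = @(x) 1 - 0.5*exp(-x.^2);                         % a 2AFC Weibull with beta = 2, threshold at x = 1
xt = 1.3;  h = 1e-5;                                  % evaluate the slopes at 1.3 threshold units
slopeLin = (pf(xt+h) - pf(xt-h)) / (2*h);             % slope against x
slopeLog = (pf(xt*exp(h)) - pf(xt*exp(-h))) / (2*h);  % slope against ln(x)
slopeLin - slopeLog/xt                                % essentially zero: slope_lin = slope_log / x_t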

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = c_t^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.
3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.
4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.
5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals' worth of initial trials are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.
8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.
9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF, but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.
10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials was unequally distributed. It would also cause problems with 2AFC detection, because of the asymmetry of the upper and lower asymptote. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations, and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).
3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b), because of the lapses shown in his data and because a very low lapse parameter (λ = .0001) was used in fitting the data. In my simulations using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with much fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.
5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficiency estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

[The full Matlab listings are not reproduced here; the programs themselves are available from the address above. The listing titles and their accompanying notes follow.]

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (code for generating Figure 1).

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square (relevant to Wichmann and Hill, 2001a, and Figure 4).

Matlab Code 3. Downward Slope Bias Due to Lapses (relevant to Wichmann and Hill, 2001a, and Table 2).

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias. Note that in line 6 the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero. The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)

5

iter

ate

over

fun

ctio

n ty

pefo

r il

apse

5 1

4

iter

ate

over

laps

e ca

tego

ries

para

ms

5 f

min

s(rsquop

robi

tML

2rsquo[

1 3

][]

[]

iil

apse

)

fmin

s fi

nds

min

imum

of

chi-

squa

resl

ope(

ilap

se)

5 p

aram

s(2)

slop

e is

2nd

par

amet

eren

ddi

sp([

type

(i1

1)

num

2str

(slo

pe)]

)

prin

t res

ult

end

1454 KLEINA

PP

EN

DIX

(Con

tinu

ed)

Mat

lab

Cod

e 4

Bro

wni

an S

tair

case

Enu

mer

atio

ns fo

r Sl

ope

Bia

s

Mon

oton

y

flag

5 1

ii5

0

in

itia

lize

fl

ag5

1 m

eans

non

mon

oton

icit

y is

det

ecte

dw

hile

fla

g5

5 1

keep

goi

ng if

ther

e is

non

mon

oton

ic d

ata

flag

5 0

rese

t mon

oton

ic f

lag

ii5

ii1

1

incr

ease

the

nonm

onot

onic

spr

ead

for

i25

ii1

1nt

ot

lo

op o

ver

all P

F le

vels

look

ing

for

nonm

onot

onic

ity

prob

5 c

or(

tot1

eps)

calc

ulat

e pr

obab

ilit

ies

if (

prob

(i2

ii)gt

prob

(i2)

) amp

(to

t(i2

)gt0)

chec

k fo

r no

nmon

oton

icit

yco

r(i2

iii

2)5

mea

n(co

r(i2

iii

2))

ones

(1ii

11)

repl

ace

nonm

onot

onic

cor

rect

by

aver

age

tot(

i2ii

i2)

5 m

ean(

tot(

i2ii

i2)

)on

es(1

ii1

1)

re

plac

e no

nmon

oton

ic to

tal b

y av

erag

efl

ag5

1

se

t fla

g th

at n

onm

onot

onic

ity

is f

ound

end

end

end

Not

e th

at in

line

6 th

e de

nom

inat

or is

tot1

eps

whe

re e

ps is

the

smal

lest

flo

atin

g po

int n

umbe

r th

at M

atla

b ca

n us

e B

ecau

se o

f th

is c

hoic

e if

ther

e ar

e ze

ro tr

ials

at a

leve

l th

e as

-so

ciat

ed p

roba

bili

ty w

ill b

e ze

ro

Ext

rapo

late

ii5

1 w

hile

tot(

ii)

55

0i

i5 ii

11

end

co

unt t

he lo

w le

vels

wit

h no

tri

als

if ii

gt1

sk

ip lo

wes

t lev

elto

t(1

ii1)

5 o

nes(

1ii

1)

0000

1

fill

in w

ith

very

low

num

ber

of t

rial

sco

r(1

ii1)

5 p

rob(

ii)

tot(

1ii

1)

fi

ll in

wit

h pr

obab

ilit

y of

low

est t

este

d le

vel

end

ii5

nbi

ts2

whi

le to

t(ii

)5

5 0

ii5

ii1

end

do

the

sam

e ex

trap

olat

ion

at h

igh

end

if ii

ltnb

its2

to

t(ii

11

nbit

s2)

5 o

nes(

1nb

its2

ii)

000

01

cor(

ii1

1nb

its2

)5

pro

b(ii

)to

t(ii

11

nbit

s2)

end

Ave

ragi

ng

prob

15

cor

(t

ot1

eps)

calc

ulat

e P

Fpr

obA

ll(i

opt

)5

pro

bAll

(iop

t)1

prob

1

sum

the

PF

s to

be

aver

aged

prob

Tot

All

(iop

t)

5 p

robT

otA

ll(i

opt

)1(t

otgt

0)

co

unt t

he n

umbe

r of

sta

irca

ses

wit

h a

give

n le

vel

corA

ll(i

opt

)5

cor

All

(iop

t)1

cor

sum

the

num

ber

corr

ect o

ver

all s

tair

case

sto

tAll

(iop

t)

5 to

tAll

(iop

t)

1to

t

sum

the

tota

l num

ber

of tr

ials

ove

r al

l sta

irca

ses

Mai

n P

rogr

am

nbit

s5

10

n25

2^n

bits

1nb

its2

5 2

nbi

ts1

nb

its

is n

umbe

r of

sta

irca

se s

teps

zero

35

zer

os(3

nbi

ts2)

the

thre

e ro

ws

are

for

the

thre

e io

pt v

alue

sfo

r is

hift

5 0

1

ishi

ft5

0 m

eans

no

shif

t (1

mea

ns s

hift

)to

tAll

5 z

ero3

cor

All

5 z

ero3

pro

bAll

5 z

ero3

pro

bTot

All

5 z

ero3

init

iali

ze a

rray

s

MEASURING THE PSYCHOMETRIC FUNCTION 1455fo

r i5

0n

2

loop

ove

r al

l pos

sibl

e st

airc

ases

a5

bit

get(

i[1

nbit

s])

a(

k )5

1 is

hit

a(k

)5

0 is

mis

sst

ep5

12

a(1

end

1)

1

step

dow

n fo

r hi

t 1

step

up

for

hit

aCum

5 [

0 cu

msu

m(s

tep)

]

accu

mul

ate

step

s to

get

sti

mul

us le

vel

if is

hift

55

0 a

ve5

0

fo

r th

e no

-shi

ft o

ptio

nel

se a

ve5

rou

nd(m

ean(

aCum

))e

nd

for

shif

t opt

ion

ave

is th

e am

ount

of

shif

taC

um5

aC

umav

e1nb

its

do

the

shif

tco

r5

zer

os(1

nbi

ts2)

tot

5 c

or

in

itia

lize

for

eac

h st

airc

ase

for

i25

1n

bits

co

r(aC

um(i

2))

5 c

or(a

Cum

(i2)

)1a(

i2)

co

unt n

umbe

r co

rrec

t at e

ach

leve

lto

t(aC

um(i

2))

5 to

t(aC

um(i

2))1

1

coun

t num

ber

tria

ls a

t eac

h le

vel

end

iopt

5 1

sta

irav

e

tw

o ty

pes

of a

vera

ging

(io

pt is

inde

x in

scr

ipt a

bove

)io

pt5

2m

onot

oniz

e2s

tair

ave

m

onot

oniz

e P

F a

nd th

en d

o av

erag

ing

iopt

5 3

ext

rapo

late

sta

irav

e

extr

apol

ate

and

then

do

aver

agin

gen

dpr

ob5

cor

All

(t

otA

ll1

eps)

get a

vera

ge f

rom

poo

led

data

prob

ave

5 p

robA

ll

(pro

bTot

All

1ep

s)

ge

t ave

rage

fro

m in

divi

dual

PF

sx

5 [

1nb

its2

]

x ax

is f

or p

lott

ing

subp

lot(

32

11is

hift

)

plot

the

top

pair

of

pane

lspl

ot(x

pro

b(1

)x

totA

ll(1

)

max

(tot

All

(1

))rsquo

rsquox

prob

ave(

1)

rsquorsquo)

grid

if is

hift

55

0ti

tle(

rsquono

shif

trsquo)y

labe

l(rsquon

o da

ta m

anip

ulat

ionrsquo

)te

xt(

59

5rsquo(a

)rsquo)el

se ti

tle(

rsquoshi

ftrsquo)

text

(5

95

rsquo(d)rsquo)

end

subp

lot(

32

31is

hift

)pl

ot(x

pro

b(2

)x

pro

bave

(2)

rsquorsquo)

grid

plot

mid

dle

pair

of

pane

lsif

ishi

ft5

5 0

yla

bel(

rsquomon

oton

ize

psyc

hom

etri

c fn

rsquo)te

xt(

56

3rsquo(b

)rsquo)

else

text

(5

95

rsquo(e)rsquo)

end

subp

lot(

32

51is

hift

)

plot

bot

tom

pai

r of

pan

els

plot

(xp

rob(

3)

xp

roba

ve(3

)rsquo

rsquo[nb

its

nbit

s][

45

55]

)gr

idtt

5 rsquoc

frsquot

ext(

59

5[rsquo(

rsquo tt(

ishi

ft1

1) rsquo)

rsquo])

if is

hift

55

0 y

labe

l(rsquoe

xtra

pola

te p

sych

omet

ric

fnrsquo)

end

xlab

el(rsquos

tim

ulus

leve

lsrsquo)

end

en

d lo

op o

f w

heth

er to

shi

ft o

r no

t

Not

emdashT

he m

ain

prog

ram

has

one

tric

ky s

tep

that

req

uire

s kn

owle

dge

of M

atla

b a

5 b

itge

t (i

[1n

bits

])

whe

rei i

s an

inte

ger

from

0 to

2nb

its

1 a

nd n

bits

5 1

0 in

the

pres

ent p

ro-

gram

The

bit

get c

omm

and

conv

erts

the

inte

ger

i to

a ve

ctor

of

bits

For

exa

mpl

e f

or i

5 1

1 th

e ou

tput

of

the

bitg

et c

omm

and

is 1

1 0

1 0

0 0

0 0

0 T

he n

umbe

ri 5

11

(bin

ary

is10

11)

corr

espo

nds

to th

e 10

tria

l sta

irca

se C

C I

C I

I I

I I

I w

here

C m

eans

cor

rect

and

I m

eans

inco

rrec

t

(Man

uscr

ipt r

ecei

ved

July

10

200

1ac

cept

ed f

or p

ubli

cati

on A

ugus

t 8 2

001

)

maximization corresponding to the paired entries in the table. In addition, two lapse parameters were used in the fit: λ = 0.01 (first and third pairs of data columns) and λ = 0.0001 (second and fourth pairs of columns). For this discrimination task the lapses were made symmetric by setting γ = λ in Equation 1A, so that the PF would be symmetric around the midpoint.

The Matlab program that produced Table 2 is included as Matlab Code 3 in the Appendix. The details on how the fitting was done and the precise definitions of the fitting functions are given in the Matlab code. The functions being fit are

probit (cumulative normal):  P = [1 + erf(z/√2)]/2,   (58A)
logit:                       P = 1/[1 + exp(−z)],     (58B)
Weibull:                     P = 1 − exp[−exp(z)],    (58C)

with z = p2(y − p1). The parameters p1 and p2, representing the threshold and slope of the PF, are the two free parameters in the search. The resulting slope parameters are reported in Table 2. The left entry of each pair is the result of the likelihood maximization fit, and the right entry is that of the chi-square minimization. The results showed that (1) when there were no lapses (50 out of 50 correct at the high level), the slope did not strongly depend on the lapse parameter; (2) when the lapse parameter was set to λ = 0.01, the PF slope did not change when there was a lapse (49 out of 50); (3) when λ = 0.0001, the PF slope was reduced dramatically when a single lapse was present; (4) for all cases except the fourth pair of columns, there was no difference in the slope estimate between maximizing likelihood and minimizing chi-square, as would be expected, since in these cases the PF fit the data quite well (low chi-square). In the case of a lapse with λ = 0.0001 (fourth pair of columns), chi-square is large (not shown), indicating a poor fit, and there is an indication in the logit row that chi-square is more sensitive to the outlier than is the likelihood function. In general, chi-square minimization is more sensitive to outliers than is likelihood maximization. Our results are compatible with those of Wichmann and Hill (2001a). Given this discussion, one might worry that Strasburger's estimated slopes would have a downward bias, since he used λ = 0.0001 and he had substantial lapses. However, his slopes were quite high. It is surprising that the effect of lapses did not produce lower slopes.
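As a concrete illustration, here is a minimal Matlab sketch (not the listing from the Appendix) of these three forms with symmetric lapses; the parameter values and stimulus levels below are placeholders rather than the Table 2 values.

p1 = 1; p2 = 1; lambda = 0.01;                % assumed threshold, slope, and lapse rate
y = linspace(-3, 3, 7);                       % assumed stimulus levels
z = p2*(y - p1);                              % z-score of the psychometric function
Pprobit  = (1 + erf(z/sqrt(2)))/2;            % probit (cumulative normal), Equation 58A
Plogit   = 1./(1 + exp(-z));                  % logit, Equation 58B
Pweibull = 1 - exp(-exp(z));                  % Weibull, Equation 58C
Pexpected = lambda + Pprobit*(1 - 2*lambda);  % symmetric lapses (gamma = lambda)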

In the process of doing the parameter searches that went into Table 2, I encountered the problem of "local minima," which often occurs in nonlinear regression but is not always discussed. The PFs depend nonlinearly on the parameters, so both chi-square minimization and likelihood maximization can be troubled by this problem, whereby the best-fitting parameters at the end of a search depend on the starting point of the search. When the fit is good (a low chi-square), as is true in the first three pairs of data columns of Table 2, the search was robust and relatively insensitive to the initial choice of parameters. However, in the last pair of columns the fit was poor (large chi-square) because there was a lapse but the lapse parameter was too small. In this case the fit was quite sensitive to the choice of initial parameters, so a range of initial values had to be explored to find the global minimum. As can be seen in Matlab Code 3 in the Appendix, I set the initial choice of slope to be 0.3, since an initial guess in that region gave the overall lowest value of chi-square.
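One simple safeguard is a multistart search. The sketch below is my own illustration (using the modern fminsearch rather than the older fmins); chisqObjective is a placeholder name for whatever chi-square or negative log-likelihood function is being minimized, taking [threshold slope] as its argument.

bestSSE = Inf; bestParams = [];
for slope0 = 0.1:0.1:1.0                                      % range of initial slope guesses
    [params, sse] = fminsearch(@chisqObjective, [1 slope0]);  % chisqObjective is a placeholder objective
    if sse < bestSSE                                          % keep the best fit over all starts
        bestSSE = sse; bestParams = params;
    end
end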

Wichmann and Hill's (2001a) analysis and the discussion in this section raise the question of how one decides on the lapse rate. For an experiment with a large number of trials (many hundreds), with many trials at high stimulus strengths, Wichmann and Hill (2001a, Figure 5) show that the lapse parameter should be allowed to vary while one uses a standard fitting procedure. However, it is rare to have sufficient data to be able to estimate slope and lapse parameters as well as threshold. One good approach is that of Treutwein and Strasburger (1999) and Wichmann and Hill (2001a), who use Bayesian priors to limit the range of λ. A second approach is to weight the data with a combination of binomial errors based on the observed as well as the expected data (Klein, 2002) and fix the lapse parameter to zero or a small number. Normally the weighting is based solely on the expected analytic PF rather than the data. Adding in some variance due to the observed data permits the data with lapses to receive lesser weighting. A third approach, proposed by Manny and Klein (1985), is relevant to testing infants and clinical populations, in both of which cases the lapse rate might be high, with the stimulus levels well separated. The goal in these clinical studies is to place bounds on threshold estimates. Manny and Klein used a step-like PF with the assumption that the slope was steep enough that only one datum lay on the sloping part of the function. In the fitting procedure the lapse rate was given by the average of the data above the step. A maximum likelihood procedure with threshold as the only free parameter (the lapse rate was constrained as just mentioned) was able to place statistical bounds on the threshold estimates.
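The first of these approaches can be sketched in a couple of lines. This is only one possible reading of the Bayesian-prior idea, not the procedure of Treutwein and Strasburger (1999) or Wichmann and Hill (2001a); the beta(1,20) prior is an arbitrary illustrative choice that keeps λ small.

negLogPrior = @(lambda) -log(20*(1 - lambda).^19);                   % beta(1,20) prior density on lambda (illustrative)
penalizedNegLogLike = @(negLogLike, lambda) negLogLike + negLogPrior(lambda);  % add the prior penalty to the fit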

Biased Slope in Adaptive Methods

In the previous section I examined how lapses produce a downward slope bias. Now I will examine factors that can produce an upward slope bias. Leek, Hanna, and Marshall (1992) carried out a large number of 2AFC, 3AFC, and 4AFC simulations of a variety of Brownian staircases. The PF that they used to generate the data and to fit the data was the power law d′ function (d′ = x_t^b). (I was pleased that they called the d′ function the "psychometric function.") They found that the estimated slope b of the PF was biased high. The bias was larger for runs with fewer trials and for PFs with shallower slopes or more closely spaced levels. Two of the papers in this special issue were centrally involved with this topic.


Strasburger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods. I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = 10% (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
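The variance arithmetic quoted above can be checked in a few lines of Matlab; the numbers are those given in the text.

beta = 5.5; N = 25;               % Strasburger's slope estimate and trials per run
varThresh = 1.75/(N*beta^2);      % asymptotic 10AFC threshold variance, 1.75/(N beta^2)
seThresh = sqrt(varThresh);       % about 0.05, i.e., a 5% standard error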

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that, because of binomial variability, the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.
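A few lines of Matlab make this concrete. The demonstration below is my own construction (not code from the Appendix): each perturbed run is shifted so that its interpolated 50% point sits at zero, the two aligned PFs are averaged, and the central slope of the average is compared with that of the unperturbed PF.

P0 = [.01 .1222 .50 .8778 .99]; levels = 1:5;     % the five-point PF of Miller and Ulrich
Pa = P0; Pa(3) = .70;                             % middle point high by chance
Pb = P0; Pb(3) = .30;                             % middle point low by chance
tha = interp1(Pa, levels, 0.5);                   % interpolated 50% threshold of each run
thb = interp1(Pb, levels, 0.5);
x = -1.5:0.05:1.5;                                % common grid relative to threshold
PaS = interp1(levels - tha, Pa, x);               % shift each run to align its threshold
PbS = interp1(levels - thb, Pb, x);
Pave = (PaS + PbS)/2;                             % average the aligned PFs
slopeShifted = (interp1(x, Pave, .25) - interp1(x, Pave, -.25))/0.5;  % central slope after shifting
slopeOriginal = (P0(4) - P0(2))/2;                % central slope of the unperturbed PF
[slopeOriginal slopeShifted]                      % the shifted-and-averaged PF is steeper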

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach this extrapolation is necessary, since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated, with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue, in connection with my simulations, that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than average the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
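For readers who want to experiment with the first of these steps, here is a minimal sketch of one way to monotonize counts. It is my own construction rather than Kaernbach's exact routine (his approach, like the script in Matlab Code 4, replaces nonmonotonic stretches by their average): whenever a level's proportion correct exceeds that of the next higher level, the two levels are pooled.

cor = [2 5 3 9 10]; tot = [10 10 10 10 10];   % made-up correct and total counts per level
p = cor./tot; i = 1;
while i < numel(p)
    if p(i) > p(i+1)                          % nonmonotonicity between levels i and i+1
        cor(i:i+1) = sum(cor(i:i+1))/2;       % pool the two levels: both now carry the
        tot(i:i+1) = sum(tot(i:i+1))/2;       %   pooled proportion sum(cor)/sum(tot)
        p = cor./tot; i = max(i-1, 1);        % re-check earlier levels after pooling
    else
        i = i + 1;                            % move on when the pair is monotonic
    end
end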

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods, both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

[Figure 5 appears here: six panels, (a)–(f), with abscissa "stimulus levels" and ordinate "probability correct"; left column "no shift," right column "shift"; rows labeled "no data manipulation," "monotonize psychometric function," and "extrapolate psychometric function."]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.

The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel, the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot–dashed line, with right–left symmetry, that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d, with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs

1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias.

1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).

1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods but that objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.
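As an illustration of this rule (my own sketch, not code from the paper), the loop below runs one such staircase against a simulated observer whose percent correct rises with level; the performance curve, starting level, and number of trials are arbitrary stand-ins.

nTrials = 200; level = 10; levels = zeros(1, nTrials);
pCorrect = @(lev) 0.5 + 0.5*(1 + erf((lev - 5)/sqrt(2)))/2;   % assumed performance curve (50% to 100%)
for t = 1:nTrials
    correct = rand < pCorrect(level);      % blanks and signals lumped into one percent correct
    if correct
        level = level - 1;                 % hit or correct rejection: one step down
    else
        level = level + 3;                 % miss or false alarm: three steps up
    end
    levels(t) = level;
end
threshEst = mean(levels(20:end));          % average the levels after the early trials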

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is b/β = 1.06. This ratio differs from b/β ≈ 0.8 (Pelli, 1985) and b/β = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs

1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase, with threshold estimated by averaging levels, was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about the fourth reversal, are thrown out) rather than average an even number of reversals. Both of these findings were surprising.
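A minimal sketch of that estimate (my own illustration; the run below is made up): discard the trials up to about the fourth reversal and average all remaining levels.

levels = [10 9 8 7 8 7 6 7 6 5 6 7 6 5 6 5 6 5 4 5];   % stimulus level on each trial of one run
reversal = find(diff(sign(diff(levels))) ~= 0) + 1;    % trial indices where the direction reverses
start = reversal(min(4, numel(reversal)));             % trial of (about) the fourth reversal
threshEst = mean(levels(start:end));                   % average all levels from there on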

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs

1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with much fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (Code for Generating Figure 1)
[Matlab listing: plots the Weibull function with β = 2 and a lower asymptote γ on linear and natural-log abscissas, converts probability to the z-score of the Weibull (d′ − 1), and plots the log–log slope of d′ against stimulus strength in threshold units and in natural log units.]

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi Square (Relevant to Wichmann and Hill, 2001a, and Figure 4)
[Matlab listing: for a wide range of predicted probabilities and for N = 1, 2, 4, 8, and 16 trials per level, sums the binomially weighted log-likelihood (Equation 56) and X² (Equation 52) contributions over all possible numbers of correct responses and plots the contribution to the log likelihood (D) and to X² from each level.]

Matlab Code 3. Downward Slope Bias Due to Lapses (Relevant to Wichmann and Hill, 2001a, and Table 2)
[Matlab listing: a function, probitML2, that for three stimulus levels with N = [50 50 50] trials and observed correct responses Obs = [25 42 50] (with a lapse introduced at the top level for the last two columns of Table 2) computes the probit, logit, or Weibull prediction with the chosen lapse rate and returns either the summed log likelihood (Equation 56) or the summed X² (Equation 52); and a main program that loops over the six combinations of PF shape and fit type and the four lapse categories, minimizes with fmins, and prints the slope estimates shown in Table 2.]

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias
[Matlab listing in four parts: a monotonizing script that replaces nonmonotonic stretches of the correct and total counts by their averages; an extrapolating script that fills in untested levels at the low and high ends with the probability of the nearest tested level; an averaging script that accumulates both the pooled counts and the individual-run PFs; and a main program that uses bitget to enumerate all 2^10 possible 10-trial Brownian staircases, tallies the number correct and the number of trials at each level for the no-shift and shift options, and produces Figure 5.]

Note that in line 6 of the monotonizing script the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)

e f

or i

5 1

1 th

e ou

tput

of

the

bitg

et c

omm

and

is 1

1 0

1 0

0 0

0 0

0 T

he n

umbe

ri 5

11

(bin

ary

is10

11)

corr

espo

nds

to th

e 10

tria

l sta

irca

se C

C I

C I

I I

I I

I w

here

C m

eans

cor

rect

and

I m

eans

inco

rrec

t

(Man

uscr

ipt r

ecei

ved

July

10

200

1ac

cept

ed f

or p

ubli

cati

on A

ugus

t 8 2

001

)

1446 KLEIN

burger (2001b) used a 10AFC task for letter discrimination and found slopes that were more than double the slopes in previous studies. I worried that the high slopes might be due to a bias caused by the methodology. Kaernbach (2001b) provides a mechanism that could explain the slope bias found in adaptive procedures. Here I will summarize my thoughts on this topic. My motivation goes well beyond the particular issue of a slope bias associated with adaptive methods; I see it as an excellent case study that provides many lessons on how subtle, seemingly innocuous methods can produce unwanted biases.

Strasburger's steep slopes for character recognition. Strasburger (2001b) used a maximum likelihood adaptive procedure with about 30 trials per run. The task was letter discrimination with 10 possible peripherally viewed letters. Letter contrast was varied on the basis of a likelihood method using a Weibull function with β = 3.5 to obtain the next test contrast and to obtain the threshold at the end of each run. In order to estimate the slope, the data were shifted on a log axis so that thresholds were aligned. The raw data were then pooled so that a large number of trials were present at a number of levels. Note that this procedure produces an extreme concentration of trials at threshold. The bottom panels of Strasburger's Figures 4 and 5 show that there are less than half the number of trials in the two bins adjacent to the central bin, with a bin separation of only 0.01 log units (a 2.3% contrast change). A Weibull function with threshold and slope as free parameters was fit to the pooled data, using a lower asymptote of γ = .10 (because of the 10AFC task) and a lapse rate of λ = 0.0001 (a 99.99% correct upper asymptote). The PF slope was found to be β = 5.5. Since this value of β is about twice that found regularly, Strasburger's finding implies at least a four-fold variance reduction. The asymptotic (large N) variance for the 10AFC task is 1.75/(Nβ²) = 0.058/N (Klein, 2001). For a run of 25 trials the variance is var = 0.058/25 = 0.0023. The standard error is SE = sqrt(var) ≈ 0.05. This means that in just 25 trials one can estimate threshold with a 5% standard error, a low value. That low value is sufficiently remarkable that one should look for possible biases in the procedure.
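The arithmetic in the preceding paragraph is easy to check. The following Matlab sketch simply evaluates the quoted approximation 1.75/(Nβ²); the numbers (β = 5.5, N = 25) come from the text above, and nothing beyond that formula is assumed.

% check of the threshold-variance arithmetic quoted above
beta = 5.5;                  % PF slope reported by Strasburger (2001b)
N = 25;                      % number of trials in a run
varT = 1.75/(N*beta^2);      % asymptotic threshold variance, 1.75/(N*beta^2)
seT = sqrt(varT);            % standard error of the threshold estimate (about 0.05)
fprintf('var = %.4f, SE = %.3f\n', varT, seT)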

Before considering a multiplicity of factors contributing to a bias, it should be noted that the large values of β could be real for seeing the tiny stimuli used by Strasburger (2001b). With small stimuli and peripheral viewing there is much spatial uncertainty, and uncertainty is known to elevate slopes. A question remains of whether the uncertainty is sufficient to account for the data.

There are many possible causes of the upward slope bias. My intuition was that the early step of shifting the PF to align thresholds was an important contributor to the bias. As an example of how it would work, consider the five-point PF with equal level spacing used by Miller and Ulrich (2001): P(i) = 1%, 12.22%, 50%, 87.78%, and 99%. Suppose that because of binomial variability the middle point was 70% rather than 50%. Then the threshold would be shifted to between Levels 2 and 3, and the slope would be steepened. Suppose, on the other hand, that the variability causes the middle point to be 30% rather than 50%. Now the threshold would be shifted to between Levels 3 and 4, and the slope would again be steepened. So any variability in the middle level causes steepening. Variability at the other levels has less effect on slope. I will now compare this and other candidates for slope bias.
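A minimal numerical illustration of the shift argument, using Miller and Ulrich's five-level example. The perturbed middle values and the linear interpolation of the 50% point are my own simplifications for illustration, not part of either paper.

% perturb the middle level and watch the interpolated 50% point (the threshold) move
levels = 1:5;
P0 = [.01 .1222 .50 .8778 .99];        % the five-point example PF
for Pmid = [.30 .50 .70]
  P = P0; P(3) = Pmid;                 % binomial variability at the middle level
  thresh = interp1(P, levels, .50);    % linearly interpolated 50% point
  fprintf('P(3) = %.2f  ->  threshold = %.2f\n', Pmid, thresh)
end

Either perturbation moves the estimated threshold toward the side where the run happens to be steep, so that after the runs are aligned at threshold and pooled, the pooled PF is steeper than the true one.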

Kaernbach's explanation of the slope bias. Kaernbach (2001b) examines staircases based on a cumulative normal PF that goes from 0% to 100%. The staircase rule is Brownian, with one step up or down for each incorrect or correct answer, respectively. Parametric and nonparametric methods were used to analyze the data. Here I will focus on the nonparametric methods used to generate Kaernbach's Figures 2–4. The analysis involves three steps.

First, the data are monotonized. Kaernbach gives two reasons for this procedure: (1) Since the true psychometric function is expected to be monotonic, it seems appropriate to monotonize the data. This also helps remove noise from nonparametric threshold or slope estimates. (2) "Second, and more importantly, the monotonicity constraint improves the estimates at the borders of the tested signal range, where only few tests occur, and admits to estimate the values of the PF at those signal levels that have not been tested (i.e., above or below the tested range). For the present approach, this extrapolation is necessary since the averaging of the PF values can only be performed if all runs yield PF estimates for all signal levels in question" (Kaernbach, 2001b, p. 1390).

Second, the data are extrapolated with the help of the monotonizing step, as mentioned in the quote of the preceding paragraph. Kaernbach (2001b) believes it is essential to extrapolate. However, as I shall show, it is possible to do the simulations without the extrapolation step. I will argue in connection with my simulations that the extrapolation step may be the most critical step in Kaernbach's method for producing the bias he finds.

Third, a PF is calculated for each run. The PFs are then averaged across runs. This is different from Strasburger's method, in which the data are shifted and then, rather than average the PFs of each run, the total number correct and incorrect at each level are pooled. Finally, the PF is generated on the basis of the pooled data.
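The difference between the two averaging schemes is easy to see on toy numbers. The sketch below is mine (the two runs and their counts are made up solely for illustration): pooling the counts weights levels by how often they were tested, whereas averaging the per-run PFs gives every run equal weight and needs a rule for untested levels.

% two runs at two stimulus levels (hypothetical counts)
cor = [1 0; 9 1];                          % number correct: rows are runs, columns are levels
tot = [2 0; 10 2];                         % number of trials (run 1 never tested level 2)
pooledPF = sum(cor)./(sum(tot)+eps)        % Strasburger style: pool counts, then form the PF
runPF = cor./(tot+eps);                    % PF of each run (0 where a level was untested)
averagePF = sum(runPF)./(sum(tot>0)+eps)   % Kaernbach style: average PFs over runs testing the level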

Simulations to clarify contributions to the slope bias. A number of factors in Kaernbach's and Strasburger's procedures could contribute to a biased slope estimate. In Strasburger's case there is the shifting step. One might think this special to Strasburger, but in fact it occurs in many methods, both parametric and nonparametric. In parametric methods both slope and threshold are typically estimated. A threshold estimate that is shifted away from its true value corresponds to Strasburger's shift. Similarly, in the nonparametric method of Miller and Ulrich (2001), the distribution is implicitly allowed to shift in the process of estimating slope. When Kaernbach (2001b) implemented Miller and Ulrich's method, he found a large upward slope bias. Strasburger's shift step was more explicit than that of the other methods, in which the shift is implicit. Strasburger's method allowed the resulting slope to be visualized, as in his figures of the raw data after the shift.

I investigated several possible contributions to the slope bias: (1) shifting, (2) monotonizing, (3) extrapolating, and (4) averaging the PF. Rather than simulations, I did enumerations in which all possible staircases were analyzed, with each staircase consisting of 10 trials, all starting at the same point. To simplify the analysis and to reduce the number of assumptions, I chose a PF that was flat (slope of zero) at P = 50%. This corresponds to the case of a normal PF but with a very small step size. The Matlab program that does all the calculations and all the figures is included as Matlab Code 4. There are four parts to the code: (1) the main program, (2) the script for monotonizing, (3) the script for extrapolating, and (4) the script for either averaging the PFs or totaling up the raw responses. The Matlab code in the Appendix shows that it is quite easy to find a method for averaging PFs even when there are levels that are not tested. Since the annotated programs are included, I will skip the details here and just offer a few comments.
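For readers who want the gist without turning to the Appendix, here is a compact sketch of the enumeration idea (the annotated, complete version is Matlab Code 4; the offset by nbits that keeps the level index positive is the same device used there):

nbits = 10;                                % trials per staircase
for i = 0:2^nbits-1                        % every possible sequence of hits and misses
  a = bitget(i, 1:nbits);                  % a(k) = 1 for a hit, 0 for a miss
  step = 1 - 2*a(1:end-1);                 % step down after a hit, up after a miss
  level = [0 cumsum(step)] + nbits;        % stimulus level on each of the 10 trials
  % ... accumulate the number correct and the number of trials at each level here ...
end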

The output of the Matlab program is shown in Figure 5. The left and right columns of panels show the PF for the case of no shifting and with shifting, respectively. The shifting is accomplished by estimating threshold as the mean of all the levels, a standard nonparametric method for estimating threshold. That mean is used as the amount by which the levels are shifted before the data from all runs are pooled. The top pair of panels is for the case of averaging the raw data; the middle panels show the results of averaging following the monotonizing procedure.

[Figure 5 appears here: six panels, (a)-(f), arranged as two columns (no shift; shift) by three rows (no data manipulation; Monotonize psychometric function; Extrapolate psychometric function). The ordinate is probability correct; the abscissa is stimulus levels (0-20).]

Figure 5. Slope bias for staircase data. Psychometric functions are shown following four types of processing steps: shifting, monotonizing, extrapolating, and type of averaging. In all panels the solid line is for averaging the raw data of multiple runs before calculating the PF. The dashed line is for first calculating the PF and then averaging the PF probabilities. Panel a shows that the initial PF is unchanged if there is no shifting, monotonizing, or extrapolating. For simplicity, a flat PF fixed at P = 50% was chosen. The dot–dashed curve is a histogram showing the percentage of trials at each stimulus level. Panel b shows the effect of monotonizing the data. Note that this one panel has a reduced ordinate to better show the details of the data. Panel c shows the effect of extrapolating the data to untested levels. Panels d, e, and f are for the cases a, b, and c where, in addition, a shift operation is done to align thresholds before the data are averaged across runs. The results show that the shift operation is the most effective analysis step for producing a bias. The combination of monotonizing, extrapolating, and averaging PFs (panel c) also produces a strong upward bias of slope.


The monotonizing routine took some thought, and I hope its presence in the Appendix (Matlab Code 4) will save others much trouble. The lower pair of panels shows the results of averaging after both monotonizing and extrapolating.

In each panel the solid curve shows the average PF obtained by combining the correct trials and the total trials across runs for each level and taking their ratio to get the PF. The dashed curve shows the average PF resulting from averaging the PFs of each run. The latter method is the more important one, since it simulates single short runs. In the upper pair of panels the dashed and solid curves are so close to each other that they look like a single curve. In the upper pair I show an additional dot–dashed line with right–left symmetry that represents the total number of trials. The results in the plots are as follows.

Placement of trials. The effect of the shift on the number of trials at each level can be seen by comparing the dot–dashed lines in the top two panels. The shift narrows the distribution significantly. There are close to zero trials more than three steps from threshold in panel d with the shift, even though the full range is ±10 steps. The curve showing the total number of trials would be even sharper if I had used a PF with positive slope rather than a PF that is constant at 50%.

Slope bias with no monotonizing. The top left panel shows that with no shift and no monotonizing there is no slope bias. The PF is flat at 50%. The introduction of a shift produces a dramatic slope bias, going from PF = 20% to 80% as the stimulus goes from Levels 8 to 12. There was no dependence on the type of averaging.

Effect of monotonizing on slope bias. Monotonizing alone (middle panel on left) produced a small slope bias that was larger with PF averaging than with data averaging. Note that the ordinate has been expanded in this one case in order to better illustrate the extent of the bias. The extreme amount of steepening (right panel) that is produced by shifting was not matched by monotonizing.

Effect of extrapolating on slope bias. The lower left panel shows the dramatic result that the combination of all three of Kaernbach's methods (monotonizing, extrapolating, and averaging PFs) does produce a strong slope bias for these Brownian staircases. If one uses Strasburger's type of averaging, in which the data are pooled before the PF is calculated, then the slope bias is minimal. This type of averaging is equivalent to a large increase in the number of trials.

These simulations show that it is easy to get an upward slope bias from Brownian data with trials concentrated near one point. An important factor underlying this bias is the strong correlation in staircase levels discussed by Kaernbach (2001b). A factor that magnifies the slope bias is that when trials are concentrated at one point, the slope estimate has a large standard error, allowing it to shift easily. Our simulations show that one can produce biased slope estimates either by a procedure (possibly implicit) that shifts the threshold or by a procedure that extrapolates data to untested levels. This topic is of more academic than practical interest, since anyone interested in the PF slope should use an adaptive algorithm like the Ψ method of Kontsevich and Tyler (1999), in which trials are placed at separated levels for estimating slope. The slope bias that is produced by the seemingly innocuous extrapolation or shifting step is a wonderful reminder of the care that must be taken when one employs psychometric methodologies.

SUMMARY

Many topics have been covered in this paper, so a summary should serve as a useful reminder of several highlights. Items that are surprising, novel, or important are italicized.

I. Types of PFs
1. There are two methods for compensating for the PF lower asymptote, also called the correction for bias:
1.1. Do the compensation in probability space. Equation 2 specifies p(x) = [P(x) − P(0)]/[1 − P(0)], where P(0), the lower asymptote of the PF, is often designated by the parameter γ. With this method, threshold estimates vary as the lower asymptote varies (a failure of the high-threshold assumption).
1.2. Do the compensation in z-score space for yes/no tasks. Equation 5 specifies d′(x) = z(x) − z(0). With this method, the response measure d′ is identical to the metric used in signal detection theory. This way of looking at psychometric functions is not familiar to many researchers.
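A minimal sketch of the z-score compensation of Item 1.2, with made-up hit and false alarm rates (the erfinv line is just the standard probability-to-z conversion):

pHit = .84; pFA = .16;                     % hypothetical yes rates for signal and blank trials
zHit = sqrt(2)*erfinv(2*pHit - 1);         % z-score of P(x)
zFA  = sqrt(2)*erfinv(2*pFA - 1);          % z-score of P(0)
dprime = zHit - zFA                        % Equation 5: d'(x) = z(x) - z(0), here about 2.0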

2. 2AFC is not immune to bias. It is not uncommon for the observer to be biased toward Interval 1 or 2 when the stimulus is weak. Whereas the criterion bias for a yes/no task [z(0) in Equation 5] affects d′ linearly, the interval bias in 2AFC affects d′ quadratically, and therefore it has been thought small. Using an example from Green and Swets (1966), I have shown that this 2AFC bias can have a substantial effect on d′ and on threshold estimates, a different conclusion from that of Green and Swets. The interval bias can be eliminated by replotting the 2AFC discrimination PF, using the percent correct Interval 2 judgments on the ordinate and the Interval 2 minus Interval 1 signal strength on the abscissa. This 2AFC PF goes from 0% to 100%, rather than from 50% to 100%.

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods but that objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections), and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.
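A minimal sketch of the up–down rule just described. The simulated observer (a linear d′ with unit noise and a fixed criterion) is my own stand-in, included only so the loop runs; the rule itself is the one-step-down, three-steps-up prescription above.

level = 10;                                         % starting signal level (arbitrary units)
dPerLevel = .25; criterion = 1;                     % assumed toy observer parameters
for trial = 1:40
  isSignal = rand < .5;                             % randomly intermix blanks and signal trials
  sayYes = (isSignal*dPerLevel*level + randn) > criterion;
  correct = (sayYes == isSignal);                   % hit or correct rejection
  if correct, level = level - 1;                    % one step down for a correct answer
  else level = level + 3; end                       % three steps up for a false alarm or miss
end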

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log/x_t (Equation 11), where x_t is the stimulus strength in threshold units.
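Equation 11, as restated above, is just the chain rule. Writing the PF as P and the stimulus strength as x (in threshold units, so x = x_t at the point of evaluation):

\frac{dP}{dx} = \frac{dP}{d(\ln x)}\,\frac{d(\ln x)}{dx} = \frac{1}{x}\,\frac{dP}{d(\ln x)}
\quad\Longrightarrow\quad
\mathrm{slope}_{\mathrm{lin}} = \frac{\mathrm{slope}_{\log}}{x_t}.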

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low stimulus strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled, to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase, with threshold estimated by averaging levels, was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals of initial trials are thrown out) rather than average an even number of reversals. Both of these findings were surprising.
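A minimal sketch of the procedure in Item 6, with an assumed cumulative normal observer. The observer, the step size, and the choice of discarding the first four trials are illustrative assumptions, not prescriptions from the text.

ntrials = 60; level = zeros(1,ntrials); level(1) = 5;
trueThresh = 0;                                     % simulated observer: P(correct) = Phi(level)
for k = 1:ntrials-1
  pCorrect = .5 + .5*erf((level(k) - trueThresh)/sqrt(2));
  correct = rand < pCorrect;
  level(k+1) = level(k) - 1 + 2*(~correct);         % one step down if correct, one step up if wrong
end
threshEst = mean(level(5:end))                      % average all levels after the first few trials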

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons. It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed, as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods, in which the number of trials is unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptote. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
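For concreteness, here is a minimal Spearman–Kärber threshold estimate. The convention for the end bins (extending half a step beyond the tested range so that the probability mass sums to one) is one common choice, not necessarily the one Miller and Ulrich use.

x = 1:5;                                   % stimulus levels
P = [.01 .1222 .50 .8778 .99];             % a monotonic PF running (nearly) from 0 to 1
dP = diff([0 P 1]);                        % probability mass assigned to each bin, tails included
xmid = ([x(1)-1 x] + [x x(end)+1])/2;      % bin midpoints (end bins one step wide)
threshSK = sum(xmid.*dP)                   % Spearman-Karber threshold (mean of the dP distribution)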

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum one needs to use a range of initial slope guesses.

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with much fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1
Weibull Function: Probability, d′, and Log–Log d′ Slope
Code for Generating Figure 1

gamma = .5 + .5*erf(-1/sqrt(2));        % gamma (lower asymptote) for z-score = -1
beta = 2; inc = .01;                    % beta is PF slope
x = inc/2:inc:3;                        % the stimulus range with linear abscissa
p = gamma + (1-gamma)*(1-exp(-x.^beta));                % Weibull function
subplot(3,2,1); plot(x,p); grid; text(1.9,.2,'(a)'); ylabel('Weibull with beta = 2')
z = sqrt(2)*erfinv(2*p-1);              % converting probability to z-score
subplot(3,2,3); plot(x,z); grid; text(1.3,.6,'(b)'); ylabel('z-score of Weibull (d''-1)')
dprime = z + 1;                         % dprime = z(hit) - z(false alarm)
difd = diff(log(dprime))/inc;           % derivative of log d'
xd = inc:inc:2.9999;                    % x values at midpoints of previous x values
subplot(3,2,5); plot(xd, difd.*xd); grid        % multiplication by xd is to make log abscissa
axis([0 3 .5 2]); text(.3,1.9,'(c)')
ylabel('log-log slope of d'''); xlabel('stimulus in threshold units')
y = -2:inc:1;                           % do similar plots for a log abscissa (y)
p = gamma + (1-gamma)*(1-exp(-exp(beta*y)));            % Weibull function as a function of y
subplot(3,2,2); plot(y,p, 0,p(201),'*'); grid; text(-1.89,.2,'(d)')
z = sqrt(2)*erfinv(2*p-1);
subplot(3,2,4); plot(y,z); grid; text(-1.83,.6,'(e)')
dprime = z + 1; difd = diff(log(dprime))/inc;
subplot(3,2,6); plot(y,[difd(1) difd]); grid
xlabel('stimulus in natural log units'); text(-1.6,1.9,'(f)')

Matlab Code 2
Goodness-of-Fit: Likelihood vs. Chi Square
Relevant to Wichmann and Hill (2001a) and Figure 4

clear; clf; clear all
fac = [1 cumprod(1:20)];                % create factorial function
p = [.51:.01:.999];                     % examine a wide range of probabilities
Nall = [1 2 4 8 16];                    % number of trials at each level
for iN = 1:5                            % iterate over the number of trials
  N = Nall(iN); E = N*p;                % E is the expected number as a function of p
  lik = 0; X2 = 0;                      % initialize the likelihood and X2
  for Obs = 0:N                         % sum over all possible correct responses
    Nm = N - Obs;                       % number of incorrect responses
    weight = fac(1+N)/fac(1+Obs)/fac(1+Nm)*p.^Obs.*(1-p).^Nm;              % binomial weight
    lik = lik - 2*weight.*(Obs*log(E/(Obs+eps)) + Nm*log((N-E)/(Nm+eps))); % Equation 56
    X2 = X2 + weight.*(Obs-E).^2./(E.*(1-p));                              % calculate X2, Equation 52
  end                                   % end summation loop
  LL2(iN,:) = lik; X2all(iN,:) = X2;    % store the likelihoods and chi squares
end                                     % end N loop
% the rest is for plotting
plot(p,LL2,'k', p,X2all(2,:),'k'); grid % plot the log likelihoods and X2
for in = 1:4, text(.935,LL2(in,end-6),num2str(Nall(in))), end
text(.913,1.2,'N=16'); text(.65,.95,'X2 for all N')
xlabel('P (predicted probability)')
ylabel('Contribution to log likelihood (D) from each level')

Matlab Code 3
Downward Slope Bias Due to Lapses
Relevant to Wichmann and Hill (2001a) and Table 2

function sse = probitML2(params, i, ilapse)             % function called by main program
stimulus = [0 1 6]; N = [50 50 50]; Obs = [25 42 50];   % psychometric function information
lapseAll = [0 .1 0 .001 .01 .0001]; lapse = lapseAll(ilapse);   % choose the lapse rate lambda
if ilapse > 2, Obs(3) = Obs(3) - 1; end                 % produce a lapse for last 2 columns
zz = (stimulus - params(1))*params(2);                  % z-score of psychometric function
ii = mod(i,3);                                          % index for selecting psychometric function
if ii == 0
  prob = (1+erf(zz/sqrt(2)))/2;                         % probit (cumulative normal)
elseif ii == 1
  prob = 1./(1+exp(-zz));                               % logit
else
  prob = 1 - exp(-exp(zz));                             % Weibull
end
Exp = N.*(lapse + prob*(1-2*lapse));                    % expected number correct
if i < 3
  chisq = -2*(Obs.*log(Exp./Obs) + (N-Obs).*log((N-Exp+eps)./(N-Obs+eps)));  % log likelihood
else
  chisq = (Obs-Exp).^2.*N./(Exp.*(N-Exp+eps));          % X2 (eps is smallest Matlab number)
end
sse = sum(chisq);                                       % sum over the three levels

% MAIN PROGRAM
clear; clf
type = ['probit maxlik ';'logit maxlik  ';'Weibull maxlik';    % headings for Table 2
        'probit chisq  ';'logit chisq   ';'Weibull chisq '];
disp('  gamma =   0    .1    0   .001   .01   .0001')          % type top of Table 2
disp('  lapse =   no   no   yes   yes')
for i = 0:5                                             % iterate over function type
  for ilapse = 1:4                                      % iterate over lapse categories
    params = fmins('probitML2',[1 3],[],[],i,ilapse);   % fmins finds minimum of chi-square
    slope(ilapse) = params(2);                          % slope is 2nd parameter
  end
  disp([type(i+1,:) num2str(slope)])                    % print result
end

APPENDIX (Continued)
Matlab Code 4
Brownian Staircase Enumerations for Slope Bias

Monotonize
flag = 1; ii = 0;                       % initialize; flag = 1 means nonmonotonicity is detected
while flag == 1                         % keep going if there is nonmonotonic data
  flag = 0;                             % reset monotonic flag
  ii = ii + 1;                          % increase the nonmonotonic spread
  for i2 = ii+1:nbits2                  % loop over all PF levels looking for nonmonotonicity
    prob = cor./(tot+eps);              % calculate probabilities
    if (prob(i2-ii) > prob(i2)) & (tot(i2) > 0)             % check for nonmonotonicity
      cor(i2-ii:i2) = mean(cor(i2-ii:i2))*ones(1,ii+1);     % replace nonmonotonic correct by average
      tot(i2-ii:i2) = mean(tot(i2-ii:i2))*ones(1,ii+1);     % replace nonmonotonic total by average
      flag = 1;                         % set flag that nonmonotonicity is found
    end
  end
end

Note that in line 6 the denominator is tot+eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Extrapolate
ii = 1; while tot(ii) == 0, ii = ii+1; end      % count the low levels with no trials
if ii > 1                                       % skip lowest level
  tot(1:ii-1) = ones(1,ii-1)*.00001;            % fill in with very low number of trials
  cor(1:ii-1) = prob(ii)*tot(1:ii-1);           % fill in with probability of lowest tested level
end
ii = nbits2; while tot(ii) == 0, ii = ii-1; end % do the same extrapolation at high end
if ii < nbits2
  tot(ii+1:nbits2) = ones(1,nbits2-ii)*.00001;
  cor(ii+1:nbits2) = prob(ii)*tot(ii+1:nbits2);
end

Averaging
prob1 = cor./(tot+eps);                                 % calculate PF
probAll(iopt,:) = probAll(iopt,:) + prob1;              % sum the PFs to be averaged
probTotAll(iopt,:) = probTotAll(iopt,:) + (tot>0);      % count the number of staircases with a given level
corAll(iopt,:) = corAll(iopt,:) + cor;                  % sum the number correct over all staircases
totAll(iopt,:) = totAll(iopt,:) + tot;                  % sum the total number of trials over all staircases

Main Program
nbits = 10; n2 = 2^nbits - 1; nbits2 = 2*nbits + 1;     % nbits is number of staircase steps
zero3 = zeros(3,nbits2);                                % the three rows are for the three iopt values
for ishift = 0:1                                        % ishift = 0 means no shift (1 means shift)
  totAll = zero3; corAll = zero3; probAll = zero3; probTotAll = zero3;   % initialize arrays
  for i = 0:n2                                          % loop over all possible staircases
    a = bitget(i,[1:nbits]);                            % a(k) = 1 is hit, a(k) = 0 is miss
    step = 1 - 2*a(1:end-1);                            % -1 step down for hit, +1 step up for miss
    aCum = [0 cumsum(step)];                            % accumulate steps to get stimulus level
    if ishift == 0, ave = 0;                            % for the no-shift option
    else ave = round(mean(aCum)); end                   % for shift option, ave is the amount of shift
    aCum = aCum - ave + nbits;                          % do the shift
    cor = zeros(1,nbits2); tot = cor;                   % initialize for each staircase
    for i2 = 1:nbits
      cor(aCum(i2)) = cor(aCum(i2)) + a(i2);            % count number correct at each level
      tot(aCum(i2)) = tot(aCum(i2)) + 1;                % count number of trials at each level
    end
    iopt = 1; stairave                                  % two types of averaging (iopt is index in script above)
    iopt = 2; monotonize; stairave                      % monotonize PF and then do averaging
    iopt = 3; extrapolate; stairave                     % extrapolate and then do averaging
  end
  prob = corAll./(totAll+eps);                          % get average from pooled data
  probave = probAll./(probTotAll+eps);                  % get average from individual PFs
  x = [1:nbits2];                                       % x axis for plotting
  subplot(3,2,1+ishift)                                 % plot the top pair of panels
  plot(x,prob(1,:), x,totAll(1,:)/max(totAll(1,:)),'-.', x,probave(1,:),'--'); grid
  if ishift == 0, title('no shift'); ylabel('no data manipulation'); text(5,.95,'(a)')
  else title('shift'); text(5,.95,'(d)'); end
  subplot(3,2,3+ishift); plot(x,prob(2,:), x,probave(2,:),'--'); grid    % plot middle pair of panels
  if ishift == 0, ylabel('monotonize psychometric fn'); text(5,.63,'(b)')
  else text(5,.95,'(e)'); end
  subplot(3,2,5+ishift)                                 % plot bottom pair of panels
  plot(x,prob(3,:), x,probave(3,:),'--', [nbits nbits],[.45 .55]); grid
  tt = 'cf'; text(5,.95,['(' tt(ishift+1) ')'])
  if ishift == 0, ylabel('extrapolate psychometric fn'); end
  xlabel('stimulus levels')
end                                                     % end loop of whether to shift or not

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i,[1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11, the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary is 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)

MEASURING THE PSYCHOMETRIC FUNCTION 1447

ward slope bias Strasburgerrsquos shift step was more explicitthan that of the other methods in which the shift is implicitStrasburgerrsquos method allowed the resulting slope to be vi-sualized as in his figures of the raw data after the shift

I investigated several possible contributions to the slopebias (1) shifting (2) monotonizing (3) extrapolating and(4) averaging the PF Rather than simulations I did enu-merations in which all possible staircases were analyzedwith each staircase consisting of 10 trials all starting atthe same point To simplify the analysis and to reduce thenumber of assumptions I chose a PF that was flat (slopeof zero) at P 5 50 This corresponds to the case of anormal PF but with a very small step size The Matlab pro-gram that does all the calculations and all the figures isincluded as Matlab Code 4 There are four parts to the code(1) the main program (2) the script for monotonizing

(3) the script for extrapolating (4) the script for either av-eraging the PFs or totaling up the raw responses The Mat-lab code in the Appendix shows that it is quite easy to finda method for averaging PFs even when there are levels thatare not tested Since the annotated programs are includedI will skip the details here and just offer a few comments

The output of the Matlab program is shown in Figure 5The left and right columns of panels show the PF for thecase of no shifting and with shifting respectively Theshifting is accomplished by estimating threshold as themean of all the levels a standard nonparametric methodfor estimating threshold That mean is used as the amountby which the levels are shifted before the data from all runsare pooled The top pair of panels is for the case of aver-aging the raw data the middle panels show the results of av-eraging following the monotonizing procedure The monot-

0 5 10 15 200

5

1no shift

(a)

0 5 10 15 204

5

6

7(b)

0 5 10 15 200

5

1(c)

stimulus levels

0 5 10 15 200

5

1shift

(d)

0 5 10 15 200

5

1(e)

0 5 10 15 200

5

1(f )

stimulus levels

no datamanipulation

frequency frequency

Pro

babi

lity

corr

ect

Extrapolate psychometric function

Monotonizepsychometric function

Figure 5 Slope bias for staircase data Psychometric functions are shown following four types of processingsteps shifting monotonizing extrapolating and type of averaging In all panels the solid line is for averaging theraw data of multiple runs before calculating the PF The dashed line is for first calculating the PF and then aver-aging the PF probabilities Panel a shows the initial PF is unchanged if there is no shifting maximizing or ex-trapolating For simplicity a flat PF fixed at P 5 50 was chosen The dotndashdashed curve is a histogram show-ing the percentage of trials at each stimulus level Panel b shows the effect of monotonizing the data Note that thisone panel has a reduced ordinate to better show the details of the data Panel c shows the effect of extrapolatingthe data to untested levels Panels d e and f are for the cases a b and c where in addition a shift operation is doneto align thresholds before the data are averaged across runs The results show that the shift operation is the mosteffective analysis step for producing a bias The combination of monotonizing extrapolating and averaging PFs(panel c) also produces a strong upward bias of slope

1448 KLEIN

onizing routine took some thought and I hope its presencein the Appendix (Matlab Code 4) will save others muchtrouble The lower pair of panels shows the results of aver-aging after both monotonizing and extrapolating

In each panel the solid curve shows the average PF ob-tained by combining the correct trials and the total trialsacross runs for each level and taking their ratio to get thePF The dashed curve shows the average PF resulting fromaveraging the PFs of each run The latter method is themore important one since it simulates single short runsIn the upper pair of panels the dashed and solid curvesare so close to each other that they look like a single curveIn the upper pair I show an additional dotndashdashed linewith rightndashleft symmetry that represents the total num-ber of trials The results in the plots are as follows

Placement of trials The effect of the shift on the num-ber of trials at each level can be seen by comparing thedotndashdashed lines in the top two panels The shift narrowsthe distribution significantly There are close to zero trialsmore than three steps from threshold in panel d with theshift even though the full range is 610 steps The curveshowing the total number of trials would be even sharperif I had used a PF with positive slope rather than a PFthat is constant at 50

Slope bias with no monotonizing The top left panelshows that with no shift and no monotonizing there is noslope bias The PF is flat at 50 The introduction of a shiftproduces a dramatic slope bias going from PF 5 20 to80 as the stimulus goes from Levels 8 to 12 There wasno dependence on the type of averaging

Effect of monotonizing on slope bias Monotonizingalone (middle panel on left) produced a small slope biasthat was larger with PF averaging than with data averagingNote that the ordinate has been expanded in this one casein order to better illustrate the extent of the bias The ex-treme amount of steepening (right panel) that is producedby shifting was not matched by monotonizing

Effect of extrapolating on slope bias The lower left panelshows the dramatic result that the combination of all threeof Kaernbachrsquos methodsmdashmonotonizing extrapolatingand averaging PFsmdashdoes produce a strong slope bias forthese Brownian staircases If one uses Strasburgerrsquos typeof averaging in which the data are pooled before the PF iscalculated then the slope bias is minimal This type of av-eraging is equivalent to a large increase in the number oftrials

These simulations show that it is easy to get an upwardslope bias from Brownian data with trials concentratedhear one point An important factor underlying this bias isthe strong correlation in staircase levels discussed byKaernbach (2001b) A factor that magnifies the slope biasis that when trials are concentrated at one point the slopeestimate has a large standard error allowing it to shift eas-ily Our simulations show that one can produce biased slopeestimates either by a procedure (possibly implicit) that shiftsthe threshold or by a procedure that extrapolates data tountested levels This topic is of more academic than prac-

tical interest since anyone interested in the PF slope shoulduse an adaptive algorithm like the Y method of Kontsevichand Tyler (1999) in which trials are placed at separatedlevels for estimating slope The slope bias that is producedby the seemingly innocuous extrapolation or shifting stepis a wonderful reminder of the care that must be takenwhen one employs psychometric methodologies

SUMMARY

Many topics have been in this paper so a summaryshould serve as a useful reminder of several highlightsItems that are surprising novel or important are italicized

I Types of PFs1 There are two methods for compensating for the PF

lower asymptote also called the correction for bias11 Do the compensation in probability space Equa-

tion 2 specifies p(x) 5 [P(x) P(0)] [1 P(0)] where P(0)the lower asymptote of the PF is often designated by theparameter g With this method threshold estimates vary asthe lower asymptote varies (a failure of the high-thresholdassumption)

12 Do the compensation in z-score space for yesnotasks Equation 5 specifies d cent(x) 5 z(x) z(0) With thismethod the response measure dcent is identical to the metricused in signal detection theory This way of looking at psy-chometric functions is not familiar to many researchers

2 2AFC is not immune to bias It is not uncommon forthe observer to be biased toward Interval 1 or 2 when thestimulus is weak Whereas the criterion bias for a yesnotask [z(0) in Equation 5] affects dcent linearly the intervalbias in 2AFC affects dcent quadratically and therefore it hasbeen thought small Using an example from Green andSwets (1966) I have shown that this 2AFC bias can havea substantial effect on dcent and on threshold estimates a dif-ferent conclusion from that of Green and Swets The in-terval bias can be eliminated by replotting the 2AFC dis-crimination PF using the percent correct Interval 2judgments on the ordinate and the Interval 2 minus In-terval 1 signal strength on the abscissa This 2AFC PFgoes from 0 to 100 rather than from 50 to 100

3. Three distinctions can be made regarding PFs: yes/no versus forced choice, detection versus discrimination, adaptive versus constant stimuli. In all of the experimental papers in this special issue, the use of forced choice adaptive methods reflects their widespread prevalence.

4. Why is 2AFC so popular, given its shortcomings? A common response is that 2AFC minimizes bias (but note Item 2 above). A less appreciated reason is that many adaptive methods are available for forced choice methods, but objective yes/no adaptive methods are not common. By "objective" I mean a signal detection method with sufficient blank trials to fix the criterion. Kaernbach (1990) proposed an objective yes/no up–down staircase with rules as simple as these: Randomly intermix blanks and signal trials; decrease the level of the signal by one step for every correct answer (hits or correct rejections) and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.
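The following minimal sketch simulates these rules; the step size, number of trials, criterion placement, and the toy transducer d′(x) are illustrative assumptions, not part of Kaernbach's (1990) procedure.

ntrials = 100;  level = 10;  step = 1;  crit = 1;   % illustrative settings
dprimeTrue = @(x) max(x,0)/5;                       % assumed toy transducer d'(x)
levels = zeros(1,ntrials);
for i = 1:ntrials
    levels(i) = level;
    isSignal = rand < .5;                           % randomly intermix blanks and signal trials
    internal = randn + isSignal*dprimeTrue(level);  % signal detection observer
    sayYes   = internal > crit;                     % fixed criterion (objective yes/no)
    if sayYes == isSignal
        level = level - step;                       % hit or correct rejection: one step down
    else
        level = level + 3*step;                     % miss or false alarm: three steps up
    end
end
thresholdEstimate = mean(levels(21:end))            % crude estimate from the later levels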

5. Threshold should be defined on the basis of a fixed d′ level, rather than the 50% point of the PF.

6. The connection between PF slope on a natural-logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.
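As a quick numerical check of Equation 11, the following minimal sketch evaluates both slopes for an illustrative Weibull PF (gamma = .5, beta = 2) by central differences and compares slope_lin with slope_log/x_t.

gamma = .5;  beta = 2;  xt = 1.3;  h = 1e-5;        % illustrative Weibull and test point
P  = @(x) gamma + (1-gamma)*(1-exp(-x.^beta));      % Weibull PF on a linear abscissa
slopeLin = (P(xt+h) - P(xt-h))/(2*h);               % dP/dx at x = xt (threshold units)
Py = @(y) P(exp(y));                                % the same PF on a natural-log abscissa
slopeLog = (Py(log(xt)+h) - Py(log(xt)-h))/(2*h);   % dP/dy at y = ln(xt)
[slopeLin  slopeLog/xt]                             % the two values agree (Equation 11)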

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates the PF slope on a probability ordinate, β′, to the log–log slope of the d′ function at low stimulus strengths, β. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull b parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent β and the Weibull exponent b is β/b = 1.06. This ratio differs from β/b ≈ 0.8 (Pelli, 1985) and β/b = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs
1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^β. The optimum variance of 1.64/(Nβ²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled to 3.28/(Nβ²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).
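As a small worked example with illustrative values (N = 200 trials, beta = 2), the two expressions evaluate as follows.

beta = 2;  N = 200;                        % illustrative slope and number of trials
varOptimum = 1.64/(N*beta^2)               % optimum threshold variance at P = 94%
var2AFCperPresentation = 3.28/(N*beta^2)   % 2AFC, when N counts stimulus presentations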

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after the initial trials, up to about the fourth reversal, are thrown out) rather than to average an even number of reversals. Both of these findings were surprising.
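A minimal sketch of this recipe follows, with an illustrative staircase record; the bookkeeping below is one reasonable reading of "average all levels after the trials up to about the fourth reversal are discarded."

levels = [10 9 8 7 8 9 8 7 6 7 6 7 8 7 6 7];     % illustrative levels visited on successive trials
rev = find(diff(sign(diff(levels))) ~= 0) + 1;   % trials at which the staircase reversed direction
firstKept = rev(min(4, numel(rev)));             % discard everything before the 4th reversal
threshold = mean(levels(firstKept:end))          % average all remaining levels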

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs
1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods in which the number of trials was unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptotes. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
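For concreteness, here is a minimal sketch of the SK calculation on an already monotonized PF that spans 0 to 1 (the levels and probabilities are illustrative); the first and second moments of the implied distribution give the threshold and spread estimates.

x = 0:5;                              % illustrative stimulus levels
p = [0 .1 .3 .7 .9 1];                % monotonic PF running from 0 to 1
mid = (x(1:end-1) + x(2:end))/2;      % midpoints between successive levels
thSK     = sum(mid .* diff(p))                    % SK threshold (mean of the underlying PF)
spreadSK = sqrt(sum(mid.^2 .* diff(p)) - thSK^2)  % SK spread (from the second moment)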

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).
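For binomial data the two statistics can be written compactly. The sketch below uses a standard form of the Pearson chi-square and the likelihood deviance with illustrative numbers; Equations 52 and 56 give the exact expressions analyzed in the text.

N   = [10 10 10];                 % illustrative trials at each of three levels
Obs = [6 8 10];                   % illustrative observed number correct
p   = [.55 .75 .95];              % predicted probability correct at each level
E   = N.*p;                       % expected number correct
X2  = sum((Obs - E).^2 ./ (E.*(1-p)))                      % Pearson chi-square
D   = 2*sum(Obs.*log((Obs+eps)./E) + ...
            (N-Obs).*log((N-Obs+eps)./(N-E+eps)))          % likelihood deviance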

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations, using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.
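A minimal sketch of that recommendation: fit a Weibull with a fixed small lapse parameter from several initial slope guesses and keep the best fit. The data, the log-unit parameterization, and the use of fminsearch are illustrative assumptions, not Strasburger's procedure.

x = [.5 1 2 4 8];  N = 50*ones(1,5);  Obs = [27 31 40 49 50];   % illustrative data
gamma = .5;  lambda = .0001;                    % lower asymptote and fixed lapse rate
PF  = @(q,xx) gamma + (1-gamma-lambda)*(1-exp(-(xx/exp(q(1))).^exp(q(2))));  % q = [log thresh, log slope]
nll = @(q) -sum(Obs.*log(PF(q,x)+eps) + (N-Obs).*log(1-PF(q,x)+eps));        % negative log likelihood
best = Inf;
for b0 = [.25 .5 1 2 4 8]                       % range of initial slope guesses
    [q, f] = fminsearch(nll, [log(2) log(b0)]);
    if f < best,  best = f;  qbest = q;  end    % keep the fit with the lowest minimum found
end
exp(qbest)                                      % threshold and slope at the best minimum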

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that, for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with many fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications. The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.

Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.

Emerson, P. L. (1986). Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.

Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.

Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.

Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.

Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.

Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.

Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.

Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.

Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.

Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.

Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.

King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.

Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.

Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.

Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.

Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.

Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.

Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.

Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.

Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.

McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.

Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.

Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.

Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.

Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.

Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1. Weibull Function: Probability, d′, and Log–Log d′ Slope (the code that generates Figure 1).

Matlab Code 2. Goodness-of-Fit: Likelihood vs. Chi-Square (relevant to Wichmann and Hill, 2001a, and Figure 4).

Matlab Code 3. Downward Slope Bias Due to Lapses (relevant to Wichmann and Hill, 2001a, and Table 2).

Matlab Code 4. Brownian Staircase Enumerations for Slope Bias (the Monotonize, Extrapolate, and Averaging routines, plus the main program). Note that in the monotonizing routine the denominator is tot + eps, where eps is the smallest floating point number that Matlab can use; because of this choice, if there are zero trials at a level, the associated probability will be zero. Note also that the main program has one tricky step that requires knowledge of Matlab: a = bitget(i, [1:nbits]), where i is an integer from 0 to 2^nbits − 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


Slope bias with no monotonizing The top left panelshows that with no shift and no monotonizing there is noslope bias The PF is flat at 50 The introduction of a shiftproduces a dramatic slope bias going from PF 5 20 to80 as the stimulus goes from Levels 8 to 12 There wasno dependence on the type of averaging

Effect of monotonizing on slope bias Monotonizingalone (middle panel on left) produced a small slope biasthat was larger with PF averaging than with data averagingNote that the ordinate has been expanded in this one casein order to better illustrate the extent of the bias The ex-treme amount of steepening (right panel) that is producedby shifting was not matched by monotonizing

Effect of extrapolating on slope bias The lower left panelshows the dramatic result that the combination of all threeof Kaernbachrsquos methodsmdashmonotonizing extrapolatingand averaging PFsmdashdoes produce a strong slope bias forthese Brownian staircases If one uses Strasburgerrsquos typeof averaging in which the data are pooled before the PF iscalculated then the slope bias is minimal This type of av-eraging is equivalent to a large increase in the number oftrials

These simulations show that it is easy to get an upwardslope bias from Brownian data with trials concentratedhear one point An important factor underlying this bias isthe strong correlation in staircase levels discussed byKaernbach (2001b) A factor that magnifies the slope biasis that when trials are concentrated at one point the slopeestimate has a large standard error allowing it to shift eas-ily Our simulations show that one can produce biased slopeestimates either by a procedure (possibly implicit) that shiftsthe threshold or by a procedure that extrapolates data tountested levels This topic is of more academic than prac-

tical interest since anyone interested in the PF slope shoulduse an adaptive algorithm like the Y method of Kontsevichand Tyler (1999) in which trials are placed at separatedlevels for estimating slope The slope bias that is producedby the seemingly innocuous extrapolation or shifting stepis a wonderful reminder of the care that must be takenwhen one employs psychometric methodologies

SUMMARY

Many topics have been in this paper so a summaryshould serve as a useful reminder of several highlightsItems that are surprising novel or important are italicized

I Types of PFs1 There are two methods for compensating for the PF

lower asymptote also called the correction for bias11 Do the compensation in probability space Equa-

tion 2 specifies p(x) 5 [P(x) P(0)] [1 P(0)] where P(0)the lower asymptote of the PF is often designated by theparameter g With this method threshold estimates vary asthe lower asymptote varies (a failure of the high-thresholdassumption)

12 Do the compensation in z-score space for yesnotasks Equation 5 specifies d cent(x) 5 z(x) z(0) With thismethod the response measure dcent is identical to the metricused in signal detection theory This way of looking at psy-chometric functions is not familiar to many researchers

2 2AFC is not immune to bias It is not uncommon forthe observer to be biased toward Interval 1 or 2 when thestimulus is weak Whereas the criterion bias for a yesnotask [z(0) in Equation 5] affects dcent linearly the intervalbias in 2AFC affects dcent quadratically and therefore it hasbeen thought small Using an example from Green andSwets (1966) I have shown that this 2AFC bias can havea substantial effect on dcent and on threshold estimates a dif-ferent conclusion from that of Green and Swets The in-terval bias can be eliminated by replotting the 2AFC dis-crimination PF using the percent correct Interval 2judgments on the ordinate and the Interval 2 minus In-terval 1 signal strength on the abscissa This 2AFC PFgoes from 0 to 100 rather than from 50 to 100

3 Three distinctions can be made regarding PFsyesno versus forced choice detection versus discrimi-nation adaptive versus constant stimuli In all of the ex-perimental papers in this special issue the use of forcedchoice adaptive methods reflects their widespread preva-lence

4 Why is 2AFC so popular given its shortcomings Acommon response is that 2AFC minimizes bias (but noteItem 2 above) A less appreciated reason is that many adap-tive methods are available for forced choice methods butthat objective yesno adaptive methods are not commonBy ldquoobjectiverdquo I mean a signal detection method with suf-ficient blank trials to fix the criterion Kaernbach (1990)proposed an objective yesno upndashdown staircase withrules as simple as these Randomly intermix blanks and

MEASURING THE PSYCHOMETRIC FUNCTION 1449

signal trials decrease the level of the signal by one step forevery correct answer (hits or correct rejections) and in-crease the level by three steps for every wrong answer(false alarms or misses) The minimum dcent is obtained whenthe numbers of ldquoyesrdquo and ldquonordquo responses are about equal(ROC negative diagonal) A bias of imbalanced ldquoyesrdquo andldquonordquo responses is similar to the 2AFC interval bias dis-cussed in Item 2 The appropriate balance can be achievedby giving bias feedback to the observer

5 Threshold should be defined on the basis of a fixeddcent level rather than the 50 point of the PF

6 The connection between PF slope on a natural log-arithm abscissa and slope on a linear abscissa is as fol-lows slopelin 5 slopelogxt (Equation 11) where x t is thestimulus strength in threshold units

7 Strasburgerrsquos (2001a) maximum PF slope has the ad-vantage that it relates PF slope using a probability ordinatebcent to the low stimulus strength logndashlog slope of the dcentfunction b It has the further advantage of relating theslopes of a wide variety of PFs Weibull logistic Quickcumulative normal hyperbolic tangent and signal detec-tion dcent The main difficulty with maximum slope is thatquite often threshold is defined at a point different fromthe maximum slope point Typically the point of maxi-mum slope is lower than the level that gives minimumthreshold variance

8 A surprisingly accurate connection was made be-tween the 2AFC Weibull function and the StromeyerndashFoley dcent function given in Equation 20 For all values ofthe Weibull b parameter and for all points on the PF themaximum difference between the two functions is 0004on a probability ordinate going from 50 to 100 For thisgood fit the connection between the dcent exponent b and theWeibull exponent b is bb 5 106 This ratio differs frombb 08 (Pelli 1985) and bb 5 088 (Strasburger2001a) because our dcent function saturates at moderate dcentvalues just as the Weibull saturates and as experimentaldata saturate

II Experimental Methods for Measuring PFs1 The 2AFC and objective yesno threshold vari-

ances were compared using signal detection assump-tions For a fixed total percent correct the yesno andthe 2AFC methods have identical threshold varianceswhen using a dcent function given by dcent 5 ct

b The optimumvariance of 164(Nb2) occurs at P 5 94 where N isthe number of trials If N is the number of stimulus pre-sentations then for the 2AFC task the variance wouldbe doubled to 328(Nb2) Counting stimulus presenta-tions rather than trials can be important for situations inwhich each presentation takes a long time as in the smelland taste experiments of Linschoten et al (2001)

2 The equality of the 2AFC and yesno thresholdvariance leads to a paradox whereby observers couldconvert the 2AFC experiment to an objective yesno ex-periment by closing their eyes in the first 2AFC intervalParadoxically the eyesrsquo shutting would leave the thresh-old variance unchanged This paradox was resolved when

a more realistic dcent PF with saturation was used In that casethe yesno variance was slightly larger than the 2AFC vari-ance for the same number of trials

3 Several disadvantages of the 2AFC method are that(a) 2AFC has an extra memory load (b) modeling proba-bility summation and uncertainty is more difficult thanfor yesno (c) multiplicative noise (ROC slope) is diffi-cult to measure in 2AFC (d) bias issues are present in2AFC as well as in objective yesno and (e) thresholdestimation is inefficient in comparison with methodswith lower asymptotes that are less than 50

4 Kaernbach (2001b) suggested that some 2AFCproblems could be alleviated by introducing an extraldquodonrsquot knowrdquo response category A better alternative is togive a high or low confidence response in addition tochoosing the interval I discussed how the confidencerating would modify the staircase rules

5 The efficiency of the objective yesno methodcould be increased by increasing the number of responsecategories (ratings) and increasing the number of stimu-lus levels (Klein 2002)

6 The simple upndashdown (Brownian) staircase withthreshold estimated by averaging levels was found to havenearly optimal efficiency in agreement with Green (1990)It is better to average all levels (after about four reversalsof initial trials are thrown out) rather than average an evennumber of reversals Both of these findings were surprising

7 As long as the starting point is not too distant fromthreshold Brownian staircases with their moderate iner-tia have advantages over the more prestigious adaptivelikelihood methods At the beginning of a run likelihoodmethods may have too little inertia and can jump to lowstimulus strengths before the observer is fully familiar withthe test target At the end of the run likelihood methodsmay have too much inertia and resist changing levels eventhough a nonstationary threshold has shifted

8 The PF slope is useful for several reasons It isneeded for estimating threshold variance for distinguish-ing between multiple vision models and for improved es-timation of threshold I have discussed PF slope as well asthreshold Adaptive methods are available for measuringslope as well as threshold

9 Improved goodness-of-fit rules are needed as abasis for throwing out bad runs and as a means for im-proving estimates of threshold variance The goodness-of-fit metric should look not only at the PF but also at thesequential history of the run so that mid-run shifts inthreshold can be determined The best way to estimatethreshold variance on the basis of a single run of data isto carry out bootstrap calculations (Foster amp Bischof 1991Wichmann amp Hill 2001b) Further research is needed inorder to determine the optimal method for weighting mul-tiple estimates of the same threshold

10 The method should depend on the situation

III Mathematical Methods for Analyzing PFs1 Miller and Ulrichrsquos (2001) nonparametric Spearmanndash

Kaumlrber (SK) analysis can be as good as and often better

1450 KLEIN

than a parametric analysis An important limitation of theSK analysis is that it does not include a weighting factorthat would emphasize levels with more trials This wouldbe relevant for staircase methods in which the number oftrials was unequally distributed It would also cause prob-lems with 2AFC detection because of the asymmetry ofthe upper and lower asymptote It need not be a problemfor 2AFC discrimination because as I have shown thistask can be analyzed better with a negative-going abscissathat allows the ordinate to go from 0 to 100 Equa-tions 39ndash43 provide an analytic method for calculating theoptimal variance of threshold estimates for the method ofconstant stimuli The analysis shows that the SK methodhad an optimal variance

2 Wichmann and Hill (2001a) carry out a large numberof simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments They found trial placementswhere their simulations were upwardly skewed in com-parison with the chi-square distribution based on linear re-gression They found other trial placements where theirchi-square simulations were downwardly skewed Thesepuzzling results rekindled my longstanding interest inwanting to know when to trust the chi-square distributionin nonlinear situations and prompted me to analyzerather than simulate the biases shown by Wichmann ampHill To my surprise I found that the X 2 distribution(Equation 52) had zero bias at any testing level even for1 or 2 trials per level The likelihood function (Equa-tion 56) on the other hand had a strong bias when thenumber of trials per level was low (Figure 4) The bias asa function of test level was precisely of the form that isable to account for the bias found by Wichmann and Hill(2001a)

3 Wichmann and Hill (2001a) present a detailed in-vestigation of the strong effect of lapses on estimates ofthreshold and slope This issue is relevant to the data ofStrasburger (2001b) because of the lapses shown in hisdata and because a very low lapse parameter (l 5 00001)was used in fitting the data In my simulations using PFparameters somewhat similar to Strasburgerrsquos situation Ifound that the effect of lapses would be expected to bestrong so that Strasburgerrsquos estimated slopes should belower than the actual slopes However Strasburgerrsquos slopeswere high so the mystery remains of why the lapses didnot have a stronger effect My simulations show that onepossibility is the local minimum problem whereby the slopeestimate in the presence of lapses is sensitive to the initialstarting guesses for slope in the search routine In order tofind the global minimum one needs to use a range of ini-tial slope guesses

4 One of the most intriguing findings in this specialissue was the quite high PF slopes for letter discriminationfound by Strasburger (2001b) The slopes might be highbecause of stimulus uncertainty in identifying small low-contrast peripheral letters There is also the possibility thatthese high slopes were due to methodological bias (Leeket al 1992) Kaernbach (2001b) presents a wonderfulanalysis of how an upward bias could be caused by trial

nonindependence of staircase procedures (not the methodStrasburger used) I did a number of simulations to sep-arate out the effects of threshold shift (one of the steps inthe Strasburger analysis) PF monotonization PF extrap-olation and type of averaging (the latter three used byKaernbach) My results indicated that for experimentssuch as Strasburgerrsquos the dominant cause of upward slopebias was the threshold shift Kaernbachrsquos extrapolationwhen coupled with monotonization can also lead to a strongupward slope bias Further experiments and analyses areencouraged since true steep slopes would be very impor-tant in allowing thresholds to be estimated with muchfewer trials than is presently assumed Although none ofour analyses makes a conclusive case that Strasburgerrsquosestimated slopes are biased the possibility of a bias sug-gests that one should be cautious before assuming that thehigh slopes are real

5 The slope bias story has two main messages The ob-vious one is that if one wants to measure the PF slope byusing adaptive methods one should use a method thatplaces trials at well separated levels The other messagehas broader implications The slope bias shows how sub-tle seemingly innocuous methods can produce unwantedeffects It is always a good idea to carry out Monte Carlosimulations of onersquos experimental procedures looking forthe unexpected

REFERENCES

Bevingt on P R (1969) Data reduction and error analysis for thephysical sciences New York McGraw-Hill

Car ney T Tyl er C W Wat son A B Makous W Beu t t er BChen C C Nor cia A M amp Kl ein S A (2000) Modelfest Yearone results and plans for future years In B E Rogowitz amp T NPapas (Eds) Human vision and electronic imaging V (Proceedingsof SPIE Vol 3959 pp 140-151) Bellingham WA SPIE Press

Emer son P L (1986) Observations on a maximum-likelihood andBayesian methods of forced-choice sequential threshold estimationPerception amp Psychophysics 39 151-153

Finn ey D J (1971) Probit analysis (3rd ed) Cambridge CambridgeUniversity Press

Fol ey J M (1994) Human luminance pattern-vision mechanismsMasking experiments require a new model Journal of the OpticalSociety of America A 11 1710-1719

Fost er D H amp Bischof W F (1991) Thresholds from psychometricfunctions Superiority of bootstrap to incremental and probit varianceestimators Psychological Bulletin 109 152-159

Gour evit ch V amp Ga l a nt er E (1967) A significance test for oneparameter isosensitivity functions Psychometrika 32 25-33

Gr een D M (1990) Stimulus selection in adaptive psychophysicalprocedures Journal of the Acoustical Society of America 87 2662-2674

Gr een D M amp Swet s J A (1966) Signal detection theory andpsychophysics Los Altos CA Peninsula Press

Ha cker M J amp Rat cl if f R (1979) A revised table of dcent for M-alternative forced choice Perception amp Psychophysics 26 168-170

Ha l l J L (1968) Maximum-likelihood sequential procedure for esti-mation of psychometric functions [Abstract] Journal of the Acousti-cal Society of America 44 370

Hal l J L (1983) A procedure for detecting variability of psychophys-ical thresholds Journal of the Acoustical Society of America 73 663-667

Kaer nbach C (1990) A single-interval adjustment-matrix (SIAM) pro-cedure for unbiased adaptive testing Journal of the Acoustical Soci-ety of America 88 2645-2655

MEASURING THE PSYCHOMETRIC FUNCTION 1451

Ka er nbach C (2001a) Adaptive threshold estimation with unforced-choice tasks Perception amp Psychophysics 63 1377-1388

Ka er nbach C (2001b) Slope bias of psychometric functions derivedfrom adaptive data Perception amp Psychophysics 63 1389-1398

King-Smit h P E (1984) Efficient threshold estimates from yesnoprocedures using few (about 10) trials American Journal of Optom-etry amp Physiological Optics 81 119

Kin g-Smit h P E Gr igsby S S Vin gr ys A J Ben es S C ampSu powit A (1994) Efficient and unbiased modifications of theQUEST threshold method Theory simulations experimental evalu-ation and practical implementation Vision Research 34 885-912

Kin g-Smit h P E amp Rose D (1997) Principles of an adaptive methodfor measuring the slope of the psychometric function Vision Re-search 37 1595-1604

Kl ein S A (1985) Double-judgment psychophysics Problems andsolutions Journal of the Optical Society of America 2 1560-1585

Kl ein S A (1992) An Excel macro for transformed and weighted av-eraging Behavior Research Methods Instruments amp Computers 2490-96

Kl ein S A (2002) Measuring the psychometric function Manuscriptin preparation

Kl ein S A amp St r omeyer C F III (1980) On inhibition betweenspatial frequency channels Adaptation to complex gratings VisionResearch 20 459-466

Kl ein S A St r omeyer C F III amp Ga nz L (1974) The simulta-neous spatial frequency shift A dissociation between the detectionand perception of gratings Vision Research 15 899-910

Kont sevich L L amp Tyl er C W (1999) Bayesian adaptive estimationof psychometric slope and threshold Vision Research 39 2729-2737

Leek M R (2001) Adaptive procedures in psychophysical researchPerception amp Psychophysics 63 1279-1292

Leek M R Han na T E amp Mar sha l l L (1991) An interleavedtracking procedure to monitor unstable psychometric functionsJournal of the Acoustical Society of America 90 1385-1397

Leek M R Hanna T E amp Mar sh al l L (1992) Estimation of psy-chometric functions from adaptive tracking procedures Perception ampPsychophysics 51 247-256

Linschot en M R Ha r vey L O Jr El l er P M amp Ja fek B W(2001) Fast and accurate measurement of taste and smell thresholdsusing a maximum-likelihood adaptive staircase procedure Percep-tion amp Psychophysics 63 1330-1347

Macmil l a n N A amp Cr eel man C D (1991) Detection theory Auserrsquos guide Cambridge Cambridge University Press

Man ny R E amp Kl ein S A (1985) A three-alternative tracking par-

adigm to measure Vernier acuity of older infants Vision Research25 1245-1252

McKee S P Kl ein S A amp Tel l er D A (1985) Statistical prop-erties of forced-choice psychometric functions Implications of pro-bit analysis Perception amp Psychophysics 37 286-298

Mil l er J amp Ul r ich R (2001) On the analysis of psychometric func-tions The SpearmanndashKaumlrber Method Perception amp Psychophysics63 1399-1420

Pel l i D G (1985) Uncertainty explains many aspects of visual con-trast detection and discrimination Journal of the Optical Society ofAmerica A 2 1508-1532

Pel l i D G (1987) The ideal psychometric procedures [Abstract] In-vestigative Ophthalmology amp Visual Science 28 (Suppl) 366

Pent l a nd A (1980) Maximum likelihood estimation The best PESTPerception amp Psychophysics 28 377-379

Pr ess W H Teukol sky S A Vet t er l ing W T amp Fl ann er y B P(1992) Numerical recipes in C The art of scientific computing (2nded) Cambridge Cambridge University Press

St r a sbu r ger H (2001a) Converting between measures of slope of the psychometric function Perception amp Psychophysics 63 1348-1355

St r asbu r ger H (2001b) Invariance of the psychometric function forcharacter recognition across the visual field Perception amp Psycho-physics 63 1356-1376

St r omeyer C F III amp Kl ein S A (1974) Spatial frequency channelsin human vision as asymmetric (edge) mechanisms Vision Research14 1409-1420

Tayl or M M amp Cr eel ma n C D (1967) PEST Efficiency esti-mates on probability functions Journal of the Acoustical Society ofAmerica 41 782-787

Tr eu t wein B (1995) Adaptive psychophysical procedures VisionResearch 35 2503-2522

Tr eu t wein B amp St r asbur ger H (1999) Fitting the psychometricfunction Perception amp Psychophysics 61 87-106

Wat son A B amp Pel l i D G (1983) QUEST A Bayesian adaptivepsychometric method Perception amp Psychophysics 33 113-120

Wichman n F A amp Hil l N J (2001a) The psychometric function IFitting sampling and goodness-of-fit Perception amp Psychophysics63 1293-1313

Wichma nn F A amp Hil l N J (2001b) The psychometric function IIBootstrap based confidence intervals and sampling Perception ampPsychophysics 63 1314-1329

Yu C Kl ein S A amp Levi D M (2001) Psychophysical measurementand modeling of iso- and cross-surround modulation of the foveal TvCfunction Manuscript submitted for publication

(Continued on next page)



signal trials, decrease the level of the signal by one step for every correct answer (hits or correct rejections) and increase the level by three steps for every wrong answer (false alarms or misses). The minimum d′ is obtained when the numbers of "yes" and "no" responses are about equal (ROC negative diagonal). A bias of imbalanced "yes" and "no" responses is similar to the 2AFC interval bias discussed in Item 2. The appropriate balance can be achieved by giving bias feedback to the observer.

5. Threshold should be defined on the basis of a fixed d′ level rather than the 50% point of the PF.

6. The connection between PF slope on a natural-logarithm abscissa and slope on a linear abscissa is as follows: slope_lin = slope_log / x_t (Equation 11), where x_t is the stimulus strength in threshold units.
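Since Equation 11 itself is not reproduced in this summary, a one-line derivation may help; the only assumption is that the logarithmic abscissa is the natural log of the stimulus strength:

$$\frac{dP}{dx}=\frac{dP}{d(\ln x)}\,\frac{d(\ln x)}{dx}=\frac{\mathrm{slope}_{\log}}{x},
\qquad\text{so at } x=x_t:\quad \mathrm{slope}_{\mathrm{lin}}=\frac{\mathrm{slope}_{\log}}{x_t}.$$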

7. Strasburger's (2001a) maximum PF slope has the advantage that it relates PF slope using a probability ordinate, β′, to the low-stimulus-strength log–log slope of the d′ function, b. It has the further advantage of relating the slopes of a wide variety of PFs: Weibull, logistic, Quick, cumulative normal, hyperbolic tangent, and signal detection d′. The main difficulty with maximum slope is that quite often threshold is defined at a point different from the maximum slope point. Typically, the point of maximum slope is lower than the level that gives minimum threshold variance.

8. A surprisingly accurate connection was made between the 2AFC Weibull function and the Stromeyer–Foley d′ function given in Equation 20. For all values of the Weibull β parameter and for all points on the PF, the maximum difference between the two functions is 0.004 on a probability ordinate going from 50% to 100%. For this good fit, the connection between the d′ exponent b and the Weibull exponent β is bβ = 1.06. This ratio differs from bβ ≈ 0.8 (Pelli, 1985) and bβ = 0.88 (Strasburger, 2001a) because our d′ function saturates at moderate d′ values, just as the Weibull saturates and as experimental data saturate.

II. Experimental Methods for Measuring PFs

1. The 2AFC and objective yes/no threshold variances were compared using signal detection assumptions. For a fixed total percent correct, the yes/no and the 2AFC methods have identical threshold variances when using a d′ function given by d′ = ct^b. The optimum variance of 1.64/(Nb²) occurs at P = 94%, where N is the number of trials. If N is the number of stimulus presentations, then for the 2AFC task the variance would be doubled to 3.28/(Nb²). Counting stimulus presentations rather than trials can be important for situations in which each presentation takes a long time, as in the smell and taste experiments of Linschoten et al. (2001).
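As a worked numerical example (assuming, as in the preceding paragraph, that N counts trials and that the variance refers to the squared units of the log-threshold measure to which the exponent b applies), the optimum yes/no variance for N = 100 trials and b = 2 is

$$\sigma^2_{\mathrm{thresh}}\approx\frac{1.64}{N\,b^{2}}=\frac{1.64}{100\cdot 4}=0.0041,
\qquad \sigma\approx 0.064,$$

and counting those same 100 stimulus presentations as 50 2AFC trials doubles the variance to about 0.0082 (σ ≈ 0.091).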

2. The equality of the 2AFC and yes/no threshold variance leads to a paradox whereby observers could convert the 2AFC experiment to an objective yes/no experiment by closing their eyes in the first 2AFC interval. Paradoxically, the eyes' shutting would leave the threshold variance unchanged. This paradox was resolved when a more realistic d′ PF with saturation was used. In that case, the yes/no variance was slightly larger than the 2AFC variance for the same number of trials.

3. Several disadvantages of the 2AFC method are that (a) 2AFC has an extra memory load, (b) modeling probability summation and uncertainty is more difficult than for yes/no, (c) multiplicative noise (ROC slope) is difficult to measure in 2AFC, (d) bias issues are present in 2AFC as well as in objective yes/no, and (e) threshold estimation is inefficient in comparison with methods with lower asymptotes that are less than 50%.

4. Kaernbach (2001b) suggested that some 2AFC problems could be alleviated by introducing an extra "don't know" response category. A better alternative is to give a high or low confidence response in addition to choosing the interval. I discussed how the confidence rating would modify the staircase rules.

5. The efficiency of the objective yes/no method could be increased by increasing the number of response categories (ratings) and increasing the number of stimulus levels (Klein, 2002).

6. The simple up–down (Brownian) staircase with threshold estimated by averaging levels was found to have nearly optimal efficiency, in agreement with Green (1990). It is better to average all levels (after about four reversals of initial trials are thrown out) rather than average an even number of reversals. Both of these findings were surprising.
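As a minimal sketch of this recommendation (the logistic PF and its spread, the step size, the starting level, and the number of discarded initial trials below are assumptions chosen only for illustration, not the values used in the simulations reported here), a simple up–down staircase with threshold estimated by averaging all levels after the initial trials can be simulated as follows:

ntrials = 60; ndiscard = 10;            % assumed run length and number of discarded initial trials
step = 0.1;                             % assumed step size (log-stimulus units)
pf = @(x) 1./(1 + exp(-x/0.1));         % assumed logistic PF; its 50% point (true threshold) is at 0
nruns = 2000; est = zeros(1,nruns);
for r = 1:nruns
    level = 0.3;                        % assumed starting level
    levels = zeros(1,ntrials);
    for t = 1:ntrials
        levels(t) = level;
        if rand < pf(level)
            level = level - step;       % one step down after a correct ("yes") response
        else
            level = level + step;       % one step up after an incorrect ("no") response
        end
    end
    est(r) = mean(levels(ndiscard+1:end));   % average all levels after the initial trials
end
fprintf('mean threshold estimate %.3f, SD %.3f\n', mean(est), std(est))

The SD printed at the end is the empirical variability of this threshold estimator, which is the quantity being compared across methods in this section.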

7. As long as the starting point is not too distant from threshold, Brownian staircases, with their moderate inertia, have advantages over the more prestigious adaptive likelihood methods. At the beginning of a run, likelihood methods may have too little inertia and can jump to low stimulus strengths before the observer is fully familiar with the test target. At the end of the run, likelihood methods may have too much inertia and resist changing levels even though a nonstationary threshold has shifted.

8. The PF slope is useful for several reasons: It is needed for estimating threshold variance, for distinguishing between multiple vision models, and for improved estimation of threshold. I have discussed PF slope as well as threshold. Adaptive methods are available for measuring slope as well as threshold.

9. Improved goodness-of-fit rules are needed as a basis for throwing out bad runs and as a means for improving estimates of threshold variance. The goodness-of-fit metric should look not only at the PF but also at the sequential history of the run, so that mid-run shifts in threshold can be determined. The best way to estimate threshold variance on the basis of a single run of data is to carry out bootstrap calculations (Foster & Bischof, 1991; Wichmann & Hill, 2001b). Further research is needed in order to determine the optimal method for weighting multiple estimates of the same threshold.
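As a minimal sketch of a parametric bootstrap of the threshold variance from a single run (the stimulus levels, counts, fixed-spread cumulative-normal model, and grid-search fit below are illustrative assumptions, not the procedures of Foster and Bischof or of Wichmann and Hill):

Phi = @(z) .5*(1 + erf(z/sqrt(2)));           % cumulative normal
x   = [-0.2 -0.1 0 0.1 0.2];                  % assumed stimulus levels
N   = 20;                                     % assumed trials per level
Obs = [4 7 11 15 18];                         % assumed numbers of "yes" responses
s   = 0.1;                                    % assumed (fixed) PF spread
tGrid = -0.3:0.002:0.3;                       % candidate thresholds for the grid search
nll  = @(obs) arrayfun(@(t) -sum(obs.*log(Phi((x-t)/s)+eps) + ...
               (N-obs).*log(1-Phi((x-t)/s)+eps)), tGrid);
pick = @(obs) tGrid(find(nll(obs) == min(nll(obs)), 1));
t0   = pick(Obs);                             % maximum-likelihood threshold for the observed data
pHat = Phi((x-t0)/s);                         % fitted probabilities at each level
B = 1000; tB = zeros(1,B);
for b = 1:B
    simObs = sum(rand(N, numel(x)) < repmat(pHat, N, 1), 1);   % binomial resample of each level
    tB(b)  = pick(simObs);                    % refit each simulated data set
end
fprintf('threshold = %.3f, bootstrap SE = %.3f\n', t0, std(tB))

The bootstrap SE is simply the SD of the refitted thresholds; a nonparametric bootstrap would instead resample the observed trials at each level.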

10. The method should depend on the situation.

III. Mathematical Methods for Analyzing PFs

1. Miller and Ulrich's (2001) nonparametric Spearman–Kärber (SK) analysis can be as good as, and often better than, a parametric analysis. An important limitation of the SK analysis is that it does not include a weighting factor that would emphasize levels with more trials. This would be relevant for staircase methods in which the number of trials was unequally distributed. It would also cause problems with 2AFC detection because of the asymmetry of the upper and lower asymptote. It need not be a problem for 2AFC discrimination because, as I have shown, this task can be analyzed better with a negative-going abscissa that allows the ordinate to go from 0% to 100%. Equations 39–43 provide an analytic method for calculating the optimal variance of threshold estimates for the method of constant stimuli. The analysis shows that the SK method had an optimal variance.
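As a minimal sketch of the Spearman–Kärber idea for a PF whose ordinate runs from 0 to 1 (the data, the running-maximum monotonization, and the endpoint padding one step beyond the tested range are illustrative assumptions, not Miller and Ulrich's exact procedure):

x = 1:6;                                % assumed equally spaced stimulus levels
p = [0.05 0.10 0.35 0.60 0.90 0.95];    % assumed observed proportions of "yes"
for i = 2:numel(p), p(i) = max(p(i), p(i-1)); end   % one simple way to monotonize
dx = mean(diff(x));
xx = [x(1)-dx, x, x(end)+dx];           % pad the abscissa by one step at each end
pp = [0, p, 1];                         % pad the ordinate so that it spans 0 to 1
thresh = sum(diff(pp) .* (xx(1:end-1) + xx(2:end))/2);   % SK estimate: sum of dp times midpoint
fprintf('Spearman-Karber threshold estimate: %.3f\n', thresh)

The estimate is the mean of the distribution whose cumulative distribution function is the monotonized PF; adding a weighting factor for the number of trials at each level is exactly the extension discussed above.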

2. Wichmann and Hill (2001a) carry out a large number of simulations of the chi-square and likelihood goodness-of-fit for 2AFC experiments. They found trial placements where their simulations were upwardly skewed in comparison with the chi-square distribution based on linear regression. They found other trial placements where their chi-square simulations were downwardly skewed. These puzzling results rekindled my longstanding interest in wanting to know when to trust the chi-square distribution in nonlinear situations and prompted me to analyze, rather than simulate, the biases shown by Wichmann and Hill. To my surprise, I found that the X² distribution (Equation 52) had zero bias at any testing level, even for 1 or 2 trials per level. The likelihood function (Equation 56), on the other hand, had a strong bias when the number of trials per level was low (Figure 4). The bias as a function of test level was precisely of the form that is able to account for the bias found by Wichmann and Hill (2001a).
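Equations 52 and 56 are not reproduced in this summary; per stimulus level, with O the observed number correct out of N trials and E = Np the expected number, forms consistent with Matlab Code 2 in the Appendix are

$$X^{2}=\frac{(O-E)^{2}}{E\,(1-p)},\qquad
D=2\Big[\,O\ln\frac{O}{E}+(N-O)\ln\frac{N-O}{N-E}\Big].$$

Averaging over the binomial distribution of O gives E[X²] = 1 exactly at every level, since E[(O − Np)²] = Np(1 − p); that is the zero-bias property noted above. The expected value of D, in contrast, departs from 1 when N is small.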

3. Wichmann and Hill (2001a) present a detailed investigation of the strong effect of lapses on estimates of threshold and slope. This issue is relevant to the data of Strasburger (2001b) because of the lapses shown in his data and because a very low lapse parameter (λ = 0.0001) was used in fitting the data. In my simulations using PF parameters somewhat similar to Strasburger's situation, I found that the effect of lapses would be expected to be strong, so that Strasburger's estimated slopes should be lower than the actual slopes. However, Strasburger's slopes were high, so the mystery remains of why the lapses did not have a stronger effect. My simulations show that one possibility is the local minimum problem, whereby the slope estimate in the presence of lapses is sensitive to the initial starting guesses for slope in the search routine. In order to find the global minimum, one needs to use a range of initial slope guesses.
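As a minimal sketch of the recommended guard against local minima (the data, the lapse structure λ + (1 − 2λ)P, and the grid of initial slope guesses below are illustrative assumptions, not Strasburger's fits), one can simply refit from several starting slopes and keep the best solution:

Phi = @(z) .5*(1 + erf(z/sqrt(2)));
x = 0:4; N = 50*ones(1,5); Obs = [27 33 41 48 50];          % assumed data
lambda = 0.01;                                              % assumed lapse rate
model = @(q) lambda + (1 - 2*lambda)*Phi((x - q(1))*q(2));  % q = [threshold, slope]
dev = @(q) -2*sum(Obs.*log(model(q)+eps) + (N-Obs).*log(1-model(q)+eps));   % -2 log likelihood
best = Inf;
for s0 = [0.2 0.5 1 2 4 8]                                  % range of initial slope guesses
    [q, d] = fminsearch(dev, [2 s0]);
    if d < best, best = d; qBest = q; end
end
fprintf('threshold %.3f, slope %.3f, deviance %.2f\n', qBest(1), qBest(2), best)

Only the fit with the smallest deviance is kept; if all starting points converge to the same solution, that is evidence that the global minimum has been found.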

4. One of the most intriguing findings in this special issue was the quite high PF slopes for letter discrimination found by Strasburger (2001b). The slopes might be high because of stimulus uncertainty in identifying small, low-contrast peripheral letters. There is also the possibility that these high slopes were due to methodological bias (Leek et al., 1992). Kaernbach (2001b) presents a wonderful analysis of how an upward bias could be caused by trial nonindependence of staircase procedures (not the method Strasburger used). I did a number of simulations to separate out the effects of threshold shift (one of the steps in the Strasburger analysis), PF monotonization, PF extrapolation, and type of averaging (the latter three used by Kaernbach). My results indicated that for experiments such as Strasburger's, the dominant cause of upward slope bias was the threshold shift. Kaernbach's extrapolation, when coupled with monotonization, can also lead to a strong upward slope bias. Further experiments and analyses are encouraged, since true steep slopes would be very important in allowing thresholds to be estimated with much fewer trials than is presently assumed. Although none of our analyses makes a conclusive case that Strasburger's estimated slopes are biased, the possibility of a bias suggests that one should be cautious before assuming that the high slopes are real.

5. The slope bias story has two main messages. The obvious one is that if one wants to measure the PF slope by using adaptive methods, one should use a method that places trials at well-separated levels. The other message has broader implications: The slope bias shows how subtle, seemingly innocuous methods can produce unwanted effects. It is always a good idea to carry out Monte Carlo simulations of one's experimental procedures, looking for the unexpected.

REFERENCES

Bevington, P. R. (1969). Data reduction and error analysis for the physical sciences. New York: McGraw-Hill.
Carney, T., Tyler, C. W., Watson, A. B., Makous, W., Beutter, B., Chen, C. C., Norcia, A. M., & Klein, S. A. (2000). Modelfest: Year one results and plans for future years. In B. E. Rogowitz & T. N. Papas (Eds.), Human vision and electronic imaging V (Proceedings of SPIE, Vol. 3959, pp. 140-151). Bellingham, WA: SPIE Press.
Emerson, P. L. (1986). Observations on a maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation. Perception & Psychophysics, 39, 151-153.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11, 1710-1719.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychological Bulletin, 109, 152-159.
Gourevitch, V., & Galanter, E. (1967). A significance test for one parameter isosensitivity functions. Psychometrika, 32, 25-33.
Green, D. M. (1990). Stimulus selection in adaptive psychophysical procedures. Journal of the Acoustical Society of America, 87, 2662-2674.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Press.
Hacker, M. J., & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26, 168-170.
Hall, J. L. (1968). Maximum-likelihood sequential procedure for estimation of psychometric functions [Abstract]. Journal of the Acoustical Society of America, 44, 370.
Hall, J. L. (1983). A procedure for detecting variability of psychophysical thresholds. Journal of the Acoustical Society of America, 73, 663-667.
Kaernbach, C. (1990). A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing. Journal of the Acoustical Society of America, 88, 2645-2655.
Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.
Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.
King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.
King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.
King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.
Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America, 2, 1560-1585.
Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.
Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.
Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.
Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.
Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.
Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.
Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.
McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.
Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman–Kärber method. Perception & Psychophysics, 63, 1399-1420.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.
Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.
Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.
Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.
Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.
Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.


APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1
Weibull Function, Probability, d′, and Log–Log d′ Slope
Code for Generating Figure 1

gamma = .5 + .5*erf(-1/sqrt(2));     % gamma (lower asymptote) for z-score = -1
beta = 2; inc = .01;                 % beta is PF slope
x = inc/2:inc:3;                     % the stimulus range with linear abscissa
p = gamma + (1-gamma)*(1 - exp(-x.^beta));               % Weibull function
subplot(3,2,1); plot(x,p); grid; text(1.9,.2,'(a)'); ylabel('Weibull with beta = 2')
z = sqrt(2)*erfinv(2*p - 1);         % converting probability to z-score
subplot(3,2,3); plot(x,z); grid; text(1.3,.6,'(b)'); ylabel('z-score of Weibull (d''-1)')
dprime = z + 1;                      % dprime = z(hit) - z(false alarm)
difd = diff(log(dprime))/inc;        % derivative of log d'
xd = inc:inc:2.9999;                 % x values at midpoint of previous x values
subplot(3,2,5); plot(xd, difd.*xd); grid                 % multiplication by xd is to make log abscissa
axis([0 3 .5 2]); text(.3,1.9,'(c)'); ylabel('log-log slope of d''')
xlabel('stimulus in threshold units')
y = -2:inc:1;                        % do similar plots for a log abscissa (y)
p = gamma + (1-gamma)*(1 - exp(-exp(beta*y)));           % Weibull function as a function of y
subplot(3,2,2); plot(y,p, 0,p(201),'*'); grid; text(-1.89,.2,'(d)')
z = sqrt(2)*erfinv(2*p - 1);
subplot(3,2,4); plot(y,z); grid; text(-1.8,3.6,'(e)')
dprime = z + 1; difd = diff(log(dprime))/inc;
subplot(3,2,6); plot(y,[difd(1) difd]); grid
xlabel('stimulus in natural log units'); text(-1.6,1.9,'(f)')

Matlab Code 2
Goodness-of-Fit: Likelihood vs. Chi Square
Relevant to Wichmann and Hill (2001a) and Figure 4

clear; clf; clear all
fac = [1 cumprod(1:20)];             % create factorial function
p = .5:.01:.99;                      % examine a wide range of probabilities
Nall = [1 2 4 8 16];                 % number of trials at each level
for iN = 1:5                         % iterate over the number of trials
  N = Nall(iN); E = N*p;             % E is the expected number as a function of p
  lik = 0; X2 = 0;                   % initialize the likelihood and X2
  for Obs = 0:N                      % sum over all possible correct responses
    Nm = N - Obs;                    % number of incorrect responses
    weight = fac(1+N)/fac(1+Obs)/fac(1+Nm)*p.^Obs.*(1-p).^Nm;                  % binomial weight
    lik = lik - 2*weight.*(Obs*log(E./(Obs+eps)) + Nm*log((N-E)./(Nm+eps)));   % Equation 56
    X2 = X2 + weight.*(Obs-E).^2./(E.*(1-p));            % calculate X2, Equation 52
  end                                % end summation loop
  LL2(iN,:) = lik; X2all(iN,:) = X2; % store the likelihoods and chi squares
end                                  % end N loop
% the rest is for plotting
plot(p,LL2,'k', p,X2all(2,:),':k'); grid                 % plot the log likelihoods and X2
for in = 1:4, text(.935, LL2(in,end-6), num2str(Nall(in))); end
text(.913, 1.2, 'N=16'); text(.65, .95, 'X2 for all N')
xlabel('P (predicted probability)')
ylabel('Contribution to log likelihood (D) from each level')

Matlab Code 3
Downward Slope Bias Due to Lapses
Relevant to Wichmann and Hill (2001a) and Table 2

function sse = probitML2(params, i, ilapse)              % function called by main program
stimulus = [0 1 6]; N = [50 50 50]; Obs = [25 42 50];    % psychometric function information
lapseAll = [.01 .0001 .01 .0001]; lapse = lapseAll(ilapse);   % choose the lapse rate lambda
if ilapse > 2, Obs(3) = Obs(3) - 1; end                  % produce a lapse for last 2 columns
zz = (stimulus - params(1))*params(2);                   % z-score of psychometric function
ii = mod(i,3);                                           % index for selecting psychometric function
if ii == 0
  prob = (1 + erf(zz/sqrt(2)))/2;                        % probit (cumulative normal)
elseif ii == 1
  prob = 1./(1 + exp(-zz));                              % logit
else
  prob = 1 - exp(-exp(zz));                              % Weibull
end
Exp = N.*(lapse + prob*(1 - 2*lapse));                   % expected number correct
if i < 3
  chisq = -2*(Obs.*log(Exp./Obs) + (N-Obs).*log((N-Exp+eps)./(N-Obs+eps)));    % log likelihood
else
  chisq = (Obs-Exp).^2.*N./(Exp.*(N-Exp+eps));           % X2 (eps is the smallest Matlab number)
end
sse = sum(chisq);                                        % sum over the three levels

MAIN PROGRAM
clear; clf
type = ['probit maxlik  '; 'logit maxlik   '; 'Weibull maxlik '; ...       % headings for Table 2
        'probit chisq   '; 'logit chisq    '; 'Weibull chisq  '];
disp('  gamma =   .01     .0001   .01     .0001')        % type top of Table 2
disp('  lapse =   no      no      yes     yes')
for i = 0:5                                              % iterate over function type
  for ilapse = 1:4                                       % iterate over lapse categories
    params = fmins('probitML2', [1 3], [], [], i, ilapse);    % fmins finds minimum of chi-square
    slope(ilapse) = params(2);                           % slope is 2nd parameter
  end
  disp([type(i+1,:) num2str(slope)])                     % print result
end

Matlab Code 4
Brownian Staircase Enumerations for Slope Bias

Monotonize
flag = 1; ii = 0;                    % initialize; flag = 1 means nonmonotonicity is detected
while flag == 1                      % keep going if there is nonmonotonic data
  flag = 0;                          % reset monotonic flag
  ii = ii + 1;                       % increase the nonmonotonic spread
  for i2 = ii+1:ntot                 % loop over all PF levels looking for nonmonotonicity
    prob = cor./(tot + eps);         % calculate probabilities
    if (prob(i2-ii) > prob(i2)) & (tot(i2) > 0)              % check for nonmonotonicity
      cor(i2-ii:i2) = mean(cor(i2-ii:i2))*ones(1,ii+1);      % replace nonmonotonic correct by average
      tot(i2-ii:i2) = mean(tot(i2-ii:i2))*ones(1,ii+1);      % replace nonmonotonic total by average
      flag = 1;                      % set flag that nonmonotonicity is found
    end
  end
end

Note that in line 6 the denominator is tot+eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.

Extrapolate
ii = 1; while tot(ii) == 0, ii = ii + 1; end             % count the low levels with no trials
if ii > 1                                                % skip lowest level
  tot(1:ii-1) = ones(1,ii-1)*.00001;                     % fill in with very low number of trials
  cor(1:ii-1) = prob(ii)*tot(1:ii-1);                    % fill in with probability of lowest tested level
end
ii = nbits2; while tot(ii) == 0, ii = ii - 1; end        % do the same extrapolation at high end
if ii < nbits2
  tot(ii+1:nbits2) = ones(1,nbits2-ii)*.00001;
  cor(ii+1:nbits2) = prob(ii)*tot(ii+1:nbits2);
end

Averaging
prob1 = cor./(tot + eps);                                % calculate PF
probAll(iopt,:) = probAll(iopt,:) + prob1;               % sum the PFs to be averaged
probTotAll(iopt,:) = probTotAll(iopt,:) + (tot > 0);     % count the number of staircases with a given level
corAll(iopt,:) = corAll(iopt,:) + cor;                   % sum the number correct over all staircases
totAll(iopt,:) = totAll(iopt,:) + tot;                   % sum the total number of trials over all staircases

Main Program
nbits = 10; n2 = 2^nbits - 1; nbits2 = 2*nbits + 1;      % nbits is number of staircase steps
zero3 = zeros(3,nbits2);             % the three rows are for the three iopt values
for ishift = 0:1                     % ishift = 0 means no shift (1 means shift)
  totAll = zero3; corAll = zero3; probAll = zero3; probTotAll = zero3;     % initialize arrays
  for i = 0:n2                       % loop over all possible staircases
    a = bitget(i,[1:nbits]);         % a(k) = 1 is hit; a(k) = 0 is miss
    step = 1 - 2*a(1:end-1);         % -1 step down for hit, +1 step up for miss
    aCum = [0 cumsum(step)];         % accumulate steps to get stimulus level
    if ishift == 0, ave = 0;         % for the no-shift option
    else ave = round(mean(aCum)); end     % for shift option; ave is the amount of shift
    aCum = aCum - ave + nbits;       % do the shift
    cor = zeros(1,nbits2); tot = cor;     % initialize for each staircase
    for i2 = 1:nbits
      cor(aCum(i2)) = cor(aCum(i2)) + a(i2);             % count number correct at each level
      tot(aCum(i2)) = tot(aCum(i2)) + 1;                 % count number of trials at each level
    end
    iopt = 1; stairave               % two types of averaging (iopt is index in script above)
    iopt = 2; monotonize; stairave   % monotonize PF and then do averaging
    iopt = 3; extrapolate; stairave  % extrapolate and then do averaging
  end
  prob = corAll./(totAll + eps);     % get average from pooled data
  probave = probAll./(probTotAll + eps);                 % get average from individual PFs
  x = [1:nbits2];                    % x axis for plotting
  subplot(3,2,1+ishift)              % plot the top pair of panels
  plot(x,prob(1,:), x,totAll(1,:)/max(totAll(1,:)),'--', x,probave(1,:),':'); grid
  if ishift == 0
    title('no shift'); ylabel('no data manipulation'); text(5,.95,'(a)')
  else
    title('shift'); text(5,.95,'(d)')
  end
  subplot(3,2,3+ishift); plot(x,prob(2,:), x,probave(2,:),':'); grid       % plot middle pair of panels
  if ishift == 0
    ylabel('monotonize psychometric fn'); text(5,.63,'(b)')
  else
    text(5,.95,'(e)')
  end
  subplot(3,2,5+ishift)              % plot bottom pair of panels
  plot(x,prob(3,:), x,probave(3,:),':', [nbits nbits],[.45 .55]); grid
  tt = 'cf'; text(5,.95,['(' tt(ishift+1) ')'])
  if ishift == 0, ylabel('extrapolate psychometric fn'); end
  xlabel('stimulus levels')
end                                  % end loop of whether to shift or not

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i,[1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary is 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)


1452 KLEINA

PP

EN

DIX

Mat

lab

Pro

gram

s A

vail

able

Fro

m k

lein

sp

ecta

cle

berk

eley

ed

u

Mat

lab

Cod

e 1

Wei

bull

Fu

ncti

on

Pro

babi

lity

dcent a

nd L

ogndashL

og d

cent Slo

pe

Cod

e fo

r G

ener

atin

g F

igur

e 1

rsquogam

ma

5 5

15

erf

(1

sqrt

(2))

gam

ma

(low

er a

sym

ptot

e) f

or z

-sco

re 5

1

beta

5 2

inc

5 0

1

beta

is P

F s

lope

x5

inc

2in

c3

th

e st

imul

us r

ange

wit

h li

near

abs

ciss

ap

5 g

amm

a1(1

gam

ma)

(1

exp(

x^(

beta

)))

W

eibu

ll f

unct

ion

subp

lot(

32

1)p

lot(

xp)

gri

dte

xt(

19

2rsquo(a

)rsquo) y

labe

l(rsquoW

eibu

ll w

ith

beta

5 2

rsquo)z

5 s

qrt(

2)e

rfin

v(2

p1)

conv

erti

ng p

roba

bili

ty to

z-s

core

subp

lot(

32

3)p

lot(

xz)

gri

dte

xt(

13

6rsquo(b

)rsquo) y

labe

l(rsquoz

-sco

re o

f W

eibu

ll (

dacutersquo

1)rsquo)

dpri

me

5 z

11

dp

rim

e 5

z(h

it)

z (fa

lse

alar

m)

difd

5 d

iff(

log(

dpri

me)

)in

c

deri

vativ

e of

log

dcentxd

5 in

cin

c2

9999

xva

lues

at m

idpo

int o

f pr

evio

us x

valu

essu

bplo

t(3

25)

plo

t(xd

dif

dx

d)g

rid

m

ulti

plic

atio

n by

xd

is to

mak

e lo

g ab

scis

saax

is([

0 3

5 2

])t

ext(

31

9rsquo(

c)rsquo)

yla

bel(

rsquologndash

log

slop

e of

dacutersquorsquo

) x

labe

l(rsquos

tim

ulus

in th

resh

old

unit

srsquo)

y 5

2

inc

1

do s

imil

ar p

lots

for

a lo

g ab

scis

sa (

y )p

5 g

amm

a1(1

gam

ma)

(1

exp(

exp(

beta

y))

)

Wei

bull

fun

ctio

n as

a f

unct

ion

of y

subp

lot(

32

2)p

lot(

yp

0p(

201)

rsquorsquo)

grid

tex

t(1

89

2rsquo(d

)rsquo)z

5 s

qrt(

2)e

rfin

v(2

p1)

su

bplo

t(3

24)

plo

t(y

z)g

rid

text

(1

83

6rsquo(e

)rsquo)dp

rim

e5

z1

1di

fd5

dif

f(lo

g(dp

rim

e))

inc

subp

lot(

32

6)p

lot(

y[d

ifd(

1) d

ifd]

)gr

id x

labe

l(rsquos

tim

ulus

in n

atur

al lo

g un

itsrsquo

) te

xt(

16

19

rsquo(f)rsquo)

Mat

lab

Cod

e 2

Goo

dne

ss-o

f-F

it

Lik

elih

ood

vs C

hi S

quar

e

Rel

evan

t to

Wic

hm

ann

and

Hill

(20

01a)

and

Fig

ure

4

clea

r c

lf

clea

r al

lfa

c5

[1

cum

prod

(12

0)]

cr

eate

fac

tori

al f

unct

ion

p5

[5

10

19

99]

ex

amin

e a

wid

e ra

nge

of p

roba

bili

ties

Nal

l5 [

1 2

4 8

16]

nu

mbe

r of

tri

als

at e

ach

leve

lfo

r iN

5 1

5

iter

ate

over

then

num

ber

of tr

ials

N5

Nal

l(iN

) E

5 N

p

E

is th

e ex

pect

ed n

umbe

r as

fun

ctio

n of

pli

k5

0 X

25

0

in

itia

lize

the

like

liho

od a

nd X

2

for

Obs

5 0

N

sum

ove

r al

l pos

sibl

e co

rrec

t res

pons

esN

m5

NO

bs

nu

mbe

r of

inco

rrec

t res

pons

esw

eigh

t5 f

ac(1

1N

)fa

c(11

Obs

)fa

c(11

Nm

)p

^Obs

(1

p)^

Nm

bino

mia

l wei

ght

lik

5 li

k 2

wei

ght

(Obs

log

(E(

Obs

1ep

s))1

Nm

log

((N

E)

(Nm

1ep

s)))

Equ

atio

n56

X2

5 X

21w

eigh

t(

Obs

E)

^2

(E

(1p)

)

calc

ulat

e X

2 E

quat

ion

52en

d

end

sum

mat

ion

loop

MEASURING THE PSYCHOMETRIC FUNCTION 1453L

L2(

iN

)5

lik

X2a

ll(i

N

)5

X2

st

ore

the

like

liho

ods

and

chi s

quar

esen

d

end

Nlo

op

the

rest

is f

or p

lott

ing

plot

(pL

L2

rsquokrsquop

X2a

ll(2

)rsquo

krsquo)

grid

pl

ot th

e lo

g li

keli

hood

s an

d X

2

for

in5

14

tex

t(9

35 L

L2(

ine

nd6

) n

um2s

tr(N

all(

in))

)en

dte

xt(

913

12

rsquoN5

16rsquo

)te

xt(

659

5rsquoX

2 fo

r al

l Nrsquo)

xlab

el(rsquoP

(pr

edic

ted

prob

abil

ity)

rsquo)yl

abel

(rsquoCon

trib

utio

n to

log

like

liho

od (

D)

from

eac

h le

velrsquo)

Mat

lab

Cod

e 3

Dow

nw

ard

Slop

e B

ias

Due

to

Lap

ses

Rel

evan

t to

Wic

hm

ann

and

Hil

l (20

01a)

and

Tab

le2

func

tion

sse

5 p

robi

tML

2(pa

ram

s i

ilap

se)

fu

ncti

on c

alle

d by

mai

n pr

ogra

mst

imul

us5

[0

1 6]

N5

[50

50

50]

Obs

5 [

25 4

2 50

]

psyc

hom

etri

c fu

ncti

on in

form

atio

nla

pseA

ll5

[0

1 0

001

01

000

1]l

apse

5 la

pseA

ll(i

laps

e)

ch

oose

the

laps

e ra


Kaernbach, C. (2001a). Adaptive threshold estimation with unforced-choice tasks. Perception & Psychophysics, 63, 1377-1388.

Kaernbach, C. (2001b). Slope bias of psychometric functions derived from adaptive data. Perception & Psychophysics, 63, 1389-1398.

King-Smith, P. E. (1984). Efficient threshold estimates from yes/no procedures using few (about 10) trials. American Journal of Optometry & Physiological Optics, 81, 119.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research, 34, 885-912.

King-Smith, P. E., & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37, 1595-1604.

Klein, S. A. (1985). Double-judgment psychophysics: Problems and solutions. Journal of the Optical Society of America A, 2, 1560-1585.

Klein, S. A. (1992). An Excel macro for transformed and weighted averaging. Behavior Research Methods, Instruments, & Computers, 24, 90-96.

Klein, S. A. (2002). Measuring the psychometric function. Manuscript in preparation.

Klein, S. A., & Stromeyer, C. F., III (1980). On inhibition between spatial frequency channels: Adaptation to complex gratings. Vision Research, 20, 459-466.

Klein, S. A., Stromeyer, C. F., III, & Ganz, L. (1974). The simultaneous spatial frequency shift: A dissociation between the detection and perception of gratings. Vision Research, 15, 899-910.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39, 2729-2737.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63, 1279-1292.

Leek, M. R., Hanna, T. E., & Marshall, L. (1991). An interleaved tracking procedure to monitor unstable psychometric functions. Journal of the Acoustical Society of America, 90, 1385-1397.

Leek, M. R., Hanna, T. E., & Marshall, L. (1992). Estimation of psychometric functions from adaptive tracking procedures. Perception & Psychophysics, 51, 247-256.

Linschoten, M. R., Harvey, L. O., Jr., Eller, P. M., & Jafek, B. W. (2001). Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Perception & Psychophysics, 63, 1330-1347.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure Vernier acuity of older infants. Vision Research, 25, 1245-1252.

McKee, S. P., Klein, S. A., & Teller, D. A. (1985). Statistical properties of forced-choice psychometric functions: Implications of probit analysis. Perception & Psychophysics, 37, 286-298.

Miller, J., & Ulrich, R. (2001). On the analysis of psychometric functions: The Spearman-Kärber method. Perception & Psychophysics, 63, 1399-1420.

Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.

Pelli, D. G. (1987). The ideal psychometric procedures [Abstract]. Investigative Ophthalmology & Visual Science, 28 (Suppl.), 366.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). Cambridge: Cambridge University Press.

Strasburger, H. (2001a). Converting between measures of slope of the psychometric function. Perception & Psychophysics, 63, 1348-1355.

Strasburger, H. (2001b). Invariance of the psychometric function for character recognition across the visual field. Perception & Psychophysics, 63, 1356-1376.

Stromeyer, C. F., III, & Klein, S. A. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Research, 14, 1409-1420.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35, 2503-2522.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function I: Fitting, sampling, and goodness-of-fit. Perception & Psychophysics, 63, 1293-1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function II: Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63, 1314-1329.

Yu, C., Klein, S. A., & Levi, D. M. (2001). Psychophysical measurement and modeling of iso- and cross-surround modulation of the foveal TvC function. Manuscript submitted for publication.

APPENDIX
Matlab Programs Available From klein@spectacle.berkeley.edu

Matlab Code 1
Weibull Function Probability, d′ and Log-Log d′ Slope
Code for Generating Figure 1

gamma = .5 + .5*erf(-1/sqrt(2));                % gamma (lower asymptote) for z-score = -1
beta = 2; inc = .01;                            % beta is PF slope
x = inc/2:inc:3;                                % the stimulus range with linear abscissa
p = gamma + (1-gamma)*(1 - exp(-x.^beta));      % Weibull function
subplot(3,2,1); plot(x,p); grid; text(1.9,.2,'(a)'); ylabel('Weibull with beta = 2')
z = sqrt(2)*erfinv(2*p - 1);                    % converting probability to z-score
subplot(3,2,3); plot(x,z); grid; text(1.3,.6,'(b)'); ylabel('z-score of Weibull (d''-1)')
dprime = z + 1;                                 % dprime = z(hit) - z(false alarm)
difd = diff(log(dprime))/inc;                   % derivative of log d′
xd = inc:inc:2.9999;                            % x values at midpoint of previous x values
subplot(3,2,5); plot(xd, difd.*xd); grid        % multiplication by xd is to make log abscissa
axis([0 3 .5 2]); text(.3,1.9,'(c)')
ylabel('log-log slope of d'''); xlabel('stimulus in threshold units')
y = -2:inc:1;                                   % do similar plots for a log abscissa (y)
p = gamma + (1-gamma)*(1 - exp(-exp(beta*y)));  % Weibull function as a function of y
subplot(3,2,2); plot(y,p,0,p(201),'*'); grid; text(-1.8,.92,'(d)')
z = sqrt(2)*erfinv(2*p - 1);                    % converting probability to z-score
subplot(3,2,4); plot(y,z); grid; text(-1.8,3.6,'(e)')
dprime = z + 1;
difd = diff(log(dprime))/inc;
subplot(3,2,6); plot(y,[difd(1) difd]); grid; xlabel('stimulus in natural log units'); text(-1.6,1.9,'(f)')
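The conversion from probability to z-score that appears twice in the listing is just the inverse of the cumulative normal, since

\[
p \;=\; \Phi(z) \;=\; \tfrac{1}{2}\bigl[\,1 + \operatorname{erf}(z/\sqrt{2})\,\bigr]
\quad\Longleftrightarrow\quad
z \;=\; \sqrt{2}\,\operatorname{erfinv}(2p - 1).
\]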

Matlab Code 2
Goodness-of-Fit: Likelihood vs. Chi Square
Relevant to Wichmann and Hill (2001a) and Figure 4

clear; clf; clear all
fac = [1 cumprod(1:20)];                        % create factorial function
p = .51:.01:.999;                               % examine a wide range of probabilities
Nall = [1 2 4 8 16];                            % number of trials at each level
for iN = 1:5                                    % iterate over the number of trials
  N = Nall(iN); E = N*p;                        % E is the expected number as a function of p
  lik = 0; X2 = 0;                              % initialize the likelihood and X2
  for Obs = 0:N                                 % sum over all possible correct responses
    Nm = N - Obs;                               % number of incorrect responses
    weight = fac(1+N)/fac(1+Obs)/fac(1+Nm)*p.^Obs.*(1-p).^Nm;                  % binomial weight
    lik = lik - 2*weight.*(Obs*log(E./(Obs+eps)) + Nm*log((N-E)./(Nm+eps)));   % Equation 56
    X2 = X2 + weight.*(Obs-E).^2./(E.*(1-p));                                  % calculate X2, Equation 52
  end                                           % end of summation loop
  LL2(iN,:) = lik; X2all(iN,:) = X2;            % store the likelihoods and chi squares
end                                             % end of N loop
% the rest is for plotting
plot(p, LL2, 'k', p, X2all(2,:), '--k'); grid   % plot the log likelihoods and X2
for in = 1:4, text(.935, LL2(in,end-6), num2str(Nall(in))); end
text(.913, 1.2, 'N=16'); text(.65, .95, 'X2 for all N')
xlabel('P (predicted probability)')
ylabel('Contribution to log likelihood (D) from each level')
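For readers who prefer formulas to code, the two accumulation lines in the inner loop compute the binomially weighted (that is, expected) contribution of a single stimulus level to the log-likelihood deviance and to chi square. With E = Np, N_m = N - Obs, and W the binomial weight, they are

\[
D \;=\; -2\sum_{\mathrm{Obs}=0}^{N} W(\mathrm{Obs})\Bigl[\mathrm{Obs}\,\ln\!\frac{E}{\mathrm{Obs}} + N_m\,\ln\!\frac{N-E}{N_m}\Bigr],
\qquad
X^{2} \;=\; \sum_{\mathrm{Obs}=0}^{N} W(\mathrm{Obs})\,\frac{(\mathrm{Obs}-E)^{2}}{E\,(1-p)},
\]
\[
W(\mathrm{Obs}) \;=\; \binom{N}{\mathrm{Obs}}\,p^{\mathrm{Obs}}(1-p)^{N-\mathrm{Obs}}.
\]

This is only a restatement of the code above (the equation numbers in the comments refer to the main text), not an independent derivation. Note that the expected chi-square contribution equals 1 for every N and p, which is why a single curve carries the label "X2 for all N" in the plot.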

Matlab Code 3
Downward Slope Bias Due to Lapses
Relevant to Wichmann and Hill (2001a) and Table 2

function sse = probitML2(params, i, ilapse)     % function called by main program
stimulus = [0 1 6]; N = [50 50 50]; Obs = [25 42 50];   % psychometric function information
lapseAll = [0 .1 0 .001 .01 .0001];
lapse = lapseAll(ilapse);                       % choose the lapse rate lambda
if ilapse > 2, Obs(3) = Obs(3) - 1; end         % produce a lapse for last 2 columns
zz = (stimulus - params(1))*params(2);          % z-score of psychometric function
ii = mod(i,3);                                  % index for selecting psychometric function
if ii == 0
  prob = (1 + erf(zz/sqrt(2)))/2;               % probit (cumulative normal)
elseif ii == 1
  prob = 1./(1 + exp(-zz));                     % logit
else
  prob = 1 - exp(-exp(zz));                     % Weibull
end
Exp = N.*(lapse + prob*(1 - 2*lapse));          % expected number correct
if i < 3
  chisq = -2*(Obs.*log(Exp./Obs) + (N-Obs).*log((N-Exp+eps)./(N-Obs+eps)));   % log likelihood
else
  chisq = (Obs-Exp).^2./Exp.*N./(N-Exp+eps);    % X2 (eps is smallest Matlab number)
end
sse = sum(chisq);                               % sum over the three levels

% MAIN PROGRAM
clear; clf
type = ['probit maxlik  '; 'logit maxlik   '; 'Weibull maxlik '; ...   % headings for Table 2
        'probit chisq   '; 'logit chisq    '; 'Weibull chisq  '];
disp(' gamma =   0    .1    0    .001    .01    .0001')   % type top of Table 2
disp(' lapse =   no    no    yes    yes')
for i = 0:5                                     % iterate over function type
  for ilapse = 1:4                              % iterate over lapse categories
    params = fmins('probitML2', [1 3], [], [], i, ilapse);   % fmins finds minimum of chi-square
    slope(ilapse) = params(2);                  % slope is 2nd parameter
  end
  disp([type(i+1,:) num2str(slope)])            % print result
end
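In the model as reconstructed here, the lapse rate enters symmetrically at the two asymptotes; restating the Exp line of the code with P(x) the lapse-free psychometric function and lambda the lapse rate,

\[
E_j \;=\; N_j\,\bigl[\lambda + (1 - 2\lambda)\,P(x_j)\bigr].
\]

When the data contain a lapse at the highest level but the fit is restricted to a lambda that is too small, the fit can lower its prediction at the top only by shallowing the slope; that is the downward slope bias that the main program tabulates for Table 2.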

APPENDIX (Continued)
Matlab Code 4
Brownian Staircase Enumerations for Slope Bias

Monotonize

ntot = length(tot); flag = 1; ii = 0;           % initialize; flag=1 means nonmonotonicity is detected
while flag == 1                                 % keep going if there is nonmonotonic data
  flag = 0;                                     % reset monotonic flag
  ii = ii + 1;                                  % increase the nonmonotonic spread
  for i2 = ii+1:ntot                            % loop over all PF levels looking for nonmonotonicity
    prob = cor./(tot+eps);                      % calculate probabilities
    if (prob(i2-ii) > prob(i2)) & (tot(i2) > 0) % check for nonmonotonicity
      cor(i2-ii:i2) = mean(cor(i2-ii:i2))*ones(1,ii+1);   % replace nonmonotonic correct by average
      tot(i2-ii:i2) = mean(tot(i2-ii:i2))*ones(1,ii+1);   % replace nonmonotonic total by average
      flag = 1;                                 % set flag that nonmonotonicity is found
    end
  end
end

Note that in line 6 the denominator is tot+eps, where eps is the smallest floating point number that Matlab can use. Because of this choice, if there are zero trials at a level, the associated probability will be zero.
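To make the pooling step and the eps convention concrete, here is a toy example with hypothetical counts (not taken from the simulations); it shows the first replacement the loop would make, with a spread of ii = 1:

cor = [0 4 8 6]; tot = [0 5 10 10];   % hypothetical counts; the first level was never tested
prob = cor./(tot+eps)                 % [0 .8 .8 .6]: the untested level gives 0, and level 3 beats level 4
cor(3:4) = mean(cor(3:4))*ones(1,2);  % pool the offending pair of levels
tot(3:4) = mean(tot(3:4))*ones(1,2);
prob = cor./(tot+eps)                 % [0 .8 .7 .7]: the violation at levels 3-4 is gone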

Extrapolate

ii = 1; while tot(ii) == 0, ii = ii + 1; end        % count the low levels with no trials
if ii > 1                                           % skip lowest level
  tot(1:ii-1) = ones(1,ii-1)*.00001;                % fill in with very low number of trials
  cor(1:ii-1) = prob(ii)*tot(1:ii-1);               % fill in with probability of lowest tested level
end
ii = nbits2; while tot(ii) == 0, ii = ii - 1; end   % do the same extrapolation at high end
if ii < nbits2
  tot(ii+1:nbits2) = ones(1,nbits2-ii)*.00001;
  cor(ii+1:nbits2) = prob(ii)*tot(ii+1:nbits2);
end

Averaging

prob1 = cor./(tot+eps);                              % calculate PF
probAll(iopt,:) = probAll(iopt,:) + prob1;           % sum the PFs to be averaged
probTotAll(iopt,:) = probTotAll(iopt,:) + (tot > 0); % count the number of staircases with a given level
corAll(iopt,:) = corAll(iopt,:) + cor;               % sum the number correct over all staircases
totAll(iopt,:) = totAll(iopt,:) + tot;               % sum the total number of trials over all staircases

Main Program

nbits = 10; n2 = 2^nbits - 1; nbits2 = 2*nbits + 1;  % nbits is the number of staircase steps
zero3 = zeros(3,nbits2);                             % the three rows are for the three iopt values
for ishift = 0:1                                     % ishift=0 means no shift (1 means shift)
totAll = zero3; corAll = zero3; probAll = zero3; probTotAll = zero3;   % initialize arrays

for i = 0:n2                                    % loop over all possible staircases
  a = bitget(i,[1:nbits]);                      % a(k)=1 is a hit, a(k)=0 is a miss
  step = 1 - 2*a(1:end-1);                      % -1 = step down for a hit, +1 = step up for a miss
  aCum = [0 cumsum(step)];                      % accumulate steps to get stimulus level
  if ishift == 0
    ave = 0;                                    % for the no-shift option
  else
    ave = round(mean(aCum));                    % for the shift option, ave is the amount of shift
  end
  aCum = aCum - ave + nbits;                    % do the shift
  cor = zeros(1,nbits2); tot = cor;             % initialize for each staircase
  for i2 = 1:nbits
    cor(aCum(i2)) = cor(aCum(i2)) + a(i2);      % count number correct at each level
    tot(aCum(i2)) = tot(aCum(i2)) + 1;          % count number of trials at each level
  end
  iopt = 1; stairave                            % two types of averaging (iopt is index in script above)
  iopt = 2; monotonize; stairave                % monotonize PF and then do averaging
  iopt = 3; extrapolate; stairave               % extrapolate and then do averaging
end
prob = corAll./(totAll+eps);                    % get average from pooled data
probave = probAll./(probTotAll+eps);            % get average from individual PFs
x = [1:nbits2];                                 % x axis for plotting
subplot(3,2,1+ishift)                           % plot the top pair of panels
plot(x,prob(1,:),x,totAll(1,:)/max(totAll(1,:)),'--',x,probave(1,:),'-.'); grid
if ishift == 0
  title('no shift'); ylabel('no data manipulation'); text(5,.95,'(a)')
else
  title('shift'); text(5,.95,'(d)')
end
subplot(3,2,3+ishift); plot(x,prob(2,:),x,probave(2,:),'-.'); grid   % plot the middle pair of panels
if ishift == 0
  ylabel('monotonize psychometric fn'); text(5,.63,'(b)')
else
  text(5,.95,'(e)')
end
subplot(3,2,5+ishift)                           % plot the bottom pair of panels
plot(x,prob(3,:),x,probave(3,:),'-.',[nbits nbits],[.45 .55]); grid
tt = 'cf'; text(5,.95,['(' tt(ishift+1) ')'])
if ishift == 0, ylabel('extrapolate psychometric fn'); end
xlabel('stimulus levels')
end                                             % end of loop over whether to shift or not

Note: The main program has one tricky step that requires knowledge of Matlab: a = bitget(i,[1:nbits]), where i is an integer from 0 to 2^nbits - 1 and nbits = 10 in the present program. The bitget command converts the integer i to a vector of bits. For example, for i = 11 the output of the bitget command is 1 1 0 1 0 0 0 0 0 0. The number i = 11 (binary 1011) corresponds to the 10-trial staircase C C I C I I I I I I, where C means correct and I means incorrect.
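A quick way to check this mapping at the Matlab prompt (the variable names below are arbitrary, not part of the program):

nbits = 10;
a = bitget(11, 1:nbits)               % returns 1 1 0 1 0 0 0 0 0 0 (least significant bit first)
labels = 'IC';                        % a value of 0 selects 'I' (incorrect), 1 selects 'C' (correct)
labels(a + 1)                         % returns 'CCICIIIIII', the staircase quoted above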

(Manuscript received July 10, 2001; accepted for publication August 8, 2001.)
