
Accurately Sized Test Statistics with Misspecified Conditional Homoskedasticity

Jack Erb*†
Economics Department
University of California, Santa Barbara
E-mail: [email protected]

Douglas G. Steigerwald
Economics Department
University of California, Santa Barbara
E-mail: [email protected]

July 2009

Abstract

We study the finite-sample performance of test statistics in linear regression models where the error dependence is of unknown form. With an unknown dependence structure there is traditionally a trade-off between the maximum lag over which the correlation is estimated (the bandwidth) and the amount of heterogeneity in the process. When allowing for heterogeneity, through conditional heteroskedasticity, the correlation at far lags is generally omitted, and the resultant inflation of the empirical size of test statistics has long been recognized. To allow for correlation at far lags we study test statistics constructed under the possibly misspecified assumption of conditional homoskedasticity. To improve the accuracy of the test statistics, we employ the second-order asymptotic refinement in Rothenberg (1988) to determine critical values. The simulation results of this paper suggest that when sample sizes are small, modeling the heterogeneity of a process is secondary to accounting for dependence. We find that a conditionally homoskedastic covariance matrix estimator (when used in conjunction with Rothenberg's second-order critical value adjustment) improves test size with only a minimal loss in test power, even when the data manifest significant amounts of heteroskedasticity. In some specifications, the size inflation was cut by nearly 40% over the traditional HAC test. Finally, we note that the proposed test statistics do not require that the researcher specify the bandwidth or the kernel.

*Corresponding author.
†We thank the members of the Econometrics Lab at Santa Barbara for helpful comments.

Key Words: Robust testing, test size, confidence interval estimation, heteroskedasticity, autocorrelation
2000 Mathematics Subject Classification: 62F03, 62F35, 62J05, 62M10, 91B84

1 Introduction

When forming test statistics for coefficients in linear regression models, it has become widely accepted to use the Newey-West covariance estimator to account for error serial correlation. The appeal of the Newey-West method (introduced in 1987) is that it allows for conditional heteroskedasticity, although at the cost of only admitting serial correlation at near lags. The inability to account for serial correlation at far lags leads to test statistics with empirical sizes that far exceed nominal sizes. To address this problem, Kiefer and Vogelsang (2005) refine the asymptotic theory to more accurately model the admission of serial correlation at far lags. As Kiefer and Vogelsang show that the resultant non-Gaussian critical values increase with the admitted lag length, the desire to accommodate both conditional heteroskedasticity and correlation at far lags carries its own cost of considerably lengthening confidence intervals. In an effort to reduce this cost, we adjudge whether modeling the heteroskedasticity is secondary to accounting for dependence. To do so, we compare the performance of test statistics that allow for conditional heteroskedasticity with test statistics constructed under the (possibly) misspecified assumption of conditional homoskedasticity.

Driven by the desire to allow for general dependence in economic time series, White and Domowitz (1984) develop a consistent standard error estimator under conditional heteroskedasticity. The key condition is that the

maximum lag over which the serial correlation is estimated, the bandwidth, is an asymptotically negligible fraction of the sample size. Although the White-Domowitz estimator is consistent, it is not guaranteed to be positive definite. In response, Newey and West demonstrate that the introduction of a kernel that downweights correlations as the lag length grows ensures that the consistent standard error estimator is also positive semi-definite.

With every solution there comes another problem. While the Newey-West estimator is consistent and positive semi-definite, the estimated standard errors are often too small. As Andrews (1991) demonstrates, if the errors exhibit substantive temporal dependence, then test statistics formed from the Newey-West standard error estimates have empirical size far in excess of nominal size (test statistics that reject too often). To mitigate size distortion, Andrews and Monahan (1992) propose a two-step method, in which the first step consists of prewhitening the residuals by fitting a low-dimension process (such as a VAR(1)) to capture serial correlation at far lags. In the second step, the conditional heteroskedasticity is estimated at near lags.

Prewhitening the residuals prior to estimating conditional heteroskedasticity at near lags goes part way to resolving the problem of test overrejection. In an effort to make further improvements, Kiefer and Vogelsang suggest forgoing the first-step prewhitening and estimating the conditional heteroskedasticity directly at both near and far lags. When including correlation at far lags, it is no longer tenable to assume that the bandwidth is an asymptotically negligible fraction of the sample size. In consequence, the (first-order) asymptotic distribution of resultant test statistics is not Gaussian. The alternative asymptotic distribution delivers simulated critical values that are considerably larger than their Gaussian counterparts. If only serial correlation at near lags is admitted, then the refined asymptotic critical values deliver size improvements in line with the improvements obtained by prewhitening. If serial correlation at all lags is admitted (note that the theory does not deliver an optimal bandwidth), then there are substantial further reductions in empirical size.

A key insight in previous research is the need to account for serial correlation at far lags to obtain more accurate coverage probabilities. Current methods to account for correlation at far lags are either completely general, as in Kiefer and Vogelsang, or specific, as in Andrews and Monahan.1 While

Kiefer and Vogelsang allow for conditional heteroskedasticity of unknown form, the cost is longer confidence intervals resulting in loss of power for associated test statistics. The low-dimension parametric method of Andrews and Monahan produces confidence intervals of more moderate length, but still suffers from high empirical size. It is therefore of interest to study methods that lie between the two, to adjudge the trade-off between size and power.

In contrast to Kiefer and Vogelsang, we propose broadening the first step of Andrews and Monahan by fitting a high-dimension process to capture serial correlation at all lags, while forgoing the second-step conditional heteroskedasticity estimation. The assumption underlying the method is that standard errors can be well approximated by a conditionally homoskedastic covariance matrix that is band diagonal. The band diagonals are not restricted to be related through a low-dimension process. To obtain size improvements we too rely on asymptotic refinements, namely the second-order theory of Rothenberg (1988). The second-order theory yields critical values that adjust to incorporate the behavior of the regressors.2 As Rothenberg establishes, the bandwidth need not be an asymptotically negligible fraction of the sample size, so all correlation lags are included under conditional homoskedasticity. Further, the estimator is positive semi-definite by construction without need of a kernel.

We study the size and size-adjusted power of test statistics constructed under the three methods. We focus not only on hypothesis tests of a single parameter, but also on tests of multiple parameters to determine the impact of off-diagonal elements of the estimated covariance matrix. In Section 2 we present the quantities of interest and the models to be simulated. The conditional heteroskedasticity specifications allow us to investigate an additional observation in Rothenberg: namely, that the degree of correlation between the regressor under test and the conditional heteroskedasticity plays a key role in the empirical size. We examine the range of models typically used to assess the performance of confidence intervals under conditional heteroskedasticity. Results from the simulations are contained in Section 3. We provide an empirical application to the estimation techniques in Section 4.

1 Other variance estimators have also been proposed, such as the VARHAC estimator (Den Haan and Levin, 2000) or the bootstrap estimator (see, e.g., Götze and Künsch, 1996; Hall and Horowitz, 1996; Huitema, McKean, and McKnight, 2000; Goncalves and Vogelsang, 2004). The simulation results of Den Haan and Levin (1996, 2000) show that the VARHAC estimator performs comparably to the Andrews and Monahan prewhitened estimator, and Kiefer and Vogelsang (2005) show that the block bootstrap performs comparably to tests employing the fixed-bandwidth asymptotic critical values.

2 Rothenberg assumes that the regressors are strictly exogenous, while Newey-West, Andrews-Monahan and Kiefer-Vogelsang assume that the regressors are only weakly exogenous. The simulations in Erb and Steigerwald (2009) study the relative performance of the Rothenberg correction under both strict and weak exogeneity.

2 Covariance Estimators

To determine the finite-sample size and size-adjusted power of hypothesis tests constructed under the (potentially misspecified) assumption of conditional homoskedasticity we employ a simulation model. Our simulation model is

$$Y_t = X_t'\beta + U_t, \qquad t = 1,\ldots,n, \qquad (1)$$

where X_t and β are k × 1 vectors. To allow for comparison with the findings in both Andrews as well as Andrews and Monahan, we set k = 5, where X_t contains a constant and four regressors. Conditional heteroskedasticity is introduced through a scale parameter that depends equally on each of the varying regressors:

$$U_t = \left|X_t'\delta\right|\,\tilde U_t,$$

where δ = (0, 1/2, 1/2, 1/2, 1/2)' and {Ũ_t} is a sequence of possibly dependent random variables defined below. This specification of conditional heteroskedasticity is also employed by Andrews and by Andrews and Monahan to demonstrate the superior performance of estimators that incorporate conditional heteroskedasticity over the more traditional parametric covariance estimators.

The relative magnitude of conditional heteroskedasticity present in the model is controlled through the degree of serial correlation in the regressors and error. To capture serial correlation, the regressors and error are generated for each t = 1, ..., n (and each k = 2, ..., 5) as

$$\tilde U_t = \rho_U \tilde U_{t-1} + \varepsilon_t,$$
$$X_{kt} = \rho_X X_{k,t-1} + \eta_{kt}.$$

The underlying errors, ε_t and η_kt, are mutually independent N(0, 1) random variables.3 The serial correlation parameters (ρ_U, ρ_X) take values in the set {0, .1, .3, .5, .7, .9}.4

3 We set ε_0 and η_k0 equal to zero in the simulations and discard the first 50 observations to remove any influence from initial values.

Rothenberg derives the second-order asymptotic distribution of test statistics on coefficients of a linear regression under the assumption of conditional homoskedasticity. He finds (on page 1011) that as the correlation between the regressor under test and the scale parameter increases, the inflation of the empirical size of the test statistics increases. Therefore, we employ a second conditional heteroskedasticity specification in which the scale parameter depends on only one of the varying regressors. To isolate the impact noted by Rothenberg, we consider two values for the parameter δ, specifically δ = (0, 1, 0, 0, 0)' and δ = (0, 0, 1, 0, 0)', while testing the hypothesis that the first varying regressor is equal to zero. The first specification provides the highest level of correlation between the regressor under test and the error scale, while in the second specification the correlation between the regressor and scale is zero.5,6
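To fix ideas, the following Python sketch (our own illustrative code, not taken from the paper; the function and variable names are ours) generates one replication of model (1) with the heteroskedastic AR(1) errors and AR(1) regressors described above, including the 50-observation burn-in noted in footnote 3.

```python
import numpy as np

def simulate_model(n=50, k=5, rho_x=0.9, rho_u=0.9,
                   delta=(0, .5, .5, .5, .5), beta=None, burn=50, rng=None):
    """One replication of model (1): Y_t = X_t' beta + U_t with
    U_t = |X_t' delta| * U~_t, AR(1) regressors and an AR(1) error."""
    rng = np.random.default_rng() if rng is None else rng
    beta = np.zeros(k) if beta is None else np.asarray(beta, float)
    delta = np.asarray(delta, float)
    T = n + burn
    X = np.zeros((T, k))
    X[:, 0] = 1.0                            # first column is the constant
    u_tilde = np.zeros(T)                    # initial values set to zero
    for t in range(1, T):
        # AR(1) regressors (columns 2..k) and AR(1) homoskedastic error
        X[t, 1:] = rho_x * X[t - 1, 1:] + rng.standard_normal(k - 1)
        u_tilde[t] = rho_u * u_tilde[t - 1] + rng.standard_normal()
    X, u_tilde = X[burn:], u_tilde[burn:]    # discard the burn-in observations
    U = np.abs(X @ delta) * u_tilde          # multiplicative heteroskedasticity
    Y = X @ beta + U
    return Y, X

Y, X = simulate_model(rho_x=0.9, rho_u=0.9)
```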

We focus not only on hypothesis tests of a single coefficient, but also on tests of multiple coefficients. Our hypothesis tests of multiple coefficients are designed to assess the effect of including off-diagonal elements of the covariance matrix. To capture this effect, we consider the single restriction imposed by the hypothesis H0: β2 − β3 = 0. In the multiple coefficient tests we consider values of the heteroskedasticity parameter, δ, equal to (0, 1/2, 1/2, 1/2, 1/2)', (0, 1, 0, 0, 0)', and (0, 0, 0, 1, 0)'.

To construct test statistics for hypotheses concerning β, an estimator of the (conditional) variance of the ordinary least-squares estimator, B, is needed.

4 Data were also simulated according to a moving-average specification, under which the serial correlation is of limited duration and conditional heteroskedasticity plays a correspondingly larger role. The results were similar to those found in the autoregressive specifications and can be found in Erb and Steigerwald (2009).

5 The values of δ have been chosen to ensure that the unconditional variance of U_t is the same in all specifications. Andrews and Monahan also study this specification, albeit without reference to the findings in Rothenberg.

6 To see that this model brings conditional heteroskedasticity, consider the case in which δ = (0, 1, 0, 0, 0)'. Then E(U_t U_{t−1} | X) = E(|X_2t| Ũ_t · |X_{2,t−1}| Ũ_{t−1} | X), and the covariance conditional on X is [ρ_u / (1 − ρ_u²)] |X_2t| |X_{2,t−1}|.

The variance of the (ordinary) least-squares estimator, conditional on X = (X_1, ..., X_n)', is

$$\mathrm{Var}\left(n^{1/2}(B-\beta)\mid X\right) = \left(n^{-1}\sum_{t=1}^{n}X_tX_t'\right)^{-1}\left(n^{-1}\sum_{s=1}^{n}\sum_{t=1}^{n}E\left(U_sX_sU_tX_t'\mid X\right)\right)\left(n^{-1}\sum_{t=1}^{n}X_tX_t'\right)^{-1}.$$

The key component for estimation is J = n^{-1} Σ_{s=1}^{n} Σ_{t=1}^{n} E(U_s X_s U_t X_t' | X).

We consider three estimators of J.7 The first is the heteroskedasticity-autocorrelation consistent estimator introduced by Newey and West:

$$\hat J_{hac} = \frac{1}{n-5}\left[\sum_{t=1}^{n}\hat U_t^2 X_tX_t' + \sum_{j=1}^{m}\left(1-\frac{j}{m+1}\right)\sum_{t=j+1}^{n}\hat U_t\hat U_{t-j}\left(X_tX_{t-j}'+X_{t-j}X_t'\right)\right],$$

where {Û_t}_{t=1}^{n} is the OLS residual vector.8 The value of m determines the maximum lag length at which the conditional heteroskedasticity is estimated. As m controls the number of far lags that enter the estimator, sample-based selection of m is very important in controlling the size of test statistics. The value of the bandwidth is allowed to vary across simulations and is chosen according to the automatic selection procedure developed by Andrews.
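As an illustration of the formula above, the short Python helper below (our own sketch; the name j_hac is ours) computes Ĵ_hac for a given bandwidth m, using the Bartlett weights and the n − 5 divisor shown in the display.

```python
import numpy as np

def j_hac(X, u_hat, m, dof=5):
    """Newey-West (Bartlett kernel) estimator of J with divisor n - dof."""
    n, k = X.shape
    V = X * u_hat[:, None]                  # rows are U_t * X_t'
    J = V.T @ V                             # lag-0 term
    for j in range(1, m + 1):
        w = 1.0 - j / (m + 1.0)             # Bartlett weight
        G = V[j:].T @ V[:-j]                # sum_t V_t V_{t-j}'
        J += w * (G + G.T)
    return J / (n - dof)
```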

We also focus on a variant of the heteroskedasticity-autocorrelation consistent estimator, discussed in Andrews and Monahan. To reduce the bias in Ĵ_hac, Andrews and Monahan advocate separating the variance estimation into three steps. First, estimate the temporal correlation (at far lags) by fitting a vector autoregression to {V̂_t} (where V̂_t = Û_t X_t), which yields the prewhitened residuals {Ṽ_t}. (In our implementation, we fit a vector autoregression of order 1.) Second, construct the variance estimator with the prewhitened residuals:

$$\tilde J_{pw} = \frac{1}{n-5}\left[\sum_{t=1}^{n}\tilde V_t\tilde V_t' + \sum_{j=1}^{m}\left(1-\frac{j}{m+1}\right)\sum_{t=j+1}^{n}\left(\tilde V_t\tilde V_{t-j}'+\tilde V_{t-j}\tilde V_t'\right)\right].$$

7 Not considered here are the classic OLS variance estimator under the assumption of i.i.d. errors and the parametric, AR(1) variance estimator. Simulation results for these estimators can be found in Erb and Steigerwald (2009).

8 We use n − 5, rather than n, as the divisor because the degrees-of-freedom calculation is likely to be used when the sample size is small, as is the case in our simulations where n = 50.

The last step is to recolor the estimator J̃_pw to obtain the prewhitened variance estimator, Ĵ_pw, according to

$$\hat J_{pw} = \hat C\,\tilde J_{pw}\,\hat C', \qquad \text{where } \hat C=\left(I-\sum_{s=1}^{p}\hat A_s\right)^{-1}.$$

Here, {Â_s}_{s=1}^{p} are the estimated coefficient matrices from a pth-order vector autoregression of V̂_t.9
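The three steps can be sketched compactly in Python (again with our own helper names; this is a minimal sketch that fits the VAR(1) by least squares and omits the eigenvalue cap described in footnote 9).

```python
import numpy as np

def nw_on_series(V, m, dof=5):
    """Bartlett-weighted long-run variance of the rows of V, divisor n - dof."""
    n = V.shape[0]
    J = V.T @ V
    for j in range(1, m + 1):
        G = V[j:].T @ V[:-j]
        J += (1.0 - j / (m + 1.0)) * (G + G.T)
    return J / (n - dof)

def j_pw(X, u_hat, m, dof=5):
    """Andrews-Monahan style prewhitened estimator: VAR(1) prewhitening,
    Newey-West on the prewhitened residuals, then recoloring."""
    V = X * u_hat[:, None]                       # V_t = U_t * X_t'
    # Step 1: fit V_t = A V_{t-1} + residual by least squares (VAR of order 1).
    A, *_ = np.linalg.lstsq(V[:-1], V[1:], rcond=None)
    A = A.T                                      # so that V_t is approx. A @ V_{t-1}
    V_tilde = V[1:] - V[:-1] @ A.T               # prewhitened residuals
    # (The paper additionally caps the eigenvalues of the VAR coefficients at
    #  0.97 in absolute value, footnote 9; omitted in this sketch.)
    # Step 2: Newey-West applied to the prewhitened residuals.
    J_tilde = nw_on_series(V_tilde, m, dof)
    # Step 3: recolor.
    C = np.linalg.inv(np.eye(X.shape[1]) - A)
    return C @ J_tilde @ C.T
```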

With the reduction in correlation brought about by the first-step prewhitening, there is less need to select a large value of m.10 Although the variance of Ĵ_pw exceeds the variance of Ĵ_hac, Andrews and Monahan find that use of the prewhitened residuals reduces the (downward) bias of Ĵ_hac. The downward bias in the estimated standard errors is reduced to such an extent that, despite a loss of precision in estimating the standard errors, the coverage probabilities of confidence intervals are increased.

Constructing accurately sized tests with each of the above estimators of J remains a problem. However, the parametric prewhitening in Ĵ_pw does improve the size of test statistics relative to Ĵ_hac. Therefore, it may be the case that increasing the richness of the first step, in which temporal correlation is accounted for, lessens the importance of the second step, in which conditional heteroskedasticity is accounted for. Rather than assume a low-dimension parametric model for temporal correlation (such as a small-order autoregressive model), one could assume that the errors are generated by a conditionally homoskedastic (stationary stochastic) process with nothing further known about the autocorrelation function. Under this assumption,

$$n^{-1}\sum_{s=1}^{n}\sum_{t=1}^{n}E\left(U_sX_sU_tX_t'\mid X\right) = n^{-1}\sum_{s=1}^{n}\sum_{t=1}^{n}\sigma_{|t-s|}X_sX_t',$$

where σ_{|t−s|} = E(U_s U_t | X) depends only on |t − s|. The third estimator of J, which is consistent if the errors are conditionally homoskedastic, is

$$\hat J_{cho} = \frac{1}{n-5}\sum_{s=1}^{n}\sum_{t=1}^{n}\hat\sigma_{|t-s|}X_sX_t',$$

9 To ensure the matrix I − Σ_s Â_s is not too close to singularity, we restrict the eigenvalues of Σ_s Â_s to be no larger than 0.97 in absolute value. See Andrews and Monahan (1992, pg. 957) for the details.

10 This estimator, which includes prewhitening, retains many of the asymptotic properties of Ĵ_hac, including the rate of convergence.

where (for t > s) σ̂_{|t−s|} = (1/n) Σ_{t=s+1}^{n} Û_t Û_{t−s}. (If s > t, simply switch the values of t and s in the formula.)11
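Because every lag enters through σ̂_{|t−s|}, Ĵ_cho requires no bandwidth, kernel, or prewhitening filter. A minimal Python sketch (our own naming; the n − 5 divisor again follows the display above) is:

```python
import numpy as np

def j_cho(X, u_hat, dof=5):
    """Conditionally homoskedastic estimator: (1/(n - dof)) * sum_s sum_t
    sigma_hat_{|t-s|} X_s X_t'  =  X' Sigma_hat X / (n - dof)."""
    n = len(u_hat)
    # sample autocovariances sigma_hat_j = (1/n) * sum_{t=j+1}^n u_t u_{t-j}
    sigma = np.array([u_hat[j:] @ u_hat[:n - j] / n for j in range(n)])
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Sigma = sigma[lags]                     # (s, t) element is sigma_hat_{|s-t|}
    return X.T @ Sigma @ X / (n - dof)
```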

Under traditional asymptotics, all of the estimators of J lead to Gaussian limit distributions in testing situations. However, extensions to the asymptotic theory are available for two of the estimators. For Ĵ_hac, Kiefer and Vogelsang have developed an alternative limit theory based on the assumption that the fraction of lags that appear in the estimator, m/n, is not asymptotically negligible.12 The critical values that arise from the alternative limit theory can be considerably larger than the standard Gaussian critical values. As these critical values depend on m, we report simulation results for two sets of values of the bandwidth. The first uses the Andrews automatic bandwidth procedure to compute the bandwidth and the second sets the bandwidth equal to the sample size (m = n).13,14

11 Our finite-sample results are designed to guide researchers with moderate sample sizes, in which size inflation is known to be a problem. Although Ĵ_cho is not a consistent estimator of J under conditional heteroskedasticity, a consistent estimator is easily obtained by switching from Ĵ_cho to Ĵ_hac for large sample sizes (say n > 1,000).

12 A similar limit theory is developed by Phillips, Sun and Jin (2006).

13 Our goal is to evaluate the accuracy of the Kiefer-Vogelsang limit theory under both "large" and "small" specifications of the bandwidth parameter, and we use the Andrews method to select the "small" bandwidth. Technically, the limit theory will not hold if the Andrews method is employed, as the Andrews procedure gives a bandwidth value that is o(n) and Kiefer and Vogelsang assume that the bandwidth parameter is O(n). However, we find in our simulations that the Kiefer-Vogelsang testing procedure performs substantially better (in terms of test size) when the Andrews procedure is implemented for each replication, as opposed to using some fixed value for the bandwidth, say m/n = 0.2, for all replications. For this reason, we report the simulation results using the Andrews method.

14 Unfortunately, it is not clear whether the asymptotic theory of Kiefer and Vogelsang extends to tests employing the prewhitened HAC estimator. The Kiefer-Vogelsang critical values depend on the ratio of the bandwidth to the sample size (m/n) in the HAC estimator. Although the prewhitened variance estimator also requires a specified bandwidth in the second stage, it is typically much smaller than the bandwidth without the first-stage prewhitening (as much of the dependence has already been removed). Although the empirical size may be improved in some cases by prewhitening, the limit theory may not hold in all cases. For example, Vogelsang and Franses (2005) employ prewhitening with the alternative critical values and find (at least in one simulation specification) that "over-rejections can be a problem with moderately persistent data and prewhitening may not improve matters" (p. 15).

For Ĵ_cho, Rothenberg provides critical values based on a higher-order asymptotic refinement under strict exogeneity. If cv denotes the critical value

from the first-order Gaussian approximation, then Rothenberg's second-order theory delivers the adjusted critical value cv^R = cv(1 + n^{-1} f(X, Û)).15 His asymptotic refinements indicate that Gaussian critical values should generally be increased (as f is generally greater than zero), although the precise form of his covariance estimator differs slightly from Ĵ_cho.16 Because the adjusted critical value is a function of (X, Û), the adjusted critical value is correlated with the estimated standard error. If this correlation is negative, then the critical value adjusts to the magnitude of the standard error and lessens the length of estimated confidence intervals. Such an adjustment feature can lead to a test statistic with large gains in size at the cost of only small reductions in size-adjusted power.

3 Simulation Results

For the simulations, we construct Y_t according to (1) with β = 0. Letting c be a 5 × 1 vector of constants that selects the parameters under test, we construct the test statistic for the hypothesis H0: c'β = 0 according to

$$t = \left[c'\hat V_B c\right]^{-1/2}\sqrt{n}\,c'B,$$

where B is the OLS estimate of β and V̂_B is the 5 × 5 sample analog of Var(n^{1/2}(B − β) | X). (V̂_B is constructed for each of the variance estimators in Section 2.) The selected critical values are for a two-sided test with five percent nominal size. The sample size is n = 50 in all models to allow for direct comparison with the simulation results presented in recent papers, such as Kiefer and Vogelsang (2005) and Phillips, Sun and Jin (2006). Each experiment consists of 50,000 replications.

15 The precise form of f is detailed in the Appendix.

16 Rothenberg considers covariance estimators of the form (1/(n−j)) Σ Û_t Û_{t−j}, where j corrects for the number of observations lost due to the lag length. To ensure a positive semi-definite estimator we replace the factor 1/(n−j) with 1/(n−5) (see McLeod and Jimenez, 1984).

While there are a variety of statistics that can be used to assess the finite-sample performance of the variance estimators, we follow the convention of more recent authors and focus our attention on the finite-sample size and size-adjusted power of test statistics.

It is well known that with any hypothesis test, there is a trade-off between size and power. Because each test employs the OLS estimator as a point estimate, improvements in size will generally be accompanied by a corresponding decrease in power. However, each test varies in either the variance estimate, critical value, or both. Consequently, there is the possibility that a particular estimator may display more accuracy in terms of both improved test size and higher power against alternatives.

3.1 Single Parameter Tests

We first study H0: c'β = 0, with c = (0, 1, 0, 0, 0)', and so test whether the coefficient on the first non-constant regressor is significantly different from zero. This choice of c leads to a standard error estimate generated from only one of the diagonal elements of V̂_B. In the tables that follow, the test statistics are referenced by the covariance matrix estimator the test employs. Thus, hac denotes the t-statistic when V̂_B is constructed according to Ĵ_hac and evaluated with standard asymptotic critical values. Critical values from asymptotic refinements are indicated by superscripts, so choR is the t-statistic constructed with Ĵ_cho and evaluated with Rothenberg's second-order critical values. In similar fashion, hacKV indicates use of Ĵ_hac with the Kiefer-Vogelsang asymptotic approximation to generate test critical values and a bandwidth determined by the Andrews automatic selection procedure. As the Kiefer-Vogelsang approximation allows the bandwidth to equal the sample size, we denote this statistic as hacKVn.

Table 1 reports the finite-sample empirical size of each test statistic when the AR(1) errors are overlaid with multiplicative heteroskedasticity entering from all four non-constant regressors, i.e., δ = (0, 1/2, 1/2, 1/2, 1/2)'. The nominal size for each test is 0.05. As with previous authors, we find that the traditional hac test performs quite poorly in terms of test size, especially when the dependence in the data is strong. Indeed, when ρ_X = ρ_U = 0.9, the hac test rejects the null hypothesis 38% of the time. This is quite unsettling considering the applied researcher will be making inference based on a nominal size of 5%.

Table 1: Empirical Size - Heteroskedastic AR(1) Errors - δ = (0, 1/2, 1/2, 1/2, 1/2)'

                 (1)      (2)      (3)      (4)      (5)      (6)
ρ_X    ρ_U       hac      pw       hacKV    hacKVn   cho      choR
0.9    0.9       0.3808   0.3235   0.3171   0.2332   0.3192   0.2367
0.9    0.7       0.2834   0.2351   0.2327   0.1686   0.2371   0.1641
0.9    0.5       0.2126   0.1803   0.1766   0.1322   0.1941   0.1306
0.9    0.3       0.1655   0.1501   0.1391   0.1102   0.1668   0.1076
0.9    0.1       0.1310   0.1308   0.1088   0.0934   0.1479   0.0906
0.9    0.0       0.1213   0.1254   0.0987   0.0892   0.1403   0.0809
0.7    0.9       0.2859   0.2460   0.2393   0.1680   0.2481   0.1854
0.7    0.7       0.2352   0.2025   0.1970   0.1414   0.2087   0.1508
0.7    0.5       0.1933   0.1726   0.1645   0.1217   0.1844   0.1326
0.7    0.3       0.1554   0.1469   0.1324   0.1034   0.1591   0.1136
0.7    0.1       0.1327   0.1332   0.1108   0.0903   0.1440   0.1018
0.7    0.0       0.1209   0.1260   0.1007   0.0886   0.1345   0.0929
0.5    0.9       0.2108   0.1880   0.1745   0.1208   0.1948   0.1493
0.5    0.7       0.1836   0.1669   0.1557   0.1080   0.1711   0.1295
0.5    0.5       0.1621   0.1519   0.1385   0.1019   0.1569   0.1204
0.5    0.3       0.1407   0.1382   0.1214   0.0927   0.1434   0.1077
0.5    0.1       0.1226   0.1275   0.1054   0.0831   0.1342   0.1008
0.5    0.0       0.1159   0.1244   0.0994   0.0819   0.1285   0.0963
0.3    0.9       0.1592   0.1492   0.1355   0.0949   0.1579   0.1273
0.3    0.7       0.1473   0.1419   0.1261   0.0924   0.1471   0.1153
0.3    0.5       0.1367   0.1353   0.1194   0.0871   0.1409   0.1100
0.3    0.3       0.1239   0.1277   0.1080   0.0819   0.1281   0.0994
0.3    0.1       0.1167   0.1232   0.1024   0.0793   0.1282   0.0996
0.3    0.0       0.1142   0.1240   0.0996   0.0783   0.1237   0.0967
0.1    0.9       0.1215   0.1244   0.1046   0.0749   0.1295   0.1073
0.1    0.7       0.1187   0.1236   0.1024   0.0777   0.1285   0.1036
0.1    0.5       0.1199   0.1277   0.1047   0.0796   0.1289   0.1031
0.1    0.3       0.1138   0.1227   0.0995   0.0758   0.1239   0.0997
0.1    0.1       0.1091   0.1180   0.0957   0.0745   0.1189   0.0937
0.1    0.0       0.1094   0.1186   0.0954   0.0736   0.1195   0.0945
0.0    0.9       0.1105   0.1190   0.0948   0.0719   0.1225   0.1011
0.0    0.7       0.1087   0.1181   0.0941   0.0728   0.1225   0.0995
0.0    0.5       0.1089   0.1191   0.0949   0.0745   0.1203   0.0958
0.0    0.3       0.1069   0.1178   0.0945   0.0747   0.1181   0.0949
0.0    0.1       0.1078   0.1196   0.0947   0.0759   0.1200   0.0947
0.0    0.0       0.1059   0.1172   0.0925   0.0729   0.1190   0.0939

As the test statistics all employ the same point estimate in the numerator, improvements in test size will be achieved by either increasing the standard error in the denominator or widening the test critical values. In column 2, we see that prewhitening residuals reduces test size by inflating the estimated standard errors, although over-rejection of the null remains a problem. In the ρ_X = ρ_U = 0.9 case, the empirical size drops to 0.32, a 16% size gain. However, attempting to remove correlation at far lags by prewhitening when serial correlation in the data is weak can inflate test size even more than using the HAC estimator, which can be seen by comparing columns 1 and 2 when either ρ_X or ρ_U is less than 0.3.

Column 3 shows that use of the Kiefer-Vogelsang asymptotic refinements reduces size inflation by widening the test critical values. The new limit theory leads to critical values that typically take on (absolute) values in the range of 2.0 to 4.81, depending on the chosen bandwidth for the HAC estimator. When the bandwidth is chosen according to the Andrews procedure, the size gains offered by the Kiefer-Vogelsang asymptotics are slightly larger than those achieved by prewhitening, but are typically comparable when the serial correlation is strong.

Similar to prewhitening, we find that Ĵ_cho delivers larger standard errors that improve the finite-sample size of the resulting test statistics, and similar to the Kiefer-Vogelsang asymptotics, we find that the second-order critical value refinement of Rothenberg further improves test size by increasing the critical values. Columns 5 and 6 of Table 1 present the empirical size of test statistics when Ĵ_cho is used in conjunction with either standard normal critical values or the second-order critical values of Rothenberg, respectively. Even under misspecification, the choR test delivers more substantial size reductions than both the pw and hacKV tests. For ρ_X = ρ_U = 0.9, the empirical size of the choR test is 0.24. While the actual size remains significantly larger than the 5% nominal level, the choR test reduces size inflation by more than a third of the level of the hac test. Moreover, the improved accuracy of the choR test over the more conventional HAC tests continues to hold for very low levels of serial correlation. For ρ_X = ρ_U = 0.1, the second-order adjusted, homoskedastic estimator improves test size accuracy relative to the Newey-West estimator by about 14%, or a drop in raw size from 0.1091 to 0.0937.17

17 As the table makes clear, the conditionally homoskedastic variance estimator can only be recommended in conjunction with Rothenberg's second-order critical value adjustment when the data is heteroskedastic. While the cho test has better size than the hac test if the serial correlation is high, it rarely exhibits smaller size than the pw or hacKV tests.

Recall that for a fixed value of ρ_X, the heteroskedastic component of the

error becomes more pronounced as ρ_U decreases. That the conditionally homoskedastic estimator retains an advantage in test size in this case is quite remarkable. The favorable performance of choR is due primarily to the adaptability of the critical value to the data generating process. The second-order theory for Ĵ_cho delivers critical values that, while typically larger than their Gaussian counterparts, adjust with the regressors and residuals in such a way that the critical value increases when the estimated standard error is small and decreases when the estimated standard error is large. This negative correlation between the standard error and critical value serves as a hedge in cases where over-rejections of the null are most likely to occur. As can be seen from column 5 of Table 1, Ĵ_cho is indeed influenced by the heteroskedasticity, as the cho test performs more poorly than the hac test when ρ_U drops below 0.3. However, when the Ĵ_cho estimator is used in conjunction with the second-order critical values, the choR test retains a size advantage over the hac and pw tests for all values of ρ_X and ρ_U, and a size advantage over hacKV for all but the smallest values of ρ_X and ρ_U.18

18 Intuition would suggest that for a fixed value of ρ_U the performance of the cho and choR tests should deteriorate as ρ_X increases. However, ρ_X also adds to the overall pattern of serial correlation in the regressors as well as the errors. For this reason, the cho and choR tests show size gains over other HAC tests when ρ_U is large, even when ρ_X is large.

Table 2 reports the mean and variance of Rothenberg's second-order critical value, as well as its correlation with the estimated standard error, when ρ_X is fixed at 0.9. While the mean of the critical value remains relatively constant as ρ_U decreases, the variation increases and the negative correlation with the estimated standard error becomes more pronounced. The adaptability of the critical value allows the conditionally homoskedastic estimator to maintain size improvements over the hac, hacKV, and pw tests, even as the heteroskedasticity in the process becomes more pronounced.

Table 2: Performance of Rothenberg's second-order-adjusted critical value under heteroskedasticity

ρ_X = 0.9     (1)       (2)        (3)
ρ_U           Mean      Variance   Corr. with √(c'V̂_B c)
0.9           2.37      0.04       -0.15
0.7           2.34      0.04       -0.22
0.5           2.32      0.05       -0.27
0.3           2.32      0.05       -0.32
0.1           2.33      0.08       -0.34
0.0           2.34      0.08       -0.37

It is important to note that the asymptotic theory put forth by Kiefer and Vogelsang does not restrict the bandwidth to be small relative to the sample size in order for the testing procedure to be valid. When the bandwidth is set equal to the sample size, the considerable downward bias of the Newey-West estimator is offset by an adjusted critical value of 4.81, which is almost two and a half times the critical value of the standard normal distribution. In comparison, the average critical value for the hacKV test (as selected by the Andrews method) ranges from 2.04 when the temporal dependence is low to 2.26 when the dependence is high, and the average critical value for the choR test ranges from 2.11 when the dependence is low to 2.37 when the dependence is high. Column 4 in Table 1 shows that for more moderate levels of dependence, further reductions in test size are achieved with the Kiefer-Vogelsang critical values if the bandwidth is fixed and equal to the sample size, though the improvements are small. However, such drastic inflation of the critical value is sure to decrease the probability of rejecting the hypothesis for all values of β, and the slight size improvements of the hacKVn test over the choR test prove to be extremely costly in terms of test power.

In order to ascertain the power of the various testing procedures, we again simulate (1) and compute the size-adjusted power against alternative values of β. It is important to first note that size adjustment is not possible in practice as the underlying data generating process is typically not known to the researcher. However, the technique may be useful in developing methods as it is useful in comparing the power of tests that do not have the same finite-sample size. In our simulations, the stochastic process is known, and a size-adjusted critical value for each specification can be calculated via simulation.

Figure 1 plots the upper half of the size-adjusted power functions for the test statistics under the AR(1) specification presented in Table 1 when ρ_X = ρ_U. We compute size-adjusted power as the fraction of test rejections that arise when the true value of β2 is different from the hypothesized null value of zero. Specifically, we set β equal to a scalar multiple of (0, 1, 0, 0, 0)', with the scalar chosen from a set of eleven equally spaced points running from zero to some upper bound for which the estimated power for all tests is roughly one.19

Figure 1: Heteroskedastic AR(1) Regressors and Errors, δ = (0, 1/2, 1/2, 1/2, 1/2)', H0: β2 = 0. [Six panels plot size-adjusted power (vertical axis) against β2 (horizontal axis) for ρ_X = ρ_U = 0.9, 0.7, 0.5, 0.3, 0.1, and 0; each panel shows curves for the hac, pw-hac, hacKVn, and choR tests.]

We simulate the power of test statistics using 10,000 replications for each value of the scalar. The size-adjusted critical values are also computed via simulation methods using 50,000 replications. As each estimator gives rise to only one finite-sample distribution, the size-adjusted critical values and size-adjusted power curves for the hac and hacKV tests are equivalent (as are those for the cho and choR tests).

Returning to our discussion of the hacKVn test, the most striking feature of Figure 1 is the extent to which the power of the hacKVn test lags the power of the other tests. As an example, consider the case where β2 = 2.4 and the serial correlation parameters are equal to 0.9. The size-adjusted power of the hac test is 0.72, while the power of the choR test is 0.70. Recall the empirical sizes of the two tests were 0.38 and 0.24, respectively, indicating a 37% reduction in test size is achieved with approximately a 3% decrease in test power. As noted above, the empirical size is further improved to 0.23 when using the hacKVn test. However, the power of the test falls to 0.59, and we see that the slight additional improvement in the size of the hacKVn test comes at the cost of reducing the power by 16%. Last, the figure shows that the choR test provides size-adjusted power that is quite similar to both the hac and pw tests (with minimal crossing).

In general, the power functions for other values of ρ_X and ρ_U are similar in shape and relative performance to those presented in Figure 1.20 The HAC, prewhitened, and conditionally homoskedastic estimators give rise to tests with very similar power, while the hacKVn test exhibits power that is substantially lower across the range of alternatives.

An unfortunate feature of the hac and pw tests is that rejection depends

heavily upon user-choice parameters (including bandwidth, weighting kernel, and prewhitening model). For this reason, Andrews and others have developed data-driven, "optimal" bandwidth selection procedures in an attempt to make the estimation process more automatic.21

19 For example, if the upper bound is set at 6, then the distance between each element of the set is 6/10, and the set is {0, 0.6, 1.2, 1.8, ..., 6}.

20 Note that the range of values along the β2 axis differs for alternative values of the serial correlation parameters. All tests are showing substantial increases in power as the correlation in the data falls.

21 Researchers often prefer "automatic" bandwidth procedures as they simplify the estimation process, provide a level of uniformity across different projects, and eliminate the need to make "arbitrary" decisions about the amount of dependence in the data. However, this reasoning is not intellectually sound, as arbitrary specifications underpin all mechanical bandwidth methods. For example, the Andrews bandwidth procedure is a function of J, and therefore, a preliminary estimate of J is necessary to compute the bandwidth to be used in the estimation of Ĵ_hac. Typically, the AR(1) model is used as a preliminary estimate of J, but alternative models will lead to alternative bandwidths.

The Kiefer-Vogelsang asymptotics eliminate the need of choosing an "optimal" bandwidth by allowing the critical values to adjust with the chosen bandwidth. In this way, the choice of bandwidth is incorporated directly into the testing problem. In practice, however, the researcher must still choose a bandwidth, and different choices may give rise to very different inferences on the parameter under test.22

To illustrate the problem that bandwidth selection could cause, Table 3 shows the probability that the hacKV test rejects the null hypothesis for at least one bandwidth in a given replication. That is, for any given replication, we constructed a test statistic and critical value pair for each value of the bandwidth between m = 5 and m = n = 50.23 If the null hypothesis was rejected for one or more of the test statistic/critical value pairs, the entire trial was considered as if it rejected the null hypothesis. We then repeated the process 50,000 times and found the fraction of replications which produced at least one rejection. The probability of finding at least one bandwidth value that produces a test rejection in the hacKV test (Table 3) is much larger than the empirical size of the hacKV tests in column 3 of Table 1. In fact, the probabilities are often larger in magnitude than the sizes of the traditional hac test in column 1 of Table 1. Clearly, the failure to properly account for the pre-test estimation of the bandwidth results in further size inflation and negates the advantages of the hacKV test over the more traditional tests.

22 There is no data-dependent method of choosing an "optimal" bandwidth for the hacKV test. Phillips, Sun and Jin propose a data-dependent rule for their test that minimizes a weighted sum of type I and type II errors, which Kiefer and Vogelsang conjecture can be extended to their test. However, in place of selecting the bandwidth, the researcher is now left to choose the proper weights for the type I and II errors.

23 To ensure the estimator accurately accounts for the serial correlation, we impose a minimum bandwidth of 5. Allowing for bandwidths less than 5 further inflates the rejection probabilities.

Table 3: Probability the hacKV test rejects the null hypothesis for at least one value of the bandwidth
AR(1) regressors and errors, δ = (0, 1/2, 1/2, 1/2, 1/2)'

            ρ_X
ρ_U      0.9      0.7      0.5      0.3      0.1      0.0
0.9      0.3662   0.2788   0.2164   0.1724   0.1442   0.1336
0.7      0.2776   0.2310   0.2020   0.1654   0.1418   0.1300
0.5      0.2258   0.2030   0.1756   0.1482   0.1454   0.1256
0.3      0.1916   0.1752   0.1620   0.1448   0.1342   0.1174
0.1      0.1766   0.1542   0.1478   0.1344   0.1380   0.1284
0.0      0.1468   0.1458   0.1362   0.1250   0.1328   0.1412

"automatic" in the sense that it is data dependent and can be implementedwithout any choices by the user.While the choR test performs favorably under the speci�cation in Table 1

when compared to more traditional tests commonly used in the literature, it isalso important to evaluate the test under varying degrees of serial correlationand heteroskedasticity.24 One way to alter the degree of heteroskedasticityin the data is to adjust the impact of individual regressors on the error scale.Recall that the error term is generated according to

Ut = jX 0t�j � ~Ut:

Rather than set � = (0; 12; 12; 12; 12)0, as in the previous speci�cations, we now

set � = (0; 1; 0; 0; 0)0 and allow the heteroskedasticity to arise from scaling thehomoskedastic error term, ~Ut, by the absolute value of the �rst non-constantregressor. As we are testing the hypothesis H0 : �2 = 0, it may matterwhether the heteroskedasticity is brought about by the �rst non-constantregressor or some other regressor in the model. For this reason, we alsoreport the empirical size when the error is scaled by the absolute value of thesecond non-constant regressor, � = (0; 0; 1; 0; 0)0.Panel A of Table 4 shows the empirical size of tests when the heterogeneity

arises from the �rst non-constant regressor where, for brevity, we report only

24Not surprisingly, when the errors truly are homoskedastic, the choR test outperformsall other robust tests in terms of �nite-sample empirical size, as it exploits the homogeneityin the data. These results can be found in Erb and Steigerwald (2009).

19

Table 4: Empirical Size - Heteroskedastic AR(1) Errors

               (1)      (2)      (3)      (4)      (5)      (6)
ρ_X = ρ_U      hac      pw       hacKV    hacKVn   cho      choR

Panel A: δ = (0, 1, 0, 0, 0)'
0.9            0.4452   0.3959   0.3820   0.2781   0.4851   0.4023
0.7            0.2672   0.2313   0.2287   0.1625   0.3748   0.3090
0.5            0.1834   0.1708   0.1609   0.1142   0.3169   0.2674
0.3            0.1417   0.1440   0.1253   0.0935   0.2791   0.2399
0.1            0.1271   0.1369   0.1123   0.0839   0.2697   0.2321
0.0            0.1244   0.1335   0.1108   0.0829   0.2945   0.2312

Panel B: δ = (0, 0, 1, 0, 0)'
0.9            0.3191   0.2561   0.2566   0.1883   0.2136   0.1402
0.7            0.1925   0.1632   0.1580   0.1120   0.1168   0.0756
0.5            0.1319   0.1234   0.1104   0.0822   0.0824   0.0559
0.3            0.0994   0.1049   0.0844   0.0672   0.0670   0.0473
0.1            0.0886   0.0988   0.0761   0.0610   0.0606   0.0439
0.0            0.0866   0.0978   0.0748   0.0624   0.0607   0.0439

Panel A of Table 4 shows the empirical size of tests when the heterogeneity arises from the first non-constant regressor where, for brevity, we report only the case where ρ_U = ρ_X. While the choR test performs comparably to the other robust tests when the amount of serial correlation is high, it no longer exhibits a size advantage, and it becomes progressively disadvantaged as the values of ρ_U and ρ_X fall toward zero. However, when testing the coefficient on the second non-constant regressor, the results change dramatically. Panel B reports the empirical size of the t-tests when δ = (0, 0, 1, 0, 0)' and the heteroskedasticity arises from a regressor other than the regressor under test. In this case, the choR test offers a considerable size advantage over all other tests, even as the values of ρ_X and ρ_U approach zero. In fact, as the serial correlation parameters fall to zero, the size of the choR test falls below the 5% nominal level.

This table would appear to confirm the observation in Rothenberg that the degree of correlation between the regressor under test and the error variance has a substantial impact on inflating the empirical size of test statistics. While this is true for all tests under examination, the size distortion is especially pronounced for choR. In practice, the form in which the heteroskedasticity enters the model appears to be of considerable importance when performing tests of hypotheses.

Figures 2 and 3 plot the corresponding size-adjusted power curves for the AR(1) process with ρ_X = ρ_U when δ = (0, 1, 0, 0, 0)' and δ = (0, 0, 1, 0, 0)', respectively. Once again, we see choR performs comparably to hac and hacKV in terms of size-adjusted power, and all three outperform hacKVn by a considerable margin.

3.2 Multiple Parameter Tests

We now briefly turn our attention to testing hypotheses that involve multiple parameters. These multi-parameter tests incorporate at least one off-diagonal element of V̂_B in computing the standard error. Specifically, we examine the performance of tests under the null hypothesis H0: β2 − β3 = 0 (or H0: c'β = 0 for c = (0, 1, −1, 0, 0)') for each of the heteroskedastic specifications listed above. The results are found in Table 5, where once again we report only the results from setting ρ_U = ρ_X.

In testing the hypothesis β2 − β3 = 0, the choR test provides the best empirical size under all heteroskedasticity specifications and under all degrees of dependence, excepting the case where δ = (0, 1, 0, 0, 0)'. The size advantage of the choR tests in the first and third heteroskedasticity specifications is quite large. The empirical size of the choR test is cut by more than half the level of the traditional estimators, and it even exhibits better size than the hacKVn test. In both cases, the empirical size approaches the nominal size rather quickly as the serial correlation decreases. When the heteroskedasticity enters solely from the regressors under test, the choR test is not as disadvantaged as in previous specifications, but still cannot be given a strong recommendation.

Last, note that in the model under study, the testing problem is made easier by the introduction of the second parameter to the hypothesis. Comparing the relevant rows of Table 5 to the corresponding rows in previous tables where ρ_U = ρ_X, we see that the empirical sizes of these tests are lower than the sizes of the single parameter tests across the full range of estimators. The same is also true for values of ρ_U ≠ ρ_X (though the results are not reported here). The size-adjusted power functions for these multi-parameter tests follow the same patterns as the power functions of previous tests and can be found in Erb and Steigerwald (2009).

Figure 2: Heteroskedastic AR(1) Regressors and Errors, δ = (0, 1, 0, 0, 0)', H0: β2 = 0. [Six panels plot size-adjusted power (vertical axis) against β2 (horizontal axis) for ρ_X = ρ_U = 0.9, 0.7, 0.5, 0.3, 0.1, and 0; each panel shows curves for the hac, pw-hac, hacKVn, and choR tests.]

Figure 3: Heteroskedastic AR(1) Regressors and Errors, δ = (0, 0, 1, 0, 0)', H0: β2 = 0. [Six panels plot size-adjusted power (vertical axis) against β2 (horizontal axis) for ρ_X = ρ_U = 0.9, 0.7, 0.5, 0.3, 0.1, and 0; each panel shows curves for the hac, pw-hac, hacKVn, and choR tests.]

Table 5: Empirical Size of Multi-Parameter Tests - Heteroskedastic AR(1) Errors
H0: β2 − β3 = 0

               (1)      (2)      (3)      (4)      (5)      (6)
ρ_X = ρ_U      hac      pw       hacKV    hacKVn   cho      choR

Panel A: δ = (0, 1/2, 1/2, 1/2, 1/2)'
0.9            0.3220   0.2589   0.2560   0.1858   0.2118   0.1389
0.7            0.1930   0.1628   0.1561   0.1111   0.1157   0.0723
0.5            0.1324   0.1252   0.1109   0.0837   0.0835   0.0576
0.3            0.1016   0.1066   0.0859   0.0675   0.0668   0.0474
0.1            0.0902   0.0993   0.0771   0.0632   0.0621   0.0461
0.0            0.0840   0.0950   0.0712   0.0583   0.0593   0.0425

Panel B: δ = (0, 1, 0, 0, 0)'
0.9            0.4185   0.3653   0.3544   0.2600   0.3951   0.3091
0.7            0.2513   0.2184   0.2142   0.1514   0.2768   0.2142
0.5            0.1723   0.1620   0.1505   0.1073   0.2185   0.1739
0.3            0.1346   0.1372   0.1168   0.0882   0.1899   0.1549
0.1            0.1213   0.1307   0.1064   0.0803   0.1752   0.1465
0.0            0.1181   0.1293   0.1043   0.0793   0.1772   0.1458

Panel C: δ = (0, 0, 0, 1, 0)'
0.9            0.3209   0.2545   0.2540   0.1862   0.2133   0.1388
0.7            0.1883   0.1589   0.1519   0.1098   0.1163   0.0753
0.5            0.1356   0.1258   0.1134   0.0812   0.0837   0.0572
0.3            0.1026   0.1073   0.0883   0.0721   0.0678   0.0482
0.1            0.0885   0.0976   0.0751   0.0643   0.0624   0.0446
0.0            0.0887   0.0986   0.0765   0.0636   0.0635   0.0463

4 Empirical Application

To understand the impact of our proposed standard error estimator on applied research, we revisit Lustig and Verdelhan (2007), who use the Newey-West standard error estimator in their study of foreign currency risk premia. Lustig and Verdelhan propose that the excess returns of foreign currency markets can be partially explained as a premium on aggregate consumption growth risk in U.S. markets. The premium arises because returns from high interest rate countries tend to be low when (U.S. aggregate) consumption growth is low. Conversely, the assets of the low interest rate countries negatively covary with consumption growth and so can serve as a hedge against U.S. aggregate consumption growth risk.

To verify their proposition, Lustig and Verdelhan study a panel of countries from which they first assign each country to one of eight currency portfolios. The portfolios, which are formed by sorting on the short-term risk-free interest rate of the country, are rebalanced every period so that the countries with the lowest interest rates are in the first portfolio and countries with the highest interest rates are in the eighth portfolio. With annual data from 1953-2002 they estimate the covariances, which underpin their analysis, between excess currency returns and the growth rate of consumption. For each of the portfolios, estimates are obtained from regressions of the form

$$R_{t+1} = \alpha + \beta f_t + U_t,$$

where R_{t+1} is the annual excess return on the portfolio and f_t is the growth of consumption in the previous year. Separate regressions are run for durable and nondurable consumption.

The first row of Table 6 replicates a subset of the results found in Table 6 of Lustig and Verdelhan, which are obtained from a regression of excess returns on durable consumption growth for the 32-year period after the demise of Bretton Woods in 1971. As can be seen, the consumption risk is generally increasing in the interest rate, with a difference in consumption betas of the first and seventh portfolios of about 160 basis points. The remaining rows in Table 6 report the estimated standard errors and critical values for the six different tests of the regression coefficients. Numbers in brackets are estimated standard errors; the number beneath each bracketed entry is the critical value which can be used in the hypothesis test. Significance at the 5% level is denoted by *.
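As a usage illustration only, the sketch below shows how standard errors of the kind reported in Table 6 would be assembled from the estimator sketches above for one such regression; excess_returns and durable_growth are hypothetical stand-ins (not the actual Lustig-Verdelhan data), and the bandwidth m = 3 is an arbitrary choice made purely for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the annual series; the real data are not reproduced here.
durable_growth = rng.standard_normal(31)                       # f_t, 1972-2002
excess_returns = 0.5 + 1.3 * durable_growth + rng.standard_normal(31)

X = np.column_stack([np.ones_like(durable_growth), durable_growth])
c = np.array([0., 1.])                       # test H0: beta = 0
B, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)
u_hat = excess_returns - X @ B
n = len(u_hat)

# Standard errors of beta under two of the J estimators sketched earlier.
# (The n - 5 divisor inside those helpers was tailored to the k = 5 simulations.)
for name, J in [("hac", j_hac(X, u_hat, m=3)), ("cho", j_cho(X, u_hat))]:
    Q_inv = np.linalg.inv(X.T @ X / n)
    se = np.sqrt(c @ (Q_inv @ J @ Q_inv) @ c / n)
    print(name, B[1], se)
```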

only three portfolios exhibit risk parameters that are signi�cantly di¤erentfrom zero. However, hypothesis rejection varies depending on which stan-dard error estimator is employed. For example, the hac, hacKV , and hacKV n

tests all reject the hypothesis that � is equal to zero for the third portfolio.The hacKV and hacKV n tests reject the hypothesis even though the testcritical values have increased from 1.96 to 2.0794 and 4.813, respectively.However, the pw, cho, and choR tests fail to reject the hypothesis as the esti-mated standard errors are signi�cantly larger than the Newey-West standarderrors.For portfolios 4 and 7, the hac and hacKV tests continue to reject the null,

but the other tests show some switching in test rejection across portfolios.Moving to portfolio 4, the hacKV n tests no longer rejects the null hypothesis,but the pw and cho tests now do. Last, in the seventh portfolio, every

25

testing procedure, excepting the choR test rejects the null hypothesis at the5% level. Interestingly, the choR test fails to reject the null hypothesis forany of the eight portfolios.Although it is impossible to know exactly what causes the test rejections

to vary across portfolios �or for that matter whether the tests are properlyor improperly rejecting the null �the simulation results in the previous sec-tion suggest that heteroskedasticity may play a role. To further examine thee¤ect of heteroskedasticity, we test the null hypothesis of homoskedasticityin portfolios 2, 3, and 7 using the White heteroskedasticity test. Speci�cally,we regress the squared OLS residuals from each model on a constant, the fac-tor, ft, and the factor squared, then test the hypothesis that the coe¢ cientson the factors are jointly equal to zero.25 For portfolio 3, the �2 test statisticwith two degrees of freedom is 1.613 with a p-value of 0.446. For portfolios4 and 7, the tests statistics are 3.242 and 8.310 and the p-values are 0.198and 0.016, respectively. There appears to be little evidence of heteroskedas-ticity for portfolio 3, slight evidence of heteroskedasticity in portfolio 4, andsubstantial evidence of heteroskedasticity in portfolio 7. The results in theprevious section suggest that the cho and choR tests are likely most accuratein portfolio 3, indicating the hac, hacKV , and hacKV n tests may falsely rejectthe null. On the other hand, the cho and choR tests are most likely to over-reject the null hypothesis in portfolio 7, when heteroskedasticity is strongest.Indeed, the cho test rejects the null hypothesis in portfolio 7. However,when the conditionally homoskedastic estimator is used in conjunction withthe Rothenberg critical value, the test fails to reject the hypothesis that � isequal to zero.

5 Conclusion

The simulation results of this paper suggest that when sample sizes are small, modeling the heterogeneity of a process is secondary to accounting for dependence. We find that a conditionally homoskedastic covariance matrix estimator (when used in conjunction with Rothenberg's second-order critical value adjustment) improves test size with only a minimal loss in test power, even when the data manifest significant amounts of heteroskedasticity.

25 A heteroskedasticity pretest would undoubtedly alter test size and we do not recommend choosing among the testing procedures in this manner.

Table 6 - Estimation of Factor Betas for Eight Portfolios Sorted on Interest Rates (1971-2002)
For each test, the standard error estimate is reported in brackets with the test critical value beneath it. Significance at the 5% level is denoted by *.

Portfolio            1         2         3         4         5         6         7         8
Coef. estimate β     0.5367    0.7857    1.2881    2.0321    1.2249    1.3590    2.1827    0.8447

hac                  [0.7177]  [0.5437]  [0.5503]* [0.7371]* [0.7453]  [0.9185]  [0.7997]* [0.8608]
                     1.96      1.96      1.96      1.96      1.96      1.96      1.96      1.96
hacKV                [0.7177]  [0.5437]  [0.5503]* [0.7371]* [0.7453]  [0.9185]  [0.7997]* [0.8608]
                     2.1471    2.1471    2.1471    2.0532    2.1471    2.0532    2.0532    2.2416
pw                   [0.8108]  [0.5796]  [0.6712]  [0.7762]* [0.8125]  [0.9431]  [0.7532]* [0.8516]
                     1.96      1.96      1.96      1.96      1.96      1.96      1.96      1.96
hacKVn               [0.5759]  [0.4333]  [0.2016]* [0.4434]  [0.4511]  [0.4013]  [0.3185]* [0.4286]
                     4.813     4.813     4.813     4.813     4.813     4.813     4.813     4.813
cho                  [0.8597]  [0.7086]  [0.8178]  [0.9615]* [0.8409]  [0.9694]  [1.0425]* [1.5109]
                     1.96      1.96      1.96      1.96      1.96      1.96      1.96      1.96
choR                 [0.8597]  [0.7086]  [0.8178]  [0.9615]  [0.8409]  [0.9694]  [1.0425]  [1.5109]
                     2.2717    2.2636    2.2417    2.2678    2.3921    2.2654    2.2422    2.2476

In some specifications, the size inflation was cut by nearly 40% over the traditional HAC test. Much of this size gain can be attributed to the manner in which the second-order theory adjusts the critical value according to the degree of correlation found in the data. Strong dependence usually leads to estimated standard errors which are too small, resulting in tests which reject the null hypothesis too often. Rothenberg's second-order critical value refinements serve to dampen this effect by producing larger critical values when standard errors are small, and more moderate critical values when standard errors are large.

In addition to improved small-sample test size, the conditionally homoskedastic estimator offers improvements over traditional HAC estimators in both computational ease and testing continuity. While the second-order critical value calculation is slightly complicated, the process is completely data dependent, and requires no user choices in implementation. Conversely, traditional HAC estimators require the user to specify a weighting kernel, prewhitening filter, and a bandwidth selection procedure, which previous authors have shown can have a significant impact on the performance of the estimator.

To be sure, the conditionally homoskedastic covariance estimator does not perform the best under all heteroskedastic conditions. As noted by Rothenberg, statistical inference is especially problematic when the error variance is highly correlated with the regressor of interest. Care must be taken when heteroskedasticity is caused primarily by the regressor whose coefficient is under test. However, the adjusted critical values and estimator that we propose deliver a testing procedure with substantial gains in controlling size, at little cost in power.

6 Appendix

CALCULATION OF SECOND-ORDER CRITICAL VALUES

We are interested in testing the single restriction hypothesis H0: c'β = c'β0. Under the assumption of conditional homoskedasticity, we form the test statistic

$$t_{cho} = \frac{c'\sqrt{n}\,(B-\beta_0)}{\left[c'\left(n^{-1}X'X\right)^{-1}\hat J_{cho}\left(n^{-1}X'X\right)^{-1}c\right]^{1/2}}.$$

Using Edgeworth expansions, Rothenberg shows that the second-order adjusted critical value of this test statistic, defined as Pr(t_cho > cv^R_α) = α, can be expressed as

$$cv^{R}_{\alpha} = Z_{\alpha}\left(1 + \frac{\tfrac{1}{4}\left(1+Z_{\alpha}^{2}\right)\hat V_{W} - a\left(Z_{\alpha}^{2}-1\right) - b}{2n}\right),$$

where Z_α is the corresponding α critical value from the standard normal distribution. The formula for V̂_W comes from rewriting the test statistic t_cho as

$$t_{cho} = \frac{T_{cho}}{\left(1 + W/\sqrt{n}\right)^{1/2}}.$$

Here, T_cho is the test statistic formed using the true value of J_cho, and by definition,

$$T_{cho} = \frac{c'(B-\beta_0)}{\left[c'(X'X)^{-1}J_{cho}(X'X)^{-1}c\right]^{1/2}}$$

and

$$W = \sqrt{n}\;\frac{c'(X'X)^{-1}\left(n\hat J_{cho} - J_{cho}\right)(X'X)^{-1}c}{c'(X'X)^{-1}J_{cho}(X'X)^{-1}c}.$$

If the regression errors are known to be conditionally homoskedastic and stationary with unknown autocovariances, Rothenberg (pg. 1006) derives the specific form of W as well as its variance. The estimator of the variance of W is

$$\hat V_{W} = \frac{2\sum_{k}\left(\sum_{j} r_j\,\hat\sigma_{j+k}\right)^{2}}{\left(\sum_{k} r_k\,\hat\sigma_{k}\right)^{2}}, \qquad k = -(n-1),\ldots,0,\ldots,(n-1), \quad j = -(n-1),\ldots,0,\ldots,(n-1),$$

where σ̂_j is the jth sample autocovariance as defined in Section 2 and

$$r_j = n^{-1}\sum_{t=1}^{n-|j|} x_t x_{t+|j|}, \qquad j = -(n-1),\ldots,0,\ldots,n-1,$$

with x_t the tth element of the n × 1 vector x = nX(X'X)^{-1}c. The parameters a and b are defined as

$$a = \frac{\sum_{k} r_k\,\bar\sigma_{k}}{\sum_{k} r_k\,\hat\sigma_{k}} \qquad \text{and} \qquad b = \frac{\sum_{k} r_k\,\bar q_{kk}}{\sum_{k} r_k\,\hat\sigma_{k}}, \qquad k = -(n-1),\ldots,(n-1),$$

where σ̄_k = (n − k)^{-1} Σ_t z_t z_{t+k}, and

$$\bar q_{kk} = \operatorname{trace}\left[(X'X)^{-1}(X'X_{-k})(X'X)^{-1}\left(n\hat J_{cho}\right)\right] - 2\,\operatorname{trace}\left[(X'X)^{-1}X'\hat\Sigma X_{-k}\right].$$

In the equation for σ̄_k, z_t is defined as the tth element of the n × 1 vector

$$z = \frac{M\hat\Sigma x}{\sqrt{n^{-1}x'\hat\Sigma x}},$$

where M is the n × n matrix I_n − X(X'X)^{-1}X', and Σ̂ is the n × n estimated covariance matrix for U with (s, t) element equal to σ̂_{|s−t|}. The lagged cross-product matrices X'X_{−k} and X'Σ̂X_{−k} are formed by summing over the n − |k| common observations.
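To make the recipe concrete, the sketch below (our own Python rendering of the formulas as reconstructed above) computes σ̂_j, r_j, and V̂_W and returns the adjusted critical value; the correction terms a and b are taken as inputs to keep the sketch short, and setting σ̂_{j+k} to zero when |j + k| ≥ n is our assumption.

```python
import numpy as np
from scipy import stats

def rothenberg_cv(X, u_hat, c, a=0.0, b=0.0, alpha=0.05):
    """Second-order adjusted critical value
    cv_R = Z * (1 + (0.25*(1+Z^2)*V_W - a*(Z^2-1) - b) / (2n))."""
    n = len(u_hat)
    # sigma_hat_j, j = 0..n-1: sample autocovariances of the OLS residuals
    sigma = np.array([u_hat[j:] @ u_hat[:n - j] / n for j in range(n)])
    # x = n X (X'X)^{-1} c  and  r_j = (1/n) sum_t x_t x_{t+|j|}
    x = n * X @ np.linalg.solve(X.T @ X, c)
    r = np.array([x[j:] @ x[:n - j] / n for j in range(n)])

    def val(arr, idx):
        # arr[|idx|] if the lag is available, else 0 (our truncation assumption)
        idx = abs(idx)
        return arr[idx] if idx < n else 0.0

    # V_W = 2 * sum_k (sum_j r_j sigma_{j+k})^2 / (sum_k r_k sigma_k)^2,
    # with j and k running over -(n-1), ..., (n-1) and r_{-j} = r_j.
    denom = sum(val(r, k) * val(sigma, k) for k in range(-(n - 1), n))
    num = sum(sum(val(r, j) * val(sigma, j + k) for j in range(-(n - 1), n)) ** 2
              for k in range(-(n - 1), n))
    V_W = 2.0 * num / denom ** 2
    Z = stats.norm.ppf(1.0 - alpha / 2.0)    # two-sided Gaussian critical value
    return Z * (1.0 + (0.25 * (1.0 + Z ** 2) * V_W
                       - a * (Z ** 2 - 1.0) - b) / (2.0 * n))
```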


Bibliography

Andrews, D., 1991, "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation", Econometrica 59, 817-858.

Andrews, D. and J. Monahan, 1992, "An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator", Econometrica 60, 953-966.

Den Haan, W. and A. Levin, 1996, "A Practitioner's Guide to Robust Covariance Matrix Estimation", National Bureau of Economic Research, Technical Working Paper No. 197.

Den Haan, W. and A. Levin, 2000, "Robust Covariance Matrix Estimation with Data-Dependent VAR Prewhitening Order", National Bureau of Economic Research, Technical Working Paper No. 255.

Erb, J. and D. Steigerwald, 2009, "Accurately Sized Test Statistics with Misspecified Conditional Homoskedasticity", Working Paper, Department of Economics, University of California, Santa Barbara.

Goncalves, S. and T. Vogelsang, 2004, "Block Bootstrap Puzzles in HAC Robust Testing: The Sophistication of the Naive Bootstrap", Working Paper, Department of Economics, Cornell University.

Götze, F. and H. Künsch, 1996, "Second-Order Correctness of the Blockwise Bootstrap for Stationary Observations", Annals of Statistics 24, 1914-1933.

Hall, P. and J. Horowitz, 1996, "Bootstrap Estimators for Tests Based on Generalized Method of Moments Estimators", Econometrica 64, 891-916.

Huitema, B., J. McKean and S. McKnight, 2000, "A Double Bootstrap Method to Analyze Linear Models With Autoregressive Error Terms", Psychological Methods 5, 87-101.

Kiefer, N. and T. Vogelsang, 2005, "A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests", Econometric Theory 21, 1130-1164.

Lustig, H. and Verdelhan, A., 2007, "The Cross Section of Foreign Currency Risk Premia and Consumption Growth Risk", American Economic Review 97, 89-117.

McLeod, A. and C. Jimenez, 1984, "Nonnegative Definiteness of the Sample Autocovariance Function", American Statistician 38, 297-298.

Newey, W. and K. West, 1987, "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix", Econometrica 55, 703-708.

Phillips, P., Y. Sun and S. Jin, 2006, "Spectral Density Estimation and Robust Hypothesis Testing Using Steep Origin Kernels Without Truncation", International Economic Review 47, 837-894.

Rothenberg, T., 1988, "Approximate Power Functions for Some Robust Tests of Regression Coefficients", Econometrica 56, 997-1019.

White, H. and I. Domowitz, 1984, "Nonlinear Regression with Dependent Observations", Econometrica 52, 143-162.

Vogelsang, T. and P. Franses, 2005, "Testing for Common Deterministic Trend Slopes", Journal of Econometrics 126, 1-24.
