iv tes lecture

Upload: cathy-huang

Post on 09-Jan-2016

DESCRIPTION

Interval testing for labor economics

TRANSCRIPT

  • 1

    Instrumental Variables

    - Alternative, simpler approach
    - Less parametric

    $T_i$ = binary treatment. Homogeneous additive T.E.: $y_i = \alpha + \beta T_i + X_i\gamma + u_i$

    First stage: $T_i = X_i\pi_1 + w_i\pi_2 + v_i$

    Omitted vars. bias: $E(u_i v_i) \neq 0$

    For now, abstract from $X_i$. $T_i$ is not independent of $(y_{0i}, y_{1i})$.

    Definition: $w_i$ are I.V. if they are randomly assigned.
    - $w_i$ independent of $(y_{0i}, y_{1i})$ $\Rightarrow$ $E(w_i u_i) = 0$
    - $w_i$ correlated with $T_i$: $E(w_i T_i) \neq 0$
    - For linear model above, just need $E(w_i u_i) = 0$ (orthogonality)

    If also assume constant additive T.E., $y_{1i} = y_{0i} + \beta$, can use I.V. to identify causal ATE $= \beta$.

    In practice want:
    1. Strong 1st-stage relation between $w_i$ and $T_i$
    2. $w_i$ valid exclusion restriction (quality and validity of research design)

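    Single I.V. $w_i$ = Just Identified: $y_i = y_{0i} + \beta T_i$

    Reduced-form: $E(y_i \mid w_i) = \pi_{11} + \pi_{12} w_i$
    1st-stage: $E(T_i \mid w_i) = \pi_{21} + \pi_{22} w_i$

    $E(y_i \mid w_i) = E(y_{0i} \mid w_i) + \beta E(T_i \mid w_i) = (\mu_0 + \beta\pi_{21}) + \beta\pi_{22} w_i$, where $\mu_0 = E(y_{0i})$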

  • 2

    $\beta$ is identified unless $\pi_{22} = 0$ (no 1st-stage). Weak IV: $\pi_{22} \approx 0$.

    $\beta = \pi_{12} / \pi_{22}$, ratio of reduced-form to 1st-stage

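    Two-stage least squares (2SLS)

    $\hat\beta_{2sls} = \hat\pi_{12} / \hat\pi_{22}$, from 2 OLS regressions

    $\hat\beta_{IV} = \hat\beta_{2sls} = \hat\beta_{ils} = \frac{\hat\pi_{12}}{\hat\pi_{22}} = \frac{\sum_i (w_i - \bar w)(y_i - \bar y)}{\sum_i (w_i - \bar w)(T_i - \bar T)}$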

    2SLS:
    1. LS of $T_i$ on 1, $w_i$ $\Rightarrow$ $\hat T_i = \hat\pi_{21} + \hat\pi_{22} w_i$
    2. LS of $y_i$ on 1, $\hat T_i$

    Form IV/2SLS estimator from 2 reduced-forms.
    See anatomy of 2SLS (how IV estimate formed).
    STATA: ivreg y (T=w), robust
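
The equivalence of the three ways of forming the just-identified estimator (ratio of reduced-form to first-stage, two-step least squares, and the covariance ratio) can be checked numerically. The sketch below is only illustrative: the data-generating process, the coefficient values, and the helper names `ols` and `simulate` are assumptions made for the example, and the treatment is simulated as continuous to keep the linear model simple.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(n=50_000, beta=2.0, seed=0):
    """Hypothetical DGP: u and v share a component, so T is endogenous."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)            # instrument, randomly assigned
    common = rng.normal(size=n)       # omitted variable driving both T and y
    v = common + rng.normal(size=n)
    u = common + rng.normal(size=n)
    T = 0.5 + 1.0 * w + v             # first stage
    y = 1.0 + beta * T + u            # outcome equation
    return y, T, w

y, T, w = simulate()
const = np.ones_like(w)
Z = np.column_stack([const, w])

pi_rf = ols(Z, y)                     # reduced form: y on 1, w
pi_fs = ols(Z, T)                     # first stage:  T on 1, w
beta_ils = pi_rf[1] / pi_fs[1]        # ratio of reduced-form to first-stage

T_hat = Z @ pi_fs                     # fitted treatment
beta_2sls = ols(np.column_stack([const, T_hat]), y)[1]

beta_cov = np.cov(w, y)[0, 1] / np.cov(w, T)[0, 1]
beta_ols = ols(np.column_stack([const, T]), y)[1]

print(beta_ils, beta_2sls, beta_cov, beta_ols)
```

In this simulated design the three IV numbers coincide up to floating-point error, while OLS is pulled away from the true coefficient by the shared omitted component.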

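    If $w_i$ binary (0-1), has Wald estimator interpretation (grouped estimation):
    $w_i = 1$: $\bar y_1$, $\bar T_1$
    $w_i = 0$: $\bar y_0$, $\bar T_0$

    $\hat\beta_{IV} = \frac{\bar y_1 - \bar y_0}{\bar T_1 - \bar T_0}$. Is $\bar u_1 - \bar u_0 = 0$? Is $\bar X_1 - \bar X_0 = 0$?

    Question: How much bias does $w_i$ reduce relative to reduction in treatment variation?

    Ex.: Quarter 4 vs. Quarter 1 babies: $\bar T_1 - \bar T_0$ = 0.1 yrs of education, very small difference. Check $\bar X_1 - \bar X_0$ against 0 for mother's education.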
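
A minimal sketch of the Wald (grouped) estimator for a binary instrument follows. The function name `wald_estimate` and the eight-observation toy data are hypothetical, chosen only so the arithmetic can be followed by hand.

```python
import numpy as np

def wald_estimate(y, T, w):
    """Wald / grouped IV estimate for a binary (0-1) instrument w."""
    y, T, w = map(np.asarray, (y, T, w))
    g1, g0 = (w == 1), (w == 0)
    return (y[g1].mean() - y[g0].mean()) / (T[g1].mean() - T[g0].mean())

# Toy data (made up): group means are ybar_0 = 2.0, ybar_1 = 3.5,
# Tbar_0 = 0.25, Tbar_1 = 0.75.
w = np.array([0, 0, 0, 0, 1, 1, 1, 1])
T = np.array([0, 0, 0, 1, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 1.0, 4.0, 2.0, 4.0, 5.0, 3.0])

print(wald_estimate(y, T, w))   # (3.5 - 2.0) / (0.75 - 0.25) = 3.0
```

A small denominator, as in the quarter-of-birth example, means that any small imbalance in the numerator is scaled up by the same factor.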

  • 3

    Multiple I.V. wi = Over-Identified: dim(Ti)=1, dim(wi)=p > 1

    Observe Xi, but Ti not R.A. conditional on Xi

    $y_i = \beta T_i + X_i\gamma + u_i$
    $T_i = X_i\pi_1 + w_i\pi_2 + v_i$

    $E(u_i v_i) \neq 0$

    $T_i = \hat T_i + \hat v_i$, $\hat T_i$ = exogenous component

    $\hat v_i$ = potential endogenous component, may be correlated with $u_i$

    Definition: $w_i$ are I.V. conditional on $X_i$ if they are R.A. conditional on $X_i$: $(y_{0i}, y_{1i}) \perp w_i \mid X_i$. In linear case, need $E(w_i u_i \mid X_i) = 0$.

    2SLS
    1. OLS of $T_i$ on $X_i$, $w_i$ $\Rightarrow$ $\hat T_i = X_i\hat\pi_1 + w_i\hat\pi_2$, $\hat v_i = T_i - \hat T_i$
    2. OLS of $y_i$ on $X_i$, $\hat T_i$

    STATA: ivreg y X (T=w), robust

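    2 Alternative 2nd-stages

    i) use $w_i$ as instruments: $y_i = \beta\hat T_i + X_i\gamma + \varepsilon_i$, $E(\hat T_i \varepsilon_i) = 0$

    ii) use $\hat v_i$ as control for omitted variables (mathematically identical): $y_i = \beta T_i + X_i\gamma + \rho\hat v_i + \varepsilon_i$, $E(T_i \varepsilon_i) = 0$ $\Rightarrow$ $\hat\beta_{CF} = \hat\beta_{2sls}$
    - $\rho\hat v_i$ = selection correction, single-index control function
    - $\rho$ measures corr($u_i$, $v_i$); $\rho = 0$ $\Rightarrow$ $\hat\beta_{ols} = \hat\beta_{2sls}$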

    - Before (Heckit): $E(u_i \mid v_i > -z_i\delta) = \rho\sigma_u\lambda(z_i\delta)$
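
The claim that the control-function second stage is mathematically identical to 2SLS can be verified on simulated data. This is a sketch under assumed values: the coefficients, the seed, and the helper `ols` are illustrative and not taken from the notes.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=n)                 # observed control
w = rng.normal(size=n)                 # instrument
v = rng.normal(size=n)                 # first-stage error
u = 0.7 * v + rng.normal(size=n)       # Cov(u, v) / Var(v) = 0.7 by construction
T = 0.3 * X + 0.8 * w + v
y = 1.5 * T + 0.5 * X + u

const = np.ones(n)
Z = np.column_stack([const, X, w])
T_hat = Z @ ols(Z, T)                  # exogenous component of T
v_hat = T - T_hat                      # potentially endogenous component

# i) second stage on fitted treatment
b_2sls = ols(np.column_stack([const, T_hat, X]), y)[1]

# ii) actual treatment plus first-stage residual as a control function
cf = ols(np.column_stack([const, T, X, v_hat]), y)
b_cf, rho_hat = cf[1], cf[3]

b_ols = ols(np.column_stack([const, T, X]), y)[1]
print(b_2sls, b_cf, rho_hat, b_ols)
```

The two treatment coefficients agree to machine precision because $\hat v_i$ is orthogonal to the constant, $X_i$ and $\hat T_i$ by construction; the coefficient on $\hat v_i$ estimates the assumed Cov(u, v)/Var(v).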

  • 4

    Indirect validity tests of instruments (2SLS better than OLS?)

    OLS:
    - regress $y_i$ on $T_i$ $\Rightarrow$ $\hat\beta_{ols}$
    - regress each $X_i$ on $T_i$: how correlated are controls with $T_i$?

    2SLS:
    - 2SLS of $y_i$ on $T_i$ using $w_i$ as instrument $\Rightarrow$ $\hat\beta_{2sls}$
    - regress each $X_i$ on $w_i$: how correlated are controls with $w_i$?
    - 2SLS of each $X_i$ on $T_i$ using $w_i$ as instrument: how correlated are $X_i$ with variation in $T_i$ due to $w_i$?

    Testing whether instruments reduce association between treatment and observables. Cannot test association with unobservables.

    Also examine whether $\hat\beta_{2sls}$ is less sensitive to controlling for $X_i$ than $\hat\beta_{ols}$.

    I.V. versus Heckit:

    I.V.:
    - robust since not assuming $(u_i, v_i)$ ~ joint normal
    - inefficient since assuming all variation in $T_i$ not due to $w_i$ is endogenous; may throw away good variation
    - only 1 correction term

    Heckit:
    - biased if $(u_i, v_i)$ not joint normal
    - efficient if $f(u_i, v_i)$ correctly specified (additional ID assumption):
      i) only correct for part of $v_i$ correlated with $u_i$, since specifying their correlation;
      ii) using nonlinear transformation of $\hat v_i$, the inverse Mills ratio $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$
    - Can make inferences on selection mechanism (e.g., cream skimming, comparative advantage, absolute advantage, etc.)

    Issue with 2SLS:

    In finite samples, if 1st-stage weak, then $\hat\beta_{2sls}$ is biased toward $\hat\beta_{ols}$, and the conventional s.e.($\hat\beta_{2sls}$) is biased.
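
The finite-sample pull of 2SLS toward OLS under a weak first stage can be seen in a small Monte Carlo. Everything in this sketch is an assumption made for illustration: 20 instruments with a coefficient of 0.05 each, samples of 200, 500 replications, a true effect of 1, and no intercept because all simulated variables have mean zero.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def one_draw(rng, n=200, p=20, pi=0.05, beta=1.0):
    """One simulated sample with many weak instruments (hypothetical design)."""
    W = rng.normal(size=(n, p))
    v = rng.normal(size=n)
    u = 0.8 * v + 0.6 * rng.normal(size=n)   # endogeneity: Cov(u, v) = 0.8
    T = W @ np.full(p, pi) + v               # weak first stage
    y = beta * T + u
    T_hat = W @ ols(W, T)                    # fitted treatment from first stage
    b_2sls = (T_hat @ y) / (T_hat @ T)       # 2SLS slope without intercept
    b_ols = (T @ y) / (T @ T)                # OLS slope without intercept
    return b_ols, b_2sls

rng = np.random.default_rng(42)
draws = np.array([one_draw(rng) for _ in range(500)])
mean_ols, mean_2sls = draws.mean(axis=0)
print(mean_ols, mean_2sls)
```

In this design the average 2SLS estimate sits between the true value of 1 and the average OLS estimate, which is the overfitting pattern that first-stage results are meant to reveal.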

  • 5

    Overfitting and poor research design $\Rightarrow$ small sample bias (Should present 1st-stage results)

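    Testing Over-ID restrictions:
    1. Outcome equation residual $\varepsilon_i$ should be uncorrelated with $w_i$, $X_i$ (exogenous vars.)
       - regress $y_i$ on $X_i$, $\hat T_i$ $\Rightarrow$ $\hat\varepsilon_i$ (with $\hat\varepsilon_i = y_i - \hat\beta T_i - X_i\hat\gamma$ evaluated at actual $T_i$)
       - regress $\hat\varepsilon_i$ on $X_i$, $w_i$ $\Rightarrow$ $R^2$
       - $N \cdot R^2 \sim \chi^2(p-1)$, $p = \dim(w_i)$
       - Intuitive omnibus test of orthogonality conditions $E(w_i \varepsilon_i) = 0$ and model specification
       - Similar to test for heteroskedasticity (Lagrange Multiplier test)

    2. If $\beta$ homogeneous and additive, then each I.V. $(w_{1i}, \ldots, w_{pi})$ should identify the same $\hat\beta_{2sls}$. If different I.V. lead to different $\hat\beta$s, then not all I.V. valid (or model misspecified).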
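
The $N \cdot R^2$ test in item 1 can be sketched as follows. The function name `overid_stat`, the simulated design with three instruments, and the direct effect of 0.5 used to break the exclusion restriction are all assumptions for illustration. The residual is computed with the actual treatment, and 5.991 is the 5% critical value of a chi-square with 2 degrees of freedom.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def overid_stat(y, T, X, W):
    """N * R^2 from regressing the 2SLS residual on all exogenous variables."""
    n = len(y)
    const = np.ones(n)
    Z = np.column_stack([const, X, W])              # controls plus instruments
    T_hat = Z @ ols(Z, T)                           # first stage
    b = ols(np.column_stack([const, T_hat, X]), y)  # second stage
    resid = y - np.column_stack([const, T, X]) @ b  # residual uses actual T
    fitted = Z @ ols(Z, resid)
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(11)
n, p = 5_000, 3
X = rng.normal(size=n)
W = rng.normal(size=(n, p))
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)
T = W @ np.array([0.6, 0.5, 0.4]) + 0.3 * X + v
y_valid = 1.0 * T + 0.5 * X + u
y_invalid = y_valid + 0.5 * W[:, 0]     # first instrument violates exclusion

CRIT_95 = 5.991                         # chi-square(p - 1 = 2), 5% level
stat_valid = overid_stat(y_valid, T, X, W)
stat_invalid = overid_stat(y_invalid, T, X, W)
print(stat_valid, stat_invalid, CRIT_95)
```

With one instrument entering the outcome equation directly, the statistic lies far above the critical value. A rejection does not say which instrument is invalid, only that the restrictions do not all hold jointly.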

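    Can motivate as Minimum Distance problem
    - 1 endogenous treatment, p instruments

    Structural eqn.: $y_i = \beta T_i + u_i$, $T_i = \pi_1 w_{1i} + \pi_2 w_{2i} + \cdots + \pi_p w_{pi} + v_i$
    Reduced-form: $y_i = \delta_1 w_{1i} + \delta_2 w_{2i} + \cdots + \delta_p w_{pi} + \varepsilon_i$, with $\delta_1 = \beta\pi_1$, $\delta_2 = \beta\pi_2$, $\ldots$, $\delta_p = \beta\pi_p$
    RF parameters: $[\delta_1 \ldots \delta_p, \pi_1 \ldots \pi_p] = \theta$
    Structural parameters: $[\beta\pi_1 \ldots \beta\pi_p, \pi_1 \ldots \pi_p] = f(\beta, \pi)$

    2 steps:
    1. Estimate $\theta$ with two OLS regressions of reduced-form and first-stage.
    2. Fit reduced-form estimates to structural parameters:

    $\min F = (\hat\theta - f(\beta, \pi))' \, Var(\hat\theta)^{-1} \, (\hat\theta - f(\beta, \pi))$ = Optimal Min. Distance

    $\hat F_{OMD} \sim \chi^2(p-1)$ degrees of freedom. Use to test whether $(p-1)$ over-identifying restrictions hold.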

  • 6

    - I.V./2SLS simple and works well if additive, homogeneous T.E.s
    - Commonly used
    - Heckit may work better if there is Roy-like self selection (differential sorting based on heterogeneous T.E.s)

    Relax Homogeneous Additive T.E. Assumption Angrist, Imbens and Rubin; Wooldridge; Garen

    wi are R.A. conditional on Xi (often, just need mean independence) wi finite valued, valid exclusion restriction

    $\beta_i$ varies over i (Random Coeffs.). Can we estimate ATE $= E(\beta_i) = \bar\beta$ = avg. effect for randomly selected i?

    Ex. $\beta_i$ = return to college
    $f(\beta_i)$ = population density function
    r = marginal cost of education (e.g., interest rate)
    $\beta_i \geq r$: go to college (1)
    $\beta_i < r$: don't (0)

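    [Figure: density function of returns to college. Horizontal axis: return to college, with marks at $r$, $\bar\beta^{(0)}$, $\bar\beta$, $\bar\beta^{(1)}$; regions labeled "Choose 0" and "Choose 1".]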

    ATE $= \bar\beta$; $\beta_r$ = Marginal T.E. (effect for marginal person)
    $\bar\beta^{(1)}$ = Selected ATE = "Effect of Treatment on Treated"
    $\bar\beta^{(0)}$ = Average effect for untreated

  • 7

    $T_i = 1(\beta_i = y_{1i} - y_{0i} > r)$
    - Pure Roy model: all variation in choice due to self-selection on benefits relative to uniform cost.
    - Can't ID T.E.s without strong parametric assumptions.

    $T_i = 1(y_{1i} - y_{0i} - c_i > 0)$
    - Some variation in choice due to heterogeneity in costs ($c_i$). If costs unrelated to $\beta_i$ and omitted vars., then can ID model using cost variables as instruments.

    Binary Treatment: $T_i = 1(T_i^* > 0)$, $\beta_i = y_{1i} - y_{0i}$, $Cov(\beta_i, T_i) \neq 0$

    Neither Heckit nor IV can ID $E(y_{1i} - y_{0i})$ without strong assumptions.

    Heckit: $(u_{0i}, u_{1i}, v_i)$ ~ trivariate normal

    2SLS: $E(y_{1i} - y_{0i} \mid T_i, w_i) = E(y_{1i} - y_{0i})$
    - gain from treatment varies across i, but mean independent of $T_i$, $w_i$
    - loosely, $Cov(y_{1i} - y_{0i}, T_i) = 0$, $Cov(y_{1i} - y_{0i}, w_i) = 0$
    - otherwise, $E(\hat\beta_{2sls}) \neq \bar\beta$

    For both Heckit and I.V., semi-parametric identification "at infinity", i.e., need incredible instrumental variable.

    Continuous Treatment $T_i^*$

    $y_i = \beta_i T_i^* + X_i\gamma + u_i$
    $E(u_i T_i^*) \neq 0$: omitted vars. bias
    $E(\beta_i T_i^*) \neq 0$: selectivity bias
    Ex. $T_i^*$ = yrs. of education: $E(\beta_i T_i^*) > 0$?

  • 8

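    Rewrite: $y_i = \bar\beta T_i^* + a_i T_i^* + X_i\gamma + u_i$, $\bar\beta = E(\beta_i)$, $a_i = \beta_i - \bar\beta$

    $T_i^* = X_i\pi_1 + w_i\pi_2 + v_i$; $T_i^*$ interacts with unobserved $a_i$

    $E(\hat\beta_{2sls}) = \bar\beta$ if (Wooldridge):
    A1: $E(u_i \mid w_i) = 0$, $w_i$ = valid I.V.
    A2: $E(v_i \mid w_i) = 0$, $(w_i, v_i)$ independent, $E(v_i^2 \mid w_i) = \sigma_v^2$
    A3: $E(a_i \mid w_i, v_i) = E(a_i \mid v_i) = \rho v_i$, $w_i$ unrelated to T.E. heterogeneity conditional on $v_i$
    A4: $\pi_2 \neq 0$, valid 1st-stage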

    - can condition on $z_i = (X_i, w_i)$
    - A2 unlikely to hold if $T_i^*$ discrete (e.g., binary) unless $w_i$ is purely randomly assigned
    - A3 pretty restrictive: single control function ($v_i$) absorbs both omitted vars. and selectivity biases

    A1-A4 $\Rightarrow$ $E(a_i T_i^* \mid w_i) = E(a_i T_i^*)$ = constant $\Rightarrow$ $E(\hat\beta_{2sls}) = E(\hat\beta_{CF}) = \bar\beta$

    Alternatively, B1 = A1, B4 = A4
    B2: $E(T_i^* \mid w_i, a_i) = X_i\pi_1 + w_i\pi_2 + \theta a_i$, $E(v_i \mid w_i, a_i) = \theta a_i$
    B3: $E(a_i \mid w_i) = 0$, $E(a_i^2 \mid w_i) = \sigma_a^2$

    - Fewer restrictions on $T_i^*$ reduced-form (B2)
    - More restrictions on $(a_i, w_i)$ relation (B3)
    - Ex.: B1-B4 satisfied for binary $T_i$ when $\Pr(T_i = 1 \mid X_i, w_i, a_i)$ is linear probability model

  • 9

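    What if A1-A4 (B1-B4) violated? $E(\hat\beta_{2sls}) \neq \bar\beta$

    Augmented control function approach to random coefficients (Garen)
    - include additional control for selectivity bias
    - based on assumption $u_i$, $a_i$ linear in $T_i^*$, $w_i$

    C1: $E(u_i \mid w_i) = E(a_i \mid w_i) = 0$
    C2: $E(u_i \mid T_i^*, w_i) = \rho_T T_i^* + \rho_w w_i$
    C3: $E(a_i \mid T_i^*, w_i) = \psi_T T_i^* + \psi_w w_i$

    C1-C3 $\Rightarrow$ $E(u_i \mid T_i^*, w_i) = \rho_T v_i$

    $E(a_i \mid T_i^*, w_i) = \psi_T v_i$ $\leftarrow$ 2nd control function

    $T_i^* = \hat T_i^* + \hat v_i$: $\hat T_i^*$ = exogenous, $\hat v_i$ = potentially endogenous

    (*) $y_i = \bar\beta T_i^* + X_i\gamma + \theta_1\hat v_i + \theta_2 T_i^*\hat v_i + \varepsilon_i$

    $\theta_1\hat v_i$ = control for omitted vars. bias

    $\theta_2 T_i^*\hat v_i$ = control for selectivity bias

    $\theta_1 = \rho_T = \frac{Cov(u_i, v_i)}{Var(v_i)}$, $\theta_2 = \psi_T = \frac{Cov(a_i, v_i)}{Var(v_i)}$

    - $\theta_1$ tests for $E(u_i \mid T_i^*) = 0$, O.V.B.
    - $\theta_2$ tests for self-selection due to T.E. heterogeneity; $\theta_2 \neq 0$ $\Rightarrow$ T.E. heterogeneity and self-selection
    - Under assumptions, adding $\theta_2 T_i^*\hat v_i$ eliminates selectivity bias due to $E(a_i \mid T_i^*) \neq 0$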
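
A sketch of the augmented control function regression on simulated data follows. The design is hypothetical: jointly normal $(u_i, a_i, v_i)$ so that the linearity assumptions hold, no $X_i$, an average effect of 1, and ratios Cov(u, v)/Var(v) = 0.5 and Cov(a, v)/Var(v) = 0.4 chosen for the example.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(7)
n = 200_000
beta_bar = 1.0                          # average treatment effect (assumed)
w = rng.normal(size=n)                  # instrument
v = rng.normal(size=n)                  # first-stage error
u = 0.5 * v + rng.normal(size=n)        # omitted-variable channel
a = 0.4 * v + rng.normal(size=n)        # individual deviation from beta_bar
T = 2.0 + 1.0 * w + v                   # continuous treatment
y = (beta_bar + a) * T + u              # random-coefficient outcome

const = np.ones(n)
Z = np.column_stack([const, w])
v_hat = T - Z @ ols(Z, T)               # first-stage residual

b_ols = ols(np.column_stack([const, T]), y)[1]

# Regress y on T, v_hat, and the interaction T * v_hat
coef = ols(np.column_stack([const, T, v_hat, T * v_hat]), y)
b_cf, theta1, theta2 = coef[1], coef[2], coef[3]
print(b_ols, b_cf, theta1, theta2)
```

In this design OLS overstates the average effect, while the coefficient on the treatment in the augmented regression is close to it, and the two control-function coefficients are close to the assumed covariance ratios. Standard errors are not computed here; the notes suggest the bootstrap for that.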

  • 10

    - More general than 2SLS: 2 control functions instead of 1
    - Similar to p-score and Heckit selection correction approaches
    - No joint normality assumptions (leveraging $T_i^*$ continuous); works if C2 and C3 hold
    - Test for robustness: include $\hat v_i^2$, $T_i^*\hat v_i^2$, etc.
    - Calculate standard errors via bootstrap.

    Application: Chay and Greenstone (JPE, April 2005)
    - Air pollution and housing prices = hedonics
    - Go through this in Applied Exercise #4

    What can 2SLS Semiparametrically Identify? (AIR, JASA 96) Heterogeneous T.E.s

    - minimal assumptions on functional form and T.E. heterogeneity

    Binary Ti = (0, 1) Binary I.V. wi = (0, 1)

    Potential Treatment Status T0i if wi = 0 T1i if wi = 1

    Ti = T0i(1 - wi) + T1iwi = T0i + (T1i - T0i)wi

    Assumptions
    1. Independence (strengthened I.V. condition): $(y_{0i}, y_{1i}, T_{0i}, T_{1i}) \perp w_i$
       Ex. $w_i = 1$ $\Rightarrow$ i encouraged to take treatment, encouragement R.A.

    2. Monotonicity: $\Pr(T_i = 1 \mid w_i = 1) \geq \Pr(T_i = 1 \mid w_i = 0)$ for all i, or $\Pr(T_i = 1 \mid w_i = 1) \leq \Pr(T_i = 1 \mid w_i = 0)$ for all i

    $T_{1i} \geq T_{0i}$ for all i, i.e., $\Pr(T_{1i} \geq T_{0i}) = 1$ (or $T_{1i} \leq T_{0i}$ for all i)

  • 11

                   T0i = 0        T0i = 1
    T1i = 0        Never takers   Defiers
    T1i = 1        Compliers      Always takers

    Compliers: Pr(T1i > T0i) = 1

    Monotonicity $\Rightarrow$ No Defiers
    - use intuition on plausibility
    - sometimes not true of latent variable models (especially if $w_i$ takes on many values)

    Ex. $T_i = f(w_i)$, $f(\cdot)$ may not be monotonic

    With 1. and 2., 2SLS (IV) identifies Local Average T.E. (LATE) (Ahn and Powell type assumptions)

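    $w_i$ binary: $y_i = \alpha_y + \pi_2 w_i + u_i$, $T_i = \alpha_T + \pi_1 w_i + v_i$

    $\hat\beta_{2sls} = \hat\pi_2 / \hat\pi_1$

    $\text{plim}\,\hat\beta_{2sls} = \frac{E(y_i \mid w_i = 1) - E(y_i \mid w_i = 0)}{E(T_i \mid w_i = 1) - E(T_i \mid w_i = 0)} = E(y_{1i} - y_{0i} \mid T_{1i} - T_{0i} = 1)$

    = ATE for compliers (those whose treatment status changed by I.V. $w_i$)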

    Note: $T_i^*$ continuous, 2SLS has similar interpretation (Angrist and Imbens): $\text{plim}\,\hat\beta_{2sls}$ = weighted avg. of LATEs for each group g, with weights built from the group treatment means $E(T_g^*)$
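
The LATE result can be illustrated with a simulation in which never takers, compliers and always takers have different average gains. The type shares (0.3, 0.5, 0.2), the gains (1, 2, 5) and the sample size are hypothetical values chosen so that the complier effect differs visibly from the population ATE; there are no defiers, so monotonicity holds by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# Types: 0 = never taker, 1 = complier, 2 = always taker (no defiers)
types = rng.choice(3, size=n, p=[0.3, 0.5, 0.2])
w = rng.integers(0, 2, size=n)            # binary instrument, randomly assigned

T0 = (types == 2).astype(int)             # treatment status if w = 0
T1 = (types >= 1).astype(int)             # treatment status if w = 1
T = T0 + (T1 - T0) * w                    # observed treatment

# Heterogeneous gains y1 - y0 that differ by type (assumed values)
gain_by_type = np.array([1.0, 2.0, 5.0])
gain = gain_by_type[types] + 0.1 * rng.normal(size=n)
y0 = rng.normal(size=n) + 1.0 * (types == 2)   # baseline outcome differs by type
y = y0 + gain * T

wald = (y[w == 1].mean() - y[w == 0].mean()) / (T[w == 1].mean() - T[w == 0].mean())
late = gain[types == 1].mean()            # average gain among compliers
ate = gain.mean()                         # average gain in the population
sate = gain[T == 1].mean()                # average gain among the treated
print(wald, late, ate, sate)
```

In this design the Wald/IV estimate tracks the complier average and differs from both the population ATE and the average effect among the treated.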

  • 12

    ATE = $E(y_{1i} - y_{0i})$ = avg. effect for population
    SATE = $E(y_{1i} - y_{0i} \mid T_i = 1)$ = avg. effect among treated = "Effect of unionism on unionized"
    LATE = $E(y_{1i} - y_{0i} \mid T_{1i} - T_{0i} = 1)$ = avg. effect among compliers

    SATE and LATE are avg. effects among non-random subpopulations
    - not clear which are the interesting policy parameters
    - LATE interesting if $w_i$ can be linked to a clear policy ($w_i$ = regulation vs. $w_i$ = QOB)

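    Latent Vars. Interpretation of 2SLS
    - Abstract from $X_i$

    $y_{0i} = \mu_0 + u_{0i}$
    $y_{1i} = \mu_1 + u_{1i}$
    $T_i = 1(\pi + \pi_2 w_i + v_i > 0)$, $w_i = (0, 1)$

    $T_{0i} = 1$ if $\pi + v_i > 0$, 0 otherwise
    $T_{1i} = 1$ if $\pi + \pi_2 + v_i > 0$, 0 otherwise

    I.V. condition: $(u_{0i}, u_{1i}, v_i) \perp w_i$
    Monotonicity: $\pi_2 \geq 0$ ($\pi_2 < 0$); Ex. linearity, homogeneity

    Compliers: $T_{1i} - T_{0i} = 1$ $\Leftrightarrow$ $-\pi - \pi_2 < v_i \leq -\pi$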

  • 13

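    Also need $\pi \to -\infty$, $\pi + \pi_2 \to \infty$ to ID $(\mu_1 - \mu_0)$:
    $w_i = 1$ $\Rightarrow$ $T_i = 1$, $w_i = 0$ $\Rightarrow$ $T_i = 0$; treatment status R.A., everyone a complier

    Now: $w_i = (-1, 0, 1)$, focus on $w_i = (-1, 1)$

    Complier condition: $-\pi - \pi_2 < v_i \leq -\pi + \pi_2$; $\pi_2 \to \infty$ $\Rightarrow$ $\text{plim}\,\hat\beta_{2sls} = \mu_1 - \mu_0$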

    Roy Model: $T_i = 1(\pi' + \pi_2 w_i + y_{1i} - y_{0i} > 0)$, $\pi'$ = uniform cost of participating
    - $\hat\beta_{2sls}$ provides little information on $(\mu_1 - \mu_0)$

    Latent Var. Model: $v_i = u_{1i} - u_{0i}$, $\pi = \pi' + \mu_1 - \mu_0$

    Complier condition: $-\pi' - \pi_2 < y_{1i} - y_{0i} \leq -\pi'$. Higher potential gain $\Rightarrow$ more likely to participate

    $\pi_2 \to 0$ $\Rightarrow$ $\text{plim}\,\hat\beta_{2sls} = E(y_{1i} - y_{0i} \mid y_{1i} - y_{0i} = -\pi')$

    $\hat\beta_{2sls}$ estimates cost of participating (useless)

    Cannot identify Roy Model without joint normality assumption or panel data (and stationarity assumption)?

    SATE = ATE and I.V. estimate of $(\mu_1 - \mu_0)$ consistent only if $E(u_{1i} - u_{0i} \mid T_i = 1, X_i, w_i) = 0$

  • 14

    Semiparametric selection corrections and Identification at Infinity
    1. $(u_{0i}, u_{1i}, v_i) \perp w_i$
    2. Index-sufficiency and sufficient variation in $w_i$ [traces out Pr(selection)]

    $w_i$ such that $\Pr(T_i = 1 \mid w_i, X_i) \to 1$ $\Rightarrow$ $E(u_{1i} \mid T_i = 1, w_i, X_i) \to 0$; $w_i$ infinite-valued

    Can derive semi-parametric selection correction estimate of ATE and SATE without functional form assumption on (u0i, u1i, vi) joint distribution

    Identical Issue in I.V./2SLS

    With just independence and monotonicity I.V./2SLS can point identify LATE, but can say nothing about ATE.

    What can latent variable selection correction approach identify?
    - Vytlacil, Econometrica: under independence and monotonicity, latent variable approach can derive bounds on ATE.
    - Does not derive a point estimator of anything, though.

    Returns to Education example
    - How to interpret the fact that IV estimates often larger than OLS, and twins estimate little different from OLS?
    - Go back to human capital model and allow for heterogeneity.
    - What are implications for role of benefits and costs in educational attainment?

    Put graphs of education production functions and cost functions here
    - Two types of individuals
    - Ability, marginal returns and marginal costs for person i ($a_i$, $b_i$, $r_i$); person j ($a_j$, $b_j$, $r_j$)

  • 15

    What about continuous IV, $w_i$?

    1. Local Instrumental Variables (Heckman and Vytlacil, 2000)
       - local Wald estimation of T.E.s at different values of $w_i$
       - $E(y_i \mid w_i)$, $E(T_i \mid w_i)$: take ratio at different values of $w_i$
       - Bandwidth choice

    2. Manning, BE Journals, 2004 (very clear), binary $T_i$
       - binary treatment and heterogeneous T.E.s
       - heterogeneous T.E.s $\Rightarrow$ nonlinear relation between mean of outcome conditional on I.V. and mean of treatment conditional on I.V.
       - linear I.V. (2SLS) estimator is misspecification of functional form
       - therefore, will depend on I.V. used due to misspecification
       - estimate correct functional form, then T.E.s independent of I.V. used
       - in practice, need rich data to identify T.E.s without strong distributional assumptions; e.g., a lot of variation in $E(T_i \mid w_i)$ in the data (identification at infinity). Otherwise, must extrapolate.
       - o.w., settle for estimates of T.E.s that are instrument-dependent

    3. Blundell and Powell
       - control function approaches when the outcome equation is a nonlinear transformation of some function (e.g., binary response)
       - 2SLS invalid for nonlinear models