
    Chapter 5 Properties of a Random Sample

    5.1 Population and sample (CB 5.1)

Suppose we are interested in the distribution or certain features of a collection of data. We call this collection a population.

Suppose for some reason these data are not well documented or easily accessible, and the distribution or features that we are interested in cannot be readily computed. A simple example of this is household incomes. If we wish to know the true average U.S. household income for this month, then we have a very big task on hand because we have to gather the information from hundreds of millions of families.

A solution to this is to draw a sample from the population, in other words to select a subset from the population, and use the sample information to make inference about the truth. How best to do this and how to handle sampling variability are among the most important issues in statistics.

The population features that might be of interest include: the shape of the distribution (is it symmetric or skewed, does it have one single peak or multiple peaks, etc.), whether a standard distribution (normal, gamma, Weibull, Poisson, etc.) could serve as a reasonable approximation, and the mean, variance, percentiles, etc.

Any number which can be computed from the population is called a parameter. Common parameters of interest are the mean, variance, percentiles, and mode (most probable value).

A statistic is any number calculated from the sample data. Suppose the sample data are $X_1, \dots, X_n$. Examples of statistics are the sample mean $\bar X = n^{-1}\sum_{i=1}^n X_i$, the sample variance $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar X)^2$, sample percentiles, and the sample range (= sample maximum $-$ sample minimum).

Consider the experiment of drawing at random a sample from a population. Before the sample is drawn, we can think of the sample values to be observed as random variables. In that sense we can also think of any statistic computed from these values as a random variable and speak about its distribution, called the sampling distribution. After the sample is drawn, we see the value of the statistic and there is no distribution to speak of.

Typically we will assume that the population size is much bigger than the sample size and that the sample observations are drawn from the population independently of one another under very similar sampling conditions. As such, the random variables in the sample will be approximately independent and have very similar distributions.

We say that a collection of rvs $X_1, \dots, X_n$ form a random sample if they are iid. We will for the most part assume that this is the case. In practice, this is of course often violated. The iid theory is nevertheless relevant since the iid model can be used as a fundamental building block for complicated models of dependence.

    5.2 Basic tools (CB 5.1)

In this section we review some basic tools that are useful for studying the distributional properties of statistics.

Assume that $X_1, \dots, X_n$ are iid.

(a) If $E(X_1) = \mu$ then $E(\bar X) = \mu$.

(b) If $\mathrm{Var}(X_1) = \sigma^2$ then $\mathrm{Var}(\bar X) = \sigma^2/n$.

Proof. In general, given rvs $X_1, \dots, X_n$ with means $\mu_1, \dots, \mu_n$,
$$\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = E\Big[\sum_{i=1}^n (X_i - \mu_i)\Big]^2 = \sum_{i=1}^n \sum_{j=1}^n E(X_i - \mu_i)(X_j - \mu_j) = \sum_{i=1}^n \mathrm{Var}(X_i) + \sum_{1 \le i \ne j \le n} \mathrm{Cov}(X_i, X_j).$$
In the iid case the covariance terms vanish, so $\mathrm{Var}(\bar X) = n^{-2}\cdot n\sigma^2 = \sigma^2/n$.

(c) If $E(X_1) = \mu$ and $\mathrm{Var}(X_1) = \sigma^2$ then $E(S^2) = \sigma^2$.

Proof.
$$E S^2 = \frac{1}{n-1}\, E \sum_{i=1}^n (X_i - \bar X)^2 = \frac{1}{n-1}\, E\Big[\sum_{i=1}^n X_i^2 - n \bar X^2\Big] = \frac{n}{n-1}\Big[\sigma^2 + \mu^2 - \frac{\sigma^2}{n} - \mu^2\Big] = \sigma^2.$$

(d) If $X_1$ has mgf $M(t)$ then $\bar X$ has mgf $M^n(t/n)$.

(e) If $X_1$ has cdf $F$ then $\max_{1 \le i \le n} X_i$ has cdf $F^n$.

(Facts (a), (b), and (e) are checked numerically in the sketch below.)
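The following sketch is an addition to these notes: a quick Monte Carlo check of facts (a), (b), and (e). The exponential population, its rate, the sample size $n$, and the replication count are all arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, lam = 10, 100_000, 2.0                # Exp(rate 2): mu = 1/2, sigma^2 = 1/4
samples = rng.exponential(scale=1 / lam, size=(reps, n))

xbar = samples.mean(axis=1)
print(xbar.mean(), 1 / lam)                    # (a): E(Xbar) ~ mu
print(xbar.var(), (1 / lam**2) / n)            # (b): Var(Xbar) ~ sigma^2 / n

# (e): P(max X_i <= x) should equal F(x)^n with F(x) = 1 - exp(-lam * x)
x = 1.0
print((samples.max(axis=1) <= x).mean(), (1 - np.exp(-lam * x)) ** n)
```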

If $X, Y$ have a joint pdf $f_{X,Y}$ then the joint pdf of $X$ and $Z = X + Y$ is
$$f_{X,Z}(x, z) = f_{X,Y}(x, z - x)$$
and hence the pdf of $Z$ is
$$f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\,dx.$$
If $X, Y$ are independent then
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\,dx,$$
which is called the convolution of $f_X$ and $f_Y$.
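As a numerical illustration (an addition to the notes; the uniform(0,1) choice is purely for convenience), the convolution integral can be discretized with a Riemann sum. For two independent uniform(0,1) rvs the sum has the triangular pdf $1 - |z - 1|$ on $(0, 2)$.

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
fX = np.ones_like(x)                  # uniform(0,1) pdf sampled on a grid
fZ = np.convolve(fX, fX) * dx         # Riemann-sum approximation of the convolution
for zt in (0.5, 1.0, 1.5):
    print(zt, fZ[int(zt / dx)], 1 - abs(zt - 1))   # numeric vs exact triangular pdf
```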

The mgf of a random vector $(X_1, \dots, X_n)$ is defined by
$$M(t_1, \dots, t_n) = E\, e^{\sum_{i=1}^n t_i X_i}.$$


As in the one-variable case, mgfs are unique. Let $(X, Y)$ be bivariate normal with pdf
$$f(x,y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\Big\{ -\frac{1}{2(1-\rho^2)} \Big[ \Big(\frac{x-\mu_x}{\sigma_x}\Big)^2 + \Big(\frac{y-\mu_y}{\sigma_y}\Big)^2 - 2\rho \Big(\frac{x-\mu_x}{\sigma_x}\Big)\Big(\frac{y-\mu_y}{\sigma_y}\Big) \Big] \Big\}.$$
Then
$$M(t_1, t_2) = \exp\Big\{ \mu_x t_1 + \mu_y t_2 + \tfrac{1}{2}\big( \sigma_x^2 t_1^2 + 2\rho\sigma_x\sigma_y t_1 t_2 + \sigma_y^2 t_2^2 \big) \Big\}.$$

    5.3 Sampling from the normal distribution (CB 5.3)

Theorem 5.3.1. Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then

(a) $\bar X$ and $S^2$ are independent rvs,

(b) $\bar X \sim N(\mu, \sigma^2/n)$,

(c) $(n-1)S^2/\sigma^2 \sim \chi^2$ with $n-1$ degrees of freedom. (Recall that $\chi^2(p) = $ gamma$(p/2, 2)$.)

Proof. (b) is obvious so we focus on (a) and (c). We assume without loss of generality that $\mu = 0$, $\sigma^2 = 1$. We first prove (a). Since $\sum_{i=1}^n (X_i - \bar X) = 0$, we have $X_1 - \bar X = -\sum_{i=2}^n (X_i - \bar X)$, and we can write
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2 = \frac{1}{n-1}\Big[ (X_1 - \bar X)^2 + \sum_{i=2}^n (X_i - \bar X)^2 \Big] = \frac{1}{n-1}\Big[ \Big(\sum_{i=2}^n (X_i - \bar X)\Big)^2 + \sum_{i=2}^n (X_i - \bar X)^2 \Big],$$
which shows that $S^2$ is a function of $X_2 - \bar X, \dots, X_n - \bar X$. If we can show that these rvs are jointly independent of $\bar X$ then we are done. So this is what we do now. Consider the transformation

$$U_1 = \bar X, \qquad U_j = X_j - \bar X, \quad 2 \le j \le n.$$


The transformation is one-to-one from $\mathbb{R}^n$ to $\mathbb{R}^n$. The inverse transformation is

$$X_1 = U_1 - \sum_{i=2}^n U_i, \qquad X_j = U_j + U_1, \quad 2 \le j \le n,$$
and the Jacobian is equal to $n$. Thus,

$$f_{\mathbf U}(\mathbf u) = \frac{n}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\left(u_1 - \sum_{i=2}^n u_i\right)^2}\, e^{-\frac{1}{2}\sum_{i=2}^n (u_i + u_1)^2} = \frac{n}{(2\pi)^{n/2}}\, e^{-\frac{n}{2}u_1^2}\, e^{-\frac{1}{2}\left[\left(\sum_{i=2}^n u_i\right)^2 + \sum_{i=2}^n u_i^2\right]}.$$
The joint pdf factors into a function of $u_1$ alone times a function of $(u_2, \dots, u_n)$, so $\bar X = U_1$ is independent of $(U_2, \dots, U_n)$, and hence of $S^2$. Thus, (a) is proved. We now prove (c). Let

$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i, \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2.$$

Write
$$(n-1)S_n^2 = \sum_{i=1}^{n-1} (X_i - \bar X_{n-1} + \bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2$$
$$= \sum_{i=1}^{n-1} (X_i - \bar X_{n-1})^2 + 2\sum_{i=1}^{n-1} (X_i - \bar X_{n-1})(\bar X_{n-1} - \bar X_n) + (n-1)(\bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2,$$
where the cross term vanishes since $\sum_{i=1}^{n-1} (X_i - \bar X_{n-1}) = 0$. Note that
$$(n-1)(\bar X_{n-1} - \bar X_n)^2 = \frac{n-1}{n^2}\Big( \frac{n}{n-1}\sum_{i=1}^{n-1} X_i - \sum_{i=1}^n X_i \Big)^2 = \frac{n-1}{n^2}\Big( \frac{1}{n-1}\sum_{i=1}^{n-1} X_i - X_n \Big)^2 = \frac{n-1}{n^2}\,(X_n - \bar X_{n-1})^2$$
and
$$(X_n - \bar X_n)^2 = \Big( X_n - \frac{1}{n}\sum_{i=1}^n X_i \Big)^2 = \Big( \frac{n-1}{n}\, X_n - \frac{n-1}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n-1} X_i \Big)^2 = \Big(\frac{n-1}{n}\Big)^2 (X_n - \bar X_{n-1})^2.$$
Thus,
$$(n-1)S_n^2 = (n-2)S_{n-1}^2 + \frac{n-1}{n^2}(X_n - \bar X_{n-1})^2 + \Big(\frac{n-1}{n}\Big)^2 (X_n - \bar X_{n-1})^2 = (n-2)S_{n-1}^2 + \frac{n-1}{n}(X_n - \bar X_{n-1})^2.$$

If $n = 2$ then
$$(2-1)S_2^2 = \frac{1}{2}(X_2 - X_1)^2 \sim \chi_1^2$$
since $\frac{1}{\sqrt{2}}(X_2 - X_1) \sim N(0, 1)$.

Now suppose that $(k-1)S_k^2 \sim \chi_{k-1}^2$ (induction assumption); we show that $k S_{k+1}^2 \sim \chi_k^2$. By the identity above,
$$k S_{k+1}^2 = (k-1)S_k^2 + \frac{k}{k+1}(X_{k+1} - \bar X_k)^2.$$
Since $S_k^2$ is independent of $\bar X_k$ and of $X_{k+1}$, the two summands are independent. The first term is $\chi_{k-1}^2$ by assumption and the second term is $\chi_1^2$ since $\sqrt{k/(k+1)}\,(X_{k+1} - \bar X_k) \sim N(0, 1)$. Hence $k S_{k+1}^2 \sim \chi_k^2$ and (c) is proved by induction.
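A simulation sketch of Theorem 5.3.1 (an addition to the notes; the values of $\mu$, $\sigma$, $n$, and the replication count are arbitrary): $\bar X$ and $S^2$ should be uncorrelated, and the quantiles of $(n-1)S^2/\sigma^2$ should match those of $\chi^2_{n-1}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 3.0, 2.0, 8, 50_000
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

print(np.corrcoef(xbar, s2)[0, 1])           # near 0, consistent with (a)
q = (n - 1) * s2 / sigma**2
print(np.quantile(q, [0.5, 0.9]))            # empirical quantiles of (n-1)S^2/sigma^2
print(stats.chi2.ppf([0.5, 0.9], df=n - 1))  # chi^2_{n-1} quantiles, per (c)
```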


Definition. The Student's t distribution with $p$ degrees of freedom, where $p$ is any positive integer, has the pdf
$$f(x) = \frac{\Gamma\big(\frac{p+1}{2}\big)}{\Gamma\big(\frac{p}{2}\big)}\, \frac{1}{(p\pi)^{1/2}}\, \frac{1}{(1 + x^2/p)^{(p+1)/2}}.$$

The F distribution with $p, q$ degrees of freedom, where $p, q$ are any positive integers, has the pdf
$$f(x) = \frac{\Gamma\big(\frac{p+q}{2}\big)}{\Gamma\big(\frac{p}{2}\big)\Gamma\big(\frac{q}{2}\big)} \Big(\frac{p}{q}\Big)^{p/2} \frac{x^{p/2-1}}{[1 + (p/q)x]^{(p+q)/2}}, \quad x > 0.$$

Theorem 5.3.2. Let $X, Y$ be independent rvs, $X \sim N(0,1)$ and $Y \sim \chi_p^2$. Then $X/\sqrt{Y/p} \sim t_p$.

Proof. Let
$$U = X/\sqrt{Y/p}, \qquad V = Y.$$
The transformation is one-to-one with inverse transformation
$$X = U\sqrt{V/p}, \qquad Y = V$$
and Jacobian $\sqrt{v/p}$. Hence
$$f_{U,V}(u,v) = f_X(u\sqrt{v/p})\, f_Y(v)\, \sqrt{v/p} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2 v/p}\, \frac{1}{\Gamma(p/2)\,2^{p/2}}\, v^{p/2-1} e^{-v/2}\, \sqrt{v/p}.$$
Thus,
$$f_U(u) = \int_0^\infty f_{U,V}(u,v)\,dv = \frac{1}{\sqrt{2\pi p}}\, \frac{1}{\Gamma(p/2)\,2^{p/2}} \int_0^\infty e^{-\frac{1}{2}(1 + u^2/p)v}\, v^{(p+1)/2 - 1}\,dv = \frac{1}{\sqrt{2\pi p}}\, \frac{1}{\Gamma(p/2)\,2^{p/2}}\, \frac{\Gamma((p+1)/2)}{[\frac{1}{2}(1 + u^2/p)]^{(p+1)/2}}.$$


Corollary 5.3.3. Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then
$$\frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}.$$
Proof. By Theorem 5.3.1, $\bar X$ and $S^2$ are independent and
$$\frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2.$$
Hence it follows from Theorem 5.3.2 that
$$\frac{\bar X - \mu}{S/\sqrt{n}} = \frac{(\bar X - \mu)/(\sigma/\sqrt{n})}{\sqrt{[(n-1)S^2/\sigma^2]/(n-1)}} \sim t_{n-1}.$$

Theorem 5.3.4. Let $X, Y$ be independent rvs, $X \sim \chi_p^2$ and $Y \sim \chi_q^2$. Then
$$\frac{X/p}{Y/q} \sim F_{p,q}.$$
Proof. Let $U = \frac{X/p}{Y/q}$, $V = X$. Find the joint pdf of $U, V$ and integrate $v$ out.

Corollary 5.3.5. Let $X_1, \dots, X_n$ be a random sample from $N(\mu_X, \sigma_X^2)$ and $Y_1, \dots, Y_m$ be a random sample from $N(\mu_Y, \sigma_Y^2)$, where the two random samples are independent. Let $S_X^2$ and $S_Y^2$ be the sample variances of the two random samples. Then
$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{n-1,\,m-1}.$$

Proof. By Theorem 5.3.1,
$$\frac{(n-1)S_X^2}{\sigma_X^2} \sim \chi_{n-1}^2, \qquad \frac{(m-1)S_Y^2}{\sigma_Y^2} \sim \chi_{m-1}^2.$$
It follows from Theorem 5.3.4 that
$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} = \frac{[(n-1)S_X^2/\sigma_X^2]/(n-1)}{[(m-1)S_Y^2/\sigma_Y^2]/(m-1)} \sim F_{n-1,\,m-1}.$$


    5.4 Order statistics (CB 5.4)

Definition. The order statistics of a sample $X_1, \dots, X_n$ are the sample values placed in ascending order. The $i$-th order statistic is denoted by $X_{(i)}$.

One can define a variety of statistics using the order statistics. The sample median is
$$M = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd,} \\ (X_{(n/2)} + X_{(n/2+1)})/2 & \text{if } n \text{ is even.} \end{cases}$$

The median is a measure of central tendency which is robust against outliers. More generally, the sample $(100p)$-th percentile is equal to $X_{(\{np\})}$ if $\frac{1}{2n} < p < .5$ and $X_{(n+1-\{n(1-p)\})}$ if $.5 < p < 1 - \frac{1}{2n}$, where $\{b\}$ denotes the integer nearest to $b$. The 25-th percentile is called the first sample quartile, and the 75-th percentile is called the third sample quartile. Sample percentiles are estimates of the population percentiles.

Theorem 5.4.1. Let $X_1, \dots, X_n$ be a random sample from a distribution with cdf $F$. Then
$$P(X_{(j)} \le x) = \sum_{k=j}^n \binom{n}{k} F^k(x)(1 - F(x))^{n-k} \quad \text{for all } x.$$

Proof. Observe that the $j$-th smallest value is $\le x$ if and only if the total number of observations $\le x$ is at least $j$. The number of observations $\le x$ is binomial$(n, F(x))$, so the latter probability is
$$\sum_{k=j}^n \binom{n}{k} F^k(x)(1 - F(x))^{n-k}.$$

As an application,
$$P\big(\max_{1 \le i \le n} X_i \le x\big) = P(X_{(n)} \le x) = F^n(x),$$
and
$$P\big(\min_{1 \le i \le n} X_i > x\big) = P(X_{(1)} > x) = (1 - F(x))^n.$$
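Since the event $\{X_{(j)} \le x\}$ is a binomial tail event, Theorem 5.4.1 can be checked directly against simulation. The sketch below is an addition to the notes; the uniform(0,1) population and the values of $n$, $j$, $x$ are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, j, x0, reps = 10, 3, 0.25, 100_000
u = np.sort(rng.uniform(size=(reps, n)), axis=1)
print((u[:, j - 1] <= x0).mean())     # empirical P(X_(j) <= x0)
print(stats.binom.sf(j - 1, n, x0))   # sum_{k>=j} C(n,k) F^k (1-F)^{n-k}, F(x0)=x0
```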


Another application of this is that if the population distribution is discrete and $P(X_1 = x) > 0$ then, writing $F(x-) = P(X_1 < x)$,
$$P(X_{(j)} = x) = P(X_{(j)} \le x) - P(X_{(j)} < x) = \sum_{k=j}^n \binom{n}{k} \big[ F^k(x)(1 - F(x))^{n-k} - F^k(x-)(1 - F(x-))^{n-k} \big].$$

If the distribution of $X_i$ is continuous then
$$P(X_i = X_j) = 0 \quad \text{for } i \ne j,$$
and hence the probability of having ties in the order statistics is 0.

Theorem 5.4.2. Let $X_1, \dots, X_n$ be a random sample from a distribution with pdf $f$ and cdf $F$. Then

(a) $f_{X_{(1)},\dots,X_{(n)}}(x_1, \dots, x_n) = n!\, f(x_1) \cdots f(x_n)$, for $x_1 < x_2 < \cdots < x_n$,

(b) $f_{X_{(j)}}(x) = \frac{n!}{(j-1)!(n-j)!}\, f(x)\, F^{j-1}(x)(1 - F(x))^{n-j}$, for $1 \le j \le n$,

(c) $f_{X_{(i)},X_{(j)}}(u, v) = \frac{n!}{(i-1)!(j-i-1)!(n-j)!}\, f(u)\, F^{i-1}(u)\, (F(v) - F(u))^{j-i-1}\, (1 - F(v))^{n-j}$, for $u < v$, $1 \le i < j \le n$.

Proof.

(a) For $x_1 < x_2 < \cdots < x_n$,
$$f_{X_{(1)},\dots,X_{(n)}}(x_1, \dots, x_n) = \lim_{\epsilon_i \downarrow 0,\, 1 \le i \le n} \frac{P(x_1 - \epsilon_1 < X_{(1)} \le x_1 + \epsilon_1, \dots, x_n - \epsilon_n < X_{(n)} \le x_n + \epsilon_n)}{(2\epsilon_1)(2\epsilon_2)\cdots(2\epsilon_n)}.$$
Observe that for small $\epsilon_1, \dots, \epsilon_n > 0$ the intervals $(x_1 - \epsilon_1, x_1 + \epsilon_1], \dots, (x_n - \epsilon_n, x_n + \epsilon_n]$ are disjoint.


It is clear (exercise) that
$$\lim_{\epsilon \downarrow 0} \frac{B(\epsilon)}{\epsilon} = 0.$$
The two combined give
$$\lim_{\epsilon \downarrow 0} \frac{P(x < X_{(j)} \le x + \epsilon)}{\epsilon} = \lim_{\epsilon \downarrow 0} \Big( \frac{A(\epsilon)}{\epsilon} + \frac{B(\epsilon)}{\epsilon} \Big) = \frac{n!}{(j-1)!(n-j)!}\, f(x)\, F^{j-1}(x)(1 - F(x))^{n-j}.$$

    (c) The proof is similar to that of (b) and is left as an exercise.

Example. Let $X_1, \dots, X_n$ be a random sample from a distribution with pdf $f$ and cdf $F$. The joint pdf of $X_{(1)}, X_{(n)}$ is
$$f_{X_{(1)},X_{(n)}}(x_1, x_2) = n(n-1)\, f(x_1) f(x_2)\, (F(x_2) - F(x_1))^{n-2}, \quad x_1 < x_2.$$
Hence the joint pdf of $X_{(1)}$ and $X_{(n)} - X_{(1)}$ is
$$f_{X_{(1)},\,X_{(n)}-X_{(1)}}(u, v) = f_{X_{(1)},X_{(n)}}(u, u+v) = n(n-1)\, f(u) f(u+v)\, (F(u+v) - F(u))^{n-2}, \quad v > 0,$$
and the pdf of the range $X_{(n)} - X_{(1)}$ is
$$f_{X_{(n)}-X_{(1)}}(v) = n(n-1) \int_{-\infty}^{\infty} f(u)\, f(u+v)\, (F(u+v) - F(u))^{n-2}\,du, \quad v > 0.$$
In special cases this integral has a closed form. For example, CB derives this for the uniform distribution.
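For the uniform(0,1) distribution the integral evaluates to $f(v) = n(n-1)v^{n-2}(1-v)$ on $(0,1)$; the sketch below (my addition, with arbitrary $n$ and evaluation point) compares a histogram-type density estimate of the range with this closed form.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 6, 200_000
u = rng.uniform(size=(reps, n))
r = u.max(axis=1) - u.min(axis=1)                 # sample range X_(n) - X_(1)

v, dv = 0.7, 0.01
emp = ((r > v - dv / 2) & (r <= v + dv / 2)).mean() / dv
print(emp, n * (n - 1) * v ** (n - 2) * (1 - v))  # density estimate vs closed form
```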

    5.5 Convergence concepts (CB 5.5)

Definition. A sequence of random variables $X_1, X_2, \dots$ converges in probability to a random variable $X$, denoted by $X_n \stackrel{p}{\to} X$, if for every $\epsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0.$$


Note that the random variables $X_1, X_2, \dots$ are not assumed independent, and in fact in order to have convergence they have to be dependent. The target random variable is sometimes nonrandom, i.e. $P(X = c) = 1$ for some constant $c$, in which case we say that $X_n$ converges in probability to $c$ ($X_n \stackrel{p}{\to} c$).

There are numerous ways to prove convergence in probability, one of which is to use the convergence of moments: if we can show that
$$\lim_{n\to\infty} E|X_n - X|^p = 0 \quad \text{for some } p > 0, \qquad (1)$$
then by Chebychev's inequality,
$$P(|X_n - X| > \epsilon) = P(|X_n - X|^p > \epsilon^p) \le \frac{1}{\epsilon^p}\, E|X_n - X|^p \to 0.$$
Actually, if (1) holds then we say that $X_n$ converges to $X$ in $L_p$, denoted by $X_n \stackrel{L_p}{\to} X$. So $L_p$ convergence implies convergence in probability.

Theorem 5.5.1 (Weak Law of Large Numbers). Let $X_1, \dots, X_n$ be iid rvs with mean $\mu$ and variance $\sigma^2 < \infty$. Then $\bar X_n$ converges to $\mu$ in $L_2$ and in probability.

Proof.
$$E|\bar X_n - \mu|^2 = \mathrm{Var}(\bar X_n) = \frac{\sigma^2}{n} \to 0 \quad \text{as } n \to \infty.$$

Let $X_n$ be a statistic and assume that the population distribution has a parameter $\theta$. $X_n$ is said to be a (weakly) consistent estimator of $\theta$ if $X_n \stackrel{p}{\to} \theta$. Thus, the sample mean is a consistent estimator of the population mean.

Theorem 5.5.2. Suppose that $\mathbf{X}_n = (X_{n,1}, \dots, X_{n,k})$ and $\mathbf{X} = (X_1, \dots, X_k)$ are such that $X_{n,j} \stackrel{p}{\to} X_j$, $1 \le j \le k$. If $g: \mathbb{R}^k \to \mathbb{R}$ is continuous then $g(\mathbf{X}_n) \stackrel{p}{\to} g(\mathbf{X})$.

Proof. Recall that a continuous mapping is uniformly continuous on any closed bounded set. Let $B$ be a fixed positive constant. For each $\epsilon > 0$ there exists $\delta > 0$ such that for any $\mathbf{x}, \mathbf{y}$ with $\max_{1\le j\le k} |x_j| \le B$ and $\max_{1\le j\le k} |x_j - y_j| \le \delta$, we have
$$|g(\mathbf{x}) - g(\mathbf{y})| \le \epsilon.$$
Now write
$$P(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon) = P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1\le j\le k}|X_j| \le B\big) + P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1\le j\le k}|X_j| > B\big)$$
$$\le P\big(\max_{1\le j\le k}|X_{n,j} - X_j| > \delta,\ \max_{1\le j\le k}|X_j| \le B\big) + P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1\le j\le k}|X_j| > B\big)$$
$$\le P\big(\max_{1\le j\le k}|X_{n,j} - X_j| > \delta\big) + P\big(\max_{1\le j\le k}|X_j| > B\big) \le \sum_{j=1}^k P(|X_{n,j} - X_j| > \delta) + \sum_{j=1}^k P(|X_j| > B).$$
The first term tends to 0 by assumption and so
$$\lim_{n\to\infty} P(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon) \le \sum_{j=1}^k P(|X_j| > B).$$
Since the lhs is independent of $B$, we can take $B$ on the rhs as big as we please, and hence the lhs is 0.

Example. Let $X_1, X_2, \dots, X_n$ be iid with finite 4-th moment. Then the sample variance is consistent for the population variance.

Proof. Write
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n X_i^2 - \frac{n}{n-1}\,\bar X_n^2.$$
By the WLLN and Theorem 5.5.2,
$$\frac{1}{n-1}\sum_{i=1}^n X_i^2 = \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^n X_i^2 \stackrel{p}{\to} E(X^2).$$
Also, since $\bar X_n \stackrel{p}{\to} E(X)$ we have
$$\frac{n}{n-1}\,\bar X_n^2 \stackrel{p}{\to} E^2(X).$$
Thus
$$S_n^2 \stackrel{p}{\to} E(X^2) - E^2(X) = \sigma^2.$$

Let us revisit the notion of convergence in distribution:

Definition. Let $X, X_1, X_2, \dots$ be rvs with cdfs, respectively, $F_X, F_{X_1}, F_{X_2}, \dots$. Say that $X_n$ converges in distribution to $X$, denoted by $X_n \stackrel{d}{\to} X$, if $\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$ at every point $x$ where $F_X$ is continuous.

Convergence in distribution can be proved

(a) by verifying the definition,

(b) by showing that the pmf/pdf of $X_n$ converges to a limiting pmf/pdf (Scheffé's Theorem),

(c) by showing that the mgf of $X_n$ converges to a limiting mgf.

Example. Let $U_1, U_2, \dots$ be iid uniform$(0,1)$. Show that $X_n = n \min_{1\le i\le n} U_i$ converges in distribution and identify the limit.

Proof.
$$P(X_n \le x) = 1 - P(X_n > x) = 1 - P\big(\min_{1\le i\le n} U_i > x/n\big) = 1 - P^n(U_1 > x/n) = 1 - (1 - x/n)^n \to 1 - e^{-x}.$$
This shows that $X_n$ converges in distribution to the exponential distribution with mean 1.
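A quick check of this limit by simulation (my addition; $n = 50$ is arbitrary): the empirical cdf of $X_n = n\min_i U_i$ should be close to $1 - e^{-x}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 100_000
xn = n * rng.uniform(size=(reps, n)).min(axis=1)
for x in (0.5, 1.0, 2.0):
    print(x, (xn <= x).mean(), 1 - np.exp(-x))   # empirical cdf vs exponential cdf
```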


Example. In the previous example, show that $\min_{1\le i\le n} U_i \stackrel{p}{\to} 0$.

Proof. We need to show that
$$P\big(\big|\min_{1\le i\le n} U_i - 0\big| > \epsilon\big) \to 0, \quad \epsilon > 0.$$
The lhs is equal to
$$P\big(\min_{1\le i\le n} U_i > \epsilon\big) = P(U_i > \epsilon \text{ for all } i = 1, \dots, n) = (1 - \epsilon)^n \to 0.$$

Theorem 5.5.3. If $X_n \stackrel{p}{\to} X$ then $X_n \stackrel{d}{\to} X$. The converse is true if $P(X = c) = 1$ for some constant $c$.

Proof. Assume that $X_n \stackrel{p}{\to} X$ and let $x$ be a point of continuity of $P(X \le x)$. Then
$$|P(X_n \le x) - P(X \le x)| = |P(X_n \le x, X \le x) + P(X_n \le x, X > x) - P(X_n \le x, X \le x) - P(X_n > x, X \le x)|$$
$$\le P(X_n \le x, X > x) + P(X_n > x, X \le x).$$
For any $\epsilon > 0$,
$$0 \le P(X_n \le x, X > x) = P(X_n \le x, X \in (x, x+\epsilon]) + P(X_n \le x, X > x + \epsilon) \le P(X \in (x, x+\epsilon]) + P(|X_n - X| > \epsilon) \to P(X \in (x, x+\epsilon]) \ \text{as } n \to \infty$$
by convergence in probability. Since $\epsilon > 0$ is arbitrary, we have
$$\lim_{n\to\infty} P(X_n \le x, X > x) \le \lim_{\epsilon \downarrow 0} P(X \in (x, x+\epsilon]) = 0.$$
Similarly, one can show
$$\lim_{n\to\infty} P(X_n > x, X \le x) = 0.$$


Theorem 5.5.4 (Central Limit Theorem). Let $X_1, X_2, \dots, X_n$ be iid with mean $\mu$ and variance $\sigma^2$. Define
$$Z_n = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}}.$$
Then
$$\lim_{n\to\infty} P(Z_n \le z) = \int_{-\infty}^z \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt.$$

Proof. The result is true as stated; however, let us assume that the mgf $M(t)$ of $X_i$ exists. Without loss of generality assume that $\mu = 0$ and $\sigma^2 = 1$. By the Taylor expansion,
$$M(t/\sqrt{n}) = M(0) + M'(0)\,\frac{t}{\sqrt{n}} + \frac{1}{2}M''(\xi_n)\Big(\frac{t}{\sqrt{n}}\Big)^2$$
where $\xi_n$ is between 0 and $t/\sqrt{n}$. Clearly
$$M(0) = 1, \qquad M'(0) = 0,$$
so that
$$M(t/\sqrt{n}) = 1 + \frac{1}{n}\,\frac{t^2}{2}\,M''(\xi_n).$$
Since
$$Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i,$$
we have
$$M_{Z_n}(t) = M^n(t/\sqrt{n}) = \Big[1 + \frac{1}{n}\,\frac{t^2}{2}\,M''(\xi_n)\Big]^n.$$
Since $\xi_n \to 0$ we have $M''(\xi_n) \to M''(0) = 1$. Hence
$$M_{Z_n}(t) \to e^{t^2/2} = \text{mgf of } N(0, 1).$$
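The sketch below (an addition; the exponential population is chosen precisely because it is skewed, and the sample sizes are arbitrary) illustrates the CLT: the standardized mean's cdf approaches the normal cdf as $n$ grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu = sigma = 1.0                       # Exp(1) has mean 1 and standard deviation 1
for n in (5, 30, 200):
    x = rng.exponential(size=(100_000, n))
    z = (x.mean(axis=1) - mu) / (sigma / np.sqrt(n))
    print(n, (z <= 1.645).mean(), stats.norm.cdf(1.645))   # -> ~0.95 as n grows
```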


Suppose we wish to estimate $\mu$ by $\bar X_n$. The probability that the estimate is off by at most $\epsilon$ is
$$P(|\bar X_n - \mu| \le \epsilon) = P\Big( -\sqrt{n}\,\epsilon/\sigma \le \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \le \sqrt{n}\,\epsilon/\sigma \Big) \approx P\big( -\sqrt{n}\,\epsilon/\sigma \le Z \le \sqrt{n}\,\epsilon/\sigma \big).$$
If we want this probability to be, say, 95%, then we can solve
$$.95 = P\big( -\sqrt{n}\,\epsilon/\sigma \le Z \le \sqrt{n}\,\epsilon/\sigma \big),$$
requiring
$$\epsilon = 1.96\,\sigma/\sqrt{n}.$$

Example. Suppose $Y_n \sim B(n, p)$. Then
$$\frac{Y_n - np}{\sqrt{np(1-p)}} \stackrel{d}{\to} Z \sim N(0, 1).$$
Proof. One can write
$$Y_n = \sum_{j=1}^n X_j$$
where the $X_j$ are iid $B(1, p)$. Since $E X_i = p =: \mu$ and $\mathrm{Var}(X_i) = p(1-p) =: \sigma^2$,
$$\frac{Y_n - np}{\sqrt{np(1-p)}} = \frac{\sum_{j=1}^n (X_j - \mu)}{\sqrt{n\sigma^2}} = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}}.$$
Hence the result follows readily from the CLT.

Example. Suppose $Y_n \sim$ Poisson$(n)$. Then
$$\frac{Y_n - n}{\sqrt{n}} \stackrel{d}{\to} Z \sim N(0, 1).$$
Proof. We know that $Y_n$ has the same distribution as $\sum_{i=1}^n X_i$ where $X_1, X_2, \dots, X_n$ are iid Poisson$(1)$. Hence
$$\frac{Y_n - n}{\sqrt{n}} \stackrel{d}{=} \frac{\sum_{i=1}^n (X_i - 1)}{\sqrt{n}} \stackrel{d}{\to} Z$$
by the CLT.

Theorem 5.5.5 (Slutsky's Theorem). If $X_n \stackrel{d}{\to} X$ and $Y_n \stackrel{p}{\to} a$, a constant, then

(a) $X_n Y_n \stackrel{d}{\to} aX$,

(b) $X_n + Y_n \stackrel{d}{\to} X + a$.

Example. Let $X_1, X_2, \dots, X_n$ be iid with mean $\mu$ and variance $\sigma^2$. Show that $\sqrt{n}(\bar X_n - \mu)/S \stackrel{d}{\to} Z \sim N(0, 1)$.

Proof.
$$\sqrt{n}\,(\bar X_n - \mu)/S = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \cdot \frac{\sigma}{S},$$
where the first factor $\stackrel{d}{\to} Z$ by the CLT and the second factor $\stackrel{p}{\to} 1$ by the WLLN. The desired result follows from Slutsky's Theorem.

Theorem 5.5.6 (Continuous mapping theorem). If $X_n \stackrel{d}{\to} X$ and $g$ is a continuous function then $g(X_n) \stackrel{d}{\to} g(X)$.

Example. Let $X_1, X_2, \dots, X_n$ be iid with finite 4-th moment. Derive the asymptotic distribution of the sample variance.

Solution.
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \mu)^2 - \frac{n}{n-1}(\bar X_n - \mu)^2.$$
Hence
$$S_n^2 - \sigma^2 = \frac{1}{n-1}\sum_{i=1}^n \big[(X_i - \mu)^2 - \sigma^2\big] + \frac{1}{n-1}\,\sigma^2 - \frac{n}{n-1}(\bar X_n - \mu)^2.$$
Thus,
$$\frac{n-1}{\sqrt{n}}\,(S_n^2 - \sigma^2) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \big[(X_i - \mu)^2 - \sigma^2\big] + \frac{1}{\sqrt{n}}\,\sigma^2 - \frac{1}{\sqrt{n}}\big[\sqrt{n}\,(\bar X_n - \mu)\big]^2.$$
By the CLT,
$$\sqrt{n}\,(\bar X_n - \mu) \stackrel{d}{\to} N(0, \sigma^2).$$
By the continuous mapping theorem,
$$\big[\sqrt{n}\,(\bar X_n - \mu)\big]^2 \stackrel{d}{\to} \sigma^2 \chi_1^2.$$
Hence,
$$\frac{1}{\sqrt{n}}\big[\sqrt{n}\,(\bar X_n - \mu)\big]^2 \stackrel{p}{\to} 0.$$
Also by the CLT,
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \big[(X_i - \mu)^2 - \sigma^2\big] \stackrel{d}{\to} N(0, \mu_4 - \sigma^4)$$
where $\mu_4$ is the 4-th central moment of $X$. It follows from Slutsky's Theorem that
$$\frac{n-1}{\sqrt{n}}\,(S_n^2 - \sigma^2) \stackrel{d}{\to} N(0, \mu_4 - \sigma^4),$$
and equivalently,
$$\sqrt{n}\,(S_n^2 - \sigma^2) \stackrel{d}{\to} N(0, \mu_4 - \sigma^4).$$
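A simulation sketch of this limit (my addition): for Exp(1) data, $\sigma^2 = 1$ and the 4-th central moment is $\mu_4 = 9$, so the limiting variance $\mu_4 - \sigma^4$ equals 8; $n$ and the replication count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 400, 20_000
x = rng.exponential(size=(reps, n))                # Exp(1): sigma^2 = 1, mu4 = 9
stat = np.sqrt(n) * (x.var(axis=1, ddof=1) - 1.0)  # sqrt(n) (S_n^2 - sigma^2)
print(stat.var())                                  # should be near mu4 - sigma^4 = 8
```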


Theorem 5.5.7 (Delta method). Let $Y_n$ be a sequence of rvs and $\theta$ be a constant such that
$$\sqrt{n}\,(Y_n - \theta) \stackrel{d}{\to} N(0, \sigma^2).$$
Let $g$ be a function with a non-zero first derivative at $\theta$. Then
$$\sqrt{n}\,\big[g(Y_n) - g(\theta)\big] \stackrel{d}{\to} N(0, \sigma^2 [g'(\theta)]^2).$$
Proof. By the Taylor expansion,
$$g(Y_n) = g(\theta) + g'(\theta)(Y_n - \theta) + R_n$$
where the remainder $R_n = o(Y_n - \theta)$. It then follows from the assumption $\sqrt{n}\,(Y_n - \theta) \stackrel{d}{\to} N(0, \sigma^2)$ that
$$\sqrt{n}\, R_n = \sqrt{n}\,(Y_n - \theta)\cdot o(1) \stackrel{p}{\to} 0.$$
As a result,
$$\sqrt{n}\,\big[g(Y_n) - g(\theta)\big] = g'(\theta)\,\sqrt{n}\,(Y_n - \theta) + \sqrt{n}\,R_n \stackrel{d}{\to} g'(\theta)\,Z \sim N(0, \sigma^2 [g'(\theta)]^2).$$

Example. Let $X_1, X_2, \dots, X_n$ be iid exponential with mean $1/\lambda$ (i.e. the pdf of $X_1$ is $f(x) = \lambda e^{-\lambda x} I_{(0,\infty)}(x)$). Then a natural estimator of $\lambda$ is $1/\bar X_n$. What is a $(1-\alpha)100\%$ confidence interval for $\lambda$? By the CLT,
$$\sqrt{n}\,(\bar X_n - 1/\lambda) \stackrel{d}{\to} N(0, 1/\lambda^2).$$
Let $g(x) = 1/x$, $x > 0$. Then $g'(1/\lambda) = -\lambda^2$ and
$$\sqrt{n}\Big(\frac{1}{\bar X_n} - \lambda\Big) = \sqrt{n}\,\big[g(\bar X_n) - g(1/\lambda)\big] \stackrel{d}{\to} N(0, \lambda^2).$$
Further, since $\bar X_n \stackrel{p}{\to} 1/\lambda$, by Slutsky's Theorem we have
$$\sqrt{n}\,\bar X_n\Big(\frac{1}{\bar X_n} - \lambda\Big) \stackrel{d}{\to} N(0, 1).$$
Hence an approximate $(1-\alpha)100\%$ confidence interval for $\lambda$ is
$$\frac{1}{\bar X_n} \pm z_{\alpha/2}\,\frac{1}{\sqrt{n}\,\bar X_n}.$$


Theorem 5.5.8 (Second-order delta method). Let $Y_n$ be a sequence of rvs and $\theta$ be a constant such that
$$\sqrt{n}\,(Y_n - \theta) \stackrel{d}{\to} N(0, \sigma^2).$$
Let $g$ be a function such that $g'(\theta) = 0$ and $g''(\theta) > 0$. Then
$$n\big[g(Y_n) - g(\theta)\big] \stackrel{d}{\to} \sigma^2\,\frac{g''(\theta)}{2}\,\chi_1^2.$$
Proof. By the Taylor expansion,
$$g(Y_n) = g(\theta) + g''(\theta)\,\frac{(Y_n - \theta)^2}{2} + R_n$$
where the remainder $R_n = o((Y_n - \theta)^2)$. As before,
$$n R_n = \big[\sqrt{n}\,(Y_n - \theta)\big]^2 \cdot o(1) \stackrel{p}{\to} 0.$$
Hence
$$n\big[g(Y_n) - g(\theta)\big] = g''(\theta)\,\frac{\big[\sqrt{n}\,(Y_n - \theta)\big]^2}{2} + n R_n \stackrel{d}{\to} \sigma^2\,\frac{g''(\theta)}{2}\,Z^2 \sim \sigma^2\,\frac{g''(\theta)}{2}\,\chi_1^2.$$

Example. Suppose that $X_1, \dots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. What is the asymptotic distribution of $\bar X_n^2 - \mu^2$?

Solution. First assume that $\mu \ne 0$. Applying the delta method with $g(x) = x^2$, we have
$$\sqrt{n}\,(\bar X_n^2 - \mu^2) \stackrel{d}{\to} N(0, \sigma^2 [2\mu]^2).$$
If $\mu = 0$, by the second-order delta method,
$$n(\bar X_n^2 - 0) \stackrel{d}{\to} \sigma^2 \chi_1^2.$$
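A sketch of the $\mu = 0$ case (my addition, with arbitrary $\sigma$, $n$, and replication count): the median of $n\bar X_n^2$ should be near $\sigma^2$ times the $\chi^2_1$ median.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
sigma, n, reps = 2.0, 200, 20_000
x = rng.normal(0.0, sigma, size=(reps, n))
stat = n * x.mean(axis=1) ** 2                       # n (Xbar_n)^2
print(np.median(stat))                               # empirical median
print(sigma**2 * stats.chi2.ppf(0.5, df=1))          # sigma^2 * chi^2_1 median
```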


Example. Suppose that $X_1, \dots, X_n$ are iid with a continuous distribution. Derive the asymptotic distribution of the empirical distribution function
$$F_n(x) = \frac{1}{n}\sum_{i=1}^n I(X_i \le x).$$
Solution. The mean and variance of the random variable $I(X_i \le x)$ are $F(x)$ and $F(x)(1 - F(x))$, respectively. By the CLT,
$$\sqrt{n}\,(F_n(x) - F(x)) \stackrel{d}{\to} N(0, F(x)(1 - F(x))).$$

Example. Suppose that $X_1, \dots, X_n$ are iid with a continuous distribution. Derive the asymptotic distribution of $X_{([np])}$.

Solution. Assume first that the $X_i$ are uniform$(0,1)$. It is clear that $X_{([np])} \stackrel{p}{\to} p$, the $p$-th population percentile, so one should center $X_{([np])}$ at $p$. The question is what $a_n$ should be so that $a_n(X_{([np])} - p)$ converges in distribution. Observe that
$$P\big(a_n(X_{([np])} - p) \le x\big) = P\big(X_{([np])} \le p + x/a_n\big) = P\Big( \sum_{i=1}^n I(X_i \le p + x/a_n) \ge [np] \Big).$$
Let
$$Y_n(x) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \big[ I(X_i \le p + x/a_n) - (p + x/a_n) \big],$$
and write
$$Y_n(x) = Y_n(0) + [Y_n(x) - Y_n(0)].$$
By the previous example,
$$Y_n(0) \stackrel{d}{\to} N(0, p(1-p)).$$


Now,
$$Y_n(x) - Y_n(0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \big[ I(p < X_i \le p + x/a_n) - x/a_n \big]$$
and hence
$$\mathrm{Var}(Y_n(x) - Y_n(0)) = \frac{1}{n}\cdot n\,\mathrm{Var}\big(I(p < X_1 \le p + x/a_n)\big) = (x/a_n)(1 - x/a_n) \to 0 \quad \text{as } n \to \infty.$$
As a result,
$$Y_n(x) \stackrel{d}{\to} N(0, p(1-p)) \quad \text{for each fixed } x.$$
Now
$$P\Big( \sum_{i=1}^n I(X_i \le p + x/a_n) \ge [np] \Big) = P\Big( \sqrt{n}\,Y_n(x) + n(p + x/a_n) \ge [np] \Big) = P\Big( Y_n(x) \ge \frac{[np] - n(p + x/a_n)}{\sqrt{n}} \Big).$$
In order for this to converge, we need $\frac{[np] - n(p + x/a_n)}{\sqrt{n}}$ to converge, which is equivalent to $a_n = c\sqrt{n}$ for some constant $c$. Taking $a_n = \sqrt{n}$, we have
$$\frac{[np] - n(p + x/a_n)}{\sqrt{n}} \to -x$$
and therefore
$$\lim_{n\to\infty} P\Big( Y_n(x) \ge \frac{[np] - n(p + x/a_n)}{\sqrt{n}} \Big) = P\big( \sqrt{p(1-p)}\,Z \ge -x \big) = P\big( \sqrt{p(1-p)}\,Z \le x \big) = \Phi\big( x/\sqrt{p(1-p)} \big).$$


Thus,
$$\sqrt{n}\,(X_{([np])} - p) \stackrel{d}{\to} N(0, p(1-p)).$$

Next assume that the $X_i$ are absolutely continuous with a pdf $f$. For convenience assume that $F$ is one-to-one with inverse function $F^{-1}$. We wish to find constants $a_n$ such that $a_n(X_{([np])} - F^{-1}(p))$ converges in distribution. Note that our sample $X_1, \dots, X_n$ has the same joint distribution as $F^{-1}(U_1), \dots, F^{-1}(U_n)$, where the $U_i$ are iid uniform$(0,1)$. Hence
$$a_n\big(X_{([np])} - F^{-1}(p)\big) \stackrel{d}{=} a_n\big(F^{-1}(U_{([np])}) - F^{-1}(p)\big).$$
By the previous part and the delta method,
$$\sqrt{n}\,\big(F^{-1}(U_{([np])}) - F^{-1}(p)\big) \stackrel{d}{\to} N(0, c^2\,p(1-p))$$
where
$$c = (F^{-1})'(p) = \frac{1}{f(F^{-1}(p))}.$$
This concludes the derivation.
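As a final check (my addition), the result can be simulated for the median ($p = 1/2$) of $N(0,1)$ data, where $c = 1/f(F^{-1}(1/2)) = \sqrt{2\pi}$ and the limiting variance is $c^2\,p(1-p) = \pi/2$; the odd $n$ and the replication count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(10)
n, reps, p = 801, 10_000, 0.5
x = np.sort(rng.normal(size=(reps, n)), axis=1)
med = x[:, int(n * p) - 1]                 # X_([np]) (1-based index [np] = 400)
print(n * med.var(), np.pi / 2)            # n Var(X_([np])) vs c^2 p(1-p) = pi/2
```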
