
Common distributions

Lesson 0: Preliminaries

Le Thi Xuan Mai

The University of Natural Sciences

February 24, 2013


    Discrete random variables


    The binomial distribution

As its name suggests, the binomial distribution refers to random variables arising from trials with two outcomes.

    Example:

    smoking status: a person does or does not smoke

health insurance coverage: a person does or does not have health insurance.

We now consider the first example to demonstrate the calculation of binomial probabilities.

Suppose that four adults have been randomly selected and asked whether or not they currently smoke. The random variable of interest in this example is the number of persons who respond yes to the question about smoking (denoted by X).


Suppose that in the population, the proportion of people who would respond yes to this question is θ, so that the probability of a response of no is 1 − θ. Since each person is independent of all the other persons, the binomial probabilities can be calculated as follows:

P(X = 4) = θ⁴ (the number of yes responses is 4, and there is only one such arrangement)

P(X = 3) = 4θ³(1 − θ) (since there are four arrangements with three yes responses)

P(X = 2) = 6θ²(1 − θ)²

P(X = 1) = 4θ(1 − θ)³

P(X = 0) = (1 − θ)⁴


Suppose θ is 0.25. Then the probability of each outcome is as follows:

P(X = 4) = C(4,0) (0.25)⁴ (0.75)⁰ = 0.0039 = P{0 no responses}

P(X = 3) = C(4,1) (0.25)³ (0.75)¹ = 0.0469 = P{1 no response}

P(X = 2) = C(4,2) (0.25)² (0.75)² = 0.2109 = P{2 no responses}

P(X = 1) = C(4,3) (0.25)¹ (0.75)³ = 0.4219 = P{3 no responses}

P(X = 0) = C(4,4) (0.25)⁰ (0.75)⁴ = 0.3164 = P{4 no responses}

This leads us to define a binomial random variable and calculate its probability distribution.
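These numbers are easy to verify in R; a minimal sketch added by the editor (not part of the original slides), using choose() for the binomial coefficients:

theta <- 0.25
x <- 0:4                                            # number of yes responses
probs <- choose(4, x) * theta^x * (1 - theta)^(4 - x)
round(probs, 4)                                     # 0.3164 0.4219 0.2109 0.0469 0.0039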


A random variable Y ∈ {0, 1, . . . , n} has a binomial(n, θ) distribution if θ ∈ [0, 1] and

P(Y = y | θ, n) = C(n, y) θ^y (1 − θ)^(n−y) for y ∈ {0, 1, . . . , n},

where C(n, y) is the binomial coefficient. For this distribution,

E[Y | θ] = nθ;
Var[Y | θ] = nθ(1 − θ);
mode[Y | θ] = ⌊(n + 1)θ⌋;
p(y | θ, n) = dbinom(y, n, θ).

If Y1 ~ binomial(n1, θ) and Y2 ~ binomial(n2, θ) are independent, then Y = Y1 + Y2 ~ binomial(n1 + n2, θ). When n = 1 this distribution is called the Bernoulli distribution.

The binomial(n, θ) model assumes that Y is a sum of n independent binary(θ) random variables.
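A short R illustration of dbinom and of the additivity property (an editor's sketch, not from the slides):

dbinom(0:4, size = 4, prob = 0.25)      # reproduces the smoking-example probabilities
set.seed(1)
y <- rbinom(1e5, size = 3, prob = 0.25) + rbinom(1e5, size = 7, prob = 0.25)
mean(y); var(y)                         # close to 10*0.25 = 2.5 and 10*0.25*0.75 = 1.875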


Figure: Binomial distribution with n = 10 and θ = 0.2.


Figure: Binomial distribution with n = 10 and θ = 0.8.


Figure: Binomial distribution with n = 100 and θ = 0.2.


Figure: Binomial distribution with n = 100 and θ = 0.8.


    The negative binomial distribution

A random variable Y ∈ {0, 1, . . .} has a negative binomial(r, θ) distribution for any positive integer r if 0 ≤ θ ≤ 1 and

P(Y = y | r, θ) = C(y + r − 1, y) θ^y (1 − θ)^r for y ∈ {0, 1, . . .}.

For this distribution,

E[Y | r, θ] = rθ/(1 − θ);
Var[Y | r, θ] = rθ/(1 − θ)²;
mode[Y | r, θ] = ⌊θ(r − 1)/(1 − θ)⌋ if r > 1, mode[Y | r, θ] = 0 if r ≤ 1;
p(y | r, θ) = dnbinom(y, r, 1 − θ) (R's dnbinom takes the success probability 1 − θ).


Figure: Negative binomial distribution.


The name negative binomial could equally be replaced by inverse binomial, because the distribution arises from a series of Bernoulli trials in which the number of successes is fixed at the outset. In this case, Y is the number of failures preceding the r-th success.

This distribution is sometimes used to model count variables (with countably infinite support) whose variance is substantially larger than their mean, so that the Poisson model would be inappropriate.

The geometric distribution is the special case of the negative binomial having r = 1. It is the number of failures preceding the first success in a series of Bernoulli trials.
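A brief R check of this parameterization and of the geometric special case (editor's sketch):

theta <- 0.3; r <- 5
y <- 0:50
p <- dnbinom(y, size = r, prob = 1 - theta)        # P(Y = y | r, theta)
sum(y * p)                                         # close to r*theta/(1 - theta) = 2.143
all.equal(dnbinom(y, size = 1, prob = 1 - theta),
          dgeom(y, prob = 1 - theta))              # TRUE: geometric is the r = 1 case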


    The Poisson distribution

A random variable Y ∈ {0, 1, . . .} has a Poisson(λ) distribution if λ > 0 and

P(Y = y | λ) = λ^y e^(−λ) / y! for y ∈ {0, 1, . . .}.

For this distribution,

E[Y | λ] = λ;
Var[Y | λ] = λ;
mode[Y | λ] = ⌊λ⌋;
p(y | λ) = dpois(y, λ).

If Y1 ~ Poisson(λ1) and Y2 ~ Poisson(λ2) are independent, then Y = Y1 + Y2 ~ Poisson(λ1 + λ2). If the sample mean is observed to be very different from the sample variance, then the Poisson model may not be appropriate.
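A quick R illustration of dpois and the additivity property (editor's sketch):

dpois(0:3, lambda = 4)                 # P(Y = 0), ..., P(Y = 3) for Poisson(4)
set.seed(1)
y <- rpois(1e5, 1.5) + rpois(1e5, 2.5)
mean(y); var(y)                        # both close to 1.5 + 2.5 = 4, as for Poisson(4)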


Figure: Poisson distributions with λ = 1, 4, 10.


    Continuous random variables


    The univariate normal distribution

A random variable Y ∈ R has a normal(θ, σ²) distribution if σ² > 0 and

p(y | θ, σ²) = (2πσ²)^(−1/2) exp{−(y − θ)²/(2σ²)} for −∞ < y < ∞.

For this distribution,

E[Y | θ, σ²] = θ;
Var[Y | θ, σ²] = σ²;
mode[Y | θ, σ²] = θ;
p(y | θ, σ²) = dnorm(y, theta, sigma).

If X1 ~ normal(θ1, σ1²) and X2 ~ normal(θ2, σ2²) are independent, then

aX1 + bX2 + c ~ normal(aθ1 + bθ2 + c, a²σ1² + b²σ2²).
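A Monte Carlo check of the linear-combination property (editor's sketch); note that dnorm and rnorm take the standard deviation σ, not the variance:

set.seed(1)
x1 <- rnorm(1e5, mean = 1, sd = 2)     # normal(1, 4)
x2 <- rnorm(1e5, mean = -2, sd = 1)    # normal(-2, 1)
y  <- 3 * x1 - 2 * x2 + 5
mean(y); var(y)                        # close to 12 and 9*4 + 4*1 = 40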


Figure: Some univariate normal distributions.


    The t distribution

A random variable Y ∈ R has a t(ν) distribution if ν > 0 and

p(y | ν) = Γ((ν + 1)/2) / (Γ(ν/2) √(νπ)) [1 + y²/ν]^(−(ν+1)/2).

For this distribution,

E[Y | ν] = 0 if ν > 1;
Var[Y | ν] = ν/(ν − 2) if ν > 2, Var[Y | ν] = ∞ if 1 < ν ≤ 2;
mode[Y | ν] = 0;
median[Y | ν] = 0;
p(y | ν) = dt(y, nu).

The parameter ν is referred to as the degrees of freedom and is usually taken to be a positive integer, though the distribution is proper for any positive real number ν.

The t is a common heavy-tailed (but still symmetric and unimodal) alternative to the normal distribution.
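A quick R comparison of tail weight against the standard normal (editor's sketch):

pt(4, df = 3, lower.tail = FALSE)      # P(Y > 4) for t with 3 df, about 0.014
pnorm(4, lower.tail = FALSE)           # same tail for the standard normal, about 3.2e-05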


Figure: Some t distributions.


    The continuous uniform probability distribution

If a continuous random variable Y can assume any value in the interval a ≤ y ≤ b, and only these values, and if its probability density function p(y | a, b) is constant over that interval and equal to zero elsewhere, then Y is said to be uniformly distributed; its distribution is called the continuous uniform (or rectangular) distribution. Its density is

p(y | a, b) = 1/(b − a) for a ≤ y ≤ b, and 0 elsewhere.  (1)

For this distribution,

E[Y] = (a + b)/2;
Var[Y] = (b − a)²/12;
median[Y] = (a + b)/2;
mode[Y] = any value in [a, b].
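A minimal R illustration (editor's sketch):

a <- 2; b <- 5
dunif(c(1, 3, 6), min = a, max = b)    # 0 outside [a, b], 1/(b - a) inside
set.seed(1)
y <- runif(1e5, a, b)
mean(y); var(y)                        # close to 3.5 and (b - a)^2/12 = 0.75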


Figure: Uniform distribution.


    The beta distribution

A random variable Y ∈ [0, 1] has a beta(a, b) distribution if a > 0, b > 0 and

p(y | a, b) = Γ(a + b)/(Γ(a)Γ(b)) y^(a−1) (1 − y)^(b−1) for 0 ≤ y ≤ 1.

For this distribution,

E[Y | a, b] = a/(a + b);
Var[Y | a, b] = ab/((a + b + 1)(a + b)²) = E[Y] E[1 − Y] / (a + b + 1);
mode[Y | a, b] = (a − 1)/((a − 1) + (b − 1)) if a > 1 and b > 1;
p(y | a, b) = dbeta(y, a, b).

The beta distribution is closely related to the gamma distribution.

A multivariate version of the beta distribution is the Dirichlet distribution.
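A small R check of the mean and mode formulas (editor's sketch):

a <- 3; b <- 7
integrate(function(y) y * dbeta(y, a, b), 0, 1)$value    # a/(a + b) = 0.3
optimize(dbeta, c(0, 1), shape1 = a, shape2 = b, maximum = TRUE)$maximum
# close to (a - 1)/(a + b - 2) = 0.25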


Figure: Beta distribution.


    The gamma and inverse-gamma distributions

A random variable Y ∈ (0, ∞) has a gamma(a, b) distribution if

p(y | a, b) = (b^a / Γ(a)) y^(a−1) e^(−by) for y > 0, a > 0, b > 0.

For this distribution,

E[Y | a, b] = a/b;
Var[Y | a, b] = a/b²;
mode[Y | a, b] = (a − 1)/b if a ≥ 1, mode[Y | a, b] = 0 if 0 < a < 1;
p(y | a, b) = dgamma(y, a, b).

If Y1 ~ gamma(a1, b) and Y2 ~ gamma(a2, b) are independent, then Y1 + Y2 ~ gamma(a1 + a2, b) and Y1/(Y1 + Y2) ~ beta(a1, a2).

If Y ~ normal(0, σ²) then Y² ~ gamma(1/2, 1/(2σ²)).

The chi-square distribution with ν degrees of freedom is the same as a gamma(ν/2, 1/2) distribution.

If Y ~ normal(0, 1) then Y² ~ chi-square(1).
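A short R check of the chi-square connection (editor's sketch); in dgamma(y, a, b) the second and third arguments are the shape a and the rate b:

y <- c(0.5, 1, 2, 4)
all.equal(dgamma(y, shape = 5/2, rate = 1/2), dchisq(y, df = 5))   # TRUE
set.seed(1)
z <- rnorm(1e5, sd = 2)                # normal(0, sigma^2 = 4)
mean(z^2); var(z^2)                    # close to a/b = 4 and a/b^2 = 32 for gamma(1/2, 1/8)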


Figure: Gamma distributions.


A random variable Y ∈ (0, ∞) has an inverse-gamma(a, b) distribution if 1/Y has a gamma(a, b) distribution. In other words, if X ~ gamma(a, b) and Y = 1/X, then Y ~ inverse-gamma(a, b). The density of Y is

p(y | a, b) = (b^a / Γ(a)) y^(−a−1) e^(−b/y) for y > 0.

For this distribution,

E[Y | a, b] = b/(a − 1) if a > 1, E[Y | a, b] = ∞ if 0 < a ≤ 1;
Var[Y | a, b] = b²/((a − 1)²(a − 2)) if a > 2, Var[Y | a, b] = ∞ if 0 < a ≤ 2;
mode[Y | a, b] = b/(a + 1).

Note that the inverse-gamma density is not simply the gamma density with y replaced by 1/y.

The inverse-χ²(ν) distribution is a special case of the inverse-gamma distribution, with a = ν/2 and b = 1/2.
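Base R has no inverse-gamma functions, but draws are easily obtained through the gamma; a minimal sketch (editor's addition):

a <- 4; b <- 3
set.seed(1)
y <- 1 / rgamma(1e5, shape = a, rate = b)   # inverse-gamma(a, b) draws
mean(y); var(y)                             # close to b/(a - 1) = 1 and b^2/((a - 1)^2 (a - 2)) = 0.5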


The scaled inverse chi-square distribution

The family of scaled inverse chi-squared distributions is closely related to two other distribution families: the inverse-chi-squared distribution and the inverse-gamma distribution. Compared to the inverse-chi-squared distribution, the scaled distribution has an extra parameter σ², which scales the distribution horizontally and vertically. The relationship between the inverse-chi-squared distribution and the scaled distribution is that if X ~ Scale-inv-χ²(ν, σ²) then X/(νσ²) ~ inv-χ²(ν).

The probability density function of the scaled inverse chi-squared distribution extends over the domain x > 0 and is

f(x; ν, σ²) = (σ²ν/2)^(ν/2) / Γ(ν/2) x^(−ν/2−1) exp(−νσ²/(2x)),

where ν is the degrees of freedom parameter and σ² is the scale parameter.


For this distribution,

E[X] = νσ²/(ν − 2) for ν > 2;
Var[X] = 2ν²σ⁴/((ν − 2)²(ν − 4)) for ν > 4;
mode[X] = νσ²/(ν + 2).

If X ~ Scale-inv-χ²(ν, σ²) then kX ~ Scale-inv-χ²(ν, kσ²).
If X ~ Scale-inv-χ²(ν, σ²) then X/(νσ²) ~ inv-χ²(ν).
If X ~ Scale-inv-χ²(ν, σ²) then X ~ Inv-Gamma(ν/2, νσ²/2).
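There are no scaled-inverse-chi-square functions in base R, but draws follow from the representation X = νσ²/Q with Q ~ χ²(ν); a minimal sketch (editor's addition):

nu <- 6; sigma2 <- 2
set.seed(1)
x <- nu * sigma2 / rchisq(1e5, df = nu)   # Scale-inv-chi^2(nu, sigma2) draws
mean(x)                                   # close to nu*sigma2/(nu - 2) = 3
nu * sigma2 / (nu + 2)                    # theoretical mode = 1.5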


    Exponential distributions

A random variable Y > 0 has an exponential(λ) distribution if λ > 0 and

p(y | λ) = λ exp(−λy) for y > 0.

For this distribution,

E[Y | λ] = 1/λ;
Var[Y | λ] = 1/λ²;
mode[Y | λ] = 0;
p(y | λ) = dexp(y, lambda).

The exponential(λ) distribution is the gamma(1, λ) distribution.
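A one-line R check of the gamma(1, λ) identity (editor's sketch):

lambda <- 2; y <- c(0.1, 0.5, 1, 3)
all.equal(dexp(y, rate = lambda), dgamma(y, shape = 1, rate = lambda))   # TRUE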


Figure: Exponential distributions.


    The double exponential distribution (Laplace distribution)

A random variable Y ∈ R has a Laplace(θ, σ²) distribution if σ > 0 and

p(y | θ, σ²) = 1/(2σ) exp(−|y − θ|/σ).

For this distribution,

E[Y | θ, σ²] = θ;
Var[Y | θ, σ²] = 2σ²;
mode[Y | θ, σ²] = θ;
p(y | θ, σ²) = dlaplace(y, mu, sigma).

This distribution is symmetric and unimodal, but has heavier tails than the normal and a somewhat different shape, being strictly concave up on both sides of θ.
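dlaplace is not part of base R (it is provided by add-on packages); a minimal hand-written version consistent with the density above (editor's sketch):

dlaplace <- function(y, mu = 0, sigma = 1) exp(-abs(y - mu) / sigma) / (2 * sigma)
dlaplace(c(-1, 0, 2), mu = 0, sigma = 1)
integrate(dlaplace, -Inf, Inf)$value    # integrates to 1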


Figure: Laplace distributions.


    The logistic distribution

A random variable Y ∈ R has a logistic(θ, σ²) distribution if σ > 0 and

p(y | θ, σ²) = exp(−(y − θ)/σ) / (σ [1 + exp(−(y − θ)/σ)]²).

For this distribution,

E[Y | θ, σ²] = θ;
Var[Y | θ, σ²] = (π²/3)σ²;
mode[Y | θ, σ²] = θ;
p(y | θ, σ²) = dlogis(y, mu, sigma).

The logistic is another symmetric unimodal distribution, more similar to the normal in appearance than the double exponential is, but still with heavier tails than the normal.
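A quick R tail comparison at matched variance (editor's sketch); dlogis and plogis take the scale σ, so a unit-variance logistic has σ = √3/π:

s <- sqrt(3) / pi                                         # logistic scale giving variance 1
plogis(4, location = 0, scale = s, lower.tail = FALSE)    # about 7e-04
pnorm(4, lower.tail = FALSE)                              # about 3.2e-05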


Figure: Some logistic distributions.


The Cauchy distribution

A random variable Y ∈ R has a Cauchy(θ, σ²) distribution if −∞ < θ < ∞, σ² > 0 and

p(y | θ, σ²) = 1 / (πσ [1 + ((y − θ)/σ)²]).

For this distribution, E[Y] and Var[Y] do not exist, though θ is the median of the distribution.

The Cauchy is the special case t(1, θ, σ²) of the t distribution, and it has the heaviest tails in that family.
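In R the Cauchy density is available directly, and it agrees with a rescaled t density with 1 degree of freedom (editor's sketch):

theta <- 1; sigma <- 2
y <- c(-3, 0, 1, 5)
all.equal(dcauchy(y, location = theta, scale = sigma),
          dt((y - theta) / sigma, df = 1) / sigma)        # TRUE
set.seed(1)
median(rcauchy(1e5, theta, sigma))                        # close to theta = 1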


Figure: Some Cauchy distributions.


    The multivariate normal distribution


A random vector Y ∈ R^p has a multivariate normal(θ, Σ) distribution if Σ is a positive definite p × p matrix and

p(y | θ, Σ) = (2π)^(−p/2) |Σ|^(−1/2) exp{−(1/2)(y − θ)^T Σ^(−1) (y − θ)} for y ∈ R^p.

For this distribution,

E[Y | θ, Σ] = θ;
Var[Y | θ, Σ] = Σ;
mode[Y | θ, Σ] = θ.


If Y1 ~ normal(θ1, Σ1) and Y2 ~ normal(θ2, Σ2) are independent, then

aY1 + bY2 + c ~ normal(aθ1 + bθ2 + c, a²Σ1 + b²Σ2).

If Z is the vector with elements Z1, . . . , Zp i.i.d. normal(0, 1), then

Y = θ + AZ ~ multivariate normal(θ, Σ),

where AA^T = Σ (A is the Cholesky factor of Σ).

The following R code will generate an n × p matrix such that the rows are i.i.d. samples from a multivariate normal distribution:

Z <- matrix(rnorm(n * p), nrow = n, ncol = p)        # n x p standard normal matrix
Y <- t(t(Z %*% chol(Sigma)) + c(theta))              # rows are i.i.d. normal(theta, Sigma)
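For example, with hypothetical values of n, p, theta and Sigma (editor's illustration; recall that chol(Sigma) is upper triangular with t(chol(Sigma)) %*% chol(Sigma) = Sigma, so each row of Y has covariance Sigma):

set.seed(1)
n <- 1e4; p <- 2
theta <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- t(t(Z %*% chol(Sigma)) + c(theta))
colMeans(Y)    # close to theta
cov(Y)         # close to Sigma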


The Wishart and inverse-Wishart distributions


A random p × p symmetric positive definite matrix Y has a Wishart(ν, M) distribution if the integer ν ≥ p, M is a p × p symmetric positive definite matrix and

p(Y | ν, M) = [2^(νp/2) Γ_p(ν/2) |M|^(ν/2)]^(−1) |Y|^((ν−p−1)/2) exp{−tr(M^(−1) Y)/2},

where Γ_p(ν/2) = π^(p(p−1)/4) ∏_{j=1}^{p} Γ[(ν + 1 − j)/2] and tr(A) = ∑_j a_{j,j} denotes the trace.

For this distribution,

E[Y | ν, M] = νM;
Var[Y_{i,j} | ν, M] = ν(m_{i,j}² + m_{i,i} m_{j,j});
mode[Y | ν, M] = (ν − p − 1)M.


The Wishart distribution is a multivariate version of the gamma distribution.

If X1, . . . , Xν are i.i.d. multivariate normal(0, M_{p×p}), then

∑_{i=1}^{ν} Xi Xi^T ~ Wishart(ν, M_{p×p}).

The following R code is used to generate a Wishart-distributed random matrix:

X <- matrix(rnorm(nu * p), nrow = nu, ncol = p)   # nu x p matrix of standard normals
Z <- X %*% chol(M)                                # rows are i.i.d. multivariate normal(0, M)
Y <- t(Z) %*% Z                                   # one Wishart(nu, M) draw
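Base R also provides rWishart with the same parameterization (E[Y] = νM), which can be used to cross-check this construction (editor's note):

set.seed(1)
nu <- 10; p <- 3
M <- diag(p)
draws <- rWishart(2000, df = nu, Sigma = M)   # p x p x 2000 array of Wishart(nu, M) draws
apply(draws, c(1, 2), mean)                   # close to nu * M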

A random p × p symmetric positive definite matrix Y has an inverse-Wishart(ν, M) distribution if Y^(−1) has a Wishart(ν, M) distribution. In other words, if X ~ Wishart(ν, M) and Y = X^(−1), then Y ~ inverse-Wishart(ν, M). The density of Y is

p(Y | ν, M) = [2^(νp/2) Γ_p(ν/2) |M|^(ν/2)]^(−1) |Y|^(−(ν+p+1)/2) exp{−tr(M^(−1) Y^(−1))/2}.

For this distribution,

E[Y | ν, M] = (ν − p − 1)^(−1) M^(−1);
mode[Y | ν, M] = (ν + p + 1)^(−1) M^(−1).

If Σ ~ inverse-Wishart(ν0, S0^(−1)), we have mode[Σ | ν0, S0] = (ν0 + p + 1)^(−1) S0. If Σ0 were the most probable value of Σ a priori, then we could set S0 = (ν0 + p + 1)Σ0, so that Σ ~ inverse-Wishart(ν0, [(ν0 + p + 1)Σ0]^(−1)) and mode[Σ | ν0, S0] = Σ0.
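Under this definition, an inverse-Wishart draw is simply the inverse of a Wishart draw; a minimal R sketch (editor's addition):

set.seed(1)
nu <- 10; p <- 3
M <- diag(p)
X <- rWishart(1, df = nu, Sigma = M)[, , 1]   # X ~ Wishart(nu, M)
Y <- solve(X)                                 # Y ~ inverse-Wishart(nu, M)
solve(M) / (nu - p - 1)                       # theoretical mean of Y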
