Probability
Studies in Economic Statistics
Jae-Young Kim
1 Introduction to Probability
1.1 Introduction
Definition 1.1 (Probability Space). A probability space is a triple (Ω, F , P) where:
1. Ω (Sample Space): the set of all possible outcomes of a random experiment.
2. F (σ-field or σ-algebra): a collection of subsets of Ω.
3. P (Probability Measure): a real-valued function defined on F .
Example 1.1 (Tossing a Coin).
• Ω = {H, T}
• F = {∅, {H}, {T}, {H, T}}
• P(∅) = 0
• P({H}) = P({T}) = 1/2
• P({H, T}) = 1
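The coin-toss space above is small enough to write out in full. The following Python sketch (an illustration, not part of the original notes) lists Ω, F , and P explicitly and checks the values given in the example.

```python
# A minimal sketch of Example 1.1: the probability space for one coin toss,
# with the sample space, sigma-field, and measure written out explicitly.
from fractions import Fraction

omega = frozenset({"H", "T"})                                 # sample space
F = [frozenset(), frozenset({"H"}), frozenset({"T"}), omega]  # sigma-field
P = {A: Fraction(len(A), 2) for A in F}                       # uniform measure

assert P[frozenset()] == 0 and P[omega] == 1                  # P(emptyset) = 0, P(Omega) = 1
assert P[frozenset({"H"})] == P[frozenset({"T"})] == Fraction(1, 2)
```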
Definition 1.2 (σ-field (σ-algebra)). A class F of subsets of Ω is called a σ-field or σ-algebra if it satisfies:
1. Ω ∈ F
2. For A ∈ F , Ac ∈ F
3. For Ai ∈ F , i = 1, 2, · · ·, ∪iAi ∈ F
Remarks
• A σ-field is always a field, but not vice versa.
• An element A ∈ F is called an event.
• An element ω ∈ Ω is called an outcome.
Definition 1.3 (The smallest σ-field generated by A, σ(A)). Let A be a class of subsets of Ω. Consider the intersection of all the σ-fields containing A; it is called the σ-field generated by A and is denoted by σ(A). σ(A) satisfies:
1. A ⊂ σ(A).
2. σ(A) is a σ-field.
3. If A ⊂ G, and G is a σ-field, then σ(A) ⊂ G.
Example 1.2 (σ(A)).
• Ω = {1, 2, 3, 4, 5, 6}
• A = {1, 3, 5}
• A = {A}
⇒ σ(A) = {A, Ac, ∅, Ω}
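For a finite Ω, σ(A) can be computed mechanically by closing A under complements and unions. A Python sketch reproducing Example 1.2 (illustrative; the helper `generate_sigma_field` is a name introduced here, not from the notes):

```python
# A small sketch of Definition 1.3 / Example 1.2: compute sigma(A) for a finite
# Omega by closing A = {A} under complements and (finite) unions.
def generate_sigma_field(omega, seed_sets):
    sets = {frozenset(), frozenset(omega)} | {frozenset(s) for s in seed_sets}
    while True:
        new = set(sets)
        new |= {frozenset(omega) - s for s in sets}   # close under complements
        new |= {s | t for s in sets for t in sets}    # close under unions
        if new == sets:
            return sets
        sets = new

omega = {1, 2, 3, 4, 5, 6}
sigma = generate_sigma_field(omega, [{1, 3, 5}])
# sigma(A) = {emptyset, A, A^c, Omega}, matching the example
assert sigma == {frozenset(), frozenset({1, 3, 5}), frozenset({2, 4, 6}), frozenset(omega)}
```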
Definition 1.4 (Probability Measure). A real-valued set function defined on a σ-field is a probability measure if it satisfies:
1. P(A) ≥ 0, ∀A ∈ F
2. P(Ω) = 1
3. For Ai ∩ Aj = ∅, i ≠ j, P(∪i Ai) = ∑i P(Ai)
Remarks
• The three properties given above are often referred to as the axioms of probability.
• A probability (measure) has range [0, 1], while a general measure has range [0, ∞].
Definition 1.5 (Lebesgue Measure). First we define µ on an open interval in the natural way. Note that any open set in R can be represented as a countable union of disjoint open intervals.
• Outer measure of A:
µ*(A) = inf{∑k µ(Ck) : A ⊂ ∪k Ck}, {Ck}: open covering of A
• Inner measure of A: µ_*(A) = 1 − µ*(Ac)
• Lebesgue measure: µ(A) = µ*(A) = µ_*(A) (defined when the outer and inner measures agree)
Theorem 1.1 (Unique Extension). A probability measure on a field F0 has a unique extension to the σ-field generated by F0.
1. Let P be a probability measure on F0 and let F = σ(F0). Then, there exists a probability measure Q on F such that Q(A) = P(A) for A ∈ F0.
2. Let Q′ be another probability measure on F such that Q′(A) = P(A) for A ∈ F0. Then Q′(A) = Q(A) for A ∈ F .
3. For Ai ∈ F with Ai ∩ Aj = ∅ (i ≠ j), Q is countably additive: Q(∪∞i=1 Ai) = ∑∞i=1 Q(Ai).
Theorem 1.2 (Properties of Probability Measure).
1. For A ⊂ B, P(A) ≤ P(B). (Proof hint: P(B − A) = P(B) − P(A).)
2. P(A ∪ B) = P(A) + P(B) − P(A ∩ B). (Proof hint: A ∪ B = A ∪ (B ∩ Ac).)
3. P(A ∪ B) ≤ P(A) + P(B)
• Extension
P(∪nk=1 Ak) = ∑nk=1 P(Ak) − ∑i<j P(Ai ∩ Aj) + · · · + (−1)n+1 P(A1 ∩ A2 ∩ · · · ∩ An)
• Boole's inequality
P(∪∞i=1 Ai) ≤ ∑∞i=1 P(Ai)
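The inclusion-exclusion extension above can be checked numerically. A Python sketch with three events on a 12-point uniform sample space (the particular events are hypothetical, chosen only for illustration):

```python
# A numerical sketch of the inclusion-exclusion extension for n = 3 events,
# using uniform probability on a finite sample space.
from itertools import combinations
from fractions import Fraction

omega = set(range(12))
A = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7, 9}]       # three illustrative events
P = lambda S: Fraction(len(S), len(omega))           # uniform measure

lhs = P(A[0] | A[1] | A[2])
rhs = sum((-1) ** (k + 1) *
          sum(P(set.intersection(*c)) for c in combinations(A, k))
          for k in range(1, 4))
assert lhs == rhs                                    # inclusion-exclusion holds exactly
```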
1.2 Some Limit Concepts of Probability
Definition 1.6 (Limit of Events for Monotone Sequences). Let {En} be a sequence of events. {En} is monotone when E1 ⊂ E2 ⊂ · · · or E1 ⊃ E2 ⊃ · · · .
1. Monotone increasing sequence of events:
E1 ⊂ E2 ⊂ · · · ⇒ lim En = ∪∞n=1 En
2. Monotone decreasing sequence of events:
E1 ⊃ E2 ⊃ · · · ⇒ lim En = ∩∞n=1 En
Theorem 1.3 (A monotone sequence of events En).
P(lim En) = lim P(En)
Proof.
• E0 = ∅, En: monotone increasing (the decreasing case is analogous)
• Fn = En − En−1: the Fi are disjoint and P(Fi) = P(Ei) − P(Ei−1)
• P(∪ni=1 Fi) = ∑ni=1 P(Fi) = P(En) = P(∪ni=1 Ei); letting n → ∞, countable additivity gives P(lim En) = lim P(En)
Definition 1.7 (Limit Supremum and Limit Infimum of Events). For a sequence of events {En}, define
lim supn En = ∩∞n=1 ∪∞k=n Ek (∀n ≥ 1, ∃k ≥ n such that ω ∈ Ek; "En infinitely often")
lim infn En = ∪∞n=1 ∩∞k=n Ek (∃n ≥ 1 such that ∀k ≥ n, ω ∈ Ek; "En eventually")
When lim sup En = lim inf En, the common value is denoted lim En.
Lemma 1.1 (Borel-Cantelli). Let {En} be a sequence of events.
If ∑∞i=1 P(Ei) < ∞, then P(lim sup En) = 0.
Proof.
P(lim sup En) = P(∩∞n=1 ∪∞k=n Ek) ≤ P(∪∞k=n Ek) ≤ ∑∞k=n P(Ek) → 0 as n → ∞
Remarks Note that if P(En) → 0, then P(lim inf En) = 0.
Lemma 1.2 (2nd Borel-Cantelli Lemma). Let {En} be an independent sequence of events.
If ∑∞i=1 P(Ei) = ∞, then P(lim sup En) = 1.
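The two lemmas can be contrasted by simulation. In the Python sketch below (illustrative; the cutoff values in the assertions are loose heuristics, not part of the lemmas), occurrences along one sample path are counted for a convergent series p_n = 1/n² and a divergent one p_n = 1/2:

```python
# A simulation sketch contrasting the two Borel-Cantelli lemmas. With
# p_n = 1/n^2 the series converges, so occurrences along a path should be few;
# with p_n = 1/2 the independent events keep occurring.
import random

random.seed(0)
N = 2000
count_convergent = sum(random.random() < 1 / n**2 for n in range(1, N + 1))
count_divergent = sum(random.random() < 0.5 for n in range(1, N + 1))

assert sum(1 / n**2 for n in range(1, N + 1)) < 2   # partial sums stay bounded
assert count_convergent < 20                        # very few occurrences in practice
assert count_divergent > 800                        # events occur again and again
```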
1.3 Conditional Probability and Independence
Definition 1.8 (Conditional Probability). For an event B s.t. P(B) > 0, the conditional probability of A given B is defined as
P(A | B) = P(A ∩ B) / P(B)
Definition 1.9 (Independence: A ⊥ B). Let A, B ∈ F , B ≠ ∅.
• If A ⊥ B, then P(A ∩ B) = P(A)P(B).
• If A ⊥ B, then P(A | B) = P(A).
• P(A | B) = P(A ∩ B) / P(B) = P(A)P(B) / P(B) = P(A)
Remarks If A or B is empty, then they are always independent.
Definition 1.10 (Pairwise Independence).
• Let Γ be a class of subsets of Ω.
• For any pair A, B ∈ Γ, if P(A ∩ B) = P(A)P(B), then events in Γ are pairwise independent.
Definition 1.11 (Mutual Independence).
• Let Γ be a class of subsets of Ω.
• For any finite collection of events (Ai1, . . . , Aik) in Γ, k = 2, 3, . . . , if P(Ai1 ∩ Ai2 ∩ · · · ∩ Aik) = ∏kj=1 P(Aij), then events in Γ are mutually independent or completely independent.
1.4 Bayes Theorem
Theorem 1.4 (Bayes Theorem). For A, B ∈ F , P(A) > 0, P(B) > 0,
• P(B | A) = P(A ∩ B) / P(A) = P(A | B)P(B) / [P(A | B)P(B) + P(A | Bc)P(Bc)]
• P(A | B) = P(A ∩ B) / P(B) = P(B | A)P(A) / [P(B | A)P(A) + P(B | Ac)P(Ac)]
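A worked instance of the second formula, with hypothetical numbers (the 1% prevalence, 95% sensitivity, and 10% false-positive rate below are chosen purely for illustration, not from the notes):

```python
# A worked sketch of Bayes' theorem: P(B | A) from P(A | B), P(A | B^c), and
# the prior P(B), using the law of total probability in the denominator.
from fractions import Fraction

P_B = Fraction(1, 100)           # prior P(B): prevalence of a condition
P_A_given_B = Fraction(95, 100)  # P(A | B): test positive given condition
P_A_given_Bc = Fraction(10, 100) # P(A | B^c): false-positive rate

P_A = P_A_given_B * P_B + P_A_given_Bc * (1 - P_B)   # law of total probability
P_B_given_A = P_A_given_B * P_B / P_A                # Bayes' theorem

assert P_B_given_A == Fraction(95, 1085)             # about 0.0876
```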
Remarks A Partition {Ai} of Ω
• Ai, i = 1, 2, . . . , n
• {Ai} is a partition of Ω if it satisfies
(i) ∪ni=1 Ai = Ω
(ii) Ai ∩ Aj = ∅, i ≠ j
• Let {Ai}, i = 1, 2, . . . , n, be a partition of Ω with P(Ai) > 0. Then for every B ∈ F with P(B) > 0,
P(Ai | B) = P(B | Ai)P(Ai) / ∑nj=1 P(B | Aj)P(Aj)
Remarks Bayesian Approach
• On a probability space (Ω, F , P)
• Events H ∈ F , P(· | H) = PH
• Let {Hi} be a partition of Ω, where the Hi are unobservable events.
• Let B ⊂ Ω be observable.
• P(Hi | B) = P(Hi)P(B | Hi) / ∑nj=1 P(Hj)P(B | Hj)
Remarks Classical vs. Bayesian Approach
Y = Xβ + ε
• Classical (Frequentist) Approach
(a) X, Y are random variables.
(b) Parameters (β) are fixed.
• Bayesian Approach
(a) Unknowns (Unobservable) are regarded as random variables.
(b) β, ε are random variables.
2 Random Variables, Distribution Functions, and Expectation
2.1 Random Variables
Definition 2.1 (Random Variable).
• A finite function X : Ω → R is a random variable (r.v.) if for each B ∈ B, X−1(B) = {ω : X(ω) ∈ B} ∈ F , where B is the Borel σ-algebra on R.
Remarks
• A random variable is a real measurable function.
• A random variable X : Ω → R defined on (Ω, F , P) is called an F/B-measurable function.
Definition 2.2 (Measurable Mapping).
• Measurable mapping: Generalization of measurable function
• Let (Ω,F ), (Ω′,F ′) be two measurable spaces.
• A mapping T : Ω → Ω′ is said to be F/F′-measurable if for any B ∈ F′, T−1(B) = {ω ∈ Ω : T(ω) ∈ B} ∈ F .
Theorem 2.1.
• Let (Ω,F , P) be a probability space.
• Let X be a random variable defined on Ω.
• Then, the random variable X induces a new probability space (R, B, PX), where X : Ω → R.
Proof.
For B ∈ B, let PX(B) = P[X−1(B)] = P[ω : X(ω) ∈ B].
It is sufficient to show that
1. PX(R) = 1
2. PX(B) ≥ 0 for any B ∈ B
3. For Bi ∈ B, i = 1, 2, . . . , with Bi ∩ Bj = ∅,
PX(∪i Bi) = ∑i PX(Bi)
2.2 Probability Distribution Function
Definition 2.3 (Distribution Function). Let X be a random variable. Given x, a real-valued function FX(·) defined as FX(x) = P[ω : X(ω) ≤ x] is called the distribution function (DF) of the random variable X.
Definition 2.4 (Cumulative distribution function (cdf)).
FX(x) = P[ω : X(ω) ≤ x] = P(X ≤ x) = PX(−∞, x] = PX[r : −∞ < r ≤ x]
FX(x2) − FX(x1) = PX(x1, x2]
Theorem 2.2 (Properties of Distribution Function).
1. limx→−∞ FX(x) = 0, limx→+∞ FX(x) = 1
2. For x1 ≤ x2, FX(x1) ≤ FX(x2) (Monotone and Non-decreasing)
3. lim0<h→0 FX(x + h) = FX(x) (Right Continuity)
Remarks A distribution function is not necessarily left continuous.
Definition 2.5 (Discrete Random Variable). A random variable X is said to be discrete if the range of X is countable or if there exists E, a countable set, such that P(X ∈ E) = 1.
Definition 2.6 (Continuous Random Variable). A random variable X is said to be continuous if there exists a function fX(·) such that FX(x) = ∫x−∞ fX(t)dt for every real number x.
Remarks Another Characterization of Continuous Random Variable
• Let FX(·) be a distribution function (DF) of a random variable X.
(a) A distribution function FX(·) is absolutely continuous if and only if there exists a non-negative function f such that
FX(x) = ∫x−∞ f(t)dt ∀x ∈ R
(b) That is, a random variable X is a continuous random variable if and only if FX(·) is absolutely continuous.
Definition 2.7 (Continuity).
• A function f : X → Y is continuous at a point x0 ∈ X if, for any given ϵ > 0, ∃δ > 0 such that
ρ(x0, x) < δ ⇒ ρ′[f(x0), f(x)] < ϵ
where ρ and ρ′ are metrics on X and Y.
• A function f is said to be continuous if it is continuous at each x ∈ X.
Definition 2.8 (Uniform Continuity).
• Let f : X → Y be a mapping from a metric space < X, ρ > to < Y, ρ′ >.
• We say that f is uniformly continuous if for any given ϵ > 0, ∃δ > 0 such that, for any x1, x2 ∈ X,
ρ(x1, x2) < δ ⇒ ρ′( f (x1), f (x2)) < ϵ.
Remarks
Uniformly continuous ⇒ Continuous
When f is defined on a compact set (a closed and bounded set in Rn), Continuous ⇒ Uniformly continuous.
Definition 2.9 (Absolute Continuity of a Function on Real Line).
• A real-valued function f defined on [a, b] is said to be absolutely continuous on [a, b] if, for any given ϵ > 0, ∃δ > 0 such that
∑ki=1 (bi − ai) < δ ⇒ ∑ki=1 | f(bi) − f(ai) | < ϵ
for pairwise disjoint subintervals (ai, bi) of [a, b], i = 1, · · · , k, k being arbitrary.
Remarks
• Absolutely continuous ⇒ Uniformly continuous
• Uniformly continuous ⇏ Absolutely continuous
Definition 2.10 (Absolute Continuity of a Measure: P ≪ Q).
• Let P, Q be two σ-finite measures on F .
- For a given ϵ > 0, ∃δ > 0 s.t. Q(A) < δ ⇒ P(A) < ϵ.
- If Q(A) = 0, then P(A) = 0, ∀A ∈ F .
⇒ P is absolutely continuous with respect to Q, and we denote P ≪ Q.
Example 2.1.
• P(A) = ∫A f dQ, A ∈ F
• FX(x) = ∫x−∞ f(t)dt
Theorem 2.3 (Radon-Nikodym Theorem). Let P, Q be two σ-finite measures on F . If P ≪ Q, then there exists f ≥ 0 such that P(A) = ∫A f dQ for any A ∈ F . We write f = dP/dQ and call it the Radon-Nikodym derivative.
Definition 2.11 (Probability Mass Function). If X is a discrete random variable with distinct values x1, x2, . . . , xk, then the function, denoted by fX(xi) = P[X = xi] = P[ω : X(ω) = xi], such that
• fX(x) > 0 for x = xi, i = 1, . . . , k
• fX(x) = 0 for x ≠ xi
• ∑i fX(xi) = 1
is said to be the probability mass function (pmf) of X.
Remarks
• Some other names of the p.m.f. are discrete density function, discrete frequency function, and probability function.
• Note that fX(xi) = FX(xi) − FX(xi−1)
Definition 2.12 (Probability Density Function). If X is a continuous random variable, then the function fX(·) such that FX(x) = ∫x−∞ fX(t)dt is called the probability density function of X.
• fX(x) ≥ 0, ∀x
• ∫∞−∞ fX(x)dx = 1
Remarks
• Some other names of the p.d.f. are density function, continuous density function, and integrating density function.
• P[X = xi] = 0
• fX(x) = dFX(x)/dx
• P(a < X ≤ b) = F(b) − F(a) = ∫ba f(x)dx
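The identity P(a < X ≤ b) = F(b) − F(a) = ∫ba f(x)dx can be verified numerically. A Python sketch using the exponential density f(x) = e^−x, F(x) = 1 − e^−x (chosen here only as a convenient example):

```python
# A numerical sketch of P(a < X <= b) = F(b) - F(a) = integral of f over (a, b],
# for the exponential distribution on [0, inf).
import math

f = lambda x: math.exp(-x)          # pdf
F = lambda x: 1 - math.exp(-x)      # cdf

a, b, n = 0.5, 2.0, 100_000
h = (b - a) / n
integral = sum(f(a + (i + 0.5) * h) for i in range(n)) * h   # midpoint rule

assert abs(integral - (F(b) - F(a))) < 1e-8
```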
Remarks Decomposition of a Distribution Function
• Any cdf F(x) may be represented in the form of a mixed distribution:
FX(x) = p1 FDX(x) + p2 FCX(x), where pi ≥ 0, i = 1, 2, p1 + p2 = 1; D: discrete, C: continuous.
Theorem 2.4 (Function of a Random Variable). Let X be a random variable and g be a Borel measurable function. Then, Y = g(X) is also a random variable.
Proof. It suffices to show that {Y ≤ y} ∈ F to see that Y = g(X) is a random variable. That is, {Y ≤ y} = {g(X) ≤ y} = {ω : X(ω) ∈ g−1(−∞, y]} ∈ F .
2.3 Expectation and Moments
Definition 2.13 (Expected Value). Let X be a random variable. Then, we define E(X) as the expected value, (mathematical) expectation, or mean of X.
1. Continuous random variable ⇒ E(X) = ∫ x f(x)dx
2. Discrete random variable ⇒ E(X) = ∑i xi fi
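A minimal sketch of the discrete case for a fair die (the die is an illustrative choice, not from the notes):

```python
# The discrete expectation E(X) = sum x_i f_i for a fair six-sided die.
from fractions import Fraction

values = range(1, 7)
f = Fraction(1, 6)                  # pmf of a fair die
EX = sum(x * f for x in values)
assert EX == Fraction(7, 2)         # E(X) = 3.5
```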
Definition 2.14 (Expectation of a Function of a Random Variable). Let Y = g(X) be a random variable. Suppose that ∫ | g(x) | f(x)dx < ∞. Then, we define E[Y] = E[g(X)] = ∫ g(x) f(x)dx = ∫ y f(y)dy.
Theorem 2.5 (Preservation of Monotonicity). Let E[gi(X)] be an expectation for a real-valued function gi of X. Suppose that E(| gi(X) |) = ∫ | gi(x) | f(x)dx < ∞. If g1(x) ≤ g2(x) for all x, then E[g1(X)] ≤ E[g2(X)].
Proof.
Suppose that g1(x) ≤ g2(x) for all x.
Then, E[g1(X)] − E[g2(X)] = ∫ g1(x) f(x)dx − ∫ g2(x) f(x)dx = ∫ [g1(x) − g2(x)] f(x)dx ≤ 0.
Remarks
• Suppose that g1(x) ≤ g2(x) for almost every x and | g1 |< ∞ and | g2 |< ∞. Then, P[ω : g1(X(ω)) ≤ g2(X(ω))] = 1.
• That is, A = {ω : g1(X(ω)) ≤ g2(X(ω))} with P(A) = 1 and Ac = {ω : g1(X(ω)) > g2(X(ω))} with P(Ac) = 0.
• Finally, E[g1(X) − g2(X)] = ∫A [g1(x) − g2(x)] f(x)dx + ∫Ac [g1(x) − g2(x)] f(x)dx ≤ 0.
Theorem 2.6 (Properties of Expectation).
1. When c is constant, E(c) = c
2. E(cX) = cE(X) (cf. E(XY | X) = XE(Y | X))
3. Linear Operator: E(X + Y) = E(X) + E(Y)
4. If X ⊥ Y, then E(XY) = E(X)E(Y)
Proof.
1. ∫ c f(x)dx = c ∫ f(x)dx = c · 1 = c
2. Trivial.
3. E(X + Y) = ∫∫ (x + y) f(x, y)dxdy = ∫∫ x f(x, y)dxdy + ∫∫ y f(x, y)dxdy
= ∫ x[∫ f(x, y)dy]dx + ∫ y[∫ f(x, y)dx]dy = ∫ x f(x)dx + ∫ y f(y)dy
= E(X) + E(Y)
4. It is trivial when we use f (x, y) = f (x) f (y).
Definition 2.15 (Moments).
• rth moment of X ⇒ mr = µ′r = E(Xr) = ∫ xr f(x)dx
• rth central moment of X ⇒ µr = E[(X − E(X))r] = ∫ (x − E(X))r f(x)dx
Example 2.2.
1. E(X) = ∑i xi fi (cf. sample mean X̄ = (1/n) ∑i xi)
2. Var(X) = E[(X − E(X))2]
3. Skewness = E[(X − E(X))3]
4. Kurtosis = E[(X − E(X))4]
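The central moments of Example 2.2 can be computed exactly for a fair die (an illustrative distribution; the third central moment vanishes by symmetry):

```python
# A sketch computing central moments for a fair die: variance and the
# (unstandardized) third central moment from Example 2.2.
from fractions import Fraction

values = range(1, 7)
f = Fraction(1, 6)
mu = sum(x * f for x in values)                     # mean, 7/2
central = lambda r: sum((x - mu) ** r * f for x in values)

assert central(2) == Fraction(35, 12)               # Var(X)
assert central(3) == 0                              # symmetric => zero skewness
```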
Definition 2.16 (Moment Generating Function). For a continuous random variable X,
• MX(t) = E[etX] = ∫ etx f(x)dx for −h < t < h, for some small h > 0
• dMX(t)/dt = ∫ x etx f(x)dx
• dr MX(t)/dtr = ∫ xr etx f(x)dx
• µ′r = E[Xr] = dr MX(t)/dtr |t=0
For a discrete random variable X,
• MX(t) = E[etX] = ∑i etxi f(xi), where ex = ∑∞i=0 xi/i!
• µ′r = E[Xr] = dr MX(t)/dtr |t=0
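A numerical sketch of the moment-generating property for a fair die: a central-difference derivative of MX(t) at t = 0 should recover E[X] = 3.5 (the die and the step size are illustrative choices):

```python
# For a fair die, M_X(t) = (1/6) * sum_{x=1}^{6} e^{t x}; its first derivative
# at t = 0 is the first moment E[X] = 3.5.
import math

M = lambda t: sum(math.exp(t * x) for x in range(1, 7)) / 6
h = 1e-6
first_moment = (M(h) - M(-h)) / (2 * h)   # central difference for dM/dt at t = 0

assert abs(first_moment - 3.5) < 1e-6
```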
Theorem 2.7. For 0 < s < r, if E[| X |r] exists, then E[| X |s] < ∞.
Remarks
• There must exist h > 0 such that MX(t) = E[etX] = ∫ etx f(x)dx for −h < t < h.
• The moment generating function (mgf) does not always exist for a random variable X.
Example 2.3.
• Consider the r.v. X having pdf f(x) = x−2 I[1,∞)(x).
⇒ If the mgf of X exists, then it is given by ∫∞1 x−2 etx dx by the definition of the mgf. However, this integral does not exist for any t > 0. In fact, E[X] = ∞.
• Cauchy distribution: t(1)
⇒ E[X] does not exist, and hence no moments of X exist.
Definition 2.17 (Characteristic Function).
• ϕX(t) = E[eitX] = ∫ eitx f(x)dx, where i = √−1
cf. eiy = cos(y) + i sin(y)
Remarks
• ϕX(t) ⇔ FX: the characteristic function exists for any random variable X.
• | eitx | = | cos(tx) + i sin(tx) | = √(cos2(tx) + sin2(tx)) = 1
• drϕX(t)/dtr |t=0 = E[(iX)r] = ir µ′r
• MX(t) → mr (the mgf, when it exists, generates all the moments)
• FX(x) ⇔ {mr} for all r (if mr exists for every r)
2.4 Characteristics of Distribution
Location (Representative Value)
1. Expectation: µ = µ′1 = E(X) = ∫ x f(x)dx
(a) E(c) = c
(b) E(cX) = cE(X)
(c) E(X + Y) = E(X) + E(Y)
(d) If X⊥Y, then E(XY) = E(X)E(Y).
2. αth-Quantile ξα: the smallest ξ such that FX(ξ) ≥ α
3. Median: 0.5th quantile
(a) m or Xmed such that P(X < m) ≤ 1/2 and P(X > m) ≤ 1/2
(b) In a symmetric distribution, E(X) = m.
4. Mode: Xmod
(a) A mode of a distribution of one random variable X is a value of x that maximizes the pdf or pmf.
(b) There may be more than one mode. Also, there may be no mode at all.
Measures of Dispersion
1. Variance: µ2 = Var(X) = E[(X − µ)2]
(a) Var(c) = 0
(b) Var(cX) = c2Var(X)
(c) Var(a + bX) = b2Var(X)
2. Standard Deviation: SD(X) = √Var(X) (cf. SD(a + bX) = |b| SD(X))
3. Interquartile Range: ξ0.75 − ξ0.25
– This is useful for an asymmetric distribution.
Skewness
1. Skewness: µ3 = E[(X − µ)3]
(a) µ3 > 0: skewed to the right
(b) µ3 = 0: symmetric
(c) µ3 < 0: skewed to the left
2. Skewness Coefficient: unit-free measure
µ3/σ3 = E[(X − µ)3] / (E[(X − µ)2])3/2
Kurtosis
1. Kurtosis: µ4 = E[(X − µ)4]; in terms of the standardized value µ4/σ4:
(a) µ4/σ4 > 3: long tail (leptokurtic)
(b) µ4/σ4 = 3: normal (mesokurtic)
(c) µ4/σ4 < 3: short tail (platykurtic)
2. Kurtosis Coefficient: unit-free measure
µ4/σ4 = E[(X − µ)4] / (E[(X − µ)2])4/2
2.5 Inequalities
Theorem 2.8 (Markov Inequality). Let X be a random variable and g(·) a non-negative Borel measurable function. Then, for every k > 0,
P[g(X) ≥ k] ≤ E[g(X)] / k
Proof.
E[g(X)] = ∫ g(x) f(x)dx = ∫{x:g(x)≥k} g(x) f(x)dx + ∫{x:g(x)<k} g(x) f(x)dx
≥ ∫{x:g(x)≥k} g(x) f(x)dx ≥ ∫{x:g(x)≥k} k f(x)dx
= k ∫{x:g(x)≥k} f(x)dx = k P[g(X) ≥ k]
Example 2.4.
• Apply the Markov inequality to g(X) = (X − µ)2, k = r2σ2X
⇒ Chebyshev's inequality: P[(X − µ)2 ≥ r2σ2X] ≤ 1/r2
• Similarly one may take g(X) = | X | or g(X) = | X |α
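Chebyshev's inequality can be checked exactly on a fair die (an illustrative distribution chosen for this sketch):

```python
# An exact check of Chebyshev's inequality for a fair die:
# P[(X - mu)^2 >= r^2 sigma^2] <= 1/r^2 for each r > 0.
from fractions import Fraction

values = range(1, 7)
f = Fraction(1, 6)
mu = sum(x * f for x in values)                       # 7/2
var = sum((x - mu) ** 2 * f for x in values)          # 35/12

for r in (Fraction(1), Fraction(4, 3), Fraction(2)):
    prob = sum(f for x in values if (x - mu) ** 2 >= r**2 * var)
    assert prob <= 1 / r**2                           # Chebyshev bound holds
```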
Theorem 2.9 (Jensen's Inequality). Let X be a random variable with mean E[X], and let g(·) be a convex function. Then E[g(X)] ≥ g(E[X]).
Proof. Since g(x) is continuous and convex, there exists a line l satisfying l(x) ≤ g(x) and l(E[X]) = g(E[X]). By definition, l(x) goes through the point (E[X], g(E[X])) and we can let l(x) = a + bx. That is,
E[l(X)] = E[a + bX] = a + bE[X] = l(E[X])
⇒ g(E[X]) = l(E[X]) = E[l(X)] ≤ E[g(X)]
Theorem 2.10 (Hölder's Inequality). Let X, Y be two random variables and p, q numbers such that p > 1, q > 1, 1/p + 1/q = 1. Then,
E[| XY |] ≤ E[| X |p]1/p E[| Y |q]1/q
Example 2.5.
Apply Hölder's inequality with p = q = 2:
E[| XY |] ≤ E[X2]1/2 E[Y2]1/2 : Cauchy-Schwarz inequality
⇒ Cov(X, Y) ≤ √Var(X) √Var(Y) (cf. Cov(X, Y) = E[(X − µX)(Y − µY)])
∴ −1 ≤ ρXY = Cov(X, Y) / (√Var(X)√Var(Y)) ≤ 1
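A numerical sketch of the final bound: the sample correlation of any two toy data vectors lies in [−1, 1] (the numbers below are made up for illustration):

```python
# The Cauchy-Schwarz consequence: a correlation coefficient computed from
# covariance and standard deviations always lies in [-1, 1].
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
vx = sum((a - mx) ** 2 for a in x) / n
vy = sum((b - my) ** 2 for b in y) / n
rho = cov / (vx ** 0.5 * vy ** 0.5)

assert -1 <= rho <= 1
```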
3 Joint and Conditional Distributions, Stochastic Independence and More Expectations
3.1 Joint Distribution
Definition 3.1 (n-dimensional Random Variable).
• Let X(ω) = (X1(ω), X2(ω), · · · , Xn(ω)) for ω ∈ Ω be an n-dimensional function defined on (Ω, F , P) into Rn.
• X(ω) is called an n-dimensional random variable if the inverse image of every n-dimensional interval in Rn, I = {(x1, x2, · · · , xn) : −∞ < xi ≤ ai, ai ∈ R, i = 1, 2, · · · , n}, is in F .
• i.e. X−1(I) = {ω : X1(ω) ≤ a1, · · · , Xn(ω) ≤ an} ∈ F .
Theorem 3.1 (Construction of an n-dimensional Random Variable). Let Xi, i = 1, · · · , n, each be a one-dimensional random variable. Then, X = (X1, · · · , Xn) is an n-dimensional random variable.
Definition 3.2 (Joint Cumulative Distribution Function). Let X be an n-dimensional random variable; X = (X1, · · · , Xn). Then, the joint cumulative distribution function of X is defined as
FX(x1, · · · , xn) = FX1,··· ,Xn(x1, · · · , xn) = P[ω : X1(ω) ≤ x1; · · · ; Xn(ω) ≤ xn]
for each (x1, · · · , xn) ∈ Rn.
Theorem 3.2 (Properties of Joint Cumulative Distribution Function).
1. Non-decreasing with respect to all arguments x1, · · · , xn
2. Right continuous with respect to all arguments x1, · · · , xn
cf. lim0<h→0 F(x + h, y) = lim0<h→0 F(x, y + h) = F(x, y)
3. F(+∞, +∞) = 1, FXY(−∞, y) = FXY(x, −∞) = 0 for all x, y
4. F(x2, y2) − F(x2, y1) − F(x1, y2) + F(x1, y1) ≥ 0 (∵ P[x1 < X ≤ x2, y1 < Y ≤ y2] ≥ 0)
Definition 3.3 (Joint Probability Mass Function). Let X = (X1, X2, . . . , Xn) be a discrete random vector with distinct values a1, a2, . . . , ak ∈ Rn. Then the function, denoted by fX(ai) = P[X = ai], such that
• fX(x) > 0 for x = ai, i = 1, . . . , k
• fX(x) = 0 for x ≠ ai
• ∑i fX(ai) = 1
is called the joint probability mass function of X.
Definition 3.4 (Joint Probability Density Function). Let X = (X1, X2, . . . , Xn) be a continuous random vector and FX1,...,Xn be its cumulative distribution function. Then the function fX1,...,Xn such that
FX1,...,Xn(x1, x2, . . . , xn) = ∫x1−∞ · · · ∫xn−∞ f(t1, t2, . . . , tn)dt1 · · · dtn
is called the joint probability density function of X.
Remarks
• f(x1, . . . , xn) ≥ 0, ∀(x1, . . . , xn)
• f(x1, . . . , xn) = ∂nF(x1, . . . , xn) / ∂x1 · · · ∂xn
• ∫∞−∞ · · · ∫∞−∞ f(t1, t2, . . . , tn)dt1 · · · dtn = 1
3.2 Marginal Distribution
Definition 3.5 (Marginal Distribution). Let X, Y be two random variables. Thenthe marginal distributions of X and Y are:
FX(x) = FXY(x, +∞) = P[X ≤ x, Y < +∞]
FY(y) = FXY(+∞, y) = P[X < +∞, Y ≤ y]
Definition 3.6 (Marginal Probability Density Function). Let X, Y be two random variables and let fX,Y(x, y) be the joint pdf of X, Y. Then the marginal probability density functions of X and Y are:
• (Discrete case)
fX(xi) = ∑j f(xi, yj)
fY(yj) = ∑i f(xi, yj)
• (Continuous case)
fX(x) = ∫ f(x, y)dy
fY(y) = ∫ f(x, y)dx
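A Python sketch of the discrete case: marginals obtained by summing a joint pmf table over the other variable (the joint table below is hypothetical):

```python
# Marginal pmfs from a joint pmf: f_X(x) = sum_y f(x, y), f_Y(y) = sum_x f(x, y).
from fractions import Fraction

joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8)}

fX = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
fY = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

assert fX == {0: Fraction(1, 2), 1: Fraction(1, 2)}
assert fY == {0: Fraction(3, 8), 1: Fraction(5, 8)}
assert sum(fX.values()) == 1 and sum(fY.values()) == 1
```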
3.3 Conditional Distribution
Definition 3.7 (Conditional Probability Distribution Function). Let X, Y be two random variables. Then the conditional distribution of X given Y is:
FX|Y(x | y) = P(X ≤ x | Y = y)
and the conditional density of X given Y is:
fX|Y(x | y) = ∂FX|Y(x | y)/∂x (Continuous)
fX|Y(x | y) = P(X = x | Y = y) (Discrete)
FX|Y(x | y) = ∫x−∞ f(u | y)du
Remarks
• FX|Y(x | y) = ∫x−∞ [fX,Y(u, y)/fY(y)] du
• ∂FX|Y(x | y)/∂x = fX,Y(x, y)/fY(y)
Theorem 3.3 (Alternative Derivation of Conditional Density).
fX|Y(x | y) = fX,Y(x, y)/fY(y) if fY(y) > 0
Proof. First, consider discrete random variables X, Y. Let Ax = {ω : X(ω) = x}, By = {ω : Y(ω) = y}. Then we have,
fX|Y(x | y) = P(X = x | Y = y) = P(Ax | By) = P(Ax ∩ By)/P(By)
= P({ω : X(ω) = x, Y(ω) = y})/P({ω : Y(ω) = y}) = fX,Y(x, y)/fY(y)
Next, consider continuous random variables X, Y. Let Ax = {ω : X(ω) ≤ x} and Bε = {ω : y − ε ≤ Y(ω) ≤ y + ε}. Define By = limε→0 Bε. Then we have,
FX|Y(x | y) = P(Ax | By) = limε→0 P({ω : X(ω) ≤ x, y − ε ≤ Y(ω) ≤ y + ε}) / P({ω : y − ε ≤ Y(ω) ≤ y + ε})
= [limε→0 (1/2ε) ∫y+εy−ε ∫x−∞ fX,Y(u, v)dudv] / [limε→0 (1/2ε) ∫y+εy−ε fY(v)dv]
= ∫x−∞ fX,Y(u, y)du / fY(y) = ∫x−∞ [fX,Y(u, y)/fY(y)] du
Therefore, fX|Y(x | y) = fX,Y(x, y)/fY(y).
3.4 Independence of Random Variables
Definition 3.8 (Independence of Random Variables). The random variables X and Y are said to be independent if
fX,Y(x, y) = fX(x) fY(y) (P(Ax ∩ By) = P(Ax)P(By))
Random variables that are not independent are said to be dependent.
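In the discrete case the definition can be checked pointwise. A Python sketch (the helper `is_independent` and both joint tables are constructions for this illustration):

```python
# Discrete independence: X and Y are independent iff the joint pmf factors
# as the product of the marginals at every point.
from fractions import Fraction

def is_independent(joint):
    xs = {a for a, _ in joint}
    ys = {b for _, b in joint}
    fX = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    fY = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(joint[(x, y)] == fX[x] * fY[y] for x in xs for y in ys)

quarter = Fraction(1, 4)
independent = {(a, b): quarter for a in (0, 1) for b in (0, 1)}    # two fair coins
dependent = {(0, 0): Fraction(1, 2), (0, 1): Fraction(0),
             (1, 0): Fraction(0), (1, 1): Fraction(1, 2)}          # X = Y

assert is_independent(independent)
assert not is_independent(dependent)
```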
Theorem 3.4. X and Y are independent if and only if
FX,Y(x, y) = FX(x)FY(y) ∀(x, y) ∈ R2
Proof.
⇐) By partial differentiation.
⇒) FX,Y(x, y) = P(ω : X(ω) ≤ x, Y(ω) ≤ y) = P({ω : X(ω) ≤ x} ∩ {ω : Y(ω) ≤ y})
= P(ω : X(ω) ≤ x)P(ω : Y(ω) ≤ y) = FX(x)FY(y)
Definition 3.9 (Pairwise and Mutual Independence). Let X1, X2, · · · , Xn be random variables.
• X1, . . . , Xn are pairwise independent if Xi ⊥ Xj for all i, j = 1, 2, · · · , n, i ≠ j.
• X1, . . . , Xn are mutually independent if for any collection of k of them, (Xi1, Xi2, . . . , Xik), k = 2, 3, . . . , n,
FXi1,··· ,Xik(xi1, · · · , xik) = ∏kj=1 FXij(xij)
Theorem 3.5 (Preservation of Independence). Let X, Y be random variables and g1, g2 be Borel-measurable functions. If X ⊥ Y, then g1(X) ⊥ g2(Y).
Proof.
P(g1(X) ≤ x, g2(Y) ≤ y) = P(g1(X) ∈ (−∞, x], g2(Y) ∈ (−∞, y])
= P(X ∈ g1−1(−∞, x], Y ∈ g2−1(−∞, y])
= P(X ∈ g1−1(−∞, x])P(Y ∈ g2−1(−∞, y])
= P(g1(X) ∈ (−∞, x])P(g2(Y) ∈ (−∞, y]) = P(g1(X) ≤ x)P(g2(Y) ≤ y)
Definition 3.10 (Identically Distributed Random Variables). Let X, Y be random variables. X and Y are identically distributed if FX(a) = FY(a) ∀a ∈ R, and we denote X =d Y.
Theorem 3.6. If Xi (i = 1, 2, · · · , n) are independent and identically distributed, then
FX1,··· ,Xn(x1, · · · , xn) = ∏ni=1 FX(xi)
Definition 3.11 (Moment Generating Function of a Joint Distribution). For a random vector X = (X1, X2, · · · , Xn)′, the moment generating function is
MX(t) = E[et′X] = E[et1X1+t2X2+···+tnXn] < ∞ for −hi < ti < hi, some hi > 0 (i = 1, 2, . . . , n)
Definition 3.12 (Cross Moments).
µ′r1,r2 = E[X1^r1 X2^r2]: (r1, r2)th cross moment
µr1,r2 = E[(X1 − µ1)^r1 (X2 − µ2)^r2]: (r1, r2)th cross central moment
Remarks
µ′r1,r2 = ∂^(r1+r2) MX,Y(t1, t2) / ∂t1^r1 ∂t2^r2 |t1=t2=0
i^(r1+r2) µ′r1,r2 = ∂^(r1+r2) ϕX,Y(t1, t2) / ∂t1^r1 ∂t2^r2 |t1=t2=0 (ϕX,Y: characteristic function)
Theorem 3.7. X1, X2, . . . , Xn are mutually independent if and only if
MX1,X2,··· ,Xn(t1, t2, · · · , tn) = MX1(t1)MX2(t2) · · · MXn(tn)
Theorem 3.8. Let X ⊥ Y and g1, g2 be Borel-measurable functions. Then,
E[g1(X)g2(Y)] = E[g1(X)]E[g2(Y)]
Remarks
• A trivial corollary of the theorem is that X⊥Y ⇒ Cov(X, Y) = 0
Theorem 3.9. Let X1, X2, . . . , Xn be random variables. Let S = ∑ni=1 aiXi. Then,
Var(S) = ∑ni=1 a2i Var(Xi) + ∑i≠j ai aj Cov(Xi, Xj)
If X1, X2, . . . , Xn are independent,
Var(S) = ∑ni=1 a2i Var(Xi)
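Theorem 3.9 can be verified exactly on a small discrete joint distribution. A Python sketch for n = 2 (the joint table and weights a, b are hypothetical):

```python
# Verify Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y) on a joint pmf.
from fractions import Fraction

joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8)}
a, b = 2, -1

E = lambda g: sum(p * g(x, y) for (x, y), p in joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VX = E(lambda x, y: (x - EX) ** 2)
VY = E(lambda x, y: (y - EY) ** 2)
Cov = E(lambda x, y: (x - EX) * (y - EY))
VS = E(lambda x, y: (a * x + b * y - (a * EX + b * EY)) ** 2)

assert VS == a**2 * VX + b**2 * VY + 2 * a * b * Cov
```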
3.5 Conditional Expectation
Definition 3.13 (Conditional Expectation). Let X be an integrable random variable on (Ω, F , P) and suppose that G is a sub σ-field of F (G ⊂ F ). Then there exists a random variable E[X|G], called the conditional expected value of X given G, with the following properties:
(1) E[X|G] is G-measurable and integrable.
(2) E[X|G] satisfies the functional equation
∫G E[X|G]dP = ∫G X dP, G ∈ G
Definition 3.14 (Conditional Mean). Let X, Y be random variables and h(·) be a Borel-measurable function. Then,
E[h(X)|Y = y] = ∑i h(xi) f(xi|y) (Discrete)
= ∫ h(x) f(x|y)dx (Continuous)
Remarks
E[h(X)|Y] is also a random variable.
Theorem 3.10 (Properties of Conditional Expectation).
1. E[c|Y] = c, c : constant
2. For h1(·), h2(·), Borel-measurable functions,
E[c1h1(X) + c2h2(X)|Y] = c1E[h1(X)|Y] + c2E[h2(X)|Y]
3. P[X ≥ 0] = 1 ⇒ E[X|Y] ≥ 0
4. P[X1 ≥ X2] = 1 ⇒ E[X1|Y] ≥ E[X2|Y]
5. ϕ(·): A function of X, Y ⇒ E[ϕ(X, Y)|Y = y] = E[ϕ(X, y)|Y = y]
6. Ψ(·): A Borel-measurable function ⇒ E[Ψ(X)ϕ(X, Y)|X] = Ψ(X)E[ϕ(X, Y)|X]
Theorem 3.11 (Law of Iterated Expectations). Let X, Y be random variables and suppose E[h(X)] exists. Then,
E[E[h(X)|Y]] = E[h(X)]
Proof.
E[h(X)] = ∫∞−∞ ∫∞−∞ h(x) fX,Y(x, y)dxdy
= ∫∞−∞ [∫∞−∞ h(x) (fX,Y(x, y)/fY(y)) dx] fY(y)dy
= ∫∞−∞ E[h(X)|y] fY(y)dy = E[E[h(X)|Y]]
Definition 3.15 (Conditional Variance). Let X, Y be random variables and E[X|Y] be the conditional expectation of X given Y. Then,
Var(X|Y) = E[(X − E[X|Y])2|Y]
Theorem 3.12. Let X, Y be random variables with finite variances. Then,
1. Var(X|Y) = E[X2|Y] − (E[X|Y])2
2. Var(X) = E[Var(X|Y)] + Var(E[X|Y])
Proof.
1. E[(X − E[X|Y])2|Y] = E[X2 − 2XE[X|Y] + (E[X|Y])2 | Y]
= E[X2|Y] − 2E[X E[X|Y] | Y] + E[(E[X|Y])2|Y]
= E[X2|Y] − 2(E[X|Y])2 + (E[X|Y])2 = E[X2|Y] − (E[X|Y])2
(using that E[X|Y] is Y-measurable, so it factors out of the conditional expectation)
2. E[Var(X|Y)] = E[E[X2|Y] − (E[X|Y])2] = E[X2] − E[(E[X|Y])2]
= (E[X2] − (E[X])2) − (E[(E[X|Y])2] − (E[X])2) = Var(X) − Var(E[X|Y])
∴ Var(X) = E[Var(X|Y)] + Var(E[X|Y])
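Both the law of iterated expectations and the variance decomposition can be verified exactly on a discrete joint pmf. A Python sketch (the joint table is hypothetical):

```python
# Verify E[E[X|Y]] = E[X] and Var(X) = E[Var(X|Y)] + Var(E[X|Y]) on a joint pmf.
from fractions import Fraction

joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8)}
ys = {y for _, y in joint}
fY = {y: sum(p for (x, b), p in joint.items() if b == y) for y in ys}

# conditional mean and variance of X given each value of Y
E_X_given = {y: sum(x * p for (x, b), p in joint.items() if b == y) / fY[y] for y in ys}
V_X_given = {y: sum((x - E_X_given[y]) ** 2 * p
                    for (x, b), p in joint.items() if b == y) / fY[y] for y in ys}

EX = sum(x * p for (x, _), p in joint.items())
VX = sum((x - EX) ** 2 * p for (x, _), p in joint.items())

assert sum(E_X_given[y] * fY[y] for y in ys) == EX        # law of iterated expectations
EVar = sum(V_X_given[y] * fY[y] for y in ys)              # E[Var(X|Y)]
VarE = sum((E_X_given[y] - EX) ** 2 * fY[y] for y in ys)  # Var(E[X|Y])
assert VX == EVar + VarE                                  # variance decomposition
```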