
Page 1: EECS 126: Probability & Random Processes, Fall 2021

PageRank

Shyam Parekh

Page 2: PageRank

• Originally used by Google for ranking the pages from a keyword search.

• π(i) = Σ_{j∈𝒳} π(j) P(j,i), ∀ i ∈ 𝒳  ⟺  π = πP.

• The original algorithm also used a damping factor.

• With the normalization condition Σ_{i∈𝒳} π(i) = 1, π = (1/39)[12, 9, 10, 6, 2].

• This is similar to a Markov Chain (defined next).

[Figure: five-page link graph with transition probabilities P(A,B) = 1/2, P(D,E) = 1/3, …]

Page 3: Discrete Time Markov Chain (DTMC)

• DTMC {X(n), n ≥ 0} over state space 𝒳 with P = [P(i,j)] as the transition probability matrix, where P(i,j) = P[X(n+1) = j | X(n) = i].

– State transition diagram.

– Memoryless Property: P[X(n+1) = j | X(n) = i, X(m), m < n] = P(i,j), ∀ i, j.

• π_{n+1}(i) = Σ_{j∈𝒳} π_n(j) P(j,i)  ⟺  π_{n+1} = π_n P  ⟺  π_n = π_0 P^n, ∀ n ≥ 0.

– A normalized π is an invariant distribution ⟺ π = πP.

– Irreducible: the MC can reach any state from any other state (possibly in multiple steps).

– Aperiodic: let d(i) := g.c.d.{n ≥ 1 : P^n(i,i) > 0}. An irreducible DTMC is aperiodic if d(i) = 1, ∀ i. (Fact: in an irreducible DTMC, d(i) is the same ∀ i.)

[Figure: example state transition diagrams: Irreducible & Periodic; Irreducible & Aperiodic; Reducible.]
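A short simulation sketch (assuming numpy; the two-state matrix is a made-up example): by the memoryless property, each step of a DTMC trajectory depends only on the current state, so sampling X(n+1) just means drawing from row X(n) of P.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dtmc(P, pi0, n_steps):
    """Sample a trajectory X(0), ..., X(n_steps) of a DTMC with matrix P."""
    states = np.arange(P.shape[0])
    x = rng.choice(states, p=pi0)        # X(0) ~ pi0
    traj = [x]
    for _ in range(n_steps):
        x = rng.choice(states, p=P[x])   # X(n+1) ~ P(X(n), .)
        traj.append(x)
    return np.array(traj)

# Made-up two-state chain: stay with probability 0.9, switch with 0.1.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
traj = simulate_dtmc(P, pi0=np.array([1.0, 0.0]), n_steps=100_000)
print(np.bincount(traj, minlength=2) / traj.size)   # ~ (0.5, 0.5) by symmetry
```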

Page 4: Big Theorem for Finite DTMC

• Theorem: Consider an irreducible DTMC over a finite state space. Then,

(a) There is a unique invariant distribution π.

(b) Long-term fraction of time in state i := lim_{N→∞} (1/N) Σ_{n=0}^{N−1} 1{X(n) = i} = π(i).

(c) If the DTMC is aperiodic, π_n → π.

• Example showing aperiodicity is necessary for (c): the two-state chain that alternates deterministically between states 0 and 1 has π = (1/2, 1/2), but starting from π_0 = (1, 0), π_n oscillates between (1, 0) and (0, 1) and never converges.

• In part (c), π_n → π irrespective of the initial distribution π_0.

– This implies each row of P^n converges to π as n → ∞.

[Figure: two-state chain on states 0 and 1.]
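A quick numerical check of both halves of the statement (assuming numpy; both matrices are made-up examples): for an irreducible, aperiodic chain every row of P^n converges to π, while for the periodic two-state chain P^n keeps alternating and π_n never settles.

```python
import numpy as np

# Irreducible and aperiodic: invariant distribution is pi = (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(np.linalg.matrix_power(P, 100))   # both rows ~ [2/3, 1/3]

# Periodic chain 0 <-> 1: pi_n oscillates, so part (c) fails,
# although the long-term fraction of time in each state is still 1/2.
Q = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.linalg.matrix_power(Q, 100))   # identity matrix
print(np.linalg.matrix_power(Q, 101))   # swapped rows
```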

Page 5: Illustrations

[Figures.]

Page 6: Hitting Times

1. Mean Hitting Time: β(i) := E[T_A | X_0 = i], i ∈ 𝒳, A ⊂ 𝒳.

– First Step Equations (FSEs; a linear-system solver is sketched after this list):
  β(i) = 1 + Σ_j P(i,j) β(j), if i ∉ A;
  β(i) = 0, if i ∈ A.

2. Probability of hitting one set before another: α(i) := P[T_A < T_B | X_0 = i], i ∈ 𝒳, A, B ⊂ 𝒳, A ∩ B = ∅.

– FSEs:
  α(i) = Σ_j P(i,j) α(j), if i ∉ A ∪ B;
  α(i) = 1, if i ∈ A;
  α(i) = 0, if i ∈ B.

3. Average discounted reward: δ(i) := E[Z | X_0 = i], i ∈ 𝒳, where Z = Σ_{n=0}^{T_A} β^n h(X(n)), A ⊂ 𝒳, with 0 < β ≤ 1 as the discount factor.

– FSEs:
  δ(i) = h(i) + β Σ_j P(i,j) δ(j), if i ∉ A;
  δ(i) = h(i), if i ∈ A.
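Because the FSEs are linear, each of the three quantities can be computed exactly by solving a linear system. A minimal sketch for the mean hitting time (assuming numpy; the walk below is a made-up example): with β = 0 on A, the remaining equations read (I − P_oo) β_o = 1, where P_oo is P restricted to states outside A.

```python
import numpy as np

def mean_hitting_time(P, A):
    """Solve the FSEs beta(i) = 1 + sum_j P(i,j) beta(j) with beta = 0 on A."""
    n = P.shape[0]
    outside = [i for i in range(n) if i not in A]
    P_oo = P[np.ix_(outside, outside)]          # P restricted outside A
    beta_o = np.linalg.solve(np.eye(len(outside)) - P_oo,
                             np.ones(len(outside)))
    beta = np.zeros(n)
    beta[outside] = beta_o
    return beta

# Made-up walk on {0, 1, 2, 3}, absorbed at A = {3}.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
print(mean_hitting_time(P, A={3}))   # [12, 10, 6, 0]
```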

Page 7: Markov Chain Trajectory/Realization

• Canonical sample space, with outcome ω being the Markov Chain trajectory/realization.

• P[X_0 = i_0, X_1 = i_1, …, X_n = i_n] = π(i_0) P(i_0, i_1) ⋯ P(i_{n−1}, i_n).

• There exists a consistent probability measure over the sample space.
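The product formula turns trajectory probabilities into a one-line computation. A minimal sketch (assuming numpy; the chain and the trajectory are made-up examples):

```python
import numpy as np

def trajectory_prob(pi0, P, traj):
    """P[X0 = i0, ..., Xn = in] = pi0(i0) P(i0,i1) ... P(i_{n-1},i_n)."""
    p = pi0[traj[0]]
    for a, b in zip(traj, traj[1:]):
        p *= P[a, b]
    return p

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi0 = np.array([0.5, 0.5])
print(trajectory_prob(pi0, P, [0, 0, 1, 1]))   # 0.5 * 0.9 * 0.1 * 0.8 = 0.036
```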

Page 8: Laws of Large Numbers

• Let {X(i), i ≥ 1} be a sequence of Independent and Identically Distributed (IID) RVs with mean μ, and let S(n) = Σ_{i=1}^{n} X(i). Assume E|X(i)| < ∞.

• Strong Law of Large Numbers (SLLN): P(lim_{n→∞} S(n)/n = μ) = 1.

– S(n)/n converges to μ almost surely.

• Weak Law of Large Numbers (WLLN): lim_{n→∞} P(|S(n)/n − μ| > ε) = 0, ∀ ε > 0.

– S(n)/n converges to μ in probability.

• Almost sure convergence implies convergence in probability.

• SLLN example (simulated in the sketch after this list): Y(n) = S(n)/n, where X(i) = outcome of the ith roll of a balanced die.

• Part (b) of the Big Theorem indicates almost sure convergence.

– For an irreducible DTMC over a finite numerical state space, (1/n) Σ_{i=0}^{n−1} X(i) converges almost surely to E(X), where X is an RV with distribution equal to the invariant distribution π.
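The die-roll example above is easy to simulate (assuming numpy): the running average Y(n) = S(n)/n should approach μ = 3.5 along almost every sequence of rolls.

```python
import numpy as np

rng = np.random.default_rng(126)

rolls = rng.integers(1, 7, size=100_000)            # balanced die: E[X] = 3.5
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)
for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])                   # drifts toward 3.5
```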

Page 9: SLLN vs. WLLN, SLLN Proof

• Let {X(i), i ≥ 1} be a sequence of Independent and Identically Distributed (IID) RVs with mean μ, and let S(n) = Σ_{i=1}^{n} X(i). Assume E[X(i)^4] < ∞.

• Strong Law of Large Numbers (SLLN): P(lim_{n→∞} S(n)/n = μ) = 1.

– The stronger fourth-moment assumption permits an elementary proof via Markov's inequality and Borel-Cantelli, sketched below.
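A sketch of the standard fourth-moment argument (stated for μ = 0, which loses no generality after centering); the in-class proof may differ in details:

```latex
% Expand S(n)^4; by independence and E[X(i)] = 0, only the n terms
% E[X(i)^4] and the 3n(n-1) terms E[X(i)^2 X(j)^2] survive:
\[
  E[S(n)^4] = n\,E[X^4] + 3n(n-1)\,(E[X^2])^2 \le C n^2 .
\]
% Markov's inequality applied to S(n)^4 then gives a summable tail:
\[
  P\left( \left| \frac{S(n)}{n} \right| > \epsilon \right)
  \le \frac{E[S(n)^4]}{n^4 \epsilon^4}
  \le \frac{C}{n^2 \epsilon^4},
  \qquad
  \sum_{n \ge 1} \frac{C}{n^2 \epsilon^4} < \infty .
\]
% By Borel-Cantelli, |S(n)/n| > epsilon occurs only finitely often
% almost surely, for every epsilon > 0; hence S(n)/n -> 0 a.s.
```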

Page 10: Key Points from the Big Theorem Proof

• T_i = time to return to i. The T_i's are IID.

[Figure: trajectory broken into IID excursions by successive return times T0, T1, T2, T3, T4, T5.]
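A simulation sketch of the renewal idea (assuming numpy; the two-state matrix is a made-up example): successive return times to a fixed state are IID, and for an irreducible finite DTMC their mean is 1/π(i), which is what links the SLLN to part (b) of the Big Theorem.

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # invariant distribution pi = (2/3, 1/3)

def return_times(P, state, n_returns):
    """Collect successive return times to `state`; these are IID."""
    times, x = [], state
    for _ in range(n_returns):
        t = 0
        while True:
            x = rng.choice(2, p=P[x])
            t += 1
            if x == state:
                break
        times.append(t)
    return np.array(times)

T = return_times(P, state=0, n_returns=20_000)
print(T.mean())                   # ~ 1 / pi(0) = 1.5
```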
