Mini-course on algorithmic aspects of stochastic games and related models
Marcin Jurdziński (University of Warwick)
Peter Bro Miltersen (Aarhus University)
Uri Zwick ( 武熠 ) (Tel Aviv University)
Oct. 31 – Nov. 2, 2011
Day 1: Monday, October 31
Uri Zwick ( 武熠 )
(Tel Aviv University)
Perfect Information Stochastic Games
Day 2: Tuesday, November 1
Marcin Jurdziński
(University of Warwick)
Parity Games
Day 3: Wednesday, November 2
Peter Bro Miltersen
(Aarhus University)
Imperfect Information Stochastic Games
Day 1: Monday, October 31
Uri Zwick ( 武熠 ) (Tel Aviv University)
Perfect Information Stochastic Games
Lectures 1-2: From shortest paths problems to 2-player stochastic games
Warm-up: 1-player “games”
Reachability
Shortest / Longest paths
Minimum / Maximum mean payoff
Minimum / Maximum discounted payoff
1-Player reachability “game”
From which states can we reach the target?
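A minimal sketch of how this question can be answered: search backwards from the target along reversed edges. The edge-list encoding and the names `can_reach`, `edges` and `target` are assumptions made for illustration, not notation from the slides.

```python
from collections import deque

def can_reach(edges, target):
    """Return the set of states from which `target` is reachable.

    `edges` is a list of (u, v) pairs meaning "there is an action from u to v".
    """
    # Reverse the graph so that a forward path u -> ... -> target
    # becomes a path target -> ... -> u that BFS can follow.
    reverse = {}
    for u, v in edges:
        reverse.setdefault(v, []).append(u)

    seen = {target}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for u in reverse.get(v, []):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen

# Hypothetical example: "d" only loops on itself, so it cannot reach "t".
example_edges = [("a", "b"), ("b", "t"), ("c", "a"), ("t", "d"), ("d", "d")]
winning = can_reach(example_edges, "t")
```

Each edge is examined once, so the running time is linear in the size of the graph.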
1-Player shortest paths “game”
Find shortest paths to target
Edges/actions have costs
1-Player shortest paths “game” with positive and negative weights
Shortest paths are not defined if there is a negative cycle
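One standard way to handle both cases is Bellman-Ford relaxation towards the target, with one extra pass to detect a negative cycle. The sketch below is an illustration under assumptions (the slides do not name an algorithm): edges are `(u, v, cost)` triples, the target is treated as absorbing, and `shortest_to_target` is a hypothetical name.

```python
import math

def shortest_to_target(states, edges, target):
    """Bellman-Ford towards `target`.

    Returns a dict state -> cost of a cheapest path to `target`
    (math.inf if the target is unreachable), or None if some state that
    can reach the target lies on or leads into a negative cycle.
    """
    dist = {s: math.inf for s in states}
    dist[target] = 0
    # A shortest path uses at most len(states) - 1 edges.
    for _ in range(len(states) - 1):
        changed = False
        for u, v, cost in edges:
            if dist[v] + cost < dist[u]:
                dist[u] = dist[v] + cost
                changed = True
        if not changed:
            break
    # Any further improvement means costs can be driven down forever.
    for u, v, cost in edges:
        if dist[v] + cost < dist[u]:
            return None
    return dist
```

The LONGEST paths variant is the same computation with all costs negated, which is why a positive cycle plays the role of the negative cycle there.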
1-Player LONGEST paths “game” with positive and negative weights
LONGEST paths are not defined if there is a positive cycle
Exercise 1a: Isn’t the LONGEST paths problem NP-hard?
1-Player maximum mean-payoff “game”
No target
Traverse one edge per day
Maximize per-day profit
Find a cycle with maximum mean cost
Exercise 1b: Show that finding a cycle with maximum total cost is NP-hard.
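Unlike the maximum total cost cycle of Exercise 1b, a maximum mean cost cycle can be found in polynomial time. The sketch below uses Karp's dynamic program, adapted from the usual minimum version to the maximum; the slides do not say which algorithm they have in mind, so this choice, the vertex numbering 0..n-1 and the name `max_mean_cycle` are assumptions.

```python
from fractions import Fraction

def max_mean_cycle(n, edges):
    """Maximum mean weight of a cycle, or None if the graph is acyclic.

    Vertices are 0..n-1 and `edges` is a list of (u, v, weight) triples.
    """
    # D[k][v] = maximum total weight of a walk with exactly k edges that
    # ends in v (starting anywhere); None if no such walk exists.
    D = [[None] * n for _ in range(n + 1)]
    D[0] = [0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] is not None:
                cand = D[k - 1][u] + w
                if D[k][v] is None or cand > D[k][v]:
                    D[k][v] = cand

    # Karp's formula with max and min exchanged.
    best = None
    for v in range(n):
        if D[n][v] is None:
            continue
        worst = min(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] is not None)
        if best is None or worst > best:
            best = worst
    return best

# Hypothetical example: cycle 0 -> 1 -> 0 has mean (3 + 1) / 2 = 2,
# the self-loop at 2 has mean 1.
example = [(0, 1, 3), (1, 0, 1), (1, 2, 5), (2, 2, 1)]
```

The table has (n+1)·n entries and each layer scans every edge once, giving O(nm) time.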
1-Player discounted payoff “game”
1 元 gained on the i-th day is worth only $\lambda^i$ 元 at day 0
Equivalently, each day the taxi breaks down with prob. $1-\lambda$
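A short worked derivation of why the two views agree. The symbols are assumptions, since the slide's own notation did not survive extraction: λ (with 0 < λ < 1) is the discount factor and c_i is the payoff collected on day i. If the taxi survives each day independently with probability λ, the payoff of day i is collected only if the taxi has survived i days.

```latex
\mathbb{E}[\text{total payoff}]
  = \sum_{i \ge 0} \Pr[\text{taxi still running on day } i] \cdot c_i
  = \sum_{i \ge 0} \lambda^{i} c_i
```

The right-hand side is exactly the discounted payoff, so maximizing one maximizes the other.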
The real fun begins: 1½-player games
Maximum / minimum Reachability
Longest / Shortest stochastic paths
Maximum / minimum mean payoff
Maximum / minimum discounted payoff
Add stochastic actions and get Stochastic Shortest Paths and Markov Decision Processes
Stochastic shortest paths (SSPs)
Minimize the expected cost of getting to the target
Stochastic shortest paths (SSPs)
Exercise 2:
Find the optimal policy for this SSP.
What is the expected cost of getting to IIIS?
Policies / Strategies
A rule that, given the full history of the play, specifies the next action to be taken
A deterministic (pure) strategy is a strategy that does not use randomization
A memoryless strategy is a strategy that only depends on the current state
A positional strategy is a deterministic memoryless strategy
Exercise 3:
Prove (directly) that if an SSP problem has an optimal policy, it also has a positional optimal policy
Stochastic shortest paths (SSPs)
Stopping SSP problems
A policy is stopping if by following it we reach the target from each state with probability 1
An SSP problem is stopping if every policy is stopping
Reminiscent of acyclic shortest paths problems
Positional policies/strategies
Theorem: If an SSP problem has an optimal policy, it also has a positional optimal policy
Theorem: A stopping SSP problem has an optimal policy, and hence a positional optimal policy
Theorem: An SSP problem has an optimal policy iff there is no “negative cycle”
Evaluating a stopping policy
(stopping) policy → (absorbing) Markov Chain
Values of a fixed policy can be found by solving a system of linear equations
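As a sketch of that linear system: for a fixed stopping policy, the value x_i of each non-target state satisfies x_i = c_i + Σ_j p_ij x_j, with the target's value fixed at 0. The encoding below (`cost[i]`, `prob[i][j]`, exact arithmetic with `Fraction`) and the helper names are assumptions chosen to keep the example self-contained.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def evaluate_policy(cost, prob):
    """Expected cost to reach the target from each non-target state.

    cost[i]    : cost of the action the policy takes in state i
    prob[i][j] : probability of moving from state i to non-target state j;
                 the missing probability mass goes to the target.
    """
    n = len(cost)
    # (I - P) x = c; I - P is nonsingular because the policy is stopping.
    A = [[(1 if i == j else 0) - Fraction(prob[i][j]) for j in range(n)]
         for i in range(n)]
    return solve(A, cost)

# Hypothetical two-state chain: each state pays its cost, then moves to the
# other state or to the target with probability 1/2 each.
half = Fraction(1, 2)
values = evaluate_policy([1, 2], [[0, half], [half, 0]])
```

Here the system is x0 = 1 + x1/2 and x1 = 2 + x0/2, whose solution is x0 = 8/3 and x1 = 10/3.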
Improving switches
Improving a policy
Improving a policy
Policy iteration (Strategy improvement) [Howard ’60]
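The slide's own pseudocode was an image and is not recoverable, so the following is only a minimal sketch of greedy (switch-all) policy iteration for a stopping SSP: evaluate the current policy, then switch every state to an action that looks strictly cheaper under the current values. The data layout (`actions[i]` as a list of `(cost, {next_state: probability})` pairs, with missing probability going to the target) and all function names are assumptions.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def evaluate(actions, policy):
    """Values of a fixed (stopping) positional policy."""
    n = len(actions)
    A, b = [], []
    for i in range(n):
        cost, trans = actions[i][policy[i]]
        A.append([(1 if i == j else 0) - Fraction(trans.get(j, 0))
                  for j in range(n)])
        b.append(cost)
    return solve(A, b)

def policy_iteration(actions):
    """Return (policy, values) once no improving switch remains."""
    policy = [0] * len(actions)
    while True:
        val = evaluate(actions, policy)
        improved = False
        for i, choices in enumerate(actions):
            # One-step lookahead cost of each action under current values.
            q = [cost + sum(Fraction(p) * val[j] for j, p in trans.items())
                 for cost, trans in choices]
            best = min(range(len(q)), key=q.__getitem__)
            if q[best] < val[i]:          # an improving switch
                policy[i] = best
                improved = True
        if not improved:
            return policy, val

# Hypothetical SSP with two non-target states: action 0 pays 10 and stops;
# action 1 pays 1 and moves on (from state 1 it stops with probability 1/2).
ssp = [
    [(10, {}), (1, {1: 1})],
    [(10, {}), (1, {0: Fraction(1, 2)})],
]
final_policy, final_values = policy_iteration(ssp)
```

On this example the policies visited are [0, 0], [0, 1] and [1, 1], with final values 4 and 3.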
Potential transformations
Potential transformations
Using values as potentials
Optimality condition
Solving Stochastic shortest paths and Markov Decision Processes
Can be solved in polynomial time using a reduction to Linear Programming
Can be solved using the policy iteration algorithm
Is there a polynomial time version of the policy iteration algorithm ???
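For orientation, here is the textbook value-based LP for a stopping SSP with minimum expected cost, written as a sketch. The notation is assumed rather than taken from the slides: y_i is the value variable of non-target state i, A_i its set of actions, c_a the cost of action a, and p_{a,j} the probability that a leads to non-target state j; the target has value 0 and is not a variable.

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{i} y_i \\
\text{subject to}\quad & y_i \le c_a + \sum_{j} p_{a,j}\, y_j
   && \text{for every non-target state } i \text{ and every action } a \in A_i .
\end{aligned}
```

For a stopping SSP the optimal y is the vector of optimal expected costs, and in each state an action whose constraint is tight is an optimal choice.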
Dual LP formulation for stochastic shortest paths
[d’Epenoux (1964)]
Primal LP formulation for stochastic shortest paths
[d’Epenoux (1964)]
Solving Stochastic shortest paths and Markov Decision Processes
Can be solved in polynomial time using a reduction to Linear Programming
Is there a strongly polynomial time version of the policy iteration algorithm ???
Current algorithms for Linear Programming are polynomial but not strongly polynomial
Is there a strongly polynomial algorithm ???
Markov Decision Processes [Bellman ’57] [Howard ’60] …
No target
Process goes on forever
One (and a half) player
Limiting average version
Discounted version
Discounted MDPs
Discounted MDPs
Discounted MDPs
Non-discounted MDPs
More fun: 2-player games
Reachability
Longest / Shortest paths
Maximum / Minimum mean payoff
Maximum / Minimum discounted payoff
Introduce an adversary
All actions are deterministic
2-Player mean-payoff game [Ehrenfeucht-Mycielski (1979)]
No target
Traverse one edge per day
Maximize / minimize per-day profit
Various pseudo-polynomial algorithms
Is there a polynomial time algorithm ???
Yet more fun: 2½-player games
Maximum / Minimum Reachability
Longest / Shortest stochastic paths
Maximum / Minimum mean payoff
Maximum / Minimum discounted payoff
Introduce an adversary and stochastic actions
Both players have optimal positional strategies
Can optimal strategies be found in polynomial time?
Limiting average version
Discounted version
Turn-based Stochastic Payoff Games[Shapley ’53] [Gillette ’57] … [Condon ’92]
No sinks
Payoffs on actions
Discounted 2½-player games
Optimal Values in 2½-player games
Both players have positional optimal strategies
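[Formula not recovered: the value of a state written once with one player ranging over “positional” strategies and the opponent over “general” strategies, and once with the roles exchanged.]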
There are strategies that are optimal for every starting position
2½G ∈ NP ∩ co-NP
Deciding whether the value of a state is at least (at most) v is in NP ∩ co-NP
To show that value ≥ v, guess an optimal strategy for MAX
Find an optimal counter-strategy for min by solving the resulting MDP.
Is the problem in P ?
Discounted 2½-player games
Strategy improvement: first attempt
Discounted 2½-player games
Strategy improvement: second attempt
Discounted 2½-player games
Strategy improvement: correct version
Turn-based Stochastic Payoff Games (SPGs): 2½ players; long-term planning in a stochastic and adversarial environment
Markov Decision Processes (MDPs): 1½ players; non-adversarial, stochastic
Mean Payoff Games (MPGs): 2 players; adversarial, non-stochastic
Deterministic MDPs (DMDPs): 1 player; non-stochastic, non-adversarial
Day 1: Monday, October 31
Uri Zwick ( 武熠 ) (Tel Aviv University)
Perfect Information Stochastic Games
Lecture 1½
Upper bounds for policy iteration algorithm
Complexity of Strategy Improvement
Greedy strategy improvement for non-discounted 2-player and 1½-player games is exponential!
[Friedmann ’09] [Fearnley ’10]
A randomized strategy improvement algorithm for 2½-player games runs in sub-exponential time
[Kalai (1992)] [Matoušek-Sharir-Welzl (1992)]
Greedy strategy improvement for 2½-player games with a fixed discount factor is strongly polynomial
[Ye ’10] [Hansen-Miltersen-Z ’11]
The RANDOM FACET algorithm
[Kalai (1992)] [Matoušek-Sharir-Welzl (1992)]
[Ludwig (1995)]
A randomized strategy improvement algorithm
Initially devised for LP and LP-type problems
Applies to all turn-based games
Sub-exponential complexity
Fastest known for non-discounted 2(½)-player games
Performs only one improving switch at a time
Work with strategies of player 1
Find optimal counter strategies for player 2
The RANDOM FACET algorithm
The RANDOM FACET algorithm: Analysis
The RANDOM FACET algorithm: Analysis
The RANDOM FACET algorithm: Analysis
The RANDOM FACET algorithm: Analysis
Day 1: Monday, October 31
Uri Zwick ( 武熠 ) (Tel Aviv University)
Perfect Information Stochastic Games
Lecture 3
Lower bounds for policy iteration algorithm
Oliver Friedmann – Univ. of Munich
Thomas Dueholm Hansen – Aarhus Univ.
Uri Zwick ( 武熠 ) – Tel Aviv Univ.
Subexponential lower bounds for randomized pivoting rules for the simplex algorithm
Linear Programming
Maximize a linear objective function subject to a set of linear equalities and inequalities
Linear Programming
Simplex algorithm (Dantzig 1947)
Ellipsoid algorithm (Khachiyan 1979)
Interior-point algorithm (Karmarkar 1984)
The Simplex Algorithm, Dantzig (1947)
Move up, along an edge to a neighboring vertex, until reaching the top
Pivoting Rules – Where should we go?
Largest improvement?
Largest slope?
…
Deterministic pivoting rules
Largest improvement
Largest slope
Dantzig’s rule – Largest modified cost
Bland’s rule – avoids cycling
Lexicographic rule – also avoids cycling
All known to require an exponential number of steps in the worst case
Klee-Minty (1972), Jeroslow (1973), Avis-Chvátal (1978), Goldfarb-Sit (1979), … , Amenta-Ziegler (1996)
Klee-Minty cubes (1972)
Taken from a paper by Gärtner-Henk-Ziegler
Is there a polynomial pivoting rule?
Is the diameter polynomial?
Hirsch conjecture (1957): The diameter of a d-dimensional, n-faceted polytope is at most n−d
Refuted recently by Santos (2010)!
Diameter is still believed to be polynomial (see, e.g., the polymath3 project)
Quasi-polynomial ($n^{\log d+1}$) upper bound [Kalai-Kleitman (1992)]
Randomized pivoting rules
Random-Edge: Choose a random improving edge
Random-Facet: If there is only one improving edge, take it. Otherwise, choose a random facet containing the current vertex and recursively find the optimum within that facet.
Random-Facet is sub-exponential! [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]
Are Random-Edge and Random-Facet polynomial ???
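To make the recursion concrete, here is a minimal sketch of Random-Facet in its usual formulation on the n-cube, where vertices are 0/1 tuples and a facet of the current face is obtained by fixing one free coordinate. It assumes the objective `f` has a unique local maximum on every face (for instance a linear function with nonzero weights) and does not special-case the single-improving-edge situation mentioned above. The name `random_facet` and its parameters are illustrative, not from the slides.

```python
import random

def random_facet(f, v, free, rng):
    """Return the vertex maximizing f on the face through `v` spanned by
    the coordinates in `free` (all other coordinates stay as in `v`)."""
    if not free:
        return v                      # a 0-dimensional face is just v
    i = rng.choice(sorted(free))      # random facet: fix coordinate i
    rest = free - {i}
    # Recursively optimize inside the chosen facet, which contains v.
    w = random_facet(f, v, rest, rng)
    # Try the single edge that leaves this facet.
    w_flipped = w[:i] + (1 - w[i],) + w[i + 1:]
    if f(w_flipped) > f(w):
        # Improving switch: the optimum lies in the opposite facet.
        return random_facet(f, w_flipped, rest, rng)
    return w

# Hypothetical linear objective on the 4-cube; its maximum is at (1, 0, 1, 0).
weights = (3, -2, 5, -1)

def objective(x):
    return sum(wi * xi for wi, xi in zip(weights, x))

best = random_facet(objective, (0, 1, 0, 1), frozenset(range(4)), random.Random(0))
```

The random choices affect only the number of steps taken, not the vertex returned.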
Abstract objective functions (AOFs)
Every face should have a unique sink
Acyclic Unique Sink Orientations (AUSOs)
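The unique-sink condition can be checked by brute force on small cubes. The sketch below orients every cube edge towards the endpoint with the larger value of an objective `f` (so the orientation is automatically acyclic when adjacent vertices have distinct values) and counts the sinks of every face. The function name `has_unique_sinks` and the vertex encoding as 0/1 tuples are assumptions for illustration.

```python
from itertools import product

def has_unique_sinks(f, n):
    """True iff every face of the n-cube has exactly one sink when each
    edge points towards the endpoint with the larger f-value."""
    # A face is described by a pattern: 0 or 1 fixes a coordinate,
    # None leaves it free.
    for pattern in product((0, 1, None), repeat=n):
        free = [i for i, p in enumerate(pattern) if p is None]
        sinks = 0
        for bits in product((0, 1), repeat=len(free)):
            vertex = list(pattern)
            for i, b in zip(free, bits):
                vertex[i] = b
            v = tuple(vertex)
            # v is a sink of the face if no neighbor inside the face is better.
            if all(f(v[:i] + (1 - v[i],) + v[i + 1:]) < f(v) for i in free):
                sinks += 1
        if sinks != 1:
            return False
    return True

def linear(x):
    # Hypothetical linear objective with nonzero weights.
    return 3 * x[0] - 2 * x[1] + 5 * x[2]

# Hypothetical bad objective on the square: both (0, 0) and (1, 1) are sinks.
two_sinks = {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 3}
```

The check enumerates all 3^n faces, so it is only meant for very small n.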
AUSOs of n-cubes
2n facets, $2^n$ vertices
The diameter is exactly n
Bypassing the diameter issue!
USOs and AUSOs: Stickney, Watson (1978); Morris (2001); Szabó, Welzl (2001); Gärtner (2002)
AUSO results
Random-Facet is sub-exponential [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]
Sub-exponential lower bound for Random-Facet [Matoušek (1994)]
Sub-exponential lower bound for Random-Edge [Matoušek-Szabó (2006)]
Lower bounds do not correspond to actual linear programs
Can geometry help?
Can geometry help?
LP results
Explicit LPs on which Random-Facet and Random-Edge make an expected sub-exponential number of iterations
Technique
Consider LPs that correspond to Markov Decision Processes (MDPs)
Observe that the simplex algorithm on these LPs corresponds to the Policy Iteration algorithm for MDPs
Obtain sub-exponential lower bounds for the Random-Facet and Random-Edge variants of the Policy Iteration algorithm for MDPs, relying on similar lower bounds for Parity Games (PGs)
Dual LP formulation for MDPs
Primal LP formulation for MDPs
Vertices and policies
Lower bounds for Policy Iteration
Switch-All for Parity Games is exponential [Friedmann ’09]
Switch-All for MDPs is exponential [Fearnley ’10]
Random-Facet for Parity Games is sub-exponential [Friedmann-Hansen-Z ’11]
Random-Facet and Random-Edge for MDPs and hence for LPs are sub-exponential [FHZ ’11]
![Page 78: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/78.jpg)
3-bit counter
(−N)¹⁵
![Page 79: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/79.jpg)
3-bit counter
0 1 0
![Page 80: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/80.jpg)
Observations
Five “decisions” in each level
Process always reaches the sink
Values are expected lengths of paths to sink
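The last observation can be illustrated with a small Python sketch. It assumes a fixed policy (so the process is a plain Markov chain), unit cost per step, and a chain that reaches the sink with probability 1; the example chain and the function name are hypothetical and unrelated to the actual counter construction.

```python
def expected_steps_to_sink(chain, sink, sweeps=500):
    """Approximate the expected number of steps to reach `sink` from every state.

    `chain` maps each state to {next_state: probability}. The fixed number of
    sweeps is an assumption that suits small, quickly absorbing chains.
    """
    values = {s: 0.0 for s in chain}
    for _ in range(sweeps):
        values = {
            s: 0.0 if s == sink
            else 1.0 + sum(p * values[t] for t, p in chain[s].items())
            for s in chain
        }
    return values

# Hypothetical chain: from "u" a fair coin decides between staying and reaching the sink.
EXAMPLE_CHAIN = {
    "v": {"u": 1.0},
    "u": {"u": 0.5, "sink": 0.5},
    "sink": {},
}
```

For this chain the expected length from `u` is 2 (a geometric number of coin flips) and from `v` it is 3.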
![Page 81: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/81.jpg)
3-bit counter – Improving switches
0 1 0
Random-Edge can choose either one of these improving switches…
![Page 82: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/82.jpg)
Cycle gadgets
Cycles close one edge at a time
Shorter cycles close faster
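Why shorter cycles tend to close first can be seen in a toy model, sketched here in Python. The model is an assumption made for illustration: two cycles compete, every edge that is still open in either cycle is an improving switch, and Random-Edge performs one of them uniformly at random. The edges are then closed in a uniformly random order, and the cycle that owns the last edge in that order closes last, so the shorter cycle wins with probability long / (short + long). The function names are hypothetical.

```python
import random
from fractions import Fraction

def prob_shorter_closes_first(short_len, long_len):
    """Exact probability in the toy model that the shorter cycle closes first."""
    return Fraction(long_len, short_len + long_len)

def simulate(short_len, long_len, trials=20000, seed=1):
    """Estimate the same probability by simulating uniformly random switches."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        open_short, open_long = short_len, long_len
        while open_short > 0 and open_long > 0:
            # Pick one of the remaining open edges uniformly at random.
            if rng.randrange(open_short + open_long) < open_short:
                open_short -= 1
            else:
                open_long -= 1
        wins += open_short == 0
    return wins / trials
```

For example, with cycles of lengths 3 and 9 the exact value is 3/4, and the simulated estimate should land close to it.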
![Page 83: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/83.jpg)
Cycle gadgets
Cycles open “simultaneously”
![Page 84: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/84.jpg)
3-bit counter 2 → 3
0 1 0→1
![Page 85: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/85.jpg)
From b to b+1 in seven phases
B_k-cycle closes
C_k-cycle closes
U-lane realigns
A_i-cycles and B_i-cycles for i < k open
A_k-cycle closes
W-lane realigns
C_i-cycles of 0-bits open
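The counter step that these phases emulate can be sketched in a few lines of Python. Reading "level-k cycles close" as setting bit k and "cycles for i < k open" as resetting the lower bits is an interpretation of the slide, and the list representation (least significant bit first) and the function name are assumptions of this example.

```python
def increment(bits):
    """Increment a binary counter stored least-significant bit first.

    Returns (new_bits, k), where k is the index of the lowest 0-bit:
    bit k is set to 1 and all bits below k are reset to 0.
    Raises ValueError if the counter is already all ones.
    """
    k = bits.index(0)
    return [0] * k + [1] + bits[k + 1:], k

def count_increments(n):
    """Number of increments needed to go from all zeros to all ones."""
    bits, steps = [0] * n, 0
    while 0 in bits:
        bits, _ = increment(bits)
        steps += 1
    return steps
```

An n-bit counter needs 2^n - 1 increments to reach all ones, which is where the exponential behaviour of a counter-emulating instance comes from.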
![Page 86: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/86.jpg)
3-bit counter 3 → 4
0 1 1
![Page 87: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/87.jpg)
Size of cycles
Various cycles and lanes compete with each other
Some are trying to open while some are trying to close
We need to make sure that our candidates win!
Length of all A-cycles = 8n
Length of all C-cycles = 22n
Length of B_i-cycles = 25i²n
O(n⁴) vertices for an n-bit counter
Can be improved using a more complicated construction and an improved analysis (work in progress)
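The O(n⁴) bound can be checked with a short calculation. It assumes one A-cycle, one B_i-cycle and one C-cycle per level i = 1, …, n and ignores the lanes and any other vertices, so it is a rough count rather than the exact size of the construction:

```latex
\sum_{i=1}^{n} \left( 8n + 22n + 25 i^{2} n \right)
  = 30 n^{2} + 25 n \cdot \frac{n(n+1)(2n+1)}{6}
  = \Theta(n^{4}).
```

The B_i-cycles dominate: their lengths alone sum to roughly $\tfrac{25}{3} n^{4}$.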
![Page 88: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/88.jpg)
Lower bound for Random-Facet
Implement a randomized counter
![Page 89: Mini-course on algorithmic aspects of stochastic games and related models Marcin Jurdziński (University of Warwick) Peter Bro Miltersen (Aarhus University)](https://reader031.vdocuments.pub/reader031/viewer/2022033108/56649ca75503460f9496a37e/html5/thumbnails/89.jpg)
Many intriguing and important open problems …
Polynomial Policy Iteration Algorithms ???
Polynomial algorithms for AUSOs? SPGs? MPGs? PGs?
Strongly Polynomial algorithms for MDPs?