Issues on the border of economics and computation (נושאים בגבול כלכלה וחישוב)

Speaker: Dr. Michael Schapira

Topic: Dynamics in Games (Part II) (Some slides from Prof. Avrim Blum’s course at CMU and Prof. Yishay Mansour’s course at TAU)


Post on 22-Feb-2016


Page 1: Issues on the border of  economics and computation נושאים בגבול כלכלה וחישוב

Issues on the border of economics and computation

נושאים בגבול כלכלה וחישוב

Speaker: Dr. Michael Schapira

Topic: Dynamics in Games (Part II)
(Some slides from Prof. Avrim Blum’s course at CMU and Prof. Yishay Mansour’s course at TAU)

Page 2: Recap: Regret Minimization

Page 3: Example 1: Weather Forecast

• Sunny: (icon not preserved)

• Rainy: (icon not preserved)

• No meteorological understanding!
  – using other web sites

Forecast table (cell contents not preserved): columns "Web site" and "forecast"; rows CNN, BBC, weather.com, OUR

Goal: Nearly the most accurate forecast

Page 4: Example 2: Route Selection

Challenge: Partial information

Goal: Fastest route

Page 5: Financial Markets

• Model: Select a portfolio each day.

• Gain: The changes in stock values.

• Performance goal: Compare well with the best “static” policy.

Page 6: Reminder: Minimizing Regret

• There are n strategies (experts) 1, 2, …, n

• At each round t = 1, 2, …, T:
  – the algorithm selects a strategy in {1, …, n}
  – and then observes the loss $l_{i,t} \in [0,1]$ of each strategy $i \in \{1,\dots,n\}$

• Let $l_i = \sum_t l_{i,t}$. Let $l_{\min} = \min_i l_i$

• Goal: Do “nearly as well” as $l_{\min}$ in hindsight.

• Have no regret!

Page 7: Randomized Weighted Majority

Initially: $w_{i,1} = 1$ and $p_{i,1} = 1/n$ for each strategy i

At time t = 1, …, T:

• Select strategy i with probability $p_{i,t}$

• Observe the loss vector $l_t$

• Update the weights: $w_{i,t+1} := w_{i,t}(1 - \varepsilon l_{i,t})$

• $p_{i,t+1} := w_{i,t+1} / W_{t+1}$, where $W_{t+1} = \sum_i w_{i,t+1}$

(Here $\varepsilon \in (0, 1/2]$ is the parameter that is set later in the analysis.)
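
The steps of Randomized Weighted Majority can be sketched in Python as follows. This is a minimal illustration rather than the lecturer's code: it assumes the weight update is $w_{i,t+1} = w_{i,t}(1 - \varepsilon l_{i,t})$ with the parameter $\varepsilon$ (here `eps`) that the analysis sets later, and the function names `rwm_update`, `probabilities`, and `run_rwm` are made up for this example.

```python
import random


def rwm_update(weights, losses, eps):
    """One RWM step: w_{i,t+1} = w_{i,t} * (1 - eps * l_{i,t})."""
    return [w * (1.0 - eps * l) for w, l in zip(weights, losses)]


def probabilities(weights):
    """Normalize weights into the distribution p_{i,t} = w_{i,t} / W_t."""
    total = sum(weights)
    return [w / total for w in weights]


def run_rwm(loss_rows, eps, rng=None):
    """Play RWM against a given sequence of loss vectors.

    loss_rows[t][i] is the loss of strategy i in round t, in [0, 1].
    Returns (sampled strategies, expected total loss, final probabilities).
    """
    rng = rng or random.Random(0)  # fixed seed: an assumption, for repeatability
    n = len(loss_rows[0])
    weights = [1.0] * n  # w_{i,1} = 1
    choices, expected_loss = [], 0.0
    for losses in loss_rows:
        p = probabilities(weights)  # p_{i,t}
        choices.append(rng.choices(range(n), weights=p)[0])  # sample strategy
        expected_loss += sum(pi * li for pi, li in zip(p, losses))  # F_t
        weights = rwm_update(weights, losses, eps)
    return choices, expected_loss, probabilities(weights)
```

Calling `run_rwm([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]], eps=0.5)` plays three rounds with two strategies; the probability of a strategy shrinks after every round in which it incurs a loss.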

Page 8: Formal Guarantee for RWM

Theorem: If L = expected total loss of the algorithm by time T+1 and OPT = accumulated loss of the best strategy by time T+1, then

$L < \mathrm{OPT} + 2\sqrt{T \log(n)}$   (an “additive regret” bound)

An algorithm is called a “no-regret algorithm” if $L < \mathrm{OPT} + f(T)$ and $f(T)/T$ goes to 0 as T goes to infinity.
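
The additive regret bound can be checked numerically on a small instance. The sketch below rests on several assumptions: the loss sequence is a made-up example, the update uses the parameter `eps` set to $\sqrt{\ln(n)/T}$ as in the analysis, "log" is read as the natural logarithm, and `expected_rwm_loss` is a hypothetical helper name.

```python
import math


def expected_rwm_loss(loss_rows, eps):
    """Expected total loss of RWM: sum over t of sum_i p_{i,t} * l_{i,t}."""
    n = len(loss_rows[0])
    weights = [1.0] * n
    total = 0.0
    for losses in loss_rows:
        W = sum(weights)
        total += sum(w / W * l for w, l in zip(weights, losses))
        weights = [w * (1.0 - eps * l) for w, l in zip(weights, losses)]
    return total


# Made-up loss sequence with n = 3 strategies:
# strategy 0 loses every 4th round, strategy 1 every 2nd round,
# strategy 2 in every round.
T, n = 600, 3
loss_rows = [
    [1.0 if t % 4 == 0 else 0.0, 1.0 if t % 2 == 0 else 0.0, 1.0]
    for t in range(T)
]

eps = math.sqrt(math.log(n) / T)
L = expected_rwm_loss(loss_rows, eps)
OPT = min(sum(row[i] for row in loss_rows) for i in range(n))
bound = 2.0 * math.sqrt(T * math.log(n))

print(f"L = {L:.2f}, OPT = {OPT:.2f}, OPT + bound = {OPT + bound:.2f}")
```

Dividing the regret L - OPT by T gives the per-round regret, which is the quantity that must vanish for a no-regret algorithm.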

Page 9: Analysis

• Let $F_t$ be the expected loss of RWM at time t: $F_t = \sum_i p_{i,t} l_{i,t}$

• Observe that
  – $W_{\text{final}} = n(1 - \varepsilon F_1)(1 - \varepsilon F_2)\cdots$
  – $\ln(W_{\text{final}}) = \ln(n) + \sum_t \ln(1 - \varepsilon F_t) < \ln(n) - \varepsilon \sum_t F_t$   (using $\ln(1-x) < -x$)
    $= \ln(n) - \varepsilon L$   (using $\sum_t F_t = L = E[\text{total loss of RWM by time } T+1]$)

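Page 10: Analysis (Cont.)

• Let i be the best strategy in hindsight

• Observe that $w_{i,\text{final}} = 1 \cdot (1 - \varepsilon l_{i,1})(1 - \varepsilon l_{i,2}) \cdots (1 - \varepsilon l_{i,T})$

$\ln(w_{i,\text{final}}) = \ln(1 - \varepsilon l_{i,1}) + \ln(1 - \varepsilon l_{i,2}) + \dots + \ln(1 - \varepsilon l_{i,T})$
$\ge -\varepsilon \sum_t l_{i,t} - \varepsilon^2 \sum_t l_{i,t}^2$
$\ge -\varepsilon(1+\varepsilon) \sum_t l_{i,t} = -\varepsilon(1+\varepsilon)\, l_i = -\varepsilon(1+\varepsilon)\, l_{\min}$

(using $-z - z^2 \le \ln(1-z)$ for $0 \le z \le 1/2$ with $z = \varepsilon l_{i,t}$ and $\varepsilon \le 1/2$, and $l_{i,t}^2 \le l_{i,t}$ since $l_{i,t} \in [0,1]$)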

Page 11: Analysis (Cont.)

• $w_{i,\text{final}} < W_{\text{final}}$

$\ln(w_{i,\text{final}}) < \ln(W_{\text{final}})$

$-\varepsilon(1+\varepsilon)\, l_{\min} < \ln(n) - \varepsilon L$

$L < (1+\varepsilon)\, l_{\min} + (1/\varepsilon)\ln(n)$

• Set $\varepsilon = \sqrt{\log(n)/T}$ to get

$L < l_{\min} + 2\sqrt{T \log(n)}$
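
The last substitution skips a step, which can be written out as below. This is a worked sketch under two assumptions: $\varepsilon$ denotes the parameter being set on this slide, and "log" is read as the natural logarithm. It also uses $l_{\min} \le T$, which holds because every per-round loss is at most 1.

```latex
\begin{align*}
L &< (1+\varepsilon)\, l_{\min} + \frac{\ln n}{\varepsilon}
   = l_{\min} + \varepsilon\, l_{\min} + \frac{\ln n}{\varepsilon} \\
  &\le l_{\min} + \varepsilon T + \frac{\ln n}{\varepsilon}
   && \text{(since } l_{\min} \le T\text{)} \\
  &= l_{\min} + \sqrt{T \ln n} + \sqrt{T \ln n}
   && \text{(with } \varepsilon = \sqrt{\ln n / T}\text{)} \\
  &= l_{\min} + 2\sqrt{T \ln n}.
\end{align*}
```

The choice of $\varepsilon$ balances the two error terms $\varepsilon T$ and $\ln(n)/\varepsilon$. The earlier inequality $-z - z^2 \le \ln(1-z)$ needs $\varepsilon \le 1/2$, which holds once $T \ge 4 \ln n$.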

Page 12: Equivalently

• There are n strategies (experts) 1, 2, …, n

• At each round t = 1, 2, …, T:
  – the algorithm selects a strategy in {1, …, n}
  – and then observes the gain $g_{i,t} \in [0,1]$ of each strategy $i \in \{1,\dots,n\}$

• Let $g_i = \sum_t g_{i,t}$. Let $g_{\max} = \max_i g_i$

• Goal: Do “nearly as well” as $g_{\max}$ in hindsight

• Let G be the algorithm’s expected total gain by time T+1. RWM (setting $l_{i,t} = 1 - g_{i,t}$) guarantees that

$G > g_{\max} - 2\sqrt{T \log(n)}$

Page 13: So, why is this in an algorithmic game theory course?

Page 14: Rock-Paper-Scissors

(1,-1) (-1,1) (0, 0)

(-1,1) (0, 0) (1,-1)

(0, 0) (1,-1) (-1,1)

Page 15: Rock-Paper-Scissors

• Say that you are the row player … and you play multiple times

• How should you play the game?!
  – what does “playing well” mean?
  – highly opponent dependent

• One (weak?) option: Do “nearly as well” as the best pure strategy in hindsight!

(1,-1) (-1,1) (0, 0)

(-1,1) (0, 0) (1,-1)

(0, 0) (1,-1) (-1,1)

Page 16: Rock-Paper-Scissors

• Use a no-regret algorithm to select a strategy at each time step

• Why does this make sense?

(1,-1) (-1,1) (0, 0)

(-1,1) (0, 0) (1,-1)

(0, 0) (1,-1) (-1,1)

Page 17: Zero-Sum Games

Page 18: Reminder: Mixed strategies

• Definition: a “mixed strategy” is a probability distribution over actions.
  – If $\{a_1, a_2, \dots, a_m\}$ are the pure strategies of A, then $\{p_1, \dots, p_m\}$ is a mixed strategy for A if
    (1) $\sum_{i=1}^{m} p_i = 1$
    (2) $p_i \ge 0$ for all i

Example game:

          Tail     Heads
Tail     -1,1     1,-1
Heads    1,-1     -1,1

Probability values shown beside the matrix (in extraction order): 1/2, 1/2, 1/3, 2/3, 9/10, 1/10, 0, 1, 1/4, 1/2

Page 19: Reminder: Mixed Nash Eq.

• Main idea: given a fixed behavior of the others, I will not change my strategy.

• Definition: $(S_A, S_B)$ are in Nash equilibrium if each strategy is a best response to the other.

Example (each player plays each action with probability 1/2):

              Even (1/2)   Odd (1/2)
Even (1/2)    -1,1         1,-1
Odd (1/2)     1,-1         -1,1

Page 20: Zero-Sum Games

• A zero-sum game is a 2-player strategic game such that for each $s \in S$, we have $u_1(s) + u_2(s) = 0$.
  – What is good for me is bad for my opponent, and vice versa

• Note: Any game where the sum is a constant c can be transformed into a zero-sum game with the same set of equilibria:
  – $u'_1(a) = u_1(a)$
  – $u'_2(a) = u_2(a) - c$

Page 21: 2-Player Zero-Sum Games

(1,-1) (-1,1) (0, 0)
(-1,1) (0, 0) (1,-1)
(0, 0) (1,-1) (-1,1)

          Left      Right
Left     (-1,1)    (1,-1)
Right    (1,-1)    (-1,1)

Page 22: How to Play Zero-Sum Games?

• Assume that only pure strategies are allowed

• Be paranoid: Try to minimize your loss by assuming the worst!

• Player 1 takes the minimum over each row’s values:
  – T: -6, M: -1, B: -6

• then maximizes:
  – M: -1

       L       M       R
T    8,-8    3,-3    -6,6
M    2,-2    -1,1    3,-3
B    -6,6    4,-4    8,-8
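
The "be paranoid" computation for this table can be reproduced with a few lines of Python. This is an illustrative sketch: the matrix holds player 1's payoffs from the table, and the helper names `pure_maximin` and `pure_minimax_for_column` are invented for this example.

```python
# Player 1's payoffs from the table (rows T, M, B; columns L, M, R).
# The game is zero-sum, so player 2's payoffs are the negations.
ROWS = ["T", "M", "B"]
COLS = ["L", "M", "R"]
U1 = [
    [8, 3, -6],
    [2, -1, 3],
    [-6, 4, 8],
]


def pure_maximin(payoffs):
    """Row player: pick the row whose worst-case payoff is largest."""
    worst = [min(row) for row in payoffs]
    best_row = max(range(len(payoffs)), key=lambda r: worst[r])
    return best_row, worst[best_row]


def pure_minimax_for_column(payoffs):
    """Column player: pick the column whose best case for the row is smallest."""
    n_cols = len(payoffs[0])
    best = [max(row[c] for row in payoffs) for c in range(n_cols)]
    best_col = min(range(n_cols), key=lambda c: best[c])
    return best_col, best[best_col]


row, row_guarantee = pure_maximin(U1)
col, col_guarantee = pure_minimax_for_column(U1)
print(ROWS[row], row_guarantee)   # row player's safe choice and its guarantee
print(COLS[col], col_guarantee)   # column player's safe choice and its cap
```

With pure strategies only, the row player can guarantee -1 while the column player can only cap the row player's payoff at 4, so the two guarantees do not meet. Mixed strategies close this gap.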

Page 23: Minimax-Optimal Strategies

• A (mixed) strategy $s_1^*$ is minimax optimal for player 1 if
  $\min_{s_2 \in S_2} u_1(s_1^*, s_2) \ge \min_{s_2 \in S_2} u_1(s_1, s_2)$ for all $s_1 \in S_1$

• Similarly for player 2

• Can be found via linear programming.
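
One standard way to write the linear program for player 1 is sketched below. The notation is an assumption made for this illustration: $a_i$ and $b_j$ denote the pure strategies of players 1 and 2, $p_i$ is the probability player 1 puts on $a_i$, and $v$ is the payoff that player 1 guarantees.

```latex
\begin{align*}
\text{maximize}\quad & v \\
\text{subject to}\quad
  & \sum_{i} p_i\, u_1(a_i, b_j) \ge v
    && \text{for every pure strategy } b_j \text{ of player 2}, \\
  & \sum_{i} p_i = 1, \qquad p_i \ge 0
    && \text{for all } i.
\end{align*}
```

It suffices to constrain against pure strategies of player 2, because the worst case against a fixed mixed strategy of player 1 is always attained at some pure strategy.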

Page 24: Minimax-Optimal Strategies

• E.g., penalty shot

          Left      Right
Left     (-1,1)    (1,-1)
Right    (1,-1)    (-1,1)

Minimax optimal for both players is 50/50. Gives expected gain of 0. Any other is worse.

Page 25: Minimax-Optimal Strategies

• E.g., penalty shot with goalie who’s weaker on the left.

          Left      Right
Left     (0,0)     (1,-1)
Right    (1,-1)    (-1,1)

Minimax optimal for both players is (2/3, 1/3). Gives expected gain 1/3. Any other is worse.
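
The (2/3, 1/3) strategies and the value 1/3 can be recovered from the indifference conditions of a 2x2 game. The sketch below assumes the game has no pure-strategy saddle point, so both players mix; `solve_2x2_zero_sum` is a hypothetical helper name, and the matrix holds the row player's payoffs.

```python
from fractions import Fraction


def solve_2x2_zero_sum(a):
    """Minimax strategies for the row player's 2x2 payoff matrix `a`.

    Assumes there is no pure-strategy saddle point, so each player mixes
    to make the opponent indifferent between their two actions.
    Returns (p, q, value), where p = P(row player picks row 0) and
    q = P(column player picks column 0).
    """
    (a00, a01), (a10, a11) = a
    denom = Fraction(a00 - a01 - a10 + a11)
    if denom == 0:
        raise ValueError("degenerate game: no unique indifference solution")
    p = Fraction(a11 - a10) / denom  # both columns give the row player the same payoff
    q = Fraction(a11 - a01) / denom  # both rows give the row player the same payoff
    value = Fraction(a00 * a11 - a01 * a10) / denom
    return p, q, value


# Row player's payoffs, rows and columns ordered Left, Right.
standard_penalty = [[-1, 1], [1, -1]]
weak_goalie = [[0, 1], [1, -1]]

print(solve_2x2_zero_sum(standard_penalty))
print(solve_2x2_zero_sum(weak_goalie))
```

Exact fractions are used so that the results can be compared directly with the values stated on the slides.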

Page 26: Minimax Theorem (von Neumann 1928)

• Every 2-player zero-sum game has a unique value V.

• A minimax optimal strategy for R (the row player) guarantees that R’s expected gain is at least V.

• A minimax optimal strategy for C (the column player) guarantees that R’s expected gain is at most V.

Page 27: Proof Via Regret (sketch)

• Suppose for contradiction this is false.

• This means some game G has $V_C > V_R$:
  – If the column player commits first, there exists a row that gets at least $V_C$.
  – But if the row player has to commit first, the column player can make him get only $V_R$.

• Scale the matrix so that payoffs are in [0,1] and say that $V_R = (1-\varepsilon)V_C$ for some $\varepsilon > 0$.

Page 28: Proof (Cont.)

• Consider the scenario in which both players repeatedly play the game G and each uses a no-regret algorithm.

• Let $s_{i,t}$ be the (pure) strategy of player i at time t

• Let $q_i = (1/T)\sum_t s_{i,t}$
  – $q_i$ is called i’s empirical distribution

• Observe that player 1’s average gain when playing a pure strategy $s_1$ against the sequence $\{s_{2,t}\}$ is exactly $u_1(s_1, q_2)$
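
The last observation is just linearity of the average, and it can be confirmed on a small example. In the sketch below the payoff matrix is the Rock-Paper-Scissors matrix from the earlier slides (row player's payoffs only), while the opponent history and the helper names are made up for illustration.

```python
from collections import Counter
from fractions import Fraction

# Row player's payoffs in the Rock-Paper-Scissors matrix shown earlier,
# indexed as U1[row][column].
U1 = [
    [1, -1, 0],
    [-1, 0, 1],
    [0, 1, -1],
]


def empirical_distribution(plays, n):
    """q[j] = fraction of rounds in which pure strategy j was played."""
    counts = Counter(plays)
    return [Fraction(counts[j], len(plays)) for j in range(n)]


def average_gain(row, column_plays):
    """Row player's average gain when playing `row` in every round."""
    return Fraction(sum(U1[row][j] for j in column_plays), len(column_plays))


def expected_gain(row, q):
    """u_1(row, q): expected gain of `row` against the mixed strategy q."""
    return sum(q[j] * U1[row][j] for j in range(len(q)))


column_plays = [0, 2, 2, 1, 0, 2, 1, 2]  # hypothetical opponent history
q2 = empirical_distribution(column_plays, 3)
for row in range(3):
    # The average against the sequence equals the payoff against q2.
    assert average_gain(row, column_plays) == expected_gain(row, q2)
```

Because of this identity, "the best pure strategy in hindsight" is the same as the best response to the opponent's empirical distribution.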

Page 29: Proof (Cont.)

• Player 1 is using a no-regret algorithm, and so his average gain is at least $V_C$ (as T goes to infinity): some row gets at least $V_C$ against the column player’s empirical distribution, and no-regret means doing nearly as well as the best row in hindsight.

• Similarly, player 2 is using a no-regret algorithm, and so player 1’s average gain is at most $V_R$ (as T goes to infinity).

• A contradiction!

Page 30: Convergence to Nash

• Suppose that each of the players in a 2-player zero-sum game is using a no-regret algorithm to select strategies.

• Let qi be the empirical distribution of each player i in {1,2}.

• (q1,q2) converges to a Nash equilibrium as T goes to infinity!
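
A small self-play experiment illustrates this convergence on the penalty-shot game with the weaker goalie. The sketch makes several simplifying assumptions that are not from the slides: each player updates on expected payoffs against the opponent's current mixed strategy (a deterministic variant, so the run is reproducible), payoffs in [-1, 1] are rescaled to losses in [0, 1], the parameter is set to $\sqrt{\ln(n)/T}$, and `self_play` is a hypothetical function name.

```python
import math

# Row player's payoffs (rows and columns ordered Left, Right);
# the column player receives the negation.
U1 = [[0.0, 1.0], [1.0, -1.0]]


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def self_play(u1, T):
    """Both players run multiplicative-weights updates; return average strategies."""
    n_rows, n_cols = len(u1), len(u1[0])
    eps_r = math.sqrt(math.log(n_rows) / T)
    eps_c = math.sqrt(math.log(n_cols) / T)
    p = [1.0 / n_rows] * n_rows  # row player's current mixed strategy
    q = [1.0 / n_cols] * n_cols  # column player's current mixed strategy
    avg_p, avg_q = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(T):
        avg_p = [a + x / T for a, x in zip(avg_p, p)]
        avg_q = [a + x / T for a, x in zip(avg_q, q)]
        # Row player's expected payoff for each pure row and each pure column.
        row_pay = [sum(q[j] * u1[i][j] for j in range(n_cols)) for i in range(n_rows)]
        col_pay = [sum(p[i] * u1[i][j] for i in range(n_rows)) for j in range(n_cols)]
        # Rescale payoffs in [-1, 1] to losses in [0, 1] for each player.
        loss_r = [(1.0 - g) / 2.0 for g in row_pay]
        loss_c = [(1.0 + g) / 2.0 for g in col_pay]
        # Same update as w * (1 - eps * loss); renormalizing avoids underflow.
        p = normalize([x * (1.0 - eps_r * l) for x, l in zip(p, loss_r)])
        q = normalize([x * (1.0 - eps_c * l) for x, l in zip(q, loss_c)])
    return avg_p, avg_q


avg_p, avg_q = self_play(U1, 40000)
print("average row strategy:", avg_p)
print("average column strategy:", avg_q)
```

The averaged strategies should guarantee the row player a payoff close to the game's value of 1/3, with an error that shrinks as T grows.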

Page 31: Rock-Paper-Scissors

• Use a no-regret algorithm
  1. Adjust to the opponent’s play
  2. No need to know the entire game in advance
  3. Payoff can be more than the game’s value V
  4. If both players do this, the outcome is a Nash equilibrium

Page 32: Summary

• No-regret algorithms help deal with uncertainty in repeated decision making.

• Implications for game theory: When both players use no-regret algorithms in a 2-player zero-sum game, their empirical distributions of play are guaranteed to converge to a Nash equilibrium.