Let's Casually Write an A.I. like AlphaGo
TRANSCRIPT
Let's Casually Write an A.I. like AlphaGo (Kan Ouivirach)
http://www.tgdaily.com/web/134986-machine-learning-ai-and-digital-intelligences-effect-on-business
http://www.bigdata-madesimple.com/
Machine Learning
“Given example pairs (x, y), induce f such that y = f(x) for given pairs and f generalizes well for unseen x”
–Peter Norvig (2014)
“Unlocking the Secrets of AlphaGo: Deep Learning and the Future of Artificial Intelligence”
Dr. Sanparith Marukatat, 22 Mar 2016
Google DeepMind: Ground-breaking AlphaGo masters the game of Go
https://www.youtube.com/watch?v=SUbqykXVx0A
Minimax Decisions and Reinforcement Learning
http://www.123rf.com/photo_8104943_hand-drawn-tic-tac-toe-game-format.html
function MINIMAX-DECISION(state) returns an action
  return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(s, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(s, a)))
  return v
https://github.com/zkan/tictactoe/blob/master/tictactoe_minimax.py
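The pseudocode above can be sketched directly in Python for Tic Tac Toe. The board encoding here (a 9-element list of 'X', 'O', and ' ') is an assumption for illustration; the linked repository may represent the game differently.

```python
def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals
    for i, j, k in lines:
        if board[i] != ' ' and board[i] == board[j] == board[k]:
            return board[i]
    return None

def actions(board):
    """Legal moves: indices of empty cells."""
    return [i for i, cell in enumerate(board) if cell == ' ']

def result(board, move, player):
    """Return the board after `player` plays at `move`."""
    new = board[:]
    new[move] = player
    return new

def minimax(board, player):
    """Utility of `board` from X's point of view, with `player` to move:
    +1 if X can force a win, -1 if O can, 0 for a forced draw."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if not actions(board):
        return 0  # draw: board full, no winner
    opponent = 'O' if player == 'X' else 'X'
    values = [minimax(result(board, a, player), opponent)
              for a in actions(board)]
    return max(values) if player == 'X' else min(values)

def minimax_decision(board, player='X'):
    """MINIMAX-DECISION: pick the action whose resulting state has the
    best minimax value for the player to move."""
    opponent = 'O' if player == 'X' else 'X'
    values = {a: minimax(result(board, a, player), opponent)
              for a in actions(board)}
    if player == 'X':
        return max(values, key=values.get)
    return min(values, key=values.get)
```

For example, with X at cells 0 and 1 and O at cells 3 and 4, `minimax_decision` picks cell 2 to complete the top row immediately.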
http://researchers.lille.inria.fr/~munos/
Reinforcement Learning
Policy
• The learning agent's way of behaving at a given time
• A mapping from perceived states of the environment to actions to be taken when in those states
• A simple lookup table
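The "simple lookup table" form of a policy can be made concrete with a toy example (the states and actions below are invented for illustration, not from the slides):

```python
# A policy as a plain lookup table: a dict mapping each perceived
# state of the environment to the action to take in that state.
policy = {
    'light_is_red': 'stop',
    'light_is_green': 'go',
    'light_is_yellow': 'slow_down',
}

def act(state):
    # The agent's way of behaving at a given time is just a table lookup.
    return policy[state]
```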
Google DeepMind's Deep Q-learning playing Atari Breakout
https://www.youtube.com/watch?v=V1eYniJ0Rnk
Flappy Bird Hack using Reinforcement Learning
http://sarvagyavaish.github.io/FlappyBirdRL/
http://quotes.lifehack.org/quote/albert-einstein/learning-is-experience-everything-else-is-just/
Exploration & Exploitation
http://mariashriver.com/blog/2012/10/balancing-happiness-and-heartache-in-alzheimers-kerry-lonergan-luksic/
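The slides name the exploration/exploitation trade-off but not a specific rule for balancing it. One standard choice (an assumption here, not stated in the deck) is epsilon-greedy selection:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest Q-value (exploit).
    `q_values` is a list of action values indexed by action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

With epsilon = 0 the agent always exploits; raising epsilon trades current reward for information about less-tried actions.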
Q-Learning
Q(s, a) ← Q(s, a) + α (R(s) + γ max_{a′} Q(s′, a′) − Q(s, a))

α: learning rate
γ: discount factor
Q: table of action values indexed by state and action
s: previous state
a: previous action
s′: next state
a′: action taken in state s′
R(s): reward in state s
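The update rule above can be written out directly. The dict-of-dicts layout for the Q table (`Q[s][a]`) is an assumption for illustration:

```python
def q_update(Q, s, a, r, s_next, alpha, gamma):
    """One Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (R(s) + gamma * max_a' Q(s', a') - Q(s, a))
    `Q` maps state -> {action: value}; `r` is the observed reward."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]
```

With alpha = 1 the old value cancels and the update reduces to Q(s, a) = R(s) + γ max Q(s′, ·), which is the form the worked example below uses.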
[Diagram: five states 1–5 connected by arrows, with state 5 as the Goal; each arrow is labeled with its immediate reward.]
R =
      1    2    3    4    5
  1 [ -1   -1    0   -1   -1 ]
  2 [  0   -1   -1    0  100 ]
  3 [  0   -1   -1    0   -1 ]
  4 [ -1    0    0   -1   -1 ]
  5 [ -1    0   -1   -1  100 ]
Define a reward function.
Q(s, a) ← Q(s, a) + α (R(s) + γ max_{a′} Q(s′, a′) − Q(s, a))

Suppose: γ = 0.8, α = 1
[Diagram: the same five-state graph, with every Q value initialized to 0.]
Q =
      1    2    3    4    5
  1 [  0    0    0    0    0 ]
  2 [  0    0    0    0    0 ]
  3 [  0    0    0    0    0 ]
  4 [  0    0    0    0    0 ]
  5 [  0    0    0    0    0 ]
Let’s start here.
R =
      1    2    3    4    5
  1 [ -1   -1    0   -1   -1 ]
  2 [  0   -1   -1    0  100 ]
  3 [  0   -1   -1    0   -1 ]
  4 [ -1    0    0   -1   -1 ]
  5 [ -1    0   -1   -1  100 ]
[Diagram: the five-state graph with the updated value Q(2, 5) = 100 on the arrow from 2 to 5.]

Q(2, 5) = R(2, 5) + 0.8 × max(Q(5, 2)) − Q(2, 5)
        = 100 + 0.8 × 0 − 0
        = 100
Q =
      1    2    3    4    5
  1 [  0    0    0    0    0 ]
  2 [  0    0    0    0  100 ]
  3 [  0    0    0    0    0 ]
  4 [  0    0    0    0    0 ]
  5 [  0    0    0    0    0 ]
When we choose to go to state 5…
[Diagram: the five-state graph, now also showing Q(4, 2) = 80.]

Q(4, 2) = R(4, 2) + 0.8 × max(Q(2, 1), Q(2, 4), Q(2, 5)) − Q(4, 2)
        = 0 + 0.8 × 100 − 0
        = 80
Q =
      1    2    3    4    5
  1 [  0    0    0    0    0 ]
  2 [  0    0    0    0  100 ]
  3 [  0    0    0    0    0 ]
  4 [  0   80    0    0    0 ]
  5 [  0    0    0    0    0 ]
When we choose to go to state 2…
[Diagram: the five-state graph, now also showing Q(3, 4) = 64.]

Q(3, 4) = R(3, 4) + 0.8 × max(Q(4, 2), Q(4, 3)) − Q(3, 4)
        = 0 + 0.8 × 80 − 0
        = 64
Q =
      1    2    3    4    5
  1 [  0    0    0    0    0 ]
  2 [  0    0    0    0  100 ]
  3 [  0    0    0   64    0 ]
  4 [  0   80    0    0    0 ]
  5 [  0    0    0    0    0 ]
When we choose to go to state 4…
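The three updates above can be replayed in a few lines of Python. The dict encoding of the reward matrix is an assumption for illustration; `R[s][a]` holds the entries of the R matrix from the slides, with disallowed moves (the −1 entries) simply omitted.

```python
GAMMA = 0.8  # discount factor from the slides; alpha = 1

# Allowed moves and their rewards, taken from the slides' R matrix.
R = {
    1: {3: 0},
    2: {1: 0, 4: 0, 5: 100},
    3: {1: 0, 4: 0},
    4: {2: 0, 3: 0},
    5: {2: 0, 5: 100},
}

# Q starts at zero for every allowed (state, action) pair.
Q = {s: {a: 0.0 for a in moves} for s, moves in R.items()}

def update(s, a):
    """With alpha = 1 the rule reduces to
    Q(s, a) = R(s, a) + gamma * max_a' Q(s', a')."""
    s_next = a  # in this example, the action is the room we move to
    Q[s][a] = R[s][a] + GAMMA * max(Q[s_next].values())
    return Q[s][a]

print(update(2, 5))  # 100.0: reaching the goal from room 2
print(update(4, 2))  # 80.0 = 0 + 0.8 * 100
print(update(3, 4))  # 64.0 = 0 + 0.8 * 80
```

Running the updates in this order reproduces the Q matrix on the last slide: 100, 80, and 64 propagate backward from the goal.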
http://www.123rf.com/photo_8104943_hand-drawn-tic-tac-toe-game-format.html
State Space
• Board
Actions
• Move
Rewards
• +1 if A.I. wins
• -1 if A.I. loses
• 0.5 if the game’s a draw
Back to our Tic Tac Toe!
https://github.com/zkan/tictactoe/blob/master/tictactoe_rl.py
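A rough skeleton of tabular Q-learning under the formulation above (+1 win, −1 loss, 0.5 draw) might look like the following. The state encoding (a board string), the hyperparameter values, and the helper names are all assumptions for illustration, not the code in the linked repository.

```python
import random

ALPHA, GAMMA, EPSILON = 0.3, 0.9, 0.1  # illustrative hyperparameters
Q = {}  # Q[(state, action)] -> value; unseen pairs default to 0

def q(state, action):
    return Q.get((state, action), 0.0)

def choose(state, moves):
    """Epsilon-greedy over the legal moves (exploration vs. exploitation)."""
    if random.random() < EPSILON:
        return random.choice(moves)
    return max(moves, key=lambda a: q(state, a))

def learn(state, action, reward, next_state, next_moves):
    """One Q-learning update after observing a transition.
    `next_moves` is empty when the game has ended."""
    best_next = max((q(next_state, a) for a in next_moves), default=0.0)
    Q[(state, action)] = q(state, action) + ALPHA * (
        reward + GAMMA * best_next - q(state, action))
```

A training loop would repeatedly play games, calling `choose` for each move and `learn` on each observed transition, so the win/loss/draw rewards gradually back up through the Q table just as in the five-room example.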