About Deep Reinforcement Learning (teamLab study session, 2016/04/21)
TRANSCRIPT
Google DeepMind, Playing Atari with Deep Reinforcement Learning: https://www.youtube.com/watch?v=V1eYniJ0Rnk
Preferred Networks, Robot Control with Distributed Deep Reinforcement Learning: https://www.youtube.com/watch?v=-YMfJLFynmA
Google DeepMind, One-shot grasping often leads to failed grasp attempts: https://www.youtube.com/watch?v=Q9tDHuidzak
Reinforcement Learning / Deep Reinforcement Learning / Deep Q Learning / DQN (Deep Q Network) / DeepMind
Time step t
State s_t, action a_t, reward r_t, next state s_{t+1}
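Reward r_t for taking action a_t in state s_t
[Diagram: transitions labeled with rewards -10 (six transitions), +100 (one transition), +2 (six transitions)]
Reference: https://youtu.be/XsFhA8kyfkI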
Q (Q)(Q)
Q: 0, 0, 0, 0, 0, 0, 0, 0
1. Initialize every Q value to 0
Q: 0, 0, 0, 0, 0, 0, 0, 0
2.
Reward: -10. Next-state Q value: 0. Target: 0 + (-10) = -10. Current Q value: 0. Updated Q value: (0 + (-10)) / 2 = -5.
Q: -5, 0, 0, 0, 0, 0, 0, 0
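The averaging step above can be sketched as a small function. This is a minimal sketch, not the speaker's code: the name `q_update` and the parameters `alpha` (learning rate) and `gamma` (discount factor) are hypothetical, and the defaults of 0.5 and 1.0 are inferred from the arithmetic on the slides rather than stated in the transcript.

```python
def q_update(current_q, reward, next_q, alpha=0.5, gamma=1.0):
    """Move current_q part of the way toward the target reward + gamma * next_q.

    alpha=0.5 and gamma=1.0 are assumptions inferred from the slide arithmetic.
    """
    target = reward + gamma * next_q
    return (1 - alpha) * current_q + alpha * target


# Values taken from the walkthrough slides
print(q_update(0, -10, 0))  # -5.0
print(q_update(0, 2, 0))    # 1.0
print(q_update(1, 2, 1))    # 2.0
```

With `alpha=0.5` the new value is simply the midpoint between the old Q value and the target, which is the "(old + target) / 2" computation shown on the slide.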
2.
Reward: +2. Next-state Q value: 0. Target: 0 + (+2) = +2. Current Q value: 0. Updated Q value: +1.
Q: -5, 0, 0, +1, 0, 0, 0, 0
2.
Reward: +2. Next-state Q value: +1. Target: 1 + (+2) = +3. Current Q value: +1. Updated Q value: +2.
Q: -5, 0, 0, +2, 0, 0, 0, 0
2.
Reward: -10. Next-state Q value: 0. Target: 0 + (-10) = -10. Current Q value: 0. Updated Q value: -5.
Q: -5, 0, -5, +2, 0, 0, 0, 0
2.
Reward: -10. Next-state Q value: 0. Target: 0 + (-10) = -10. Current Q value: 0. Updated Q value: -5.
Q: -5, 0, -5, +2, -5, 0, 0, 0
2.
Reward: +100. Next-state Q value: 0. Target: 0 + (+100) = +100. Current Q value: 0. Updated Q value: +50.
Q: -5, 0, -5, +2, -5, 0, +50, 0
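The six updates shown so far can be replayed on an 8-entry table to check the numbers. This is only a sketch under stated assumptions: the table index of each update is inferred from which entry changes between consecutive slides, the name `updates` is hypothetical, and the halving step assumes the learning rate of 0.5 implied by the arithmetic.

```python
# (index of the updated entry, reward, next-state Q value) for the six updates above.
# The indices are assumptions inferred from which table entry changes on each slide.
updates = [
    (0, -10, 0),
    (3, 2, 0),
    (3, 2, 1),
    (2, -10, 0),
    (4, -10, 0),
    (6, 100, 0),
]

q = [0.0] * 8  # step 1: every Q value starts at 0
for index, reward, next_q in updates:
    target = reward + next_q              # no discounting, as on the slides
    q[index] = (q[index] + target) / 2    # move halfway toward the target
    print(q)
```

The last printed table is `[-5.0, 0.0, -5.0, 2.0, -5.0, 0.0, 50.0, 0.0]`, matching the Q row above.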
2.
Reward: +100. Next-state Q value: +50. Target: +50 + (+100) = +150. Current Q value: +50. Updated Q value: +100.
Q: -5, 0, -10, +4, -10, 0, +100, 0
2.
Reward: +2. Next-state Q value: +100. Target: +100 + (+2) = +102. Current Q value: 0. Updated Q value: +51.
Q: -5, 0, -10, +4, -10, 0, +100, +51
2.
Reward: +2. Next-state Q value: 0. Target: 0 + (+2) = +2. Current Q value: +51. Updated Q value: +26.
Q: -5, 0, -10, +4, -10, 0, +100, +51
2.
Reward: -10. Next-state Q value: +100. Target: +100 + (-10) = +90. Current Q value: -10. Updated Q value: +40.
Q: -5, 0, -10, +4, +40, 0, +100, +51
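The updates in the walkthrough above are consistent with the standard tabular Q-learning rule. The learning rate α = 0.5 and discount factor γ = 1 are inferred from the arithmetic on the slides rather than stated in the transcript, and reading the next-state value as a maximum over actions is the usual Q-learning convention, assumed here:

```latex
Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t)
  + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) \right)

% With \alpha = 0.5 and \gamma = 1 (assumed), this reduces to a simple average:
Q(s_t, a_t) \leftarrow \frac{Q(s_t, a_t) + r_t + \max_{a} Q(s_{t+1}, a)}{2}

% Last update of the walkthrough: current value -10, reward -10, next-state value +100
\frac{-10 + \left( -10 + 100 \right)}{2} = +40
```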
[Diagram: transitions labeled with rewards -10 (six transitions), +100 (one transition), +2 (six transitions)]
QQ0.50