reinforcement learning opensource to baselines...korea advanced institute of science technology...

Post on 05-May-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Reinforcement LearningOpensource to Baselines

Member

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

2

● 한국생산기술연구원○ 로보틱스 및 재활로봇 연구

● KAIST 연구원○ IoT 시스템 개발 & 강화학습

연구

● 플랜아이 연구원○ 추천 시스템 & 강화학습 연구

● 한양대학교 석사과정○ 강화학습 시뮬레이션 구축○ 속도 예측 연구

● 호서대학교 로봇 자동화공학과 학부생○ 임베디드, ROS, 제어, 강화학습○ 강화학습을 로봇에 적용하는

연구

Index

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

3

● Reinforcement Learning

● Policy based Reinforcement Learning

● Value based Reinforcement Learning

● Famous open source for Reinforcement Learning

● Why?

● Requirements

● Supported algorithms

● To do List

● How to use

● License

Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

4

Policy base Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

5

Policy base Reinforcement Learning

Probability to select action(a) at state(s)

Policy base Reinforcement Learning

Policy base Reinforcement Learning

State Distribution

Policy base Reinforcement Learning

State Distribution

Policy

Policy base Reinforcement Learning

State Distribution

Policy

Reward

Policy base Reinforcement Learning

Policy base Reinforcement Learning

Policy base Reinforcement Learning

Policy base Reinforcement Learning

Policy base Reinforcement Learning

Policy base Reinforcement Learning

Policy base Reinforcement Learning

Policy base Reinforcement Learning

Policy base Reinforcement Learning

Policy base Reinforcement Learning

● Policy Gradient

● Advantage Actor Critic

● Natural Policy Gradient

● Trust Region Policy Gradient

● Proximal Policy Gradient

Policy Gradient

● Policy Gradient Methods for Reinforcement Learning with Function Approximation

○ Policy can be trained by iterable method

○ Policy can be trained by Policy Gradient

Method

● Policy Gradient

○ Optimal Policy can be obtained by Gradient Ascend

Method

● Iterable Method

○ Optimal Policy Can be obtained by iterable method

like deep learning method

Policy Gradient

● Why advantage is needed?

○ Reduce variance, without changing expectation

○ a

○ laksdfasdf

Advantage Actor Critic

Natural Policy Gradient

● The parameter update can not guarantee

the performance improvement of the actual function

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Trust Region Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Proximal Policy Optimization● Trust Region

Value base Reinforcement Learning

Value base Reinforcement Learning

● Playing Atari with Deep Reinforcement Learning(NIPS 2013)

● Human-level control through deep reinforcement learning(Nature 2015)

● Deep Reinforcement Learning with Double Q-Learning

● Dueling Network Architectures for Deep Reinforcement Learning

● Prioritized Experience Replay

● …..

Value base Reinforcement Learning

Dispatch Time : 5 min

Distance : 2 min

Value base Reinforcement Learning

Dispatch Time : 5 min

Distance : 2 min

Value base Reinforcement Learning

3 min 8 min

Value base Reinforcement Learning

● A Distributional perspective on Reinforcement Learning(2017)

● Distributional Reinforcement Learning with Quantile Regression(2017)

● Implicit Quantile Networks for Distributional Reinforcement Learning(2018)

Value base Reinforcement Learning

Value base Reinforcement Learning

Value base Reinforcement Learning

Distributional Reinforcement Learningwith Quantile Regrssion

● Histogram cannot meet condition

● Wasserstein Distance

Distributional Reinforcement Learningwith Quantile Regrssion

Implicit Quantile Networkfor Distributional Reinforcement Learning

Implicit Quantile Networkfor Distributional Reinforcement Learning

RandomSampling

Value base Reinforcement Learning

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

64

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

65

● Deepminds

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

66

● Deepminds

● Open AI

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

67

● Deepminds

● Open AI

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

68

● Deepminds

● Open AI

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

69

● Deepminds

● Open AI

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

70

● Deepminds

● Open AI

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

71

● Deepminds

● Open AI

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

72

● Deepminds

○ Dopamine

■ Value based Reinforcement Learning opensource baseline

■ https://github.com/google/dopamine

● Open AI

Famous open source for Reinforcement Learning

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

73

● Deepminds

○ Dopamine

■ Value based Reinforcement Learning open source baseline

■ https://github.com/google/dopamine

● Open AI

○ Baselines

■ Policy based Reinforcement Learning open source baseline

■ https://github.com/openai/baselines

Why?

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

74

Why?

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

75

● Hard to Install

Why?

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

76

● Hard to Install

○ baselines : mp4py, mujoco…

○ dopamine : easy

Why?

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

77

● Hard to Install

○ baselines : mp4py, mujoco…

○ dopamine : easy

Why?

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

78

● Hard to Install

○ baselines : mp4py, mujoco…

○ dopamine : easy

● Hard to use to your environment

Why?

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

79

● Hard to Install

○ baselines : mp4py, mujoco…

○ dopamine : easy

● Hard to use to your environment

○ manipulate the api

○ https://github.com/google/dopamine

○ https://github.com/openai/baselines

Why?

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

80

● Hard to Install

○ baselines : mp4py, mujoco…

○ dopamine : easy

● Hard to use to your environment

○ manipulate the api

○ https://github.com/google/dopamine

○ https://github.com/openai/baselines

● Korean

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

81

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

82

● Hard to Install

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

83

● Easy to Install

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

84

● Easy to Install

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

85

● Easy to Install

● Hard to use to your environment

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

86

● Easy to Install

● Easy to use to your environment

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

87

● Easy to Install

● Easy to use to your environment

○ Provide many tutorial code(discrete, continuous action)

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

88

● Easy to Install

● Easy to use to your environment

○ Provide many tutorial code(discrete, continuous action)

○ Keep the flow of other reinforcement learning code

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

89

● Easy to Install

● Easy to use to your environment

○ Provide many tutorial code(discrete, continuous action)

○ Keep the flow of other reinforcement learning code

Requirements

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

90

● Easy to Install

● Easy to use to your environment

● Korean

Supported algorithms

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

91

Supported algorithms

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

92

● Vanilla Policy Gradient

● Advantage Actor Critic

● Proximal Policy Optimization

● Deep Deterministic Policy Gradient

Supported algorithms

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

93

● Vanilla Policy Gradient(Discrete Action)

● Advantage Actor Critic(Discrete Action)

● Proximal Policy Optimization(Discrete, Continuous Action)

● Deep Deterministic Policy Gradient(Discrete, Continuous Action)

To do List

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

94

● Vanilla Policy Gradient(Discrete Action)

● Advantage Actor Critic(Discrete Action)

● Proximal Policy Optimization(Discrete, Continuous Action)

● Deep Deterministic Policy Gradient(Discrete, Continuous Action)

To do List

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

95

● Vanilla Policy Gradient(Discrete Action)

● Advantage Actor Critic(Discrete Action)

● Proximal Policy Optimization(Discrete, Continuous Action)

● Deep Deterministic Policy Gradient(Discrete, Continuous Action)

● Applicate LSTM to Vanilla Policy Gradient, Advantage Actor Critic, Proximal Policy Optimization

● Actor Critic Experience Replay

● Soft actor critic

● Value based Reinforcement Learning

○ DQN, Double DQN, Deuling DQN to Rainbow

○ Distributional RL

How to use

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

96

How to use

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

97

License

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

98

License

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

99

Free

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

100

We are hiring

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

101

Support us in any formhttps://github.com/RLOpensource/tensorflow_RL/blob/master/model.py

top related