direct use of particle filters for decision making


direct use of particle filters for decision making

Ryuichi Ueda, Chiba inst. of Technology

An attempt to use particle filters directly for decision making, not only for estimation

Ryuichi Ueda, Chiba Institute of Technology / Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT) Venture Research Group / the 1st workshop on "Reading ahead from aspects of motion"

@ Nagoya Institute of Technology

metacognition [Flavell 1979]

• knowledge or cognition about cognitive phenomena

– evaluation of the extent of its own knowledge

• how to implement to robots

– probability (Bayes') theory

– implementation

• methods of probabilistic robotics, methods of machine learning, artificial neural networks, ...

Aug. 1, 2017, the 1st workshop "Reading ahead from aspects of motion"

probabilistic expression of knowledge

• state variables: 𝒙 = (𝑥₁, 𝑥₂, …, 𝑥ₙ)

– 𝑛 = 3: mobile robot self-localization

– 𝑛 = 10𝑁: SLAM (mapping)

– The actual 𝒙 is unknown.

• 𝑏𝑒𝑙(𝒙): the belief of the robot about 𝒙

– a probability density function


[figure: three example beliefs on the 𝑥𝑦-plane: a heavy-tailed distribution (unconfident), a peaky distribution (confident), and a peaky distribution with several peaks]

particle filters

• a popular method for self-localization

– Monte Carlo localization: particle filter for self-localization

• used for all of the methods in this presentation

• representation of the belief

• updates of the particles

– Sensor information narrows the distribution of the particles.

– Robot motion moves the particles.

[figure: particles (candidates of the pose) and the actual pose (unknown), by courtesy of Ryoma Aoki (Ueda lab.)]
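The two updates above can be sketched as one cycle of a particle filter. This is a minimal sketch, assuming hypothetical `motion` and `likelihood` models; it is not the implementation used in the presentation:

```python
import random
import numpy as np

def mcl_update(particles, motion, observation, likelihood):
    """One cycle of Monte Carlo localization (sketch)."""
    # motion update: every pose hypothesis moves with the robot, plus noise
    moved = [motion(p) + np.random.normal(0.0, 0.1, size=3) for p in particles]
    # sensor update: weight each particle by how well it explains the data
    weights = [likelihood(p, observation) for p in moved]
    # resampling: particles survive in proportion to their weights,
    # which narrows the distribution around plausible poses
    return random.choices(moved, weights=weights, k=len(moved))
```

After a few cycles, particles that contradict the sensor data die out and the survivors cluster around the actual pose.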

an example: Tsukuba Challenge


decide its action based on the most reliable particle

• 2 km run of autonomous robots

– a standard method for winning:

• put a LIDAR on their robot

• make a map with a SLAM method beforehand

• probabilistic self-localization with the map

• motion planning with non-probabilistic methods

Hayashibara laboratory's team of Chiba Inst. of Technology in 2017 (completed the 2 km run)

more severe cases

• RoboCup

– small camera

– vibration and collision

– few landmarks

• a micromouse in the maze

– only four range sensors

– perceptual aliasing


Robots must decide their motions based on uncertain 𝑏𝑒𝑙s.

decision with broad beliefs

• Is it possible?

– easy for human beings

• "You sense that you do not yet know a certain chapter in your text well enough to pass tomorrow's exam, so you read it through once more." [Flavell 1979]

• Intelligent robots in the real world must be able to ...

– find an action that is effective even if the belief is broad

– find an action to reduce the uncertainty

• Two (+1) cases from our studies are presented.


CASE 1: REAL-TIME QMDP

R. Ueda, T. Arai, K. Sakamoto, Y. Jitsukawa, K. Umeda, H. Osumi, T. Kikuchi and M. Komura: "Real-time decision making with state-value function under uncertainty of state estimation – Evaluation with Local Maxima and Discontinuity," IEEE ICRA, 2005.


a navigation problem with multiple destinations

• the problem:

– There is more than one destination.

– The robot knows its uncertainty of self-localization (𝑏𝑒𝑙).

– The robot must decide an effective action.


[figure: destination 1 and destination 2; which is easier to reach?]

a goalie task for RoboCup 4 legged robot league

• three kinds of "destinations" (sub-tasks)

a) staying in the goal (ball: invisible)

b) punching the ball (ball: near the goal)

c) closing a goalpost (ball: at a side of the goal)


[figure: the ERS-210 goalie in front of the goal; its pose is (x, y, θ), the ball is at (r, φ) relative to the robot; (a), (b), (c) mark the three sub-tasks]

difference of accuracy requirement

• requirement to reach the sub-tasks

– (a) accurate self-localization only relative to the goal

– (b) no need of accurate self-localization

– (c) accurate self-localization


[figure: sub-tasks (a), (b), (c) around the goal; in (a), 𝑏𝑒𝑙 is broad but the relative pose toward the goal is accurate]

The robot must choose its sub-task in real time with consideration of these accuracy requirements.

real-time QMDP

• QMDP value method: written in [Littman 95]

• composed of offline and online calculation

– offline

• calculate the value (cost-to-go) function without consideration of uncertainty

• state variables: 𝒙 = (𝑥, 𝑦, 𝜃, 𝑥ball, 𝑦ball)

– online

1. place all particles on the value function

2. choose an action that maximizes the average value of the particles


[figure: the value function over the 5D state space, with the sub-tasks a, b, c, the ball, and the goal]
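The online step above can be sketched as follows. This is a minimal sketch, assuming hypothetical `transition` and `value` functions standing in for the robot's motion model and the offline value function:

```python
import numpy as np

def qmdp_action(particles, actions, transition, value):
    """Real-time QMDP, online part (sketch): pick the action whose
    one-step successors of all particles have the highest average value."""
    best_action, best_value = None, -np.inf
    for a in actions:
        # move every pose hypothesis by action a ...
        moved = [transition(p, a) for p in particles]
        # ... place the results on the offline value function and average
        avg = np.mean([value(p) for p in moved])
        if avg > best_value:
            best_action, best_value = a, avg
    return best_action
```

Because the value function is computed offline, the online loop is cheap enough to run over 1,000 particles on a small CPU.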

calculated value function

• 3,000,000 discrete states in the 5D state space

• 49 min of calculation with a 3.6 GHz CPU


a part of the value function (values on the 𝑥𝑦-plane with a fixed (𝜃, 𝑥ball, 𝑦ball))

motion with real-time QMDP

• Motions corresponding to the sub-tasks can be seen.

– real-time calculation with 1,000 particles on a 192 MHz CPU

– A detailed evaluation can be found in [Ueda ICRA2005].


[video (×2 speed): waiting in the goal, closing a goalpost, punching the ball]

https://youtu.be/fsQicKXE5AU

CASE 2: PROBABILISTIC FLOW CONTROL

Ryuichi Ueda: "Generation of Compensation Behavior of Autonomous Robot for Uncertainty of Information with Probabilistic Flow Control," Advanced Robotics, 29(11), pp. 721-734, June 2015.


motivation

• When I go to bed at midnight without lights, how do I behave?

– I search for a wall with my hand and trace it.

• a symbolic study: "coastal navigation" [Roy 99]

– planning with uncertainty evaluation at offline calculation


[figure: a robot with pose (x, y, θ) traces a wall toward the goal; the wall erases a degree of freedom of the uncertainty, while leaving it raises the possibility of getting lost]

how to realize the behavior with real-time QMDP

• problems:

– no strategy for obtaining information

• no consideration of uncertainty offline

• no consideration of future observations online

– deadlocks

• The robot stops when no motion can improve the average value.

• not fatal in RoboCup, but fatal in navigation

A small modification gives an interesting behavior.


probabilistic flow control (PFC)

• an additional assumption

– The robot can know through a sensor whether it has reached a goal or not.

• modification of calculation

– Each particle's value is weighted by the value itself when averaging.

– Particles near a goal have priority.


[figure: on the value function over the state space, particles with high values are weighted heavily and those with low values lightly]
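The modified averaging can be sketched as follows. This is an illustrative sketch: the softmax-style weighting is a stand-in for the paper's exact scheme, and `transition` and `value` are assumed placeholders:

```python
import numpy as np

def pfc_action(particles, actions, transition, value, beta=1.0):
    """PFC sketch: like the QMDP online step, but each particle's
    contribution is re-weighted by its own value, so particles near a
    goal (high value) dominate the decision."""
    best_action, best_score = None, -np.inf
    for a in actions:
        v = np.array([value(transition(p, a)) for p in particles])
        # higher value -> larger weight; subtract the max for stability
        w = np.exp(beta * (v - v.max()))
        score = np.average(v, weights=w)
        if score > best_score:
            best_action, best_score = a, score
    return best_action
```

With a plain average, two actions can tie and the robot deadlocks; the weighting lets a particle near the goal drag the decision its way.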

a navigation problem with one landmark

• state variables: 𝒙 = 𝑥, 𝑦, 𝜃

• information:

– landmark observation

– goal or not

• Particles do not converge most of the time.


[figure: the robot, a landmark, and the destination; the poses of these particles never contradict the sensor data]

application of PFC

• The robot moves as if it were dragged by some particles near the goal.

• real-time QMDP (for comparison)

– 73 deadlocks in 100 trials


other applications

• Some unpublished modifications have been applied.

– (I must write them up, but ...)


[video: searching behavior of a manipulator with (modified) PFC; a rod (position unknown); red color: likelihood of the rod's existence]

[video: wandering behavior of a Raspberry Pi Mouse with (modified) PFC toward the goal]

these movies: https://blog.ueda.tech/?page_id=10034

CASE 3: PARTICLE FILTER ON EPISODE

• Ryuichi Ueda, Kotaro Mizuta, Hiroshi Yamakawa and Hiroyuki Okada: "Particle Filter on Episode for Learning Decision Making Rule," Proc. of the 14th International Conference on Intelligent Autonomous Systems (IAS-14), Shanghai, July 2016.

• the 35th Annual Conference of the RSJ (to appear)


motivation

• decision making before/without environmental maps

• Memory comes first, and a map follows.

– The hippocampus of mammals generates sequences of memories, and the sequences become maps as the time order drops out.

– information: memory > map

• Robots can store seemingly unlimited memory.

– different from creatures


no need of SLAM for intelligent decision making (?)

particle filter on episode (PFoE)

• procedure

1. record I/O and reward

2. calculate the similarity between the current situation and each past state

3. choose an action that maximizes future reward


[figure: an episode is a time series of states s (sensors), actions a, and rewards (e.g. 1, -1) along the time axis; the belief is a set of particles placed on past time steps, up to the present time]
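The three-step procedure above can be sketched as follows. The episode format, the similarity measure (an exact state match), and the reward-voting rule here are simplified, illustrative assumptions rather than the paper's exact definitions:

```python
import random

def pfoe_action(episode, observation, n_particles=100, horizon=3):
    """Particle filter on episode (sketch): particles live on the time
    axis of a recorded episode of (state, action, reward) tuples."""
    # 1. place particles at random past time steps of the episode
    ts = [random.randrange(len(episode) - horizon) for _ in range(n_particles)]
    # 2. weight each particle by the similarity between its recorded
    #    state and the current observation, then resample
    ws = [1.0 if episode[t][0] == observation else 1e-3 for t in ts]
    ts = random.choices(ts, weights=ws, k=n_particles)
    # 3. let each particle vote for its recorded action with the
    #    reward that followed over a short horizon
    votes = {}
    for t in ts:
        future = sum(r for _, _, r in episode[t:t + horizon])
        a = episode[t][1]
        votes[a] = votes.get(a, 0.0) + future
    return max(votes, key=votes.get)
```

No map is built at any point; the episode itself is the robot's only memory.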

a simple application

• The robot goes from the bottom of the T-shaped maze to the goal, which is set alternately on one of the arms.

• conditions (very simplified)

– rewards:

• 1: the robot turned to the goal arm

• -1: it turned to the wrong arm

– only four states: start, T-junction, after turn, end of an arm

– only one chance of decision: right or left at the T-junction


another version of PFoE

• used for teaching

• presented at the 35th Annual Conference of the RSJ

– currently secret


these movies on the web : https://blog.ueda.tech/?page_id=10021

conclusion of this presentation

• Real-time QMDP can choose appropriate locations of a goalkeeper in accordance with the belief of the robot.

– on a 192 MHz CPU with 32 MB DRAM

– It was actually used in RoboCup competitions for some years.

• PFC compensates for the loss of information through the motion of robots.

– Robots with PFC show "searching behavior."

• We are trying to build a cognitive/metacognitive model for robots with poor computing resources.
