direct use of particle filters for decision making


direct use of particle filters for decision making

Ryuichi Ueda, Chiba inst. of Technology

An attempt to use particle filters directly for decision making, not only for estimation

Ryuichi Ueda, Chiba Institute of Technology / Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT) Venture Research Group / the 1st workshop on "Reading ahead from aspects of motion"

@ Nagoya Institute of Technology

metacognition [Flavell 1979]

• knowledge or cognition about cognitive phenomena

– evaluation of the extent of its own knowledge

• how to implement to robots

– probability (Bayes') theory

– implementation

• methods of probabilistic robotics, methods of machine learning, artificial neural networks, ...

Aug. 1, 2017, the 1st workshop "Reading ahead from aspects of motion"

probabilistic expression of knowledge

• state variables: 𝒙 = (𝑥₁, 𝑥₂, …, 𝑥ₙ)

– 𝑛 = 3: mobile robot self-localization

– 𝑛 = 10𝑁: SLAM (mapping)

– The actual 𝒙 is unknown.

• 𝑏𝑒𝑙(𝒙): the belief of the robot about 𝒙

– a probability density function


[figure: three example beliefs on the 𝑥𝑦-plane: a heavy-tailed distribution (unconfident), a peaky distribution (confident), and a peaky distribution with several peaks]

particle filters

• a popular method for self-localization

– Monte Carlo localization: particle filter for self-localization

• used for all of the methods in this presentation

• representation of the belief

• updates of the particles

– Sensor information narrows the distribution of the particles.

– Robot motion moves the particles.

[figure: particles (candidates of the pose) and the actual pose (unknown), by courtesy of Ryoma Aoki (Ueda lab.)]
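The two updates above can be sketched as one cycle of a particle filter. This is a minimal sketch, assuming hypothetical `motion` and `likelihood` models; it is not the implementation used in the presentation:

```python
import random
import numpy as np

def mcl_update(particles, motion, observation, likelihood):
    """One cycle of Monte Carlo localization (sketch)."""
    # motion update: every pose hypothesis moves with the robot, plus noise
    moved = [motion(p) + np.random.normal(0.0, 0.1, size=3) for p in particles]
    # sensor update: weight each particle by how well it explains the data
    weights = [likelihood(p, observation) for p in moved]
    # resampling: particles survive in proportion to their weights,
    # which narrows the distribution around plausible poses
    return random.choices(moved, weights=weights, k=len(moved))
```

After a few cycles, particles that contradict the sensor data die out and the survivors cluster around the actual pose.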

an example: Tsukuba Challenge


decide its action based on the most reliable particle

• 2 km run of autonomous robots

– a standard method for winning:

• put a LIDAR on their robot

• make a map with a SLAM method beforehand

• probabilistic self-localization with the map

• motion planning with non-probabilistic methods

Hayashibara laboratory's team of Chiba Inst. of Technology in 2017 (completed the 2 km run)

more severe cases

• RoboCup

– small camera

– vibration and collision

– few landmarks

• a micromouse in the maze

– only four range sensors

– perceptual aliasing


Robots must decide their motions based on uncertain 𝑏𝑒𝑙s.

decision with broad beliefs

• Is it possible?

– easy for human beings

• "You sense that you do not yet know a certain chapter in your text well enough to pass tomorrow's exam, so you read it through once more." [Flavell 1979]

• Intelligent robots in the real world must be able to ...

– find an action that is effective even if the belief is broad

– find an action to reduce the uncertainty

• Two (+1) cases from our studies are presented.


CASE 1: REAL-TIME QMDP

R. Ueda, T. Arai, K. Sakamoto, Y. Jitsukawa, K. Umeda, H. Osumi, T. Kikuchi and M. Komura: "Real-time decision making with state-value function under uncertainty of state estimation – Evaluation with Local Maxima and Discontinuity," IEEE ICRA, 2005.


a navigation problem with multiple destinations

• the problem:

– There is more than one destination.

– The robot knows its uncertainty of self-localization (𝑏𝑒𝑙).

– The robot must decide an effective action.


[figure: destination 1 and destination 2; which is easier to reach?]

a goalie task for RoboCup 4 legged robot league

• three kinds of "destinations" (sub-tasks)

a) staying in the goal (ball: invisible)

b) punching the ball (ball: near the goal)

c) closing a goalpost (ball: at a side of the goal)


[figure: the ERS-210 goalie in front of the goal; its pose is (x, y, θ), the ball is at (r, φ) relative to the robot; (a), (b), (c) mark the three sub-tasks]

difference of accuracy requirement

• requirement to reach the sub-tasks

– (a) accurate self-localization only relative to the goal

– (b) no need of accurate self-localization

– (c) accurate self-localization


[figure: sub-tasks (a), (b), (c) around the goal; in (a), 𝑏𝑒𝑙 is broad but the relative pose toward the goal is accurate]

The robot must choose its sub-task in real time with consideration of these accuracy requirements.

real-time QMDP

• QMDP value method: written in [Littman 95]

• composed of offline and online calculation

– offline

• calculate the value (cost-to-go) function without consideration of uncertainty

• state variables: 𝒙 = (𝑥, 𝑦, 𝜃, 𝑥ball, 𝑦ball)

– online

1. place all particles on the value function

2. choose an action that maximizes the average value of the particles


[figure: the value function over the 5D state space, with the sub-tasks a, b, c, the ball, and the goal]
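The online step above can be sketched as follows. This is a minimal sketch, assuming hypothetical `transition` and `value` functions standing in for the robot's motion model and the offline value function:

```python
import numpy as np

def qmdp_action(particles, actions, transition, value):
    """Real-time QMDP, online part (sketch): pick the action whose
    one-step successors of all particles have the highest average value."""
    best_action, best_value = None, -np.inf
    for a in actions:
        # move every pose hypothesis by action a ...
        moved = [transition(p, a) for p in particles]
        # ... place the results on the offline value function and average
        avg = np.mean([value(p) for p in moved])
        if avg > best_value:
            best_action, best_value = a, avg
    return best_action
```

Because the value function is computed offline, the online loop is cheap enough to run over 1,000 particles on a small CPU.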

calculated value function

• 3,000,000 discrete states in the 5D state space

• 49 min of calculation with a 3.6 GHz CPU


a part of the value function (values on the 𝑥𝑦-plane with a fixed (𝜃, 𝑥ball, 𝑦ball))

motion with real-time QMDP

• Motions corresponding to the sub-tasks can be seen.

– real-time calculation with 1,000 particles on a 192 MHz CPU

– A detailed evaluation can be found in [Ueda ICRA2005].


[video (×2 speed): waiting in the goal, closing a goalpost, punching the ball]

https://youtu.be/fsQicKXE5AU

CASE 2: PROBABILISTIC FLOW CONTROL

Ryuichi Ueda: "Generation of Compensation Behavior of Autonomous Robot for Uncertainty of Information with Probabilistic Flow Control," Advanced Robotics, 29(11), pp. 721-734, June 2015.


motivation

• When I go to bed at midnight without lights, how do I behave?

– I search for a wall with my hand and trace it.

• a symbolic study: "coastal navigation" [Roy 99]

– planning with uncertainty evaluation at offline calculation


[figure: a robot with pose (x, y, θ) traces a wall toward the goal; the wall erases a degree of freedom of the uncertainty, while leaving it raises the possibility of getting lost]

how to realize the behavior with real-time QMDP

• problems:

– no strategy for obtaining information

• no consideration of uncertainty offline

• no consideration of future observations online

– deadlocks

• The robot stops when no motion can improve the average value.

• not fatal in RoboCup, but fatal in navigation

A small modification gives an interesting behavior.


probabilistic flow control (PFC)

• an additional assumption

– The robot can know through a sensor whether it has reached a goal or not.

• modification of calculation

– Each particle's value is weighted by the value itself when averaging.

– Particles near a goal have priority.


[figure: on the value function over the state space, particles with high values are weighted heavily and those with low values lightly]
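The modified averaging can be sketched as follows. This is an illustrative sketch: the softmax-style weighting is a stand-in for the paper's exact scheme, and `transition` and `value` are assumed placeholders:

```python
import numpy as np

def pfc_action(particles, actions, transition, value, beta=1.0):
    """PFC sketch: like the QMDP online step, but each particle's
    contribution is re-weighted by its own value, so particles near a
    goal (high value) dominate the decision."""
    best_action, best_score = None, -np.inf
    for a in actions:
        v = np.array([value(transition(p, a)) for p in particles])
        # higher value -> larger weight; subtract the max for stability
        w = np.exp(beta * (v - v.max()))
        score = np.average(v, weights=w)
        if score > best_score:
            best_action, best_score = a, score
    return best_action
```

With a plain average, two actions can tie and the robot deadlocks; the weighting lets a particle near the goal drag the decision its way.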

a navigation problem with one landmark

• state variables: 𝒙 = 𝑥, 𝑦, 𝜃

• information:

– landmark observation

– goal or not

• Particles do not converge most of the time.


[figure: the robot, a landmark, and the destination; the poses of these particles never contradict the sensor data]

application of PFC

• The robot moves as if it were dragged by some particles near the goal.

• real-time QMDP (for comparison)

– 73 deadlocks in 100 trials


other applications

• Some unpublished modifications have been applied.

– (I must write them up, but ...)


[video: searching behavior of a manipulator with (modified) PFC; a rod (position unknown); red color: likelihood of the rod's existence]

[video: wandering behavior of a Raspberry Pi Mouse with (modified) PFC toward the goal]

these movies: https://blog.ueda.tech/?page_id=10034

CASE 3: PARTICLE FILTER ON EPISODE

• Ryuichi Ueda, Kotaro Mizuta, Hiroshi Yamakawa and Hiroyuki Okada: "Particle Filter on Episode for Learning Decision Making Rule," Proc. of the 14th International Conference on Intelligent Autonomous Systems (IAS-14), Shanghai, July 2016.

• the 35th Annual Conference of the RSJ (to appear)


motivation

• decision making before/without environmental maps

• Memory comes first, and a map follows.

– The hippocampus of mammals generates sequences of memories, and the sequences become maps as the time order drops out.

– information: memory > map

• Robots can store seemingly unlimited memory.

– different from creatures


no need of SLAM for intelligent decision making (?)

particle filter on episode (PFoE)

• procedure

1. record I/O and reward

2. calculate the similarity between the current situation and each past state

3. choose an action that maximizes future reward


[figure: an episode is a time series of states s (sensors), actions a, and rewards (e.g. 1, -1) along the time axis; the belief is a set of particles placed on past time steps, up to the present time]
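The three-step procedure above can be sketched as follows. The episode format, the similarity measure (an exact state match), and the reward-voting rule here are simplified, illustrative assumptions rather than the paper's exact definitions:

```python
import random

def pfoe_action(episode, observation, n_particles=100, horizon=3):
    """Particle filter on episode (sketch): particles live on the time
    axis of a recorded episode of (state, action, reward) tuples."""
    # 1. place particles at random past time steps of the episode
    ts = [random.randrange(len(episode) - horizon) for _ in range(n_particles)]
    # 2. weight each particle by the similarity between its recorded
    #    state and the current observation, then resample
    ws = [1.0 if episode[t][0] == observation else 1e-3 for t in ts]
    ts = random.choices(ts, weights=ws, k=n_particles)
    # 3. let each particle vote for its recorded action with the
    #    reward that followed over a short horizon
    votes = {}
    for t in ts:
        future = sum(r for _, _, r in episode[t:t + horizon])
        a = episode[t][1]
        votes[a] = votes.get(a, 0.0) + future
    return max(votes, key=votes.get)
```

No map is built at any point; the episode itself is the robot's only memory.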

a simple application

• The robot goes from the bottom of the T-shaped maze to the goal, which is set alternately on one of the arms.

• conditions (very simplified)

– rewards:

• 1: the robot turned to the goal arm

• -1: it turned to the wrong arm

– only four states: start, T-junction, after turn, end of an arm

– only one chance of decision: right or left at the T-junction


another version of PFoE

• used for teaching

• presented at the 35th Annual Conference of the RSJ

– currently secret


these movies on the web : https://blog.ueda.tech/?page_id=10021

conclusion of this presentation

• Real-time QMDP can choose appropriate locations of a goalkeeper in accordance with the belief of the robot.

– on a 192 MHz CPU with 32 MB DRAM

– It was actually used in RoboCup competitions for some years.

• PFC compensates for the loss of information through the motion of robots.

– Robots with PFC show "searching behavior."

• We are trying to build a cognitive/metacognitive model for robots with poor computing resources.
