Robotics - PKU · 2020. 2. 9.


Robotics

13

AI Slides (6e) © Lin Zuoquan@PKU 1998–2020


13 Robotics∗

13.1 Robots

13.2 Computer vision

13.3 Motion planning

13.4 Controller


Robots

Robots are physical agents that perform tasks by manipulating the physical world

Wide application: industry, agriculture, transportation, health, environments, exploration, personal services, entertainment, human augmentation, and so on ⇐ Robotic age ⇒ Intelligent robots


Types of robots

• Manipulators: physically anchored to their workplace
  e.g., factory assembly line, the International Space Station

• Mobile robots: move about their environment
  – Unmanned ground vehicles (UGVs), e.g., planetary rovers (on Mars), intelligent vehicles
  – Unmanned aerial vehicles (UAVs), e.g., drones
  – Autonomous underwater vehicles (AUVs)
  – Autonomous flight units

• Mobile manipulators: combine mobility with manipulation
  – Humanoid robots: mimic the human torso

Other: prosthetic devices (e.g., artificial limbs), intelligent environments (e.g., a house equipped with sensors and effectors), multibody systems (swarms of small cooperating robots)


Hardware

A diverse set of robot hardware comes from interdisciplinary technologies

– Processors (controllers)
– Sensors
– Effectors
– Manipulators


Sensors

Passive sensors or active sensors
– Range finders: sonar (land, underwater), laser range finders, radar (aircraft), tactile sensors, GPS
– Imaging sensors: cameras (visual, infrared)
– Proprioceptive sensors: shaft decoders (joints, wheels), inertial sensors, force sensors, torque sensors


Manipulators

(Figure: manipulator arms built from revolute (R) and prismatic (P) joints, e.g., an RRP configuration)

Configuration of a robot specified by 6 numbers ⇒ 6 degrees of freedom (DOF)

6 is the minimum number required to position the end-effector arbitrarily. For dynamical systems, add a velocity for each DOF.
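As a minimal illustration of how a configuration determines the end-effector pose, here is the forward kinematics of a planar two-joint (RR) arm — a toy sketch with assumed unit link lengths, not from the slides:

```python
import math

def fk_planar_rr(theta1, theta2, l1=1.0, l2=1.0):
    """Forward kinematics of a planar RR arm: joint angles -> end-effector (x, y)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

# Fully stretched along the x-axis: both joint angles zero
print(fk_planar_rr(0.0, 0.0))          # (2.0, 0.0)
# Elbow bent 90 degrees: end-effector at roughly (1, 1)
print(fk_planar_rr(0.0, math.pi / 2))
```

The 2-dimensional configuration space (θ1, θ2) here is exactly the "numbers that specify the configuration"; a 6-DOF arm would need six such angles.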


Non-holonomic robots

(Figure: a car-like robot with configuration (x, y, θ))

A car has more DOF (3) than controls (2), so it is non-holonomic; it cannot generally transition between two infinitesimally close configurations
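A minimal sketch of this (a unicycle-style model with assumed step size, not the slides' car model): two controls (forward speed v, turn rate ω) drive three state variables (x, y, θ), so the robot cannot move directly sideways:

```python
import math

def step(state, v, omega, dt=0.1):
    """Unicycle model: 2 controls (v, omega) drive a 3-DOF state (x, y, theta)."""
    x, y, theta = state
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

# With theta = 0, no control input produces motion in +y alone:
# the robot must first turn, drive, and turn back (as in parallel parking).
s = (0.0, 0.0, 0.0)
for _ in range(10):            # drive straight ahead
    s = step(s, v=1.0, omega=0.0)
print(s)  # x ≈ 1.0, y = 0.0, theta = 0.0
```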


Software

Pipeline architecture: execute multiple processes in parallel
– sensor interface layer
– perception layer
– planning and control layer
– vehicle interface layer
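The layered pipeline can be sketched with threads and queues (layer names and the toy transforms are illustrative, not the actual system):

```python
import queue
import threading

def layer(name, inbox, outbox, transform):
    """Run one pipeline layer in its own thread, passing messages downstream."""
    def run():
        while True:
            item = inbox.get()
            if item is None:          # shutdown signal propagates down the pipe
                outbox.put(None)
                return
            outbox.put(transform(item))
    t = threading.Thread(target=run, name=name)
    t.start()
    return t

raw, percepts, commands = queue.Queue(), queue.Queue(), queue.Queue()
t1 = layer("perception", raw, percepts, lambda m: {"obstacle": m > 0.5})
t2 = layer("planning", percepts, commands,
           lambda p: "brake" if p["obstacle"] else "cruise")

for measurement in [0.1, 0.9]:        # sensor interface layer feeds raw data
    raw.put(measurement)
raw.put(None)
t1.join(); t2.join()

results = []
while True:
    c = commands.get()
    if c is None:
        break
    results.append(c)
print(results)  # -> ['cruise', 'brake']
```

Each layer runs concurrently, so the perception layer can process a new frame while planning is still handling the previous one — the point of the pipeline design.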


Example: A robot car

(Figure: software architecture of the robot car — SENSOR INTERFACE (laser 1–5, camera, radar, GPS position/compass, IMU, and wheel-velocity interfaces), PERCEPTION (laser/radar/vision mappers, UKF pose estimation, surface assessment, road finder), PLANNING & CONTROL (path planner, steering and throttle/brake control), VEHICLE INTERFACE (Touareg interface, power server interface, emergency stop), USER INTERFACE (touch screen UI, wireless E-stop, top-level control), and GLOBAL SERVICES (process controller, health monitor, data logger, file system, communication channels, inter-process communication (IPC) server, time server); modules exchange vehicle state (pose, velocity), laser/vision maps, obstacle lists, velocity limits, the RDDF corridor (smoothed and original), trajectories, driving mode, and pause/disable commands)


Computer vision

(Figure: the vision pipeline — an image sequence passes through LOW-LEVEL VISION (filters, edge detection, segmentation, matching; shape from motion, shading, stereo, and contour), producing edges, regions, features, optical flow, disparity, and a depth map, then through HIGH-LEVEL VISION (tracking, data association, object recognition) to objects, scenes, and behaviors)

Vision requires combining multiple cues and commonsense knowledge


Visual recognition

Computer vision
– visual recognition
– – image classification ⇐ (deep) learning
– – – object detection, segmentation, image captioning, etc. ⇐ 2D→3D

Deep learning has become an important tool for computer vision
e.g., CNNs, such as PixelCNN

E.g., ImageNet: large scale visual recognition challenge


Images



(Figure: a small image patch shown as its grid of 8-bit pixel intensity values, e.g., rows such as 195 209 221 235 249 251 254 255 250 241 247 248)

I(x, y, t) is the intensity at (x, y) at time t

CCD camera ≈ 1,000,000 pixels; human eyes ≈ 240,000,000 pixels, i.e., 0.25 terabits/sec


Image classification

CNNs give the state-of-the-art results


Perception

Perception: the process of mapping sensor measurements into internal representations of the environment

– sensors: noisy
– environment: partially observable, unpredictable, dynamic

• HMMs/DBNs can represent the transition and sensor models of a partially observable environment

• DNNs can recognize various objects in vision
– the best internal representation is not known
– unsupervised learning can learn sensor and motion models from data








Perception

Stimulus (percept) S, World W

S = g(W)

E.g., g = "graphics." Can we do vision as inverse graphics?

W = g⁻¹(S)

Problem: massive ambiguity


Localization

Compute current location and orientation (pose) given observations (DBN)

(Figure: DBN for localization — state nodes X_{t−1}, X_t, X_{t+1}, action nodes A_{t−2}, A_{t−1}, A_t, observation nodes Z_{t−1}, Z_t, Z_{t+1})

• Treat localization as a regression problem
• Can be done with DNNs
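For intuition, here is a toy 1-D particle-filter localizer, a standard way to implement the DBN inference above (the world, noise levels, and motion model are all assumed for illustration):

```python
import math
import random

random.seed(0)
WALL = 10.0                      # known landmark: a wall at x = 10

def sense(x):
    """Noisy range measurement: distance to the wall plus Gaussian noise."""
    return WALL - x + random.gauss(0, 0.5)

def weight(particle, z):
    """Likelihood of observation z given a hypothesized pose."""
    err = (WALL - particle) - z
    return math.exp(-err * err / (2 * 0.5 ** 2))

true_x = 0.0
particles = [random.uniform(0, 10) for _ in range(500)]   # uniform prior
for _ in range(8):
    true_x += 1.0                                          # robot moves right
    particles = [p + 1.0 + random.gauss(0, 0.1) for p in particles]  # motion update
    z = sense(true_x)
    ws = [weight(p, z) for p in particles]
    particles = random.choices(particles, weights=ws, k=len(particles))  # resample

estimate = sum(particles) / len(particles)
print(round(estimate, 1))  # close to the true pose, 8.0
```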


Mapping

Localization: given map and observed landmarks, update pose distribution

Mapping: given pose and observed landmarks, update map distribution

SLAM (simultaneous localization and mapping): given observed landmarks, update pose and map distribution

Probabilistic formulation of SLAM
add landmark locations L1, . . . , Lk to the state vector,
proceed as for localization
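The state-augmentation idea can be shown in miniature (toy 1-D numbers, illustrative only): the SLAM state vector holds the pose plus the landmark locations, and the motion model moves only the pose:

```python
# SLAM state vector: robot pose followed by landmark locations L1, L2.
pose = [0.0]                 # robot x
landmarks = [3.0, 7.0]       # initial landmark guesses
state = pose + landmarks     # [x, L1, L2]

def predict(state, dx):
    """Motion update: only the robot moves; landmarks are static."""
    return [state[0] + dx] + state[1:]

state = predict(state, 1.5)
print(state)  # -> [1.5, 3.0, 7.0]
```

A measurement update would then correct both the pose and the landmark entries jointly, exactly as in the localization filter but over the longer state vector.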




Detection

Edge detection in images ⇐ discontinuities in the scene
1) depth
2) surface orientation
3) reflectance (surface markings)
4) illumination (shadows, etc.)
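As an illustration of low-level edge detection, a horizontal Sobel filter (a standard choice, not specific to these slides) responds to the brightness discontinuity in a tiny synthetic image:

```python
def sobel_x(img):
    """Convolve a 2-D list-of-lists image with the horizontal Sobel kernel."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):            # skip the 1-pixel border
        for x in range(1, w - 1):
            out[y][x] = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out

# Dark left half (0), bright right half (9): a vertical edge down the middle
img = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
g = sobel_x(img)
print(g[2])  # -> [0, 0, 36, 36, 0, 0]: strongest response at the discontinuity
```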


Object detection

Single object: classification + localization
e.g., single-stage object detectors: YOLO/SSD/RetinaNet

Multiple objects: each image needs a different number of outputs
– apply a CNN to many different crops of the image; the CNN classifies each crop as object or background
e.g., lots of object detectors

3D object detection: harder than 2D
e.g., simple camera model

Object detection + Captioning = Dense captioning

Objects + relationships = Scene graphs


Prior knowledge

Shape from . . .   Assumes
motion             rigid bodies, continuous motion
stereo             solid, contiguous, non-repeating bodies
texture            uniform texture
shading            uniform reflectance
contour            minimum curvature


Object recognition

Simple idea
– extract 3D shapes from a 2D image

Problems
– extracting curved surfaces from the image
– improper segmentation, occlusion
– unknown illumination, shadows, noise, complexity, etc.

Approaches
– machine learning (deep learning) methods based on image statistics


Segmentation

Segmentation is all that is needed
– Semantic segmentation: no objects, just pixels
label each pixel in the image with a category label; don't differentiate instances, only care about pixels
e.g., CNNs
– Instance segmentation + object detection → multiple objects


Motion Planning

Path planning: find a path from one configuration to another
– various path planning algorithms in discrete spaces
– path plan vs. task plan

Continuous spaces: plan in configuration space defined by the robot's DOFs

(Figure: workspace and configuration space of a robot arm, showing configurations conf-1, conf-2, conf-3 and a path from start s to end e)

Solution is a point trajectory in free C-space


Configuration space planning

Basic problem: ∞-many states! Convert to a finite state space

Cell decomposition:
divide up space into simple cells,
each of which can be traversed "easily" (e.g., convex)
⇒ becomes a discrete graph search problem

Skeletonization:
reduce the free space to a one-dimensional representation;
identify a finite number of easily connected points/lines
that form a graph s.t. any two points are connected
by a path on the graph
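Cell decomposition in miniature (a toy occupancy grid, not from the slides): free cells become graph nodes and path planning reduces to breadth-first search over the cell graph:

```python
from collections import deque

grid = ["....#",
        ".##.#",
        "....."]              # '#' = obstacle cell, '.' = free cell

def bfs(start, goal):
    """Breadth-first search over free cells; returns a shortest cell path."""
    rows, cols = len(grid), len(grid[0])
    frontier, parent = deque([start]), {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:      # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == '.' and (nr, nc) not in parent):
                parent[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None                           # goal unreachable

path = bfs((0, 0), (2, 4))
print(len(path))  # -> 7: number of cells on a shortest path
```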


Cell decomposition example

2-DOF robot arm
(Figure: the arm's workspace coordinates and its configuration space, each showing start and goal configurations)

Problem: there may be no path in pure freespace (white area) cells
Solution: recursive decomposition of mixed (free + obstacle) cells


Skeletonization: Voronoi diagram

Voronoi diagram: locus of points equidistant from obstacles

Problem: doesn’t scale well to higher dimensions


Skeletonization: probabilistic roadmap

A probabilistic roadmap is generated by sampling random points in C-space and keeping those in freespace; create a graph by joining pairs by straight lines

Problem: need to generate enough points to ensure that every start/goal pair is connected through the graph
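A roadmap-construction sketch (a toy 2-D world with one circular obstacle; the sampling count, connection radius, and collision check are all illustrative):

```python
import math
import random

random.seed(1)
OBSTACLE, RADIUS = (5.0, 5.0), 2.0

def free(p):
    """A point is in freespace if it lies outside the circular obstacle."""
    return math.dist(p, OBSTACLE) > RADIUS

def segment_free(p, q, steps=20):
    """Approximate collision check: sample points along the straight line."""
    return all(free((p[0] + (q[0] - p[0]) * t / steps,
                     p[1] + (q[1] - p[1]) * t / steps))
               for t in range(steps + 1))

# Sample random points in C-space, keep those in freespace ...
nodes = [p for p in ((random.uniform(0, 10), random.uniform(0, 10))
                     for _ in range(200)) if free(p)]
# ... then join nearby pairs by collision-free straight lines
edges = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
         if math.dist(nodes[i], nodes[j]) < 1.5
         and segment_free(nodes[i], nodes[j])]

print(len(nodes), len(edges))  # roadmap size grows with the number of samples
```

Planning then connects the start and goal to their nearest roadmap nodes and runs an ordinary graph search over the edges.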


Controller

Can view the motor control problem as a search problem
in the dynamic rather than kinematic state space:

– state space defined by x1, x2, . . . , ẋ1, ẋ2, . . .
– continuous, high-dimensional

Deterministic control: many problems are exactly solvable,
esp. if linear, low-dimensional, exactly known, observable

Simple regulatory control laws are effective for specified motions

Stochastic optimal control: very few problems exactly solvable⇒ approximate/adaptive methods


Example

Suppose a controller has "intended" control parameters θ0,
which are corrupted by noise, giving θ drawn from P_θ0

Output (e.g., distance from target): y = F(θ)
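A Monte Carlo sketch of this setting (the function F and the noise model are invented for illustration): with noisy parameters, a controller should minimize the expected output E[F(θ)], not F(θ0), and the optimum can shift toward a flatter, noise-tolerant region:

```python
import random

random.seed(0)

def F(theta):
    """Toy cost: a narrow good region near 0 and a wide, flat one near 3."""
    return min(theta ** 2, (theta - 3.0) ** 2 / 4 + 0.1)

def expected_F(theta0, sigma=1.0, n=10000):
    """Monte Carlo estimate of E[F(theta)] with theta ~ N(theta0, sigma^2)."""
    return sum(F(theta0 + random.gauss(0, sigma)) for _ in range(n)) / n

# Noise-free, theta0 = 0 is optimal (F = 0.0 vs 0.1) ...
print(F(0.0), F(3.0))  # -> 0.0 0.1
# ... but under parameter noise the flat region near 3 has lower expected cost
print(expected_F(0.0) > expected_F(3.0))  # -> True
```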


Biological motor control

Motor control systems are characterized by massive redundancy

Infinitely many trajectories achieve any given task

E.g., a 3-link arm moving in a plane, throwing at a target:
simple 12-parameter controller, one degree of freedom at the target
⇒ 11-dimensional continuous space of optimal controllers

Idea: if the arm is noisy, only "one" optimal policy minimizes error at the target

I.e., noise-tolerance might explain actual motor behaviour
