Using Crowdsourcing, Automated Methods and Google Street View to Collect Sidewalk Accessibility Data

makeability lab | Project Sidewalk (PI: Jon E. Froehlich)



Human-Computer Interaction Lab

Characterizing Sidewalk Accessibility at Scale Using Google Street View, Crowdsourcing, and Automated Methods
Kotaro Hara | Project Sidewalk (PI: Prof. Jon Froehlich)

My name is Kotaro Hara. Today, I will talk about how we can use automated methods and crowdsourcing to collect accessibility information about cities.

I want to start with a story

I want to tell you a story.

You / Your Friend

Imagine that you and a friend are on a walk. You're both somewhat unfamiliar with the area.

Suddenly, in the middle of the sidewalk, you encounter a fire hydrant

-- Image Reference: http://www.iconsdb.com/black-icons/fire-hydrant-icon.html

In this case, you manage to go around because there is a driveway, but your friend is temporarily forced onto the street, which is dangerous.

No curb ramp!

Now, you get to the end of the block and discover that there is no curb cut. You are forced to turn around and find another way.

The problem is not only that sidewalks remain inaccessible, but also that there are currently few mechanisms to find out about the accessibility of a route in advance.

-- Quote from paper: The problem is not just that sidewalk accessibility fundamentally affects where and how people travel in cities but also that there are few, if any, mechanisms to determine accessible areas of a city a priori.

-- What Jon wrote: The problem is not just that there are inaccessible areas of cities but that there are currently few methods for us to determine them a priori.

No curb ramp! The problem is not just that there are inaccessible areas of cities, but also that there are currently few methods for us to determine them a priori.

30.6 million U.S. adults with mobility impairment

According to the most recent US Census (2010), roughly 30.6 million adults have physical disabilities that affect their ambulatory activities [128].

-- Flickr: 3627562740_c74f7bfb82_o.jpg

15.2 million use an assistive aid

Of these, nearly half report using an assistive aid such as a wheelchair (3.6 million) or a cane, crutches, or walker (11.6 million)

-- Flickr: 14816521847_5c3c7af348_o.jpg

Incomplete Sidewalks | Physical Obstacles | Surface Problems | No Curb Ramps | Stairs/Businesses

Despite comprehensive civil rights legislation for Americans with disabilities (e.g., [9,75]), many city streets, sidewalks, and businesses in the US remain inaccessible [90,96,120].

The lack of street-level accessibility information can have a significant impact on the independence and mobility of citizens (cf. Nuernberger, 2008; Thapar et al., 2004).

The lack of street-level accessibility information can have a significant negative impact on the independence and mobility of citizens [99,120].

99: Nuernberger, A. (2008). Presenting accessibility to mobility-impaired travelers. Doctoral dissertation, University of California, Berkeley.
120: Thapar, N., Warner, G., Drainoni, M., Williams, S., Ditchfield, H., Wierbicky, J., & Nesathurai, S. (2004). A pilot study of functional access to public buildings and facilities for persons with impairments. Disability and Rehabilitation, 26(5), 280-289.

Accessibility-aware Navigation

So we would like to develop technologies such as an accessibility-aware navigation system, which shows an accessible path instead of the shortest path based on your mobility level.
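As a concrete illustration of the routing idea, here is a minimal sketch (not Project Sidewalk's implementation) using networkx: edges carry a hypothetical accessibility penalty on top of their physical length, so the accessible route can differ from the shortest one. All node names, lengths, and penalties below are invented.

```python
# A minimal sketch of accessibility-aware routing: edges carry a hypothetical
# `penalty` in addition to their length, and the router minimizes the combined
# weight instead of pure distance.
import networkx as nx

G = nx.Graph()
# Hypothetical sidewalk segments: (from, to, length in meters, accessibility penalty).
# A missing curb ramp or obstacle inflates the penalty; 0 means fully accessible.
segments = [
    ("A", "B", 100, 0),
    ("B", "D", 80, 500),   # shortest, but has no curb ramp
    ("B", "C", 120, 0),
    ("C", "D", 90, 0),
]
for u, v, length, penalty in segments:
    G.add_edge(u, v, length=length, cost=length + penalty)

shortest = nx.shortest_path(G, "A", "D", weight="length")
accessible = nx.shortest_path(G, "A", "D", weight="cost")
print("Shortest path:  ", shortest)    # ['A', 'B', 'D']
print("Accessible path:", accessible)  # ['A', 'B', 'C', 'D']
```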

Visualizing Accessibility of a City

We also want to build an application that lets you visualize the accessibility of a city, so you can quickly compare which areas are more accessible. We need geo-data to build these applications.

Our goal is to collect and deliver data for the accessibility of every city in the world

To do this, we need a lot of data about accessibility. Our group's goal is to collect and deliver street-level accessibility data for every city in the world.

-- Image: http://www.flickr.com/photos/rgb12/6225459696/lightbox/

Physical Street Audits

Street audits are conducted by governments and/or community organizations, and sometimes include sidewalk walkability assessment.

Traditionally, information about a neighborhood has been gathered by volunteers or government organizations through physical audits.

Time-consuming and expensive

However, this is time-consuming and expensive.

Mobile Crowdsourcing: SeeClickFix.com

Mobile crowdsourcing services, such as SeeClickFix.com.

Mobile Crowdsourcing: NYC 311

And NYC 311 allows citizens to report neighborhood sidewalk accessibility issues.

These mobile tools require people to be on-site

But this requires people to be on-site.

Use Google Street View (GSV) as a massive data source for scalably finding and characterizing street-level accessibility

Our approach is different, though complementary: use Google Street View as a massive data source.

How can we efficiently collect accurate accessibility data with automation and crowdsourcing?

Today, I am going to talk about how we can use crowdsourcing and automated methods to collect accessibility data from Google Street View.

Amazon Mechanical Turk is an online labor market where you can hire workers to complete small tasks

Amazon Mechanical Turk is an online labor market where you can hire workers to complete small tasks.

For example, if you are a worker, you can go to Amazon's website to browse through available tasks.

Task: Find the company name from an email domain ($0.02 per task)

Task interface

Choose one of the tasks. For example, this task is about finding the company name from an email domain. You can earn 2 cents for completing a task through this web interface.

Crowdsourcing

We recruit crowd workers from Amazon Mechanical Turk. For those of you who don't know Mechanical Turk, it is an online labor market where you can work or recruit workers to perform small tasks over the Internet.
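For readers unfamiliar with the requester side, here is a hedged sketch of posting such a task programmatically with boto3's MTurk client. The task URL, reward, and timing values are illustrative, not the values used in this work.

```python
# A minimal sketch of posting a labeling task (HIT) to Mechanical Turk with boto3.
# This is not the project's actual code; the URL and reward are illustrative.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    # Sandbox endpoint for testing; drop this to post to the live marketplace.
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# An ExternalQuestion embeds your own task interface in an iframe.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/label-curb-ramps</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Label curb ramps in Google Street View images",
    Description="Draw boxes around curb ramps you see in street imagery.",
    Keywords="image, labeling, accessibility",
    Reward="0.02",                      # USD, as a string
    MaxAssignments=3,                   # redundant labels from 3 workers
    LifetimeInSeconds=7 * 24 * 3600,    # task visible for one week
    AssignmentDurationInSeconds=600,    # 10 minutes per assignment
    Question=question_xml,
)
print("HIT ID:", hit["HIT"]["HITId"])
```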

Timer: 00:07:00 of 3 hours | University of Maryland: Help make our sidewalks more accessible for wheelchair users with Google Maps | Kotaro Hara

103 hours | Crowdsourcing Data Collection: Hara K., Le V., and Froehlich J.E. [ASSETS 2012, CHI 2013] | Crowdsourcing | Image Labeling

Using this platform, we recruit workers to work on our task. We developed this interface where you can see Google Street View imagery and label, in this case, an obstacle in the path.

Manual labeling is accurate, but labor intensive

We showed that this is an effective method, but it is labor intensive.35

Manual labeling is accurate, but labor intensive

We showed that this is an effective method, but it is labor intensive.36

Computer Vision

To more efficiently find accessibility attributes, we turned to computer vision, which is used for applications like face detection.

Automatic Curb Ramp Detection: computer vision automatically finds curb ramps

Different attributes affect sidewalk accessibility for people with mobility impairments, for example, the presence of curb ramps, surface conditions, obstacles, steep gradients, and more.

Automatic Curb Ramp Detection

Curb Ramp Labels Detected with Computer Vision

And removed even more errors.

Some curb ramps never get detected

False detections | Automatic Curb Ramp Detection

Computer vision is not perfect. There are false positives, which can be fixed by verification, and it misses some curb ramps, which humans need to label.

2x

Manual Label Verification

Here you see detected curb ramps as green boxes on top of the Street View image (advance to the next slide to play).

Computer vision + verification is cheaper but less accurate compared to manual labeling

Automatic Task Allocation

Research Question: How can we combine manual labeling and computer vision to achieve high accuracy and low cost?

The question is: can we achieve the same or better accuracy with a lower time cost compared to manual labeling?

Tohme (Remote Eye)

To do this, we developed a system called Tohme. It combines the two approaches.

Design Principles

Manual labeling is accurate, but labor intensive

Computer vision + verification is cheaper but less accurate (not true for easy tasks)

Tohme (Remote Eye)

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection)

This is the overview of the system. A custom web scraper collects a dataset including Street View images, and a computer vision based detector finds curb ramps.

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection) → svControl (Automatic Task Allocation)

So we designed a smart task allocator.

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection) → svControl (Automatic Task Allocation) → svVerify (Manual Label Verification)

It routes detection results to a cheap manual verification workflow to remove false positive errors. However, since our verification task does not allow workers to fix false negatives, curb ramps that the detector misses would never get found.

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection) → svControl (Automatic Task Allocation) → svVerify (Manual Label Verification) | svLabel (Manual Labeling)

So if the allocator predicts a false negative, it passes the task to the manual labeling workflow.

Tohme (Remote Eye)

We get a Street View image.

We run a detector.

Complexity: 0.14 | Cardinality: 0.33 | Depth: 0.21 | CV: 0.22

Then extract features.

No False Negative: predict computer vision performance

Our task allocator predicts the presence of false negatives. If it predicts no false negatives, it allocates the task to the verification workflow.

The easy task is passed to the cheaper verification workflow.

Tohme (Remote Eye)

Another example.

Run a detector.

Complexity: 0.82 | Cardinality: 0.25 | Depth: 0.96 | CV: 0.54

Extract features.

False Negative: if the allocator predicts a false negative, it passes the task to the labeling workflow.

The difficult task is passed to the more accurate labeling workflow.

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection) → svControl (Automatic Task Allocation) → svVerify (Manual Label Verification) | svLabel (Manual Labeling)

Let's first talk about our web scraper.

Scraper: Google Street View Panoramas and Metadata | 3D Point-cloud Data | Top-down Google Maps Imagery

We scraped GSV panoramas and metadata at intersections, along with their accompanying 3D point-cloud data and top-down Google Maps imagery. These datasets are used to train the automatic task allocator.
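To make the scraping step concrete, here is a small sketch using today's Street View Static API (the project used its own custom scraper; the API key, coordinates, and parameter values below are placeholders).

```python
# A minimal sketch of fetching Street View data for one intersection via the
# Street View Static API. Not the paper's scraper; values are illustrative.
import requests

API_KEY = "YOUR_API_KEY"          # placeholder
lat, lng = 38.8977, -77.0365      # a hypothetical D.C. intersection

# Metadata: pano ID, capture date, exact position (no image quota used).
meta = requests.get(
    "https://maps.googleapis.com/maps/api/streetview/metadata",
    params={"location": f"{lat},{lng}", "key": API_KEY},
).json()
print(meta.get("pano_id"), meta.get("date"))  # e.g., a pano ID string and '2013-07'

# One 640x640 view of the panorama, looking toward a corner (heading in degrees).
img = requests.get(
    "https://maps.googleapis.com/maps/api/streetview",
    params={"size": "640x640", "pano": meta["pano_id"],
            "heading": 45, "pitch": -10, "key": API_KEY},
)
with open(f"{meta['pano_id']}_45.jpg", "wb") as f:
    f.write(img.content)
```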


Washington D.C.

Baltimore

Los Angeles

Saskatoon

Because sidewalk infrastructure can vary in design and appearance across cities and countries, we included four regions: Washington D.C., Baltimore, Los Angeles, and Saskatoon.

Scraper | Areas of Study

D.C. | Downtown
D.C. | Residential

We also looked at different types of city areas.

Scraper: Washington D.C. (dense urban area and semi-urban residential areas)

Blue regions represent dense urban areas, and red regions represent residential areas.

Washington D.C.

Baltimore

Los Angeles

Saskatoon

Scraper
Total Area: 11.3 km²
Intersections: 1,086
Curb Ramps: 2,877
Missing Curb Ramps: 647
Avg. GSV Data Age: 2.2 yr*
* At the time of downloading data in summer 2013

In all, we had 11.3 square kilometers and 1,086 intersections. We found 2,877 curb ramps and 647 missing curb ramps based on the ground truth data. The average Street View image age was 2.2 years.

How well does GSV data reflect the current state of the physical world?

But how well does Street View data reflect the current state of curb ramp infrastructure?

Google Street View vs. Physical Intersection

To answer this question, we compared Street View intersections with physical intersections.

Dataset | Validating the Dataset

Physical Audit Areas: Washington D.C. and Baltimore
GSV and Physical World: >97.7% agreement, 273 intersections

Small disagreement due to construction.

First, we physically visited intersections and took multiple pictures. The audit areas included four subset regions and consisted of 273 intersections. We then counted the numbers of curb ramps and missing curb ramps in both datasets and evaluated their concordance. As a result, we observed over 97% agreement between Google Street View and the real world, with small disagreements due to construction.

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection) → svControl (Automatic Task Allocation) → svVerify (Manual Label Verification) | svLabel (Manual Labeling)

Moving on to our dataset.

Dataset | Ground Truth Curb Ramp Dataset

2 researchers labeled curb ramps in our dataset: 2,877 curb ramp labels (M=2.6 per intersection)

To train and evaluate our computer vision program, 2 members of our research team manually labeled curb ramps in Street View images. In total, we collected 2,877 curb ramp labels.

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection) → svControl (Automatic Task Allocation) → svVerify (Manual Label Verification) | svLabel (Manual Labeling)

Our computer vision component has three parts.

Automatic Curb Ramp Detection: Deformable Part Models (Felzenszwalb et al. 2008)

http://www.cs.berkeley.edu/~rbg/latent/

We experimented with various object detection approaches and chose to build on a framework called DPM, one of the most successful approaches in object detection.

Root filter | Parts filter | Displacement cost

DPM models a target object and its parts with histogram of oriented gradients (HOG) features. It also models the spatial relationship between the parts.
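The scoring rule behind DPM can be sketched as follows: a detection's score is the root filter response plus, for each part, the best part-filter response minus a quadratic displacement cost from that part's anchor position. This toy version with invented numbers only illustrates the idea, not Felzenszwalb et al.'s implementation.

```python
# A toy sketch of the DPM scoring rule (illustrative, not the paper's code).
import numpy as np

def dpm_score(root_response, part_responses, anchors, deform_weights):
    """root_response: scalar response of the root HOG filter at this location.
    part_responses: list of 2D arrays, response map of each part filter.
    anchors: list of (row, col) expected part positions.
    deform_weights: quadratic displacement penalty per part."""
    score = root_response
    for resp, (ar, ac), w in zip(part_responses, anchors, deform_weights):
        rows, cols = np.indices(resp.shape)
        displacement_cost = w * ((rows - ar) ** 2 + (cols - ac) ** 2)
        score += np.max(resp - displacement_cost)  # best placement of this part
    return score

# Hypothetical numbers: one root response and two 5x5 part response maps.
rng = np.random.default_rng(0)
parts = [rng.normal(size=(5, 5)) for _ in range(2)]
print(dpm_score(1.2, parts, anchors=[(2, 2), (1, 3)], deform_weights=[0.5, 0.5]))
```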

Automatic Curb Ramp Detection

Detected Labels | Stage 1: Deformable Part Model
Multiple redundant detection boxes
Correct: 1 | False Positive: 12 | Miss: 0

DPM sweeps through an entire image and detects areas that look like a curb ramp. Detections are shown in red boxes; the numbers of correct detections and errors are shown in this table. There are some redundant labels, such as overlapping boxes.

Automatic Curb Ramp Detection

Detected Labels | Stage 1: Deformable Part Model
Curb ramps shouldn't be in the sky or on roofs
Correct: 1 | False Positive: 12 | Miss: 0

And there shouldn't be curb ramps in the sky.

Automatic Curb Ramp Detection

Detected Labels | Stage 2: Post-processing

We use non-maximum suppression to remove overlapping labels, and 3D point-cloud data to remove curb ramps that are not on ground level. Note that this 3D data is coarse; we cannot identify the detailed structure of curb ramps.
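Non-maximum suppression itself is standard; here is a minimal sketch (generic IoU-based NMS, not necessarily the exact variant or thresholds used in Tohme): keep the highest-scoring box and drop any box that overlaps it too much.

```python
# A minimal sketch of non-maximum suppression for Stage 2 (generic NMS).
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]])
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # [0, 2]
```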

Automatic Curb Ramp Detection

Detected Labels | Stage 3: SVM-based Refinement
Filter out labels based on their size, color, and position.
Correct: 1 | False Positive: 5 | Miss: 0

We get a cleaner result, but we still have some errors. We try to remove them by utilizing other information, such as the size of a bounding box and RGB information.
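A hedged sketch of the refinement idea: summarize each candidate box with simple size, position, and color features and train a binary SVM to keep or discard it. The specific features, training boxes, and labels below are invented for illustration, not the paper's trained model.

```python
# A minimal sketch of Stage 3 (SVM-based refinement) under assumed features.
import numpy as np
from sklearn.svm import SVC

def box_features(box, mean_rgb):
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [w * h, w / h, y2, *mean_rgb]  # area, aspect ratio, bottom edge, color

# Hypothetical training data: true curb ramps (1) and false positives (0).
X = np.array([box_features(b, c) for b, c in [
    (( 40, 300,  90, 330), (120, 118, 115)),  # plausible curb ramp
    ((400, 310, 470, 350), (130, 125, 120)),  # plausible curb ramp
    ((200,  20, 260,  60), (180, 200, 230)),  # sky-colored box high in the image
    (( 10, 100,  16, 260), ( 90,  85,  80)),  # extreme aspect ratio
]])
y = np.array([1, 1, 0, 0])

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
candidate = box_features((420, 305, 480, 340), (128, 124, 119))
print("keep" if clf.predict([candidate])[0] == 1 else "filter out")
```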

Automatic Curb Ramp Detection

Detected Labels | Stage 3: SVM-based Refinement
Correct: 1 | False Positive: 3 | Miss: 0

This is the final result with computer vision alone.

Automatic Curb Ramp Detection: Google Street View Panoramic Image | Curb Ramp Labels Detected by Computer Vision

I will talk about how we can combine crowdsourcing and automated methods to collect curb ramp data from Google Street View efficiently, and how algorithmic work management plays a role in this process.

Good example!

Bad example :(

Used two-fold cross validation to evaluate the CV sub-system

Automatic Curb Ramp Detection

Computer Vision Sub-System Results
Precision: higher means fewer false positives
Recall: higher means fewer false negatives

Automatic Curb Ramp Detection

Computer Vision Sub-System Results

Goal: maximize area under curve

Automatic Curb Ramp Detection

Computer Vision Sub-System Results: more than 20% of curb ramps were missed

Our curve is less than ideal.

Automatic Curb Ramp Detection

Computer Vision Sub-System Results: a confidence threshold of -0.99 results in 26% precision and 67% recall

For our system, we set the confidence threshold to emphasize recall over precision, because false positives are easier to correct.
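The trade-off behind this threshold choice can be illustrated with synthetic detection scores (these are not the paper's data): as the threshold drops, recall rises while precision falls.

```python
# A small sketch of the precision/recall trade-off behind the threshold choice.
import numpy as np

def precision_recall(scores, is_true_ramp, threshold):
    pred = scores >= threshold
    tp = np.sum(pred & is_true_ramp)
    fp = np.sum(pred & ~is_true_ramp)
    fn = np.sum(~pred & is_true_ramp)
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical detection confidences and ground-truth flags.
rng = np.random.default_rng(1)
truth = rng.random(1000) < 0.2                         # ~20% of candidates are real
scores = np.where(truth, rng.normal(0.0, 1.0, 1000),   # real ramps score higher
                         rng.normal(-1.5, 1.0, 1000))  # than clutter, on average

for t in (0.5, 0.0, -0.99):
    p, r = precision_recall(scores, truth, t)
    print(f"threshold {t:+.2f}: precision {p:.0%}, recall {r:.0%}")
```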

Occlusion

Illumination

Scale

Viewpoint Variation

Structures Similar to Curb Ramps

Curb Ramp Design Variation | Automatic Curb Ramp Detection

Curb Ramp Detection is a Hard Problem

We observed various image properties that could cause computer vision to make errors, including occlusion, illumination, scale, viewpoint variation, structures similar to curb ramps, and variation in the design of curb ramps.

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection) → svControl (Automatic Task Allocation) → svVerify (Manual Label Verification) | svLabel (Manual Labeling)

That's what we do with the task allocator.

Automatic Task Allocation | Features to Assess Scene Difficulty for CV

The number of streets connected at an intersection

Depth information for road width and variance in distance

Top-down images to assess the complexity of an intersection

The number of detections and confidence values

We used the following features. To assess the complexity of intersections, we used street cardinality from the metadata.


Depth information for road width and variance in distance

Automatic Task Allocation | Features to Assess Scene Difficulty for CV

It allows us to estimate the size of a street, which is useful because the further away a curb ramp is, the harder it is to detect.

Automatic Task Allocation | Features to Assess Scene Difficulty for CV

We also assessed the complexity of each intersection with top-down imagery.

Google Maps | Styled Maps | Top-down images to assess the complexity of an intersection | Automatic Task Allocation | Features to Assess Scene Difficulty for CV

Because the appearance of curb ramps varies more at irregular intersections, computer vision tends to fail to find curb ramps there. For example, the intersection on the right is arguably more complex than the one on the left.

Automatic Task Allocation | Features to Assess Scene Difficulty for CV

The number of streets from metadata

Depth information for road width and variance in distance

Top-down images to assess the complexity of an intersection

CV output: the number of detections and confidence values

We also used the number of detection boxes, their positions, and their confidence to see how confused the computer vision program was.
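Putting these features together, the allocator can be sketched as a binary classifier over scene features. The classifier choice, feature values, and labels below are assumptions for illustration, not the paper's trained model.

```python
# A minimal sketch of the task allocator idea: a linear SVM maps scene features
# (street cardinality, depth variance, intersection complexity, CV detection
# count, mean CV confidence) to a binary routing decision. Values are invented.
import numpy as np
from sklearn.svm import LinearSVC

# [cardinality, depth_variance, complexity, n_detections, mean_confidence]
X_train = np.array([
    [3, 0.21, 0.14, 4, 0.22],   # simple scene: CV likely found everything
    [4, 0.18, 0.20, 5, 0.31],
    [4, 0.96, 0.82, 2, 0.54],   # complex scene: CV likely missed a ramp
    [5, 0.88, 0.75, 1, 0.48],
])
y_train = np.array([0, 0, 1, 1])  # 1 = predicted false negative

allocator = LinearSVC().fit(X_train, y_train)

scene = np.array([[4, 0.90, 0.80, 2, 0.50]])
if allocator.predict(scene)[0] == 1:
    print("route to svLabel (manual labeling)")       # CV probably missed ramps
else:
    print("route to svVerify (cheap verification)")   # output just needs checking
```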

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection) → svControl (Automatic Task Allocation) → svVerify (Manual Label Verification) | svLabel (Manual Labeling)

3x

Manual Labeling | Labeling Interface

Our manual labeling tool allows people to control the viewing angle. You select the curb ramp button at the top and label the target. We collect outline labels of curb ramps to gather rich data to train computer vision.

svCrawl (Web Scraper) → Dataset → svDetect (Automatic Curb Ramp Detection) → svControl (Automatic Task Allocation) → svVerify (Manual Label Verification) | svLabel (Manual Labeling)

Let's talk about the verification task.

2x

Manual Label Verification

Here you see detected curb ramps as green boxes on top of the Street View image (advance to the next slide to play).

Automatic Task Allocation

Can we combine manual labeling and computer vision to achieve high accuracy and low cost?

The question is: can we achieve the same or better accuracy with a lower time cost compared to manual labeling?

Evaluation | Study Method: Conditions

Manual labeling without smart task allocation vs. CV + verification without smart task allocation vs. Tohme (Remote Eye)

We compare the performance of manual labeling without smart task allocation, computer vision plus verification without smart task allocation, and finally Tohme.

Evaluation | Study Method: Measures

Accuracy
Task Completion Time

We measured the accuracy and average task completion time of each workflow.

Evaluation | Study Method: Approach

Recruited workers from Mechanical Turk
Used 1,046 GSV images (40 used for golden insertion)

Evaluation | Results

                        Labeling Tasks    Verification Tasks
# of distinct turkers:  242               161
# of HITs completed:    1,270             582
# of tasks completed:   6,350             4,820
# of tasks allocated:   769               277

We used Monte Carlo simulations for evaluation.

Turkers completed over 6,300 labeling tasks and 4,800 verification tasks, and we used Monte Carlo simulations for evaluation.
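The Monte Carlo idea can be sketched as follows (assumed protocol details; the answer pool and accuracy rate below are invented): repeatedly resample which worker's answer counts for each scene and look at the resulting distribution of system accuracy.

```python
# A minimal sketch of Monte Carlo evaluation over redundant worker answers.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical pool: answers[scene] = 0/1 correctness of each worker's label.
answers = {s: rng.random(rng.integers(3, 8)) < 0.85 for s in range(200)}

def simulate_once():
    # Pick one worker's answer per scene, as if that worker had been assigned it.
    return np.mean([rng.choice(a) for a in answers.values()])

accuracies = np.array([simulate_once() for _ in range(1000)])
print(f"accuracy: {accuracies.mean():.1%} +/- {accuracies.std():.1%}")
```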

Evaluation | Labeling Accuracy and Time Cost

Accuracy measures and task completion time per scene for Manual Labeling, CV + Manual Verification, and Tohme (Remote Eye). Error bars are standard deviations.

On the left, I show accuracy. On the right, I show cost. We want accuracy to be high, and cost to be low.

Evaluation | Labeling Accuracy and Time Cost

13% reduction in cost.

Evaluation | Labeling Accuracy and Time Cost

For the manual labeling approach alone, our accuracy measures are 84-86%, at 94 seconds per intersection. For CV + manual verification, accuracy dropped substantially, but so did the time cost, by more than half. For Tohme, we saw accuracies similar to the manual baseline approach.


Evaluation | Smart Task Allocator

svControl (Automatic Task Allocation) → svVerify (Manual Label Verification) | svLabel (Manual Labeling)

~80% of svVerify tasks were correctly routed
~50% of svLabel tasks were correctly routed

217 of 277 tasks were correctly routed to svVerify.

Evaluation | Smart Task Allocator

If svControl worked perfectly, Tohme's cost would drop to 28% of the manual labeling approach alone.


Evaluation | Study Method

We used 1,046 GSV images.
We recruited workers from Amazon Mechanical Turk to work on labeling tasks and verification tasks.
40 GSV images were reserved for golden insertion.

$0.80 for labeling 5 images and $0.80 for verifying 10 images

Evaluation

We recruited multiple workers to work on labeling tasks and verification tasks. We evaluated the results with Monte Carlo simulation.

Evaluation | Result

We found that the manual approach alone and Tohme achieved similar curb ramp detection accuracy (86% vs. 84%).
The approach with smart task allocation reduced the labor cost by 13%.

Example Labels from Manual Labeling

Let's see how turkers labeled.

Evaluation | Example Labels from Manual Labeling

In general, their labels were of high quality.

Evaluation | Example Labels from Manual Labeling

Even in difficult scenes with shadows, they labeled correctly most of the time.

Evaluation | Example Labels from Manual Labeling

But sometimes there were errors.

This is a driveway, not a curb ramp. | Evaluation | Example Labels from Manual Labeling

For example, this person labeled a driveway as a curb ramp.

Evaluation | Example Labels from Manual Labeling

And some were a little lazy.

Evaluation | Example Labels from Manual Labeling

And labeled two curb ramps with a single label.

Example Labels from CV + Verification

Raw Street View Image | Evaluation | Example Labels from CV + Verification

Here are some examples.

False detection | Automatic Detection | Evaluation | Example Labels from CV + Verification

With computer vision alone, there are false positive detections.

Automatic Detection + Human Verification | Evaluation | Example Labels from CV + Verification

With human verification, errors get corrected.

8,209 Intersections in D.C.

Based on the shapefile downloaded from data.dc.gov, there are 8,209 intersections in D.C.

Manual labeling: 94 s per intersection × 8,209 intersections ≈ 214 hours. Tohme: 81 s per intersection ≈ 184 hours.

-- Source: http://data.dc.gov/Metadata.aspx?id=2106

Back of the Envelope Calculations

8,209 Intersections in D.C.
Manually labeling GSV with our custom interfaces would take 214 hours.
With Tohme, this drops to 184 hours.
We think we can do better.

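The arithmetic behind these estimates, as a quick sanity check:

```python
# The back-of-envelope numbers above, reproduced directly.
intersections = 8_209
for name, seconds_per_scene in [("Manual labeling", 94), ("Tohme", 81)]:
    hours = intersections * seconds_per_scene / 3600
    print(f"{name}: {hours:.1f} hours")  # ~214.3 and ~184.7 hours
```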

makeability lab

Takeaway

We can combine crowdsourcing and automated methods to collect accessibility data from Street View.
Smart task management can improve the efficiency of a semi-automatic crowd-powered system.

Future Work: Computer Vision
Context integration & scene understanding
3D-data integration
Improve training & sample size
Mensuration

(i) Context integration. While we use some context information in Tohme (e.g., 3D-depth data, intersection complexity inference), we are exploring methods to include broader contextual cues about buildings, traffic signal poles, crosswalks, and pedestrians as well as the precise location of corners from top-down map imagery.

(ii) 3D-data integration. Due to low-resolution and noise, we currently use 3D-point cloud data as a ground plane mask rather than as a feature to our CV algorithms. We plan to explore approaches that combine the 3D and 2D imagery to increase scene structure understanding (e.g., [28]). If higher resolution depth data becomes available, this may be useful to directly detect the presence of a curb or corner, which would likely improve our results.

(iii) Training. Our CV algorithms are currently trained using GSV scenes from all eight city regions in our dataset. Given the variation in curb ramp appearance across geographic areas, we expect that performance could be improved if we trained and tested per city.

Future Work: Deployment of Volunteer Web Site

This work is supported by

Faculty Research Award

makeability lab

The Crowd-Powered Streetview Accessibility Team!

Kotaro Hara

Jin Sun

Victoria Le

Robert Moore

Sean Pannella

Jonah Chazan

David Jacobs

Jon Froehlich

Zachary Lawrence

Graduate Student | Undergraduate | High School | Professor

Thanks! @kotarohara_en | [email protected]