tech kitchen: object detection and classification

Food Image Object Detection and Classification

Challenges and Solutions

Part 1: Detection

About me

• Leszek Rybicki

• From Poland

• At Cookpad since 2016

• github: lunardog

Warning! This presentation contains images that may cause severe drooling and stomach grumbling.

@cookpad

History

ImageNet

http://image-net.org



ImageNet Large Scale Visual Recognition Competition

http://www.image-net.org/challenges/LSVRC/



ILSVRC 2010 task

Classification

For each image, algorithms will produce a list of at most 5 object categories in descending order of confidence.
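As a minimal sketch of what "at most 5 categories in descending order of confidence" means in practice, the snippet below ranks per-category scores and checks whether the true label is among the top five. The function names and the example scores are hypothetical; they are not taken from the ILSVRC evaluation code.

```python
import numpy as np

def top5_predictions(scores, k=5):
    """Return the indices of the k highest-scoring categories, best first."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    return order[:k].tolist()

def is_top5_correct(scores, true_label, k=5):
    """An image counts as correct if its true label is among the k guesses."""
    return true_label in top5_predictions(scores, k)

# Hypothetical confidence scores for 7 categories (illustration only).
scores = [0.05, 0.40, 0.12, 0.02, 0.25, 0.09, 0.07]
predictions = top5_predictions(scores)
```

Here `predictions` lists category 1 first because it has the highest score; category 3, with the lowest score, is not among the five guesses.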




ILSVRC 2011 tasks

1. Classification

2. *Classification with localization

*tester task

http://cs231n.stanford.edu/syllabus.html

Classification + Localization



ILSVRC 2012 tasks

1. Classification

2. Classification with localization

3. Fine-grained classification

Fine-grained classification




AlexNet

ImageNet classification with deep convolutional neural networks

A. Krizhevsky, I. Sutskever, G. E. Hinton. Advances in Neural Information Processing Systems (2012), 1097-1105

ILSVRC 2013 tasks

1. Detection

2. Classification


ILSVRC 2014 tasks

1. Detection

2. Classification


Object Detection




Deep Learning

https://devblogs.nvidia.com

https://devblogs.nvidia.com/parallelforall/nvidia-ibm-cloud-support-imagenet-large-scale-visual-recognition-challenge/


ILSVRC 2015 tasks

1. Object detection

2. Object localization

3. *Object detection from video

4. *Scene classification

ILSVRC 2016 tasks

1. Object localization

2. Object detection

3. Object detection from video

4. Scene classification

5. Scene parsing

Cookpad 2016

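Image dataset: since 1997

Recipes: about 2.6 million in Japan

+ overseas recipes

+ tsukurepo (photo reports from users who cooked a recipe)

+ step-by-step photos

17 languages, 60 countries

Note: figures are as of February 2017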

Research interests in image analysis

• Is this food?

• Which dish is it?

• Where is the food?

• ...

Part 2

Where is the food?

Goal

Find food in the image and draw a bounding box around the food item, including the dish if it is visible.

If there are multiple items, draw a bounding box around each one.

Goal

ground truth

bounding box

IoU > 0.9

We count a generated box as a positive detection if its Intersection over Union (IoU) ratio with a ground truth box is greater than 0.9.
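The Intersection over Union criterion can be sketched as follows. This is a minimal illustration, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples; the box format and function names are assumptions, not something the slides specify.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0

def is_positive(generated, ground_truth, threshold=0.9):
    """Positive detection if IoU is strictly greater than the threshold."""
    return iou(generated, ground_truth) > threshold

ground_truth = (0, 0, 100, 100)
tight_box = (0, 0, 100, 95)       # slightly too short: IoU = 0.95
shifted_box = (10, 10, 110, 110)  # shifted by 10 pixels: IoU is about 0.68
```

With a threshold of 0.9, `tight_box` counts as a positive detection while `shifted_box` does not, which shows how strict the criterion is.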

$$\text{recall} = \frac{\text{number of true positives}}{\text{number of ground truth boxes}}$$

$$\text{precision} = \frac{\text{number of true positives}}{\text{number of generated boxes}}$$
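A minimal sketch of how precision and recall could be computed from these definitions. It assumes a precomputed 2-D matrix of IoU values (rows are generated boxes, columns are ground truth boxes) and that each ground truth box can be matched at most once; that matching rule is an assumption, as the slides do not state how duplicates are handled.

```python
import numpy as np

def precision_recall(iou_matrix, threshold=0.9):
    """iou_matrix[i, j] is the IoU of generated box i with ground truth box j."""
    iou_matrix = np.asarray(iou_matrix, dtype=float)
    n_generated, n_truth = iou_matrix.shape
    matched_truth = set()
    true_positives = 0
    for i in range(n_generated):
        # Find the best ground truth box that is still unmatched (assumed rule).
        best_j, best_iou = -1, threshold
        for j in range(n_truth):
            if j not in matched_truth and iou_matrix[i, j] > best_iou:
                best_j, best_iou = j, iou_matrix[i, j]
        if best_j >= 0:
            matched_truth.add(best_j)
            true_positives += 1
    precision = true_positives / n_generated if n_generated else 0.0
    recall = true_positives / n_truth if n_truth else 0.0
    return precision, recall

# Hypothetical example: 3 generated boxes, 2 ground truth boxes.
example = [[0.95, 0.0],   # matches ground truth 0
           [0.92, 0.1],   # duplicate of ground truth 0, not counted again
           [0.0, 0.5]]    # overlap too small
precision, recall = precision_recall(example)
```

In this example one of three generated boxes is a true positive and one of two ground truth boxes is found.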

Methods

1. Build a classifier

2. Pick Regions of Interest

3. Run classifier on each region

4. Remove duplicate detections
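The four steps above can be sketched as a toy pipeline. Everything here is a simplified assumption for illustration: the "classifier" is a hypothetical stand-in that scores a region by its mean brightness, regions of interest come from a plain sliding window, and duplicates are removed with greedy non-maximum suppression. It is not the method used in the talk.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def toy_classifier(image, box):
    """Step 1 (hypothetical stand-in): score a region by its mean brightness."""
    x1, y1, x2, y2 = box
    return float(image[y1:y2, x1:x2].mean())

def sliding_windows(height, width, size, stride):
    """Step 2: candidate regions of interest from a fixed-size sliding window."""
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            yield (x, y, x + size, y + size)

def non_max_suppression(detections, iou_threshold=0.3):
    """Step 4: keep the best-scoring box, drop boxes that overlap a kept one."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept

def detect(image, classifier, size=4, stride=2, score_threshold=0.4):
    """Step 3: run the classifier on every region, then remove duplicates."""
    height, width = image.shape[:2]
    detections = []
    for box in sliding_windows(height, width, size, stride):
        score = classifier(image, box)
        if score > score_threshold:
            detections.append((score, box))
    return non_max_suppression(detections)

# A dark 8x8 "image" with one bright 4x4 square standing in for the food.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
result = detect(image, toy_classifier)
```

Five windows pass the score threshold here (the centered one and four that half-overlap the square), and suppression keeps only the centered box. Note that the classifier runs once per window, which is where the cost of this approach comes from.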

IDEA

Fast, Faster R-CNN

(2013) Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

(2015) Fast R-CNN. Ross Girshick

(2016) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun


Problems

1. Computational cost

2. Context is important

3. ...but context can be confusing.
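To give a rough feel for the computational cost of classifying every region, the sketch below counts how many classifier calls a sliding window needs on a single image. The image size, window sizes, and stride fraction are hypothetical values chosen only for illustration.

```python
def count_windows(height, width, size, stride):
    """Number of square windows of a given size that fit in the image."""
    if size > height or size > width:
        return 0
    rows = (height - size) // stride + 1
    cols = (width - size) // stride + 1
    return rows * cols

def total_classifier_calls(height, width, sizes, stride_fraction=0.25):
    """Sum the window counts over several window sizes (scales)."""
    total = 0
    for size in sizes:
        stride = max(1, int(size * stride_fraction))  # stride relative to size
        total += count_windows(height, width, size, stride)
    return total

# Hypothetical setup: a 640x480 image scanned at three window sizes.
calls = total_classifier_calls(480, 640, sizes=[64, 128, 256])
```

Even this coarse scan requires more than a thousand classifier evaluations for one image, and each window sees only a crop, so the surrounding context is lost.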

hand

food

grass

food

http://pixabay.com


Single Shot Detector

(2015) SSD: Single Shot MultiBox Detector. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg


Joseph Redmon

"Either The Least Or Most Employable Person Ever" (The Huffington Post)

http://github.com/pjreddie

http://pjreddie.com/darknet/

http://www.kaggle.com/16295-pjreddie

You Only Look Once

(2015) You Only Look Once: Unified, Real-Time Object Detection. Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

(Dec. 2016) YOLO9000: Better, Faster, Stronger. Joseph Redmon, Ali Farhadi





You Only Look Once: Unified, Real-Time Object Detection. Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

YOLO in Context






https://drive.google.com/a/cookpad.jp/file/d/0B8DV9931PX43Mk4wdzk5d1BXUU0/view?usp=sharing


Science