kusk object dataset: recording access to objects in food preparation

Kusk Object Dataset: Recording Access to Objects in Food Preparation. Atsushi Hashimoto, Masaaki Iiyama, Shinsuke Mori, Michihiko Minoh (Kyoto University). http://kusk.mm.media.kyoto-u.ac.jp/en/

Upload: atsushi-hashimoto

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Kusk Object Dataset: Recording Access to Objects in Food Preparation

Atsushi Hashimoto, Masaaki Iiyama, Shinsuke Mori, Michihiko Minoh
Kyoto University

http://kusk.mm.media.kyoto-u.ac.jp/en/

Page 2: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Computer Vision (CV) meets Natural Language Processing (NLP)

• CV-NLP collaboration is an active field.
– Supported by mature machine learning technology.
– Cooking media can be a good practice field!
  • Long text (recipe) and organized activity (cooking)

[Diagram connecting Vision and Language, each grounded in the Real World. Labels: CV, NLP; Video observation/instruction; Machine-Readable Description (BN/DNN); Human-Friendly Description; Recog.; Text Generation; Recog./Parse; Retrieve.]

Page 3: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Grand goal: Comparing Recipe and Human Actions

• From a viewpoint of computer engineering…
– Recipe: a kind of script language
– Human actions: an execution of the script by a human

• Potential applications
– Automatic cooking, online recipe navigation
– Cooking records for healthcare, recipe generation

Page 4: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Pascal Sentence Dataset

http://vision.cs.uiuc.edu/pascal-sentences/

Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. “Collecting Image Annotations Using Amazon's Mechanical Turk”. In Proc. of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.

• One jet lands at an airport while another takes off next to it.
• Two airplanes parked in an airport.
• Two jets taxi past each other.
• Two parked jet airplanes facing opposite directions.
• two passenger planes on a grassy plain

Pascal Sentence Dataset: captions and images
– Images are obtained from the Pascal dataset.
– Captions are annotated via Amazon Mechanical Turk.

Page 5: Kusk Object Dataset: Recording Access to Objects in Food Preparation

CV/NLP Datasets in CEA fields

• NLP
– Cooking Ontology (CEA2014, Japanese)
– Cookpad/Rakuten Recipe (2015, Japanese)

• CV
– TUM Kitchen Data Set (2009)
– CMU Multi-Modal Activity Database (2009)
– Actions for Cooking Eggs Dataset (2012)
– MPII Cooking Activities Dataset (2012)
– 50 Salads dataset (2013)
– The Breakfast Actions Dataset (2014)

• CV x NLP
– Yummly API
– Flow Graph Corpus (2014) × KUSK Dataset (CEA2014)

Page 6: Kusk Object Dataset: Recording Access to Objects in Food Preparation

KUSK Dataset x Flow Graph Corpus

KUSK Dataset (Hashimoto, CEA2014), Flow Graph Corpus (Mori, 2014)

Water Flow Sensors

Eye Tracker

Touch Display

Electric Consumption Sensors

Load Sensing Tables

20 recipes, which are shared with the Flow Graph Corpus.
60 observations by 33 subjects.

Page 7: Kusk Object Dataset: Recording Access to Objects in Food Preparation

The list of 20 recipes

CookPad ID  KUSK ID   Title of Recipe (original titles are in Japanese)
00121196    2014RC01  Chicken and Chinese cabbage starchy soup
00180223    2014RC02  Tomato soup - Japanese style
00196551    2014RC03  Omelets
00162433    2014RC04  Mother’s chicken salad
00201826    2014RC05  Batter-less Fried croquette
00200883    2014RC06  Beef and mushrooms - Korean style
00176550    2014RC07  Saute of Shiitake and Shimeji Mushrooms
00202059    2014RC08  Potato salad with fresh potatoes
00171343    2014RC09  Celery leaves soup
00148537    2014RC10  Cooked Tomato with Chicken and Soy beans
00185809    2014RC11  Fried broccoli with chicken
00196431    2014RC12  Spicy cooked beans with chicken
00157755    2014RC13  Black sesame-crusted fried chicken
00192913    2014RC14  Zestily flavored fried eggplants
00195151    2014RC15  Meat miso wrap
00187900    2014RC16  Simmered Chinese cabbage
00155229    2014RC17  Chinese style open tofu omelet
00193642    2014RC18  Aglio e olio peperoncino
00182653    2014RC19  Radish cake
00168029    2014RC20  Noshidori

* a certain complexity
* common ingredients

Page 8: Kusk Object Dataset: Recording Access to Objects in Food Preparation

KUSK Object Dataset (expansion from CEA2014)

• Provide object recognition results for KUSK Dataset videos
– A baseline for CV research
– Real image processing results as an input for NLP

• Resources: grabbed/released objects
– Object class name, timestamp, region (rectangle)
– Informative for predicting the forthcoming cooking process (*)

• Statistics
– 4391 unique images
– 133 categories in total (each recipe has a different category set)

(*) A. Hashimoto et al., “Intention-Sensing Recipe Guidance via User Accessing to Objects,” International Journal of Human-Computer Interaction, 2016.
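A minimal sketch of how one grabbed/released record could be represented, assuming the fields named on this slide (object class name, timestamp, rectangle). The class name `AccessEvent`, the field names, the time unit, and the `(x, y, width, height)` rectangle layout are hypothetical; the slides do not specify the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AccessEvent:
    """One grabbed/released record. All field names are hypothetical."""
    object_class: str                  # object class name, e.g. "knife"
    action: str                        # "grabbed" or "released"
    timestamp: float                   # seconds from the start of the video (assumed unit)
    region: Tuple[int, int, int, int]  # rectangle as (x, y, width, height), assumed layout


def objects_in_hand(events: List[AccessEvent], t: float) -> List[str]:
    """Return the classes that were grabbed but not yet released at time t."""
    held: List[str] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.timestamp > t:
            break
        if ev.action == "grabbed":
            held.append(ev.object_class)
        elif ev.action == "released" and ev.object_class in held:
            held.remove(ev.object_class)
    return held


# Made-up example events, for illustration only.
events = [
    AccessEvent("knife", "grabbed", 12.0, (40, 60, 120, 30)),
    AccessEvent("cabbage", "grabbed", 15.5, (200, 80, 150, 140)),
    AccessEvent("knife", "released", 48.2, (45, 62, 118, 31)),
]
print(objects_in_hand(events, 20.0))  # ['knife', 'cabbage']
```

A history in this shape is what would let a downstream system guess the forthcoming cooking step from which objects the user currently holds.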

Page 9: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Obtained images (a selection)

• Ingredients: cauliflower, garlic, tofu, enokidake mushrooms, cabbage, pasta
• Utensils: chopsticks, bowls, colander, chopping board, knife
• Seasonings: soup stock powder, ketchup, pepper
• Backgrounds: stem of food, dish detergent, sponge, corner trash bag

Page 10: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Semi-automated Annotation (1/2)

• 3 manual tasks for annotation
1. Correcting errors in the object regions extracted by a method from our previous research (Hashimoto, 2012)
2. Listing the object names appearing in each recipe
3. Adding names (from 2.) to each region (from 1.)

• Treatment of orthographic variants in the 2nd task
– Cooking Ontology (Nanba, CEA2014)
– We manually treated items that are not listed in the ontology.
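The name-normalization step in the second task could look roughly like the sketch below: look a raw name up in an ontology-derived variant table first, then fall back to a manually maintained table. Both tables, their entries, and the function name `canonical_name` are hypothetical illustrations; they are not taken from the Cooking Ontology or from the authors' tooling.

```python
from typing import Dict, Optional

# Hypothetical variant -> canonical-name table. In the slides this role is
# played by the Cooking Ontology; these entries are illustrative only.
ONTOLOGY_VARIANTS: Dict[str, str] = {
    "chopsticks": "chopsticks",
    "chop sticks": "chopsticks",
    "chopping board": "cutting board",
    "cutting board": "cutting board",
}

# Items that the ontology does not list are resolved by hand.
MANUAL_VARIANTS: Dict[str, str] = {
    "corner trash bag": "corner trash bag",
}


def canonical_name(raw: str) -> Optional[str]:
    """Map a raw object name to its canonical form, or None if unknown."""
    key = " ".join(raw.lower().split())  # normalize case and whitespace
    if key in ONTOLOGY_VARIANTS:
        return ONTOLOGY_VARIANTS[key]
    return MANUAL_VARIANTS.get(key)  # None flags the name for manual treatment


print(canonical_name("Chop  Sticks"))    # chopsticks
print(canonical_name("Chopping Board"))  # cutting board
print(canonical_name("mystery item"))    # None
```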

Page 11: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Semi-automated Annotation (2/2)

• Workers: students who do not major in informatics
– Number of workers: more than 20 students
– Term: two months at maximum for each worker
– Selection: cooked more than once a week in the last half year

• Interface: GUI working on Google Chrome
– Most workers are used to operating the browser.
– Double-check (reject if two annotators answered differently)
  • A rejected annotation is meta-reviewed by another worker.
– Check and advice by the authors if necessary.
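The double-check rule can be sketched as follows: a label is accepted when two annotators agree, and a disagreement is passed to a third worker for meta-review. The function name, the region identifiers, and the `meta_review` callback are hypothetical; the slides describe the policy but not its implementation.

```python
from typing import Callable, Dict

Label = str
RegionId = str


def double_check(
    first: Dict[RegionId, Label],
    second: Dict[RegionId, Label],
    meta_review: Callable[[RegionId, Label, Label], Label],
) -> Dict[RegionId, Label]:
    """Accept labels both annotators agree on; send the rest to meta-review."""
    final: Dict[RegionId, Label] = {}
    for region_id in sorted(first.keys() & second.keys()):
        a, b = first[region_id], second[region_id]
        if a == b:
            final[region_id] = a
        else:
            # Rejected: another worker decides between the two answers.
            final[region_id] = meta_review(region_id, a, b)
    return final


# Made-up annotations for illustration.
annotator_1 = {"r001": "knife", "r002": "bowl", "r003": "cabbage"}
annotator_2 = {"r001": "knife", "r002": "colander", "r003": "cabbage"}


def third_worker(region_id: RegionId, a: Label, b: Label) -> Label:
    """Stand-in for a human meta-reviewer; here it simply picks the second answer."""
    return b


print(double_check(annotator_1, annotator_2, third_worker))
```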

Page 12: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Object Feature and Recognition Result

• Feature: output of the last layer of ResNet (*)
– ResNet: the best CNN model in 2015 competitions
– Not fine-tuned (ResNet training does not run in public CNN libraries)

• Classifier: linear SVM (trained for each recipe)
– Assumption: the recipe is known, and thereby the objects too.

(*) Kaiming He et al., “Deep Residual Learning for Image Recognition,” arXiv preprint arXiv:1512.03385, 2015.
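A rough sketch of the "one linear SVM per recipe" setup, using scikit-learn's `LinearSVC`. The features here are synthetic stand-ins, not real ResNet outputs; the feature dimension, the category sets, and the SVM settings are all assumptions, since the slides do not state them.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
FEATURE_DIM = 16  # placeholder; the real ResNet feature size is not given in the slides


def fake_features(class_index: int, n: int) -> np.ndarray:
    """Synthetic stand-in for ResNet features: one well-separated cluster per class."""
    center = np.zeros(FEATURE_DIM)
    center[class_index] = 5.0
    return center + rng.normal(scale=0.5, size=(n, FEATURE_DIM))


# Hypothetical category sets; in the dataset each recipe has its own set.
recipe_classes = {
    "2014RC03": ["egg", "bowl", "chopsticks"],
    "2014RC18": ["pasta", "garlic", "knife", "colander"],
}

# Train one classifier per recipe, restricted to that recipe's categories.
classifiers = {}
for recipe_id, classes in recipe_classes.items():
    X = np.vstack([fake_features(i, 30) for i in range(len(classes))])
    y = np.repeat(classes, 30)  # labels in the same order as the stacked features
    classifiers[recipe_id] = LinearSVC(C=1.0, random_state=0).fit(X, y)

# At test time the recipe is assumed known, so its classifier is selected directly.
query = fake_features(1, 1)  # a sample drawn from the cluster of class index 1
print(classifiers["2014RC03"].predict(query))
```

Because the recipe is assumed to be known, each classifier only has to separate that recipe's categories rather than all 133.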

Page 13: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Object Recognition Accuracy

[Bar chart: recognition accuracy on a 0% to 100% axis for each recipe, 2014RC01 to 2014RC20, and in total.]
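Per-recipe accuracies and a total of this kind could be computed as in the sketch below. It assumes the total is pooled over all images (the slides do not say whether it is pooled or averaged across recipes), and the sample results are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_recipe(results: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """results: (recipe_id, true_class, predicted_class) triples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for recipe_id, truth, predicted in results:
        # Count each sample once for its recipe and once for the pooled total.
        for key in (recipe_id, "Total"):
            total[key] += 1
            correct[key] += int(truth == predicted)
    return {key: correct[key] / total[key] for key in total}


# Made-up recognition results, not taken from the dataset.
sample = [
    ("2014RC01", "knife", "knife"),
    ("2014RC01", "bowl", "knife"),
    ("2014RC02", "tomato", "tomato"),
]
for key, acc in accuracy_by_recipe(sample).items():
    print(f"{key}: {acc:.2%}")
```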

Page 14: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Evaluation by CMC curve

[Plot: CMC curve; axes labeled Rank and Acc.]
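A CMC (cumulative match characteristic) curve reports, for each rank k, the fraction of samples whose correct class appears among the top k candidates. The sketch below computes it from ranked candidate lists, which could be obtained by sorting the per-class SVM scores; the function name and the sample data are hypothetical.

```python
from typing import List, Sequence


def cmc_curve(
    ranked_predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    max_rank: int,
) -> List[float]:
    """Return accuracy at ranks 1..max_rank (index 0 holds rank 1)."""
    hits = [0] * max_rank
    for ranking, truth in zip(ranked_predictions, truths):
        top = list(ranking[:max_rank])
        if truth in top:
            first_hit = top.index(truth)
            # A hit at rank r also counts as a hit at every later rank.
            for r in range(first_hit, max_rank):
                hits[r] += 1
    return [h / len(truths) for h in hits]


# Made-up rankings: candidates ordered from most to least likely.
rankings = [
    ["knife", "bowl", "pan"],
    ["bowl", "knife", "pan"],
    ["pan", "bowl", "knife"],
]
truths = ["knife", "knife", "knife"]
print(cmc_curve(rankings, truths, max_rank=3))
```

The value at rank 1 equals the ordinary recognition accuracy, and the curve never decreases as the rank grows.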

Page 15: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Discussion

• Difficulty in food recognition
– Variations: wrapped? cut? and others (eggs change appearance extremely)

• Utensils and seasonings are relatively easy to recognize:
– Every kitchen has limited variations (an environment-adaptive system is promising).

• Possibility of an R-CNN approach
– To deal with failures in object region extraction.

Page 16: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Conclusion

• KUSK Dataset x Flow Graph Corpus
– We hope it will be a base dataset for CV x NLP research.
– Problem: texts (and dishes) are Japanese.
  • A dataset from Yummly is available for English speakers.

• KUSK Object Dataset ⊂ KUSK Dataset
– History of user accessing objects in cooking
  • Contains important information to predict the forthcoming process.
  • Organized by object name, put/taken label, timestamp, and rectangle.
  • Features from ResNet and recognition results by linear SVM

Page 17: Kusk Object Dataset: Recording Access to Objects in Food Preparation

Future works

• Collaborative research with NLP team in Kyoto Univ.
– CV2NLP: Vision-assisted NLP, Recipe Text Generation
– NLP2CV: Scenario-guided CV + PR

Original KUSK Dataset and old version of KUSK Object Dataset: http://kusk.mm.media.kyoto-u.ac.jp/

To get KUSK Object Dataset, please do not hesitate to contact us.

Mail: [email protected] Twitter: @a_hasimoto or Facebook, Researchgate…