Kusk Object Dataset: Recording Access to Objects in Food Preparation
Atsushi Hashimoto, Masaaki Iiyama, Shinsuke Mori, Michihiko Minoh
Kyoto University
http://kusk.mm.media.kyoto-u.ac.jp/en/
Computer Vision (CV) meets Natural Language Processing (NLP)
• CV-NLP collaboration is an active field.
– Supported by mature machine learning technology.
– Cooking media can be a good practice field!
• Long text (recipe) and organized activity (cooking)
[Diagram labels: Real World; Vision (CV); Language (NLP); Video observation/instruction; Machine-Readable Description (BN/DNN); Human-Friendly Description; Recog.; Text Generation; Recog./Parse; Retrieve]
Grand goal: Comparing Recipe and Human Actions
• From a viewpoint of computer engineering…
– Recipe: a kind of script language
– Human actions: an execution of the script by a human
• Potential applications
– Automatic cooking, online recipe navigation
– Cooking records for healthcare, recipe generation
Pascal Sentence Dataset
http://vision.cs.uiuc.edu/pascal-sentences/
Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. “Collecting Image Annotations Using Amazon's Mechanical Turk”. In Proc. of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
• One jet lands at an airport while another takes off next to it.
• Two airplanes parked in an airport.
• Two jets taxi past each other.
• Two parked jet airplanes facing opposite directions.
• two passenger planes on a grassy plain
Pascal Sentence Dataset: captions and images
– Images are obtained from the Pascal dataset.
– Captions are annotated via Amazon Mechanical Turk.
CV/NLP Datasets in CEA fields
• NLP
– Cooking Ontology (CEA2014, Japanese)
– Cookpad/Rakuten Recipe (2015, Japanese)
• CV
– TUM Kitchen Data Set (2009)
– CMU Multi-Modal Activity Database (2009)
– Actions for Cooking Eggs Dataset (2012)
– MPII Cooking Activities Dataset (2012)
– 50 Salads dataset (2013)
– The Breakfast Actions Dataset (2014)
• CV x NLP
– Yummly API
– Flow Graph Corpus (2014) × KUSK Dataset (CEA2014)
KUSK Dataset x Flow Graph Corpus
KUSK Dataset (Hashimoto, CEA2014), Flow Graph Corpus (Mori, 2014)
Water Flow Sensors
Eye Tracker
Touch Display
Electric Consumption Sensors
Load Sensing Tables
20 recipes, which are shared with the Flow Graph Corpus
60 observations by 33 subjects.
The list of 20 recipes
CookPad ID   KUSK ID    Title of Recipe (original title is in Japanese)
00121196     2014RC01   Chicken and Chinese cabbage starchy soup
00180223     2014RC02   Tomato soup - Japanese style
00196551     2014RC03   Omelets
00162433     2014RC04   Mother’s chicken salad
00201826     2014RC05   Batter-less Fried croquette
00200883     2014RC06   Beef and mushrooms - Korean style
00176550     2014RC07   Saute of Shiitake and Shimeji Mushrooms
00202059     2014RC08   Potato salad with fresh potatoes
00171343     2014RC09   Celery leaves soup
00148537     2014RC10   Cooked Tomato with Chicken and Soy beans
00185809     2014RC11   Fried broccoli with chicken
00196431     2014RC12   Spicy cooked beans with chicken
00157755     2014RC13   Black sesame-crusted fried chicken
00192913     2014RC14   Zestily flavored fried eggplants
00195151     2014RC15   Meat miso wrap
00187900     2014RC16   Simmered Chinese cabbage
00155229     2014RC17   Chinese style open tofu omelet
00193642     2014RC18   Aglio e olio peperoncino
00182653     2014RC19   Radish cake
00168029     2014RC20   Noshidori
* a certain complexity
* common ingredients
KUSK Object Dataset (expansion from CEA2014)
• Provide object recognition results in KUSK Dataset videos
– A baseline for CV research
– Real image processing results as an input for NLP
• Resources: grabbed/released objects
– Object class name, timestamp, region (rectangle)
– Informative for predicting the forthcoming cooking process (*)
• Statistics
– 4391 unique images
– 133 categories in total (each recipe has a different category set)
*) A. Hashimoto et al., “Intention-Sensing Recipe Guidance via User Accessing to Objects,” International Journal of Human-Computer Interaction, 2016
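The resources listed above (object class name, timestamp, rectangular region, and whether the object was grabbed or released) could be held in a record like the one below. This is a minimal sketch only: the field names, the time unit, and the rectangle layout are assumptions for illustration, not the dataset's actual schema, and `objects_in_hand` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class AccessEvent:
    """One grab/release event (illustrative fields, not the real schema)."""
    recipe_id: str                      # KUSK recipe ID, e.g. "2014RC01"
    object_class: str                   # annotated object class name
    action: str                         # "grabbed" or "released"
    timestamp: float                    # assumed: seconds from start of video
    region: Tuple[int, int, int, int]   # assumed layout: (x, y, width, height)


def objects_in_hand(events: Iterable[AccessEvent]) -> Set[str]:
    """Replay events in time order; return classes grabbed but not yet released."""
    held: Set[str] = set()
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.action == "grabbed":
            held.add(ev.object_class)
        elif ev.action == "released":
            held.discard(ev.object_class)
    return held
```

Replaying the access history this way gives the set of objects currently away from the table, which is one simple form of the "access to objects" signal the slide describes.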
Obtained images (a select review)
[Figure: example images in four groups]
Ingredients: Cauliflowers, Garlics, Tofu, Enoki dake mushrooms, Cabbages, Pasta
Utensils: Chop Sticks, Bowls, Colander, Chop. Board, Knife
Seasonings: soup stock powder, ketchup, Pepper
Backgrounds: Stem of food, Dish detergent, Sponge, Corner Trash Bag
Semi-automated Annotation (1/2)
• 3 manual tasks for annotation
1. Correcting errors in object region extraction by a method from our previous research (Hashimoto, 2012)
2. Listing the object names appearing in each recipe
3. Adding names (from 2.) to each region (from 1.)
• Treatment of orthographic variants in the 2nd task
– Cooking Ontology (Nanba, CEA2014)
– We manually treated items that are not listed in the ontology.
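The variant handling in the second task can be pictured as a two-stage lookup: first the ontology, then a manually maintained table for names the ontology lacks. The sketch below assumes both are plain dictionaries; the entries, the English canonical labels, and the function name `normalize_name` are hypothetical and not taken from the Cooking Ontology.

```python
from typing import Dict, Optional

# Hypothetical entries: spelling variants mapped to one canonical class name.
ONTOLOGY_VARIANTS: Dict[str, str] = {
    "玉ねぎ": "onion",
    "玉葱": "onion",
    "たまねぎ": "onion",
    "タマネギ": "onion",
}

# Hypothetical manual table for names not listed in the ontology.
MANUAL_VARIANTS: Dict[str, str] = {
    "三角コーナー": "corner trash bag",
}


def normalize_name(name: str,
                   ontology: Dict[str, str],
                   manual: Dict[str, str]) -> Optional[str]:
    """Return the canonical class name, or None if the name is still unresolved."""
    key = name.strip()
    if key in ontology:      # the ontology takes priority
        return ontology[key]
    return manual.get(key)   # fall back to manually treated items
```

A `None` result marks a name that still needs manual treatment, mirroring the last bullet above.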
Semi-automated Annotation (2/2)
• Workers: students who do not major in informatics
– # of workers: more than 20 students
– Term: two months at maximum for each worker
– Selection: had cooked more than once a week over the last half year
• Interface: GUI running on Google Chrome
– Most workers are used to operating the browser.
– Double-check (reject if two annotators answered differently)
• A rejected annotation is meta-reviewed by another worker.
– Check and advice by the authors if necessary.
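The double-check rule can be sketched as a small decision function. The slides do not say how the meta-reviewer's answer is used, so the sketch assumes it is simply accepted as final; the function name `resolve_label` is hypothetical.

```python
from typing import Optional


def resolve_label(first: str,
                  second: str,
                  meta_review: Optional[str] = None) -> Optional[str]:
    """Accept a label when two annotators agree; otherwise defer to a meta-review.

    Returns None while a rejected annotation is still waiting for the
    third worker's decision.
    """
    if first == second:
        return first
    # Disagreement: the annotation is rejected and goes to another worker.
    # Assumption: the meta-reviewer's answer is taken as the final label.
    return meta_review
```

For instance, `resolve_label("bowl", "colander")` stays unresolved until a third worker supplies `meta_review`.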
Object Feature and Recognition Result
• Feature: output of the last layer of ResNet (*)
• ResNet: the best CNN model in 2015 competitions
• Not fine-tuned
(ResNet training does not run in public CNN libraries)
• Classifier: linear SVM (trained for each recipe)
• Assumption: the recipe is known, and thereby the objects too.
*) Kaiming He et al., “Deep Residual Learning for Image Recognition,” arXiv preprint arXiv:1512.03385, 2015
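Because the recipe is assumed known, each recipe can carry its own linear model over its own category set. The sketch below shows only the prediction step of such a setup: a linear score w·x + b per class, with the highest score winning. The one-vs-rest argmax, the toy 2-D weights, and all names are assumptions for illustration; training is omitted, and real ResNet features would be far higher-dimensional.

```python
from typing import Dict, List, Tuple

# class name -> (weight vector, bias); one such model per recipe
LinearModel = Dict[str, Tuple[List[float], float]]


def decision_scores(model: LinearModel, feature: List[float]) -> Dict[str, float]:
    """Compute the linear score w.x + b for every class in the model."""
    return {
        label: sum(w * x for w, x in zip(weights, feature)) + bias
        for label, (weights, bias) in model.items()
    }


def classify(models_by_recipe: Dict[str, LinearModel],
             recipe_id: str,
             feature: List[float]) -> str:
    """Pick the best-scoring class among the known recipe's categories only."""
    scores = decision_scores(models_by_recipe[recipe_id], feature)
    return max(scores, key=scores.get)


# Toy models with made-up weights; each recipe has a different category set.
MODELS: Dict[str, LinearModel] = {
    "2014RC01": {"knife": ([1.0, 0.0], 0.0), "bowl": ([0.0, 1.0], 0.0)},
    "2014RC02": {"knife": ([1.0, 0.0], 0.0), "tofu": ([0.0, 1.0], 0.0)},
}
```

The same feature vector can therefore receive different labels under different recipes, since the candidate classes are restricted by the recipe.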
Object Recognition Accuracy
[Figure: recognition accuracy (axis from 0% to 100%) for each recipe, 2014RC01 through 2014RC20, and Total]
Evaluation by CMC curve
[Figure: CMC curve; axes: Rank, Acc.]
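A CMC (cumulative match characteristic) curve reports, for each rank k, the fraction of samples whose true class appears among the top k candidates. A minimal sketch of that computation follows; the function name and input layout (one best-first candidate list per sample) are assumptions, not the authors' evaluation code.

```python
from typing import List, Sequence


def cmc_curve(ranked_predictions: Sequence[Sequence[str]],
              true_labels: Sequence[str],
              max_rank: int) -> List[float]:
    """Return accuracy at ranks 1..max_rank (index 0 holds rank-1 accuracy)."""
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = [0] * max_rank
    for ranking, truth in zip(ranked_predictions, true_labels):
        top = list(ranking[:max_rank])
        if truth in top:
            first = top.index(truth)
            # A hit at position `first` counts for every rank from there on.
            for k in range(first, max_rank):
                hits[k] += 1
    total = len(true_labels)
    return [h / total for h in hits]
```

The curve is non-decreasing by construction, and its first value is the ordinary top-1 accuracy.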
Discussion
• Difficulty in food recognition
– Variations: wrapped? cut? and others (eggs change appearance extremely)
• Relatively easy to recognize utensils and seasonings
– Every kitchen has limited variations.
(an environment-adaptive system is promising)
• Possibility of an R-CNN approach
– To deal with failures in object region extraction.
Conclusion
• KUSK Dataset x Flow Graph Corpus
– We hope it will be a base dataset for CV x NLP research.
– Problem: texts (and dishes) are Japanese.
• A dataset from Yummly is available for English speakers.
• KUSK Object Dataset ⊂ KUSK Dataset
– History of user accessing objects in cooking
• Contains important information to predict the forthcoming process.
• Organized by object name, put/taken label, timestamp, and rectangle.
• Features from ResNet and recognition results by linear SVM
Future works
• Collaborative research with the NLP team at Kyoto Univ.
– CV2NLP: Vision-assisted NLP, Recipe Text Generation
– NLP2CV: Scenario-guided CV + PR
To get KUSK Object Dataset, please do not hesitate to contact us.
Original KUSK Dataset and old version of KUSK Object Dataset: http://kusk.mm.media.kyoto-u.ac.jp/
Mail: [email protected]
Twitter: @a_hasimoto or Facebook, Researchgate…