from large scale image categorization to entry-level categories

41
2014/02/23 CV勉強会@関東 ICCV2013読み会 発表資料 takmin

Upload: takuya-minagawa

Post on 21-Jun-2015

6.082 views

Category:

Technology


0 download

DESCRIPTION

CV勉強会@関東 ICCV2013読み会資料

TRANSCRIPT

Page 1: From Large Scale Image Categorization to Entry-Level Categories

2014/02/23 CV勉強会@関東ICCV2013読み会 発表資料

takmin

Page 2: From Large Scale Image Categorization to Entry-Level Categories

紹介する研究

“From Large Scale Image Categorization to Entry-Level Categories”

Vicente Ordonez

Jia Deng

Yejin Choi

Alexander C. Berg

Tamara L. Berg

Page 3: From Large Scale Image Categorization to Entry-Level Categories

はじめに

この資料は、First AuthorのOrdonezさんが自身のサイトに発表スライドをアップしていたため、それを元に和訳と若干の追加説明を加えたものです。 http://www.cs.unc.edu/~vicente/entrylevel/index.

html

Best Paperに選ばれた理由 大規模一般物体認識に対して新しい問題提起を

行ったこと?

Page 4: From Large Scale Image Categorization to Entry-Level Categories

この絵をなんて呼びますか?

Dolphin (イルカ)

Grampus griseus (ハナゴンドウ)

Page 5: From Large Scale Image Categorization to Entry-Level Categories

この絵をなんて呼びますか?

Object (物体)

Organism (生物)

Animal (動物)

Chordate (脊索動物)

Vertebrate (脊椎動物)

Aquatic bird (水鳥)

Bird (鳥)

Swan (白鳥)

Cygnus Columbianus(コハクチョウ)

Whistling swan (アメリカコハクチョウ)

Page 6: From Large Scale Image Categorization to Entry-Level Categories

基礎知識

WordNet 英語の概念辞書

約15万語

synsetと呼ばれる同義語のグループに分類され、synset同士の関係が記述されている

名詞と動詞は階層構造を持つ

ImageNet WordNetの名詞に対応した画像データセット

1つのsynsetあたり平均1000枚の画像

Page 7: From Large Scale Image Categorization to Entry-Level Categories

画像の内容に名前をつける

Vision

Input Image

Grizzly bear

Homing pigeon

Ball-peen hammer

Steel arch bridge

Farmhouse

Soapweed

Brazilian rosewood

Bristlecone pine

Cliffdiving

Crabapple

Grampus griseus

American black bear

King penguin

Cormorant

Spigot

Diskette, floppy

(0.80)

(0.83)

(0.16)

(0.25)

(0.11)

(0.56)

(0.26)

(0.06)

(0.07)

(0.06)

(0.16)

(0.03)

(0.12)

(0.13)

(0.04)

(0.19)

Thousands of Noisy Category Predictions

Grampus griseus

Pick the Best

Page 8: From Large Scale Image Categorization to Entry-Level Categories

画像の内容に名前をつける

Vision

Input Image

Grizzly bear

Homing pigeon

Ball-peen hammer

Steel arch bridge

Farmhouse

Soapweed

Brazilian rosewood

Bristlecone pine

Cliffdiving

Crabapple

Grampus griseus

American black bear

King penguin

Cormorant

Spigot

Diskette, floppy

(0.80)

(0.83)

(0.16)

(0.25)

(0.11)

(0.56)

(0.26)

(0.06)

(0.07)

(0.06)

(0.16)

(0.03)

(0.12)

(0.13)

(0.04)

(0.19)

Thousands of Noisy Category Predictions

Grampus griseus

Pick the BestDolphin

What Should I Call It?

Naming

Page 9: From Large Scale Image Categorization to Entry-Level Categories

Entry-Level Category

物体の画像を見せられた時、人がつけるであろう名前

Rosch et al, 1976Jolicoeur, Gluck & Kosslyn, 1984

上位後: animal, vertebrateエントリーレベル: bird下位語: Black-capped chickadee (アメリカコガラ)

Page 10: From Large Scale Image Categorization to Entry-Level Categories

Entry-Level Category

上位語: animal, birdエントリーレベル: penguin下位語: Chinstrap penguin (アゴヒモペンギン)

物体の画像を見せられた時、人がつけるであろう名前

Rosch et al, 1976Jolicoeur, Gluck & Kosslyn, 1984

Page 11: From Large Scale Image Categorization to Entry-Level Categories

それって難しいの?

Cormorant

Daisy

Frog Orchid

King penguin

Living thing

Seabird

Penguin

Angiosperm

Flower

Orchid

Plant, FloraBird

Daffodil

Narcissus

Bulbous Plant

wordnet hierarchy

被子植物 球根植物

スイセン

ラッパスイセン

どこまで上位語に遡れば良い?

上位語にEntry Categoryを含まない場合は?

Page 12: From Large Scale Image Categorization to Entry-Level Categories

さて、どうやろう?

Linguistic resources Lots of text

Wordnet Google Web 1T

Our dog Zoe in her bed

Interior design of modern white and brown living room furniture hanging.

The Egyptian cat statue by the floor clock and perpetual motion

Man sits in a rusted car buried in the sand on Waitarere beach

Emma in her hat looking super cute

Little girl and her dog in northern Thailand. They both seemed.

Lots of images with text

SBU Captioned Dataset

Labeled Images

Imagenet

Computer Vision

Page 13: From Large Scale Image Categorization to Entry-Level Categories

Ground Truthデータ作成

Friesian, Holstein, Holstein-Friesian

cowcattlepasturefence

WordNetの葉ノード(x500)

Amazon Mechanical Terk

x8

対応する画像x10

投票

Page 14: From Large Scale Image Categorization to Entry-Level Categories

1. Goal: Category TranslationDetailed Category

Grampus griseus

𝑑

What should I Call It?(Entry-Level Category)

dolphin

𝑒

2. Goal: Content Naming

What should I Call It?(Entry-Level Category)

dolphin

𝑒

Input Image

Page 15: From Large Scale Image Categorization to Entry-Level Categories

1. Goal: Category TranslationDetailed Category

Grampus griseus

𝑑

What should I Call It?(Entry-Level Category)

dolphin

𝑒

2. Goal: Content Naming

What should I Call It?(Entry-Level Category)

dolphin

𝑒

Input Image

Page 16: From Large Scale Image Categorization to Entry-Level Categories

1. Goal: Category TranslationDetailed Category

Grampus griseus

𝑑

What should I Call It?(Entry-Level Category)

dolphin

𝑒

1.1 Text BasedWebコーパスとWordNetの階層構造のみから推定

1.2 Image Based画像特徴からWordNetの単語へ投票して推定

Page 17: From Large Scale Image Categorization to Entry-Level Categories

Cormorant

Sperm whale

Grampus griseus

King penguin

Sem

antic

Distance

𝜓(𝑑, 𝑒) 𝜙(𝑒)

n-gramFrequency

656M

366M

128M

88M 1.2M

22M

15M

0.9M

55M

30M

6.4M

0.08M

1.1 Category Translation: Text-based

Animal

Seabird

Penguin

Cetacean

Whale

Dolphin

MammalBird

wordnet hierarchy

Page 18: From Large Scale Image Categorization to Entry-Level Categories

Cormorant

Sperm whale

Grampus griseus

King penguin

Sem

antic

Distance

𝜓(𝑑, 𝑒) 𝜙(𝑒)656M

366M

128M

88M 1.2M

22M

15M

0.9M

55M

30M

6.4M

0.08M

1.1 Category Translation: Text-based

Animal

Seabird

Penguin

Cetacean

Whale

Dolphin

MammalBird

wordnet hierarchy

Naturalness

𝜏 𝑑, 𝜆 = argmax𝑤

[𝜙 𝑒 − 𝜆𝜓(𝑑, 𝑒)]

Page 19: From Large Scale Image Categorization to Entry-Level Categories

1.2 Category Translation: Image-based

Friesian, Holstein, Holstein-Friesian

(1.9071) cow(1.1851) orange_tree(0.6136) stall(0.5630) mushroom(0.3825) pasture(0.3156) sheep(0.3321) black_bear(0.3015) puppy(0.2409) pedestrian_bridge(0.2353) nest

Vision System

WordNetの葉ノード

対応する画像

アノテーション

TF-IDFでランク付け

Page 20: From Large Scale Image Categorization to Entry-Level Categories

cactus wren bird bird bird

buzzard, Buteo buteo hawk hawk bird

whinchat, Saxicola rubetra bird chat bird

Weimaraner dog dog dog

numbat, banded anteater, anteater anteater anteater cat

rhea, Rhea americana ostrich bird grass

Europ. black grouse, heathfowl bird bird duck

yellowbelly marmot, rockchuck Squirrel marmot rock

HUMANS IMAGEBASED

TEXTBASED

Category Translation: Examples

Page 21: From Large Scale Image Categorization to Entry-Level Categories

cactus wren bird bird bird

buzzard, Buteo buteo hawk hawk bird

whinchat, Saxicola rubetra bird chat bird

Weimaraner dog dog dog

numbat, banded anteater, anteater anteater anteater cat

rhea, Rhea americana ostrich bird grass

Europ. black grouse, heathfowl bird bird duck

yellowbelly marmot, rockchuck Squirrel marmot rock

HUMANS IMAGEBASED

TEXTBASED

Category Translation: Examples

Page 22: From Large Scale Image Categorization to Entry-Level Categories

cactus wren bird bird bird

buzzard, Buteo buteo hawk hawk bird

whinchat, Saxicola rubetra bird chat bird

Weimaraner dog dog dog

numbat, banded anteater, anteater anteater anteater cat

rhea, Rhea americana ostrich bird grass

Europ. black grouse, heathfowl bird bird duck

yellowbelly marmot, rockchuck Squirrel marmot rock

HUMANS IMAGEBASED

TEXTBASED

Category Translation: Examples

「chat」は鳥の種類。コーパスに「おしゃべり」の意味で頻出

Page 23: From Large Scale Image Categorization to Entry-Level Categories

cactus wren bird bird bird

buzzard, Buteo buteo hawk hawk bird

whinchat, Saxicola rubetra bird chat bird

Weimaraner dog dog dog

numbat, banded anteater, anteater anteater anteater cat

rhea, Rhea americana ostrich bird grass

Europ. black grouse, heathfowl bird bird duck

yellowbelly marmot, rockchuck Squirrel marmot rock

HUMANS IMAGEBASED

TEXTBASED

Category Translation: Examples

アメリカダチョウ

Page 24: From Large Scale Image Categorization to Entry-Level Categories

1. Goal: Category TranslationDetailed Category

Grampus griseus

𝑑

What should I Call It?(Entry-Level Category)

dolphin

𝑒

2. Goal: Content Naming

What should I Call It?(Entry-Level Category)

dolphin

𝑒

Input Image

Page 25: From Large Scale Image Categorization to Entry-Level Categories

2. Goal: Content Naming

What should I Call It?(Entry-Level Category)

dolphin

𝑒

Input Image

2.1 Propagated Visual Estimates画像認識結果にWebコーパスの情報を加味して、“自然な”WordNet上の単語を選択

2.2 Supervised LearningImageNetの葉ノードに対する認識結果をEntry-Level Categoriesへ変換

Page 26: From Large Scale Image Categorization to Entry-Level Categories

Coding(LLC),Wang et al. CVPR 2010

Spatial pooling

Flat Classifiers

Local descriptors

Grizzly bear

Homing pigeon

Ball-peen hammer

Steel arch bridge

Farmhouse

Soapweed

Brazilian rosewood

Bristlecone pine

Cliffdiving

Crabapple

Grampus griseus

American black bear

King penguin

Cormorant

Spigot

Diskette, floppy

(0.80)

(0.41)

(0.16)

(0.25)

(0.11)

(0.56)

(0.26)

(0.06)

(0.07)

(0.06)

(0.16)

(0.03)

(0.12)

(0.13)

(0.04)

(0.19)

Selective Search Windows. van De Sande et al. ICCV 2011

Large Scale Categorization

Page 27: From Large Scale Image Categorization to Entry-Level Categories

2.1 Propagated Visual Estimates

Accuracy 𝑓(𝑣, 𝐼)

Cormorant

Sperm whale

King penguin

(0.15)

(0.05)

(0.2)

(0.6)Grampus griseus

Page 28: From Large Scale Image Categorization to Entry-Level Categories

2.1 Propagated Visual Estimates

Accuracy

𝑓(𝑣, 𝐼)

(0.2)

(0.8)

(0.2)

(0.8)

(0.8)

(1.0)

(0.15)

(0.6)

Animal

Seabird

Penguin

Cetacean

Whale

Dolphin

MammalBird

Cormorant

Sperm whale

King penguin

(0.15)

(0.05)

(0.2)

(0.6)Grampus griseus

Page 29: From Large Scale Image Categorization to Entry-Level Categories

2.1 Propagated Visual Estimates

Accuracy

𝑓(𝑣, 𝐼)

(0.2)

(0.8)

(0.2)

(0.8)

(0.8)

(1.0)

Specificity

- 𝜓(𝑣)

Deng et al. CVPR 2012

𝑓 𝑣, 𝐼, 𝜆 = 𝑓(𝑣, 𝐼) [− 𝜓 𝑣 + 𝜆]

(0.15)

(0.6)

Animal

Seabird

Penguin

Cetacean

Whale

Dolphin

MammalBird

Cormorant

Sperm whale

King penguin

(0.15)

(0.05)

(0.2)

(0.6)Grampus griseus

葉か

らの

距離

Page 30: From Large Scale Image Categorization to Entry-Level Categories

2.1 Propagated Visual Estimates

Accuracy

𝑓(𝑣, 𝐼)

(0.2)

(0.8)

(0.2)

(0.8)

(0.8)

(1.0)

Specificity

- 𝜓(𝑣)

Deng et al. CVPR 2012

𝑓 𝑣, 𝐼, 𝜆 = 𝑓(𝑣, 𝐼) [− 𝜓 𝑣 + 𝜆]

(0.15)

(0.6)

Animal

Seabird

Penguin

Cetacean

Whale

Dolphin

MammalBird

Cormorant

Sperm whale

King penguin

(0.15)

(0.05)

(0.2)

(0.6)Grampus griseus

𝜙(𝑣)

Naturalness

656M

366M

128M

88M 1.2M

22M

15M

0.9M

55M

30M 6.4M

𝑓𝑛𝑎𝑡 𝑣, 𝐼, 𝜆 = 𝑓(𝑣, 𝐼) [𝜙 𝒗 − 𝜆 𝜓(𝑣)]Our work

0.08M

Page 31: From Large Scale Image Categorization to Entry-Level Categories

SBU Captioned Dataset100万枚のキャプション付き画像データセット(from Flickr)

2.2 Supervised Learning

キャプションからEntry-Level Categories生成

POS Taggerでノイズ除去

ImageNetの葉ノードとEntry-Level Categoriesとの関係を学習

Page 32: From Large Scale Image Categorization to Entry-Level Categories

2.2 Supervised Learning

𝑋 =

Grizzly bear

Homing pigeon

Ball-peen hammer

Steel arch bridge

Farmhouse

Soapweed

Brazilian rosewood

Bristlecone pine

Cliffdiving

Crabapple

Grampus griseus

American black bear

King penguin

Cormorant

Spigot

Diskette, floppy

(0.80)

(0.41)

(0.16)

(0.25)

(0.11)

(0.56)

(0.26)

(0.06)

(0.07)

(0.06)

(0.16)

(0.03)

(0.12)

(0.13)

(0.04)

(0.19)

画像認識結果

ImageNetの葉ノード

Page 33: From Large Scale Image Categorization to Entry-Level Categories

2.2 Supervised Learning

𝑋 =

Grizzly bear

Homing pigeon

Ball-peen hammer

Steel arch bridge

Farmhouse

Soapweed

Brazilian rosewood

Bristlecone pine

Cliffdiving

Crabapple

Grampus griseus

American black bear

King penguin

Cormorant

Spigot

Diskette, floppy

(0.80)

(0.41)

(0.16)

(0.25)

(0.11)

(0.56)

(0.26)

(0.06)

(0.07)

(0.06)

(0.16)

(0.03)

(0.12)

(0.13)

(0.04)

(0.19)

Bear

Dog

Building

House

Bird

Penguin

Tree

Palm tree

SBU Captioned Photo Dataset1 million captioned images!

𝑓𝑠𝑣𝑚 𝒗𝒊, 𝐼, Θ =1

1 − exp(𝑎Θ𝑇𝑋 + 𝑏)

training from weak annotations

学習

Page 34: From Large Scale Image Categorization to Entry-Level Categories

snag

shade tree

bracket fungus, shelf fungus

bristlecone pine, Rocky Mountain bristlecone

pine, Pinus aristata

Brazilian rosewood, caviuna wood, jacaranda,

Dalbergia nigra

redheaded woodpecker, redhead, Melanerpes

erythrocephalus

redbud, Cercis canadensis

mangrove, Rhizophora mangle

chiton, coat-of-mail shell, sea cradle,

polyplacophore

crab apple, crabapple

papaya, papaia, pawpaw, papaya tree, melon

tree, Carica papaya

frogmouth

Extracting Meaning from Data

Mammals Birds Instruments Structures Plants Other

“tree”を学習させた結果

Page 35: From Large Scale Image Categorization to Entry-Level Categories

water dog

surfing, surfboarding, surfriding

manatee, Trichechus manatus

punt

dip, plunge

cliff diving

fly-fishing

sockeye, sockeye salmon, red salmon,

blueback salmon, Oncorhynchus nerka

sea otter, Enhydra lutris

American coot, marsh hen, mud hen, water

hen, Fulica americana

booby

canal boat, narrow boat, narrowboat

Extracting Meaning from Data

Mammals Birds Instruments Structures Plants Other

“water”を学習させた結果

Page 36: From Large Scale Image Categorization to Entry-Level Categories

2.3 Joint

2.1 Propagated Visual Estimatesと

2.2 Supervised Learning

の統合

𝑓𝑗𝑜𝑖𝑛𝑡 𝑣, 𝛼 = 𝛼 𝑓𝑛𝑎𝑡 𝑣 + 𝑓𝑠𝑣𝑚 𝑣

Page 37: From Large Scale Image Categorization to Entry-Level Categories

Results: Content Naming

Human Labels Flat ClassifierDeng et al.CVPR’12

PropagatedVisual Estimates

SupervisedLearning

Joint

farm, fencefieldhorse, mulekite, dirtpeopletree, zoo

geldingyearlingshireyearlingdraft

horse

equineperissodactylungulatemale

horsetree

equinemalegelding

horse

pasturefield

cowfence

horse

pasturefield

cowfence

Page 38: From Large Scale Image Categorization to Entry-Level Categories

Results: Content Naming

Human Labels Flat ClassifierDeng et al.CVPR’12

PropagatedVisual Estimates

SupervisedLearning

Joint

fence, junksignstop signstreet signtrash cantree

feederHylacleanerboxlarge

woodytree

structureplantvascular

tree

structurebuildingplantarea

logostreetneighborhoobuildingoffice building

logostreetneighborhoodbuildingoffice

Page 39: From Large Scale Image Categorization to Entry-Level Categories

Evaluation: Content Naming

0%

2%

4%

6%

8%

10%

12%

14%

16%

18%

20%

22%

24%

26%

FlatClassifier

Deng et al.CVPR'12

PropagatedVisual

Estimates

SupervisedLearning

Combined

Test Set A – Random Images

Precision Recall

0%

2%

4%

6%

8%

10%

12%

14%

16%

18%

20%

22%

24%

26%

FlatClassifier

Deng et al.CVPR'12

PropagatedVisual

Estimates

SupervisedLearning

Combined

Test Set B – High Confidence Prediction Scores

Precision Recall

Page 40: From Large Scale Image Categorization to Entry-Level Categories

Conclusions/Future Work

画像へ名前付けを行うための新しい方法を検討

画像に対し、人間のような名前付けを行うための方法を示した

名詞以外にもアクションや属性の推定などにも今後拡張していきたい

Page 41: From Large Scale Image Categorization to Entry-Level Categories

Questions?