from large scale image categorization to entry-level categories
DESCRIPTION
CV勉強会@関東 ICCV2013読み会資料TRANSCRIPT
2014/02/23 CV勉強会@関東ICCV2013読み会 発表資料
takmin
紹介する研究
“From Large Scale Image Categorization to Entry-Level Categories”
Vicente Ordonez
Jia Deng
Yejin Choi
Alexander C. Berg
Tamara L. Berg
はじめに
この資料は、First AuthorのOrdonezさんが自身のサイトに発表スライドをアップしていたため、それを元に和訳と若干の追加説明を加えたものです。 http://www.cs.unc.edu/~vicente/entrylevel/index.
html
Best Paperに選ばれた理由 大規模一般物体認識に対して新しい問題提起を
行ったこと?
この絵をなんて呼びますか?
Dolphin (イルカ)
Grampus griseus (ハナゴンドウ)
この絵をなんて呼びますか?
Object (物体)
Organism (生物)
Animal (動物)
Chordate (脊索動物)
Vertebrate (脊椎動物)
Aquatic bird (水鳥)
Bird (鳥)
Swan (白鳥)
Cygnus Columbianus(コハクチョウ)
Whistling swan (アメリカコハクチョウ)
基礎知識
WordNet 英語の概念辞書
約15万語
synsetと呼ばれる同義語のグループに分類され、synset同士の関係が記述されている
名詞と動詞は階層構造を持つ
ImageNet WordNetの名詞に対応した画像データセット
1つのsynsetあたり平均1000枚の画像
画像の内容に名前をつける
Vision
Input Image
Grizzly bear
Homing pigeon
Ball-peen hammer
Steel arch bridge
Farmhouse
Soapweed
Brazilian rosewood
Bristlecone pine
Cliffdiving
Crabapple
Grampus griseus
American black bear
King penguin
Cormorant
Spigot
Diskette, floppy
(0.80)
(0.83)
(0.16)
(0.25)
(0.11)
(0.56)
(0.26)
(0.06)
(0.07)
(0.06)
(0.16)
(0.03)
(0.12)
(0.13)
(0.04)
(0.19)
Thousands of Noisy Category Predictions
Grampus griseus
Pick the Best
画像の内容に名前をつける
Vision
Input Image
Grizzly bear
Homing pigeon
Ball-peen hammer
Steel arch bridge
Farmhouse
Soapweed
Brazilian rosewood
Bristlecone pine
Cliffdiving
Crabapple
Grampus griseus
American black bear
King penguin
Cormorant
Spigot
Diskette, floppy
(0.80)
(0.83)
(0.16)
(0.25)
(0.11)
(0.56)
(0.26)
(0.06)
(0.07)
(0.06)
(0.16)
(0.03)
(0.12)
(0.13)
(0.04)
(0.19)
Thousands of Noisy Category Predictions
Grampus griseus
Pick the BestDolphin
What Should I Call It?
Naming
Entry-Level Category
物体の画像を見せられた時、人がつけるであろう名前
Rosch et al, 1976Jolicoeur, Gluck & Kosslyn, 1984
上位後: animal, vertebrateエントリーレベル: bird下位語: Black-capped chickadee (アメリカコガラ)
Entry-Level Category
上位語: animal, birdエントリーレベル: penguin下位語: Chinstrap penguin (アゴヒモペンギン)
物体の画像を見せられた時、人がつけるであろう名前
Rosch et al, 1976Jolicoeur, Gluck & Kosslyn, 1984
それって難しいの?
Cormorant
Daisy
Frog Orchid
King penguin
Living thing
Seabird
Penguin
Angiosperm
Flower
Orchid
Plant, FloraBird
Daffodil
Narcissus
Bulbous Plant
wordnet hierarchy
鵜
被子植物 球根植物
スイセン
ラッパスイセン
どこまで上位語に遡れば良い?
上位語にEntry Categoryを含まない場合は?
さて、どうやろう?
Linguistic resources Lots of text
Wordnet Google Web 1T
Our dog Zoe in her bed
Interior design of modern white and brown living room furniture hanging.
The Egyptian cat statue by the floor clock and perpetual motion
Man sits in a rusted car buried in the sand on Waitarere beach
Emma in her hat looking super cute
Little girl and her dog in northern Thailand. They both seemed.
Lots of images with text
SBU Captioned Dataset
Labeled Images
Imagenet
Computer Vision
Ground Truthデータ作成
Friesian, Holstein, Holstein-Friesian
cowcattlepasturefence
WordNetの葉ノード(x500)
Amazon Mechanical Terk
x8
対応する画像x10
投票
1. Goal: Category TranslationDetailed Category
Grampus griseus
𝑑
What should I Call It?(Entry-Level Category)
dolphin
𝑒
2. Goal: Content Naming
What should I Call It?(Entry-Level Category)
dolphin
𝑒
Input Image
1. Goal: Category TranslationDetailed Category
Grampus griseus
𝑑
What should I Call It?(Entry-Level Category)
dolphin
𝑒
2. Goal: Content Naming
What should I Call It?(Entry-Level Category)
dolphin
𝑒
Input Image
1. Goal: Category TranslationDetailed Category
Grampus griseus
𝑑
What should I Call It?(Entry-Level Category)
dolphin
𝑒
1.1 Text BasedWebコーパスとWordNetの階層構造のみから推定
1.2 Image Based画像特徴からWordNetの単語へ投票して推定
Cormorant
Sperm whale
Grampus griseus
King penguin
Sem
antic
Distance
𝜓(𝑑, 𝑒) 𝜙(𝑒)
n-gramFrequency
656M
366M
128M
88M 1.2M
22M
15M
0.9M
55M
30M
6.4M
0.08M
1.1 Category Translation: Text-based
Animal
Seabird
Penguin
Cetacean
Whale
Dolphin
MammalBird
wordnet hierarchy
Cormorant
Sperm whale
Grampus griseus
King penguin
Sem
antic
Distance
𝜓(𝑑, 𝑒) 𝜙(𝑒)656M
366M
128M
88M 1.2M
22M
15M
0.9M
55M
30M
6.4M
0.08M
1.1 Category Translation: Text-based
Animal
Seabird
Penguin
Cetacean
Whale
Dolphin
MammalBird
wordnet hierarchy
Naturalness
𝜏 𝑑, 𝜆 = argmax𝑤
[𝜙 𝑒 − 𝜆𝜓(𝑑, 𝑒)]
1.2 Category Translation: Image-based
Friesian, Holstein, Holstein-Friesian
(1.9071) cow(1.1851) orange_tree(0.6136) stall(0.5630) mushroom(0.3825) pasture(0.3156) sheep(0.3321) black_bear(0.3015) puppy(0.2409) pedestrian_bridge(0.2353) nest
Vision System
WordNetの葉ノード
対応する画像
アノテーション
TF-IDFでランク付け
cactus wren bird bird bird
buzzard, Buteo buteo hawk hawk bird
whinchat, Saxicola rubetra bird chat bird
Weimaraner dog dog dog
numbat, banded anteater, anteater anteater anteater cat
rhea, Rhea americana ostrich bird grass
Europ. black grouse, heathfowl bird bird duck
yellowbelly marmot, rockchuck Squirrel marmot rock
HUMANS IMAGEBASED
TEXTBASED
Category Translation: Examples
cactus wren bird bird bird
buzzard, Buteo buteo hawk hawk bird
whinchat, Saxicola rubetra bird chat bird
Weimaraner dog dog dog
numbat, banded anteater, anteater anteater anteater cat
rhea, Rhea americana ostrich bird grass
Europ. black grouse, heathfowl bird bird duck
yellowbelly marmot, rockchuck Squirrel marmot rock
HUMANS IMAGEBASED
TEXTBASED
Category Translation: Examples
cactus wren bird bird bird
buzzard, Buteo buteo hawk hawk bird
whinchat, Saxicola rubetra bird chat bird
Weimaraner dog dog dog
numbat, banded anteater, anteater anteater anteater cat
rhea, Rhea americana ostrich bird grass
Europ. black grouse, heathfowl bird bird duck
yellowbelly marmot, rockchuck Squirrel marmot rock
HUMANS IMAGEBASED
TEXTBASED
Category Translation: Examples
「chat」は鳥の種類。コーパスに「おしゃべり」の意味で頻出
cactus wren bird bird bird
buzzard, Buteo buteo hawk hawk bird
whinchat, Saxicola rubetra bird chat bird
Weimaraner dog dog dog
numbat, banded anteater, anteater anteater anteater cat
rhea, Rhea americana ostrich bird grass
Europ. black grouse, heathfowl bird bird duck
yellowbelly marmot, rockchuck Squirrel marmot rock
HUMANS IMAGEBASED
TEXTBASED
Category Translation: Examples
アメリカダチョウ
1. Goal: Category TranslationDetailed Category
Grampus griseus
𝑑
What should I Call It?(Entry-Level Category)
dolphin
𝑒
2. Goal: Content Naming
What should I Call It?(Entry-Level Category)
dolphin
𝑒
Input Image
2. Goal: Content Naming
What should I Call It?(Entry-Level Category)
dolphin
𝑒
Input Image
2.1 Propagated Visual Estimates画像認識結果にWebコーパスの情報を加味して、“自然な”WordNet上の単語を選択
2.2 Supervised LearningImageNetの葉ノードに対する認識結果をEntry-Level Categoriesへ変換
Coding(LLC),Wang et al. CVPR 2010
Spatial pooling
Flat Classifiers
Local descriptors
Grizzly bear
Homing pigeon
Ball-peen hammer
Steel arch bridge
Farmhouse
Soapweed
Brazilian rosewood
Bristlecone pine
Cliffdiving
Crabapple
Grampus griseus
American black bear
King penguin
Cormorant
Spigot
Diskette, floppy
(0.80)
(0.41)
(0.16)
(0.25)
(0.11)
(0.56)
(0.26)
(0.06)
(0.07)
(0.06)
(0.16)
(0.03)
(0.12)
(0.13)
(0.04)
(0.19)
Selective Search Windows. van De Sande et al. ICCV 2011
Large Scale Categorization
2.1 Propagated Visual Estimates
Accuracy 𝑓(𝑣, 𝐼)
Cormorant
Sperm whale
King penguin
(0.15)
(0.05)
(0.2)
(0.6)Grampus griseus
2.1 Propagated Visual Estimates
Accuracy
𝑓(𝑣, 𝐼)
(0.2)
(0.8)
(0.2)
(0.8)
(0.8)
(1.0)
(0.15)
(0.6)
Animal
Seabird
Penguin
Cetacean
Whale
Dolphin
MammalBird
Cormorant
Sperm whale
King penguin
(0.15)
(0.05)
(0.2)
(0.6)Grampus griseus
2.1 Propagated Visual Estimates
Accuracy
𝑓(𝑣, 𝐼)
(0.2)
(0.8)
(0.2)
(0.8)
(0.8)
(1.0)
Specificity
- 𝜓(𝑣)
Deng et al. CVPR 2012
𝑓 𝑣, 𝐼, 𝜆 = 𝑓(𝑣, 𝐼) [− 𝜓 𝑣 + 𝜆]
(0.15)
(0.6)
Animal
Seabird
Penguin
Cetacean
Whale
Dolphin
MammalBird
Cormorant
Sperm whale
King penguin
(0.15)
(0.05)
(0.2)
(0.6)Grampus griseus
葉か
らの
距離
2.1 Propagated Visual Estimates
Accuracy
𝑓(𝑣, 𝐼)
(0.2)
(0.8)
(0.2)
(0.8)
(0.8)
(1.0)
Specificity
- 𝜓(𝑣)
Deng et al. CVPR 2012
𝑓 𝑣, 𝐼, 𝜆 = 𝑓(𝑣, 𝐼) [− 𝜓 𝑣 + 𝜆]
(0.15)
(0.6)
Animal
Seabird
Penguin
Cetacean
Whale
Dolphin
MammalBird
Cormorant
Sperm whale
King penguin
(0.15)
(0.05)
(0.2)
(0.6)Grampus griseus
𝜙(𝑣)
Naturalness
656M
366M
128M
88M 1.2M
22M
15M
0.9M
55M
30M 6.4M
𝑓𝑛𝑎𝑡 𝑣, 𝐼, 𝜆 = 𝑓(𝑣, 𝐼) [𝜙 𝒗 − 𝜆 𝜓(𝑣)]Our work
0.08M
SBU Captioned Dataset100万枚のキャプション付き画像データセット(from Flickr)
2.2 Supervised Learning
キャプションからEntry-Level Categories生成
POS Taggerでノイズ除去
ImageNetの葉ノードとEntry-Level Categoriesとの関係を学習
2.2 Supervised Learning
𝑋 =
Grizzly bear
Homing pigeon
Ball-peen hammer
Steel arch bridge
Farmhouse
Soapweed
Brazilian rosewood
Bristlecone pine
Cliffdiving
Crabapple
Grampus griseus
American black bear
King penguin
Cormorant
Spigot
Diskette, floppy
(0.80)
(0.41)
(0.16)
(0.25)
(0.11)
(0.56)
(0.26)
(0.06)
(0.07)
(0.06)
(0.16)
(0.03)
(0.12)
(0.13)
(0.04)
(0.19)
画像認識結果
ImageNetの葉ノード
2.2 Supervised Learning
𝑋 =
Grizzly bear
Homing pigeon
Ball-peen hammer
Steel arch bridge
Farmhouse
Soapweed
Brazilian rosewood
Bristlecone pine
Cliffdiving
Crabapple
Grampus griseus
American black bear
King penguin
Cormorant
Spigot
Diskette, floppy
(0.80)
(0.41)
(0.16)
(0.25)
(0.11)
(0.56)
(0.26)
(0.06)
(0.07)
(0.06)
(0.16)
(0.03)
(0.12)
(0.13)
(0.04)
(0.19)
Bear
Dog
Building
House
Bird
Penguin
Tree
Palm tree
SBU Captioned Photo Dataset1 million captioned images!
𝑓𝑠𝑣𝑚 𝒗𝒊, 𝐼, Θ =1
1 − exp(𝑎Θ𝑇𝑋 + 𝑏)
training from weak annotations
学習
snag
shade tree
bracket fungus, shelf fungus
bristlecone pine, Rocky Mountain bristlecone
pine, Pinus aristata
Brazilian rosewood, caviuna wood, jacaranda,
Dalbergia nigra
redheaded woodpecker, redhead, Melanerpes
erythrocephalus
redbud, Cercis canadensis
mangrove, Rhizophora mangle
chiton, coat-of-mail shell, sea cradle,
polyplacophore
crab apple, crabapple
papaya, papaia, pawpaw, papaya tree, melon
tree, Carica papaya
frogmouth
Extracting Meaning from Data
Mammals Birds Instruments Structures Plants Other
“tree”を学習させた結果
water dog
surfing, surfboarding, surfriding
manatee, Trichechus manatus
punt
dip, plunge
cliff diving
fly-fishing
sockeye, sockeye salmon, red salmon,
blueback salmon, Oncorhynchus nerka
sea otter, Enhydra lutris
American coot, marsh hen, mud hen, water
hen, Fulica americana
booby
canal boat, narrow boat, narrowboat
Extracting Meaning from Data
Mammals Birds Instruments Structures Plants Other
“water”を学習させた結果
2.3 Joint
2.1 Propagated Visual Estimatesと
2.2 Supervised Learning
の統合
𝑓𝑗𝑜𝑖𝑛𝑡 𝑣, 𝛼 = 𝛼 𝑓𝑛𝑎𝑡 𝑣 + 𝑓𝑠𝑣𝑚 𝑣
Results: Content Naming
Human Labels Flat ClassifierDeng et al.CVPR’12
PropagatedVisual Estimates
SupervisedLearning
Joint
farm, fencefieldhorse, mulekite, dirtpeopletree, zoo
geldingyearlingshireyearlingdraft
horse
equineperissodactylungulatemale
horsetree
equinemalegelding
horse
pasturefield
cowfence
horse
pasturefield
cowfence
Results: Content Naming
Human Labels Flat ClassifierDeng et al.CVPR’12
PropagatedVisual Estimates
SupervisedLearning
Joint
fence, junksignstop signstreet signtrash cantree
feederHylacleanerboxlarge
woodytree
structureplantvascular
tree
structurebuildingplantarea
logostreetneighborhoobuildingoffice building
logostreetneighborhoodbuildingoffice
Evaluation: Content Naming
0%
2%
4%
6%
8%
10%
12%
14%
16%
18%
20%
22%
24%
26%
FlatClassifier
Deng et al.CVPR'12
PropagatedVisual
Estimates
SupervisedLearning
Combined
Test Set A – Random Images
Precision Recall
0%
2%
4%
6%
8%
10%
12%
14%
16%
18%
20%
22%
24%
26%
FlatClassifier
Deng et al.CVPR'12
PropagatedVisual
Estimates
SupervisedLearning
Combined
Test Set B – High Confidence Prediction Scores
Precision Recall
Conclusions/Future Work
画像へ名前付けを行うための新しい方法を検討
画像に対し、人間のような名前付けを行うための方法を示した
名詞以外にもアクションや属性の推定などにも今後拡張していきたい
Questions?