A Colorectal Cancer Recognition System for Colonoscopy
TRANSCRIPT
A colorectal cancer recognition system for colonoscopic examination
Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, and many others
http://www.sciencekids.co.nz/pictures/humanbody/braintomography.html
http://www.sciencekids.co.nz/pictures/humanbody/heartsurfaceanatomy.html
http://sozai.rash.jp/medical/p/000154.html
http://sozai.rash.jp/medical/p/000152.html
http://medical.toykikaku.com/ħ�źŬŧƂ/ć/
http://www.sciencekids.co.nz/pictures/humanbody/humanorgans.html
Examples of diagnostic imaging:
• Brain tumor: CT image
• Heart disease: diagnostic images
• Colorectal cancer: NBI endoscopic image
Colorectal cancer in Japan
• Incidence: 235,000 cases (2009 estimate) — among the most common cancers
• Deaths: 42,434 (one year) — roughly a 1.7-fold increase over 20 years — 3rd among cancer deaths overall (1st: lung cancer, 2nd: stomach cancer) — 1st among cancer deaths in women
• 5-year survival rate: around 20% at the most advanced stage
http://www.mhlw.go.jp/toukei/saikin/
http://www.gunma-cc.jp/sarukihan/seizonritu/index.html
[Bar chart: colorectal cancer 5-year survival rate [%] (0–100) by stage, stage 1 – stage 4]
stage 1: cancer confined to the colorectal wall; stage 2: cancer beyond the wall; stage 3: lymph-node metastasis; stage 4: metastasis to other organs
At stage 1 (early cancer), nearly 100% of cases can be cured.
[Line chart: fatalities of colorectal cancer per year, '90–'09]
http://www.mhlw.go.jp/toukei/saikin/hw/jinkou/geppo/nengai11/kekka03.html#k3_2
Annual trend in death rates (per 100,000 population) for major sites of malignant neoplasms
http://ameblo.jp/gomora16610/entry-10839715830.html
Can early cancer be found by fecal occult blood testing?
Fecal occult blood testing cannot find every early cancer.
One account: "I had a fecal occult blood test every year and felt reassured. When I was finally examined after falling ill, it was already advanced cancer."
http://daichou.com/ben.htm
Colonoscopy
• A scope with a CCD at its tip is inserted through the anus to examine the inside of the colon
• Steps of a colonoscopic examination
Malignancy is diagnosed from the surface pattern of the colorectal wall, magnified up to about 100×.
I think this is a cancer…
Endoscopic examination: confirming the presence of lesions and diagnosing them
From the diagnosed lesion tissue, benignity and degree of malignancy are determined
http://www.ajinomoto-seiyaku.co.jp/newsrelease/2004/1217.html
Colonoscopy: Yotsuba Clinic blog http://yotsuba-clinic.jp/WordPress/?p=63
Oiya Clinic http://www.oiya-clinic.jp/inform3.html
Colonoscopy insertion video: https://www.youtube.com/watch?v=40L-y9rNOzw
Capture ~Setup~
NBI endoscope
Processing PC
Scope
Light source (NBI)
Video processor
Recorder / scope connection port
Conventional endoscope vs. magnifying endoscope (70×–100×)
Observation modes:
• white-light observation
• indigo carmine staining
• crystal violet staining
• NBI
Magnifying endoscope: the same observation modes apply.
NBI (Narrow Band Imaging: narrow-band light observation)
Reference: an endoscopy atlas on optical/digital image-enhanced observation — NBI, AFI, and IRI in practice — Nihon Medical Center, 2006.
[Diagram: light from a xenon lamp passes an RGB rotary filter; with the NBI filter switched ON, only narrow bands at 415 nm and 540 nm illuminate the mucosa; the CCD signal is color-transformed in the video processor and shown on the monitor. Filter OFF: normal light; filter ON: NBI.]
Based on the absorption characteristics of hemoglobin
http://www.olympus.co.jp/jp/technology/technology/luceraelite/
Colorectal cancer examination flow
• Fecal occult blood test (primary screening)
• Endoscopic examination (detailed examination)
  ü no treatment is performed; examination only
  ü identification of the affected region
  ü based on the findings, tissue is taken at suspicious sites
  ü the physician judges lesions during the examination ("I think this is a cancer…")
• Treatment plan: follow-up observation / endoscopic resection / surgery / other examinations (biopsy, etc.)
  – Biopsy tends to be avoided because it risks affecting progression or metastasis of the lesion.
http://cancernavi.nikkeibp.co.jp/daicho/worry/post_2.html
[Examination-flow slide repeated, highlighting the endoscopic-examination step]
Research objective
Development of a real-time recognition system aimed at supporting diagnosis
Ø contributes to objective judgment during examinations
Ø supports physicians
[Examination-flow slide repeated, highlighting where this work fits]
Related research: detection and recognition of lesions
Oh et al., MIA '07; Sundaram et al., MIA '08; Diaz & Rao, PRL '07; Al-Kadi, PR '10; Gunduz-Demir et al., MIA '10; Tosun et al., PR '09
Pit-pattern classification: Häfner et al., PAA '09; Häfner et al., ICPR '10; Häfner et al., PR '09; Kwitt & Uhl, ICCV '07; Tischendorf et al., Endoscopy '10
NBI magnifying-endoscopy classification: Stehle et al., MI '09; Gross et al., MI '08; et al. (in Japanese), PRMU '10; Tamaki et al., ACCV '10
pit-pattern classification [S. Tanaka et al., '06]
• Classification by the shape of the pits (gland openings) on the colorectal surface
• Staining takes time and places a large burden on the patient
Type I: round pits (normal gland openings)
Type II: stellar pits
Type IIIs: tubular/round pits smaller than normal
Type IIIL: tubular/round pits larger than normal
Type IV: dendritic or gyrus-like pits
Type V(I): pits of types I–IV with irregular size and arrangement
Type V(N): amorphous pits suggesting loss of structure and invasion
(normal — adenoma — advanced cancer)
NBI magnifying-observation classification (NBI: Narrow-band Imaging) [H. Kanao et al., '09]
• Classification by pit structure and microvessel patterns
  – no staining required; only the illumination needs switching
  – smaller burden on the patient
Type A: microvessels are not (or only faintly) visible
Type B: fine microvessels surround the gland openings, and a regular, uniform pit structure is observed
Type C1: an irregular pit structure is observable; vessel caliber and distribution are comparatively uniform
Type C2: a strongly irregular pit structure is observable; vessel caliber and distribution are non-uniform
Type C3: the pit structure is unclear and cannot be observed; vessel caliber and distribution are non-uniform; avascular areas (AVA) appear; scattered microvessels are present
(Type A: normal — Type B: adenoma/early cancer — Type C: cancer of increasing invasion)
texture analysis approach
Yoshito Takemura, Shigeto Yoshida, Shinji Tanaka, Keiichi Onji, Shiro Oka, Toru Tamaki, Kazufumi Kaneda, Masaharu Yoshihara, Kazuaki Chayama: "Quantitative analysis and development of a computer-aided system for identification of regular pit patterns of colorectal lesions," Gastrointestinal Endoscopy, Vol. 72, No. 5, pp. 1047-1051 (2010.11).
Bag-of-Visual Words Approach
[Diagram: local feature descriptors are extracted from Type A / Type B / Type C3 training images; vector quantization in feature space assigns each descriptor to a visual word; each image becomes a visual-word histogram; a classifier is learned from the histograms and applied to a test image.]
Description of local features → vector quantization → histogram → classifier → classification result
Local features + Bag-of-Features
Object → Bag of "words"
Slide by Li Fei-Fei at CVPR2007 Tutorial http://people.csail.mit.edu/torralba/shortCourseRLOC/
Analogy to documentsOf all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception,
retinal, cerebral cortex,eye, cell, optical
nerve, imageHubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce,
exports, imports, US, yuan, bank, domestic,
foreign, increase, trade, value
Histograms: Type A / Type B / Type C3
Training images: Type A, Type B, Type C3
Features:
Bag-of-Visual-Words approach: classifying image patches of lesions [Tamaki et al., 2013]
• Trained on 908 NBI images (Type A: 359, Type B: 462, Type C3: 87)
• Types C1 and C2 are excluded because they often contain ambiguous regions
• Best recognition rate: 96%
Recognition pipeline:
Extract features → cluster the features and take the representatives as Visual Words → build a Visual-Words histogram
Test image → build its Visual-Words histogram → Type? (e.g. Type B)
SVM training / recognition
The Bag-of-Visual-Words framework
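The framework above (local descriptors → clustering into Visual Words → vector quantization → histogram → classifier) can be sketched compactly in NumPy. This is a minimal illustration, not the authors' implementation: random vectors stand in for SIFT descriptors, and a plain Lloyd k-means stands in for the clustering step.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors into k visual words with plain Lloyd k-means."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

def bovw_histogram(descriptors, centers):
    """Vector-quantize an image's descriptors into a normalized visual-word histogram."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The resulting per-image histograms are what the SVM is trained on in the slides that follow.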
Local features: gridSIFT
• Scale Invariant Feature Transform (SIFT) [Lowe, '99]
  – describes the intensity-gradient distribution around a keypoint as a 128-dimensional vector
  – with DoG keypoint detection alone, the recognition rate stays around 90[%] or below
  (scale-space construction → keypoint detection (DoG) → descriptor extraction)
• SIFT descriptors via grid sampling (gridSIFT)
  – for recognition, sampling densely improves performance
  – sample on a regular grid and extract SIFT descriptors
grid sampling / grid space / scale size
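The grid-sampling step can be sketched as follows; the grid spacing and margin are the knobs varied later in the experiments. This is a generic sketch: in practice a SIFT descriptor (e.g. via OpenCV or VLFeat, which the system uses) would be computed at each grid point at one or more fixed scales.

```python
import numpy as np

def grid_keypoints(height, width, spacing, margin=0):
    """Regular grid of (row, col) sampling points, replacing DoG keypoint
    detection with dense grid sampling as in gridSIFT."""
    ys = np.arange(margin, height - margin, spacing)
    xs = np.arange(margin, width - margin, spacing)
    return np.array([(y, x) for y in ys for x in xs])
```

Halving the spacing quadruples the number of sampling points, which is exactly the feature-count growth discussed in the training-time slides below.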
Classifier: Support Vector Machine (SVM)
• Kernels
  – linear: k_linear(u, v) = uᵀv
  – Radial basis function (RBF): k_RBF(u, v) = exp(−γ‖u − v‖²)
  – χ²: k_χ²(u, v) = exp(−γ Σ_i (u_i − v_i)² / (u_i + v_i))
• Multi-class classification: One-Versus-One
A two-class max-margin classifier: maximize the margin 2/‖w‖ between the classes, i.e.
  min ½‖w‖²  subject to  y_i (wᵀx_i) ≥ 1
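The χ² kernel named above can be sketched for the visual-word histograms; this is a generic implementation of the usual exponential χ² form, with a small epsilon guarding empty bins (an implementation choice, not from the slides).

```python
import numpy as np

def chi2_kernel(U, V, gamma=1.0):
    """Exponential chi-squared kernel for histograms:
    k(u, v) = exp(-gamma * sum_i (u_i - v_i)^2 / (u_i + v_i))."""
    K = np.empty((len(U), len(V)))
    for i, u in enumerate(U):
        num = (u - V) ** 2                 # (len(V), dim)
        den = u + V
        d = np.where(den > 0, num / np.maximum(den, 1e-12), 0.0).sum(axis=1)
        K[i] = np.exp(-gamma * d)
    return K
```

Such a kernel matrix would typically be handed to an SVM as a precomputed kernel (for instance scikit-learn's `SVC(kernel="precomputed")`), with one-versus-one handling of the three Types as the slide states.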
Experimental images
• Lighting, shooting conditions, and magnification vary
• Typical Type regions are trimmed from endoscopic videos
• Image size: 100×300 to 900×800 [pix.]
• Labeled by physicians
• 908 training images in total (Type A: 359, Type B: 462, Type C3: 87)
< example images: Type A / Type B / Type C3 >
Results <10-fold Cross Validation>
[Plots: correct rate, recall rate, and precision rate (Type A / Type B / Type C3) [%] vs. number of visual words (10–100,000)]
Best correct rate: 96.00%
Results <Holdout Testing>
[Plots: correct rate, recall rate, and precision rate (Type A / Type B / Type C3) [%] vs. number of visual words (10–100,000)]
Best correct rate: 92.86%
MOTIVATION
• Improving image-recognition accuracy ← large training datasets
• Building a large database to recognize NBI images captured under many different conditions
→ extending the NBI image dataset
→ but labeling a large number of images costs: × cost × time × physician effort
(A / B / C3)
ABSTRACT
• Self-training
  n label unlabeled regions
  n build a training dataset that raises the recognition rate
× conventional method [Yoshimuta et al., '10] → ○ proposed method
Key Idea: use information from the unlabeled regions
Self-training
• use unlabeled data to improve performance
• generate training data sequentially
initial training → recognition → confidence check → Accept / Reject
labeled data ⇄ unlabeled data; accepted samples are added and the classifier is retrained
POINT
1. How to assign the labels
2. How to choose the samples to add
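The self-training loop above can be sketched generically. The classifier here is a deliberately simple nearest-centroid stand-in (the actual system uses an SVM), and the confidence score and threshold are illustrative assumptions; the Accept/Reject step is the point.

```python
import numpy as np

class NearestCentroid:
    """Stand-in classifier with confidence scores; an SVM fills this role in the system."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centers = np.array([X[y == c].mean(0) for c in self.classes])
        return self
    def predict_conf(self, X):
        d = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        e = np.exp(-(d - d.min(1, keepdims=True)))   # soft confidence from distances
        conf = e / e.sum(1, keepdims=True)
        return self.classes[conf.argmax(1)], conf.max(1)

def self_train(X_l, y_l, X_u, threshold=0.9, rounds=3):
    """Self-training: train, label the unlabeled pool, accept only confident
    predictions (Accept/Reject), grow the labeled set, and retrain."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(rounds):
        clf = NearestCentroid().fit(X_l, y_l)
        if len(X_u) == 0:
            break
        pred, conf = clf.predict_conf(X_u)
        accept = conf >= threshold
        if not accept.any():
            break
        X_l = np.vstack([X_l, X_u[accept]])
        y_l = np.concatenate([y_l, pred[accept]])
        X_u = X_u[~accept]
    return NearestCentroid().fit(X_l, y_l), len(y_l)
```

The two POINTs map directly onto this sketch: the predicted label is what gets assigned, and the confidence threshold decides which samples are added.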
labeled samples
• trimmed and labeled by physicians
• image size: 100×300 to 900×800 [pix.]
• breakdown:
TypeA 359 / TypeB 462 / TypeC3 87 / Total 908
Unlabeled samples
• 10 patches cut out from each image
• patch size: 30×30 to 250×250 [pix.]
• how patches are cut:
  – random crops from the whole image
  – crops around the labeled region
• breakdown:
TypeA 3590 / TypeB 4610 / TypeC3 870 / Total 9070
* Some source images are unavailable, so fewer than 10 unlabeled samples could be cut from part of the labeled set.
Result
[Bar chart: recognition rate 0.90–0.96 for the conventional method vs. Algorithm 1 / Algorithm 2 / Algorithm 3]
Using samples cut out around the labeled regions worked best.
* p = 0.013314
Topic 2
(Recap of the Bag-of-Visual-Words framework [Tamaki et al., 2013]: 908 NBI training images, Type A: 359, Type B: 462, Type C3: 87; Types C1/C2 excluded as ambiguous; best recognition rate 96%.)
Grid spacing | 15[pixel] | 10[pixel] | 5[pixel]
Best recognition rate | 92.11[%] | 93.89[%] | 96.00[%]
Training time | ~13 min | ~30 min | ~3 h
(spacing ×2/3: +1.78%; spacing ×1/2: +2.11%)
The growth of training time with the number of features is the problem.
Background: grid spacing vs. training time
Ø Extracting more features improves the recognition rate [Jurie et al., 2005]
Ø Narrowing the feature-extraction (grid) spacing was confirmed to improve recognition [Yoshimuta et al., 2011]
Feature count: ×2.25 (spacing ×2/3), ×4 (spacing ×1/2)
Training images: 908 NBI images (Type A: 359, Type B: 462, Type C3: 87)
Conventional method
feature space / histograms
1. Extract features (on a grid) from all training images
2. Cluster the extracted features
3. Extract features (on a grid) from each training image
4. Vector-quantize the features into its Visual-Words histogram
grid spacing
all training images I = {I_n | n = 1, …, N}
training image I_n ∈ I
Visual Words
Proposed method (reduce the features used to create the Visual Words)
feature space / histograms
1. Extract a small number of features from all training images
2. Cluster this small feature set into Visual Words
3. Extract many features (on a grid) from each training image
4. Vector-quantize the features into its Visual-Words histogram
grid spacing / all training images I = {I_n | n = 1, …, N} / training image I_n ∈ I / Visual Words
Goal: confirm reduced training time and improved recognition rate
Visual-Words creation: feature count reduced; histogram creation: feature count increased
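The one change the proposed method makes to the pipeline is step 1: clustering only a random subset of descriptors (19,742 of the millions extracted, per the experiment slide), since k-means cost grows with the number of clustered points, while all densely-extracted descriptors are still quantized into histograms. A minimal sketch of that subsampling step:

```python
import numpy as np

def subsample_for_vocab(descriptors, n_keep, seed=0):
    """Pick a small random subset of descriptors for Visual-Words creation.
    k-means cost is roughly linear in the number of clustered descriptors,
    so clustering only this subset cuts vocabulary-building time; histogram
    creation still uses every densely-extracted descriptor."""
    rng = np.random.default_rng(seed)
    n_keep = min(n_keep, len(descriptors))
    idx = rng.choice(len(descriptors), size=n_keep, replace=False)
    return descriptors[idx]
```

The clustering and quantization steps themselves are unchanged from the conventional pipeline.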
Experimental setup
Environment — OS: Linux Fedora 18, CPU: Intel Xeon CPU E5-2620, Memory: 128GB
Classifier — Ø Linear SVM
Training images — Ø 908 labeled NBI images (Type A: 359, Type B: 462, Type C3: 87)
Reduce the number of features used to create the Visual Words: 19,742 (vs. 8,678,198 extracted in total at grid spacings of 5/2/1 [pixel])
Increase the number of features used to create the histograms
Ø total training time vs. feature count — confirm the reduction in training time
Ø recognition rate vs. grid spacing — confirm the improvement in recognition rate
Training time vs. grid spacing
[Bar chart, CPU time in seconds:
conventional (spacing 5): 10233.45
proposed (spacing 5): 680.72 (6.6%)
proposed (spacing 2): 4167.89 (40.7%)
proposed (spacing 1): 16471.8 (160.9%)]
With grid spacings of 5 and 2 [pixel] the training time is reduced; at 1 [pixel] it increases.
Number of Visual Words: 32
Recognition rate vs. grid spacing
[Plot: correct rate 0.80–0.98 vs. number of Visual Words (32–16384) for the conventional method (spacing 5) and the proposed method (spacings 5, 2, 1)]
There is a clear gap between spacing 5 [pixel] and spacings 2 and 1 [pixel]; spacings 2 and 1 [pixel] show little difference from each other.
Problem
Different optics → different captured images → different feature distributions
Old and new endoscopes coexist in practice.
Old endoscopy (EVIS LUCERA) vs. New endoscopy (EVIS LUCERA ELITE):
Viewing angle — 140°(WIDE), 80°(TELE) vs. 170°(WIDE), 90°(TELE)
Resolution — 1440×1080 vs. 1980×1080
Ø The new endoscope is wider-angle, higher-resolution, and brighter
→ recognition performance drops on the new endoscope; old and new training images cannot simply be mixed
Collecting training images for the new endoscope is difficult:
Ø recognition assumes training and test data share the same distribution
Ø cancer patients are not numerous
Ø images can only be captured during examinations
Ø only physicians can do the labeling
Ø the latest devices keep appearing; we are in a transition period
http://www.olympus.co.jp/jp/technology/technology/luceraelite/
Objective
Solution: transform the new endoscope's features into the old endoscope's feature space, then train.
Framework of transfer learning — the two image sets are related.
[Histograms: New endoscopy / Old endoscopy]
transform the features
train: old, recognize: old → works; train: old, recognize: new → recognition rate drops
Related Work
Adapting Visual Category Models to New Domains [Saenko et al., ECCV2010]
— addresses joint recognition of Source and Target (Source: x, Target: y)
— learns a matrix W that transforms Target into Source
Our problem recognizes only the Target.
Ø Our method has no hyperparameters
Ø Saenko et al.'s method has hyperparameters that require adjustment
Our Approach
Find a matrix A satisfying the Source–Target conditions (for each class):
Same class: (x_i − y_j)ᵀ A (x_i − y_j) ≤ upper bound
Different class: (x_i − y_j)ᵀ A (x_i − y_j) ≥ lower bound
(A^{1/2} maps Target features toward Source features)
→ W = arg min_W ‖x − W y‖²_F
Convert Histogram
Source X = (x_1, …, x_N), Target Y = (y_1, …, y_N)
1. Treat the Visual-Words histograms as vectors and stack them into matrices X and Y.
2. Find the transform matrix W minimizing the histogram error with ADMM*:
   arg min_W ‖X − W Y‖²_F  subject to  W_ij ≥ 0
*ADMM solution (the rows of W decouple, so this can be solved row by row) — iterate:
   W^{k+1} = (X Yᵀ + Z^k − U^k)(Σ_{n=1}^N y_n y_nᵀ + E)^{−1}
   Z^{k+1} = π_C(W^{k+1} + U^k)   (π_C: projection onto W_ij ≥ 0)
   U^{k+1} = U^k + W^{k+1} − Z^{k+1}
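The ADMM iteration above is short enough to sketch directly in NumPy. Matrix sizes and data here are illustrative; the penalty parameter is fixed at 1, matching the (Σ y yᵀ + E) form on the slide.

```python
import numpy as np

def admm_nnls(X, Y, iters=200):
    """ADMM for min_W (1/2)||X - W Y||_F^2 subject to W_ij >= 0.
    X: (d, N) source histograms, Y: (d, N) target histograms; returns W: (d, d)."""
    d = X.shape[0]
    Z = np.zeros((d, d))
    U = np.zeros((d, d))
    Ginv = np.linalg.inv(Y @ Y.T + np.eye(d))   # (sum_n y_n y_n^T + E)^(-1), fixed
    XYt = X @ Y.T
    for _ in range(iters):
        W = (XYt + Z - U) @ Ginv                # unconstrained least-squares step
        Z = np.maximum(W + U, 0.0)              # projection pi_C onto W_ij >= 0
        U = U + W - Z
    return Z
```

Applying the returned W to new-endoscope histograms maps them into the old-endoscope feature space before training, as the Objective slide describes.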
How to Make Pseudo Dataset
l Since new-endoscope images are expected to look crisper and more vivid, apply
  ① contrast enhancement ② a sharpening filter
[Tone curve: input range 42–213 mapped to output 0–255] [3×3 sharpening kernel]
contrast enhancement → sharpening filter
Source → Target
l This approach cannot be used without correspondences between training images
Ø in reality, obtaining corresponding image pairs is difficult
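The two operations can be sketched as below. The 42/213 tone-curve endpoints mirror the slide; the 3×3 sharpening kernel is a generic choice, since the slide's exact kernel is not recoverable.

```python
import numpy as np

def stretch_contrast(img, lo=42, hi=213):
    """Tone-curve contrast enhancement: map [lo, hi] linearly to [0, 255]."""
    out = (img.astype(float) - lo) * 255.0 / (hi - lo)
    return np.clip(out, 0.0, 255.0)

def sharpen(img):
    """3x3 sharpening filter (generic kernel, weights summing to 1,
    so flat regions pass through unchanged)."""
    k = np.array([[0.0, -1.0, 0.0], [-1.0, 5.0, -1.0], [0.0, -1.0, 0.0]])
    pad = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * pad[dy:dy + h, dx:dx + w]
    return np.clip(out, 0.0, 255.0)

def make_pseudo_target(img):
    """Old-endoscope image -> pseudo new-endoscope image."""
    return sharpen(stretch_contrast(img))
```

Each old-endoscope image processed this way yields a corresponding pseudo new-endoscope image, giving the paired data the transform estimation needs.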
Result
Transferring achieves a recognition rate comparable to the old endoscope (almost the same).
Training / Test:
n ① Source / Source
n ② Source / Target
n ③ Source+Target / Target
n ④ Source + W-transformed Target / Target
Related Works
Cross-Domain Transform [Saenko et al., ECCV2010]
  min tr(W) − log det W
  s.t. W ⪰ 0
       ‖x_iˢ − x_jᵗ‖_W ≤ upper bound, (x_iˢ, x_jᵗ) ∈ the same class
       ‖x_iˢ − x_jᵗ‖_W ≥ lower bound, (x_iˢ, x_jᵗ) ∈ different classes
Ø Estimates a transformation matrix that minimizes a Mahalanobis distance.
Ø Considers only the transformed feature distributions.
Ø Does not ensure the classification result.
Max-Margin Domain Transfer (MMDT) [Hoffman et al., ICLR2013]
  min_{W,θ,b} ½‖W‖²_F + ½ Σ_{k=1}^K ‖θ_k‖²_2 + C_s Σ_{i=1}^n Σ_{k=1}^K ξˢ_{i,k} + C_t Σ_{j=1}^m Σ_{k=1}^K ξᵗ_{j,k}
  s.t. yˢ_{i,k} θ_kᵀ x_iˢ − b_k ≥ 1 − ξˢ_{i,k}
       yᵗ_{j,k} θ_kᵀ W x_jᵗ ≥ 1 − ξᵗ_{j,k}
       ξˢ_{i,k} ≥ 0, ξᵗ_{j,k} ≥ 0
Ø Optimizes the transformation matrix and the SVM parameters at the same time.
Ø Ensures the classification result.
Ø Does not guarantee the transformed feature distributions.
W: transform matrix; θ_k: SVM parameter; ξˢ, ξᵗ: slack variables; y_{i,k}: indicator function
Proposed Method
Max-Margin Domain Transfer with L2 Distance Constraints (MMDTL2)
  min_{W,θ,b} ½‖W‖²_F + ½ Σ_{k=1}^K ‖θ_k‖²_2 + C_s Σ_{i=1}^n Σ_{k=1}^K ξˢ_{i,k} + C_t Σ_{j=1}^m Σ_{k=1}^K ξᵗ_{j,k} + (D/2) Σ_{i=1}^M Σ_{j=1}^N y_{i,j} ‖W x_iᵗ − x_jˢ‖²_2
  s.t. yˢ_{i,k} θ_kᵀ x_iˢ − b_k ≥ 1 − ξˢ_{i,k}
       yᵗ_{j,k} θ_kᵀ W x_jᵗ ≥ 1 − ξᵗ_{j,k}
       ξˢ_{i,k} ≥ 0, ξᵗ_{j,k} ≥ 0
The added term constrains the transformed target to lie close to the source.
Ø Adds L2 distance constraints to MMDT.
Ø Our method ensures both the classification result and the transformed feature distributions.
Decompose to Sub-problems
Hoffman et al. decompose the MMDT objective into 2 sub-problems; our method decomposes likewise. The objective is optimized by iterating (1) and (2):
(1) min_{θ,ξˢ,ξᵗ} ½ Σ_{k=1}^K ‖θ_k‖²_2 + C_s Σ_{i=1}^N Σ_{k=1}^K ξˢ_{i,k} + C_t Σ_{j=1}^M Σ_{k=1}^K ξᵗ_{j,k}
    — objective for optimizing the SVM parameters
(2) min_{W,ξᵗ} ½‖W‖²_F + C_t Σ_{j=1}^M Σ_{k=1}^K ξᵗ_{j,k} + (D/2) Σ_{i=1}^M Σ_{j=1}^N y_{i,j} ‖W x_iᵗ − x_jˢ‖²_2
    — objective for optimizing the transform matrix (the added term keeps the transformed target close to the source)
  s.t. yˢ_{i,k} θ_kᵀ x_iˢ − b_k ≥ 1 − ξˢ_{i,k}; yᵗ_{j,k} θ_kᵀ W x_jᵗ ≥ 1 − ξᵗ_{j,k}; ξˢ_{i,k} ≥ 0, ξᵗ_{j,k} ≥ 0
Primal Problem
With w = vec(W), φ(x) = vec(θ xᵀ), v_{i,j} = vec(x_jˢ (x_iᵗ)ᵀ), and U(x) = blockdiag(x xᵀ, …, x xᵀ):
(2) min_{w,ξᵗ} ½‖w‖²_2 + C_t Σ_{j=1}^M Σ_{k=1}^K ξᵗ_{j,k} + (D/2) Σ_{i=1}^M Σ_{j=1}^N (wᵀ U(x_iᵗ) w − 2 v_{i,j}ᵀ w + (x_iᵗ)ᵀ x_jˢ)
  s.t. ξᵗ_i ≥ 0, yᵗ_{i,k} φ_k(x_iᵗ)ᵀ w ≥ 1 − ξᵗ_{i,k}
Derived from the objective for optimizing the transform matrix.
This is a standard quadratic program, but:
p high computational cost
p needs huge memory
p depends on the dimensionality of the data
→ derive the dual problem.
Dual Problem
(2) max_a − ½ Σ_{k₁=1}^K Σ_{k₂=1}^K Σ_{i=1}^M Σ_{j=1}^M a_i a_j yᵗ_{i,k₁} yᵗ_{j,k₂} φ_{k₁}(x_iᵗ)ᵀ V⁻¹ φ_{k₂}(x_jᵗ)
           + Σ_{k=1}^K Σ_{i=1}^M a_i (1 − D φ_k(x_iᵗ)ᵀ V⁻¹ Σ_{m=1}^M Σ_{n=1}^N y_{m,n} v_{m,n})
  s.t. 0 ≤ a_i ≤ C_T, Σ_{i=1}^M a_i yᵗ_{i,k} = 0
where V = (I + D Σ_{i=1}^M Σ_{j=1}^N y_{i,j} U(x_iᵗ)); a_i: Lagrange multipliers
The dual problem has many advantages:
p low computational cost
p defined by a sparse problem
p depends on the number of target data (not the data dimensionality)
Comparison of Primal and Dual Computation Time
SetupTime: computing the coefficients (e.g. U(x) and v_{i,j}). OptimizationTime: solving the quadratic program (for w or a). CalculationTime: recovering w from a (dual only).
[Bar chart: primal vs. dual, total computation time 0–7000 s, broken into SetupTime / OptimizationTime / CalculationTime]
Visual Words: 128 — the dual is about 14 times faster.
Result
MMDTL2 achieves good performance, equivalent with the baseline — but "not transfer" gives the best performance.
[Plot: recognition rate 0.4–1.0 vs. number of Visual Words (8–1024) for Baseline, Source only, Not transfer, MMDT, MMDTL2]
Processing speed: 14.7 [fps]
A / B / C3
Recognition system
Image acquisition
• crop the central 120×120 [pix.] region of the image
Classification
• build the Visual-Word histogram
• classifier: SVM
Output
• classification result (A or B or C3)
• probabilities of A, B, C3
Display
Real-time recognition system
Feature computation
• grid sampling
• SIFT
Classification result: probability of A / probability of B / probability of C3
Thresholding by confidence
[Plot: probabilities (0–1) of A, B, C3 over time]
Set a threshold on the probabilities; if no class exceeds the threshold, the frame is rejected.
Objective
Connect the processing PC to the NBI endoscope and enable online recognition.
System configuration
Development environment: Visual Studio 2012, OpenCV 3.0-devel, VLFeat 0.9.18, Boost 1.55.0, DeckLink SDK 10.0
OS: Windows 7 Home Premium SP1 64bit; CPU: Intel Core i7-4470 3.40GHz; Memory: 16GB
OLYMPUS EVIS LUCERA ELITE (NBI endoscope*1) —SDI→ Blackmagic DeckLink SDI (capture board*2) —PCI Express→ processing PC
*1 http://www.olympus.co.jp/jp/news/2012b/nr121002luceraj.jsp
*2 http://www.genkosha.com/vs/news/entry/sdi.html
Capture ~Setup~
NBI endoscope / processing PC / NBI scope
Capture ~demo & performance~
NBI endoscope screen / processing PC screen
color-conversion processing, feature extraction
Problem [real-time recognition system]
[Plot: probability (0–1) vs. frame number (0–200), Type A / Type B]
• output of the real-time recognition system
• the label classified at each frame (Type A / Type B / Type C3)
→ stable recognition results cannot be obtained
MRF/HMM model
f(x | y) ∝ exp(Σ_i A(x_i, y_i)) · exp(Σ_i Σ_{j∈N_i} I(x_i, x_j))
(data term)                       (smoothing term)
x: the label sequence to estimate at each frame; y: the sequence of per-frame image features output by the SVM
x_1 … x_50 … x_100 … x_150 … x_200 (e.g. B, C3, B, C3, B); i = 0 … 200; y_1 … y_200
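MAP inference in this chain model reduces to dynamic programming (Viterbi), which matches the "DP" results on the following slides. A minimal sketch, with an assumed uniform stay/switch transition standing in for the smoothing term I (the slide does not specify its exact potentials):

```python
import numpy as np

def viterbi_smooth(log_lik, stay=0.99):
    """MAP label sequence for the chain model
    f(x | y) ∝ Π_i exp A(x_i, y_i) · Π_i exp I(x_i, x_{i+1}).
    log_lik[t, k]: per-frame log-probability (data term A);
    stay: probability of keeping the same label between frames (smoothing term I)."""
    T, K = log_lik.shape
    trans = np.full((K, K), np.log((1.0 - stay) / (K - 1)))
    np.fill_diagonal(trans, np.log(stay))
    score = log_lik[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans          # cand[from_label, to_label]
        back[t] = cand.argmax(0)
        score = cand.max(0) + log_lik[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Raising `stay` toward 1 suppresses isolated label flips more aggressively, which is the effect the DP_0.8 … DP_0.999 plots sweep over.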
Applied results
[Plots for a Type B sequence, frame number 0–200: original vs. DP smoothing (DP_0.8, DP_0.9, DP_0.99, DP_0.999) and Gibbs sampling (Gibbs_p4=0.6, 0.7, 0.8, 0.9)]
[Plots: label sequences (A/B/C) over frames 0–200 for Type B and Type A_1 videos: original, DP_0.99, Gibbs_p4=0.9]
Applied results for a Type A video
MAP estimation applied over time, with the estimated labels rendered (Type A / Type B / Type C3)
Labeling results: weighting that favors C3 vs. uniform weighting
The central region of each video frame is cropped and recognition is performed; classification result: probability of A / B / C3
The MRF result is displayed as a colored border on the frame
Colorectal Tumor Classification System in Magnifying Endoscopic NBI Images [Tamaki et al., MedIA2013]
Recognizing colorectal images
p Feature: Bag-of-Visual-Words of densely sampled SIFT
p Classifier: Linear SVM
Extended to video frames
Display posterior probabilities at each frame.
[Plot: probability (0–1) of A / B / C vs. frame number, 0–200]
Highly unstable classification results
Possible Cause of Instability
p Classification results would be affected by out-of-focus frames.
[Plot: recognition rate vs. number of visual words (10–10000) for test images blurred with Gaussians of increasing SD: no defocus, SD = 0.5, 1, 2, 3, 5, 7, 9, 11]
p Test images: 1191 — Gaussian blur with different SDs is added
p Train images: 480 — 160 images per class
Recognition results for out-of-focus images
Particle Filter (Online Bayesian Filtering)
State vector: x_t = (x_t^(A), x_t^(B), x_t^(C3)), with x_t^(A) + x_t^(B) + x_t^(C3) = 1
Observation vector: y_t = (y_t^(A), y_t^(B), y_t^(C3)), with y_t^(A) + y_t^(B) + y_t^(C3) = 1
(t: time)
Prediction (state transition):
p(x_t | y_{1:t−1}) = ∫ p(x_t | x_{t−1}, θ₁) p(x_{t−1} | y_{1:t−1}) dx_{t−1}
Update (likelihood):
p(x_t | y_{1:t}) ∝ p(y_t | x_t, θ₂) p(x_t | y_{1:t−1})
We use the Dirichlet distribution for the state transition and the likelihood.
Dirichlet distribution
Dir_x[α] = (Γ(Σ_{i=1}^N α_i) / Π_{i=1}^N Γ(α_i)) Π_{i=1}^N x_i^{α_i − 1}
[Simplex density plots for α = (0.50, 0.50, 0.50), (0.85, 1.50, 2.00), (1.00, 1.00, 1.00), (1.00, 1.76, 2.35), (4.00, 4.00, 4.00), (3.40, 6.00, 8.00); density shown low → high]
Parameter of the distribution: α(x) = a x + b
Problem & Our Approach
Dirichlet Particle Filter (DPF) → Defocus-aware Dirichlet Particle Filter (D-DPF)
[Graphical models: the DPF chains states x_t with observations y_t; the D-DPF adds defocus variables γ_t with observations z_t]
Prediction (state transition):
p(x_t | y_{1:t−1}, γ_{1:t−1}, z_{1:t−1}) = ∫ p(x_t | x_{t−1}, θ₁) p(x_{t−1} | y_{1:t−1}, γ_{1:t−1}, z_{1:t−1}) dx_{t−1}
Update (likelihood):
p(x_t | y_{1:t}, γ_{1:t}, z_{1:t}) ∝ p(y_t, γ_t, z_t | x_t) p(x_t | y_{1:t−1}, γ_{1:t−1}, z_{1:t−1})
with p(y_t, γ_t, z_t | x_t) = p(y_t, x_t, γ_t) p(z_t | γ_t)
Isolated Pixel Ratio (IPR) [Oh et al., MedIA2007]
Endoscopic image → edge pixels by Canny edge detector
Clear edge vs. defocused edge; edge pixel / non-edge pixel / edge-and-isolated pixel
IPR: the percentage of isolated pixels among all edge pixels
[Histogram: frequency of IPR values, 0–0.015]
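Given a binary edge map, the IPR itself is a simple neighborhood count. A sketch operating on an already-computed edge map (in practice it would come from a Canny detector, e.g. OpenCV's `cv2.Canny`):

```python
import numpy as np

def isolated_pixel_ratio(edges):
    """IPR: fraction of edge pixels whose 8-neighborhood contains no other
    edge pixel. `edges` is a binary edge map (e.g. a Canny output)."""
    e = edges.astype(bool)
    pad = np.pad(e, 1)
    neigh = np.zeros(e.shape, dtype=int)
    for dy in (0, 1, 2):                # sum the 8 shifted neighbor maps
        for dx in (0, 1, 2):
            if dy == 1 and dx == 1:
                continue
            neigh += pad[dy:dy + e.shape[0], dx:dx + e.shape[1]]
    n_edge = e.sum()
    if n_edge == 0:
        return 0.0
    return float((e & (neigh == 0)).sum()) / float(n_edge)
```

Defocused frames produce fragmented, isolated edge responses, so a higher IPR signals a blurrier frame — the quantity the D-DPF observes as z_t.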
Modeling with the Rayleigh distribution and IPR
Ray_x[σ] = (x/σ²) exp(−x²/(2σ²))
[Plot: Rayleigh densities for σ = 0.5, 1, 2, 3, 4]
[Plot: σ(z_t) against z_t (0–0.015), from defocused to clear]
σ(z_t) = 4 exp(100 log(0.25) z_t)
p(z_t | γ_t) = Ray_{γ_t}[σ(z_t)]
Sequential filtering
Prediction:
p(x_t | y_{1:t−1}, γ_{1:t−1}, z_{1:t−1}) = ∫ p(x_t | x_{t−1}, θ₁) p(x_{t−1} | y_{1:t−1}, γ_{1:t−1}, z_{1:t−1}) dx_{t−1}
Update:
p(x_t | y_{1:t}, γ_{1:t}, z_{1:t}) ∝ p(y_t, γ_t, z_t | x_t) p(x_t | y_{1:t−1}, γ_{1:t−1}, z_{1:t−1})
with p(y_t, γ_t, z_t | x_t) = p(y_t, x_t, γ_t) p(z_t | γ_t),
p(y_t, x_t, γ_t) = Dir_{x_t}[α₂(y_t, γ_t)], p(z_t | γ_t) = Ray_{γ_t}[σ(z_t)],
p(x_t | x_{t−1}, θ₁) = Dir_{x_t}[α₁(x_{t−1}, θ₁)]
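One predict/update/resample step of the basic Dirichlet particle filter (without the defocus branch) can be sketched as follows. The linear parameterization α(x) = a·x + b follows the earlier slide, but the values a = 20, b = 0.5 are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from math import lgamma

def dirichlet_logpdf(x, alpha):
    """Log density of Dir_x[alpha]."""
    return (lgamma(float(alpha.sum())) - sum(lgamma(float(a)) for a in alpha)
            + float(((alpha - 1.0) * np.log(x)).sum()))

def dpf_step(particles, y, rng, a=20.0, b=0.5):
    """Predict from the Dirichlet state transition, weight by the Dirichlet
    likelihood of the observed class probabilities y_t, then resample.
    particles: (M, K) probability vectors; y: observed K-vector summing to 1."""
    M, K = particles.shape
    pred = np.vstack([rng.dirichlet(a * p + b) for p in particles])   # p(x_t | x_{t-1})
    logw = np.array([dirichlet_logpdf(y, a * p + b) for p in pred])   # likelihood of y_t
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(M, size=M, p=w)                                  # resample
    return pred[idx]
```

The D-DPF extends this step by additionally weighting each particle's defocus variable γ_t with the Rayleigh likelihood of the observed IPR z_t.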
The performance for defocused frames
[Plots over frame number 0–600: ground truth, observation, IPR (0–0.010), result by DPF, result by D-DPF]
Smoothing result for an actual NBI video
No-smoothing result vs. smoothing result (■ Type A ■ Type B ■ Type C3)
Summary
• Recognition of magnifying colorectal NBI endoscopy images
• Baseline: SIFT + Bag-of-Visual Words
• Development of a recognition system and training datasets
• Topics covered:
  – self-training
  – sampling
  – domain adaptation / transfer learning
• Extension to videos / time series:
  – temporal smoothing with MRF/HMM models
  – defocus-aware Dirichlet particle filtering