deep learning vs multidimensional classification in human-guided text mining

Post on 04-Aug-2015

336 Views

Category:

Technology

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Deep Learning

2015/07/04

Marat Zhanikeev maratishe@gmail.com GI研@天神イムズ

PDF: http://bit.do/150704

in Human-Guided

Text Mining vs

Multidimensional Classification

.

Deep Learning vs MD Classifiers

• Deep Learning 08 10

◦ Feature-based: image → features → NN◦ Raw/Pixels : image → raw pixels → NN

• Multi-Dimentional Classification 04 05

◦ assigning classes to items in multiple dimensions• Human-Guided Text Mining 02

◦ Folksonomy + BigData◦ learning from empty state with gradually diminishing human feedback

08 A.Nguyen+2 "Deep Neural Networks are Easily Fooled..." IEEE CVPR (2015)

10 G.Goos+2 "Neural Networks: Tricks of the Trade" Springer LNCS vol.7700, 2nd edition (2012)

04 X.Zhu+1 "Introduction to Semi-Supervised Learning" Morgan and Claypool Publishers (2009)

05 D.Koller+1 "Probabilistic Graphical Models: Principles and Techniques" MIT Press (2009)

02 myself+0 "Multidimensional Classification Automation with Human Interface based on Metromaps" 4th AAI (2015)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 2/26...

2/26

.

Deep Learning

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 3/26...

3/26

.

Deep Learning (1) Feature-Based• many feature extraction libraries, normally specific to environments/targets

• problem 1: wide range of errors, can be from 50% up to 96%

• problem 2: who decides on the features?

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 4/26...

4/26

.

Deep Learning (2) Raw Pixels• just feed the raw pixels to the Neural Network and let it sort it out for itself

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 5/26...

5/26

.

Deep Learning (3) Google Faces• a feature-based method, extremely specific, recently acquired by Google

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 6/26...

6/26

.

Deep Learning (4) Google Cats

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 7/26...

7/26

.

Deep Learning (5) Raw/Pixel Method• a standard process for a pixel-based learning 12

• CSV files are traditional, one image becomes one line

0 1 1 … 0

1 …

0 …

… …

1 …

Handwriting

Black -n-white

Pixel map

Matrix in a CSV file 3

Deep Learning

3

Training Testing

12 "MNIST Dataset of Handwritten Digits" http://yann.lecun.com/exdb/mnist/ (2015)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 8/26...

8/26

.

Multi-Dimensional Classification

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 9/26...

9/26

.

MDC : Binary Relevance (BR) Classes

• single dimension• not practical today, when most things exists in multi-dimensional space 06

Training Tuples x1 x2 Y1 Y2 Y3

1 0.7 0.4 1 1 0 2 0.6 0.2 1 1 0 3 0.1 0.9 0 0 1 4 0.3 0.1 0 0 0

h1: X → Y1 h2: X → Y2 h3: X → Y3

06 J.Ortigosa-Hernandez+3 "A Semi-supervised Approach to Multi-dimensional Classification..." 6th TAMIDA (2010)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 10/26...

10/26

.

MDC : PairWise (PW) Sets

• define classes as pairs of base BR classes 06

• lower complexity, higher error rate

Training Tuples x1 x2 Y1 Y2 Y3

1 0.7 0.4 1 1 0 2 0.6 0.2 1 1 0

0.1 0.9 0 0 1 0.3 0.1 0 0 0

h1: X → Z1 h2: X → Z2

Z1 Z2 1 0 0 1 0 0 0 0

06 J.Ortigosa-Hernandez+3 "A Semi-supervised Approach to Multi-dimensional Classification..." 6th TAMIDA (2010)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 11/26...

11/26

.

MDC : Label Combination (LC) Method

• a class for all combinations of base BR 06

• very high complexity, still high error rate

Training Tuples x1 x2 Y1 Y2 Y3

1 0.7 0.4 1 1 0 2 0.6 0.2 1 1 0 3 0.1 0.9 0 0 1 4 0.3 0.1 0 0 0

h: X → Z

Z 1 0 0 0

06 J.Ortigosa-Hernandez+3 "A Semi-supervised Approach to Multi-dimensional Classification..." 6th TAMIDA (2010)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 12/26...

12/26

.

MDC : The CC Method• CC: Classifier Chains method 07 -- literally, a chain of BR classes

• controlled complexity, much better error rate, but the main problem is which order?

Training Tuples x1 x2 Y1 Y2 Y3

1 0.7 0.4 1 1 0 2 0.6 0.2 1 1 0 3 0.1 0.9 0 0 1 4 0.3 0.1 0 0 0

h1: X → Y1 h2: Y1 → Y2 h3: Y2 → Y3

h2 h1 h3

07 J.Read+3 "Classifier chains for multi-label classification" Machine Learning, Springer (2011)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 13/26...

13/26

.

The MetroMap Classifier (MMC)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 14/26...

14/26

.

The Metromap Concept• like a map of a train network 01

• main advantage: e2e paths in (ontology) graphs

01 myself+0 "On Context Management Using Metro Maps" 7th SOCA (2014)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 15/26...

15/26

.

MMC : A Practical Setting

Human judgment

Auto judgement

Folksonomy

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 16/26...

16/26

.

MMC : Processing Logic• processing based on human-defined metromap, the function is similar tochaining BR classes, but with higher performance

Metromap Classifier

Human

Check Metromap

Fuzzy?

Cold? Hot?

Robot (Automatic Classification)

Bad

Input

No Yes

No

No

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 17/26...

17/26

.

MDC (MMC) vs DL(pixels)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 18/26...

18/26

.

DL : graphics vs Text

• graphics◦ pixels are already numeric◦ images can be resized to provide same-size input -- DL needs fixed-size input

• text requires complex processing1. tokenize text (words)2. frequency distribution -- variable size

3. sample distribution -- finally, the same/fixed size!

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 19/26...

19/26

.

Experimental Setup (1) Humans• 2 main cases: hot + cold = picked but not used, hot - cold = picked and used(blackswans) 03

03 myself+0 "Black Swan Disaster Scenarios" IEICE PRMU研 (2014)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 20/26...

20/26

.

Experimental Setup (2) Process• the text is not numeric by nature, has to be converted into sampledfrequency distribution

• calculations in R, used h2o package 11 for deep learning

0 1 1 … 0

1 …

0 …

… …

1 …

Text

Matrix in a CSV file

Deep Learning

Tokenize Frequency Distribution Sample

Bayes Many (Chains, Metromap , etc.)

Path 1

Path 2

11 "H2O: R Package for Learning Algorithms" http://cran.r-project.org/web/packages/h2o (2015)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 21/26...

21/26

.

Results (1) MMC vs BR

0 20 40 60 80 100 120Time sequence

0102030405060708090

Goo

d c

ount

Dumb ClassifierMetromap ClassifierHits on a timeline

title

0

10

20

30

40

50

60

70

80

Goo

d c

ount

title:keywords

0102030405060708090

Goo

d c

ount

title:keywords:abstract

0 20 40 60 80 100 120Time sequence

0 20 40 60 80 100 120Time sequence

02 myself+0 "Multidimensional Classification Automation with Human Interface based on Metromaps" 4th AAI (2015)

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 22/26...

22/26

.

Results (2) DL Results

0 20 40 60 80 100Time sequence

0

20

40

60

80

100

Dee

p le

arni

ng h

its

Diagonal/humanDeep learningkeys(title)rule(cold#yes hot#yes)

0 20 40 60 80 100Time sequence

0

20

40

60

80

100

Dee

p le

arni

ng h

its

keys(title:keywords:abstract)rule(cold#yes hot#yes)

0 20 40 60 80 100Time sequence

0

20

40

60

80

100

Dee

p le

arni

ng h

its

keys(title:keywords:abstract)rule(cold#no hot#yes)

0 20 40 60 80 100Time sequence

0

20

40

60

80

100

Dee

p le

arni

ng h

its

keys(title:keywords:abstract)rule(cold#yes hot#yes)

• compared to x = ycase

• DL performs verybadly

• best performswhen abstract isused, even then about25% hits

• same performance forhot + cold and hot- cold cases

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 23/26...

23/26

.

That’s all, thank you ...

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 24/26...

24/26

.

MDC and Social Robotics Go Together

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 25/26...

25/26

.

Social Robotics in Text Mining Context

Rebot

(careless) Input

Human Human

{structure}

(pinpoint) Select

Browse (or use otherwise)

Some Knowledge

(folksonomies, knowledge bases, databases, indexes, ontologies, etc.)

(metromaps )

M.Zhanikeev -- maratishe@gmail.com -- Deep Learning vs Multidimensional Classification in Human-Guided Text Mining -- http://bit.do/150704 26/26...

26/26

top related