deepdocclassifier: document classification with deep convolutional...

Page 1: DeepDocClassifier: Document Classification with Deep Convolutional …stat.snu.ac.kr/idea/seminar/20180426/deepdoc.pdf · 2018-04-27 · DeepDocClassifier: Document Classification

DeepDocClassifier: Document Classification with Deep Convolutional Neural Network

Presenter: 이상엽

April 26, 2018

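Goals and characteristics

- Structure-based document classification

- Classify a wider range of documents more accurately than existing methods

- Heavily influenced by AlexNet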

Data used: Tobacco dataset

- 10 classes (Ad, Email, Form, Letter, Memo, News, Note, Report, Resume, Scientific), 3,482 document images

- Documents used in litigation involving tobacco companies

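Preprocessing

- All images are resized to a common resolution of 227×227
- The ImageNet mean is subtracted from every image
- Initial weights are taken from a model pretrained on ImageNet (except for the last layer)

- AlexNet is used almost unchanged
- The PCA-based alteration of RGB values for data augmentation is not used
- Biases are initialized to 0.1 instead of 1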
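The two image-level steps above (resizing to 227×227 and subtracting the ImageNet mean) can be sketched as follows. This is a minimal illustration, not the presenter's pipeline: the slides do not say which interpolation or which mean values were used, so the nearest-neighbour resize and the per-channel means in `ASSUMED_IMAGENET_MEAN_RGB` (approximate, commonly quoted values) are assumptions, and the function names are hypothetical.

```python
# Assumption: approximate per-channel ImageNet means (R, G, B); the slides give no values.
ASSUMED_IMAGENET_MEAN_RGB = (123.68, 116.78, 103.94)


def resize_nearest(img, out_h=227, out_w=227):
    """Nearest-neighbour resize of an image stored as rows of (R, G, B) tuples."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def subtract_mean(img, mean=ASSUMED_IMAGENET_MEAN_RGB):
    """Subtract a per-channel mean from every pixel."""
    return [
        [tuple(value - m for value, m in zip(pixel, mean)) for pixel in row]
        for row in img
    ]


def preprocess(img):
    """Resize to 227x227, then centre the pixel values."""
    return subtract_mean(resize_nearest(img))


# Usage: a tiny 4x6 uniform grey "document" image.
example = [[(200, 200, 200)] * 6 for _ in range(4)]
processed = preprocess(example)
```

A real pipeline would more likely use an image library with proper interpolation; the pure-Python version only keeps the sketch dependency-free.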

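Architecture

Stage            Filter size (stride)  Number of filters  Max pool (stride)
conv 1 & pool 1  11×11×3 (4)           96                 3×3 (2)
conv 2 & pool 2  5×5×48                256                3×3 (2)
conv 3           3×3×256               384
conv 4           3×3×192               384
conv 5 & pool 3  3×3×192               256                3×3 (2)

- Normalization is applied before each pooling step; dropout (0.5) is used in fc 6 and fc 7
- ReLU is the activation function in every conv and fc layer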
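As a worked check of the architecture above, the sketch below traces the spatial size of the feature map through the five stages and counts the convolution weights per stage, using the standard formula floor((n + 2p - k) / s) + 1. The filter sizes, filter counts, and strides are the ones listed for each stage; the padding values are not given in the slides and are assumed here to be the usual AlexNet ones (0, 2, 1, 1, 1). The names `LAYERS`, `out_size`, and `trace` are hypothetical.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Output width/height of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1


# (name, kernel, filter depth, number of filters, conv stride, conv padding, followed by pool?)
# Padding values are an assumption (standard AlexNet); they are not stated in the slides.
LAYERS = [
    ("conv1", 11, 3, 96, 4, 0, True),
    ("conv2", 5, 48, 256, 1, 2, True),
    ("conv3", 3, 256, 384, 1, 1, False),
    ("conv4", 3, 192, 384, 1, 1, False),
    ("conv5", 3, 192, 256, 1, 1, True),
]


def trace(input_size=227):
    """Return (stage, spatial size after the stage, number of conv weights) per stage."""
    size = input_size
    rows = []
    for name, kernel, depth, n_filters, stride, pad, has_pool in LAYERS:
        size = out_size(size, kernel, stride, pad)
        if has_pool:
            size = out_size(size, 3, 2)  # 3x3 max pool with stride 2
        weights = kernel * kernel * depth * n_filters  # biases not counted
        rows.append((name, size, weights))
    return rows


for name, size, weights in trace():
    print(f"{name}: {size}x{size} feature map, {weights} conv weights")
```

Under these assumptions a 227×227 input ends as a 6×6×256 feature map after pool 3, which is what feeds fc 6. The filter depths of 48 and 192 reflect AlexNet's two-group convolutions, where each filter sees only half of the previous layer's channels.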

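Parameters and hyperparameters

- SGD with a batch size of 10
- Learning rate fixed at 0.0001, momentum at 0.9, and weight decay at 0.0005

- Initial weights are the values pretrained on ImageNet (except for the last layer)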
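To make the role of the three fixed hyperparameters concrete, here is a minimal sketch of one SGD step with momentum and L2 weight decay on a flat list of weights. It uses one common formulation of the update (weight decay added to the gradient before the learning rate is applied); the slides do not spell out the exact rule, so treat this as an assumption. The function name `sgd_momentum_step` is hypothetical, and `grads` stands for the gradient averaged over a mini-batch of 10 images.

```python
def sgd_momentum_step(weights, grads, velocity,
                      lr=0.0001, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay.

    v <- momentum * v - lr * (grad + weight_decay * w)
    w <- w + v
    """
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        v = momentum * v - lr * (g + weight_decay * w)
        new_velocity.append(v)
        new_weights.append(w + v)
    return new_weights, new_velocity


# Usage: a single weight, starting from zero velocity.
w, v = sgd_momentum_step([1.0], [2.0], [0.0])
```

With a learning rate this small, each step moves the pretrained weights only slightly, which fits a fine-tuning setup where most weights start from ImageNet values.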

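Training procedure

- Between 20 and 100 documents per class are used for training and validation

- Within each class, 80% are used for training and 20% for validation
- Training is run 100 times, randomly varying the number of documents used for training and validation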
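The per-class sampling and 80/20 split described above could look like the sketch below. It is an illustration under assumptions: the slides do not say how the documents were drawn or how the subset sizes were varied across the 100 runs, so uniform random sampling without replacement is assumed, and `split_per_class` and the file names are hypothetical.

```python
import random


def split_per_class(samples_by_class, per_class, train_frac=0.8, seed=0):
    """Draw `per_class` samples from each class and split them into train/validation."""
    rng = random.Random(seed)
    train, val = {}, {}
    for label, items in samples_by_class.items():
        chosen = rng.sample(items, per_class)  # sampling without replacement
        n_train = round(per_class * train_frac)
        train[label] = chosen[:n_train]
        val[label] = chosen[n_train:]
    return train, val


# Usage: two of the ten classes, with made-up file names.
data = {
    "Email": [f"email_{i}.png" for i in range(150)],
    "Memo": [f"memo_{i}.png" for i in range(120)],
}
train, val = split_per_class(data, per_class=100)
```

Repeating the call with different seeds and different `per_class` values between 20 and 100 would give one possible reading of the 100 repeated runs.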

Results

With 100 documents per class used for training and validation

Results

[2] L. Kang, J. Kumar, P. Ye, Y. Li, and D. Doermann, "Convolutional Neural Networks for Document Image Classification," in ICPR, 2014.
[7] S. Chen, Y. He, J. Sun, and S. Naoi, "Structured document classification by matching local salient features," in ICPR, Nov 2012, pp. 653-656.