DeepDocClassifier: Document Classification with Deep Convolutional Neural Network
Presenter: 이상엽
April 26, 2018
1/9
2/9
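Goals and Features
- Structure-based document classification
- Classify a broader range of documents more accurately than existing methods
- Heavily influenced by AlexNet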
3/9
Data Used: Tobacco dataset
- 10 classes (Ad, Email, Form, Letter, Memo, News, Note, Report, Resume, Scientific), 3,482 document images
- Documents used in litigation involving tobacco companies
4/9
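Preprocessing
- All images are resized to a common resolution of 227×227
- The ImageNet mean is subtracted from every image
- Initial weights are taken from a model pretrained on ImageNet (except for the last layer)
- AlexNet is used almost unchanged
- The PCA-based alteration of RGB values for data augmentation is not used
- Biases are initialized to 0.1 instead of 1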
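The resizing and mean-subtraction steps above can be sketched in plain Python as follows. This is only an illustration: the nearest-neighbour resize and the per-channel mean values in `ASSUMED_IMAGENET_MEAN_RGB` are assumptions (the slides do not say which interpolation or which exact mean was used), and the function names are hypothetical.

```python
# Per-channel RGB means often quoted for ImageNet. ASSUMPTION: the slides
# do not give the values, nor say whether a mean image or a mean pixel was used.
ASSUMED_IMAGENET_MEAN_RGB = (123.68, 116.78, 103.94)
TARGET_SIZE = 227  # input resolution stated on the slide


def resize_nearest(image, size=TARGET_SIZE):
    """Resize an H x W grid of RGB tuples to size x size (nearest neighbour)."""
    height, width = len(image), len(image[0])
    return [
        [image[(r * height) // size][(c * width) // size] for c in range(size)]
        for r in range(size)
    ]


def subtract_mean(image, mean=ASSUMED_IMAGENET_MEAN_RGB):
    """Subtract the per-channel mean from every pixel."""
    return [
        [tuple(value - m for value, m in zip(pixel, mean)) for pixel in row]
        for row in image
    ]


def preprocess(image):
    """Resize to 227x227, then subtract the (assumed) ImageNet mean."""
    return subtract_mean(resize_nearest(image))


# Usage: a tiny all-white 2x4 image becomes a 227x227 mean-centred image.
tiny = [[(255, 255, 255)] * 4 for _ in range(2)]
centred = preprocess(tiny)
```

In practice an image library would handle the resize; the point here is only the order of operations: resize first, then centre on the pretraining data's mean so the inputs match what the pretrained weights expect.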
5/9
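Architecture
Stage             Filter size    Filters   Max pool
conv 1 & pool 1   11×11×3 (4)    96        3×3 (2)
conv 2 & pool 2   5×5×48         256       3×3 (2)
conv 3            3×3×256        384
conv 4            3×3×192        384
conv 5 & pool 3   3×3×192        256       3×3 (2)
(Numbers in parentheses are strides.)
- Normalization is applied before each pooling step; dropout (0.5) is used in fc 6 and fc 7
- ReLU is the activation function in every conv and fc layer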
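As a sanity check on the table above, the spatial size after each stage and the number of convolution weights can be worked out in a few lines. Kernel sizes, filter counts, and the strides in parentheses come from the table; the padding values and the stride of 1 for conv 2 to conv 5 are assumptions borrowed from the usual AlexNet configuration, since the slide does not list them.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Output width of a conv/pool layer: floor((n + 2*pad - kernel) / stride) + 1."""
    return (n + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad). Padding and the stride-1 entries are ASSUMED
# (standard AlexNet values); the slide gives only kernels and two strides.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

# Filter shapes (height, width, input channels, number of filters) from the table.
FILTERS = {
    "conv1": (11, 11, 3, 96),
    "conv2": (5, 5, 48, 256),
    "conv3": (3, 3, 256, 384),
    "conv4": (3, 3, 192, 384),
    "conv5": (3, 3, 192, 256),
}


def trace_sizes(input_size=227):
    """Return the spatial size after every stage for a square input."""
    sizes, n = {}, input_size
    for name, kernel, stride, pad in LAYERS:
        n = out_size(n, kernel, stride, pad)
        sizes[name] = n
    return sizes


def weight_count(shape):
    """Number of weights (biases excluded) for one filter bank."""
    height, width, channels, count = shape
    return height * width * channels * count


sizes = trace_sizes()
weights = {name: weight_count(shape) for name, shape in FILTERS.items()}
```

Under these assumptions the feature map shrinks 227 → 55 → 27 → 13 → 6, so fc 6 would receive 6×6×256 = 9216 inputs.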
6/9
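Parameters and Hyperparameters
- SGD with a batch size of 10
- Learning rate fixed at 0.0001, momentum at 0.9, and weight decay at 0.0005
- Initial weights are the values pretrained on ImageNet (except for the last layer)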
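For reference, one SGD step with the hyperparameters listed above might look like the sketch below. The exact form of the update is an assumption (it follows the momentum and weight-decay rule from the AlexNet paper; the slides give only the constants), and `sgd_step` is a hypothetical helper, not code from the presentation.

```python
LEARNING_RATE = 0.0001  # from the slide
MOMENTUM = 0.9          # from the slide
WEIGHT_DECAY = 0.0005   # from the slide


def sgd_step(weights, grads, velocity,
             lr=LEARNING_RATE, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """One momentum-SGD update with L2 weight decay (ASSUMED AlexNet-style form).

    v <- momentum * v - lr * (grad + weight_decay * w)
    w <- w + v
    """
    new_velocity = [
        momentum * v - lr * (g + weight_decay * w)
        for w, g, v in zip(weights, grads, velocity)
    ]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Usage: grads would be the gradient averaged over a mini-batch of 10 images.
w, v = sgd_step([1.0], [2.0], [0.0])
```

The small learning rate fits the fine-tuning setup: most weights start from ImageNet-pretrained values, so each step only nudges them.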
7/9
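Training Procedure
- Between 20 and 100 documents per class are used for training and validation
- Within each class, 80% are used for training and 20% for validation
- Training is run 100 times, randomly varying the number of documents used for training and validation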
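The sampling scheme described above can be sketched as follows. This is one reading of the slide, under stated assumptions: the per-class count is drawn uniformly from 20 to 100 on each run, the 80% cut is rounded to the nearest integer, and every class has at least as many documents as the count drawn. `sample_split` and `run_schedule` are hypothetical names.

```python
import random


def sample_split(docs_by_class, per_class, train_fraction=0.8, rng=random):
    """Draw per_class documents from each class and split them 80/20."""
    train, val = {}, {}
    cut = int(round(per_class * train_fraction))
    for label, docs in docs_by_class.items():
        chosen = rng.sample(docs, per_class)  # sampling without replacement
        train[label] = chosen[:cut]
        val[label] = chosen[cut:]
    return train, val


def run_schedule(docs_by_class, runs=100, min_n=20, max_n=100, seed=0):
    """Yield (per_class, (train, val)) for each of the repeated runs.

    ASSUMPTION: the per-class count is drawn uniformly at random per run;
    the slide only says the counts were varied randomly.
    """
    rng = random.Random(seed)
    for _ in range(runs):
        per_class = rng.randint(min_n, max_n)
        yield per_class, sample_split(docs_by_class, per_class, rng=rng)


# Usage with placeholder file names (not real dataset paths).
toy = {label: [f"{label}_{i}" for i in range(120)] for label in ("Ad", "Email")}
schedule = list(run_schedule(toy))
```

Each element of `schedule` pairs the drawn sample size with disjoint training and validation sets, which would then feed one training run.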
8/9
Results
With 100 documents per class used for training and validation
9/9
Results
[2] Le Kang, Jayant Kumar, Peng Ye, Yi Li, and David Doermann, "Convolutional Neural Networks for Document Image Classification," in ICPR, 2014.
[7] S. Chen, Y. He, J. Sun, and S. Naoi, "Structured document classification by matching local salient features," in ICPR, Nov 2012, pp. 653-656.