bigdata2012ml okanohara

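Real-Time Machine Learning and Its Applications with Jubatus, a Large-Scale Data Analysis Platform. Okanohara, Preferred Infrastructure, Inc. [email protected] 2012/11/19 @ IPSJ (Information Processing Society of Japan) Seminar Series: Machine Learning That Confronts Big Data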

Upload: preferred-infrastructure-preferred-networks

Post on 05-Dec-2014


DESCRIPTION

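Slides from the talk given on 2012/11/19 at the fifth session, "Machine Learning That Confronts Big Data," of the IPSJ (Information Processing Society of Japan) seminar series "Big Data and the Smart Society."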

TRANSCRIPT

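  • 1. 2012/11/19 @ IPSJ Seminar Series: Real-Time Machine Learning and Its Applications with Jubatus, a Large-Scale Data Analysis Platform. Okanohara, Preferred Infrastructure, [email protected]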
  • 2. Jubatus; Edge-Heavy Data
  • 3. Preferred Infrastructure (PFI); 2006/3; 26; 113-0033, 2-40-1
  • 4. Preferred Infrastructure; ICPC = 7; TopCoder; Hadoop; Haskell; UI/UX
  • 6. BigData! The 3Vs: Volume, Velocity, Variety; PC, EC
  • 7. SNS (Twitter, Facebook)
  • 8. 1/5: SNS (Twitter, Facebook)
  • 9. 2/5
  • 10. 2/5: http://google.org/crisismap/2012-sandy
  • 11. 3/5: c.f. Waze, TomTom HD Traffic; Ford + Google Prediction API (PHV); Ford OpenXC (CAN, JSON)
  • 12. 3/5: waze.com
  • 13. 4/5: TAP
  • 14. 5/5: NY; 1500; 300; realtime/semi-realtime/static; MTBF (Mean time between failures). Source: Machine Learning for the New York City Power Grid, J. IEEE Trans. PAMI, to appear; Con Edison
  • 15. 5/5
  • 17. Regret; (SGD)
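  • 18. Linear classifier (binary): $f(x; w) := \mathrm{sign}(w^\top x)$. For an input $x \in \mathbb{R}^m$, the output $y \in \{-1, +1\}$ is predicted by $f(x; w)$, where $w \in \mathbb{R}^m$ is the weight vector; SVMs and similar models take this form. Figure: the inputs $x_1, x_2, x_3, \ldots, x_m$ are multiplied by $w_1, w_2, w_3, \ldots, w_m$ and summed; $\mathrm{sign}(z) := +1$ if $z \ge 0$ and $-1$ otherwise.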
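As an illustration of the decision rule on this slide, here is a minimal Python sketch. The function name `predict` and the example weights are assumptions made for illustration and do not come from the talk.

```python
import numpy as np

def predict(w, x):
    """Return +1 if w^T x >= 0, otherwise -1 (the sign convention on the slide)."""
    return 1 if float(w @ x) >= 0 else -1

w = np.array([0.5, -1.0, 2.0])                 # hypothetical weight vector
print(predict(w, np.array([1.0, 1.0, 1.0])))   # w^T x = 1.5
print(predict(w, np.array([0.0, 2.0, 0.0])))   # w^T x = -2.0
```

Note that a score of exactly zero is mapped to +1, following the definition of sign(z) given on the slide.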
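  • 19. Linear classifier (multi-class): $f(x; w) := \arg\max_y w_y^\top x$. For an input $x \in \mathbb{R}^m$, the output $y \in \{1, 2, \ldots, k\}$ is predicted by $f(x)$. The multi-class margin reduces to the binary form: $y^{(i)} w^\top x^{(i)} := w_{y^{(i)}}^\top x^{(i)} - w_{y'}^\top x^{(i)}$, where $y' := \arg\max_{y \neq y^{(i)}} w_y^\top x^{(i)}$ is the highest-scoring incorrect class. The same reduction applies whenever $\arg\max_{y \neq y^{(i)}} w_y^\top x^{(i)}$ can be computed, c.f. CRFs.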
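The multi-class margin described on this slide (score of the correct class minus the score of the best rival class) can be sketched as follows. This is only an illustrative sketch: the name `multiclass_margin`, the 0-based class indices, and the toy weight matrix `W` (one row per class) are assumptions.

```python
import numpy as np

def multiclass_margin(W, x, y_true):
    """Return (margin, best rival class) for a per-class weight matrix W."""
    scores = W @ x                       # one score w_y^T x per class
    rivals = scores.astype(float).copy()
    rivals[y_true] = -np.inf             # exclude the correct class from the argmax
    y_rival = int(np.argmax(rivals))
    return float(scores[y_true] - scores[y_rival]), y_rival

W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # hypothetical 3-class model
x = np.array([2.0, 1.0])
margin, y_rival = multiclass_margin(W, x, y_true=0)
print(margin, y_rival)
```

A negative margin means the example is currently misclassified, so binary-style update rules can be applied to the two weight vectors involved.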
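  • 20. Online learning framework: 1. receive a training example $(x, y)$; 2. check whether $y w_i^\top x > E$; 3. if not, update $w_{i+1} := w_i + y \alpha A x$, where $\alpha > 0$, $\alpha \in \mathbb{R}$, and $A \in \mathbb{R}^{m \times m}$. Each algorithm is defined by its choice of $E$, $\alpha$, $A$. $E$: margin threshold; $\alpha$: step size; $A$: matrix applied to the update direction.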
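  • 21. Effect of the update: when $\alpha > 0$ and $A$ is positive semidefinite, $y w_{i+1}^\top x = y (w_i + y \alpha A x)^\top x = y w_i^\top x + y^2 \alpha (A x)^\top x = y w_i^\top x + \alpha x^\top A x \ge y w_i^\top x$, so the update never decreases the margin on the current example. Figure: with $y = -1$, the update adds $-\alpha A x$ to $w_t$ to give $w_{t+1}$.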
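The generic update of slides 20 and 21 can be sketched in Python as below, writing alpha for the positive step size and A for the matrix applied to the update direction. The function name `online_update` and the toy values are assumptions for illustration; the sketch updates whenever the margin is not greater than E.

```python
import numpy as np

def online_update(w, x, y, E, alpha, A):
    """One step of the generic rule: if y * w^T x > E keep w, else w + y * alpha * A x."""
    margin = y * (w @ x)
    if margin > E:
        return w                      # margin already large enough: no update
    return w + y * alpha * (A @ x)

w = np.zeros(3)
x = np.array([1.0, -2.0, 0.5])
y = -1
A = np.eye(3)                         # identity is positive semidefinite
w_new = online_update(w, x, y, E=0.0, alpha=1.0, A=A)

before = y * (w @ x)
after = y * (w_new @ x)
# The margin grows by alpha * x^T A x, which is non-negative for PSD A.
print(after - before, x @ A @ x)
```

With `E=0`, `alpha=1`, and `A` set to the identity, this sketch reduces to the Perceptron update.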
  • 22. Perceptron 1958; Passive-Aggressive 2002; Confidence Weighted Learning 2006; Adaptive Regularization of Weight Vector 2009; Normal HERD 2010; Soft Confidence Weighted Learning 2012
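  • 23. Perceptron [Rosenblatt Psy. Rev. 58], [Collins EMNLP 02]: when the current $w_i$ misclassifies the example $(x, y)$, update $w_{i+1} := w_i + y x$; this corresponds to $E = 0$, $\alpha = 1$, $A = I$. Averaged Perceptron [Collins 02]: predict with the accumulated weights $w_a := \sum_i w_i$ over all steps instead of the final $w$.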
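A minimal sketch of the Perceptron and its averaged variant, assuming a small linearly separable toy dataset. The function name `train_perceptron`, the epoch count, and the data are assumptions for illustration; the accumulated vector is left unnormalized because scaling does not change the predicted sign.

```python
import numpy as np

def train_perceptron(X, Y, epochs=10):
    """Return (final weights, summed weights) after Perceptron training."""
    w = np.zeros(X.shape[1])
    w_sum = np.zeros_like(w)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            if y * (w @ x) <= 0:      # mistake (or zero margin): w := w + y x
                w = w + y * x
            w_sum += w                # accumulate for the averaged Perceptron
    return w, w_sum

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
Y = np.array([1, 1, -1, -1])
w, w_avg = train_perceptron(X, Y)
print(w, np.sign(X @ w_avg))
```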
  • 24. Training data $\{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$, $|x^{(i)}|_2$
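  • 25. Passive Aggressive [Crammer, JMLR 06]: an online counterpart of SVM; used in Gmail [Aberdeen LCCC 2010]. Given an example $(x, y)$: if the current $w_i$ classifies it correctly with sufficient margin, keep $w_i$ (passive); otherwise move to $w_{i+1} = \arg\min_w |w - w_i|^2 / 2 + C\, L(x, y, w)^2$ (aggressive), where $L(x, y, w) = [1 - y w^\top x]$ is the hinge loss.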
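  • 26. Passive Aggressive (cont.): the update has the closed form $w_{i+1} := w_i + y \frac{L(x, y, w)}{|x|^2 + 1/C} x$. In the general framework, PA corresponds to $E = 1$, $\alpha = L(x, y, w) / (|x|^2 + 1/C)$, $A = I$ in $w_{i+1} := w_i + y \alpha A x$.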
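The closed-form Passive Aggressive step on this slide can be sketched as follows, with the hinge loss clipped at zero. The function name `pa_update` and the toy inputs are assumptions for illustration.

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """PA step from the slide: w + y * L / (|x|^2 + 1/C) * x, with hinge loss L."""
    loss = max(0.0, 1.0 - y * (w @ x))   # hinge loss; zero means stay passive
    alpha = loss / (x @ x + 1.0 / C)     # step size
    return w + y * alpha * x

x = np.array([1.0, 2.0])
w1 = pa_update(np.zeros(2), x, y=1, C=1.0)
print(w1, w1 @ x)                        # the margin moves from 0 toward 1
```

A larger `C` makes the step more aggressive: as `C` grows, a single update brings the margin on the current example closer to exactly 1.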
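  • 27. Confidence Weighted Algorithm (CW) [K. Crammer, et. al, EMNLP 09]: the weight vector follows a Gaussian distribution, $w \sim N(\mu, \Sigma)$, with mean $\mu \in \mathbb{R}^m$ and covariance $\Sigma \in \mathbb{R}^{m \times m}$. Figure: where a conventional learner stores point weights $w_i$ such as 1.7 and 0.6, CW stores a distribution per weight, $N(1.7, 0.5)$ and $N(0.6, 0.4)$.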
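  • 28. CW (cont.): as in PA, the new distribution $(\mu, \Sigma)$ is the one closest to the current one, measured by KL-Divergence, that classifies the current example correctly: $\arg\min_{\mu, \Sigma} D_{KL}(N(\mu, \Sigma) \,\|\, N(\mu_i, \Sigma_i))$ s.t. $\Pr_{w \sim N(\mu, \Sigma)}[y w^\top x \ge 0] \ge \eta$. $E$, $\alpha$, $A$ are obtained from $x$, $y$, $\mu$, $\Sigma$. Compared with Perceptron and PA.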
  • 29. Figure: evaluation results from [K. Crammer, et. al, EMNLP 09] on News Groups, Amazon, EnronUser A10, EnronUser B10, and NewYork Times.
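  • 30. Adaptive Regularization of Weight Vectors (AROW) [Crammer NIPS+ 09]: builds on CW and balances three conditions. 1: loss on the current example; 2: KL-Divergence from the current distribution; 3: Confidence. $\arg\min_{\mu, \Sigma} D_{KL}(N(\mu, \Sigma) \,\|\, N(\mu_i, \Sigma_i)) + \lambda_1 L(x, y, \mu) + \lambda_2 x^\top \Sigma x$, where the first term is condition 2, the $\lambda_1$ term is condition 1, and the $\lambda_2$ term is condition 3. $E$, $\alpha$, $A$ follow from this objective.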
  • 31. Figure: AROW results at noise levels 0%, 10%, and 30%: AROW > CW.
  • 32. NHERD [Crammer+ NIPS 10]: the weight vector follows $w \sim N(\mu, \Sigma)$; the update is based on PA, where CW uses KL-divergence; NHERD (HERD); $\alpha$, $E$, $A$; compared with AROW.
  • 33. Figure: NHERD update example with $\mu = (0, 0)$, $\Sigma = I$, $x = (1, 2)$, $y = 1$; contours for $|w| = 1$ and $|w| = 2$ [Crammer+ NIPS 10].
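  • 34. Summary of update rules. Given $(x, y)$: if $y w^\top x < E$ then $w := w + y \alpha A x$ and Update($A$). Definitions: $v = x^\top \Sigma x$; $b = 1 + 2 y w^\top x\, C$; $\gamma = \left(-b + (b^2 - 8C(y w^\top x - C v))^{1/2}\right) / (4 v C)$; $\beta = (v + r)^{-1}$. Per algorithm, listed as $E$; $\alpha$; Update($A$) ($a_{rr} :=$). Perceptron: $0$; $1$; $1$. PA (PA-II): $1$; $[1 - y w^\top x] / (|x|^2 + 1/C)$; $1$. CW: $\gamma$; $\gamma$; $(a_{rr}^{-1} + 2 \gamma x_r)^{-1}$. AROW: $1$; $[1 - y w^\top x]\, \beta$; $a_{rr} - \beta (a_{rr} x_r)^2$. NHERD: $1$; $[1 - y w^\top x] / (v + 1/C)$; $(a_{rr}^{-1} + (2C + C^2 v) x_r^2)^{-1}$. Every update costs $O(|x|_0)$.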
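As one worked instance of the summary above, the AROW row can be sketched with a diagonal covariance. This is a sketch under stated assumptions: the covariance is stored as a vector `a` of diagonal entries, `v` is taken to be the confidence term computed with that diagonal, `r` is a regularization hyperparameter, and the names `arow_update`, `a`, and `r` are illustrative and not taken from the Jubatus code base.

```python
import numpy as np

def arow_update(w, a, x, y, r=1.0):
    """Diagonal AROW step: update the mean w and the per-feature variances a."""
    margin = y * (w @ x)
    if margin >= 1.0:                    # E = 1: no update when the margin is large enough
        return w, a
    v = np.sum(a * x * x)                # confidence term with a diagonal covariance
    beta = 1.0 / (v + r)
    alpha = (1.0 - margin) * beta        # step size from the AROW row
    w_new = w + y * alpha * a * x        # w := w + y * alpha * A x with A = diag(a)
    a_new = a - beta * (a * x) ** 2      # a_rr := a_rr - beta * (a_rr * x_r)^2
    return w_new, a_new

w0, a0 = np.zeros(2), np.ones(2)
x = np.array([1.0, 2.0])
w1, a1 = arow_update(w0, a0, x, y=1, r=1.0)
print(w1, a1)
```

The variances shrink only for features that appear in `x`, so frequently seen features receive smaller later updates than rare ones.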
  • 35. Jubatus
  • 36. CEP and Hadoop; ???. Sources: http://web.mit.edu/rudin/www/TPAMIPreprint.pdf, http://www.computerworlduk.com/news/networking/3302464/
  • 37. Figure: positioning of Jubatus. Online: Jubatus 2011-; online ML algorithms (Structured Perceptron 2001, PA 2003, CW 2008). Batch: WEKA 1993-, SPSS 1988-, Mahout 2006-.
  • 38. Jubatus: developed by NTT SIC* and Preferred Infrastructure; released as OSS in 2011/10, http://jubat.us/ (* NTT Software Innovation Center)
  • 39. Jubatus, 1: Twitter, 6000 QPS
  • 40. 2:
  • 41. 3:
  • 42. Figure: timeline (axis: Time) in which Learn steps alternate with Model update steps.
  • 43. Jubatus; mix
  • 44. Figure: Jubatus and M2M, shown alongside Hadoop, CEP, RDBMS, HDFS/HBase, Mahout, and SPSS.
  • 45. CEP and Jubatus (comparison table)
  • 46. Hadoop and Jubatus (comparison table)
  • 47. Mahout and Jubatus (comparison table)
  • 48. Jubatus
  • 49. Learn the mapping from x to y from examples $\{(x, y)\}$; example input: a Twitter Tweet.
  • 51. Figure: plot with both axes running from -5 to 5.
  • 52. SNS. Image sources: http://idc.7-dj.com/sns/feat.html, http://www.soumu.go.jp/main_sosiki/joho_tsusin/security/kiso/illust/internet.gif, www.mlit.go.jp/common/000057575.pdf
  • 53. Jubatus: UPDATE, ANALYZE, and MIX. 1. UPDATE; 2. ANALYZE; 3. MIX (automatically executed in backend). C.f. Map-Shuffle-Reduce operations on Hadoop.
  • 54. UPDATE (figure): data is distributed randomly or consistently; each server starts from the initial model and updates its own local model (local model 1, local model 2).
  • 55. MIX (figure): on each server, local model - initial model = diff (diff 1, diff 2); diff 1 + diff 2 = merged diff; initial model + merged diff = mixed model, on both servers.
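The diff-based MIX step on slide 55 can be sketched as below. This is a sketch, not the Jubatus API: it assumes models are plain NumPy weight vectors and that the diffs are merged by averaging. The merge operation, the function name `mix`, and the toy models are assumptions for illustration.

```python
import numpy as np

def mix(initial, local_models):
    """Merge local models through their diffs against the shared initial model."""
    diffs = [local - initial for local in local_models]   # local model - initial model
    merged_diff = np.mean(diffs, axis=0)                  # assumed merge rule: average
    return initial + merged_diff                          # initial model + merged diff

initial = np.array([0.0, 0.0])
local_1 = np.array([1.0, 0.0])     # hypothetical model after UPDATE on server 1
local_2 = np.array([0.0, 3.0])     # hypothetical model after UPDATE on server 2
mixed = mix(initial, [local_1, local_2])
print(mixed)
```

After MIX, every server would continue UPDATE from the same `mixed` model, so only the diffs need to be exchanged between servers and the raw training data stays where it arrived.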
  • 56. UPDATE (iteration) after MIX (figure): data is again distributed randomly or consistently; each server starts from the mixed model and updates its own local model (local model 1, local model 2).
  • 57. ANALYZE (figure): queries are distributed randomly; each server applies the mixed model and returns a prediction.
  • 58. Jubatus: 2. UPDATE; 3. MIX; 4. ANALYZE, MIX
  • 59. Jubatus: HEMS / BEMS; M2M
  • 60. Edge-Heavy Data. Reference: Edge-Heavy Data: CPS, GICTF 2012, http://www.gictf.jp/doc/20120709GICTF.pdf
  • 61. Edge-Heavy Data 1/3: exhaust data
  • 62. Edge-Heavy Data 2/3: 7502011; 100GB/750PB; 2010; 10GB/12000200PB; 400PB
  • 63. Edge-Heavy Data 3/3: 1. SSD; 2.; 3. GFS + MapReduce; 4.; 5. Jubatus
  • 64. Edge-Heavy Data
  • 65. Copyright 2006-2012 Preferred Infrastructure. All Rights Reserved.