The Intelligence Infrastructure Jubatus Aims For

The Intelligence Infrastructure Jubatus Aims For, Shohei Hido, Preferred Infrastructure, Inc.

Uploaded by: shohei-hido

Posted on 01-Nov-2014

DESCRIPTION

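Slides from a talk given at the IEICE Society Conference 2013, in the session "Big Data Analysis and Communication Behavior Analysis for Realizing Intelligent Environments".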

TRANSCRIPT

  • 1. Jubatus Preferred Infrastructure
  • 2. Jubatus: jointly developed by NTT SIC* and Preferred Infrastructure; released as OSS in October 2011 (http://jubat.us/). *NTT Software Innovation Center
  • 3. Agenda: (1) …; (2) …; (3) Jubatus …; (4) …; (5) Jubatus …
  • 4. Preferred Infrastructure (PFI): IR; founded in March 2006; products: Sedue, Bazil, Jubatus
  • 5. 2622/; Ex-Sony, IBM, Yahoo!, Sun, mixi, GREE; IPA 5; ICPC 7, ICFP; TopCoder RedCoder 3 (25); Hadoop; Hadoop, Haskell; 2; 2013 5
  • 6. Agenda: (1) …; (2) …; (3) Jubatus …; (4) …; (5) Jubatus …
  • 7. Web, Twitter; Web; M2M
  • 8. The three Vs: Volume, Variety, Velocity; Complex Event Processing, Hadoop, NoSQL, M2M
  • 9. SQL, DWH, BI, CEP, M/R, CQL, Machine Learning
  • 11. DB
  • 12. IBM survey (2012): 24%; 47%; 6%. Source: IBM Institute for Business Value, "Analytics: The real-world use of big data", 2013
  • 13. Three points: (1) …; (2) DB …, DB …; (3) …
  • 14. Jubatus, MapReduce/Hadoop, CEP (labels 1., 2., 3.)
  • 15. Machine-learning tools and algorithms: WEKA (1993-), SVMlight (1998-), Mahout (2006-), Jubatus (2011); Structured Perceptron [Collins, EMNLP 2002]; Passive Aggressive / MIRA (2004); online-learning library [, 2008]
  • 16. GFS/MapReduce (Hadoop) [Google 2004]: + MapReduce; Chubby (ZooKeeper) [Google 2006]; BigTable (HBase) [Google 2006]: DB/KVS; Dynamo [Amazon 2007]: KVS; MegaStore [Google 2011]: KVS; Hive [Facebook 2009]: OLAP/SQL on Hadoop; Dremel (Apache Drill) [Google 2010]: OLAP; PowerDrill [Google 2012]: OLAP; Spanner [Google 2012]. OSS counterparts in parentheses.
  • 17. Agenda: (1) …; (2) …; (3) Jubatus …; (4) …; (5) Jubatus …
  • 18. Dimensionality Reduction by Learning an Invariant Mapping, Raia Hadsell, Sumit Chopra, Yann LeCun, CVPR 2006
  • 19. Jubatus features: classification: Perceptron / PA / CW / AROW / NHERD; regression: PA-based regression; recommendation / nearest-neighbor search: LSH / MinHash / Euclid LSH; anomaly detection: LOF; graph analysis (PageRank)
  • 20. Learning a mapping from x to y from example pairs $\{(x, y)\}$; e.g., x: a Tweet on Twitter, y: … or … or …
  • 21. Jeopardy!
  • 22. Jubatus and Twitter: NTT Data and Twitter Japan; Firehose Tweets; Jubatus API. http://blog.jp.twitter.com/2012/09/twitter.html http://www.nttdata.com/jp/ja/news/release/2012/092700.html
  • 23. Jubatus, NEDO IT; NEDO: IT
  • 24. Jubatus; Jubatus
  • 25. Jubatus; MRI
  • 27. Overview: Data Source; Fluentd; Realtime Analysis Server (Jubatus); On-Memory Instance; On-Disk Instance; Web Server + Visualization Tool Kit
  • 30. Sedue for BigData, 2013/08/15 12:08:30.200
  • 31. Agenda: (1) …; (2) …; (3) Jubatus …; (4) …; (5) Jubatus …
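  • 32. Model: $y = ax + b$. Training: given the labeled example $(x = +2, y = +1)$, adjust $a$ and $b$ so that the output $ax + b = 2a + b$ is positive ($y > 0$). Prediction: for a new input $x = -5$, the model outputs $y = ax + b = -5a + b$. (Figure: x → Model → y, once for training and once for prediction.)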
  • 33. Weight vectors $w_1, w_2, \ldots, w_n$
  • 34. LSH / MinHash: each item is stored as a bit signature: 1: 011010010, 2: 110001100, 3: 110010111, 4: 000100101, 5: 110101011, 6: 000010110
  • 35. (Figure: Update steps at time = 1, 2, 3.)
  • 36. Jubatus's three operations: UPDATE, MIX, ANALYZE
  • 37. UPDATE: each incoming datum goes to server 1 or 2, which updates its own copy of the initial model (figure: Initial model → Local model 1; Initial model → Local model 2).
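  • 38. MIX: each server computes the difference between its local model and the initial model, the differences are merged, and the merged diff is added back to the initial model, so both servers end up with the same mixed model: $\text{Model diff}_i = \text{Local model}_i - \text{Initial model}$ for $i = 1, 2$; $\text{Merged diff} = \text{Model diff}_1 + \text{Model diff}_2$; $\text{Mixed model} = \text{Initial model} + \text{Merged diff}$.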
  • 39. ANALYZE: queries are answered with the Mixed model, which is now identical on both servers (figure: Mixed model, Mixed model).
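  • 40. MIX in Jubatus (linear models): the weight vectors $w_1, w_2, \ldots, w_n$ held by the n servers are mixed into $w = \frac{1}{n}(w_1 + \cdots + w_n)$, and every server continues with the same $w$.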
  • 41. MIX in Jubatus (LSH / MinHash): signature table 1: 011010010, 2: 110001100, 3: 110010111, 4: 000100101, 5: 110101011, 6: 000010110; after Mix, each of the three servers in the figure holds the rows 1: 011010010 to 6: 000010110.
  • 42. Agenda: (1) …; (2) …; (3) Jubatus …; (4) …; (5) Jubatus …
  • 43. Edge-heavy data. Source: Edge-Heavy Data: CPS, GICTF 2012, http://www.gictf.jp/doc/20120709GICTF.pdf
  • 44. Edge-Heavy Data: exhaust data
  • 46. Edge-Heavy Data (1): Jubatus on OpenBlocks (ARM), http://obdnmagazine.blogspot.jp/2012/11/jubatusopenblocks-ax3_21.html
  • 47. Edge-Heavy Data (2): ZigBee; LIDAR; FPGA; GPGPU; CUDA; Xeon Phi; x86
  • 48. Edge-Heavy Data: HW; 1. SSD; 2.; 3. GFS + MapReduce; 4.; 5.; BI-4-1, Krill: PFI
  • 49. P2P; HW/NW/SW; SSD; HW/NW/SW; Google, IBM, Oracle, Intel +
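
Slide 19 lists Passive Aggressive (PA) among Jubatus's classification algorithms. To give a rough idea of what one online PA update does, here is a minimal sketch of the basic PA rule for binary labels (no C-regularized PA-I/PA-II variants). The dict-based feature representation and the names `score`, `pa_update` and `predict` are hypothetical illustrations, not Jubatus's client API.

```python
from typing import Dict

# Sparse feature vector: feature name -> value. This dict representation is a
# hypothetical stand-in, not Jubatus's datum type or client API.
Features = Dict[str, float]


def score(w: Features, x: Features) -> float:
    """Inner product w . x over the features present in x."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def pa_update(w: Features, x: Features, y: int) -> None:
    """One basic Passive-Aggressive step for a label y in {+1, -1}, in place.

    Hinge loss l = max(0, 1 - y * w.x). If l > 0, move w by the smallest
    step (tau = l / ||x||^2) that gives x a margin of exactly 1.
    """
    loss = max(0.0, 1.0 - y * score(w, x))
    norm2 = sum(v * v for v in x.values())
    if loss == 0.0 or norm2 == 0.0:
        return  # passive: already correct with margin >= 1 (or empty x)
    tau = loss / norm2
    for k, v in x.items():
        w[k] = w.get(k, 0.0) + tau * y * v


def predict(w: Features, x: Features) -> int:
    """Predicted label: +1 if w . x >= 0, else -1."""
    return 1 if score(w, x) >= 0.0 else -1


# Usage: learn online from a tiny labelled stream, one example at a time.
weights: Features = {}
stream = [({"free": 1.0, "money": 1.0}, 1), ({"meeting": 1.0, "agenda": 1.0}, -1)]
for features, label in stream:
    pa_update(weights, features, label)
print(predict(weights, {"money": 1.0}), predict(weights, {"agenda": 1.0}))  # 1 -1
```

Each example is seen once and then discarded, which is what makes this family of learners a good fit for streaming data.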
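
Slide 32 trains the linear model y = ax + b on the example (x = +2, y = +1) and then applies it to x = -5. The slide does not say which rule adjusts a and b, so the sketch below assumes a perceptron-style update purely for illustration, starting from a = b = 0; `train_step` and `output` are hypothetical names.

```python
from typing import Tuple


def train_step(a: float, b: float, x: float, y: int) -> Tuple[float, float]:
    """Perceptron-style update for the model y = a*x + b with label y in {+1, -1}.

    If the sign of a*x + b disagrees with y (or is zero), move (a, b) toward
    the example: a += y*x, b += y. Otherwise leave the model unchanged.
    """
    if y * (a * x + b) <= 0:
        a += y * x
        b += y
    return a, b


def output(a: float, b: float, x: float) -> float:
    """Model output a*x + b; its sign is the predicted label."""
    return a * x + b


# Training on the slide's example (x = +2, y = +1), starting from a = b = 0.
a, b = train_step(0.0, 0.0, x=2.0, y=1)
print(a, b)                   # 2.0 1.0
print(output(a, b, 2.0) > 0)  # True: 2a + b = 5 > 0
# Prediction for the new input x = -5: -5a + b = -9.0, i.e. label -1.
print(output(a, b, -5.0))     # -9.0
```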
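
Slide 34 stores each item as a short bit signature, so finding similar items reduces to comparing bit strings. The sketch below ranks the slide's six signatures by Hamming distance to item 1, assuming the ids 1 to 6 pair with the signatures in the order listed. How a distance maps to a similarity score (cosine for random-hyperplane LSH, Jaccard for MinHash) depends on the hash family, so only the raw distance is used; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Bit signatures from slide 34 (item id -> signature; the pairing is assumed).
SIGNATURES: Dict[int, str] = {
    1: "011010010",
    2: "110001100",
    3: "110010111",
    4: "000100101",
    5: "110101011",
    6: "000010110",
}


def hamming(s: str, t: str) -> int:
    """Number of bit positions where two equal-length signatures differ."""
    if len(s) != len(t):
        raise ValueError("signatures must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))


def nearest(query_id: int, table: Dict[int, str], k: int = 3) -> List[Tuple[int, int]]:
    """The k items closest to query_id as (item_id, distance), ties broken by id."""
    q = table[query_id]
    ranked = sorted(
        (hamming(q, sig), item) for item, sig in table.items() if item != query_id
    )
    return [(item, dist) for dist, item in ranked[:k]]


print(nearest(1, SIGNATURES))  # [(6, 3), (3, 4), (2, 6)]
```

Comparing short signatures instead of full feature vectors keeps both memory and comparison cost small.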
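
Slide 38 describes MIX as: take each server's diff from the initial model, merge the diffs, and add the merged diff back to the initial model. A minimal sketch of that bookkeeping follows, assuming a simple count-based model (hypothetical word counts), where summing the diffs is exactly right because each server counted different data. The model type and the names `model_diff`, `merge_diffs` and `apply_diff` are assumptions for illustration; other model types may need a different merge.

```python
from typing import Dict, List

# A model here is a map from key to a numeric parameter (hypothetical: counts).
Model = Dict[str, float]


def model_diff(local: Model, initial: Model) -> Model:
    """Model diff = Local model - Initial model, key by key."""
    keys = set(local) | set(initial)
    return {k: local.get(k, 0.0) - initial.get(k, 0.0) for k in keys}


def merge_diffs(diffs: List[Model]) -> Model:
    """Merged diff = Model diff 1 + Model diff 2 + ..., key by key."""
    merged: Model = {}
    for d in diffs:
        for k, v in d.items():
            merged[k] = merged.get(k, 0.0) + v
    return merged


def apply_diff(initial: Model, merged: Model) -> Model:
    """Mixed model = Initial model + Merged diff."""
    keys = set(initial) | set(merged)
    return {k: initial.get(k, 0.0) + merged.get(k, 0.0) for k in keys}


# Both servers start from the same initial counts, then see different data.
initial = {"apple": 2.0}
local1 = {"apple": 3.0, "banana": 1.0}  # server 1: +1 apple, +1 banana
local2 = {"apple": 4.0}                 # server 2: +2 apple
mixed = apply_diff(initial, merge_diffs([model_diff(local1, initial),
                                         model_diff(local2, initial)]))
print(sorted(mixed.items()))  # [('apple', 5.0), ('banana', 1.0)]
```

Only the diffs need to travel between servers, not the full training data.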
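
For linear models, slide 40 mixes the servers' weight vectors by averaging, w = (1/n)(w_1 + ... + w_n). A minimal sketch with plain Python lists (the function name `mix_average` and the two-server data are hypothetical):

```python
from typing import List


def mix_average(weights: List[List[float]]) -> List[float]:
    """w = (1/n) * (w_1 + ... + w_n), computed element by element."""
    n = len(weights)
    if n == 0:
        raise ValueError("need at least one weight vector")
    dim = len(weights[0])
    if any(len(w) != dim for w in weights):
        raise ValueError("all weight vectors must have the same dimension")
    return [sum(w[i] for w in weights) / n for i in range(dim)]


# Two servers trained on different data; after MIX both hold the same w.
servers = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
mixed_w = mix_average(servers)
servers = [list(mixed_w) for _ in servers]
print(mixed_w)  # [2.0, 1.0, 1.0]
```

Averaging is the same as adding the mean of the model diffs to a shared initial model w_0, since (1/n) Σ (w_0 + Δ_i) = w_0 + (1/n) Σ Δ_i.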
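
Slide 41 shows every server holding the same signature rows after Mix. One way to read this, assumed here for illustration only, is that each server stores signatures for the items it received and MIX distributes the union of those tables. The split of slide 34's rows across three servers, the conflict rule (later table wins) and the name `mix_tables` are all hypothetical.

```python
from typing import Dict, List

Table = Dict[int, str]  # item id -> bit signature


def mix_tables(tables: List[Table]) -> Table:
    """Union of per-server signature tables; on a duplicate id the later table wins."""
    merged: Table = {}
    for t in tables:
        merged.update(t)
    return merged


# Hypothetical split of slide 34's six rows across three servers.
server_a: Table = {1: "011010010", 2: "110001100"}
server_b: Table = {3: "110010111", 4: "000100101"}
server_c: Table = {5: "110101011", 6: "000010110"}

merged = mix_tables([server_a, server_b, server_c])
replicas = [dict(merged) for _ in range(3)]  # every server receives the full table
print(sorted(merged))  # [1, 2, 3, 4, 5, 6]
```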