Implemented and released this year's KDD best paper

Trying out an implementation of this year's KDD best paper. Preferred Infrastructure, Inc., Shohei Hido

Upload: shohei-hido

Post on 04-Dec-2014

DESCRIPTION

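Slides from a talk given on 2013/09/01 at the 4th study group on Data Structures, Information Retrieval, and Natural Language Processing (DSIRNLP, http://partake.in/events/76854228-ba38-4f6e-87b9-f79e30add75c# ). A company blog post with the same content is here → http://research.preferred.jp/2013/08/sketch/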

TRANSCRIPT

  • 1. Implementing this year's KDD best paper
  • 2. HIDO Shohei; TwitterID: Vapnik@sla; 2006-2012: IBM; 2012-: […]; Jubatus; 2013-: Preferred Infrastructure America, Inc., Chief Research Officer
  • 3. Agenda: […]; […]; Frequent-directions; […]; […]
  • 4. SIGKDD: Intl Conf. on Knowledge Discovery and Data Mining; ACM; […] 8 […]; Best Research Paper Award: Edo Liberty (Yahoo! Labs, Haifa), "Simple and Deterministic Matrix Sketching"
  • 5. github.com/hido/frequent-direction
  • 6. Agenda (as on slide 3)
  • 7. n x m matrix A; matrix B; SVD of B, A; PCA, k-means, LSI
  • 8. Agenda (as on slide 3)
  • 9. n x m matrix A, ℓ x m matrix B, n >> ℓ; sketch of A; sketch B, A; SVD of B
  • 10. SVD of B; row A_i; O(nmℓ)
  • 11. Python, NumPy
  • 12. Agenda (as on slide 3)
  • 13. USPS; PCA; 2 […]; http://www.ibis.t.u-tokyo.ac.jp/RyotaTomioka/Teaching/enshu13
  • 14. SVD of A with Python numpy.linalg; A is 7291 x 256 (n=7291, m=256); SVD of A
  • 15. ℓ = 3; […] 10 […]; [Figure panels: SVD of A; SVD of B with ℓ = 3]
  • 16. ℓ = 4; […] 013 […]; [Figure panels: SVD of A; SVD of B with ℓ = 4]
  • 17. ℓ = 5; […] 03 […]; [Figure panels: SVD of A; SVD of B with ℓ = 5]
  • 18. ℓ = 6; [Figure panels: SVD of A; SVD of B with ℓ = 6]
  • 19. ℓ = 8; [Figure panels: SVD of A; SVD of B with ℓ = 8]
  • 20. ℓ = 16; [Figure panels: SVD of A; SVD of B with ℓ = 16]
  • 21. ℓ = 32; [Figure panels: SVD of A; SVD of B with ℓ = 32]
  • 22. ℓ = 16; […] 10 […]; [Figure panels: SVD of the 7291 x 256 matrix A; SVD of the 16 x 256 matrix B]
  • 23. Agenda (as on slide 3)
  • 24. [Figure: chart with horizontal axis ℓ = 3, 4, 5, 6, 8, 16, 32 and vertical axis 0 to 1.8]
  • 25. SVD of B; [Figure: chart with horizontal axis ℓ = 3, 4, 5, 6, 8, 16, 32, 64, 256 and vertical axes 0 to 4000 and 0 to 4; legend: B, SVD]
  • 26. @tmaehara […]
  • 27. Edo Liberty: "This indeed can be used but I thought it will be less efficient in practice and more complicated to code. So, I did not include it in the paper." "Theoretically though, it can reduce the space usage by a factor of 2, which theoretical CS people think is not important :)" "That said, I received quite a few questions about that so I will say something about it in the journal version."; incremental rank-1 SVD updates; SIGKDD
  • 28. […]
  • 29. Frequent-directions; Python; A =A; etc
  • 30. http://jubat.us/ OSS
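The Frequent-directions procedure outlined on slides 9 to 11 (keep a small sketch B of A, and shrink B through its SVD whenever it fills up) can be sketched in Python with NumPy roughly as follows. This is a minimal illustration written from the description in Liberty's paper, not the code published at github.com/hido/frequent-direction. The function name `frequent_directions`, the parameter name `ell`, and the restriction `ell <= m` are assumptions made here for simplicity.

```python
import numpy as np


def frequent_directions(A, ell):
    """Return an (ell x m) sketch B of the (n x m) matrix A.

    Hypothetical helper for illustration; assumes 1 <= ell <= m.
    """
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    if not 1 <= ell <= m:
        raise ValueError("this sketch assumes 1 <= ell <= m")

    B = np.zeros((ell, m))
    next_zero = 0  # index of the first all-zero row of B

    for row in A:
        if next_zero == ell:
            # B is full: shrink its singular values so that the rows
            # from index ell // 2 onward become zero again.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s_shrunk[:, None] * Vt
            next_zero = ell // 2
        B[next_zero] = row
        next_zero += 1

    return B


# Usage on random data (not the USPS data from the slides).
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 50))
ell = 16
B = frequent_directions(A, ell)

# Spectral-norm error of the sketch, next to the 2 * ||A||_F^2 / ell
# bound stated in the paper.
error = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = 2 * np.linalg.norm(A, "fro") ** 2 / ell
print(B.shape, error, bound)
```

The loop shrinks B just before inserting a row into a full sketch, which is a small reordering of the loop in the paper. Each pass costs one SVD of an ell x m matrix and frees about half of the rows, so the SVD runs roughly once every ell / 2 input rows.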