Big Data Week Moscow 2013 - Case: Personalization

Personalization: Our Case. Anton Gorokhov, Pavel Mezentsev. Rambler, 2013

Upload: anton-gorokhov

Posted on 15-Jun-2015

TRANSCRIPT

  • 1. Personalization: Our Case. Rambler, 2013

5-7. Hadoop
10. 20 [...] 300 [...]
12. [...] API [...] expire [...] Online [...]: Redis
13. [...] (nginx) Online [...]
14. MapReduce (!) [...] user_id [...] Redis [...] mapper [...] Redis
15. [...] (URL, [...]) [...] train/test [...] test [...] Offline [...]
16. 13 [...], 13 [...]
17. 13 [...], 13 [...]; values 0.75, 0.9, 0.6, 0.4, 0.8, 0.97, 0.2
18. 13 [...], 13 [...]; 9 of 13 => recall = 9/13 = 69%; 9 of 12 => precision = 9/12 = 75%; values 0.75, 0.9, 0.6, 0.4, 0.8, 0.97, 0.2
19-21. [Plot: recall and precision]
22. [...] (1-3% [...]); MapReduce; (Weka)
23. [Plot: both axes from 0 to 1]
24. [Plot: both axes from 0 to 1] prec_m = 0.55, prec_f = 0.45; 45% + 55%
25. [Plot: both axes from 0 to 1] 0.75; 40%; 9%
26. [...] train/test: [...] (>3000 [...])
27. test: [...]?
28. [...] train set [...] test set
30. [Seven-item list] 4. [...] (train/test) [...] 7. [...] Big Data?
31. [...] (grid): 2 [...]; 1: [...]
32. [...] user_id [...] 30 [...] datanode, RAM, HDFS [...] MR [...]; 2: [...]
33. [...] 3: [...]
34. 1 000 000 000 [...] 300 [...]; 2 [...]?
36. ~400 [...] 10 [...] 1000 [...]
37. [...] N [...] Grid. X [...]
38. [...] M [...] Grid. Y [...]
40. [Six-item list] [...] MapReduce
41. map (id, data):
        write (feature, sex)
    reduce (feature, sexList):
        write (feature, length (males (sexList)) / length (sexList))
42. # [...]
    map (feature, ratio):
        write (count (feature))
    reduce (countsList):
        write (sum (countsList))
    # [...] 1
    map (id, data):
        write (feature, sex)
    reduce (feature, sexList):
        write (feature, length (males (sexList)) / length (sexList))
        count ("features", +1)
43. # N-[...]
    map (feature, ratio):
        write (ratio)
    reduce (ratioList):
        for i in 1 .. N-1:
            write (ratioList [i / N * count])
44. Train: 2 [...]; Full: 300 [...]; Mahout random forest
45. Random Forest
46. Random Forest = [...] + [...] + [...]
47. [...]: Mahout; [...]: Random Forest
48. [...] % [...] Random Forest
49. FreeBSD; 70 [...]; 8-16 cores, 64 [...] RAM, 4 HDD; NN, JT; HDFS; Python, Java, R, shell; Hive; streaming; pydoit
50. [...] HDFS; firewall; HDFS; Java heap size; FreeBSD/Linux
51. [...] (TNS): 82%; 83%; 79%; (3 [...]) 55%
52. train; Python + Hive
54. [email protected]@mezentsev.org
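
The MapReduce pseudocode on slides 41 and 43 can be tried out locally. What follows is a minimal single-process Python sketch under stated assumptions, not the authors' Hadoop job: the helper `run_mapreduce`, the record layout (`sex`, `features`), the `"m"`/`"f"` labels, and the sample users are all hypothetical and chosen only for illustration. Step 1 computes the share of male users for each feature; step 2 reads the slide-43 loop as picking N-1 cut points from the sorted ratios, which is an interpretation, since the slide's own comment did not survive extraction.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy in-memory MapReduce (hypothetical helper): map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):
        results.extend(reducer(out_key, groups[out_key]))
    return results


# Step 1 (slide 41): share of male users per feature.
def map_feature_sex(user_id, data):
    for feature in data["features"]:
        yield feature, data["sex"]


def reduce_male_ratio(feature, sex_list):
    males = sum(1 for sex in sex_list if sex == "m")
    yield feature, males / len(sex_list)


# Step 2 (slide 43): N-1 cut points over the sorted ratios.
# On a cluster all ratios would be sent to a single reducer; here the
# "reduce" is simply a function over the full list.
def ratio_cut_points(feature_ratios, n):
    ratios = sorted(ratio for _, ratio in feature_ratios)
    count = len(ratios)
    # Slide pseudocode: ratioList[i / N * count] for i in 1 .. N-1,
    # implemented with integer arithmetic.
    return [ratios[i * count // n] for i in range(1, n)]


# Assumed sample input: (user_id, {"sex": ..., "features": [...]}).
users = [
    (1, {"sex": "m", "features": ["auto", "news"]}),
    (2, {"sex": "f", "features": ["news", "fashion"]}),
    (3, {"sex": "m", "features": ["auto", "sport"]}),
    (4, {"sex": "f", "features": ["fashion"]}),
]

feature_ratios = run_mapreduce(users, map_feature_sex, reduce_male_ratio)
cut_points = ratio_cut_points(feature_ratios, n=2)
```

With this sample, `feature_ratios` holds one `(feature, male_share)` pair per feature, and `cut_points` splits the sorted shares into `n` groups of roughly equal size. The same mapper and reducer functions could be wired into a Hadoop streaming job, but that wiring is outside the scope of this sketch.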