RuSSIR 2010 final

Post on 23-Dec-2014

TRANSCRIPT

1. RuSSIR 2010: Russian Summer School in Information Retrieval, 13-18 September 2010

2. RuSSIR 2007, RuSSIR 2008, RuSSIR 2009

4. Web Data Mining (WDM). Ricardo Baeza-Yates, Yahoo! Research, Barcelona, Spain

5. Web Data Mining

6. Crawling

7. Crawling: PageRank

8. Heavy tail, long tail. Query log mining: AOL (2006), "as is" => scandal. Online?

9. Online query log mining. Background: n, k, k-1; token-based hashing; pairs (u_i, f_q(u_i)); query q

10. Graph mining: preferential attachment ("rich get richer"), prestige, centrality, co-citation, PageRank (+ enhancements), HITS

11. Spam; supporters. What is in the Web? Information; porn; "Get rich now now now!!!"; on-line casinos; free movies; cheap software; "Buy a MBA diploma"; prescription-free drugs; "V!-4-gra"

12. Multimedia Information Retrieval (MMIR). Stefan Rüger, The Open University

17. PROFIT!

18. Distributed Information Retrieval (DIR). Fabio Crestani & Ilya Markov, University of Lugano

20. P2P

21. Tabbed; side-by-side

24. NLP @ Google overview. Multi-Sentence Compression. Katja Filippova, Google Inc.

25. Graph nodes: ... ∪ {Start, End}. Example sentences:
    1. Hillary Clinton wanted to visit China last month but postponed her plans till Monday last week.
    2. Hillary Clinton paid a visit to the People's Republic of China on Monday.
    3. The wife of a former U.S. president Bill Clinton Hillary Clinton visited China last Monday.
    4. Last week the Secretary of State Ms. Clinton visited Chinese officials.

26. [Figure: word graph built from sentences 1-4, running from the start node S to the end node E, with word nodes such as Hillary, Clinton, visited, paid, visit, China, Chinese, officials, Monday, last, week.]

27. Edge e between nodes u and v: freq(e), freq(u), freq(v); k; 8

28. 80; 40

    System                 Gram-2   Gram-1   Gram-0   Avg. Len.
    Baseline (EN)          21%      15%      65%      8 / 28
    Shortest path (EN)     52%      16%      32%      10 / 28
    Shortest path++ (EN)   64%      13%      23%      12 / 28
    Baseline (ES)          12%      15%      74%      8 / 35
    Shortest path (ES)     58%      21%      21%      10 / 35
    Shortest path++ (ES)   50%      21%      29%      12 / 35

    System                 Info-2   Info-1   Info-0   Avg. Len.
    Baseline (EN)          18%      10%      73%      8 / 28
    Shortest path (EN)     36%      33%      31%      10 / 28
    Shortest path++ (EN)   52%      32%      16%      12 / 28
    Baseline (ES)          9%       19%      72%      8 / 35
    Shortest path (ES)     23%      26%      51%      10 / 35
    Shortest path++ (ES)   40%      40%      20%      12 / 35

30. Machine learning

31. (Bosch -> ..., ... -> Yandex); (colour -> color)

32. Open Source + Machine Learning

34. Link Graph Analysis for Adult Images Classification. Evgeny Kharitonov et al.
    Unsupervised Query Segmentation Using Click Data and Dictionaries Information. Julia Kiseleva
    Could we automatically reproduce semantic relations of an Information Retrieval thesaurus? Alexander Panchenko, Center for Natural Language Processing, Catholic University of Louvain
    Tapping Into Sociological Lexicons for Sentiment Polarity Classification. Yelena Mejova, University of Iowa, Iowa City, IA, USA

35. http://romip.ru/russir2010/program.html
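
The multi-sentence compression slides (25-27) describe a graph over the words of related sentences, with added Start and End nodes and frequency counts on nodes and edges, and the results table refers to a "shortest path" system. A minimal sketch of that idea follows. It rests on assumptions that are not in the slides: words are merged purely by lowercased token, each edge is weighted by 1 / freq(e), and no minimum-length or top-k filtering is applied. The names `build_word_graph` and `shortest_compression` are hypothetical.

```python
import heapq
from collections import defaultdict

START, END = "<start>", "<end>"


def build_word_graph(sentences):
    """Count how often word v directly follows word u across all sentences."""
    edge_freq = defaultdict(int)
    for sentence in sentences:
        # Assumption: naive whitespace tokenization, words merged by lowercase form.
        tokens = [START] + sentence.lower().split() + [END]
        for u, v in zip(tokens, tokens[1:]):
            edge_freq[(u, v)] += 1
    return edge_freq


def shortest_compression(sentences):
    """Return the words on the cheapest Start -> End path of the word graph."""
    adjacency = defaultdict(list)
    for (u, v), freq in build_word_graph(sentences).items():
        # Assumption: frequent transitions are cheap, weight = 1 / freq(e).
        adjacency[u].append((v, 1.0 / freq))

    # Dijkstra's algorithm; all weights are positive, so cycles are harmless.
    best = {START: 0.0}
    parent = {}
    heap = [(0.0, START)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == END:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in adjacency[node]:
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))

    if END not in parent:
        return []  # no sentences, so no path

    # Walk back from End to Start, skipping both marker nodes.
    words = []
    node = parent[END]
    while node != START:
        words.append(node)
        node = parent[node]
    words.reverse()
    return words
```

For the three inputs "clinton visited china on monday", "hillary clinton visited china last monday" and "clinton paid a visit to china", the function returns `['clinton', 'visited', 'china']`. A plain shortest path favours very short outputs, so a practical system would likely need a length constraint on top of this sketch.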