Tweetool (0.1.100 version) Final Report
Yilei Qian, Computer Science, University of Southern California, [email protected]
A Twitter Recommendation System Based on Topic Modeling
Ideas
• Following too many accounts on Twitter
• Too much news every day
• Cannot find the interesting and valuable news
• Users don’t know the names of the accounts they want to follow
• Need someone to recommend who to follow
• Need someone to recommend the hottest news
• Use topic modeling to re-rank all the users
Traditional Method
Topic Modeling
• A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.
• Widely used in natural language processing.
Reference Papers:
Steyvers, M. and Griffiths, T., “Probabilistic topic models,” Handbook of Latent Semantic Analysis
Blei, D. M., Ng, A. Y., and Jordan, M. I., “Latent Dirichlet Allocation,” The Journal of Machine Learning Research, 2003
Label-based LDA
Steps:
1. Build the LDA model
2. Train the model instance on the training documents
3. Run LDA over all the data using the trained model instance
Problems:
1. Punctuation marks, e.g. “”,.={}() …
2. Frequent words, e.g. I, you, …
3. Other noise
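The noise problems listed above (punctuation marks, frequent words, other noise) are usually handled by cleaning each tweet before it reaches the topic model. The following is a minimal sketch in Python, not the project's actual Java code. The stop-word list, the treatment of URLs, @mentions and the "RT" marker as "other noise", and the function name `clean_tweet` are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list; the report only names "I" and "you" as examples.
STOP_WORDS = {"i", "you", "the", "a", "an", "and", "to", "of", "is", "in"}

# Assumed examples of "other noise": URLs, @mentions, and the retweet marker.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NOISE_TOKENS = {"rt"}

# Keeping only letters, digits, and apostrophes drops all punctuation marks.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_tweet(text):
    """Return the lowercase tokens of a tweet with noise removed."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    tokens = []
    for raw in TOKEN_RE.findall(text.lower()):
        token = raw.strip("'")
        if token and token not in STOP_WORDS and token not in NOISE_TOKENS:
            tokens.append(token)
    return tokens
```

The cleaned token lists would then serve as the documents for training and inference, so the same function should be applied to both the training file and the crawled tweets.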
Result Generation
1. By Angle
Value =
2. By Distance
Value =
13-Dimension Topics
1. Art & Design
2. Book
3. Business
4. Charity
5. Entertainment
6. Family
7. Fashion
8. Food & Drink
9. Health
10. Music
11. News
12. Science & Technology
13. Sports
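The formulas for the two scoring options named earlier (by angle and by distance) did not survive in this transcript. The sketch below shows one common way to score 13-dimension topic vectors, assuming that "angle" means cosine similarity and "distance" means Euclidean distance. Both choices, and the helper names `topic_vector` and `rank_users`, are assumptions and not necessarily what Tweetool implements.

```python
import math

TOPICS = [
    "Art & Design", "Book", "Business", "Charity", "Entertainment",
    "Family", "Fashion", "Food & Drink", "Health", "Music", "News",
    "Science & Technology", "Sports",
]


def topic_vector(weights):
    """Turn a {topic name: weight} dict into a 13-dimension list."""
    return [float(weights.get(topic, 0.0)) for topic in TOPICS]


def cosine_similarity(u, v):
    """Assumed 'by angle' score: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Assumed 'by distance' score: 0.0 means identical vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_users(interest, user_vectors, by="angle"):
    """Return user names ordered from best to worst match."""
    if by == "angle":
        # Larger similarity is better, so sort on the negated score.
        return sorted(user_vectors,
                      key=lambda name: -cosine_similarity(interest, user_vectors[name]))
    if by == "distance":
        # Smaller distance is better.
        return sorted(user_vectors,
                      key=lambda name: euclidean_distance(interest, user_vectors[name]))
    raise ValueError("by must be 'angle' or 'distance'")
```

For example, `rank_users(topic_vector({"Music": 1.0}), users)` would place accounts whose tweets are mostly about music ahead of accounts that mostly tweet about sports.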
Languages & Tools
• Web UI: HTML + AJAX (unfinished) + CSS (unfinished) + Twitter REST API
• Android UI: Java, Android 2.1 (unfinished)
• Server Side: Java 1.6, Servlet 2.0, Spring 3.0, Hibernate 3.3
• Twitter API: Twitter4j 2.2.1 (300 requests per hour)
• Server: Tomcat 7.0.8
• Database: MySQL 5.5
• Data Format: JSON
• Development Platform: Eclipse 3.4
• Total code lines: 2000(+) + 2421 + 462 = 5000(+)
• Subversion: http://tweetool-yilei.googlecode.com/svn/trunk/tweetool-yilei-read-only
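The limit of 300 requests per hour noted for the Twitter API works out to one request every 3600 / 300 = 12 seconds. A minimal sketch of a throttle that spaces requests accordingly is shown below. It is written in Python for brevity and does not use Twitter4j; the class name `RequestThrottle` and its parameters are hypothetical.

```python
import time


class RequestThrottle:
    """Spaces out calls so they are at least 3600 / max_per_hour seconds apart."""

    def __init__(self, max_per_hour=300, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / max_per_hour
        self._clock = clock    # injectable so the throttle can be tested
        self._sleep = sleep
        self._last = None      # time at which the previous request was allowed

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A crawler would call `wait()` immediately before each API request. With several crawler machines sharing one account, each machine would need a proportionally smaller `max_per_hour`.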
Architecture
[Diagram components: DB, Twitterfetch, LLDA, Tweetool, Hibernate DAO, Work Flow, Servlets, Application Context, Mobile Device, HTML]
Distributed Crawler & Computing
Problems (endless T_T)
1. High noise in the topic model
• Few words, odd marks, abbreviations
2. Unfamiliar with the Twitter API; a lot of bugs
3. Transaction problems
4. The ugly UI
5. Poor performance
6. Not enough time; many functions are unfinished
7. The Tweetool system should be reconstructed!
Environment: 7000+ users, 22,0000+ tweets
Future Work
1. Try to finish it
2. Debug
3. Build a better training file
4. Add a feedback function
5. Better topic classification
Web UI (Design Version)
Android UI
[Mockup: Main Menu screen with a title and four function buttons; News Menu screen with a title and a list of news items]