jinchao demo v3
Motivation
• Twitter carries a rich flow of information
• There is no effective way to query Twitter
• It is hard to monitor topics of interest in real time
Search Tweets Like a Professional
A real-time Twitter search engine that lets you search based on:
• Keywords
• Country
• Language
• Negative words
Demo (http://searchyourtweet.info:5000/input)
Keep an eye on your topics of interest
• Not just searching historical tweets
• Express your interest, and we will keep you updated on the newest events
• More technical detail on this later
• Video (https://youtu.be/GdRmXNfukos)
Data pipeline
[Diagram. Components: Query Controller, Backend Database, Percolator, Logic Layer, Frontend, Searching database, Data Backup, Pub/Sub. Arrow labels: publish, matching query, register query, searching.]
Challenge
Connect backend data pipeline:
◦ How to connect Kafka with ElasticSearch?
◦ Tried the elasticsearch-river-kafka plugin, not successful
◦ Solution: use Logstash!
◦ Advantages:
  ◦ Easy to use
  ◦ Highly scalable
  ◦ Works with different data sources and destinations
An example of Logstash and a queue in a production environment
Challenge
Percolator:
◦ Use case: alerting on and monitoring documents
◦ Think of it as "search in reverse"
  ◦ Users register queries with the percolator
  ◦ The percolator matches incoming documents against the registered queries
◦ How to design the percolator data pipeline?
◦ How to decouple the backend database from the frontend server?
  ◦ Use the publish/subscribe design pattern
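To make "search in reverse" concrete, here is a minimal toy sketch in plain Python. It is not the ElasticSearch percolator API: the `ToyPercolator` and `RegisteredQuery` names, and the simple whitespace tokenization, are assumptions made only for illustration. Queries are stored first, and each incoming document is then checked against all of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RegisteredQuery:
    """A stored query: required keywords plus words that must not appear.

    Hypothetical structure; keywords are expected in lowercase.
    """
    query_id: str
    keywords: Set[str]
    negative_words: Set[str] = field(default_factory=set)

    def matches(self, text: str) -> bool:
        # Naive whitespace tokenization, for illustration only.
        tokens = set(text.lower().split())
        has_all_keywords = self.keywords <= tokens
        has_negative_word = bool(self.negative_words & tokens)
        return has_all_keywords and not has_negative_word


class ToyPercolator:
    """Stores queries first, then matches each incoming document against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, RegisteredQuery] = {}

    def register(self, query: RegisteredQuery) -> None:
        self._queries[query.query_id] = query

    def percolate(self, text: str) -> List[str]:
        """Return the ids of all registered queries that the document matches."""
        return sorted(
            q.query_id for q in self._queries.values() if q.matches(text)
        )


percolator = ToyPercolator()
percolator.register(RegisteredQuery("nba", {"warriors"}, {"trade"}))
percolator.register(RegisteredQuery("tech", {"kafka"}))

# The document arrives after the queries were registered.
matched = percolator.percolate("Warriors win again tonight")
```

In a normal search the documents are indexed and the query arrives later; here the roles are swapped, which is why a percolator suits monitoring a stream of new tweets.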
Percolator Pipeline
[Diagram. Components: Percolator, Query database, Twitter database, Controller, Pub/Sub. Arrow labels: new incoming tweets, publish, subscribe, open channel.]
• When a user registers a topic of interest, query_controller constructs the percolator query from it and passes it to the ElasticSearch percolator. query_controller also opens a Redis channel for this topic.
• query_controller fetches the latest tweets from ElasticSearch every 5 s (current setting) and sends them to the percolator for matching.
• For each tweet, the percolator tells us whether it matches any registered query. query_controller pushes the tweet to the right Redis channel based on this information.
• On the frontend, the Flask server subscribes to the Redis channel and receives the percolator's updates.
• For this demo, to keep the frontend UI simple, all tweets are directed to the default Redis channel.
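The routing step described above can be sketched roughly as follows. This is not the project's actual query_controller code: `InMemoryPubSub` is a stand-in for Redis so the example runs without a server, and `run_cycle`, `DEFAULT_CHANNEL`, and the `percolate` callback are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

DEFAULT_CHANNEL = "default"  # hypothetical name for the demo's single channel


class InMemoryPubSub:
    """Stand-in for Redis pub/sub so the sketch runs without a server."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Deliver the message and return the number of subscribers reached."""
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])


def run_cycle(
    tweets: Iterable[str],
    percolate: Callable[[str], List[str]],
    bus: InMemoryPubSub,
    use_default_channel: bool = True,
) -> int:
    """Percolate one batch of tweets and publish every match.

    Returns the number of messages published in this cycle.
    """
    published = 0
    for tweet in tweets:
        for query_id in percolate(tweet):
            # In the demo setup every match goes to the default channel;
            # otherwise each registered query gets its own channel.
            channel = DEFAULT_CHANNEL if use_default_channel else query_id
            bus.publish(channel, tweet)
            published += 1
    return published


received: List[str] = []
bus = InMemoryPubSub()
bus.subscribe(DEFAULT_CHANNEL, received.append)  # plays the role of the Flask server


def percolate(tweet: str) -> List[str]:
    # Hypothetical matcher: a single registered query for the keyword "kafka".
    return ["kafka-topic"] if "kafka" in tweet.lower() else []


count = run_cycle(["Learning Kafka today", "lunch time"], percolate, bus)
```

In a real deployment one would presumably call `run_cycle` in a loop with a 5 s sleep between batches, and replace the stand-in with a Redis client's publish and subscribe calls.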
Data flow of percolator
Challenge
• Real-time updates on the frontend:
  ◦ How to keep pushing Redis messages from the Flask server to the client in real time (solved with a very hacky solution)
• Constructing the ElasticSearch query
• Fine-tuning ElasticSearch (not enough time to fine-tune the ElasticSearch mapping)
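As an illustration of the query-construction step, the four search criteria (keywords, country, language, negative words) could map onto an ElasticSearch bool query roughly as below. This is a sketch, not the query the demo actually sends: the field names `text`, `country`, and `lang` and the helper `build_search_query` are assumptions, and the `filter` clause assumes an ElasticSearch version whose bool query supports it.

```python
from typing import Any, Dict, List, Optional


def build_search_query(
    keywords: str,
    country: Optional[str] = None,
    language: Optional[str] = None,
    negative_words: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a bool query body from the search form's inputs.

    Field names ("text", "country", "lang") are hypothetical and depend
    on the index mapping.
    """
    # Keywords are scored full-text matches on the tweet text.
    bool_query: Dict[str, Any] = {"must": [{"match": {"text": keywords}}]}

    # Country and language are exact-value filters that do not affect scoring.
    filters: List[Dict[str, Any]] = []
    if country:
        filters.append({"term": {"country": country}})
    if language:
        filters.append({"term": {"lang": language}})
    if filters:
        bool_query["filter"] = filters

    # Negative words exclude any tweet whose text matches them.
    if negative_words:
        bool_query["must_not"] = [{"match": {"text": negative_words}}]

    return {"query": {"bool": bool_query}}


body = build_search_query("kafka", language="en", negative_words="spam")
```

The resulting dictionary is meant to be sent as the JSON body of a search request. Whether `term` filters behave as intended depends on how the fields are mapped, which is the kind of mapping fine-tuning mentioned above.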
About Me
M.Math, University of Waterloo
◦ Field: Statistics and Machine Learning
B.S., University of Toronto
◦ Field: Applied Mathematics
Data Scientist Intern, Neon Inc., San Francisco
Back-end Model Developer, MetricAid Inc., Toronto
Strong interest in Deep Learning:
◦ Convolutional Network, Recurrent Network
◦ Applying Deep Learning in NLP