elastic @deezer
Post on 08-Apr-2017
Elastic @Deezer
Aurélien Saint Requier, Search Data Scientist
ELASTIC @DEEZER
Table of contents
/01 Where?
/02 Elasticsearch architecture
/03 Querying Elasticsearch
/04 ELK stack for analysis
/01 Where?
For search features
For chart and new release features
For recommendation features
/02 Elasticsearch architecture
Elasticsearch architecture: Our needs
● Search and recommend across:
○ 3 million artists
○ 5 million albums
○ 50 million tracks
○ 2 million playlists
● Search and recommend content based on
○ metadata and other features
○ tag description
● New releases should become available in less than 2 hours
● Queries must be answered in less than 100 ms
Elasticsearch architecture: Overview
Elasticsearch architecture: Data workflow
How do we deploy full indexes in production?
1. Get JSON data from the Hadoop cluster (using WebHDFS)
2. Index documents on mastersearch (using the ES bulk API)
3. Package the new index:
3.1. Compress the ES index directory
3.2. Generate a deployment script
4. Copy the package to the temporary node of each cluster (using assassin, a homemade rsync deploy script)
5. Run the deployment script:
5.1. Start a temporary ES instance and load the new index
5.2. Set the required number of replicas
5.3. Wait until the data is replicated, then shut down the temporary ES instance
5.4. Warm the new index
5.5. Switch the alias to the new index and close the old index
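Steps 2, 5.2 and 5.5 each correspond to one Elasticsearch REST request body. The sketch below only builds those bodies, and sending them is left out. The index names, alias name, document shape and helper names are hypothetical and do not come from the slides. The `_type` entry is needed on Elasticsearch 1.x (the version mentioned later in the deck) and was removed in later major versions, so it is optional here.

```python
import json


def bulk_payload(index, docs, doc_type=None):
    """Build the newline-delimited body for POST /_bulk (step 2)."""
    lines = []
    for doc in docs:
        meta = {"_index": index, "_id": doc["id"]}
        if doc_type is not None:
            meta["_type"] = doc_type  # only for versions that still use types
        # One action line, followed by the document source on the next line.
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def replica_settings(replicas):
    """Body for PUT /<index>/_settings (step 5.2)."""
    return {"index": {"number_of_replicas": replicas}}


def alias_switch(alias, old_index, new_index):
    """Body for POST /_aliases (step 5.5); both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

Because the alias is moved in a single request, clients that query through the alias never see a moment where it points at no index.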
/03 Querying Elasticsearch
How do we analyze musical data?
Use custom analyzers
Black Pearl (He's A Pirate) [feat. Sidney Housen] - EP
The Black Eyed Peas
● Lowercase, asciifolding and char filters, music field synonyms
● edge_ngram tokenizer
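A minimal sketch of index settings that declare the two analysis styles named on this slide. The analyzer, filter and tokenizer names, the `&` mapping rule, the synonym group and the gram sizes are all assumptions for illustration. The actual Deezer configuration is not shown in the transcript.

```python
def music_analysis_settings(min_gram=1, max_gram=20):
    """Return an index `settings` body with two custom analyzers."""
    return {
        "analysis": {
            "char_filter": {
                # Hypothetical rule: rewrite "&" before tokenizing.
                "music_chars": {"type": "mapping", "mappings": ["& => and"]},
            },
            "filter": {
                # Hypothetical synonym group for featuring credits.
                "music_synonyms": {
                    "type": "synonym",
                    "synonyms": ["feat, featuring, ft"],
                },
            },
            "tokenizer": {
                # Emits the leading prefixes of each word.
                "prefix_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": min_gram,
                    "max_gram": max_gram,
                    "token_chars": ["letter", "digit"],
                },
            },
            "analyzer": {
                # Full-text analysis: char filter, then token filters.
                "music_text": {
                    "type": "custom",
                    "char_filter": ["music_chars"],
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding", "music_synonyms"],
                },
                # Prefix analysis for search-as-you-type fields.
                "music_prefix": {
                    "type": "custom",
                    "tokenizer": "prefix_tokenizer",
                    "filter": ["lowercase", "asciifolding"],
                },
            },
        }
    }
```

Keeping synonyms out of the prefix analyzer is a design choice of this sketch: synonym rules work on whole tokens, while the prefix analyzer deliberately produces partial ones.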
How do we search our data?
● Using an internal Java Elasticsearch plugin
● Using the Multi Search API and Query DSL
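The Multi Search API takes several searches in one request, as alternating header and body lines. A minimal sketch of building such a request body, assuming hypothetical index names (`artists`, `albums`, `tracks`) and a hypothetical `name` field:

```python
import json


def match_query(field, text, size=5):
    """A basic Query DSL search body: full-text match on one field."""
    return {"query": {"match": {field: text}}, "size": size}


def msearch_body(searches):
    """Build the newline-delimited body for POST /_msearch.

    `searches` is a list of (index_name, search_body) pairs.
    """
    lines = []
    for index, body in searches:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(body))              # search body line
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


# One round trip that searches three content types for the same text.
payload = msearch_body(
    [(name, match_query("name", "black eyed")) for name in ("artists", "albums", "tracks")]
)
```

The response contains one result set per search, in the same order as the request lines.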
How do we recommend our data?
● Using function score queries
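A function score query rescales the relevance score of a base query with one or more functions. The transcript does not say which functions Deezer uses. The sketch below assumes a `field_value_factor` on a hypothetical numeric `popularity` field, and adds a small helper that reproduces the resulting arithmetic.

```python
import math


def popularity_boosted_query(text, field="title", popularity_field="popularity"):
    """Match query whose score is multiplied by log10(1 + popularity)."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: text}},
                "functions": [
                    {
                        "field_value_factor": {
                            "field": popularity_field,
                            "modifier": "log1p",  # log10(1 + factor * value)
                            "factor": 1.0,
                        }
                    }
                ],
                # Multiply the text score by the function result.
                "boost_mode": "multiply",
            }
        }
    }


def boosted_score(text_score, popularity, factor=1.0):
    """Score that the query above would assign, for illustration only."""
    return text_score * math.log10(1 + factor * popularity)
```

With these settings a document whose popularity is 0 gets a final score of 0, so a real setup would likely need a floor or a different `boost_mode`.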
How do we explore our data?
● Using aggregations
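As an illustration, a `terms` aggregation counts documents per distinct value of a field. The sketch below builds such a request and reads the buckets back from a response. The aggregation name `top_values` and any field passed in are hypothetical.

```python
def top_terms_aggregation(field, size=10):
    """Search body that returns only the most frequent values of `field`."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {"top_values": {"terms": {"field": field, "size": size}}},
    }


def read_buckets(response):
    """Extract (value, document count) pairs from a search response."""
    buckets = response["aggregations"]["top_values"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]
```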
Some feedback
● In numbers:
○ More than 25 million queries a day, around 5000 queries / minute
○ Around 95% of queries respond in less than 100 ms
● Lessons learned:
○ Be careful with fielddata usage
○ Big JVM ES instance = long GC times
○ Avoid prefix queries: use the edge-ngram tokenizer and match queries*
● In the future:
○ Use a dedicated client/data/master architecture
○ Stop fuzzy queries (replaced by a "Did you mean" approach)*
○ Migrate to Elasticsearch v2
*https://www.elastic.co/blog/elasticsearch-queries-or-term-queries-are-really-fast
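The idea behind replacing prefix queries with edge n-grams is to pay the cost at index time: every prefix of every word is stored as its own term, so a prefix lookup becomes an exact term lookup. A toy in-memory sketch of that idea follows. It is not Elasticsearch code, and it requires every query word to match (AND), which is a simplification chosen here.

```python
from collections import defaultdict


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Leading prefixes of `token`, from min_gram up to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(titles, min_gram=1, max_gram=10):
    """Inverted index mapping each edge n-gram to the ids of matching titles."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for word in title.lower().split():
            for gram in edge_ngrams(word, min_gram, max_gram):
                index[gram].add(doc_id)
    return index


def match(index, text):
    """Exact term lookup per query word; no prefix scan at query time."""
    words = text.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


titles = ["The Black Eyed Peas", "Black Pearl", "Pearl Jam"]
index = build_index(titles)
hits = match(index, "bla pe")  # ids of titles with a word starting "bla" and one starting "pe"
```

The query text is looked up as typed and is not itself split into n-grams, which mirrors the usual practice of using a plain analyzer at search time.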
/04 ELK for analysis
Use of ELK
● Elasticsearch v1.7.5:
○ cluster of 3 nodes
○ indexes logs from Logstash and homemade scripts
○ around 2 billion documents
● Logstash 1.5
● Kibana v4.1.1
○ 26 dashboards / 189 visualisations
● Tools:
○ curator for index retention
○ elasticdump for saving Kibana settings
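Index retention means dropping time-based indices once they are older than a chosen window. The sketch below shows only the selection logic, assuming Logstash's default daily naming (`logstash-YYYY.MM.DD`). It is not curator's API, and the function name and retention window are hypothetical.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days, prefix="logstash-"):
    """Return the daily indices older than `keep_days` days before `today`."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a log index (for example ".kibana")
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired
```

The returned names would then be passed to whatever deletes or closes indices in a given setup.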
Use cases: Monitoring
Use cases: Analysis of what our users search
Thanks for your attention
We are hiring!
jobs.deezer.com
Questions?