elastic @deezer

Post on 08-Apr-2017

Category:

Technology

Elastic @Deezer

Aurélien Saint Requier, Search Data Scientist

ELASTIC @DEEZER

Table of contents

/01 Where?
/02 Elasticsearch architecture
/03 Querying Elasticsearch
/04 ELK stack for analysis

/01 Where?

For search features

For chart and new release features

For recommendation features

/02 Elasticsearch architecture

Elasticsearch architecture: Our needs

● Search and recommend across:
○ 3 million artists
○ 5 million albums
○ 50 million tracks
○ 2 million playlists
● Search and recommend content based on:
○ metadata and other features
○ tag descriptions
● New releases should become available in less than 2 hours
● Queries have to respond in less than 100 ms

Elasticsearch architecture: Overview

Elasticsearch architecture: Data workflow

How do we deploy full indexes in production?

1. Get JSON data from the Hadoop cluster (using WebHDFS)
2. Index documents on mastersearch (using the ES bulk API)
3. Package the new index:
3.1. Compress the ES index directory
3.2. Generate a deployment script
4. Copy the package to the temporary node of each cluster (using assassin, a homemade rsync deploy script)
5. Run the deployment script:
5.1. Start a temporary ES instance and load the new index
5.2. Set the required number of replicas
5.3. Wait until the data is replicated, then shut down the temporary ES instance
5.4. Warm the new index
5.5. Switch the alias to the new index and close the old index
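Steps 2 and 5.5 above rely on two request formats: the newline-delimited bulk body and the `_aliases` actions body. The sketch below only builds those payloads; it is not the deployment script from the talk. The index name, type name, alias and document fields are hypothetical, and sending the requests (HTTP client, host, error handling) is left out.

```python
import json


def build_bulk_payload(index, doc_type, docs):
    """Build the newline-delimited body for POST /_bulk.

    Each document is preceded by an action line naming its target.
    `index`, `doc_type` and the `id` field are illustrative assumptions.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


def build_alias_switch(alias, old_index, new_index):
    """Build the body for POST /_aliases.

    Both actions are sent in one request, so the alias moves to the new
    index in a single step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    docs = [{"id": 1, "title": "Black Pearl"}, {"id": 2, "title": "Pump It"}]
    print(build_bulk_payload("tracks_v2", "track", docs))
    print(json.dumps(build_alias_switch("tracks", "tracks_v1", "tracks_v2"), indent=2))
```

Searching through an alias rather than a concrete index name is what lets the old index be closed afterwards without changing any client.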

/03 Querying Elasticsearch

How do we analyze musical data?

Use custom analyzers

Black Pearl (He's A Pirate) [feat. Sidney Housen] - EP

The Black Eyed Peas

● Lowercase, asciifolding and char filters, music field synonyms

● Edge_ngram tokenizer
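The analyzer definitions shown on the slide did not survive in this transcript, so the following is only a sketch of what such settings could look like. Every name in `ANALYSIS_SETTINGS` (analyzers, filters, the synonym list, the gram sizes) is an assumption, and the tokenizer type is spelled `edgeNGram` as in Elasticsearch 1.x (later versions also accept `edge_ngram`). The small pure-Python functions approximate what lowercase, asciifolding and edge n-grams do to a title; they are not Elasticsearch code.

```python
import re
import unicodedata

# Hypothetical index settings: one analyzer for normalised full tokens,
# one for prefix matching through an edge n-gram tokenizer.
ANALYSIS_SETTINGS = {
    "analysis": {
        "char_filter": {
            "brackets_to_space": {
                "type": "pattern_replace",
                "pattern": "[\\[\\]()]",
                "replacement": " ",
            }
        },
        "filter": {
            "music_synonyms": {"type": "synonym", "synonyms": ["feat, featuring, ft"]}
        },
        "tokenizer": {
            "prefix_tokenizer": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 20,
                "token_chars": ["letter", "digit"],
            }
        },
        "analyzer": {
            "music_text": {
                "type": "custom",
                "char_filter": ["brackets_to_space"],
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding", "music_synonyms"],
            },
            "music_prefix": {
                "type": "custom",
                "char_filter": ["brackets_to_space"],
                "tokenizer": "prefix_tokenizer",
                "filter": ["lowercase", "asciifolding"],
            },
        },
    }
}


def fold(text):
    """Approximate lowercase + asciifolding: lowercase and strip accents."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` whose length is in [min_gram, max_gram]."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(text, min_gram=2, max_gram=20):
    """Approximate the prefix analyzer: fold, split into words, emit prefixes."""
    tokens = re.findall(r"[a-z0-9]+", fold(text))
    return [gram for tok in tokens for gram in edge_ngrams(tok, min_gram, max_gram)]


if __name__ == "__main__":
    print(analyze("The Black Eyed Peas", min_gram=2, max_gram=3))
```

Indexing the prefixes up front is what allows a partially typed artist or album name to be answered with ordinary term lookups.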

How do we search in our data?

● Using an internal Java Elasticsearch plugin

● Using the Multi Search API and the Query DSL
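The query shown on the slide is not part of this transcript. As a rough illustration of the Multi Search request format, the sketch below builds a `_msearch` body that runs one match query per content type in a single round trip. The index names (`artists`, `albums`, `tracks`) and field names are hypothetical.

```python
import json


def match_query(field, text, size=5):
    """A Query DSL search body: a match query on one field, capped at `size` hits."""
    return {"size": size, "query": {"match": {field: text}}}


def build_msearch_body(searches):
    """Build the newline-delimited body for POST /_msearch.

    `searches` is a list of (index_name, search_body) pairs. Each search
    is written as a header line followed by a body line.
    """
    lines = []
    for index, body in searches:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    user_input = "black eyed"
    print(build_msearch_body([
        ("artists", match_query("name", user_input)),
        ("albums", match_query("title", user_input)),
        ("tracks", match_query("title", user_input)),
    ]))
```

The response contains one entry per search, in the same order as the request, so results can be mapped back to their content type by position.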

How do we recommend our data?

● Using function score queries
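The actual scoring functions are not in this transcript. The sketch below shows the general shape of a `function_score` query under assumed field names: documents matching a tag are re-scored with a numeric `popularity` field through `field_value_factor`. Both `tags` and `popularity` are hypothetical, as is the choice of the `log1p` modifier.

```python
def build_recommendation_query(tag, popularity_field="popularity", size=10):
    """Build a search body whose relevance score is adjusted by a numeric field.

    The inner match query selects candidates; the function multiplies
    their score by log(1 + popularity).
    """
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"match": {"tags": tag}},
                "functions": [
                    {
                        "field_value_factor": {
                            "field": popularity_field,
                            "modifier": "log1p",
                            "factor": 1.0,
                        }
                    }
                ],
                # How the function results are combined with each other...
                "score_mode": "sum",
                # ...and then with the score of the inner query.
                "boost_mode": "multiply",
            }
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_recommendation_query("electro swing"), indent=2))
```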

How do we explore our data?

● Using aggregations
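The aggregation from the slide is missing here, so this is only a minimal sketch of one common case: a terms aggregation that counts documents per value of a field, plus a helper that reads the buckets out of a response. The field name `genre` and the sample response are made up for illustration.

```python
def build_terms_aggregation(field, size=10):
    """Build a search body that returns only bucket counts for `field`."""
    return {
        "size": 0,  # no hits, only aggregation results
        "aggs": {"by_" + field: {"terms": {"field": field, "size": size}}},
    }


def read_buckets(response, agg_name):
    """Return (key, doc_count) pairs from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


if __name__ == "__main__":
    print(build_terms_aggregation("genre", size=5))
    # Hand-written response fragment, not real data.
    fake_response = {
        "aggregations": {
            "by_genre": {
                "buckets": [
                    {"key": "rock", "doc_count": 12},
                    {"key": "jazz", "doc_count": 7},
                ]
            }
        }
    }
    print(read_buckets(fake_response, "by_genre"))
```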

Some feedback

● In numbers:
○ More than 25 million queries a day, around 5000 queries / minute
○ Around 95% of queries respond in less than 100 ms
● In lessons:
○ Be careful with fielddata usage
○ Big JVM ES instance = long GC time
○ Avoid prefix queries: use the edge-ngram tokenizer and do match queries*
● In future:
○ Use a dedicated client/data/master architecture
○ Stop fuzzy queries (replaced by a "Did you mean" approach)*
○ Migrate to Elasticsearch v2

*https://www.elastic.co/blog/elasticsearch-queries-or-term-queries-are-really-fast
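To illustrate the lesson about prefix queries, here is a toy inverted index in plain Python; it is not Elasticsearch code. A prefix query has to expand the prefix into every matching term at query time, whereas with edge n-grams written at index time the same search becomes one exact term lookup. The two query bodies at the top use hypothetical field names (`title` and `title.prefix`).

```python
from collections import defaultdict

# Query DSL equivalents, with assumed field names.
PREFIX_QUERY = {"query": {"prefix": {"title": "bla"}}}
MATCH_QUERY = {"query": {"match": {"title.prefix": "bla"}}}


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` whose length is in [min_gram, max_gram]."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def build_index(titles, ngrams=False):
    """Map each term (or each edge n-gram of it) to the ids of matching titles."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for token in title.lower().split():
            for term in (edge_ngrams(token) if ngrams else [token]):
                index[term].add(doc_id)
    return index


def prefix_search(index, prefix):
    """Prefix query: collect every indexed term starting with `prefix`."""
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits


def match_search(ngram_index, text):
    """Match query on an n-gram field: a single exact term lookup."""
    return set(ngram_index.get(text.lower(), set()))


if __name__ == "__main__":
    titles = ["The Black Eyed Peas", "Black Pearl", "Blue"]
    print(prefix_search(build_index(titles), "bla"))
    print(match_search(build_index(titles, ngrams=True), "bla"))
```

Both searches return the same documents; the difference is that the n-gram variant moves the work from query time to index time, at the cost of a larger index.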

/04 ELK for analysis

Use of ELK

● Elasticsearch v1.7.5:
○ Cluster of 3 nodes
○ Indexes logs from Logstash and homemade scripts
○ Around 2 billion documents
● Logstash 1.5
● Kibana v4.1.1
○ 26 dashboards / 189 visualisations
● Tools:
○ curator for index retention
○ elasticdump for saving Kibana settings
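Index retention for time-based log indexes comes down to picking the daily indexes that are older than a cutoff and deleting them. The sketch below shows only that selection logic in plain Python. It does not use curator's API, and it assumes Logstash's default daily naming (`logstash-YYYY.MM.dd`); the retention period is a made-up value.

```python
from datetime import date, datetime, timedelta


def indices_to_delete(index_names, today, keep_days, prefix="logstash-"):
    """Return the daily indexes that are older than `keep_days` days.

    Names that do not follow the `<prefix>YYYY.MM.dd` pattern are ignored,
    so unrelated indexes such as `.kibana` are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logstash-2016.01.01", "logstash-2016.03.30", ".kibana"]
    print(indices_to_delete(names, today=date(2016, 4, 1), keep_days=30))
```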

Use cases: Monitoring

Use cases: Analysis of what our users search for

Thanks for your attention

We are hiring!

jobs.deezer.com

Questions?
