Spotify Teknikdagarna
How to make sense of 150 TB of data, every day
Sebastian [email protected]
Spotify
- 75 million users
- 20 million paying users
- Founded in 2008
- 3+ billion dollars paid to rights holders
- 30+ million tracks
- 1.5+ billion playlists
- 1500+ employees
It is hard to define exactly how many tracks there are. Deduplication is involved, and we have a dedicated team working on it. Once duplicates are counted out, there are about 30 million actual tracks.
What is big data?
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.
High Volume, High Velocity, High Variety
The catalog is 30 million rows, which is not that much in itself, but deduplicating it means pairwise comparison: n² comparisons over those 30 million rows.
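To see why even a "small" 30-million-row catalog becomes a big-data problem, a back-of-the-envelope sketch (the comparison rate is an illustrative assumption, not a slide figure):

```python
# Naive deduplication compares every track with every other: ~n^2 pairs.
n = 30_000_000  # ~30 million catalog rows

comparisons = n * n
print(f"{comparisons:.1e} pairwise comparisons")  # 9.0e+14

# Even at an assumed 10 million comparisons/second on one machine:
rate = 10_000_000
years = comparisons / rate / (60 * 60 * 24 * 365)
print(f"~{years:.1f} years of compute")  # ~2.9 years
```

This is why deduplication in practice needs blocking/clustering heuristics rather than brute-force comparison.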
Combining data sources
- 14 TB of user/service-related log data per day
- Streams/clicks/interactions are being logged
- Expands to 150 TB every day
So how do we do it?
Billions of lines of data every day. We anonymize the data and make sure that all of it is handled according to privacy requirements. A single machine reading at 160 MB/s would need roughly 10 days just to read 150 TB of data.
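The 10-day figure follows directly from the numbers on the slide; as a sketch:

```python
# One day's data vs. one machine's sequential read speed (slide figures).
data_tb = 150      # TB produced per day
read_mb_s = 160    # MB/s read throughput of a single machine

seconds = data_tb * 1_000_000 / read_mb_s  # 1 TB = 1,000,000 MB
days = seconds / (60 * 60 * 24)
print(f"~{days:.1f} days just to read one day's data")  # ~10.9 days
```

Reading alone takes longer than a week, before any processing happens, which is the motivation for distributing the work across a cluster.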
We utilise a cluster of 1600 nodes:
- 60 PB of disk space
- 68 TB of RAM (42 GB per server)
- 30k CPU cores
[Diagram: clients send logs over the Internet to the European and American data centres (7 TB/day each), which feed the Hadoop data centre.]
Logs sent from clients
- Sent to EU/US data centre
Spotify data architecture
Very complex
- Lots of different services
- The app is a small part of everything
- The app does not "just work" on its own
Example of Discovery: logs come from the client, pass through Hadoop into a service that recommends music, and the recommendations surface back to the user.
Approximately 60M users × 4M songs with 40 latent factors, using ALS (alternating least squares).
In short, minimise the cost function:
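The formula itself did not survive in the notes; given the implicit-feedback ALS setup described above, the standard objective (in the style of Hu, Koren & Volinsky) is likely the one intended, with user vectors $x_u$ and song vectors $y_i$ in $\mathbb{R}^{40}$:

```latex
\min_{x_*,\, y_*} \sum_{u,i} c_{ui}\left(p_{ui} - x_u^{\top} y_i\right)^2
  + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)
```

Here $p_{ui}$ indicates whether user $u$ played song $i$, $c_{ui}$ is a confidence weight derived from play counts, and $\lambda$ is a regularisation parameter; ALS alternates between solving for all $x_u$ with the $y_i$ fixed and vice versa.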
Saturday before lunch
Weekday evenings
We track usage, but the breakdown is what matters. That is the reason we save so much data.