LivePerson DLD 2015
![Page 1: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/1.jpg)
DLD. Tel-Aviv. 2015
Making Scale a Non-Issue for Real-Time Data Apps
Vladi Feigin, LivePerson
Kobi Salant, LivePerson
![Page 2: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/2.jpg)
Agenda
Intro
About LivePerson
Digital Engagements
Call Center Use Case
Architecture Zoom-In
![Page 3: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/3.jpg)
Bio
Vladi Feigin
System Architect at LivePerson
18 years in software development
Interests: distributed computing, data, analytics and martial arts
![Page 4: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/4.jpg)
Bio
Kobi Salant
Data Platform Tech Lead at LivePerson
25 years in software development
Interests: application performance, traveling and coffee
![Page 5: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/5.jpg)
LivePerson
We do Digital Engagements
Agile and very technological
Real Big Data and Analytics company
Really cool place to work in
One of the SaaS pioneers
6 Data Centers across the world
Founded in 1995, a public company since 2000 (NASDAQ: LPSN)
More than 18,000 customers worldwide
More than 1000 employees
![Page 6: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/6.jpg)
LivePerson technology stack
![Page 7: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/7.jpg)
We are Big Data
1.4 Million concurrent visits
1 Million events per second
2 billion site visits per month
27 million live engagements per month
Data freshness SLA (RT flow): up to 5 seconds
![Page 8: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/8.jpg)
Visitor
![Page 9: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/9.jpg)
Agent
![Page 10: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/10.jpg)
Visitor
![Page 11: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/11.jpg)
Agent
![Page 12: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/12.jpg)
Call Center Operating
Digital engagement requires operating a call center in the most efficient way
How do you operate a call center in the most efficient way? Provide operational metrics … in real time
What are the challenges? Huge scale, load peaks, real-time calculations, high data freshness SLA
![Page 13: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/13.jpg)
Call Center Operating
![Page 14: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/14.jpg)
Architecture. Real-Time data flow
Diagram components: producers (agent, sess., chat, conv., other); Kafka (fast topic, consistent topic); Storm; Cassandra; ElasticSearch; CouchBase; API; batch layer (Hadoop); custom apps.
![Page 15: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/15.jpg)
Chat History. Example
Diagram components: producers (agent, sess., chat); Kafka (fast topic, consistent topic); Storm; ElasticSearch; MR job; API.
Very low latency, 99.5% of data
High latency, 99.999% of data
![Page 16: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/16.jpg)
Data Producers. Requirements
Real time
“Five nines” persistence
Small footprint
No interference with service
Multiple producers & platforms
Monolithic to service oriented
Diagram label: many more services
![Page 17: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/17.jpg)
Data Producers. Lessons learned
Hundreds of services
Complex rollouts
Minimal logic to avoid painful fixes
Audit streaming? Split to buckets
Real time and “five nines” persistence are incompatible
![Page 18: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/18.jpg)
Data Producers. Flow
Diagram labels: persist message to local disk (local file); Kafka Bridge; send message to Kafka; Kafka (fast topic, consistent topic); Kafka resilience; real-time customers; offline customers.
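One plausible reading of this flow, sketched in Python: every event is first persisted to a local file, a separate bridge forwards the file to the consistent topic, and the same event is sent best-effort to the fast topic. This is only an illustration under assumptions. `FakeKafka`, `Producer`, `KafkaBridge` and the spool-file layout are hypothetical names, not LivePerson's code and not a Kafka client API.

```python
import json
import os
import tempfile


class FakeKafka:
    """In-memory stand-in for a Kafka cluster (hypothetical, for illustration)."""

    def __init__(self):
        self.topics = {"fast": [], "consistent": []}
        self.available = True

    def send(self, topic, message):
        if not self.available:
            raise ConnectionError("Kafka unavailable")
        self.topics[topic].append(message)


class Producer:
    """Sends each event on two paths: durable (local disk) and best-effort (fast)."""

    def __init__(self, kafka, spool_path):
        self.kafka = kafka
        self.spool_path = spool_path

    def publish(self, event):
        line = json.dumps(event)
        # Consistent path: persist to local disk first so the event survives outages.
        with open(self.spool_path, "a", encoding="utf-8") as spool:
            spool.write(line + "\n")
            spool.flush()
            os.fsync(spool.fileno())
        # Fast path: fire and forget; a failure must not interfere with the service.
        try:
            self.kafka.send("fast", line)
        except ConnectionError:
            pass


class KafkaBridge:
    """Forwards spooled lines to the consistent topic and remembers its position."""

    def __init__(self, kafka, spool_path):
        self.kafka = kafka
        self.spool_path = spool_path
        self.position = 0  # byte offset of the next unsent line

    def drain(self):
        sent = 0
        with open(self.spool_path, "rb") as spool:
            spool.seek(self.position)
            while True:
                raw = spool.readline()
                if not raw:
                    break
                try:
                    self.kafka.send("consistent", raw.decode("utf-8").rstrip("\n"))
                except ConnectionError:
                    break  # retry from the same position on the next drain
                self.position = spool.tell()
                sent += 1
        return sent


spool_file = os.path.join(tempfile.mkdtemp(), "events.log")
kafka = FakeKafka()
producer = Producer(kafka, spool_file)
bridge = KafkaBridge(kafka, spool_file)

producer.publish({"id": 1})
kafka.available = False
producer.publish({"id": 2})  # the fast path drops it, the local file keeps it
kafka.available = True
bridge.drain()
print(len(kafka.topics["fast"]), len(kafka.topics["consistent"]))
```

In this sketch the fast topic can lose events during an outage while the consistent topic eventually receives all of them, which matches the split between low-latency and high-persistence flows described in the deck.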
![Page 19: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/19.jpg)
Data Model Framework
Why Avro:
Schema-based evolution
Performance - untagged bytes
HDFS ecosystem support
Lessons Learned:
Schema evolution breaks
Big schema (ours is over 65k) not recommended
Avoid deep nesting and multiple unions
Need a framework
Chaos – Non-Schema space delimited
Order – Avro Schema
![Page 20: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/20.jpg)
Framework Flow
1. Event is created according to Avro Schema version 3.5
2. Schema is registered into the repository (once)
3. Value 3.5 is written to header
4. Event is encoded with schema version 3.5 and added to message
5. Message is sent to Kafka
6. Message is read by consumer
7. Header is read from message
8. Schema is retrieved from repository according to schema version
9. Event is decoded using the proper Avro schema
10. Decoded event is processed
Diagram labels: Consumer; Repository; schema version 3.5
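The flow above can be sketched as a version header in front of an untagged payload. This is a minimal illustration, not the actual framework: `SchemaRepository`, `encode_message`, `decode_message` and the one-byte length prefix are hypothetical, and JSON stands in for Avro binary encoding so that the example runs with the standard library only.

```python
import json
import struct


class SchemaRepository:
    """In-memory stand-in for the schema repository (hypothetical API)."""

    def __init__(self):
        self._schemas = {}

    def register(self, version, schema):
        # Registering the same version again is a no-op ("once").
        self._schemas.setdefault(version, schema)

    def get(self, version):
        return self._schemas[version]


def encode_message(version, event, schema, repository):
    """Steps 1-4: register the schema, write the version header, encode the event."""
    repository.register(version, schema)
    header = version.encode("ascii")
    # JSON stands in for Avro here; like Avro, the payload carries values only,
    # without field names, so it cannot be decoded without the schema.
    body = json.dumps([event[name] for name in schema["fields"]]).encode("utf-8")
    # Layout: one length byte, the version string, then the payload.
    return struct.pack("B", len(header)) + header + body


def decode_message(message, repository):
    """Steps 7-9: read the header, fetch the schema by version, decode the event."""
    (header_len,) = struct.unpack_from("B", message, 0)
    version = message[1:1 + header_len].decode("ascii")
    schema = repository.get(version)
    values = json.loads(message[1 + header_len:].decode("utf-8"))
    return version, dict(zip(schema["fields"], values))


repository = SchemaRepository()
schema_3_5 = {"fields": ["visitor", "text"]}
message = encode_message("3.5", {"visitor": "v1", "text": "hi"}, schema_3_5, repository)
print(decode_message(message, repository))
```

Because the consumer looks the schema up by the version found in each message, producers on different schema versions can share a topic as long as every version is registered.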
![Page 21: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/21.jpg)
Apache Kafka
More than 15 billion events a day
More than 1 million events per second
Hundreds of producers & consumers
Why Kafka?
Scale where traditional MQs fail
Industry standard for big data log messaging
Reliable, flexible and easy to use
Deployment:
We have 15 clusters across the world
Our biggest cluster has 8 nodes with more than 6TB (Avro + Kafka compression)
Maximum retention of 72 hours
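A quick back-of-the-envelope check relates the two volume figures on this slide. It assumes, which the slide does not state, that "1 million events per second" is a peak rate while "15 billion events a day" is a daily total; both are lower bounds ("more than"), so the result is only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 15_000_000_000      # "more than 15 billion events a day"
peak_events_per_second = 1_000_000   # assumed to be a peak, not an average

average_events_per_second = events_per_day / SECONDS_PER_DAY
peak_to_average = peak_events_per_second / average_events_per_second

print(f"average: {average_events_per_second:,.0f} events/s")  # average: 173,611 events/s
print(f"peak / average: {peak_to_average:.2f}")               # peak / average: 5.76
```

Under that assumption the peak is roughly six times the daily average, which is the kind of load peak the call center slides list as a challenge.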
![Page 22: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/22.jpg)
Apache Kafka. Lessons Learned
Scale horizontally for hardware resources and vertically for throughput
Look at trends of network & IO & Kafka's JMX statistics
Chart labels: Partitions; Servers; Bytes in
![Page 23: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/23.jpg)
Apache Kafka. Lessons Learned cont.
Know your data and message sizes:
Large messages can break you
Data growth can overfill your capacity
Set the right configuration
Adding or removing a broker is not trivial
Decide on single or multiple topics
![Page 24: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/24.jpg)
Apache Storm
Why Storm?
Growing community with good integration to Kafka
At the time, it was the leading product
Easy development and customization
The POC was successful
Deployment:
We have 6 clusters across the world
Our biggest cluster has more than 30 nodes
We have 20 topologies on a single cluster
Uptime of months for a single topology
![Page 25: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/25.jpg)
Apache Storm. Typical topology
Diagram labels: Storm topology with KAFKA SPOUT, FILTER BOLT and WRITER BOLT; fetch; emit; ack; write; commit; Zookeeper; Kafka fast topic.
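The topology on this slide can be approximated by a small single-process simulation: a spout fetches from the fast topic, a filter bolt drops unwanted events, a writer bolt stores the rest, and the offset is committed only after the tuple is acked. This is a sketch under assumptions and does not use the Storm API; `KafkaSpout`, `filter_bolt`, `run_topology` and the "chat" filter rule are hypothetical.

```python
class KafkaSpout:
    """Fetches messages from a topic and commits offsets once they are acked."""

    def __init__(self, topic):
        self.topic = topic      # a list standing in for a Kafka partition
        self.next_offset = 0
        self.committed = 0      # stand-in for the offset stored in Zookeeper
        self.acked = set()

    def fetch(self):
        if self.next_offset >= len(self.topic):
            return None
        offset = self.next_offset
        self.next_offset += 1
        return offset, self.topic[offset]

    def ack(self, offset):
        self.acked.add(offset)
        # Commit only the contiguous prefix of acked offsets, so a restart
        # never skips a message that was not fully processed.
        while self.committed in self.acked:
            self.acked.remove(self.committed)
            self.committed += 1


def filter_bolt(message):
    """Keeps only chat events (an assumed rule); returns None for the rest."""
    return message if message.get("type") == "chat" else None


def run_topology(spout, sink):
    """Drives spout -> filter bolt -> writer bolt until the topic is drained."""
    while (fetched := spout.fetch()) is not None:
        offset, message = fetched
        kept = filter_bolt(message)   # spout emits to the filter bolt
        if kept is not None:
            sink.append(kept)         # writer bolt writes to the store
        spout.ack(offset)             # dropped tuples are acked as well


topic = [{"type": "chat", "id": 1}, {"type": "agent", "id": 2}, {"type": "chat", "id": 3}]
spout = KafkaSpout(topic)
store = []
run_topology(spout, store)
print([m["id"] for m in store], spout.committed)
```

Committing only acked offsets gives at-least-once delivery: after a failure the spout resumes from the last committed offset and may replay some messages, so the writer should tolerate duplicates.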
![Page 26: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/26.jpg)
Apache Storm. Lessons learned
Develop SDK and educate R&D
Where did my topology run last week?
What is my capacity over time?
Know your bolts: they must return a timely answer
Coding is easy, performance is hard
Use isolation
Chart label: Capacity
![Page 27: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/27.jpg)
Apache Storm. Lessons learned cont.
Use local shuffling
Use Ack
Diagram labels: Worker A and Worker B, each with KAFKA SPOUT, FILTER BOLT, WRITER BOLT, ACKER BOLT and COMM BOLT; local emit.
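"Local shuffling" most likely refers to Storm's local-or-shuffle grouping, where a tuple goes to a task of the target bolt in the same worker process when one exists and is shuffled across all tasks otherwise. The Python sketch below only illustrates that selection rule; `local_or_shuffle` and the task-to-worker mapping are hypothetical and not part of the Storm API.

```python
import random


def local_or_shuffle(source_worker, target_tasks, rng=random):
    """Pick a target task, preferring tasks in the sender's own worker.

    target_tasks maps a task id to the worker it runs in.
    """
    local = [task for task, worker in target_tasks.items() if worker == source_worker]
    # Fall back to a plain shuffle over all tasks when nothing is local.
    candidates = local or list(target_tasks)
    return rng.choice(candidates)


filter_tasks = {"filter-1": "A", "filter-2": "B"}
print(local_or_shuffle("A", filter_tasks))  # always filter-1: stays inside Worker A
print(local_or_shuffle("B", filter_tasks))  # always filter-2: stays inside Worker B
```

Keeping the hop inside one worker avoids serialization and a network transfer between workers, which is the benefit the "Local emit" arrows in the diagram point at.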
![Page 28: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/28.jpg)
Summary
No one-size-fits-all solution
Ask product for a clearly defined SLA
Separate fast and consistent data flows - they don’t merge!
Use schema for a data model - keep it flat and small
Kafka rules! It’s reliable and fast - use it
Storm has its toll. For some use-cases we would be using Spark Streaming today
![Page 29: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/29.jpg)
THANK YOU!
We are hiring
http://www.liveperson.com/company/careers
Q/A
![Page 30: Liveperson DLD 2015](https://reader035.vdocuments.pub/reader035/viewer/2022081414/589c57261a28abc4358b4e3d/html5/thumbnails/30.jpg)
YouTube.com/LivePersonDev
Twitter.com/LivePersonDev
Facebook.com/LivePersonDev
Slideshare.net/LivePersonDev