cassandraatebay-120809022318-phpapp01
TRANSCRIPT
-
8/18/2019 cassandraatebay-120809022318-phpapp01
1/30
Cassandra at eBa
Jay Patel Architect, Platform Sys
@pateljay3001
Time left: 29m 59s
-
8/18/2019 cassandraatebay-120809022318-phpapp01
2/30
eBay Marketplaces
97 million active buyers and sellers
200+ million items
2 billion page views each day
80 billion database calls each day
5+ petabytes of site storage capacity
80+ petabytes of analytics storage capacity
-
8/18/2019 cassandraatebay-120809022318-phpapp01
3/30
-
8/18/2019 cassandraatebay-120809022318-phpapp01
4/30
We like Cassandra
Multi-datacenter (active-active)
Availability - No SPOF Scalability
We also utilize MongoDB & HBase
Write perfo
Distributed Hadoop sup
-
8/18/2019 cassandraatebay-120809022318-phpapp01
5/30
Are we replacing RDBMS with No
Some use cases don’t fit well - sparse data, big data,
optional, real-time analytics, …
Many use cases don’t need top-tier set-ups - logging,
Not at all! But, complementing.
-
8/18/2019 cassandraatebay-120809022318-phpapp01
6/30
A glimpse on our Cassandra deploym
Dozens of nodes across multiple clusters
200 TB+ storage provisioned
400M+ writes & 100M+ reads per day, and g
QA, LnP, and multiple Production clusters
-
8/18/2019 cassandraatebay-120809022318-phpapp01
7/30
Use Cases on Cassandra
Social Signals on eBay product & item page
Hunch taste graph for eBay users & items
Time series use cases (many):
Mobile notification logging and tracking
Tracking for fraud detection SOA request/response payload logging
RedLaser server logs and analytics
-
8/18/2019 cassandraatebay-120809022318-phpapp01
8/30
-
8/18/2019 cassandraatebay-120809022318-phpapp01
9/30
-
8/18/2019 cassandraatebay-120809022318-phpapp01
10/30
Why Cassandra for Social Signals?
Need scalable counters
Need real (or near) time analytics on collected socia
Need good write performance
Reads are not latency sensitive
-
8/18/2019 cassandraatebay-120809022318-phpapp01
11/30
Deployment
Topology - NTS
RF - 2:2
Read CL - ONE
Write CL – ONE
Data is backe
to protect aga
software erro
User request has no d
Non-sticky
-
8/18/2019 cassandraatebay-120809022318-phpapp01
12/30
Data Model
depends on query patterns
-
8/18/2019 cassandraatebay-120809022318-phpapp01
13/30
Data Model (simplified)
-
8/18/2019 cassandraatebay-120809022318-phpapp01
14/30
Wait…
Duplicate
Oh, toggle button!
Signal --> De-signal --
-
8/18/2019 cassandraatebay-120809022318-phpapp01
15/30
Yes, eventual consistency!
One scenario that produces duplicate signals in UserLik
1. Signal
2. De-signal (1st operation is not propagated to all re
3. Signal, again (1st operation is not propagated yet
So, what’s the solution? Later…
-
8/18/2019 cassandraatebay-120809022318-phpapp01
16/30
Social Signals, next phase: Real-time Ana
Most signaled or popular items per affinity groups (category, etc
Aggregated item count per affinity group
Example affinity
-
8/18/2019 cassandraatebay-120809022318-phpapp01
17/30
Initial Data Model for real-time analytics
Update counters for both
and all the affinity groups
belongs to
Items
is phy
sorte
coun
-
8/18/2019 cassandraatebay-120809022318-phpapp01
18/30
Deployment, next phase
Topology - NTS
RF - 2:2:2
-
8/18/2019 cassandraatebay-120809022318-phpapp01
19/30
item1
user2
user1
item2
buy
bid
sellwatch
G h i C d
-
8/18/2019 cassandraatebay-120809022318-phpapp01
20/30
Graph in Cassandra
Event consumers listen for site events (sell/bid/buy/watch) & populate gra
30 million+ writes daily
14 billion+ edges already
Batch-oriented reads
(for taste vector updates)
-
8/18/2019 cassandraatebay-120809022318-phpapp01
21/30
Mobile notification logging and tracking
Tracking for fraud detection
SOA request/response payload logging
RedLaser server logs and analytics
li d l
-
8/18/2019 cassandraatebay-120809022318-phpapp01
22/30
A glimpse on Data Model
R dL ki & i i l
-
8/18/2019 cassandraatebay-120809022318-phpapp01
23/30
RedLaser tracking & monitoring console
Th t’ ll b t th
-
8/18/2019 cassandraatebay-120809022318-phpapp01
24/30
Remember the duplicate problem in Use
Let’s see some options we considered to sol
That’s all about the use cases..
O ti 1 M k ‘Lik ’ id t t f U
-
8/18/2019 cassandraatebay-120809022318-phpapp01
25/30
Option 1 – Make ‘Like’ idempotent for U
Remove time (timeuuid) from the composite column
Multiple signal operations are now Idempotent
No need to read before de-signaling (deleting)
X Need timeuuid for orderinAlready have a user with more than 1300 signa
O ti 2 U t i t
-
8/18/2019 cassandraatebay-120809022318-phpapp01
26/30
Option 2 – Use strong consistency
Local Quorum
– Won’t help us. User requests are not geo-load ba(no DC affinity).
Quorum
–Won’t survive during partition between DCs (or, oDC is down). Also, adds additional latency.
X Need to survive!
Option 3 Adapt to eventual consistency
-
8/18/2019 cassandraatebay-120809022318-phpapp01
27/30
Option 3 – Adapt to eventual consistency
If desire survival!
http://www.strangecosmos.com/content/item/101254.html
Adjustments to eventual consistency
-
8/18/2019 cassandraatebay-120809022318-phpapp01
28/30
Adjustments to eventual consistency
De-signal steps:
– Don’t check whether item is already signaled by a user, or not
–Read all (duplicate) signals from UserLike_unordered (new CF to whole row from UserLike)
– Delete those signals from UserLike_unordered and UserLike
Still, can get duplicate signals or false positives as there is a ‘read be
Not a full storyTo shield further, do ‘repair on read’.
Lessons & Best Practices
-
8/18/2019 cassandraatebay-120809022318-phpapp01
29/30
Lessons & Best Practices
• Choose proper Replication Factor and Consistency Level.
– They alter latency, availability, durability, consistency and cost.
– Cassandra supports tunable consistency, but remember strong cons
• Consider all overheads in capacity planning.
– Replicas, compaction, secondary indexes, etc.
• De-normalize and duplicate for read performance.
– But don’t de-normalize if you don’t need to.
• Many ways to model data in Cassandra.
– The best way depends on your use case and query patterns.
More on http://ebaytechblog.com?p=1308
http://ebaytechblog.com/?p=1308http://ebaytechblog.com/?p=1308
-
8/18/2019 cassandraatebay-120809022318-phpapp01
30/30
@pateljay3001
#cassandra12
Thank You