Building your own Distributed System The easy way - Cassandra Summit EU 2014

Upload: alprema

Post on 11-Jul-2015

Category:

Technology


TRANSCRIPT

Page 1: Building your own Distributed System The easy way - Cassandra Summit EU 2014
Building Your Own Distributed System The Easy Way

Kévin Lovato - @alprema

What this presentation will NOT talk about

• Gazillions of inserts per second

• Hundreds of nodes

• Migrations from old technology to C* that now go 100 times faster

What this presentation will talk about

• Servers that synchronize their state

• Out of order messages

• CQL Schema design

• Time measurement madness

Introduction

• Hedge fund specialized in algorithmic trading

• ~80 employees

• Our C* usage

  • Historical data (6+ TB)

  • Time series (Metrics)

  • Homemade Service Bus (Zebus)

Service Bus 101

• Network abstraction layer

• Allows communication between services (SOA)

• Communication is enabled using Business level messages (events)

• Usually relies on a broker

Zebus 101

• Developed in .NET

• P2P

• Lightweight

• CQRS oriented

• 1+ year of production experience

• ~150M messages / day

Architecture overview

Terminology

• Peer: A program connected to the Bus

• Subscription: A message type a Peer is interested in

• Directory server: A Peer that knows all the Peers and their Subscriptions

[Diagram: Peer 1, Peer 2 and Peer 3, with Directory 1 and Directory 2]

• Peer 1 is not connected and needs to register on the bus

• Register: Peer 1 registers with a Directory, which replies with the Peers list + Subscriptions

• New Peer information: the other Peers are told about Peer 1

• Direct communication: Peers then talk to each other directly
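The registration flow above can be sketched in a few lines of Python. This is a toy in-memory model, not Zebus code: the `Directory` and `Peer` classes and their method names are made up for illustration, and a single Directory stands in for the pair shown in the diagram.

```python
class Peer:
    """Hypothetical peer: knows its own subscriptions and what the bus told it."""

    def __init__(self, peer_id, subscriptions):
        self.peer_id = peer_id
        self.subscriptions = set(subscriptions)
        self.known_peers = {}  # peer id -> set of message types

    def on_peer_registered(self, peer_id, subscriptions):
        # "New Peer information" pushed by the Directory
        self.known_peers[peer_id] = set(subscriptions)

    def targets_for(self, message_type):
        # "Direct communication": pick the recipients without asking the Directory
        return sorted(pid for pid, subs in self.known_peers.items()
                      if message_type in subs)


class Directory:
    """Hypothetical directory: knows all Peers and their Subscriptions."""

    def __init__(self):
        self.peers = {}

    def register(self, peer):
        # Reply to "Register" with the Peers list + Subscriptions
        snapshot = {pid: set(p.subscriptions) for pid, p in self.peers.items()}
        # Tell the already registered Peers about the newcomer
        for other in self.peers.values():
            other.on_peer_registered(peer.peer_id, peer.subscriptions)
        self.peers[peer.peer_id] = peer
        peer.known_peers = snapshot
        return snapshot
```

After registering, a Peer only needs its local `known_peers` map to route a message, which is what makes the bus peer-to-peer rather than broker-based.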

Design requirements

The Directory servers must be identical (no master)

A peer can contact any of the Directory servers at any time

Directory servers can be updated/restarted at any time

Peers have to be able to add Subscriptions one at a time if needed

Option 1: Design a resilient distributed system

Option 2: Let Cassandra do the heavy lifting

Pick me! Pick me!

How?

I. Make the Directory Servers stateless

• Lets us offload state synchronization to Cassandra (Quorum everywhere)

• Makes restart / crash recovery easy

• Only “business” code in the Directory Server
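Why a quorum on both sides is enough for stateless servers: a minimal sketch, assuming that “Quorum everywhere” means every read and every write uses the QUORUM consistency level. Cassandra's quorum is floor(RF / 2) + 1 replicas, so a read quorum always shares at least one replica with the latest write quorum. The function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer at QUORUM."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int) -> bool:
    """True when any read quorum shares at least one replica with any write quorum."""
    q = quorum(replication_factor)
    return q + q > replication_factor
```

With a replication factor of 3, `quorum(3)` is 2: two replicas acknowledge each write and two answer each read, so whichever Directory server reads next sees the last acknowledged write.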

II. Handle out-of-order subscriptions

Timestamps: Naive implementation (server side)

[Diagram: Peer 1, Directory 1, Directory 2]

• Peer 1 is already registered on the Bus and will need to do multiple Subscription updates

• Peer 1 sends Subscriptions update A

• Peer 1 sends Subscriptions update B

• A delay (network, slow machine, etc.) causes Directory 1 to process the update after Directory 2

• Subscriptions update B is timestamped first. Timestamp: 00:00:01

• Subscriptions update A is timestamped second. Timestamp: 00:00:02

• Stored: Subscriptions update A (the older update wins)

Timestamps: Zebus implementation (client side)

[Diagram: Peer 1, Directory 1, Directory 2]

• Same scenario, but this time using client-side timestamps

• Peer 1 sends Subscriptions update A. Timestamp: 00:00:01

• Peer 1 sends Subscriptions update B. Timestamp: 00:00:02

• The delay voodoo happens again

• Subscriptions update B arrives first. Timestamp: 00:00:02

• Subscriptions update A arrives second. Timestamp: 00:00:01

• Timestamp resolution is handled by C*

• Stored: Subscriptions update B
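The two scenarios can be replayed with a toy last-write-wins cell. This is a sketch of the idea only, not of Cassandra's internals: `LastWriteWinsCell` and `replay` are hypothetical names, timestamps are plain integers, and ties between equal timestamps are not modeled.

```python
class LastWriteWinsCell:
    """Keeps the value carrying the highest timestamp, whatever the arrival order."""

    def __init__(self):
        self.value = None
        self.timestamp = None

    def write(self, value, timestamp):
        if self.timestamp is None or timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp


def replay(arrival_order, client_timestamps, use_client_timestamps):
    """Apply updates in arrival order and return what ends up stored."""
    cell = LastWriteWinsCell()
    for server_clock, update in enumerate(arrival_order, start=1):
        # Server side: stamped on arrival. Client side: stamped by the sender.
        if use_client_timestamps:
            timestamp = client_timestamps[update]
        else:
            timestamp = server_clock
        cell.write(update, timestamp)
    return cell.value
```

With `client_timestamps = {"A": 1, "B": 2}` and the delayed arrival order `["B", "A"]`, server-side stamping stores A while client-side stamping stores B, matching the two walkthroughs above.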

III. Handle subscriptions efficiently

[Diagram: Peer 1, Directory 1]

• A Peer is already registered on the bus, and has subscribed to one event type

Initial subscriptions:

Peer ID   MessageType   Sub. Info
Peer.1    CoolEvent     { misc. Info }

• It now needs to add a new subscription

• It will send all its current subscriptions + the new one

Peer ID   MessageType        Sub. Info
Peer.1    CoolEvent          { misc. Info }
Peer.1    OtherEvent (new)   { misc. Info }

• Now imagine that the peer adds 10 000 subscriptions, one at a time

• The whole list is sent 10 000 times

Peer ID   MessageType        Sub. Info
Peer.1    CoolEvent          { misc. Info }
Peer.1    OtherEvent (new)   { misc. Info }
…10 000 other events…
Peer.1    NthEvent           { misc. Info }

Solution: Transfer subscriptions by message type

Peer ID   MessageType      Sub. Info
Peer.1    NewEvent (1st)   { misc. Info }

Peer ID   MessageType      Sub. Info
Peer.1    NewEvent (2nd)   { misc. Info }

And so on…
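A rough count shows why this matters. The sketch below only counts subscription entries put on the wire, assuming one entry per message type; the function names are illustrative and real message sizes are not modeled.

```python
def entries_sent_full_resend(initial: int, added: int) -> int:
    """Each update resends every current subscription plus the new one."""
    return sum(initial + k for k in range(1, added + 1))


def entries_sent_per_message_type(added: int) -> int:
    """Each update carries only the subscription that changed."""
    return added
```

Starting from 1 subscription and adding 10 000 one at a time, the full resend transmits 50 015 000 entries in total, against 10 000 when transferring by message type.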

But then, how do we store that?

IV. Pick the proper row granularity

• We want to only do upserts (no read-before-write)

• We want Cassandra to use client timestamps to resolve out of order updates

• Subscriptions have to be updatable one by one

One subscription per row

Peer ID   MessageType   Subscription Info
Peer.18   CoolEvent     { misc. Info }
…         …             …

• Primary Key (Peer Id, MessageType)

[Diagram: Peer 1 and Peer 2, with one Directory]

• Peer 1 and 2 need to register on the Bus

• Peer 1 registers with 2 Subscriptions

Peer ID   MessageType   Sub. Info
Peer.1    CoolEvent     { misc. Info }
Peer.1    OtherEvent    { misc. Info }

• Directory starts to write to C*

• Peer 2 registers during the write

• Since insertion was not over, Peer 2 gets an incomplete state

Peer ID   MessageType   Sub. Info
Peer.1    CoolEvent     { misc. Info }
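The incomplete read can be reproduced with a dictionary keyed like the table. This is a sketch under the assumption that the Directory issues one write per row; `register_row_by_row` and `read_peer` are hypothetical helpers, and the generator's `yield` marks the points where another request can sneak in.

```python
def register_row_by_row(table, peer_id, subscriptions):
    """Write one (peer_id, message_type) row per step, pausing after each write."""
    for message_type, info in subscriptions.items():
        table[(peer_id, message_type)] = info
        yield  # another request may be served here


def read_peer(table, peer_id):
    """Return the message types currently stored for a peer."""
    return sorted(mt for (pid, mt) in table if pid == peer_id)


table = {}
writer = register_row_by_row(
    table, "Peer.1", {"CoolEvent": "{ misc. Info }", "OtherEvent": "{ misc. Info }"}
)
next(writer)                              # first row written
seen_by_peer_2 = read_peer(table, "Peer.1")  # Peer 2 registers during the write
for _ in writer:                          # the write finishes afterwards
    pass
complete_state = read_peer(table, "Peer.1")
```

Here `seen_by_peer_2` holds only CoolEvent while `complete_state` holds both rows: with one row per subscription, the list as a whole is not written atomically.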

All subscriptions in one row

Peer ID   All Subscriptions Blob
Peer.18   { blob }
…         …

• Primary Key (Peer Id)

[Diagram: Peer 1, Directory 1, Directory 2]

• Peer 1 is already registered on the Bus and needs to add two Subscriptions

• Peer 1 adds Subscription 1

• Peer 1 adds Subscription 2

• A delay (again!) slows down Directory 1, causing both Subscriptions to be added simultaneously

• Directory 1 gets the state to add Subscription 1 (State: No subscriptions)

• Directory 2 gets the state to add Subscription 2 (State: No subscriptions)

• They both store the updated state to C* (Directory 1 stores Subscription 1, Directory 2 stores Subscription 2)

• Both store only their new subscription

• Stored: Either Subscription 1 or 2 depending on which was the slowest
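This is the classic lost update caused by read-before-write. A minimal sketch, with a dictionary standing in for the one-row-per-peer table; the helper names are made up for illustration.

```python
def read_state(store, peer_id):
    """Read the whole subscriptions blob for a peer."""
    return frozenset(store.get(peer_id, frozenset()))


def write_state(store, peer_id, subscriptions):
    """Overwrite the whole subscriptions blob for a peer."""
    store[peer_id] = frozenset(subscriptions)


def simulate_concurrent_adds():
    store = {}
    # Both Directory servers read before either one writes.
    seen_by_directory_1 = read_state(store, "Peer.1")
    seen_by_directory_2 = read_state(store, "Peer.1")
    # Each writes back "what I read + my new subscription".
    write_state(store, "Peer.1", seen_by_directory_1 | {"Subscription 1"})
    write_state(store, "Peer.1", seen_by_directory_2 | {"Subscription 2"})
    return store["Peer.1"]
```

In this ordering only Subscription 2 survives; swap the two writes and only Subscription 1 does. Either way one of the two additions is silently dropped.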

Solution: Compromise

• We split subscriptions into Static and Dynamic subscriptions

• Static subscriptions cannot be updated one-by-one

• The Dynamic subscriptions list cannot be handled as atomic

• Each type has its own Column Family

Static subscriptions schema

Peer ID   Endpoint            IsUp   […]   StaticSubscriptions
Peer.18   tcp://1.2.3.4:123   true   […]   { blob }
…         …                   …      […]   …

• Primary Key (Peer Id)

Dynamic subscriptions schema

Peer ID   MessageType   Subscription info
Peer.18   UserCreated   { misc. Info }
…         …             …

• Primary Key (Peer Id, MessageType)

V. Miscellaneous bits of “fun”

DateTime.Now

• Calling DateTime.Now twice in a row can (and will) return the same value

• Its resolution is around 10 ms

• We had to create a unique timestamp provider (add 1 tick when called in the same “time bucket”)
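A unique timestamp provider along those lines might look like the sketch below. It is written in Python rather than .NET, it counts 100 ns ticks from the Unix epoch instead of .NET's year-1 origin, and `UniqueTimestampProvider` is a made-up name, so treat it as an illustration of the “add 1 tick” rule rather than the Zebus implementation.

```python
import threading
import time

TICKS_PER_SECOND = 10_000_000  # one tick is 100 ns, as in .NET


class UniqueTimestampProvider:
    """Returns strictly increasing tick values even when the clock stands still."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the behaviour can be tested
        self._last = 0
        self._lock = threading.Lock()

    def next_ticks(self):
        with self._lock:
            now = int(self._clock() * TICKS_PER_SECOND)
            # Same "time bucket" (or clock went backwards): add 1 tick.
            self._last = now if now > self._last else self._last + 1
            return self._last
```

With a frozen clock, `UniqueTimestampProvider(clock=lambda: 1.0)` returns 10000000, 10000001, 10000002 on successive calls instead of the same value three times.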

Cassandra timestamp

• .NET’s DateTime.Ticks is more precise than Cassandra’s timestamps (100 ns vs. 1 µs)

• Our custom time provider ensured uniqueness by adding 1 tick at a time, which was lost in translation
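The loss is easy to see once ticks are converted to microseconds. In the sketch below, `ticks_to_microseconds` shows the truncation, and `next_unique_ticks` shows one possible remedy (stepping by a whole microsecond). The slides do not say which fix was chosen, so that second function is an assumption.

```python
TICKS_PER_MICROSECOND = 10  # 1 µs = 10 ticks of 100 ns


def ticks_to_microseconds(ticks: int) -> int:
    """Convert 100 ns ticks to the microsecond precision of Cassandra timestamps."""
    return ticks // TICKS_PER_MICROSECOND


def next_unique_ticks(last_ticks: int, now_ticks: int) -> int:
    """Hypothetical fix: always advance by at least one full microsecond."""
    step = TICKS_PER_MICROSECOND
    return now_ticks if now_ticks >= last_ticks + step else last_ticks + step
```

Two values that differ by a single tick, such as 10 000 000 and 10 000 001, both become 1 000 000 µs, so Cassandra sees them as the same timestamp. Advancing by 10 ticks instead keeps them distinct after conversion.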

“UselessKey”

• The Directory CF is really small and needs to be retrieved entirely and frequently

• We used a “bool UselessKey” partition key to force sequential storage and squeeze out the last bit of speed we needed

• Primary Key (UselessKey, Peer Id, MessageType)

• You should bench (after a flush) with your real data

UselessKey   Peer ID   MessageType   Subscription info
false        Peer.18   UserCreated   { misc. Info }
…            …         …             …

Summary

When you have multiple servers sharing a state, Cassandra can save you some headaches

Schema design is critical: think it through thoroughly and make sure you understand what is atomic and what is not

Client-provided timestamps can be very useful, but be sure to generate unique timestamps

If you are not using Java, be well aware of the data type differences between your language and Java

Want to see the code?

www.github.com/Abc-Arbitrage

Want to see more code? [email protected]

Questions?