cassandra meetup 20150331

Post on 30-Jul-2015

@WrathOfChris github.com/WrathOfChris . blog.wrathofchris.com

Time Series Metrics with Cassandra

About Me

• Chris Maxwell

• @WrathOfChris

• Sr Systems Engineer @ Ubiquiti Networks

• Cloud Guy

• DevOps

Mission

• Metrics service for internal services

• Deliver 90/60/30 days of system and app metrics

• Gain experience with Cassandra

History: Ancient Designs

Aging Tools

Pitfalls

https://flic.kr/p/6pqVnP

Graphite (v1)

• Single instance

• carbon-relay + 2-4 carbon-cache processes (one per CPU)

Graphite (v1)

Problems:

• Single point of SUCCESS!

• Can grow to 16-32 cores, but I/O saturation

• Carbon write-amplifies 10x (flushes every 10s)

Graphite (v2)

• Frontend: carbon-relay

• Backend: carbon-relay + 4x carbon-cache

• m3.2xlarge ephemeral SSD

• Manual consistent-hash by IP

• Replication 3

Graphite (v2)

Problems:

• Kind of like a Dynamo, but not

• Replacing node requires full partition key shuffle

• Adding 5 nodes took 6 days on 1Gbps to re-replicate ring

• Less than 50% disk free means pain during reshuffle
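A back-of-envelope check of that reshuffle, assuming the 1 Gbps link stayed roughly saturated for the full 6 days (an assumption, not a measured figure):

```python
# Back-of-envelope: data moved while re-replicating the ring over
# 1 Gbps for 6 days, assuming the link stayed roughly saturated.
gbps = 1
mb_per_s = gbps * 1000 / 8          # ~125 MB/s of payload
seconds = 6 * 86400                 # 6 days
tb_moved = mb_per_s * seconds / 1e6
print(round(tb_moved, 1))           # roughly 64.8 TB shuffled across the ring
```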

Limitations

• Cloud Native

• Avoid Manual Intervention

• Ephemeral SSD > EBS

https://flic.kr/p/2hZy6P

Design: What we set out to build

https://flic.kr/p/2spiXb

Graphite (v3): …it got complicated…

Graphite (v3)

Ingest:

• carbon-c-relay (https://github.com/grobian/carbon-c-relay)

• cyanite (https://github.com/pyr/cyanite)

• cassandra

Graphite (v3)

Retrieval:

• graphite-api (https://github.com/brutasse/graphite-api)

• grafana (https://github.com/grafana/grafana)

• cyanite (https://github.com/pyr/cyanite)

• elasticsearch (metric path cache)

Journey: Lessons learned along the way

https://flic.kr/p/hjY15L

Size Tiered Compaction

• Sorted String Table (SSTable) is an immutable data file

• New data written to small SSTables

• Periodically merged into larger SSTables

Size Tiered Compaction

• Merge 4 similarly sized SSTables into 1 new SSTable

• Data migrates into larger SSTables that are less-regularly compacted

• Disk space required: sum of the 4 largest SSTables
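That headroom rule can be sketched as a quick calculation (SSTable sizes below are made up for illustration):

```python
# Sketch: size-tiered compaction merges ~4 similarly sized SSTables,
# and the inputs are only removed after the merged output is written,
# so worst-case free space needed is the sum of the 4 largest tables.
def compaction_headroom_gb(sstable_sizes_gb, merge_width=4):
    return sum(sorted(sstable_sizes_gb, reverse=True)[:merge_width])

sizes_gb = [120, 115, 110, 100, 20, 5, 1]   # hypothetical SSTable sizes
print(compaction_headroom_gb(sizes_gb))     # 445 GB of headroom needed
```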

Size Tiered Compaction

• Updating a partition frequently may cause it to be spread between SSTables

• Metrics workload writes to all partitions, every period

Size Tiered Compaction

• Metrics workload writes to all partitions, every period

• Range queries that spanned 50+ SSTables !!!

Size Tiered Compaction

• Getting to the older data…

• Ingest 25% more data

• Major Compaction:

• Requires 50% free space

• Compacts all SSTables into 1 large SSTable

Aside: DELETE

• DELETE is the INSERT of a TOMBSTONE to the end of a partition

• INSERTs with TTL become tombstones in the future

• Tombstones live for at least gc_grace_seconds

• Data is only deleted during compaction
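The purge rule above can be sketched as a predicate (names are illustrative, not the Cassandra API):

```python
GC_GRACE_SECONDS = 10 * 86400   # Cassandra's default: 10 days

# Sketch: a tombstone may only be dropped once gc_grace_seconds has
# elapsed since the delete, and even then only when a compaction
# actually rewrites the SSTable(s) holding it.
def tombstone_purgeable(deleted_at, now, in_compaction):
    return in_compaction and (now - deleted_at) >= GC_GRACE_SECONDS

print(tombstone_purgeable(0, 5 * 86400, True))    # False: inside grace period
print(tombstone_purgeable(0, 11 * 86400, True))   # True: grace elapsed
print(tombstone_purgeable(0, 11 * 86400, False))  # False: no compaction ran
```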

https://flic.kr/p/35RACf

gc_grace_seconds: Grace is getting something you don’t deserve (time to nodetool repair a node that is down)

gc_grace_seconds: deleted data reappears!

Time To Live

• INSERT with TTL becomes tombstone after expiry

• 10s for 6 hours

• 60s for 3 days

• 300s for 30 days

https://flic.kr/p/6Fxv7M

TTL

• gc_grace_seconds is 10 days (by default)

• 10s for 6 hours → 10.25 days on disk

• 60s for 3 days → 13 days

• 300s for 30 days → 40 days
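Those on-disk lifetimes follow directly from TTL + gc_grace_seconds:

```python
GC_GRACE_DAYS = 10   # Cassandra default gc_grace_seconds, in days

# Sketch: an expired TTL'd cell lingers as a tombstone for
# gc_grace_seconds before a compaction can drop it, so total
# on-disk lifetime is roughly TTL + gc_grace.
def days_on_disk(ttl_days):
    return ttl_days + GC_GRACE_DAYS

for label, ttl in [("6 hours", 0.25), ("3 days", 3), ("30 days", 30)]:
    print(label, "->", days_on_disk(ttl), "days")
# 6 hours -> 10.25, 3 days -> 13, 30 days -> 40
```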

https://flic.kr/p/gBLHYf

https://flic.kr/p/4LNiXg

https://flic.kr/p/35RACf

1.4TB Disks

Levelled Compaction

based on Google’s LevelDB implementation

Levelled Compaction

• Data is ingested at Level 0

• Immediately compacted and merged with L1

• Partitions are merged up to Ln

• 90% of partition data guaranteed to be in same level

Levelled Compaction

• Metrics workload writes to all partitions, every period

• Immediately rolled up to L1

• Immediately rolled up to L2

• Immediately rolled up to L3

• Immediately rolled up to L4

• Immediately rolled up to L5

Levelled Compaction

• Metrics workload writes to all partitions, every period

• 1 batch of writes → 5 writes
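A rough sketch of why that happens, assuming the common LCS sizing (10x level fanout, ~160 MB SSTables); the table size is hypothetical:

```python
import math

# Sketch (assumed LCS defaults): levels grow 10x, built from ~160 MB
# SSTables. When every partition is touched every period, each batch
# climbs through every level, costing roughly one rewrite per level.
def levels_reached(table_gb, sstable_mb=160, fanout=10):
    sstables = table_gb * 1024 / sstable_mb
    return max(1, math.ceil(math.log(sstables, fanout)))

table_gb = 1000                      # hypothetical 1 TB table
print(levels_reached(table_gb))      # 4 levels: ~4-5 rewrites per ingest batch
```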

Increasing Write rate

Constant Ingest rate

Increasing Write rate

Constant Ingest rate

https://flic.kr/p/4LNiXg

compaction_throughput_mb_per_sec: 128

…then 0 (unlimited)

Speeding Compactions… Don’t Do This…

multithreaded: true

cassandra_in_memory_compaction_limit_in_mb: 256M

Date Tiered Compaction

Date Tiered Compaction

• Written by Björn Hegerfors at Spotify

• Experimental!

• Released in 2.0.11 / 2.1.1

• Group data by time

• Compact by time

• Drop expired data by time

Compact SSTables by date window

– but the docs say 8GB maximum heap!

MAX_HEAP_SIZE=16G
HEAP_NEWSIZE=2048M

– Rick Branson, Instagram

http://www.slideshare.net/planetcassandra/cassandra-summit-2014-cassandra-at-instagram-2014

-XX:+CMSScavengeBeforeRemark

-XX:CMSMaxAbortablePrecleanTime=60000

-XX:CMSWaitDuration=30000

All systems normal

Inadvertently tested 30,000 writes/sec during launch

Cloud Native

http://wattsupwiththat.com/2015/03/17/spaceship-lenticular-cloud-maybe-the-coolest-cloud-picture-evah/

Cloud Native: Ec2MultiRegionSnitch

Cloud Native: Ephemeral RAID0

-Djava.io.tmpdir=/mnt/cassandra/tmp

Disable AutoScaling Terminate Process:

aws autoscaling suspend-processes --scaling-processes Terminate

Cloud Native: This design works to 50 instances per region

Security Groups: IAM instance-profile role

Security Group + (per region) Security Group

Management (OpsCenter): IAM instance-profile role

Security Group + (per region) Security Group

Internode Encryption

server_encryption_options:
  internode_encryption: all

• keytool -genkeypair -alias test-cass -keyalg RSA -validity 3650 -keystore test-cass.keystore

• keytool -export -alias test-cass -keystore test-cass.keystore -rfc -file test-cass.crt

• keytool -import -alias test-cass -file test-cass.crt -keystore test-cass.truststore

Seeds: Cheated…

Seeds

• selects first 3 nodes from each region using Autoscale Group order

• ignores (self) as a seed for bootstrapping first 3 nodes in each region
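That seed selection can be sketched as follows (a hypothetical helper, not the actual tooling; IPs are made up):

```python
# Sketch: choose the first 3 instances per region, in autoscaling-group
# order, as Cassandra seeds; exclude the local node so the first 3
# nodes in a region can bootstrap without seeding themselves.
def pick_seeds(instances_by_region, self_ip, per_region=3):
    seeds = []
    for region in sorted(instances_by_region):
        head = instances_by_region[region][:per_region]
        seeds.extend(ip for ip in head if ip != self_ip)
    return seeds

asg = {   # hypothetical ASG ordering per region
    "us-east-1": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "us-west-2": ["10.1.0.1", "10.1.0.2", "10.1.0.3"],
}
print(pick_seeds(asg, self_ip="10.0.0.2"))
# ['10.0.0.1', '10.0.0.3', '10.1.0.1', '10.1.0.2', '10.1.0.3']
```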

General

• >= 4 Cores per node always

• >= 8 Cores as soon as feasible

• EC2 sweet spots:

• m3.2xlarge (8c/160GB) for small workloads

• i2.2xlarge (8c/1.6TB) for production

• Avoid c3.2xlarge - CPU:Mem ratio is too high

Breaking News! Dense-storage Instances for EC2

Questions?

d2 instances: Joining a node - system/network

d2 instances: Joining a node - disk performance

General: Metrics

General: Cassandra Metrics

Metrics: CPU - DateTiered

Metrics: JVM - DateTiered

Metrics: Compaction/CommitLog - DateTiered
