Cassandra vs HBase

Cassandra vs HBase: Similarities and differences in the architectural approaches

Uploaded by: dmitri-babaev

Posted on: 15-Jan-2015


DESCRIPTION

Cassandra Moscow meetup, April 2013

TRANSCRIPT

Page 1: Cassandra vs HBase

Cassandra vs HBase
Similarities and differences in the architectural approaches

Page 2: Cassandra vs HBase

Foundation papers

● The Google File System; Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

● Bigtable: A Distributed Storage System for Structured Data; Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

● Dynamo: Amazon’s Highly Available Key-value Store; Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels

Page 3: Cassandra vs HBase

Agenda

● Storage: LSM trees
● Data distribution in the cluster
● Fault tolerance

Page 4: Cassandra vs HBase

Log-structured merge tree layout

Page 5: Cassandra vs HBase

Log-structured merge tree

● Writes are aggregated in memory and then flushed to disk in one batch
  ○ The memtable is effectively a write-behind cache
  ○ A write-ahead log (disk commit log) is used to protect in-memory data from node failures
● In-memory entries are asynchronously persisted as a single segment (file) of records sorted by key
  ○ The segments are asynchronously merged together in order to keep the segment count around log(number of records)
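
To make the write path concrete, here is a minimal Java sketch (the LsmStore class, its file names, and the flush threshold are all hypothetical, not Cassandra's or HBase's actual internals): a write is appended to the commit log first, buffered in a sorted memtable, and flushed as one sorted segment file once the memtable is large enough.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified LSM write path: WAL + memtable + flush to a sorted segment.
class LsmStore {
    private final TreeMap<String, String> memtable = new TreeMap<>(); // sorted write-behind cache
    private final FileWriter commitLog;                               // write-ahead log
    private final int flushThreshold = 1000;
    private int segmentCounter = 0;

    LsmStore() throws IOException {
        commitLog = new FileWriter("commit.log", true); // append mode
    }

    void put(String key, String value) throws IOException {
        // 1. Append to the WAL first, so the entry survives a node failure.
        commitLog.write(key + "\t" + value + "\n");
        commitLog.flush();
        // 2. Apply to the in-memory memtable; no random disk I/O on the write path.
        memtable.put(key, value);
        // 3. Flush the whole memtable as one batch when it is big enough.
        if (memtable.size() >= flushThreshold) {
            flushSegment();
        }
    }

    private void flushSegment() throws IOException {
        // TreeMap iterates in key order, so the segment comes out sorted by key.
        try (FileWriter segment = new FileWriter("segment-" + segmentCounter++ + ".dat")) {
            for (Map.Entry<String, String> e : memtable.entrySet()) {
                segment.write(e.getKey() + "\t" + e.getValue() + "\n");
            }
        }
        memtable.clear(); // the commit log can be truncated at this point as well
    }
}
```

A background process then merges these immutable segments, which is exactly what makes the pattern friendly to append-only storage.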

Page 6: Cassandra vs HBase

Why the LSM tree is a good fit for HBase

● The LSM tree suits HDFS well
  ○ The LSM tree writes data in large batches
  ○ SSTables are immutable

Page 7: Cassandra vs HBase

LSM tree problems

● Relatively slow reads
  ○ The requested key can be in any segment, hence all of them may need to be checked
    ■ Key cache (Cassandra)
    ■ Bloom filters can be used to skip some of the files (see the sketch after this list)
      ● They are prone to false positives
● Early versions of HDFS had no support for an append operation
  ○ Append is required for the write-ahead log
  ○ hflush in HDFS 0.21 allows flushing written data without closing the file
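
As referenced in the list above, a bloom filter lets a read skip most segments. A rough Java sketch (a hand-rolled filter with two hash probes, purely illustrative; real implementations size the bit set and pick hash counts from the expected key count and target false-positive rate):

```java
import java.util.BitSet;

// Hypothetical per-segment bloom filter: it may answer "maybe present" for an
// absent key (false positive), but never "absent" for a present key.
class SegmentBloomFilter {
    private static final int SIZE = 1 << 16;
    private final BitSet bits = new BitSet(SIZE);

    private int probe(String key, int seed) {
        return Math.floorMod(key.hashCode() * 31 + seed, SIZE);
    }

    void add(String key) {
        bits.set(probe(key, 1));
        bits.set(probe(key, 2));
    }

    boolean mightContain(String key) {
        // Both probe bits must be set; a single clear bit proves absence.
        return bits.get(probe(key, 1)) && bits.get(probe(key, 2));
    }
}
```

On a read, the store checks mightContain before opening each segment; a false positive only costs one wasted segment lookup, while a present key is never missed.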

Page 8: Cassandra vs HBase

Agenda

● Storage: LSM trees
● Data distribution in the cluster
● Fault tolerance

Page 9: Cassandra vs HBase

Shared nothing architecture

● Each node processes requests for its own shard of data

● It is always known which node is responsible for a particular key

Page 10: Cassandra vs HBase

Cassandra entry distribution

Page 11: Cassandra vs HBase

Cassandra distributed storage

● Consistent hashing (node ring) is used to distribute a column family across the cluster nodes

● A node is responsible for storing the range of keys that hash to values at or below its own number (token) and above the previous node's token
  ○ Node tokens are set explicitly in the config
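
A minimal sketch of that lookup (the TokenRing class is hypothetical; the MD5 token matches what the random partitioner described on a later slide uses): the owner of a key is the first node whose token is greater than or equal to the key's hash, wrapping around to the smallest token at the end of the ring.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical token ring: each node owns the keys whose hashes fall between
// the previous node's token (exclusive) and its own token (inclusive).
class TokenRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>(); // token -> node

    void addNode(String node, BigInteger token) {
        ring.put(token, node);
    }

    String nodeForKey(String key) throws NoSuchAlgorithmException {
        BigInteger token = md5Token(key);
        // First node whose token is >= the key's hash owns the key ...
        Map.Entry<BigInteger, String> owner = ring.ceilingEntry(token);
        // ... wrapping around to the smallest token if we ran off the end.
        return owner != null ? owner.getValue() : ring.firstEntry().getValue();
    }

    static BigInteger md5Token(String key) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest); // non-negative 128-bit token
    }
}
```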

Page 12: Cassandra vs HBase

Virtual nodes

● Virtual nodes are available in Cassandra since v1.2

● No need for manual token assignment
  ○ Data is distributed evenly across the physical nodes
  ○ It is simpler to set what proportion of the data is stored on a particular node
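
With virtual nodes, the same ring simply holds many tokens per physical node, so each node's ownership is scattered in small slices around the ring. A sketch building on the hypothetical TokenRing above (the random token generation is illustrative):

```java
import java.math.BigInteger;
import java.util.Random;

// Hypothetical vnode assignment: many random tokens per physical node means
// many small, evenly spread key ranges instead of one big manually tuned one.
class VnodeAssigner {
    static void assign(TokenRing ring, String node, int numTokens, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < numTokens; i++) {
            byte[] tokenBytes = new byte[16]; // 128-bit token, same width as an MD5 hash
            random.nextBytes(tokenBytes);
            ring.addNode(node, new BigInteger(1, tokenBytes));
        }
    }
}
```

In Cassandra 1.2 this is exposed as the num_tokens setting; giving a node more tokens gives it a proportionally larger share of the data.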

Page 13: Cassandra vs HBase

Cassandra partition strategies

● Random partitioner
  ○ The node is determined by the MD5 hash of the key

● Byte-ordered partitioner
  ○ The node is determined by a number constructed from the first bytes of the key
  ○ Allows range queries
  ○ Prone to uneven data distribution
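
The two partitioners differ only in how the token is derived from the key. A sketch (hypothetical helper methods; truncating the byte-ordered token to 8 bytes is an illustration, not Cassandra's exact scheme):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical partitioner tokens: random scatters keys, byte-ordered preserves order.
class PartitionerTokens {
    // Random partitioner: token = MD5(key), so adjacent keys land on unrelated nodes
    // and the data spreads evenly by construction.
    static BigInteger randomToken(String key) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest);
    }

    // Byte-ordered partitioner: token built from the first bytes of the key itself,
    // so "user:0001" and "user:0002" stay adjacent (range queries work, but a hot
    // key prefix piles up on a single node).
    static BigInteger byteOrderedToken(String key) {
        byte[] firstBytes = Arrays.copyOf(key.getBytes(StandardCharsets.UTF_8), 8);
        return new BigInteger(1, firstBytes);
    }
}
```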

Page 14: Cassandra vs HBase

Cassandra secondary index placement

Page 15: Cassandra vs HBase

HBase region distribution

Page 16: Cassandra vs HBase

HBase distributed storage

● Region meta table
  ○ A region is a continuous range of keys
  ○ The root table stores the regions of the meta table itself
  ○ The master tries to distribute regions evenly across RegionServers
    ■ Regions can be moved between region servers in order to achieve better distribution
      ● Since the actual data is in HDFS, no data is moved during the process

● Secondary attribute queries
  ○ DIY indexes: coprocessors
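
A sketch of the lookup this scheme implies (the RegionLocator class is hypothetical; the real HBase client also caches meta lookups): since a region is a continuous key range identified by its start key, finding the hosting server is a floor lookup over start keys.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical meta-table lookup: each region is a continuous key range,
// represented here by its start key and the server currently hosting it.
class RegionLocator {
    private final TreeMap<String, String> meta = new TreeMap<>(); // start key -> region server

    void registerRegion(String startKey, String regionServer) {
        meta.put(startKey, regionServer);
    }

    // The hosting region is the one with the greatest start key <= the row key.
    String serverForRow(String rowKey) {
        Map.Entry<String, String> region = meta.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers key " + rowKey);
        }
        return region.getValue();
    }
}
```

Reassigning a region to another server only changes this mapping; the region's files stay where they are in HDFS.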

Page 17: Cassandra vs HBase

Region splits/merges

● Initially only one region is allocated for a table

● Uneven region sizes
● Online region splitting
  ○ Data is not copied; the new regions' files just hold links to the data in the old region's files

● Region merging is still unstable

Page 18: Cassandra vs HBase

Agenda

● Storage: LSM trees
● Data distribution in the cluster
● Fault tolerance

Page 19: Cassandra vs HBase

HBase cluster nodes

Page 20: Cassandra vs HBase

HDFS and CAP theorem

● CP
  ○ HDFS replicates data synchronously on write
  ○ A DataNode is considered dead if it is not visible to the NameNode
    ■ Lost block replicas will be restored automatically on live nodes
  ○ A DataNode stops serving requests if the NameNode is lost

Page 21: Cassandra vs HBase

HDFS block replication in cluster

Page 22: Cassandra vs HBase

HDFS block replication

● HDFS tends to store one copy of a block on the same server as the client
  ○ if there is a DataNode on that server

● HDFS Rack Awareness
  ○ one copy on the client's server
  ○ one on the same rack
  ○ one on a different rack
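
A sketch of the placement policy as described on this slide (hypothetical types; the actual HDFS policy is more involved and has varied across versions):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replica placement following the slide's description: first copy
// on the client's own DataNode, second on the same rack, third on another rack.
class BlockPlacement {
    record DataNode(String host, String rack) {}

    static List<DataNode> chooseReplicas(DataNode client, List<DataNode> cluster) {
        List<DataNode> replicas = new ArrayList<>();
        replicas.add(client); // 1st copy: local to the writer, no network hop
        cluster.stream()      // 2nd copy: same rack, different host (cheap transfer)
                .filter(n -> n.rack().equals(client.rack()) && !n.host().equals(client.host()))
                .findFirst().ifPresent(replicas::add);
        cluster.stream()      // 3rd copy: different rack, survives a whole-rack failure
                .filter(n -> !n.rack().equals(client.rack()))
                .findFirst().ifPresent(replicas::add);
        return replicas;
    }
}
```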

Page 23: Cassandra vs HBase

HDFS disadvantages (if used as storage for HBase)

● HDFS requires an additional request to the NameNode in order to find a DataNode storing the required block

● In some cases data has to be transferred from the DataNode to the RegionServer on reads
  ○ HBase does not take the locations of region file blocks into account when it assigns regions to RegionServers

Page 24: Cassandra vs HBase

HBase inter-cluster replication

● Master-slave inter-cluster asynchronous replication

● Requests to a region server are replicated to the slave HBase cluster

Page 25: Cassandra vs HBase

Cassandra and CAP theorem

● AP
  ○ Gossip-style failure detection
    ■ A failed node is still in the ring
      ● A new replica for a data range will be assigned only if the failed node is manually removed
  ○ Async writes
    ■ A node will replicate the write to the appropriate nodes but return to the client immediately

● Can also be "eventually" consistent
  ○ Quorum writes
    ■ Block until a certain number of writes is acknowledged
    ■ But there is no distributed commit protocol
  ○ Quorum reads
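
A coordinator-side sketch of the quorum write (the ReplicaClient interface is hypothetical): the write goes to all replicas, and the call completes once a majority acknowledges. Note what is missing: there is no second phase that could roll back replicas that already applied a write whose quorum was never reached.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical quorum write: send to all N replicas, succeed on floor(N/2)+1 acks.
// No distributed commit protocol: replicas that already applied the write keep it
// even if the overall operation is reported to the client as failed.
class QuorumWriter {
    interface ReplicaClient {
        CompletableFuture<Boolean> write(String key, String value);
    }

    static CompletableFuture<Void> quorumWrite(List<ReplicaClient> replicas,
                                               String key, String value) {
        int quorum = replicas.size() / 2 + 1;
        AtomicInteger acks = new AtomicInteger();
        CompletableFuture<Void> done = new CompletableFuture<>();
        for (ReplicaClient replica : replicas) {
            replica.write(key, value).thenAccept(ok -> {
                if (ok && acks.incrementAndGet() == quorum) {
                    done.complete(null); // quorum reached; later acks change nothing
                }
            });
        }
        // If a quorum never arrives, the caller times out and reports failure,
        // but the replicas that did acknowledge have already persisted the write.
        return done;
    }
}
```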

Page 26: Cassandra vs HBase

Lack of a distributed commit protocol: the issue

1. The client writes to all replicas
2. The write fails on one of the replicas
3. The write operation is reported as failed
4. All of the replicas except one have persisted the "failed" write

Page 27: Cassandra vs HBase

Repair measures for inconsistent writes

● Read repair
  ○ Differences in results are detected on reads from multiple replicas

● Hinted handoff
  ○ A failed write is remembered and retried by the coordinator node

● Anti-entropy
  ○ Manually started replica reconciliation
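
A sketch of the read-repair path (the Replica interface and per-value timestamps are hypothetical; Cassandra resolves conflicts per column by timestamp): the coordinator returns the newest response and writes it back to any replica that returned stale data.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical read repair: resolve replica disagreement by timestamp and
// push the winning value back to the stale replicas.
class ReadRepair {
    record Versioned(String value, long timestamp) {}

    interface Replica {
        Versioned read(String key);
        void write(String key, Versioned value);
    }

    static Versioned readWithRepair(String key, List<Replica> replicas) {
        List<Versioned> responses = replicas.stream().map(r -> r.read(key)).toList();
        Versioned newest = responses.stream()
                .max(Comparator.comparingLong(Versioned::timestamp))
                .orElseThrow();
        for (int i = 0; i < replicas.size(); i++) {
            if (responses.get(i).timestamp() < newest.timestamp()) {
                replicas.get(i).write(key, newest); // repair the stale replica
            }
        }
        return newest;
    }
}
```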

Page 28: Cassandra vs HBase

Cassandra simple replication

Page 29: Cassandra vs HBase

Cassandra network topology based replication

Page 30: Cassandra vs HBase

Cassandra replica placement strategies

● Simple
  ○ The closest neighbor down the ring is selected as a replica

● Network topology based
  ○ Additional replicas are placed by walking the ring clockwise until a node in a different rack is found
    ■ If no such node exists, additional replicas are placed on different nodes in the same rack
  ○ Server-to-DC:rack mappings are set explicitly in the config
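
A sketch of the topology-aware walk described above (hypothetical node and rack model): starting from the node that owns the key, walk the ring clockwise preferring racks not used yet, and fall back to same-rack nodes when there are fewer racks than replicas.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical network-topology replica placement on a ring of nodes.
class TopologyPlacement {
    record Node(String name, String rack) {}

    static List<Node> placeReplicas(List<Node> ring, int primaryIndex, int replicationFactor) {
        List<Node> replicas = new ArrayList<>();
        Set<String> usedRacks = new HashSet<>();
        Node primary = ring.get(primaryIndex);
        replicas.add(primary);
        usedRacks.add(primary.rack());
        // First pass: walk clockwise, taking only nodes in racks not used yet.
        for (int i = 1; i < ring.size() && replicas.size() < replicationFactor; i++) {
            Node candidate = ring.get((primaryIndex + i) % ring.size());
            if (usedRacks.add(candidate.rack())) {
                replicas.add(candidate);
            }
        }
        // Fallback: not enough distinct racks, accept other nodes in used racks.
        for (int i = 1; i < ring.size() && replicas.size() < replicationFactor; i++) {
            Node candidate = ring.get((primaryIndex + i) % ring.size());
            if (!replicas.contains(candidate)) {
                replicas.add(candidate);
            }
        }
        return replicas;
    }
}
```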