cassandra and materialized views

63
Cassandra 2.2 and 3.0 new features DuyHai DOAN Apache Cassandra Technical Evangelist #VoxxedBerlin @doanduyhai

Upload: grzegorz-duda

Post on 17-Feb-2017

460 views

Category:

Software


2 download

TRANSCRIPT

Page 1: Cassandra and materialized views

1

Cassandra 2.2 and 3.0 new features

DuyHai DOAN Apache Cassandra Technical Evangelist

#VoxxedBerlin @doanduyhai

Page 2: Cassandra and materialized views

Datastax

2

•  Founded in April 2010

•  We contribute a lot to Apache Cassandra™

•  400+ customers (25 of the Fortune 100), 450+ employees

•  Headquarter in San Francisco Bay area

•  EU headquarter in London, offices in France and Germany

•  Datastax Enterprise = OSS Cassandra + extra features

Page 3: Cassandra and materialized views

Materialized Views (MV)•  Why ? •  Detailed Impl •  Gotchas

Page 4: Cassandra and materialized views

Why Materialized Views ?•  Relieve the pain of manual denormalization

CREATE TABLE user( id int PRIMARY KEY, country text, …);CREATE TABLE user_by_country( country text, id int, …, PRIMARY KEY(country, id));

4

Page 5: Cassandra and materialized views

CREATE TABLE user_by_country ( country text, id int, firstname text, lastname text, PRIMARY KEY(country, id));

Materialzed View In ActionCREATE MATERIALIZED VIEW user_by_country AS SELECT country, id, firstname, lastnameFROM user WHERE country IS NOT NULL AND id IS NOT NULLPRIMARY KEY(country, id)

5

Page 6: Cassandra and materialized views

Materialzed View SyntaxCREATE MATERIALIZED VIEW [IF NOT EXISTS] keyspace_name.view_name

AS SELECT column1, column2, ...

FROM keyspace_name.table_name

WHERE column1 IS NOT NULL AND column2 IS NOT NULL ...

PRIMARY KEY(column1, column2, ...)

Must select all primary key columns of base table

•  IS NOT NULL condition for now •  more complex conditions in future

•  at least all primary key columns of base table (ordering can be different)

•  maximum 1 column NOT pk from base table

6

Page 7: Cassandra and materialized views

Materialized Views Demo 7

Page 8: Cassandra and materialized views

Materialized View Impl

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

① •  send mutation to all replicas•  waiting for ack(s) with CL

8

Page 9: Cassandra and materialized views

Materialized View Impl

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

② Acquire local lock on base table partition

9

Page 10: Cassandra and materialized views

Materialized View Impl

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

③ Local read to fetch current values

SELECT * FROM user WHERE id=1

10

Page 11: Cassandra and materialized views

Materialized View Impl

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

④ Create BatchLog with

•  DELETE FROM user_by_country WHERE country = ‘old_value’

•  INSERT INTO user_by_country(country, id, …) VALUES(‘FR’, 1, ...)

11

Page 12: Cassandra and materialized views

Materialized View Impl

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

⑤ Execute async BatchLog

to paired view replica with CL = ONE

12

Page 13: Cassandra and materialized views

Materialized View Impl

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

⑥ Apply base table updade locallySET COUNTRY=‘FR’

13

Page 14: Cassandra and materialized views

Materialized View Impl

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

⑦ Release local lock

14

Page 15: Cassandra and materialized views

Materialized View Impl

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

⑧ Return ack to coordinator

15

Page 16: Cassandra and materialized views

Materialized View Impl

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

⑨ If CL ack(s)

received, ack client

16

Page 17: Cassandra and materialized views

MV Failure Cases: concurrent updates

Read base row (country=‘UK’)•  DELETE FROM mv WHERE

country=‘UK’•  INSERT INTO mv …(country)

VALUES(‘US’)•  Send async BatchLog •  Apply update country=‘US’

1) UPDATE … SET country=‘US’ 2) UPDATE … SET country=‘FR’

Read base row (country=‘UK’)•  DELETE FROM mv WHERE

country=‘UK’•  INSERT INTO mv …(country)

VALUES(‘FR’)•  Send async BatchLog •  Apply update country=‘FR’

t0

t1

t2

Without local

lock

17

Page 18: Cassandra and materialized views

MV Failure Cases: concurrent updates

Read base row (country=‘UK’)•  DELETE FROM mv WHERE

country=‘UK’•  INSERT INTO mv …(country)

VALUES(‘US’)•  Send async BatchLog •  Apply update country=‘US’

1) UPDATE … SET country=‘US’ 2) UPDATE … SET country=‘FR’

Read base row (country=‘UK’)•  DELETE FROM mv WHERE

country=‘UK’•  INSERT INTO mv …(country)

VALUES(‘FR’)•  Send async BatchLog •  Apply update country=‘FR’

t0

t1

t2

Without local

lock

18

INSERT INTO mv …(country) VALUES(‘US’)INSERT INTO mv …(country) VALUES(‘FR’)

Page 19: Cassandra and materialized views

MV Failure Cases: concurrent updates

Read base row (country=‘UK’)•  DELETE FROM mv WHERE

country=‘UK’•  INSERT INTO mv …(country)

VALUES(‘US’)•  Send async BatchLog •  Apply update country=‘US’

1) UPDATE … SET country=‘US’ 2) UPDATE … SET country=‘FR’

Read base row (country=‘US’)•  DELETE FROM mv WHERE

country=‘US’•  INSERT INTO mv …(country)

VALUES(‘FR’)•  Send async BatchLog •  Apply update country=‘FR’

With local lo

ck

🔒

🔓 🔒

🔓 19

Page 20: Cassandra and materialized views

MV Failure Cases: failed updates to MV

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

⑤ Execute async BatchLog

to paired view replica with CL = ONE

MV replica d

own

20

Page 21: Cassandra and materialized views

MV Failure Cases: failed updates to MV

C*

C*

C*

C*

C* C*

C* C*

UPDATE user SET country=‘FR’

WHERE id=1

BatchLog replay

MV replica u

p

21

Page 22: Cassandra and materialized views

Materialized View Performance•  Write performance

•  local lock•  local read-before-write for MV à update contention on partition (most of perf hits) •  local batchlog for MV•  ☞ you only pay this price once whatever number of MV •  for each base table update: mv_count x 2 (DELETE + INSERT) extra mutations

22

Page 23: Cassandra and materialized views

Materialized View Performance•  Write performance vs manual denormalization

•  MV better because no client-server network traffic for read-before-write •  MV better because less network traffic for multiple views (client-side BATCH)

•  Makes developer life easier à priceless

23

Page 24: Cassandra and materialized views

Materialized View Performance•  Read performance vs secondary index

•  MV better because single node read (secondary index can hit many nodes)•  MV better because single read path (secondary index = read index + read data)

24

Page 25: Cassandra and materialized views

Materialized Views Consistency•  Consistency level

•  CL honoured for base table, ONE for MV + local batchlog

•  Weaker consistency guarantees for MV than for base table.

•  Exemple, write at QUORUM •  guarantee that QUORUM replicas of base tables have received write•  guarantee that QUORUM of MV replicas will eventually receive DELETE + INSERT

25

Page 26: Cassandra and materialized views

Materialized Views Gotchas•  Beware of hot spots !!!

•  MV user_by_gender 😱

26

Page 27: Cassandra and materialized views

Q & A

! "

27

Page 28: Cassandra and materialized views

User Define Functions (UDF)•  Why ? •  Detailed Impl •  UDAs •  Gotchas

Page 29: Cassandra and materialized views

Rationale•  Push computation server-side

•  save network bandwidth (1000 nodes!)•  simplify client-side code•  provide standard & useful function (sum, avg …)•  accelerate analytics use-case (pre-aggregation for Spark)

29

Page 30: Cassandra and materialized views

How to create an UDF ?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALL ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE language AS $$ // source code here$$;

30

Page 31: Cassandra and materialized views

How to create an UDF ?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE language AS $$ // source code here$$;

An UDF is keyspace-wide

31

Page 32: Cassandra and materialized views

How to create an UDF ?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE languageAS $$ // source code here$$;

Param name to refer to in the code Type = CQL3 type

32

Page 33: Cassandra and materialized views

How to create an UDF ?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE language // jAS $$ // source code here$$;

Always called Null-check mandatory in code

33

Page 34: Cassandra and materialized views

How to create an UDF ?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURN returnTypeLANGUAGE language // javAS $$ // source code here$$;

If any input is null, code block is skipped and return null

34

Page 35: Cassandra and materialized views

How to create an UDF ?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnType LANGUAGE languageAS $$ // source code here$$;

CQL types •  primitives (boolean, int, …) •  collections (list, set, map) •  tuples •  UDT

35

Page 36: Cassandra and materialized views

How to create an UDF ?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE languageAS $$ // source code here$$; JVM supported languages

•  Java, Scala •  Javascript (slow) •  Groovy, Jython, JRuby •  Clojure ( JSR 223 impl issue)

36

Page 37: Cassandra and materialized views

How to create an UDF ?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE languageAS $$ // source code here$$;

37

Page 38: Cassandra and materialized views

UDF Demo

38

Page 39: Cassandra and materialized views

UDA•  Real use-case for UDF

•  Aggregation server-side à huge network bandwidth saving

•  Provide similar behavior for Group By, Sum, Avg etc …

39

Page 40: Cassandra and materialized views

How to create an UDA ?

CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS][keyspace.]aggregateName(type1, type2, …)SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction]INITCOND initCond;

Only type, no param name

State type

Initial state type

40

Page 41: Cassandra and materialized views

How to create an UDA ?

CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS][keyspace.]aggregateName(type1, type2, …)SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction]INITCOND initCond;

Accumulator function. Signature: accumulatorFunction(stateType, type1, type2, …)

RETURNS stateType

41

Page 42: Cassandra and materialized views

How to create an UDA ?

CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS][keyspace.]aggregateName(type1, type2, …)SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction]INITCOND initCond;

Optional final function. Signature: finalFunction(stateType)

42

Page 43: Cassandra and materialized views

How to create an UDA ?

CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS][keyspace.]aggregateName(type1, type2, …)SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction]INITCOND initCond;

UDA return type ? If finalFunction •  return type of finalFunction Else •  return stateType

43

Page 44: Cassandra and materialized views

UDA Demo

44

Page 45: Cassandra and materialized views

Gotchas

C* C*

C*

C*

UDA ①

② & ③

⑤ ② & ③

② & ③

45

Page 46: Cassandra and materialized views

Gotchas

C* C*

C*

C*

UDA ①

② & ③

⑤ ② & ③

② & ③

46

Why do not apply UDF/UDA on replica node ?

Page 47: Cassandra and materialized views

Gotchas

C* C*

C*

C*

UDA ①

② & ③

④ •  apply accumulatorFunction•  apply finalFunction

⑤ ② & ③

② & ③

1.  Because of eventual consistency

2.  UDF/UDA applied AFTER last-write-win logic

47

Page 48: Cassandra and materialized views

Gotchas

48

•  UDA in Cassandra is not distributed !

•  Execute UDA on a large number of rows (106 for ex.) •  single fat partition•  multiple partitions•  full table scan

•  à Increase client-side timeout•  default Java driver timeout = 12 secs•  JAVA-1033 JIRA for per-request timeout setting

Page 49: Cassandra and materialized views

Cassandra UDA or Apache Spark ?

49

Consistency Level

Single/Multiple Partition(s)

Recommended Approach

ONE Single partition UDA with token-aware driver because node local

ONE Multiple partitions Apache Spark because distributed reads

> ONE Single partition UDA because data-locality lost with Spark

> ONE Multiple partitions Apache Spark definitely

Page 50: Cassandra and materialized views

Cassandra UDA or Apache Spark ?

50

Consistency Level

Single/Multiple Partition(s)

Recommended Approach

ONE Single partition UDA with token-aware driver because node local

ONE Multiple partitions Apache Spark because distributed reads

> ONE Single partition UDA because data-locality lost with Spark

> ONE Multiple partitions Apache Spark definitely

Page 51: Cassandra and materialized views

Q & A

! "

51

Page 52: Cassandra and materialized views

New Storage Engine•  Data structure •  Disk space usage

Page 53: Cassandra and materialized views

Pre 3.0 data structure

Map<byte[ ], SortedMap<byte[ ], Cell>>

53

CREATE TABLE sensor_data( sensor_id uuid, date timestamp, sensor_type text, sensor_value double, PRIMARY KEY(sensor_id, date));

Page 54: Cassandra and materialized views

Pre 3.0 on disk layout

54

RowKey: de305d54-75b4-431b-adb2-eb6b9e546014 => (column=2015-04-27 10:00:00+0100:, value=, timestamp=1430128800) => (column=2015-04-27 10:00:00+0100:sensor_type, value=‘Temperature’, timestamp=1430128800) => (column=2015-04-27 10:00:00+0100:sensor_value, value=23.48, timestamp=1430128800) => (column=2015-04-27 10:01:00+0100:, value=, timestamp=1430128860) => (column=2015-04-27 10:01:00+0100:sensor_type, value=‘Temperature’, timestamp=1430128860) => (column=2015-04-27 10:01:00+0100:sensor_value, value=24.08, timestamp=1430128860)

Clustering values are repeated for each normal column

Full timestamp storage

Page 55: Cassandra and materialized views

Cassandra 3.0 data structure

Map<byte[ ], SortedMap<ClusteringColumn, Row>>

55

CREATE TABLE sensor_data( sensor_id uuid, date timestamp, sensor_type text, sensor_value double, PRIMARY KEY(sensor_id, date));

Page 56: Cassandra and materialized views

Cassandra 3.0 on disk layout

56

PartitionKey: de305d54-75b4-431b-adb2-eb6b9e546014 => clusteringColumn:2015-04-27 10:00:00+0100 => row_timestamp=1430128800 => (column_value=‘Temperature’, delta_encoded_timestamp=+0) => (column_value=23.48, delta_encoded_timestamp=+0) => clusteringColumn:2015-04-27 10:01:00+0100 => row_timestamp=1430128860 => (column_value=‘Temperature’, delta_encoded_timestamp=+0) => (column_value=24.08, delta_encoded_timestamp=+0)

Delta-encoded timestamp vs row timestamp

Page 57: Cassandra and materialized views

Gains

57

•  No clustering value repetition

•  Column labels are stored only once in meta data

•  Delta encoding of timestamp, 8 bytes saved each time

•  Less disk space used

Page 58: Cassandra and materialized views

Benchmarks

58

CREATE TABLE events ( id uuid, date timeuuid, prop1 int, prop2 text, prop3 float, PRIMARY KEY(id, date));

106 rows

Small string

Page 59: Cassandra and materialized views

Benchmarks

59

CREATE TABLE largetext( key int, prop1 int, prop2 text,PRIMARY KEY(id));

106 rows

Large string (1000)

Page 60: Cassandra and materialized views

Benchmarks

60

CREATE TABLE largeclustering( key int, clust text, prop1 int, prop2 set<float>,PRIMARY KEY(id, clust));

106 rows Medium string (100)

50 items

Page 61: Cassandra and materialized views

Benchmarks

61

CREATE TABLE events ( id uuid, date timeuuid, prop1 int, prop2 text, prop3 float, PRIMARY KEY(id, date)) WITH COMPACT STORAGE ;

Page 62: Cassandra and materialized views

Q & A

! "

62

Page 63: Cassandra and materialized views

@doanduyhai

[email protected]

https://academy.datastax.com/

Thank You

63