NoSQL: No sweat with JBoss Data Grid

Shane K Johnson (Technical Marketing Manager) / Tristan Tarrant (Principal Software Engineer), 10/08/2012

Upload: shane-johnson

Post on 07-Jul-2015

DESCRIPTION

How clustered caches evolved into data grids via NoSQL and Big Data.

TRANSCRIPT

Page 1: NoSQL, No sweat with JBoss Data Grid

NoSQL: No sweat with JBoss Data Grid

Shane Johnson, Technical Marketing Manager

Tristan Tarrant, Principal Software Engineer

10/08/2012

Page 2: NoSQL, No sweat with JBoss Data Grid

NoSQL NOSQL

Page 3: NoSQL, No sweat with JBoss Data Grid

Agenda

● Data Stores

● Data Grid
  ● NOSQL
  ● Cache

● Big Data

● Use Cases

● Q & A

Page 4: NoSQL, No sweat with JBoss Data Grid

Data Stores

● Key / Value

● Document

● Graph

● Column Family

● And more...

Page 5: NoSQL, No sweat with JBoss Data Grid

Data Grid?

Page 9: NoSQL, No sweat with JBoss Data Grid

NOSQL

● Elasticity

● Distributed Data

● Concurrency

● CAP Theorem

● Flexibility

Page 10: NoSQL, No sweat with JBoss Data Grid

Elasticity

● Node Discovery

● Failure Detection

Page 11: NoSQL, No sweat with JBoss Data Grid

How?

Page 12: NoSQL, No sweat with JBoss Data Grid

JBoss Data Grid is built on a reliable group membership protocol: JGroups.

Page 13: NoSQL, No sweat with JBoss Data Grid

Distributed Data

Page 14: NoSQL, No sweat with JBoss Data Grid

Replicated

Page 15: NoSQL, No sweat with JBoss Data Grid

Distributed

Page 16: NoSQL, No sweat with JBoss Data Grid

How?

Page 17: NoSQL, No sweat with JBoss Data Grid

Consistent Hashing

JBoss Data Grid Implementation: MurmurHash3

Page 18: NoSQL, No sweat with JBoss Data Grid

Hash Wheel

Page 19: NoSQL, No sweat with JBoss Data Grid

Virtual Nodes
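
The hash wheel and virtual nodes named on the slides above can be illustrated with a small sketch. This is not JBoss Data Grid code: the class name HashWheelSketch, the "node#i" naming of virtual nodes, and the use of MD5 as a stand-in hash (the JDK ships no MurmurHash3) are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a hash wheel with virtual nodes.
class HashWheelSketch {
    // Position on the wheel -> physical node that owns that position.
    private final TreeMap<Integer, String> wheel = new TreeMap<>();
    private final int virtualNodes;

    HashWheelSketch(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Stand-in hash: first 4 bytes of MD5. Assumption: NOT MurmurHash3.
    static int hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                    | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each physical node is placed on the wheel several times.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            wheel.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        wheel.values().removeIf(node::equals);
    }

    // Walk clockwise from the key's position to the next virtual node,
    // wrapping around to the start of the wheel if needed.
    String ownerOf(String key) {
        if (wheel.isEmpty()) {
            throw new IllegalStateException("no nodes on the wheel");
        }
        Map.Entry<Integer, String> e = wheel.ceilingEntry(hash(key));
        return (e != null ? e : wheel.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashWheelSketch wheel = new HashWheelSketch(64);
        wheel.addNode("A");
        wheel.addNode("B");
        wheel.addNode("C");
        System.out.println("owner of user:42 = " + wheel.ownerOf("user:42"));
    }
}
```

When a node joins, a key either keeps its owner or moves to the new node; keys never shuffle between the existing nodes. More virtual nodes per physical node generally spread the keys more evenly.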

Page 20: NoSQL, No sweat with JBoss Data Grid

Linear Scaling

Page 21: NoSQL, No sweat with JBoss Data Grid

Concurrency

Page 22: NoSQL, No sweat with JBoss Data Grid

How?

Page 23: NoSQL, No sweat with JBoss Data Grid

Multi Version Concurrency Control

Page 24: NoSQL, No sweat with JBoss Data Grid

Internals

● Transactions
  ● 2PC
  ● Isolation Level
    ● Read Committed
    ● Repeatable Read

● Locking
  ● Optimistic
  ● Pessimistic

● Write Skew
  ● Version – Vector Clocks
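
To make the write skew item concrete, here is a minimal sketch of a version check at commit time under optimistic locking. It is an illustration only: the class WriteSkewCheckSketch is hypothetical, and it uses a plain counter as the version, whereas the slide mentions vector clocks.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: optimistic commit that fails on write skew.
class WriteSkewCheckSketch {
    // Immutable value plus the version it was written at.
    static final class Versioned {
        final int value;
        final long version; // assumption: simple counter, not a vector clock

        Versioned(int value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final AtomicReference<Versioned> cell =
            new AtomicReference<>(new Versioned(0, 0));

    // Readers take a snapshot and never block writers.
    Versioned read() {
        return cell.get();
    }

    // The commit succeeds only if nobody committed since the snapshot
    // was read; otherwise the caller must retry or roll back.
    boolean commit(Versioned snapshot, int newValue) {
        Versioned next = new Versioned(newValue, snapshot.version + 1);
        return cell.compareAndSet(snapshot, next);
    }

    public static void main(String[] args) {
        WriteSkewCheckSketch entry = new WriteSkewCheckSketch();
        Versioned seenByTx1 = entry.read();
        Versioned seenByTx2 = entry.read();
        System.out.println("tx1 commit: " + entry.commit(seenByTx1, 10));
        System.out.println("tx2 commit: " + entry.commit(seenByTx2, 20));
    }
}
```

Two transactions that read the same version cannot both commit, so the second writer never silently overwrites a value it did not see.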

Page 25: NoSQL, No sweat with JBoss Data Grid

Consistency

Page 26: NoSQL, No sweat with JBoss Data Grid

CAP Theorem

Eric Brewer

Page 27: NoSQL, No sweat with JBoss Data Grid

CAP Theorem

● Consistency

● Availability

● Partition Tolerance

Page 28: NoSQL, No sweat with JBoss Data Grid

JBoss Data Grid + CAP Theorem

● No Physical Partition
  ● Consistent and Available (C + A)

● Physical Partition
  ● Available (A + P)

● Pseudo Partition (e.g. Unresponsive Node)
  ● Consistent or Available (C + P / A + P)

Page 29: NoSQL, No sweat with JBoss Data Grid

Flexibility

Page 30: NoSQL, No sweat with JBoss Data Grid

Flexibility

● Replicated Data
  ● Replication Queue
  ● State Transfer – Enable / Disable

● Distributed Data
  ● Number of Owners
  ● Rehash – Enable / Disable

● Communication – Synchronous / Asynchronous

● Isolation – Read Committed / Repeatable Read

● Locking – Optimistic / Pessimistic

Page 32: NoSQL, No sweat with JBoss Data Grid

Caching and Data Grids for JEE

● Caching: JSR-107

● Data Grids: JSR-347

Page 33: NoSQL, No sweat with JBoss Data Grid

Caching in Java

● Developers have been doing it forever
  ● To increase performance
  ● To offload legacy data stores from unnecessary requests
  ● Home-brew approach based on Hashtables and Maps

● Many free and commercial libraries, but...

● ... no standard!

Page 34: NoSQL, No sweat with JBoss Data Grid

JSR-107: Caching for JEE

● Local (single JVM) and Distributed (multiple JVMs) caches

● CacheManager: a way to obtain caches

● Cache, “inspired” by the Map API with extensions for entry expiration and additional atomic operations

● A Cache Lifecycle (starting, stopping)

● Entry Listeners for specific events

● Optional features: JTA support and annotations

● One of the oldest JSRs, dormant for a long time, recently revived by JSR-347
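
The "Map API with extensions" idea can be sketched with the JDK alone. The class below is a hypothetical stand-in, not the JSR-107 API: the name ExpiringCacheSketch, the single time-to-live setting, and the injected clock are assumptions chosen to keep the example self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical Map-inspired cache with entry expiration and one
// additional atomic operation. Not the JSR-107 interface.
class ExpiringCacheSketch<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;

        Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested

    ExpiringCacheSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        store.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null for missing or expired entries.
    V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAt) {
            store.remove(key, e); // lazily evict the expired entry
            return null;
        }
        return e.value;
    }

    // Atomically stores the value only if no live entry exists.
    boolean putIfAbsent(K key, V value) {
        long now = clock.getAsLong();
        Entry<V> fresh = new Entry<>(value, now + ttlMillis);
        boolean[] stored = {false};
        store.compute(key, (k, old) -> {
            if (old == null || now >= old.expiresAt) {
                stored[0] = true;
                return fresh;
            }
            return old;
        });
        return stored[0];
    }

    public static void main(String[] args) {
        ExpiringCacheSketch<String, String> cache =
                new ExpiringCacheSketch<>(1000, System::currentTimeMillis);
        cache.put("session", "alice");
        System.out.println(cache.get("session"));
    }
}
```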

Page 35: NoSQL, No sweat with JBoss Data Grid

And now?

● Now that I've put a lot of data in my distributed cache, what can I do with it?

● And most importantly...

● HOW?

Page 36: NoSQL, No sweat with JBoss Data Grid

Multiple clustering options

● Replication
  ● All nodes have all of the data.
  ● Grid Size == smallest node

● Distribution
  ● The Grid maintains n copies of each item of data on different nodes
  ● Grid Size == total size / n
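
The two grid size rules above amount to simple arithmetic, shown here as a short sketch. The class and method names are hypothetical, and the figures ignore per-entry overhead and uneven key distribution.

```java
// Hypothetical sketch of the two grid size rules from the slide.
class GridCapacitySketch {
    // Replication: every node holds everything, so the smallest node is the limit.
    static long replicatedCapacity(long[] nodeMemory) {
        long min = Long.MAX_VALUE;
        for (long m : nodeMemory) {
            min = Math.min(min, m);
        }
        return min;
    }

    // Distribution: total memory is shared, but each item is stored n times.
    static long distributedCapacity(long[] nodeMemory, int owners) {
        long total = 0;
        for (long m : nodeMemory) {
            total += m;
        }
        return total / owners;
    }

    public static void main(String[] args) {
        long[] nodesInGb = {4, 8, 8};
        System.out.println("replicated:  " + replicatedCapacity(nodesInGb) + " GB");
        System.out.println("distributed: " + distributedCapacity(nodesInGb, 2) + " GB");
    }
}
```

With three nodes of 4, 8, and 8 GB, replication is capped at 4 GB of data, while distribution with n = 2 holds 20 / 2 = 10 GB.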

Page 37: NoSQL, No sweat with JBoss Data Grid

We like asynchronous

● So much that we want it in the API:

● Future<V> getAsync(K);

● Future<V> getAndPut(K, V);
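
A rough sketch of what such an asynchronous API could look like from the caller's side, using only the JDK. The class AsyncCacheSketch is hypothetical; the two method names are taken from the slide, and CompletableFuture (which implements Future) is an assumption made so that calls can be chained.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory cache exposing the slide's two async operations.
class AsyncCacheSketch<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();

    // Completes with the current value, or null if the key is absent.
    CompletableFuture<V> getAsync(K key) {
        return CompletableFuture.supplyAsync(() -> store.get(key));
    }

    // Stores the value and completes with the previous one (null if none).
    CompletableFuture<V> getAndPut(K key, V value) {
        return CompletableFuture.supplyAsync(() -> store.put(key, value));
    }

    public static void main(String[] args) {
        AsyncCacheSketch<String, String> cache = new AsyncCacheSketch<>();
        // The caller is not blocked until join() is invoked.
        cache.getAndPut("greeting", "hello")
                .thenCompose(previous -> cache.getAsync("greeting"))
                .thenAccept(System.out::println)
                .join();
    }
}
```

In a real grid the work behind the future would be a network round trip rather than a local map lookup, which is where returning immediately pays off.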

Page 38: NoSQL, No sweat with JBoss Data Grid

Keeping things close together

● If I need to access semantically close data quickly, why not keep it on the same node?

● Grouping API
  ● Distribution per-group and not per-key
  ● Via annotations
  ● Via a Grouper class
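
Distribution per group rather than per key can be sketched as follows. This is an illustration of the idea, not the grouping API itself: the class GroupingSketch, the ownerFor method, and the "prefix before the colon" grouping rule are hypothetical, and a plain modulo stands in for the real hash wheel.

```java
import java.util.function.Function;

// Hypothetical sketch: the owning node is derived from the key's group,
// so all keys of one group land on the same node.
class GroupingSketch {
    static <K> int ownerFor(K key, Function<K, String> grouper, int nodeCount) {
        String group = grouper.apply(key);
        // Fall back to the key itself when it belongs to no group.
        Object hashed = (group != null) ? group : key;
        return Math.floorMod(hashed.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        // Assumed convention for this example: "<user>:<kind>" keys group by user.
        Function<String, String> byUser = k -> k.substring(0, k.indexOf(':'));
        System.out.println(ownerFor("user42:profile", byUser, 5));
        System.out.println(ownerFor("user42:orders", byUser, 5));
    }
}
```

Both keys in the usage example resolve to the same node index because only the shared group, "user42", is hashed.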

Page 39: NoSQL, No sweat with JBoss Data Grid

Eventual consistency

● One step further than asynchronous clustering for higher performance

● Entries are tagged with a version (e.g. a timestamp or a time-based UUID): newer versions will eventually replace all older versions in the cluster

● Applications retrieving data may get an older entry, which may be “good enough”
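
The version tagging described above can be sketched as a "newest version wins" rule applied independently on each replica. The class EventualReplicaSketch is hypothetical, and a plain long stands in for the timestamp or time-based UUID mentioned on the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of one replica that keeps only the newest version.
class EventualReplicaSketch {
    private static final class Versioned {
        final String value;
        final long version; // assumption: a number instead of a time-based UUID

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> entries = new HashMap<>();

    // Updates may arrive late or out of order; older versions are ignored.
    void apply(String key, String value, long version) {
        entries.merge(key, new Versioned(value, version),
                (old, incoming) -> incoming.version > old.version ? incoming : old);
    }

    String get(String key) {
        Versioned v = entries.get(key);
        return v == null ? null : v.value;
    }

    public static void main(String[] args) {
        EventualReplicaSketch a = new EventualReplicaSketch();
        EventualReplicaSketch b = new EventualReplicaSketch();
        a.apply("price", "10", 1);
        a.apply("price", "12", 2);
        b.apply("price", "10", 1); // b has not seen version 2 yet
        System.out.println(a.get("price") + " vs " + b.get("price"));
        b.apply("price", "12", 2); // the update finally arrives
        System.out.println(a.get("price") + " vs " + b.get("price"));
    }
}
```

Until the late update arrives, a reader on the second replica sees the older entry; once every replica has applied the same set of updates, all of them return the same value regardless of arrival order.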

Page 40: NoSQL, No sweat with JBoss Data Grid

Big Data

Page 41: NoSQL, No sweat with JBoss Data Grid

Remote Query

Page 42: NoSQL, No sweat with JBoss Data Grid

Distributed Query

Page 43: NoSQL, No sweat with JBoss Data Grid

Performing parallel computation

● Distributed Executors

● Run on all nodes where a cache exists

● Each executor works on the slice of data local to itself

● Fastest access

● Parallelization of operations

● Usually returns
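
The "each executor works on its local slice" idea can be imitated in a single JVM. In this sketch every map in the list plays the role of one node's local data and a thread pool stands in for the cluster; the class LocalSliceExecutorSketch and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical single-JVM imitation of distributed execution.
class LocalSliceExecutorSketch {
    static int sumAcrossNodes(List<Map<String, Integer>> slices) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, slices.size()));
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (Map<String, Integer> slice : slices) {
                // Each task only touches the data "local" to its node.
                Callable<Integer> task = () ->
                        slice.values().stream().mapToInt(Integer::intValue).sum();
                partials.add(pool.submit(task));
            }
            // The invoker combines the partial results.
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> slices = List.of(
                Map.of("a", 1, "b", 2),
                Map.of("c", 3));
        System.out.println(sumAcrossNodes(slices));
    }
}
```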

Page 44: NoSQL, No sweat with JBoss Data Grid

Map / Reduce

● A mapper function iterates through a set of key/values transforming them and sending them to a collector

void map(KIn, VIn, Collector<KOut, VOut>)

● A reducer works through the collected values for each key, returning a single value

VOut reduce(KOut, Iterator<VOut>)

● Finally a collator processes the reduced key/values and returns a result to the invoker

R collate(Map<KOut, VOut> reducedResults)
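
The three steps above can be tied together in a single-JVM word count. The interfaces below are local stand-ins that mirror the signatures on the slide, not the product's own types; the emit method on Collector, the run helper, and the class name MapReduceSketch are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the map, reduce, and collate phases.
class MapReduceSketch {
    interface Collector<K, V> { void emit(K key, V value); }
    interface Mapper<KIn, VIn, KOut, VOut> {
        void map(KIn key, VIn value, Collector<KOut, VOut> collector);
    }
    interface Reducer<KOut, VOut> { VOut reduce(KOut key, Iterator<VOut> values); }
    interface Collator<KOut, VOut, R> { R collate(Map<KOut, VOut> reducedResults); }

    static <KIn, VIn, KOut, VOut, R> R run(Map<KIn, VIn> data,
                                           Mapper<KIn, VIn, KOut, VOut> mapper,
                                           Reducer<KOut, VOut> reducer,
                                           Collator<KOut, VOut, R> collator) {
        // Map phase: group every emitted value under its output key.
        Map<KOut, List<VOut>> collected = new HashMap<>();
        Collector<KOut, VOut> collector =
                (k, v) -> collected.computeIfAbsent(k, x -> new ArrayList<>()).add(v);
        data.forEach((k, v) -> mapper.map(k, v, collector));

        // Reduce phase: one value per output key.
        Map<KOut, VOut> reduced = new HashMap<>();
        collected.forEach((k, vs) -> reduced.put(k, reducer.reduce(k, vs.iterator())));

        // Collate phase: a single result for the invoker.
        return collator.collate(reduced);
    }

    // Example job: the most frequent word (requires at least one word).
    static String mostFrequentWord(Map<String, String> docs) {
        return MapReduceSketch.<String, String, String, Integer, String>run(docs,
                (id, text, out) -> {
                    for (String w : text.toLowerCase().split("\\s+")) {
                        if (!w.isEmpty()) out.emit(w, 1);
                    }
                },
                (word, counts) -> {
                    int sum = 0;
                    while (counts.hasNext()) sum += counts.next();
                    return sum;
                },
                reduced -> Collections.max(reduced.entrySet(),
                        Map.Entry.comparingByValue()).getKey());
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("d1", "grid cache grid");
        docs.put("d2", "cache grid");
        System.out.println(mostFrequentWord(docs));
    }
}
```

In a grid, the map and reduce phases would run on the nodes that hold the data, and only the reduced results would travel back to the invoker for collation.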

Page 45: NoSQL, No sweat with JBoss Data Grid

Use Cases

Page 46: NoSQL, No sweat with JBoss Data Grid

Replicated Use Case

● Finance
● Master / Slave
● High Availability
● Failover
● Performance + Consistency
● Data – Lifespan
● Servers – Few
● Memory – Medium

Page 47: NoSQL, No sweat with JBoss Data Grid

Distributed Use Case #1

● Telecom / Media
● Performance > Consistency
● Data
  ● Infinite
  ● Calculated
● Servers – Few
● Memory – Large

Page 48: NoSQL, No sweat with JBoss Data Grid

Distributed Use Case #2

● Telecom
● Consistency > Performance
● Data
  ● Continuous
  ● Limited Lifespan
● Servers – Many
● Memory – Normal

Page 49: NoSQL, No sweat with JBoss Data Grid

Q & A

Look for a follow up on the howtojboss.com blog.

Page 50: NoSQL, No sweat with JBoss Data Grid

Thanks for joining us.