
Page 1: SnappyData overview NikeTechTalk 11/19/15

SnappyData: Getting Spark ready for real-time, operational analytics

www.snappydata.io

Jags Ramnarayan (jramnarayan@snappydata.io), Co-founder, SnappyData

Nov 2015

Page 2: SnappyData overview NikeTechTalk 11/19/15

SnappyData - an EMC/Pivotal spin-out

● New Spark-based open source project started by Pivotal GemFire founders and engineers
● Decades of in-memory data management experience
● Focus on real-time, operational analytics: Spark inside an OLTP+OLAP database

www.snappydata.io

Page 3: SnappyData overview NikeTechTalk 11/19/15

Lambda Architecture (LA) for Analytics

SnappyData Focus

Page 4: SnappyData overview NikeTechTalk 11/19/15

Perspective on LA for real time

[Diagram: Streams feed data-in-motion analytics (Transform); results land in an In-Memory DB for interactive queries and updates, backed by a deep-scale, high-volume MPP DB; an Application consumes query results and Alerts]

Page 5: SnappyData overview NikeTechTalk 11/19/15

Use case: Telemetry

Revenue Generation
● Real-time location-based mobile advertising (B2B2C)
● Location-based services (B2C, B2B, B2B2C)

Revenue Protection
● Customer experience management to reduce churn
● Customer sentiment analysis

Network Efficiency
● Network bandwidth optimisation
● Network signalling maximisation

• Network optimization – e.g., re-route a call to another cell tower if congestion is detected
• Location-based ads – match the incoming event to the subscriber profile; if 'Opt-in', show a location-sensitive ad
• Challenge: too much streaming data – many subscribers, lots of 2G/3G/4G voice/data; network events: location events, CDRs, network issues

Page 6: SnappyData overview NikeTechTalk 11/19/15

Challenge - Keeping up with streams


• Millions of events/sec
• HA – continuously ingest
• Cannot throttle the stream
• Diverse formats

Page 7: SnappyData overview NikeTechTalk 11/19/15

Challenge - Transform is expensive


• Filter, normalize, transform
• Need reference data to normalize – point lookups against a reference DB (enterprise Oracle, …)

Page 8: SnappyData overview NikeTechTalk 11/19/15

Challenge - Stream joins, correlations


Analyze over a time window
● Simple rules – e.g., if CallDroppedCount > threshold, then alert
● Or complex (OLAP-like) queries
● TopK, trending, join with reference data, correlate with history

How do you keep up with OLAP-style analytics when there are millions of events in the window and billions of records in the reference data?

Page 9: SnappyData overview NikeTechTalk 11/19/15

Challenge - State management


Manage generated state
● Mutating state: millions of counters
● "Once and only once" semantics
● Consistency across a distributed system
● State HA

Page 10: SnappyData overview NikeTechTalk 11/19/15

Challenge - Interactive Query speed


Interactive queries
- OLAP-style queries
- High concurrency
- Low response time

Page 11: SnappyData overview NikeTechTalk 11/19/15

Today: queue -> process -> NoSQL

● The messaging cluster adds extra hops and management overhead
● No distributed, HA data store
● Streaming joins, or joins with external state, are slow and in many cases do not scale

Page 12: SnappyData overview NikeTechTalk 11/19/15

SnappyData: A new approach

Single unified HA cluster: OLTP + OLAP + Stream for real-time analytics

Batch design, high throughput

Real-time design center – low latency, HA, concurrent

Vision: Drastically reduce the cost and complexity in modern big data

Page 13: SnappyData overview NikeTechTalk 11/19/15

SnappyData: A new approach

Single unified HA cluster: OLTP + OLAP + Stream for real-time analytics

Batch design, high throughput

Real-time operational analytics – TBs in memory

[Diagram components: RDB rows, transactions, columnar storage, index, API, stream processing, AQP; access via ODBC, JDBC, REST and Spark (Scala, Java, Python, R); tiered to HDFS and an MPP DB]

First commercial project on Approximate Query Processing (AQP)

Page 14: SnappyData overview NikeTechTalk 11/19/15

Why columnar storage?

Page 15: SnappyData overview NikeTechTalk 11/19/15

Why Spark?
● Blends streaming, interactive, and batch analytics
● Appeals to Java, R, Python, and Scala developers
● Succinct programs
● Rich set of transformations and libraries
● RDDs and fault tolerance without replication
● Stream processing with high throughput

Page 16: SnappyData overview NikeTechTalk 11/19/15

Spark Myths
● It is a distributed in-memory database
  ○ It is a computational framework with immutable caching
● It is highly available
  ○ Fault tolerance is not the same as HA
● It is well suited for real-time, operational environments
  ○ It does not handle concurrency well

Page 17: SnappyData overview NikeTechTalk 11/19/15

Common Spark Streaming Architecture

[Diagram: a client submits a stream app to the Driver, which coordinates two Spark executors; each executor holds RDD partitions at successive batch times (@t0, @t1, @t2); a Kafka queue feeds the executors and long-term state is written to Cassandra]

The queue is buffered in the executor. The driver submits a batch job every second, and each batch pushes a new RDD into the stream (a batch built from the buffer). Short-term state is immutable; long-term state lives in an external DB.
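To make the micro-batch flow above concrete, here is a minimal standalone Spark Streaming sketch (not from the slides). It uses a plain socket source as a stand-in for the Kafka queue; the host, port, and 1-second batch interval are illustrative assumptions.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("micro-batch-sketch").setMaster("local[2]")

    // 1-second batch interval: the driver schedules a new batch job every second,
    // and each batch materializes as a fresh, immutable RDD on the executors.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Stand-in source; a production deployment would typically read from Kafka.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Per-batch (short-term, immutable) state; anything long-term has to be
    // written out to an external store such as Cassandra.
    val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.foreachRDD { rdd =>
      rdd.take(10).foreach(println) // replace with a write to the external DB
    }

    ssc.start()
    ssc.awaitTermination()
  }
}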

Page 18: SnappyData overview NikeTechTalk 11/19/15

Challenge: Spark driver not HA

[Diagram: a client submits a stream app to the Driver, which manages two Spark executors]

If the Driver fails, the executors automatically exit – all cached state has to be re-hydrated.

Page 19: SnappyData overview NikeTechTalk 11/19/15

Challenge: Sharing state

[Diagram: Client1 and Client2 each get their own Driver and set of Executors]

• Spark is designed for total isolation across client apps
• Sharing state across clients requires an external DB / Tachyon

Page 20: SnappyData overview NikeTechTalk 11/19/15

Challenge: External state management

[Diagram: a client submits a stream app to the Driver; a Spark executor holds RDD partitions at @t0, @t1, @t2 over time; a Kafka queue feeds the stream and state is kept externally in Cassandra]

Key-based access might keep up, but joins and analytic operators are a problem. Serialization and copying costs are too high, especially in JVMs.

newDStream = wordDstream.updateStateByKey[Int](func)

Spark's capability to update state as batches arrive requires a full iteration over the state RDD.
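As a concrete illustration of that cost, here is a minimal sketch of the updateStateByKey pattern (not SnappyData code): wordDstream is assumed to be a DStream[(String, Int)] of per-batch word counts, and a checkpoint directory is assumed to be configured on the StreamingContext.

// Running count per word, folded into the long-lived state RDD.
val updateFunc = (newValues: Seq[Int], runningCount: Option[Int]) =>
  Some(newValues.sum + runningCount.getOrElse(0))

// On every micro-batch Spark applies updateFunc across the keys of the state RDD,
// i.e., a full pass over the state, which is the cost the slide calls out.
val newDStream = wordDstream.updateStateByKey[Int](updateFunc)

// State recovery requires checkpointing, e.g.: ssc.checkpoint("/tmp/checkpoints")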

Page 21: SnappyData overview NikeTechTalk 11/19/15

Challenge: “Once and only once” = hard

[Diagram: an executor applies X = X + 10 and writes X to Cassandra; after a failure, the recovered partition on another executor replays the same X = X + 10, so the stored value moves from 10 to 20 and then, on replay, to 30 – a single increment is double-counted]
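One common remedy (a minimal sketch under assumptions, not SnappyData's mechanism) is to make the update idempotent: tag each increment with a batch ID and skip batches that have already been applied, so a replay after recovery becomes a no-op.

import scala.collection.mutable

class IdempotentCounter(initial: Long) {
  private var value: Long = initial
  private val appliedBatches = mutable.Set[String]()

  // A replayed batch (same batchId) is ignored instead of being applied twice.
  def applyIncrement(batchId: String, delta: Long): Long = synchronized {
    if (appliedBatches.add(batchId)) value += delta
    value
  }
}

val x = new IdempotentCounter(10)
x.applyIncrement("batch-42", 10) // X = 20
x.applyIncrement("batch-42", 10) // replay after recovery: X stays 20, not 30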

Page 22: SnappyData overview NikeTechTalk 11/19/15

Challenge: Always on

[Diagram: a client submits a stream app to the Driver; two Spark executors each hold RDD partitions at @t0, @t1, @t2 over time, fed by a Kafka queue]

HA requirement: if something fails, there is always a redundant copy that is fully in sync, and failover is instantaneous.

Fault tolerance in Spark: recover state from the original source or a checkpoint by tracking lineage. This can take too long.

Page 23: SnappyData overview NikeTechTalk 11/19/15

Challenge: Concurrent queries too slow

SELECT SUBSTR(sourceIP, 1, X), SUM(adRevenue)
FROM uservisits
GROUP BY SUBSTR(sourceIP, 1, X)

Berkeley AMPLab Big Data Benchmark – AWS m2.4xlarge; 342 GB total

Page 24: SnappyData overview NikeTechTalk 11/19/15

SnappyData: P2P cluster w/ consensus

[Diagram: Data Server JVM 1, JVM 2, JVM 3 forming a peer-to-peer cluster]

● The cluster elects a coordinator
● Consistent views across members
● Virtual synchrony across members
● Why? Strong consistency during replication; failure detection is accurate and fast

Page 25: SnappyData overview NikeTechTalk 11/19/15

Colocated row/column Tables in Spark

[Diagram: three Spark executors, each hosting row tables, column tables, stream processing, and tasks, with the Spark Block Manager on every node]

● Spark executors are long-lived and shared across multiple apps
● The GemFire memory manager and the Spark Block Manager are integrated

Page 26: SnappyData overview NikeTechTalk 11/19/15

Table can be partitioned or replicated

[Diagram: every node holds the Replicated Table; the Partitioned Table is split into buckets (A-H, I-P, Q-W) spread across nodes, with each bucket's replica placed on a different node]

Replicated tables: a consistent replica on each node
Partitioned tables: data partitioned with one or more replicas

Page 27: SnappyData overview NikeTechTalk 11/19/15

Linearly scale with shared partitions

[Diagram: a Kafka queue partitioned by subscriber (A-M, N-Z) feeds Spark executors that host the matching subscriber partitions along with co-located reference data]

Linearly scale with partition pruning: the input queue, the stream, the in-memory DB, and the output queue all share the same partitioning strategy.

Page 28: SnappyData overview NikeTechTalk 11/19/15

Point access, updates, fast writes
● Row tables with primary keys are distributed HashMaps
  ○ with secondary indexes
● Support for transactional semantics
  ○ read_committed, repeatable_read
● Support for scalable, high write rates
  ○ streaming data goes through stages
  ○ queued streams, intermediate storage (delta row buffer), immutable compressed columns
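Since the store is also reachable over JDBC (next slide), the transactional semantics above can be exercised with standard java.sql calls. This is only a sketch: the connection URL and table name below are assumed placeholders, not confirmed SnappyData specifics.

import java.sql.{Connection, DriverManager}

// Hypothetical JDBC endpoint; substitute the real driver/URL for your cluster.
val conn: Connection = DriverManager.getConnection("jdbc:snappydata://localhost:1527/")
try {
  conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED)
  conn.setAutoCommit(false)

  val stmt = conn.prepareStatement(
    "UPDATE subscriber_profile SET opt_in = ? WHERE subscriber_id = ?")
  stmt.setBoolean(1, true)
  stmt.setLong(2, 42L)
  stmt.executeUpdate() // point update routed by primary key

  conn.commit()
} finally {
  conn.close()
}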

Page 29: SnappyData overview NikeTechTalk 11/19/15

Full Spark Compatibility
● Any table is also visible as a DataFrame
● Any RDD[T]/DataFrame can be stored in SnappyData tables
● Tables appear like any JDBC-sourced table
  ○ but live in executor memory by default
● Additional API for updates, inserts, deletes

// Save a DataFrame using the Snappy context
context.createExternalTable("T1", "ROW", myDataFrame.schema, props)

// Save using the DataFrame API
dataDF.write.format("ROW").mode(SaveMode.Append).options(props).saveAsTable("T1")

Page 30: SnappyData overview NikeTechTalk 11/19/15

Extends Spark

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] table_name ( <column definition> )
USING 'JDBC | ROW | COLUMN'
OPTIONS (
  COLOCATE_WITH 'table_name',               // default: none
  PARTITION_BY 'PRIMARY KEY | column name', // replicated table by default
  REDUNDANCY '1',                           // manage HA
  PERSISTENT 'DISKSTORE_NAME ASYNCHRONOUS | SYNCHRONOUS',
                                            // empty string maps to the default disk store
  OFFHEAP 'true | false',
  EVICTION_BY 'MEMSIZE 200 | COUNT 200 | HEAPPERCENT',
  …
)
[AS select_statement];
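A hypothetical instantiation of the DDL above, issued from Scala through a SnappyData-aware SQL context (here called snappyContext, an assumption); the table, its columns, and the chosen options are illustrative, not from the slides.

// Hypothetical example: a column table partitioned by subscriber with one
// redundant copy per bucket; table and column names are made up.
snappyContext.sql("""
  CREATE TABLE IF NOT EXISTS call_events (
    subscriber_id BIGINT,
    cell_tower_id INT,
    dropped BOOLEAN,
    event_time TIMESTAMP
  )
  USING COLUMN
  OPTIONS (
    PARTITION_BY 'subscriber_id',
    REDUNDANCY '1'
  )
""")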

Page 31: SnappyData overview NikeTechTalk 11/19/15

Key feature: Synopses Data
● Maintain stratified samples
  ○ intelligent sampling to keep error bounds low
● Probabilistic data structures
  ○ TopK for time series (using time aggregation, CMS, item aggregation)
  ○ Histograms, HyperLogLog, Bloom filters, Wavelets

CREATE SAMPLE TABLE sample-table-name
USING columnar
OPTIONS (
  BASETABLE 'table_name'             // source column table or stream table
  [ SAMPLINGMETHOD 'stratified | uniform' ]
  STRATA name (
    QCS ('comma-separated-column-names')
    [ FRACTION 'frac' ]
  ),+                                // one or more QCS
)
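A hypothetical example following that template, again through the assumed snappyContext and using the illustrative call_events table from the earlier DDL sketch:

// Hypothetical sample table over call_events, stratified by cell tower so
// that lightly used towers still appear in the sample.
snappyContext.sql("""
  CREATE SAMPLE TABLE call_events_sample
  USING columnar
  OPTIONS (
    BASETABLE 'call_events'
    SAMPLINGMETHOD 'stratified'
    STRATA s1 (
      QCS ('cell_tower_id')
      FRACTION '0.01'
    )
  )
""")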

Page 32: SnappyData overview NikeTechTalk 11/19/15

Stratified Sampling Spark Demo

www.snappydata.io

Page 33: SnappyData overview NikeTechTalk 11/19/15

Driver HA, JobServer for interactive jobs

● REST-based JobServer for sharing a single context across clients
  ○ clients use REST to execute streaming jobs, queries, DML
  ○ secondary JobServer for HA
  ○ primary election using Gem clustering
● Native SnappyData cluster manager for long-running executors
  ○ makes resources (executors) long-running
  ○ reuses the same executors across apps and jobs
● Low-latency scheduling that skips the Spark driver altogether

Page 34: SnappyData overview NikeTechTalk 11/19/15

Page 35: SnappyData overview NikeTechTalk 11/19/15

Unified OLAP/OLTP streaming w/ Spark

● Far fewer resources: a TB problem becomes a GB problem
  ○ CPU contention drops
● Far less complex
  ○ a single cluster for stream ingestion, continuous queries, interactive queries and machine learning
● Much faster
  ○ compressed data managed in distributed memory in columnar form reduces volume and is much more responsive

Page 36: SnappyData overview NikeTechTalk 11/19/15

www.snappydata.io

SnappyData is Open Source
● Beta will be on GitHub before December. We are looking for contributors!
● Learn more & register for the beta: www.snappydata.io
● Connect:
  ○ twitter: www.twitter.com/snappydata
  ○ facebook: www.facebook.com/snappydata
  ○ linkedin: www.linkedin.com/snappydata
  ○ slack: http://snappydata-slackin.herokuapp.com
  ○ IRC: irc.freenode.net #snappydata

Page 37: SnappyData overview NikeTechTalk 11/19/15

Extras

www.snappydata.io

Page 38: SnappyData overview NikeTechTalk 11/19/15

OLAP/OLTP with Synopses

[Diagram: user applications process events and issue interactive queries; a micro-batch processing module (plugins) consumes a sliding window that emits batches and serves CQ subscriptions; an OLAP query engine answers queries from a Summary DB and an Exact DB (row + column oriented)]

Summary DB:
▪ Time series with decay
▪ TopK, frequency summary structures
▪ Counters
▪ Histograms
▪ Stratified samples
▪ Raw data windows

Page 39: SnappyData overview NikeTechTalk 11/19/15

Not a panacea, but comes close
● Synopses require prior workload knowledge
● Not all queries: complex queries will result in high error rates
  ○ a single cluster for stream ingestion and analytics queries (both streaming and interactive)
● Our strategy: be an adjunct to MPP databases
  ○ first compute the error estimate; if the error is above tolerance, delegate to the exact store

Page 40: SnappyData overview NikeTechTalk 11/19/15

Adjunct store in certain scenarios

Page 41: SnappyData overview NikeTechTalk 11/19/15

Speed/Accuracy tradeoff

[Chart: error vs. execution time (sample size). Executing on the entire dataset takes about 30 minutes; executing on a sample takes about 2 seconds, which puts queries in the interactive range at the cost of some error.]

Page 42: SnappyData overview NikeTechTalk 11/19/15

Stratified Sampling
● Random sampling has intuitive semantics
● However, data is typically skewed and our queries are multi-dimensional
  ○ e.g., average sales order price for each product class for each geography
  ○ some products may have little to no sales
  ○ stratification ensures that each "group" (product class) is represented

Page 43: SnappyData overview NikeTechTalk 11/19/15

Stratified Sampling Challenges
● Solutions exist for batch data (BlinkDB)
● Needs to work for infinite streams of data
  ○ Answer: combine stratification with other techniques such as Bernoulli/reservoir sampling (see the sketch below)
  ○ Exponentially decay samples over time
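As a minimal illustration of the reservoir idea mentioned above (a generic Algorithm R sketch, not SnappyData's sampler): it maintains a fixed-size uniform sample over an unbounded stream, and one such reservoir could be kept per stratum.

import scala.util.Random

// Fixed-size uniform sample over an unbounded stream (Algorithm R).
class Reservoir[T](capacity: Int, rng: Random = new Random()) {
  private val sample = scala.collection.mutable.ArrayBuffer.empty[T]
  private var seen = 0L

  def add(item: T): Unit = {
    seen += 1
    if (sample.size < capacity) {
      sample += item
    } else {
      // Keep the new item with probability capacity / seen, replacing a random slot.
      val j = (rng.nextDouble() * seen).toLong
      if (j < capacity) sample(j.toInt) = item
    }
  }

  def snapshot: Seq[T] = sample.toSeq
}

// Usage: one reservoir per stratum, e.g. per product class or per cell tower.
val r = new Reservoir[Double](capacity = 1000)
(1 to 1000000).foreach(i => r.add(i.toDouble))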

Page 44: SnappyData overview NikeTechTalk 11/19/15

Dealing with errors and latency
● Well-known error techniques for "closed-form aggregations"
● Exploring other techniques – analytical bootstrap
● The user can specify an error bound with a confidence interval:

SELECT avg(sessionTime) FROM Table
WHERE city = 'San Francisco'
ERROR 0.1 CONFIDENCE 95.0%

● The engine first determines whether it can satisfy the error bound
● If not, it delegates execution to an "exact" store (GPDB, etc.)
● Query execution can also be latency-bounded
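For intuition on the closed-form case, here is a generic sketch (my own illustration, not SnappyData's estimator) of the standard error and 95% confidence half-width of a sampled average – the kind of bound an engine would compare against the requested ERROR.

// Closed-form error estimate for AVG over a simple random sample.
def confidenceHalfWidth(sample: Seq[Double], z: Double = 1.96): Double = {
  val n = sample.size
  val mean = sample.sum / n
  val variance = sample.map(x => (x - mean) * (x - mean)).sum / (n - 1)
  // Standard error of the mean; multiply by z for a ~95% confidence interval.
  z * math.sqrt(variance / n)
}

val sessionTimes = Seq(12.0, 9.5, 30.2, 7.1, 14.8, 22.3)
val halfWidth = confidenceHalfWidth(sessionTimes)
// Satisfiable if halfWidth / mean falls within the requested relative ERROR bound.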

Page 45: SnappyData overview NikeTechTalk 11/19/15

Sketching techniques
● Sampling is not effective for outlier detection
  ○ MAX/MIN etc.
● Other probabilistic structures: CMS, heavy hitters, etc.
● We implemented Hokusai
  ○ captures frequencies of items in a time series
● The design permits TopK queries over arbitrary time intervals (e.g., the top 100 popular URLs):

SELECT pageURL, count(*) frequency FROM Table
WHERE ….
GROUP BY ….
ORDER BY frequency DESC
LIMIT 100
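To make the CMS reference concrete, here is a small generic count-min sketch (an illustrative implementation, not the one used by SnappyData/Hokusai); the depth, width, and hash scheme are arbitrary choices.

import scala.util.hashing.MurmurHash3

// Count-min sketch: fixed memory, estimates item frequencies with one-sided error.
class CountMinSketch(depth: Int = 4, width: Int = 2048) {
  private val table = Array.ofDim[Long](depth, width)

  private def bucket(item: String, row: Int): Int = {
    val h = MurmurHash3.stringHash(item, row) % width
    if (h < 0) h + width else h
  }

  def add(item: String, count: Long = 1L): Unit =
    for (row <- 0 until depth) table(row)(bucket(item, row)) += count

  // Estimated frequency: the minimum over rows (never underestimates the true count).
  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min
}

// Usage: feed page URLs from the stream, then ask for candidate heavy hitters.
val cms = new CountMinSketch()
Seq("/home", "/home", "/checkout", "/home").foreach(url => cms.add(url))
cms.estimate("/home") // >= 3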

Page 46: SnappyData overview NikeTechTalk 11/19/15

Demo

[Diagram: a Zeppelin Server with a Zeppelin Spark interpreter (the driver) connected to three Spark executor JVMs, each holding a row cache and compressed columnar data]

Page 47: SnappyData overview NikeTechTalk 11/19/15

A new approach to Real-Time Analytics

[Diagram: Streaming Analytics + Probabilistic Data + Distributed In-Memory SQL, built on a deep integration of Spark + GemFire: a unified, always-on, cloud-ready cluster for real-time analytics, integrating with deep-scale, high-volume MPP DBs]

Vision: drastically reduce the cost and complexity in modern big data, using a fraction of the resources – 10x better response time, 10x lower resource cost, 10x less complexity.