Apache Zeppelin and Spark for Enterprise Data Science

Post on 15-Apr-2017

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Enabling Apache Zeppelin* and Spark* for Data Science in the Enterprise

Bikas Saha (@bikassaha)

*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the Apache Software Foundation.

Agenda

Making Big Data Science easy to approach

What are the current issues for the enterprise

Making Apache Zeppelin enterprise ready

Future Roadmap

Apache Zeppelin

Zeppelin makes Big Data Science Easy to Approach

Zero install – just connect via a web browser and you are ready to run

Support for multiple execution platforms (Apache Spark, JDBC, Hive…)

Support for multiple languages (Scala, SQL, Python…)

Support for built-in visualizations

Support for reporting

Support for sharing and collaborative work

Does NOT have machine learning built-in – that’s where Apache Spark comes in (or your favorite SQL engine Apache Flink/Drill/Hive… and 30+ others)

Zeppelin for Sharing

Agenda

Making Big Data Science easy to approach

What are the current issues for the enterprise

Making Apache Zeppelin enterprise ready

Future Roadmap

Current Apache Zeppelin and Spark integration

[Diagram components: User, Zeppelin Server, Spark Driver, Spark Executors]

Architectural Issue with Secure Data Access

[Diagram components: User 1, Zeppelin Server, Spark Driver, Spark Executors, HDFS. Label on the diagram: "Zeppelin Server User"]

Architectural Issues with Multi-Tenancy – Fault Tolerance

[Diagram components: User 1, User 2, Zeppelin Server, Spark Driver, Spark Executors]

User 1 failure affects User 2

Heavy-weight Spark drivers

Architectural Issues with Multi-Tenancy – Privacy

[Diagram components: User 1, User 2, Zeppelin Server, Spark Driver, Spark Executors]

User 1 can access User 2's data

Agenda

Making Big Data Science easy to approach

What are the current issues for the enterprise

Enterprise Ready Big Data Science

Future Roadmap

Livy Server as a Session Management Service

[Diagram components: Livy Server, Interactive REST API, Batch REST API, Session, Remote Spark Driver, Remote Context, Standard Spark Batch Job, Spark Executors]
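
The two REST entry points named on this slide can be sketched as plain HTTP requests. This is a minimal sketch, not the presenter's code: the host name is hypothetical, the helper names (`livy_post`, `new_session`, `run_statement`, `submit_batch`) are invented for illustration, and the paths and JSON fields (`/sessions`, `/sessions/{id}/statements`, `/batches`, `kind`, `code`, `file`, `className`) are assumed to match the Livy REST API of the version in use. The code only builds the requests; actually sending them needs `urllib.request.urlopen` and a running Livy server.

```python
import json
import urllib.request

# Hypothetical host; 8998 is assumed to be the port Livy listens on.
LIVY_URL = "http://livy.example.com:8998"


def livy_post(path, payload, base_url=LIVY_URL):
    """Build (but do not send) a JSON POST request for a Livy endpoint."""
    return urllib.request.Request(
        url=base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def new_session(kind="pyspark"):
    """Interactive API: ask Livy to start a session with a remote Spark driver."""
    return livy_post("/sessions", {"kind": kind})


def run_statement(session_id, code):
    """Interactive API: run a code snippet inside an existing session."""
    return livy_post(f"/sessions/{session_id}/statements", {"code": code})


def submit_batch(file, class_name=None):
    """Batch API: submit a standard Spark batch job."""
    payload = {"file": file}
    if class_name is not None:
        payload["className"] = class_name
    return livy_post("/batches", payload)
```

An interactive session keeps its remote context alive between statements, which is what a notebook needs; the batch endpoint is closer to a one-shot `spark-submit`.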

Secure Data Access - Solved

[Diagram components: User, Zeppelin Server, Livy Interpreter, Livy Server, Session, Remote Spark Driver, Remote Context, Spark Executors, HDFS. Label on the diagram: "User"]

Multi Tenancy - Solved

[Diagram components: User 1 and User 2, Zeppelin Server with one Livy Interpreter per user, Livy Server with Session 1 and Session 2, each session with its own Remote Spark Driver, Remote Context and Spark Executor]
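
The one-session-per-user layout on this slide can be illustrated with a small registry. This is a sketch under stated assumptions, not Zeppelin's or Livy's actual implementation: `SessionRegistry`, `session_payload` and `make_fake_creator` are hypothetical names, the fake creator stands in for a real call to Livy, and the `proxyUser` field is assumed to be how the session request asks Livy to run the Spark driver as the end user.

```python
import itertools


def session_payload(user, kind="spark"):
    """Session request body; proxyUser (assumed field) names the end user to run as."""
    return {"kind": kind, "proxyUser": user}


class SessionRegistry:
    """Hands each user their own session, so drivers and data are not shared."""

    def __init__(self, create_session):
        # create_session: callable taking a payload and returning a session id.
        self._create_session = create_session
        self._sessions = {}

    def session_for(self, user):
        # Create a session on first use, then keep reusing it for that user.
        if user not in self._sessions:
            self._sessions[user] = self._create_session(session_payload(user))
        return self._sessions[user]


def make_fake_creator():
    """Stand-in for a real Livy call: records payloads and returns ids 1, 2, ..."""
    counter = itertools.count(1)
    created = []

    def create(payload):
        created.append(payload)
        return next(counter)

    return create, created


create, created = make_fake_creator()
registry = SessionRegistry(create)
s1 = registry.session_for("user1")
s2 = registry.session_for("user2")
```

Because each user maps to a separate session and remote driver, a crash in one user's driver does not take down the other's, and data access is checked against each user's own identity.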

Agenda

Making Big Data Science easy to approach

What are the current issues for the enterprise

Making Apache Zeppelin enterprise ready

Future Roadmap

Near Term Improvements

Session Management

Debuggability

Unified session for all languages

Better visualizations for Machine Learning

Support for Spark 2.0

Long Term Improvements

Controlled sharing of sessions for collaboration

Data exploration and browsing with metadata

Taking the model from training to production

Thank You
