Adding Spark Support to Kylin at Bay Area Spark Meetup


Upload: luke-han

Post on 06-Aug-2015


TRANSCRIPT

http://kylin.io

Adding Spark Support to Apache Kylin

Luke Han | 韩卿 | Kylin co-creator & PMC Member | [email protected]

Extreme OLAP Engine for Big Data

Kylin is an open source distributed analytics engine from eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.

What’s Kylin

kylin / ˈkiːˈlɪn / 麒麟--n. (in Chinese art) a mythical animal of composite form

• Open sourced on Oct 1st, 2014
• Accepted as an Apache Incubator project on Nov 25th, 2014

Kylin Architecture Overview

[Architecture diagram; labels recovered from the figure]

• Clients: 3rd Party App (Web App, Mobile…), SQL-Based Tool (BI Tools: Tableau…)
• Access: SQL over REST API and JDBC/ODBC
• Kylin components: REST Server, Query Engine, Routing, Metadata, Cube Build Engine (Batch: MapReduce, Spark; & Streaming)
• Data: Hadoop Hive (Star Schema Data), OLAP Cube in HBase (Key Value Data), Data Cube, SparkSQL
• Query paths: Low Latency - Seconds, Mid Latency - Minutes
• Flows: Online Analysis Data Flow, Offline Data Flow

• Clients/users interact with Kylin via SQL
• The OLAP cube is transparent to users
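The routing step in the overview (low latency answers from the cube, mid latency answers from Hive) can be pictured with a small sketch. The slides do not describe how Kylin actually decides, so everything here is an assumption for illustration: `CubeDesc`, `Query`, and `route` are hypothetical names, and the rule "a cube covers a query when it contains all of the query's dimensions and measures" is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeDesc:
    """Hypothetical description of a pre-built cube (not Kylin's metadata model)."""
    dimensions: frozenset
    measures: frozenset


@dataclass(frozen=True)
class Query:
    """Hypothetical, already-parsed query: referenced dimensions and aggregated measures."""
    dimensions: frozenset
    measures: frozenset


def route(query, cubes):
    """Return 'cube' if any cube covers the query, otherwise 'hive'."""
    for cube in cubes:
        covers_dims = query.dimensions <= cube.dimensions
        covers_measures = query.measures <= cube.measures
        if covers_dims and covers_measures:
            return "cube"  # low latency path (seconds)
    return "hive"  # mid latency path (minutes)


# Illustrative cube and queries; column names are made up.
sales_cube = CubeDesc(
    dimensions=frozenset({"year", "country", "category"}),
    measures=frozenset({"sum(price)", "count(*)"}),
)
covered = Query(frozenset({"year", "country"}), frozenset({"sum(price)"}))
uncovered = Query(frozenset({"seller_id"}), frozenset({"sum(price)"}))

print(route(covered, [sales_cube]))
print(route(uncovered, [sales_cube]))
```

The first query only touches columns the cube was built on, so it can be served from pre-aggregated data; the second references a column outside the cube and falls back to the slower path.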


Challenges…

• High latency when reading data from Hive
  • Several hours to fetch data when joining big tables
  • Routing to SQL-on-Hadoop turned off due to performance issues
• Time-to-market of data latency
  • Huge IO & network traffic with MR jobs
• Streaming?
  • Streaming process and pre-calculate cubes


Add Spark Support to Apache Kylin

• Integrating with Spark SQL
  • Option I: Read data from SparkSQL instead of Hive
  • Option II: Route unsupported queries to SparkSQL
  • Option III: Kylin to be an OLAP source of SparkSQL
• Spark Cube Build Engine
  • Efficient cube generation engine with Spark
• Spark Streaming
  • Leverage Spark Streaming for Streaming OLAP
• HBase?
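To make "cube generation" concrete, here is a minimal sketch of what pre-calculating a cube means: one aggregated table (a cuboid) per subset of the dimensions. It is plain Python so it runs without a cluster, and it is not Kylin's build algorithm, which the slides do not describe. `build_cube` and the sample rows are hypothetical. Each inner aggregation is a group-by-key sum, which is the kind of step that Spark's `reduceByKey` expresses on an RDD.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of `dimensions`.

    Returns a dict mapping a tuple of dimension names (the cuboid id)
    to a dict of {dimension values: summed measure}.
    """
    cube = {}
    # Walk from the full-dimension cuboid down to the grand total.
    for size in range(len(dimensions), -1, -1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


# Made-up fact rows for illustration.
rows = [
    {"year": 2014, "country": "US", "price": 10.0},
    {"year": 2014, "country": "CN", "price": 5.0},
    {"year": 2015, "country": "US", "price": 7.0},
]

cube = build_cube(rows, ("year", "country"), "price")

# A query such as "SUM(price) GROUP BY country" becomes a lookup.
print(cube[("country",)])
```

With n dimensions this produces 2^n cuboids, and this sketch rescans the input for each one. That repeated scanning is a simplification of the sketch itself and not a description of how any real build engine behaves.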

Kylin Evolution Roadmap

[Timeline figure; years shown: 2013, 2014, 2015; milestone dates: Sep, 2013; Jan, 2014; Sep, 2014; Q1, 2015]

Initial: Prototype for MOLAP
• Basic end to end POC

MOLAP
• Incremental Refresh
• ANSI SQL
• ODBC Driver
• Web GUI
• ACL
• Open Source

HOLAP
• Streaming OLAP
• JDBC Driver
• New UI
• Excel Support
• SparkSQL
• … more

Next Gen
• Automation
• Capacity Management
• Spark Engine
• In-Memory Analysis
• … more

Future…
• TBD

Kylin Ecosystem

• Kylin Core: Fundamental framework of the Kylin OLAP Engine
• Extension: Plugins to support additional functions and features
• Integration: Lifecycle management support to integrate with other applications
• Interface: Allows third-party users to build more features via user interfaces atop the Kylin core
• Driver: ODBC and JDBC drivers

[Ecosystem diagram; Kylin OLAP Core at the center, with the following groups]

• Extension: Security, Redis Storage, Spark Engine, Docker
• Interface: Web Console, Customized BI, Ambari/Hue Plugin
• Integration: ODBC Driver, ETL, Drill, SparkSQL

If you want to go fast, go alone. If you want to go far, go together.

--African Proverb