the evolution of apache kylin by luke han

Post on 09-Jan-2017


The Evolution of Apache Kylin

Luke Han | 韩卿 | lukehan@apache.org

2016-05-09, Vancouver, Canada

About me…

§ Luke Han | 韩卿
§ Co-creator & VP of Apache Kylin
§ ASF Member
§ Co-founder & CEO at Kyligence Inc
§ lukehan@apache.org
§ Twitter: @lukehq

Apache Kylin

Why

Happiness

Latency 10s

What have we tried?

Kylin

About Apache Kylin

http://kylin.apache.org

Extreme OLAP Engine for Big Data

Apache Kylin is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets and sub-second level response time.

kylin /ˈkiːˈlɪn/ 麒麟 -- n. (in Chinese art) a mythical animal of composite form

About Apache Kylin

OLAP / Data Mart

• Born for Big Data Analytics
• Sub-second Latency
• ANSI SQL
• Seamless Integration with BI Tools
• Pluggable Architecture

0-D (apex) cuboid: (no dimensions)
1-D cuboids: (time); (item); (location); (supplier)
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)

• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
  1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
  2. (9/15, milk, Urbana, *) - <time, item, location>
  3. (*, milk, Urbana, *) - <item, location>
  4. (*, milk, Chicago, *) - <item, location>
  5. (*, milk, *, *) - <item>
• Cuboid = one combination of dimensions
• Cube = all combinations of dimensions (all cuboids)
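The cuboid and cell definitions above can be sketched in a few lines of Python. This is only an illustration of the definitions, not Kylin code: the helper names `cuboids` and `is_ancestor` are made up for this example, and `*` marks a dimension that has been aggregated away.

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")


def cuboids(dimensions):
    """Group every cuboid (subset of dimensions) by how many dimensions it keeps."""
    return {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}


def is_ancestor(ancestor, cell):
    """True if `ancestor` is a strictly more aggregated version of `cell`.

    "*" means the dimension is aggregated away in that cell.
    """
    return ancestor != cell and all(a == "*" or a == c for a, c in zip(ancestor, cell))


lattice = cuboids(DIMENSIONS)
for k, group in lattice.items():
    print(f"{k}-D cuboids ({len(group)}):", group)

base_cell = ("9/15", "milk", "Urbana", "Dairy_land")
print(is_ancestor(("*", "milk", "Urbana", "*"), base_cell))   # True
print(is_ancestor(("*", "milk", "Chicago", "*"), base_cell))  # False
```

With four dimensions the lattice holds 1 + 4 + 6 + 4 + 1 = 16 cuboids, which is why the number of dimensions drives the size of a full cube.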

OLAP Cube

Cube - Balance Between Space and Time
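A toy example may make the space/time trade-off concrete. The sketch below pre-aggregates a tiny, invented fact table for every cuboid, then answers a question with one lookup instead of a scan. The rows, the `build_cube` helper, and the three-dimension schema are assumptions for illustration only and say nothing about how Kylin stores cubes.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location")

# Hypothetical fact rows: (time, item, location, sales)
FACTS = [
    ("9/15", "milk", "Urbana", 3),
    ("9/15", "milk", "Chicago", 5),
    ("9/16", "milk", "Urbana", 2),
    ("9/16", "bread", "Urbana", 4),
]


def build_cube(facts, dimensions):
    """Pre-aggregate SUM(sales) for every cuboid of the given dimensions."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), k):
            cells = defaultdict(int)
            for row in facts:
                key = tuple(row[i] for i in dims)
                cells[key] += row[-1]
            cube[tuple(dimensions[i] for i in dims)] = dict(cells)
    return cube


cube = build_cube(FACTS, DIMENSIONS)

# Time: the answer is a single lookup in the (item, location) cuboid...
milk_in_urbana = cube[("item", "location")][("milk", "Urbana")]
# ...instead of a scan over every fact row.
scanned = sum(r[3] for r in FACTS if r[1] == "milk" and r[2] == "Urbana")

# Space: the cube holds more cells than there are fact rows.
stored_cells = sum(len(cells) for cells in cube.values())
print(milk_in_urbana, scanned, stored_cells, len(FACTS))
```

Both paths return the same total. The lookup is cheaper at query time, and the price is the extra cells that had to be computed and stored up front.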

Architecture

MapReduce/Spark

Kylin

BI Tools, Web App…

ANSI SQL

Apache Kylin Journey

Sept 2013: Project kickoff
Oct 2014: Go live at eBay & open source on GitHub
Nov 2014: Apache Incubator
June 2015: First Apache release v0.7.1
Sept 2015: Apache release v1.0; InfoWorld Bossie Award, Best Open Source Big Data Tool
Nov 2015: Apache Top Level Project
Mar 2016: Kyligence founded

Apache Kylin Global Adoption

Use Case: JD.com

Use Case: Baidu Map

Use Case: NetEase

Performance and Throughput

By NetEase: http://www.bitstech.net/2016/01/04/kylin-olap/

The Evolution

Apache Kylin New Features

§ Pluggable architecture
§ New MR Cube Engine with fast cubing (1.5x faster)
§ New HBase Storage with parallel scan (2x faster)
§ Near real-time analysis
§ User defined aggregations
§ Excel / Power BI / Zeppelin integration

The Freedom, Extensibility, Flexibility

§ Freedom
  § Zoo break, not bound to Hadoop anymore
  § Free to go to a better engine or storage
§ Extensibility
  § Accept any input, e.g. Kafka
  § Embrace next-gen distributed platform, e.g. Spark
§ Flexibility
  § Choose different engine for different dataset

New generation design

[Figure: architecture diagram. Labels: 3rd Party App (Web App, Mobile…); SQL-Based Tool (BI Tools: Tableau…); REST API; JDBC/ODBC; SQL; Query Engine; Routing; Metadata; Cube Builder (MapReduce…); Hadoop Hive; Low Latency - Seconds.]

Ø Online Analysis Data Flow
Ø Offline Data Flow
Ø Clients/Users interact with Kylin via SQL
Ø OLAP Cube is transparent to users
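Since clients only ever send SQL, a query over the REST API can be sketched as a plain HTTP request. The snippet below builds, but does not send, a POST to the `/kylin/api/query` endpoint with basic authentication. The host, port, credentials, project name, and table in the example call are assumptions for a local sample setup and should be replaced with real values.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, user, password, project, sql, limit=100):
    """Build (but do not send) a POST request carrying a SQL query as JSON."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"sql": sql, "project": project, "offset": 0, "limit": limit})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=body.encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: local server, sample project and table, placeholder credentials.
request = build_query_request(
    "http://localhost:7070",
    "ADMIN",
    "KYLIN",
    "learn_kylin",
    "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
)
print(request.get_method(), request.full_url)

# To run it against a live server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Note that the SQL names only a table and columns. Which cuboid answers the query is decided on the server side, which is what "transparent to users" means here.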

[Figure labels, continued: Star Schema Data; Key Value Data; Data Cube / OLAP Cubes (HBase); SQL; REST Server; DataSource Abstraction; Engine Abstraction; Storage Abstraction; MR Engine (IN, OUT); Hive Source; HBase Storage; Cube Metadata; SourceFactory; EngineFactory; StorageFactory.]

Pluggable architecture

[Figure: the MR Engine loads data from the Hive Source through a Hive Adapter (adapt to IN) and saves the cube to HBase Storage through an HBase Adapter (adapt to OUT).]
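The adapter idea can be sketched as follows: the engine only sees an IN interface and an OUT interface, and each source or storage adapts itself to them. Everything in this sketch is hypothetical and in-memory. The class names, the `adapt_to_in` / `adapt_to_out` methods, and the dictionary registries standing in for the factories are illustrative and are not Kylin's actual Java interfaces.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Source(ABC):
    @abstractmethod
    def adapt_to_in(self):
        """Yield rows in the shape the engine expects: (dimension_tuple, measure)."""


class Storage(ABC):
    @abstractmethod
    def adapt_to_out(self, cells):
        """Persist the aggregated cells produced by the engine."""


class InMemorySource(Source):  # stand-in for a Hive source
    def __init__(self, rows):
        self.rows = rows

    def adapt_to_in(self):
        return iter(self.rows)


class InMemoryStorage(Storage):  # stand-in for HBase storage
    def __init__(self):
        self.cells = {}

    def adapt_to_out(self, cells):
        self.cells.update(cells)


class SumEngine:  # stand-in for the MR engine
    def build(self, source, storage):
        cells = defaultdict(int)
        for dims, measure in source.adapt_to_in():  # load data
            cells[dims] += measure
        storage.adapt_to_out(dict(cells))           # save cube


# Registries play the role of the factories: a plug-in is swapped by name.
SOURCES = {"memory": InMemorySource}
STORAGES = {"memory": InMemoryStorage}
ENGINES = {"sum": SumEngine}

rows = [(("milk", "Urbana"), 3), (("milk", "Urbana"), 2), (("bread", "Urbana"), 4)]
source = SOURCES["memory"](rows)
storage = STORAGES["memory"]()
ENGINES["sum"]().build(source, storage)
print(storage.cells)
```

Because the engine depends only on the two interfaces, adding a new source (for example a message queue) or a new storage means registering one more class and leaving the engine untouched.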

Parallel Scan

§ Slow queries are 5-10x faster.
§ New HBase storage enables partitioning of cuboids that are big enough.
§ Overall query time is 2x faster than before, based on summed results from 10,000+ queries.

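[Figure: a query against Cuboid A, Cuboid B, and Cuboid C on Server 1, Server 2, and Server 3, compared with the partitioned layout in which Server 1 holds A1 and B1, Server 2 holds A2 and B2, and Server 3 holds A3 and C.]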
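The partitioning idea can be sketched as a scatter-gather: a large cuboid is split into shards, each shard computes a partial aggregate, and the partials are combined. In this sketch threads stand in for servers, and the round-robin `shard` helper and the sample rows are assumptions. It shows the pattern only, not the HBase implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(rows, n):
    """Split one cuboid's rows into n shards, round-robin."""
    return [rows[i::n] for i in range(n)]


def scan(shard_rows, item):
    """Partial aggregate computed where the shard lives."""
    return sum(sales for (row_item, sales) in shard_rows if row_item == item)


def parallel_scan(shards, item):
    """Scan every shard concurrently and add up the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: scan(s, item), shards)
        return sum(partials)


# Hypothetical rows of one big cuboid: (item, sales)
cuboid_a = [("milk", 3), ("bread", 4), ("milk", 5), ("milk", 2), ("eggs", 1), ("bread", 6)]
shards = shard(cuboid_a, 3)  # A1, A2, A3
print(parallel_scan(shards, "milk"))
```

The combined result equals a single sequential scan of the whole cuboid. The gain comes from each server touching only its own shard.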

Near Real-time Incremental Build

§ Minutes micro cubes
§ Kafka source
§ In-mem cubing
§ Auto merge
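One way to picture these four points together: each small batch of incoming events is aggregated in memory into a micro cube, and once enough micro cubes pile up they are folded into one larger segment. The sketch below uses invented per-minute batches in place of Kafka messages, and the `build_micro_cube` and `auto_merge` helpers and the merge threshold are assumptions for illustration.

```python
from collections import Counter


def build_micro_cube(events):
    """Aggregate one batch of (item, sales) events in memory into a small segment."""
    segment = Counter()
    for item, sales in events:
        segment[item] += sales
    return segment


def auto_merge(segments, threshold):
    """Once `threshold` segments have piled up, fold them into one larger segment."""
    if len(segments) < threshold:
        return segments
    merged = Counter()
    for segment in segments:
        merged.update(segment)  # Counter.update adds counts key by key
    return [merged]


# Hypothetical per-minute batches, standing in for messages read from Kafka.
batches = [
    [("milk", 3), ("bread", 1)],
    [("milk", 2)],
    [("bread", 4), ("eggs", 6)],
]

segments = []
for batch in batches:
    segments.append(build_micro_cube(batch))
    segments = auto_merge(segments, threshold=3)
print(segments)
```

Merging works here because SUM is additive across segments. Measures that are not additive need a mergeable representation instead of a plain number.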

User Defined Aggregation Types

§ HyperLogLog Count Distinct
§ TopN
§ BitMap Precise Count Distinct
  § from Sun, Yerui (meituan.com)
§ Raw Records
  § from Wang, Xiaoyu (jd.com)
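To see why a bitmap suits precise count distinct in a cube, consider the minimal sketch below. It stores one bit per integer id in a Python integer, merges cells with a bitwise OR, and counts the set bits. The ids and the helper names are invented, and a production implementation would likely use a compressed bitmap, which is outside this sketch.

```python
def bitmap_of(user_ids):
    """Encode a collection of non-negative integer ids as a bitmap (one bit per id)."""
    bitmap = 0
    for user_id in user_ids:
        bitmap |= 1 << user_id
    return bitmap


def count_distinct(bitmap):
    """Number of distinct ids recorded in the bitmap."""
    return bin(bitmap).count("1")


# Hypothetical pre-aggregated cells for two days.
day1 = bitmap_of([1, 2, 3, 3, 7])
day2 = bitmap_of([2, 7, 9])

# Rolling up the two days: OR the bitmaps, then count.
print(count_distinct(day1), count_distinct(day2), count_distinct(day1 | day2))
```

The per-day counts are 4 and 3, but the two-day count is 5, not 7, because ids 2 and 7 appear on both days. Storing the bitmap rather than the number is what lets a coarser cuboid be derived from finer cells without going back to the raw data.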

Support more BI & Visualization Tools

§ Supports Tableau 9.1
§ Supports MS Excel
§ Supports MS Power BI
§ Supports Zeppelin

Roadmap

Apache Kylin Roadmap

2016 Focus…

§ Streaming and Real Time
§ Performance, performance and performance
§ Support more BI & visualization tools
§ SQL & OLAP Functions

Q&A

§ More…
  § Website: http://kylin.apache.org
  § Twitter: @ApacheKylin
§ Contact Me:
  § lukehan@apache.org
  § @lukehq
