apache tez : accelerating hadoop data processing

49
© Hortonworks Inc. 2013 Page 1 Apache Tez : Accelerating Hadoop Data Processing Bikas Saha @bikassaha

Upload: blaze-burris

Post on 03-Jan-2016

37 views

Category:

Documents


2 download

DESCRIPTION

Apache Tez : Accelerating Hadoop Data Processing. Bikas Saha @ bikassaha. Tez – Introduction. Distributed execution framework targeted towards data-processing applications. Based on expressing a computation as a dataflow graph. Highly customizable to meet a broad spectrum of use cases. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013 Page 1

Apache Tez : Accelerating Hadoop Data Processing

Bikas Saha@bikassaha

Page 2: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Introduction

Page 2

• Distributed execution framework targeted towards data-processing applications.

• Based on expressing a computation as a dataflow graph.• Highly customizable to meet a broad spectrum of use cases.• Built on top of YARN – the resource management framework for

Hadoop.• Open source Apache project and Apache licensed.

Page 3: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2012© Hortonworks Inc. 2013. Confidential and Proprietary.

Tez – Hadoop 1 ---> Hadoop 2Monolithic• Resource Management – MR • Execution Engine – MR

Layered• Resource Management – YARN• Engines – Hive, Pig, Cascading, Your App!

Page 4: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Empowering Applications

Page 4

• Tez solves hard problems of running on a distributed Hadoop environment• Apps can focus on solving their domain specific problems• This design is important to be a platform for a variety of applications

App

Tez

• Custom application logic• Custom data format• Custom data transfer technology

• Distributed parallel execution• Negotiating resources from the Hadoop framework• Fault tolerance and recovery• Horizontal scalability• Resource elasticity• Shared library of ready-to-use components• Built-in performance optimizations• Security

Page 5: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – End User Benefits

• Better performance of applications• Built-in performance + Application define optimizations

• Better predictability of results• Minimization of overheads and queuing delays

• Better utilization of compute capacity• Efficient use of allocated resources

• Reduced load on distributed filesystem (HDFS)• Reduce unnecessary replicated writes

• Reduced network usage• Better locality and data transfer using new data patterns

• Higher application developer productivity• Focus on application business logic rather than Hadoop internals

Page 5

Page 6: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Design considerations

Don’t solve problems that have already been solved. Or else you will have to solve them again!

• Leverage discrete task based compute model for elasticity, scalability and fault tolerance

• Leverage several man years of work in Hadoop Map-Reduce data shuffling operations

• Leverage proven resource sharing and multi-tenancy model for Hadoop and YARN

• Leverage built-in security mechanisms in Hadoop for privacy and isolation

Page 6

Look to the Future with an eye on the Past

Page 7: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Problems that it addresses

• Expressing the computation• Direct and elegant representation of the data processing flow• Interfacing with application code and new technologies

• Performance• Late Binding : Make decisions as late as possible using real data from at

runtime• Leverage the resources of the cluster efficiently• Just work out of the box!• Customizable to let applications tailor the job to meet their specific

requirements

• Operation simplicity• Painless to operate, experiment and upgrade

Page 7

Page 8: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Simplifying Operations

• No deployments to do. No side effects. Easy and safe to try it out!• Tez is a completely client side application. • Simply upload to any accessible FileSystem and change local Tez

configuration to point to that.• Enables running different versions concurrently. Easy to test new

functionality while keeping stable versions for production.• Leverages YARN local resources.

Page 8

ClientMachine

NodeManager

TezTask

NodeManager

TezTaskTezClient

HDFSTez Lib 1 Tez Lib 2

ClientMachine

TezClient

Page 9: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Expressing the computation

Page 9

Aggregate Stage

Partition Stage

Preprocessor Stage

Sampler

Task-1 Task-2

Task-1 Task-2

Task-1 Task-2

Samples

Ranges

Distributed Sort

Distributed data processing jobs typically look like DAGs (Directed Acyclic Graph). • Vertices in the graph represent data transformations • Edges represent data movement from producers to consumers

Page 10: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Expressing the computation

Page 10

Tez provides the following APIs to define the processing• DAG API

• Defines the structure of the data processing and the relationship between producers and consumers

• Enable definition of complex data flow pipelines using simple graph connection API’s. Tez expands the logical DAG at runtime

• This is how the connection of tasks in the job gets specified

• Runtime API• Defines the interfaces using which the framework and app code interact

with each other• App code transforms data and moves it between tasks• This is how we specify what actually executes in each task on the cluster

nodes

Page 11: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – DAG API

// Define DAGDAG dag = DAG.create();

// Define VertexVertex Map1 = Vertex.create(Processor.class);

// Define EdgeEdge edge = Edge.create(Map1, Reduce1, SCATTER_GATHER, PERSISTED, SEQUENTIAL, Output.class, Input.class);

// Connect themdag.addVertex(Map1).addEdge(edge)…

Page 11

Defines the global processing flow

Map1 Map2

Reduce1 Reduce2

Join

Scatter Gather

Scatter Gather

Page 12: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – DAG API

Page 12

Data movement – Defines routing of data between tasks

• One-To-One : Data from the ith producer task routes to the ith consumer task.

• Broadcast : Data from a producer task routes to all consumer tasks.

• Scatter-Gather : Producer tasks scatter data into shards and consumer tasks gather the data. The ith shard from all producer tasks routes to the ith consumer task.

Edge properties define the connection between producer and consumer tasks in the DAG

Page 13: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Logical DAG expansion at Runtime

Page 13

Reduce1

Map2

Reduce2

Join

Map1

Page 14: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Runtime API

Flexible Inputs-Processor-Outputs Model•Thin API layer to wrap around arbitrary application code•Compose inputs, processor and outputs to execute

arbitrary processing•Applications decide logical data format and data transfer

technology•Customize for performance•Built-in implementations for Hadoop 2.0 data services –

HDFS and YARN ShuffleService. Built on the same API. Your implementations are as first class as ours!

Page 14

Page 15: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Task Composition

Page 15

V-A

V-B V-C

Logical DAG

Output-1 Output-3

Processor-A

Input-2

Processor-B

Input-4

Processor-C

Task A

Task B Task C

Edge AB Edge AC

V-A = { Processor-A.class }V-B = { Processor-B.class }V-C = { Processor-C.class }Edge AB = { V-A, V-B, Output-1.class, Input-2.class }Edge AC = { V-A, V-C, Output-3.class, Input-4.class }

Page 16: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Library of Inputs and Outputs

Page 16

Classical ‘Map’ Classical ‘Reduce’

Intermediate ‘Reduce’ for Map-Reduce-Reduce

Map Processor

HDFS Input

Sorted Output

Reduce Processor

Shuffle Input

HDFS Output

Reduce Processor

Shuffle Input

Sorted Output

• What is built in?– Hadoop InputFormat/OutputFormat– SortedGroupedPartitioned Key-Value

Input/Output– UnsortedGroupedPartitioned Key-Value

Input/Output– Key-Value Input/Output

Page 17: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Composable Task Model

Page 17

Hive Processor

HDFSInput

RemoteFile

ServerInput

HDFSOutput

LocalDisk

Output

Custom Processor

HDFSInput

RemoteFile

ServerInput

HDFSOutput

LocalDisk

Output

Custom Processor

RDMAInput

NativeDB

Input

KakfaPub-SubOutput

AmazonS3

Output

Adopt Evolve Optimize

Page 18: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Performance

• Benefits of expressing the data processing as a DAG• Reducing overheads and queuing effects• Gives system the global picture for better planning

• Efficient use of resources• Re-use resources to maximize utilization• Pre-launch, pre-warm and cache• Locality & resource aware scheduling

• Support for application defined DAG modifications at runtime for optimized execution

• Change task concurrency • Change task scheduling• Change DAG edges• Change DAG vertices

Page 18

Page 19: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Benefits of DAG execution

Faster Execution and Higher Predictability• Eliminate replicated write barrier between successive computations.• Eliminate job launch overhead of workflow jobs.• Eliminate extra stage of map reads in every workflow job.• Eliminate queue and resource contention suffered by workflow jobs that

are started after a predecessor job completes.• Better locality because the engine has the global picture

Page 19

Pig/Hive - MRPig/Hive - Tez

Page 20: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Container Re-Use

• Reuse YARN containers/JVMs to launch new tasks• Reduce scheduling and launching delays• Shared in-memory data across tasks• JVM JIT friendly execution

Page 20

YARN Container / JVM

TezTask Host

TezTask1

TezTask2

Shar

ed O

bjec

ts

YARN Container

Tez Application Master

Start Task

Task Done

Start Task

Page 21: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Sessions

Page 21

Application Master

Client

Start Session

Submit DAG

Task Scheduler

Cont

aine

r Poo

lShared Object

Registry

Pre Warmed

JVM

Sessions• Standard concepts of pre-launch

and pre-warm applied• Key for interactive queries• Represents a connection between

the user and the cluster• Multiple DAGs executed in the

same session• Containers re-used across queries• Takes care of data locality and

releasing resources when idle

Page 22: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Re-Use in Action

Task Execution Timeline

Cont

aine

rs

Time

Page 23: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Customizable Core Engine

Page 23

Vertex-2

Vertex-1

Startvertex

Vertex Manager

Starttasks

DAGScheduler

Get Priority

Get Priority

Startvertex

TaskScheduler

Get container

Get container

• Vertex Manager• Determines task

parallelism• Determines when

tasks in a vertex can start.

• DAG SchedulerDetermines priority of task

• Task SchedulerAllocates containers from YARN and assigns them to tasks

Page 24: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Event Based Control Plane

Page 24

Reduce Task 2

Input1 Input2

Map Task 2Output1

Output2Output3

Map Task 1Output1

Output2Output3

AMRouter

Scatter-Gather Edge

• Events used to communicate between the tasks and between task and framework

• Data Movement Event used by producer task to inform the consumer task about data location, size etc.

• Input Error event sent by task to the engine to inform about errors in reading input. The engine then takes action by re-generating the input

• Other events to send task completion notification, data statistics and other control plane information

Data Event

Error Event

Page 25: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Automatic Reduce Parallelism

Page 25

Map Vertex

Reduce VertexApp Master

Vertex ManagerData Size Statistics

Vertex StateMachine

Set Parallelism

Cancel Task

Re-Route

Event ModelMap tasks send data statistics events to the Reduce Vertex Manager.

Vertex ManagerPluggable application logic that understands the data statistics and can formulate the correct parallelism. Advises vertex controller on parallelism

Page 26: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Reduce Slow Start/Pre-launch

Page 26

Map Vertex

Reduce VertexApp Master

Vertex ManagerTask Completed

Vertex StateMachine

Start Tasks

Start

Event ModelMap completion events sent to the Reduce Vertex Manager.

Vertex ManagerPluggable application logic that understands the data size. Advises the vertex controller to launch the reducers before all maps have completed so that shuffle can start.

Page 27: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Theory to Practice

• Performance• Scalability

Page 27

Page 28: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Hive TPC-DS Scale 200GB latency

Page 29: Apache Tez : Accelerating Hadoop Data Processing

Replicated Join (2.8x)

Join + Groupby

(1.5x)

Join + Groupby +

Orderby (1.5x)

3 way Split + Join +

Groupby + Orderby

(2.6x)

0500

100015002000250030003500400045005000

MRTez

Tim

e in

se

cs

Tez – Pig performance gains

• Demonstrates performance gains from a basic translation to a Tez DAG

• Deeper integration in the works for further improvements

Page 30: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Observations on Performance

• Number of stages in the DAG• Higher the number of stages in the DAG, performance of Tez (over MR) will

be better.• Cluster/queue capacity

• More congested a queue is, the performance of Tez (over MR) will be better due to container reuse.

• Size of intermediate output• More the size of intermediate output, the performance of Tez (over MR) will

be better due to reduced HDFS usage.• Size of data in the job

• For smaller data and more stages, the performance of Tez (over MR) will be better as percentage of launch overhead in the total time is high for smaller jobs.

• Offload work to the cluster• Move as much work as possible to the cluster by modelling it via the job

DAG. Exploit the parallelism and resources of the cluster. E.g. MR split calculation.

• Vertex caching• The more re-computation can be avoided the better is the performance.

Page 30

Page 31: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Data at scale

Page 31

Hive TPC-DS Scale 10TB

Page 32: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – DAG definition at scale

Page 32

Hive : TPC-DS Query 64 Logical DAG

Page 33: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Container Reuse at Scale

78 vertices + 8374 tasks on 50 containers (TPC-DS Query 4)

Page 33

Page 34: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Real World Use Cases for the API

Page 34

Page 35: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Broadcast EdgeSELECT ss.ss_item_sk, ss.ss_quantity, avg_price, inv.inv_quantity_on_handFROM (select avg(ss_sold_price) as avg_price, ss_item_sk, ss_quantity_sk from store_sales group by ss_item_sk) ssJOIN inventory invON (inv.inv_item_sk = ss.ss_item_sk);

Hive – MR Hive – Tez

M

M

M

M M

HDFS

Store Sales scan. Group by and aggregation reduce size of this input.

Inventory scan and Join

Broadcast edge

M M M

HDFS

Store Sales scan. Group by and aggregation.

Inventory and Store Sales (aggr.) output scan and shuffle join.

R R

R R

RR

M

MMM

HDFS

Hive : Broadcast Join

Page 36: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Multiple Outputs

Page 36

Pig : Split & Group-byf = LOAD ‘foo’ AS (x, y, z);g1 = GROUP f BY y;g2 = GROUP f BY z;j = JOIN g1 BY group, g2 BY group;

Group by y Group by z

Load foo

Join

Load g1 and Load g2

Group by y Group by z

Load foo

Join

Multiple outputs

Reduce followsreduce

HDFS HDFS

Split multiplex de-multiplex

Pig – MR Pig – Tez

Page 37: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Multiple OutputsFROM (SELECT * FROM store_sales, date_dim WHERE ss_sold_date_sk = d_date_sk and

d_year = 2000)INSERT INTO TABLE t1 SELECT distinct ss_item_skINSERT INTO TABLE t2 SELECT distinct ss_customer_sk;

Hive – MR Hive – Tez

M MM

M

HDFS

Map join date_dim/store sales

Two MR jobs to do the distinct

M MM

M M

HDFS

RR

HDFS

M M M

R

M M M

R

HDFS

Broadcast Join (scan date_dim, join store sales)

Distinct for customer + items

Materialize join on HDFS

Hive : Multi-insert queries

Page 38: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – One to One Edge

Page 38

Aggregate

Sample L

Join

Stage sample mapon distributed cache

l = LOAD ‘left’ AS (x, y);r = LOAD ‘right’ AS (x, z);j = JOIN l BY x, r BY x USING ‘skewed’;

Load &Sample

Aggregate

Partition L

Join

Pass through inputvia 1-1 edge

Partition R

HDFS

Broadcastsample map

Partition L and Partition R

Pig – MR Pig – Tez

Pig : Skewed Join

Page 39: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Custom Edge

SELECT ss.ss_item_sk, ss.ss_quantity, inv.inv_quantity_on_handFROM store_sales ssJOIN inventory invON (inv.inv_item_sk = ss.ss_item_sk);

Hive – MR Hive – Tez

M MM

M M

HDFS

Inventory scan (Runs on cluster potentially more than 1 mapper)

Store Sales scan and Join (Custom vertex reads both inputs – no side file reads)

Custom edge (routes outputs of previous stage to the correct Mappers of the next stage)M MM

M

HDFS

Inventory scan (Runs as single local map task)

Store Sales scan and Join (Inventory hash table read as side file)

HDFS

Hive : Dynamically Partitioned Hash Join

Page 40: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Bringing it all together

Page 40Architecting the Future of Big Data

Tez Session populates container pool

Dimension table calculation and HDFS split generation in parallel

Dimension tables broadcasted to Hive MapJoin tasks

Final Reducer pre-launched and fetches completed inputs

TPCDS – Query-27 with Hive on Tez

Page 41: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Bridging the Data Spectrum

Page 41

Fact Table Dimension Table 1

Result Table 1

Dimension Table 2

Result Table 2

Dimension Table 3

Result Table 3

BroadcastJoin

ShuffleJoin

Typical pattern in a TPC-DS query

Fact Table

Dimension Table 1

Dimension Table 1

Dimension Table 1

Broadcast join for small data sets

Based on data size, the query optimizer can run either plan as a single Tez job

BroadcastJoin

Page 42: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Current status

• Apache Project–Growing community of contributors and users– Latest release is 0.5.0

• Focus on stability– Testing and quality are highest priority– Code ready and deployed on multi-node environments at scale

• Support for a vast topology of DAGs– Already functionally equivalent to Map Reduce. Existing Map Reduce

jobs can be executed on Tez with few or no changes–Rapid adoption by important open source projects and by vendors

Page 42

Page 43: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Developer Release

• API stability – Intuitive and clean APIs that will be supported and be backwards compatible with future releases

• Local Mode Execution – Allows the application to run entirely in the same JVM without any need for a cluster. Enables easy debugging using IDE’s on the developer machine

• Performance Debugging – Adds swim-lanes analysis tool to visualize job execution and help identify performance bottlenecks

• Documentation and tutorials to learn the APIs

Page 43

Page 44: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Adoption

• Apache Hive• Hadoop standard for declarative access via SQL-like interface (Hive 13)

• Apache Pig• Hadoop standard for procedural scripting and pipeline processing (Pig 14)

• Cascading• Developer friendly Java API and SDK• Scalding (Scala API on Cascading) (3.0)

• Commercial Vendors• ETL : Use Tez instead of MR or custom pipelines• Analytics Vendors : Use Tez as a target platform for scaling parallel

analytical tools to large data-sets

Page 44

Page 45: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Adoption Path

Pre-requisite : Hadoop 2 with YARNTez has zero deployment pain. No side effects or traces left behind on your cluster. Low risk and low effort to try out.• Using Hive, Pig, Cascading, Scalding

• Try them with Tez as execution engine

• Already have MapReduce based pipeline• Use configuration to change MapReduce to run on Tez by setting

‘mapreduce.framework.name’ to ‘yarn-tez’ in mapred-site.xml• Consolidate MR workflow into MR-DAG on Tez• Change MR-DAG to use more efficient Tez constructs

• Have custom pipeline• Wrap custom code/scripts into Tez inputs-processor-outputs• Translate your custom pipeline topology into Tez DAG• Change custom topology to use more efficient Tez constructs

Page 45

Page 46: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Roadmap

• Richer DAG support– Addition of vertices at runtime– Shared edges for shared outputs– Enhance Input/Output library

• Performance optimizations– Improve support for high concurrency– Improve locality aware scheduling– Add framework level data statistics – HDFS memory storage integration

• Usability– Stability and testability– UI integration with YARN– Tools for performance analysis and debugging

Page 46

Page 47: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Community

• Early adopters and code contributors welcome– Adopters to drive more scenarios. Contributors to make them happen.

• Tez meetup for developers and users– http://www.meetup.com/Apache-Tez-User-Group

• Technical blog series– http://

hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing

• Useful links– Work tracking: https://issues.apache.org/jira/browse/TEZ– Code: https://github.com/apache/tez – Developer list: [email protected]

User list: [email protected] Issues list: [email protected]

Page 47

Page 48: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez – Takeaways

• Distributed execution framework that models processing as dataflow graphs

• Customizable execution architecture designed for extensibility and user defined performance optimizations

• Works out of the box with the platform figuring out the hard stuff

• Zero install – Safe and Easy to Evaluate and Experiment• Addresses common needs of a broad spectrum of applications

and usage patterns• Open source Apache project – your use-cases and code are

welcome• It works and is already being used by Apache Hive and Pig

Page 48

Page 49: Apache Tez : Accelerating Hadoop Data Processing

© Hortonworks Inc. 2013

Tez

Thanks for your time and attention!Video with Deep Dive on Tez http://goo.gl/BL67o7https://hortonworks.webex.com/hortonworks/lsr.php?RCID=b4382d626867fe57de70b218c74c03e4 – WordCount code walk through with 0.5 APIs. Debugging in local mode. Execution and profiling in cluster mode. (Starts at ~27 min. Ends at ~37 min)

Questions?

@bikassahaPage 49