SCALE 12X: Efficient Multi-Tenant Hadoop 2 Workloads with YARN
DESCRIPTION
Hadoop is about so much more than batch processing. With the recent release of Hadoop 2, there have been significant changes to how a Hadoop cluster uses resources. YARN, the new resource management component, allows for a more efficient mix of workloads across hardware resources, and enables new applications and new processing paradigms such as stream-processing. This talk will discuss the new design and components of Hadoop 2, and examples of Modern Data Architectures that leverage Hadoop for maximum business efficiency.

TRANSCRIPT
Hadoop 2: Efficient multi-tenant workloads that enable the Modern Data Architecture
@ddkaiser linkedin.com/in/dkaiser facebook.com/dkaiser [email protected] [email protected]
SCALE 12X, Los Angeles, February 23, 2014
David Kaiser
Who Am I?
David Kaiser
20+ years experience with Linux; 3 years experience with Hadoop
Career experiences:
• Data Warehousing
• Geospatial Analytics
• Open-source Solutions and Architecture
Employed at Hortonworks as a Senior Solutions Engineer
Hadoop 2: Efficient multi-tenant workloads that enable the Modern Data Architecture
• Abstract: – Hadoop is about so much more than batch processing. With the recent release of Hadoop 2, there have been significant changes to how a Hadoop cluster uses resources.
– YARN, the new resource management component, allows for a more efficient mix of workloads across hardware resources, and enables new applications and new processing paradigms such as stream-processing.
– This talk will discuss the new design and components of Hadoop 2, and provide examples of Modern Data Architectures that leverage Hadoop 2.
What is This Thing?
http://hadoop.apache.org/
Misconceptions
• Bucket brigade for large or slow data processing tasks
• Batch processor – another mainframe
• Dumb/inflexible, trendy, too simple
• Incorrect assumption that Java == SLOW
• Incorrect assumption that Java == EVIL
Hadoop + Linux
Provides a 100% open-source framework for efficient, scalable data processing on commodity hardware
• Commodity Hardware
• Linux – the open-source Operating System
• Hadoop – the open-source Data Operating System
Hadoop Fundamentals
• Hadoop is a single system across multiple Linux systems
• Two basic capabilities of Hadoop:
– Reliable, redundant, and distributed storage
– Distributed computation
• Storage: Hadoop Distributed File System (HDFS)
– Replicated, distributed filesystem
– Blocks written to the underlying filesystem on multiple nodes
• Computation
– Resource management
– Frameworks to divide workloads across a collection of resources
– Hadoop V1: MapReduce framework only
– Hadoop V2: MapReduce, Tez, Spark, others…
HDFS: File create lifecycle
[Diagram: an HDFS client sends a create request to the NameNode (1), streams blocks B1 and B2 to a pipeline of DataNodes spread across Racks 1–3 (2), receives acks as each replica lands (3), and signals complete back to the NameNode (4).]
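To ground the lifecycle above, here is a minimal client-side sketch using the standard HDFS Java API (the path and payload are placeholder values, not from the talk):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCreateExample {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // create() contacts the NameNode; the data itself streams to a
        // pipeline of DataNodes spread across racks, as in the diagram
        Path file = new Path("/tmp/example.txt");  // placeholder path
        FSDataOutputStream out = fs.create(file);
        out.writeUTF("hello hdfs");

        // close() blocks until the replica acks arrive, then tells the
        // NameNode the file is complete (the final step in the diagram)
        out.close();
    }
}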
Hadoop 1 Computation
• MapReduce Framework
– Combined both resource management and application logic in the same code
• Limitations
– Resource allocation units (slots) fixed per cluster
– Difficult to use a cluster for differing or simultaneous workloads
The 1st Generation of Hadoop: Batch
HADOOP 1.0 – Built for Web-Scale Batch Apps
[Diagram: separate single-application clusters, each with its own HDFS – Batch, Interactive, and Online silos.]
• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads
Hadoop MapReduce Classic
• JobTracker
– Manages cluster resources and job scheduling
• TaskTracker
– Per-node agent
– Manages tasks
MapReduce Classic: Limitations
• Scalability
– Maximum cluster size: 4,000 nodes
– Maximum concurrent tasks: 40,000
– Coarse synchronization in the JobTracker
• Availability
– A failure kills all queued and running jobs
• Hard partition of resources into map and reduce slots
– Low resource utilization
• Lacks support for alternate paradigms and services
– Iterative applications implemented using MapReduce are 10x slower
Hadoop 1: Poor Utilization of Cluster Resources
• Hadoop 1's JobTracker and TaskTracker used fixed-size "slots" for resource allocation
• Slot counts are hard-coded values (illustrative properties below); the TaskTracker must be restarted after a change
• Map tasks wait for slots that are NOT currently being used by reduce tasks
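For illustration, these are the Hadoop 1 mapred-site.xml properties behind those fixed slot counts (the values shown are examples, not from the talk; changing them requires restarting the TaskTracker):
• mapred.tasktracker.map.tasks.maximum=8
• mapred.tasktracker.reduce.tasks.maximum=6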
Hadoop 2: Moving Past MapReduce
HADOOP 1.0 – Single-Use System: Batch Apps
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

HADOOP 2.0 – Multi-Purpose Platform: Batch, Interactive, Online, Streaming, …
• MapReduce (data processing) | Others
• YARN (cluster resource management)
• HDFS2 (redundant, highly-available & reliable storage)
Apache Tez as the new Primitive
MapReduce as Base (HADOOP 1.0 Data Flow):
• Pig (data flow) | Hive (SQL) | Others (Cascading)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

Apache Tez as Base (HADOOP 2.0):
• Pig (data flow) | Hive (SQL) | Others (Cascading) | MapReduce (batch) | ?? (HOYA) (continuous execution) | Online data processing: HBase, Accumulo | Real-time stream processing: Storm
• Tez (execution engine)
• YARN (cluster resource management)
• HDFS2 (redundant, reliable storage)
Tez – Execution Performance
• Performance gains over MapReduce:
– Eliminate the replicated write barrier between successive computations
– Eliminate the job launch overhead of workflow jobs
– Eliminate the extra stage of map reads in every workflow job
– Eliminate the queue and resource contention suffered by workflow jobs that are started after a predecessor job completes
[Chart: Pig/Hive on MapReduce vs. Pig/Hive on Tez]
YARN: Taking Hadoop Beyond Batch
Applications Run Natively in Hadoop
BATCH (MapReduce) | INTERACTIVE (Tez) | STREAMING (Storm, S4, …) | GRAPH (Giraph) | IN-MEMORY (Spark) | HPC MPI (OpenMPI) | ONLINE (HBase) | OTHER (Search, Weave, …)
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
YARN Overview
• Goals:
– Reduce the responsibilities of the JobTracker
– Separate the resource management duties from the job coordination duties
– Allow multiple simultaneous jobs
– Enable workloads of different styles and sizes in one cluster
• Design:
– A separate ResourceManager
– One global resource Scheduler for the entire cluster
– Each worker (slave) node runs a NodeManager, which manages the life-cycle of containers
– The JobTracker's per-job role is now the ApplicationMaster; each application has one ApplicationMaster
– The ApplicationMaster manages application scheduling and task execution
YARN Architecture
[Diagram: Client 1 and Client 2 submit applications to the ResourceManager, which contains the global Scheduler. Each worker node runs a NodeManager. Per-application ApplicationMasters (AM1, AM2) run in containers and manage their own containers (1.1–1.3 and 2.1–2.4) on NodeManagers across the cluster.]
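To tie the diagram to the API, a minimal client-side sketch using the YARN Java client (a sketch only: a real client must also populate the ApplicationSubmissionContext with an AM launch command, resources, and queue before calling submitApplication()):

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientSketch {
    public static void main(String[] args) throws Exception {
        // Connects to the ResourceManager named in yarn-site.xml
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // First step of submission: ask the RM for a new ApplicationId
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationId appId =
                app.getApplicationSubmissionContext().getApplicationId();
        System.out.println("Allocated application id: " + appId);

        yarnClient.stop();
    }
}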
Capacity Sharing: Concepts
• Application
– An application is a temporal job or a service submitted to YARN
– Examples: a MapReduce job (job), a Storm topology (service)
• Container
– Basic unit of allocation
– Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, etc.)
– e.g., container_0 = 2GB, container_1 = 1GB
– Replaces the fixed map/reduce slots from Hadoop 1.x
YARN – Resource Allocation & Usage
• ResourceRequest
– A fine-grained resource ask to the ResourceManager
– Ask for a specific amount of resources (memory, cpu, etc.) on a specific machine or rack
– Use the special value * as the resource name to run on any machine
• A ResourceRequest has four fields: priority, resourceName, capability, numContainers

priority | capability    | resourceName | numContainers
0        | <2gb, 1 core> | host01       | 1
0        | <2gb, 1 core> | rack0        | 1
0        | <2gb, 1 core> | *            | 1
1        | <4gb, 1 core> | *            | 1
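To make the table concrete, a minimal sketch of building these requests with the Hadoop 2.x YARN records API (host01 is the placeholder name from the table; an ApplicationMaster hands such requests to the ResourceManager during allocation):

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ResourceRequestSketch {
    public static void main(String[] args) {
        // capability: 2 GB of memory and 1 virtual core
        Resource capability = Resource.newInstance(2048, 1);

        // Priority 0, one container, preferably on host01
        ResourceRequest onHost = ResourceRequest.newInstance(
                Priority.newInstance(0), "host01", capability, 1);

        // The same ask relaxed to any machine: ResourceRequest.ANY is "*"
        ResourceRequest anywhere = ResourceRequest.newInstance(
                Priority.newInstance(0), ResourceRequest.ANY, capability, 1);

        System.out.println(onHost);
        System.out.println(anywhere);
    }
}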
CGroups
• A Linux kernel capability to limit, account for, and isolate resources:
– CPU: controls the prioritization of processes in the group; think of it as a more advanced nice level
– Memory: allows setting limits on RAM and swap usage
– Disk I/O
– Network
• YARN currently supports CPU and Memory (illustrative NodeManager settings below)
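As an illustrative sketch of enabling cgroup-based enforcement, the relevant NodeManager settings in yarn-site.xml (property names as documented for Hadoop 2.x; the hierarchy path and mount value are example values, not from the talk):
• yarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
• yarn.nodemanager.linux-container-executor.resources-handler.class=org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler
• yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn
• yarn.nodemanager.linux-container-executor.cgroups.mount=false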
List of YARN Apps
• MapReduce (of course)
• Apache Tez
– Apache Hive
– Apache Pig
• Apache Hama – Iterative, Bulk Synchronous Parallel (BSP) engine
• Apache Giraph – Iterative, BSP-based graph analysis engine
• HBase on YARN (HOYA)
• Apache Storm – Real-time stream processing
• Apache Spark – Advanced DAG execution engine that supports cyclic data flow and in-memory computing
• Apache S4 – Real-time processing
• Open MPI – Open-source Message Passing Interface for HPC
http://wiki.apache.org/hadoop/PoweredByYarn
The YARN Book • “Coming Soon”
• Expected by 2nd Quarter 2014
• Complete coverage of YARN
Modern Data Architecture
• Effective use of data – especially BIG data – is enhanced when data is co-located, enabling discovery and mining of unanticipated patterns.
• A "Data Lake" is the growing body of all data
– Encompassing more than a single warehouse
– Data can continuously stream into and out of the lake
Multi-Tenancy Requirements
Multi-tenancy in one shared cluster:
• Multiple Business Units
• Multiple Applications
Requirements:
• Shared Processing Capacity
• Shared Storage Capacity
• Data Access Security
Multi-Tenancy: Capabilities
• Group and User:
– Linux and HDFS permissions separate files and directories to create tenant boundaries – can be integrated with LDAP (or AD)
• Security:
– Used to enforce tenant boundaries – can be integrated with Kerberos
• Capacity:
– Storage quotas set up to manage consumption
– Capacity Scheduler queues to balance shared processing resources between tenants
– ACLs to define tenants
The Capacity Scheduler
• Capacity Sharing: queues with priorities; ACLs for job-submit permissions
• Capacity Enforcement: max capacity per queue; user limits within a queue
• Administration: monitoring + management admin ACLs; capacity-scheduler.xml
Roadmap: Capacity Scheduling
• CS Preemption – Enhance SLA support; reclaim capacity from tasks in queues that have been over-scheduled
• Queue Hierarchy – Granular configuration of queues; provide constraints across a set of queues
• Node Labels – Schedule tasks on specific cluster nodes; account for optimized hardware
• Container Isolation – Stronger isolation of resources for each container, incorporating CPU
• CPU Scheduling – Schedule and share CPU core capacity across tasks
Capacity Scheduler by example
• Total cluster capacity: 20 slots (11 mappers, 9 reducers)
• Queue: Production – Guaranteed 70% of resources (14 slots: 8M / 6R); max 100%
• Queue: Dev – Guaranteed 10% of resources (2 slots: 1M / 1R); max 50%
• Queue: Default – Guaranteed 20% of resources (4 slots: 2M / 2R); max 80%
Hierarchical queues
[Diagram: queue tree. root → Default 20% | Production 70% | Dev 10%; Production → Prod 70% | Reserved 20% | DevOps 10%; Prod → P0 70% | P1 30%; Dev → Test 80% | Eng 20%. Sibling percentages sum to 100% at each level.]
CS: Example Queue Configuration
• Default: 10 users | Ad-hoc BI query jobs etc. | General user SLAs
• Dev: 4 users | Ad-hoc data science only (Pig + Mahout) | Lower SLAs
• Applications: 2 users | Batch ETL and report generation jobs | Production SLAs

yarn.scheduler.capacity.root.default – Min: 0.10 | Max: 0.20 | User limit: 0.8 | ACLs: 'Users' group
yarn.scheduler.capacity.root.dev – Min: 0.10 | Max: 0.10 | User limit: 0.5 | ACLs: 'Engineering' group
yarn.scheduler.capacity.root.production – Min: 0.20 | Max: 0.70 | User limit: 1.0 | ACLs: 'Applications' group
CS: Configuration
• yarn.scheduler.capacity.root.default.acl_administer_jobs=*
• yarn.scheduler.capacity.root.default.acl_submit_jobs=*
• yarn.scheduler.capacity.root.default.capacity=100
• yarn.scheduler.capacity.root.default.maximum-capacity=100
• yarn.scheduler.capacity.root.default.user-limit-factor=1
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
CS: Configuration
• yarn.scheduler.capacity.root.default.acl_administer_jobs=Admin
• yarn.scheduler.capacity.root.default.acl_submit_jobs=Users
• yarn.scheduler.capacity.root.default.capacity=10
• yarn.scheduler.capacity.root.default.maximum-capacity=20
• yarn.scheduler.capacity.root.default.user-limit-factor=0.8
• yarn.scheduler.capacity.root.dev.acl_administer_jobs=Engineering
• yarn.scheduler.capacity.root.dev.acl_submit_jobs=Engineering
• yarn.scheduler.capacity.root.dev.capacity=10
• yarn.scheduler.capacity.root.dev.maximum-capacity=10
• yarn.scheduler.capacity.root.dev.user-limit-factor=0.5
• yarn.scheduler.capacity.root.production.acl_administer_jobs=Applications
• yarn.scheduler.capacity.root.production.acl_submit_jobs=Admin
• yarn.scheduler.capacity.root.production.capacity=20
• yarn.scheduler.capacity.root.production.maximum-capacity=70
• yarn.scheduler.capacity.root.production.user-limit-factor=1.0
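A job opts into one of these queues at submission time. A minimal Java sketch for a MapReduce job (the queue name dev matches the configuration above; job setup details are omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSubmitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route the job to the Capacity Scheduler's "dev" queue;
        // the acl_submit_jobs setting above governs who may do this
        conf.set("mapreduce.job.queuename", "dev");

        Job job = Job.getInstance(conf, "dev-queue example");
        // ... configure mapper/reducer/input/output as usual, then:
        // job.waitForCompletion(true);
    }
}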
Capacity Scheduler by example
• Job 1: Launched in the Production queue
– Requires 100 slots
– Gets 14 slots at a time
[Chart: cluster resources – Production, Development, Default, Idle]
Capacity Scheduler by example
• Job 1: Running in the Production queue
– Using 14 slots
• Job 2: Scheduled in the Development queue
– Requires 50 slots
– Gets 4 slots at a time
[Chart: cluster resources – Production, Development, Default, Idle]
Capacity Scheduler by example
• Job 1: Running in the Production queue
– 98% complete; only 2 slots in use until it finishes
• Job 2: Scheduled in the Development queue
– Requires 50 slots
– Still only getting 4 slots at a time
[Chart: cluster resources – Production, Development, Default, Idle]
Summary
• YARN is the logical extension of Apache Hadoop
– Complements HDFS, the data reservoir
• Resource management for the Enterprise Data Lake
– Shared, secure, multi-tenant Hadoop
– Allows for all processing in Hadoop
Your Fastest On-ramp to Enterprise Hadoop™!
http://hortonworks.com/products/hortonworks-sandbox/
The Sandbox lets you experience Apache Hadoop from the convenience of your own laptop – no data center, no cloud, and no internet connection needed! The Hortonworks Sandbox is:
• A free download: http://hortonworks.com/products/hortonworks-sandbox/
• A complete, self-contained virtual machine with Apache Hadoop pre-configured
• A personal, portable, and standalone Hadoop environment
• A set of hands-on, step-by-step tutorials that allow you to learn and explore Hadoop
Questions?
@ddkaiser linkedin.com/in/dkaiser facebook.com/dkaiser [email protected] [email protected]
David Kaiser