SCALE 12X: Efficient Multi-Tenant Hadoop 2 Workloads with YARN
DESCRIPTION
Hadoop is about so much more than batch processing. With the recent release of Hadoop 2, there have been significant changes to how a Hadoop cluster uses resources. YARN, the new resource management component, allows for a more efficient mix of workloads across hardware resources, and enables new applications and new processing paradigms such as stream-processing. This talk will discuss the new design and components of Hadoop 2, and examples of Modern Data Architectures that leverage Hadoop for maximum business efficiency.

TRANSCRIPT
Hadoop 2: Efficient multi-tenant workloads that enable the Modern Data Architecture
@ddkaiser linkedin.com/in/dkaiser facebook.com/dkaiser [email protected] [email protected]
SCALE 12X, Los Angeles, February 23, 2014
David Kaiser
Who Am I?
David Kaiser
20+ years experience with Linux; 3 years experience with Hadoop
Career experiences:
• Data Warehousing
• Geospatial Analytics
• Open-source Solutions and Architecture
Employed at Hortonworks as a Senior Solutions Engineer
Hadoop 2: Efficient multi-tenant workloads that enable the Modern Data Architecture
• Abstract: – Hadoop is about so much more than batch processing. With the recent release of Hadoop 2, there have been significant changes to how a Hadoop cluster uses resources.
– YARN, the new resource management component, allows for a more efficient mix of workloads across hardware resources, and enables new applications and new processing paradigms such as stream-processing.
– This talk will discuss the new design and components of Hadoop 2, and provide examples of Modern Data Architectures that leverage Hadoop 2.
What is This Thing?
http://hadoop.apache.org/
Misconceptions
• Bucket brigade for large or slow data processing tasks
• Batch processor – another mainframe
• Dumb/inflexible, trendy, too simple
• Incorrect assumption that Java == SLOW
• Incorrect assumption that Java == EVIL
Hadoop + Linux
Provides a 100% open-source framework for efficient, scalable data processing on commodity hardware
• Commodity Hardware
• Linux – the open-source Operating System
• Hadoop – the open-source Data Operating System
Hadoop Fundamentals
• Hadoop is a single system across multiple Linux systems
• Two basic capabilities of Hadoop:
– Reliable, redundant, and distributed storage
– Distributed computation
• Storage: Hadoop Distributed File System (HDFS)
– Replicated, distributed filesystem
– Blocks written to the underlying filesystem on multiple nodes
• Computation
– Resource management
– Frameworks to divide workloads across a collection of resources
– Hadoop V1: MapReduce framework only
– Hadoop V2: MapReduce, Tez, Spark, others…
HDFS: File create lifecycle
[Diagram: an HDFS client sends a create request to the NameNode (1), streams blocks B1 and B2 to a pipeline of DataNodes spread across Racks 1–3 (2), receives acks as each replica lands (3), and signals complete back to the NameNode (4).]
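To ground the lifecycle above, here is a minimal client-side sketch using the standard HDFS Java API (the path and payload are placeholder values, not from the talk):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCreateExample {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // create() contacts the NameNode; the data itself streams to a
        // pipeline of DataNodes spread across racks, as in the diagram
        Path file = new Path("/tmp/example.txt");  // placeholder path
        FSDataOutputStream out = fs.create(file);
        out.writeUTF("hello hdfs");

        // close() blocks until the replica acks arrive, then tells the
        // NameNode the file is complete (the final step in the diagram)
        out.close();
    }
}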
Hadoop 1 Computation
• MapReduce Framework
– Combined both resource management and application logic in the same code
• Limitations
– Resource allocation units (slots) fixed per cluster
– Difficult to use a cluster for differing or simultaneous workloads
The 1st Generation of Hadoop: Batch
HADOOP 1.0 – Built for Web-Scale Batch Apps
[Diagram: separate single-application clusters, each with its own HDFS – Batch, Interactive, and Online silos.]
• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads
Hadoop MapReduce Classic
• JobTracker
– Manages cluster resources and job scheduling
• TaskTracker
– Per-node agent
– Manages tasks
MapReduce Classic: Limitations
• Scalability
– Maximum cluster size: 4,000 nodes
– Maximum concurrent tasks: 40,000
– Coarse synchronization in the JobTracker
• Availability
– A failure kills all queued and running jobs
• Hard partition of resources into map and reduce slots
– Low resource utilization
• Lacks support for alternate paradigms and services
– Iterative applications implemented using MapReduce are 10x slower
Hadoop 1: Poor Utilization of Cluster Resources
• Hadoop 1's JobTracker and TaskTracker used fixed-size "slots" for resource allocation
• Slot counts are hard-coded values (illustrative properties below); the TaskTracker must be restarted after a change
• Map tasks wait for slots that are NOT currently being used by reduce tasks
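For illustration, these are the Hadoop 1 mapred-site.xml properties behind those fixed slot counts (the values shown are examples, not from the talk; changing them requires restarting the TaskTracker):
• mapred.tasktracker.map.tasks.maximum=8
• mapred.tasktracker.reduce.tasks.maximum=6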
Hadoop 2: Moving Past MapReduce
HADOOP 1.0 – Single-Use System: Batch Apps
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

HADOOP 2.0 – Multi-Purpose Platform: Batch, Interactive, Online, Streaming, …
• MapReduce (data processing) | Others
• YARN (cluster resource management)
• HDFS2 (redundant, highly-available & reliable storage)
Apache Tez as the new Primitive
MapReduce as Base (HADOOP 1.0 Data Flow):
• Pig (data flow) | Hive (SQL) | Others (Cascading)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

Apache Tez as Base (HADOOP 2.0):
• Pig (data flow) | Hive (SQL) | Others (Cascading) | MapReduce (batch) | ?? (HOYA) (continuous execution) | Online data processing: HBase, Accumulo | Real-time stream processing: Storm
• Tez (execution engine)
• YARN (cluster resource management)
• HDFS2 (redundant, reliable storage)
Tez – Execution Performance
• Performance gains over MapReduce:
– Eliminate the replicated write barrier between successive computations
– Eliminate the job launch overhead of workflow jobs
– Eliminate the extra stage of map reads in every workflow job
– Eliminate the queue and resource contention suffered by workflow jobs that are started after a predecessor job completes
[Chart: Pig/Hive on MapReduce vs. Pig/Hive on Tez]
YARN: Taking Hadoop Beyond Batch
Applications Run Natively in Hadoop
BATCH (MapReduce) | INTERACTIVE (Tez) | STREAMING (Storm, S4, …) | GRAPH (Giraph) | IN-MEMORY (Spark) | HPC MPI (OpenMPI) | ONLINE (HBase) | OTHER (Search, Weave, …)
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
YARN Overview
• Goals:
– Reduce the responsibilities of the JobTracker
– Separate the resource management duties from the job coordination duties
– Allow multiple simultaneous jobs
– Enable workloads of different styles and sizes in one cluster
• Design:
– A separate ResourceManager
– One global resource Scheduler for the entire cluster
– Each worker (slave) node runs a NodeManager, which manages the life-cycle of containers
– The JobTracker's per-job role is now the ApplicationMaster; each application has one ApplicationMaster
– The ApplicationMaster manages application scheduling and task execution
YARN Architecture
[Diagram: Client 1 and Client 2 submit applications to the ResourceManager, which contains the global Scheduler. Each worker node runs a NodeManager. Per-application ApplicationMasters (AM1, AM2) run in containers and manage their own containers (1.1–1.3 and 2.1–2.4) on NodeManagers across the cluster.]
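To tie the diagram to the API, a minimal client-side sketch using the YARN Java client (a sketch only: a real client must also populate the ApplicationSubmissionContext with an AM launch command, resources, and queue before calling submitApplication()):

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientSketch {
    public static void main(String[] args) throws Exception {
        // Connects to the ResourceManager named in yarn-site.xml
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // First step of submission: ask the RM for a new ApplicationId
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationId appId =
                app.getApplicationSubmissionContext().getApplicationId();
        System.out.println("Allocated application id: " + appId);

        yarnClient.stop();
    }
}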
Capacity Sharing: Concepts
• Application
– An application is a temporal job or a service submitted to YARN
– Examples: a MapReduce job (job), a Storm topology (service)
• Container
– Basic unit of allocation
– Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, etc.)
– e.g., container_0 = 2GB, container_1 = 1GB
– Replaces the fixed map/reduce slots from Hadoop 1.x
YARN – Resource Allocation & Usage
• ResourceRequest
– A fine-grained resource ask to the ResourceManager
– Ask for a specific amount of resources (memory, cpu, etc.) on a specific machine or rack
– Use the special value * as the resource name to run on any machine
• A ResourceRequest has four fields: priority, resourceName, capability, numContainers

priority | capability    | resourceName | numContainers
0        | <2gb, 1 core> | host01       | 1
0        | <2gb, 1 core> | rack0        | 1
0        | <2gb, 1 core> | *            | 1
1        | <4gb, 1 core> | *            | 1
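To make the table concrete, a minimal sketch of building these requests with the Hadoop 2.x YARN records API (host01 is the placeholder name from the table; an ApplicationMaster hands such requests to the ResourceManager during allocation):

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ResourceRequestSketch {
    public static void main(String[] args) {
        // capability: 2 GB of memory and 1 virtual core
        Resource capability = Resource.newInstance(2048, 1);

        // Priority 0, one container, preferably on host01
        ResourceRequest onHost = ResourceRequest.newInstance(
                Priority.newInstance(0), "host01", capability, 1);

        // The same ask relaxed to any machine: ResourceRequest.ANY is "*"
        ResourceRequest anywhere = ResourceRequest.newInstance(
                Priority.newInstance(0), ResourceRequest.ANY, capability, 1);

        System.out.println(onHost);
        System.out.println(anywhere);
    }
}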
CGroups
• A Linux kernel capability to limit, account for, and isolate resources:
– CPU: controls the prioritization of processes in the group; think of it as a more advanced nice level
– Memory: allows setting limits on RAM and swap usage
– Disk I/O
– Network
• YARN currently supports CPU and Memory (illustrative NodeManager settings below)
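As an illustrative sketch of enabling cgroup-based enforcement, the relevant NodeManager settings in yarn-site.xml (property names as documented for Hadoop 2.x; the hierarchy path and mount value are example values, not from the talk):
• yarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
• yarn.nodemanager.linux-container-executor.resources-handler.class=org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler
• yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn
• yarn.nodemanager.linux-container-executor.cgroups.mount=false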
List of YARN Apps
• MapReduce (of course)
• Apache Tez
– Apache Hive
– Apache Pig
• Apache Hama – Iterative, Bulk Synchronous Parallel (BSP) engine
• Apache Giraph – Iterative, BSP-based graph analysis engine
• HBase on YARN (HOYA)
• Apache Storm – Real-time stream processing
• Apache Spark – Advanced DAG execution engine that supports cyclic data flow and in-memory computing
• Apache S4 – Real-time processing
• Open MPI – Open-source Message Passing Interface for HPC
http://wiki.apache.org/hadoop/PoweredByYarn
The YARN Book • “Coming Soon”
• Expected by 2nd Quarter 2014
• Complete coverage of YARN
Modern Data Architecture
• Effective use of data – especially BIG data – is enhanced when data is co-located, enabling discovery and mining of unanticipated patterns.
• A "Data Lake" is the growing body of all data
– Encompassing more than a single warehouse
– Data can continuously stream into and out of the lake
Multi-Tenancy Requirements
Multi-tenancy in one shared cluster:
• Multiple Business Units
• Multiple Applications
Requirements:
• Shared Processing Capacity
• Shared Storage Capacity
• Data Access Security
Multi-Tenancy: Capabilities
• Group and User:
– Linux and HDFS permissions separate files and directories to create tenant boundaries – can be integrated with LDAP (or AD)
• Security:
– Used to enforce tenant boundaries – can be integrated with Kerberos
• Capacity:
– Storage quotas set up to manage consumption
– Capacity Scheduler queues to balance shared processing resources between tenants
– ACLs to define tenants
The Capacity Scheduler
• Capacity Sharing: queues with priorities; ACLs for job-submit permissions
• Capacity Enforcement: max capacity per queue; user limits within a queue
• Administration: monitoring + management admin ACLs; capacity-scheduler.xml
Roadmap: Capacity Scheduling
• CS Preemption – Enhance SLA support; reclaim capacity from tasks in queues that have been over-scheduled
• Queue Hierarchy – Granular configuration of queues; provide constraints across a set of queues
• Node Labels – Schedule tasks on specific cluster nodes; account for optimized hardware
• Container Isolation – Stronger isolation of resources for each container, incorporating CPU
• CPU Scheduling – Schedule and share CPU core capacity across tasks
Capacity Scheduler by example
• Total cluster capacity: 20 slots (11 mappers, 9 reducers)
• Queue: Production – Guaranteed 70% of resources (14 slots: 8M / 6R); max 100%
• Queue: Dev – Guaranteed 10% of resources (2 slots: 1M / 1R); max 50%
• Queue: Default – Guaranteed 20% of resources (4 slots: 2M / 2R); max 80%
Hierarchical queues
[Diagram: queue tree. root → Default 20% | Production 70% | Dev 10%; Production → Prod 70% | Reserved 20% | DevOps 10%; Prod → P0 70% | P1 30%; Dev → Test 80% | Eng 20%. Sibling percentages sum to 100% at each level.]
CS: Example Queue Configuration
• Default: 10 users | Ad-hoc BI query jobs etc. | General user SLAs
• Dev: 4 users | Ad-hoc data science only (Pig + Mahout) | Lower SLAs
• Applications: 2 users | Batch ETL and report generation jobs | Production SLAs

yarn.scheduler.capacity.root.default – Min: 0.10 | Max: 0.20 | User limit: 0.8 | ACLs: 'Users' group
yarn.scheduler.capacity.root.dev – Min: 0.10 | Max: 0.10 | User limit: 0.5 | ACLs: 'Engineering' group
yarn.scheduler.capacity.root.production – Min: 0.20 | Max: 0.70 | User limit: 1.0 | ACLs: 'Applications' group
CS: Configuration
• yarn.scheduler.capacity.root.default.acl_administer_jobs=*
• yarn.scheduler.capacity.root.default.acl_submit_jobs=*
• yarn.scheduler.capacity.root.default.capacity=100
• yarn.scheduler.capacity.root.default.maximum-capacity=100
• yarn.scheduler.capacity.root.default.user-limit-factor=1
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
CS: Configuration
• yarn.scheduler.capacity.root.default.acl_administer_jobs=Admin
• yarn.scheduler.capacity.root.default.acl_submit_jobs=Users
• yarn.scheduler.capacity.root.default.capacity=10
• yarn.scheduler.capacity.root.default.maximum-capacity=20
• yarn.scheduler.capacity.root.default.user-limit-factor=0.8
• yarn.scheduler.capacity.root.dev.acl_administer_jobs=Engineering
• yarn.scheduler.capacity.root.dev.acl_submit_jobs=Engineering
• yarn.scheduler.capacity.root.dev.capacity=10
• yarn.scheduler.capacity.root.dev.maximum-capacity=10
• yarn.scheduler.capacity.root.dev.user-limit-factor=0.5
• yarn.scheduler.capacity.root.production.acl_administer_jobs=Applications
• yarn.scheduler.capacity.root.production.acl_submit_jobs=Admin
• yarn.scheduler.capacity.root.production.capacity=20
• yarn.scheduler.capacity.root.production.maximum-capacity=70
• yarn.scheduler.capacity.root.production.user-limit-factor=1.0
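A job opts into one of these queues at submission time. A minimal Java sketch for a MapReduce job (the queue name dev matches the configuration above; job setup details are omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSubmitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route the job to the Capacity Scheduler's "dev" queue;
        // the acl_submit_jobs setting above governs who may do this
        conf.set("mapreduce.job.queuename", "dev");

        Job job = Job.getInstance(conf, "dev-queue example");
        // ... configure mapper/reducer/input/output as usual, then:
        // job.waitForCompletion(true);
    }
}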
Capacity Scheduler by example
• Job 1: Launched in the Production queue
– Requires 100 slots
– Gets 14 slots at a time
[Chart: cluster resources – Production, Development, Default, Idle]
Capacity Scheduler by example
• Job 1: Running in the Production queue
– Using 14 slots
• Job 2: Scheduled in the Development queue
– Requires 50 slots
– Gets 4 slots at a time
[Chart: cluster resources – Production, Development, Default, Idle]
Capacity Scheduler by example
• Job 1: Running in the Production queue
– 98% complete; only 2 slots in use until it finishes
• Job 2: Scheduled in the Development queue
– Requires 50 slots
– Still only getting 4 slots at a time
[Chart: cluster resources – Production, Development, Default, Idle]
Summary
• YARN is the logical extension of Apache Hadoop
– Complements HDFS, the data reservoir
• Resource management for the Enterprise Data Lake
– Shared, secure, multi-tenant Hadoop
– Allows for all processing in Hadoop
Your Fastest On-ramp to Enterprise Hadoop™!
http://hortonworks.com/products/hortonworks-sandbox/
The Sandbox lets you experience Apache Hadoop from the convenience of your own laptop – no data center, no cloud, and no internet connection needed! The Hortonworks Sandbox is:
• A free download: http://hortonworks.com/products/hortonworks-sandbox/
• A complete, self-contained virtual machine with Apache Hadoop pre-configured
• A personal, portable, and standalone Hadoop environment
• A set of hands-on, step-by-step tutorials that allow you to learn and explore Hadoop
Questions?
@ddkaiser linkedin.com/in/dkaiser facebook.com/dkaiser [email protected] [email protected]
David Kaiser