yarn resource management using machine learning

YARN Resource Management Using Machine Learning


TrendMicro 劉一正 Tony Liu

About Me

•  劉一正 Tony Liu

•  TrendMicro Staff Engineer

•  Big Data platform Administrator

•  TSMC Big Data Consultant Project

•  Keep improving Big Data platform

•  [email protected]; [email protected]

Agenda

•  Questions About YARN

•  The ways to find the answers

•  YARN resource consumption prediction

•  Conclusion

Questions about YARN

YARN Fair Scheduler

What are the proper settings for a container?

What are the characteristics of the jobs running in the cluster?

How to properly allocate resources to queues?

Why does the cluster have free resources but still have pending jobs?

The ways to find the answers

•  Appropriate configurations for Container

•  CPU bound / IO bound

•  Queue resource consumption in the cluster

•  Predict and allocate resources

Container Setting

Job Characteristics

Proper Resource Allocation to Queues

Resource Prediction

My Thinking

Container Setting

•  Correct container setting
•  What are the primary constraints?
•  Number of containers in the cluster
•  Memory calculation

Job CPU / IO bound

Queue Status

•  Queue status in the cluster
•  Allocate resources by job SLA
•  Pending jobs and unused resources in a queue
•  Bottleneck resource

Prediction

•  Classify job type: CPU bound or IO bound
•  Predict resource consumption
•  Allocate unused resources to queues according to job type


Appropriate configurations for Container

Container

•  Total available resource
- Available vmems: total memory - reserved memory
- Available vcores: total cpu - reserved cpu

•  Number of YARN containers (concurrent processing): min(vcores, 2 * Disks)

•  RAM per container: max(2G, total available mem / number of containers)

* reserved: for system and HBase

[Diagram: YARN components: Container, Node Manager, Scheduler, MapReduce, AM]
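The sizing rules on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, the MB units, and the example node (64 GB RAM, 16 cores, 6 disks, 8 GB and 2 cores reserved) are assumptions, not values from the talk.

```python
def size_containers(total_mem_mb, reserved_mem_mb, total_cores, reserved_cores,
                    disks, min_container_mb=2048):
    """Apply the slide's sizing rules to one worker node (all memory in MB)."""
    available_mem_mb = total_mem_mb - reserved_mem_mb   # available vmems
    available_vcores = total_cores - reserved_cores     # available vcores
    # Number of YARN containers (concurrent processing): min(vcores, 2 * disks)
    containers = min(available_vcores, 2 * disks)
    # RAM per container: max(2G, total available mem / number of containers)
    ram_per_container_mb = max(min_container_mb, available_mem_mb // containers)
    return {
        "available_mem_mb": available_mem_mb,
        "available_vcores": available_vcores,
        "containers": containers,
        "ram_per_container_mb": ram_per_container_mb,
    }


# Hypothetical node: 64 GB RAM, 8 GB reserved, 16 cores, 2 reserved, 6 disks.
example = size_containers(65536, 8192, 16, 2, 6)
print(example)
```

On this hypothetical node the disks are the limiting factor (12 containers), and each container gets a little under 4.7 GB.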


YARN NodeManager Resource

•  yarn.nodemanager.resource.memory-mb = containers * RAM per container = total available vmems

•  yarn.nodemanager.resource.cpu-vcores = total cores - reserved cores = total available vcores


YARN Scheduler

•  yarn.scheduler.minimum-allocation-mb = RAM per container

•  yarn.scheduler.maximum-allocation-mb = containers * RAM per container

•  yarn.scheduler.minimum-allocation-vcores = 1

•  yarn.scheduler.maximum-allocation-vcores = total available cores


Map

•  mapreduce.map.memory.mb = RAM per container

•  mapreduce.map.java.opts = 0.8 * RAM per container

•  mapreduce.map.cpu.vcores = 1

•  mapreduce.map.disk = 0.5


Reduce

•  mapreduce.reduce.memory.mb = 2 * RAM per container

•  mapreduce.reduce.java.opts = 0.8 * (2 * RAM per container)

•  mapreduce.reduce.cpu.vcores = 1

•  mapreduce.reduce.disk = 1.33


AM

•  yarn.app.mapreduce.am.resource.mb = 2 * RAM per container

•  yarn.app.mapreduce.am.command-opts = 0.8 * (2 * RAM per container)

•  yarn.app.mapreduce.am.resource.cpu-vcores = 1
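The NodeManager, Scheduler, Map, Reduce and AM formulas can be derived together from three inputs. The sketch below is a minimal illustration: the function name and the example inputs are hypothetical, rendering the heap settings as -Xmx strings is an assumption about how the 0.8 factor is applied, and the two *.disk values are left out.

```python
def derive_settings(containers, ram_per_container_mb, available_vcores):
    """Derive the slide's YARN / MapReduce settings for one node."""
    def heap(container_mb):
        # Assumption: the 0.8 factor becomes a JVM -Xmx value in MB.
        return "-Xmx{}m".format(container_mb * 8 // 10)

    ram = ram_per_container_mb
    return {
        # NodeManager
        "yarn.nodemanager.resource.memory-mb": containers * ram,
        "yarn.nodemanager.resource.cpu-vcores": available_vcores,
        # Scheduler
        "yarn.scheduler.minimum-allocation-mb": ram,
        "yarn.scheduler.maximum-allocation-mb": containers * ram,
        "yarn.scheduler.minimum-allocation-vcores": 1,
        "yarn.scheduler.maximum-allocation-vcores": available_vcores,
        # Map
        "mapreduce.map.memory.mb": ram,
        "mapreduce.map.java.opts": heap(ram),
        "mapreduce.map.cpu.vcores": 1,
        # Reduce
        "mapreduce.reduce.memory.mb": 2 * ram,
        "mapreduce.reduce.java.opts": heap(2 * ram),
        "mapreduce.reduce.cpu.vcores": 1,
        # Application Master
        "yarn.app.mapreduce.am.resource.mb": 2 * ram,
        "yarn.app.mapreduce.am.command-opts": heap(2 * ram),
        "yarn.app.mapreduce.am.resource.cpu-vcores": 1,
    }


# Hypothetical node: 12 containers of 4096 MB each, 14 available vcores.
for key, value in derive_settings(12, 4096, 14).items():
    print(key, "=", value)
```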

Container Size – Memory Calculation

r = Requested memory

The logic works as follows:

a. Take the max of the requested resource and the minimum resource: max(768, 512) = 768

b. roundUp(768, stepFactor) = roundUp(768, 512) = 1024
   roundUp does: ((768 + (512 - 1)) / 512) * 512 with integer division, i.e. (1279 / 512) * 512 = 2 * 512 = 1024

c. min(roundUp(768, stepFactor), maximum resource) = min(1024, 1024) = 1024

So finally, the allotted memory is 1024 MB, which is what you are getting.

Container Size – Memory Calculation

[Diagram: a Map Task running inside a Map Container]

•  A map task asks for 1500 MB of memory per map container: mapreduce.map.memory.mb = 1500

•  yarn.scheduler.minimum-allocation-mb = 1024

•  The RM will allocate a 2048 MB container: 2 * yarn.scheduler.minimum-allocation-mb
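The memory calculation from the last two slides can be checked with a short Python sketch. The function names are illustrative, and the 8192 MB maximum used for the second example is an assumption, since the slide does not state one.

```python
def round_up(value, step):
    """Round value up to the next multiple of step, using integer division."""
    return ((value + step - 1) // step) * step


def normalize_memory(requested_mb, minimum_mb, step_mb, maximum_mb):
    """min(roundUp(max(requested, minimum), step), maximum), as in steps a to c."""
    return min(round_up(max(requested_mb, minimum_mb), step_mb), maximum_mb)


# First slide: request 768 MB, minimum 512, step 512, maximum 1024.
print(normalize_memory(768, 512, 512, 1024))

# Second slide: request 1500 MB with a 1024 MB minimum and step.
# The 8192 MB maximum is a hypothetical value.
print(normalize_memory(1500, 1024, 1024, 8192))
```

The second call shows why a 1500 MB request ends up in a 2048 MB container: the request is rounded up to the next multiple of the minimum allocation.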

How Many Containers Launch

[Diagram: an input file split across Map Containers (one Map Task each), a Reducer Container (Reducer Task), and the Application Master container]

Map Container

•  Map split (HDFS block size)

•  Data locality (data local, rack local, any other NM)

Application Master

•  The Application Master will re-attempt failed tasks; a task that fails 4 times is marked as failed

•  Requests resources from the Resource Manager

•  If the AM stops sending heartbeats, the RM will re-attempt it; after 2 failures the whole application fails

Reducer Container

•  mapreduce.job.reduces parameter

•  Reducers can be given resources before all the map tasks complete: mapreduce.job.reduce.slowstart.completedmaps
- Wasting resources on processes that are waiting for work
- Potentially creating a deadlock when resources are constrained in a shared environment
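As a rough illustration of the "map split (HDFS block size)" point, the number of map containers for a single input file can be estimated by ceiling division. This is a simplified sketch: it assumes one split per HDFS block and a 128 MB block size, and it ignores split slop and multi-file inputs.

```python
def map_container_count(input_bytes, block_bytes=128 * 1024 * 1024):
    """Estimate map tasks for one file as ceil(input size / block size)."""
    return -(-input_bytes // block_bytes)  # ceiling division on integers


# Hypothetical 10 GiB input file with 128 MiB blocks.
print(map_container_count(10 * 1024 ** 3))
```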

Observe the configuration

•  Observe which configuration is best for you through TeraGen and TeraSort

•  hadoop jar $HADOOP_PATH/hadoop-examples.jar teragen -Dmapreduce.job.maps=$i -Dmapreduce.map.memory.mb=$k -Dmapreduce.map.java.opts.max.heap=$MAP_MB

•  hadoop jar $HADOOP_PATH/hadoop-examples.jar terasort -Dmapreduce.job.maps=$i -Dmapreduce.job.reduces=$j -Dmapreduce.map.memory.mb=$k -Dmapreduce.map.java.opts.max.heap=$MAP_MB -Dmapreduce.reduce.memory.mb=$k -Dmapreduce.reduce.java.opts.max.heap=$RED_MB
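The $i, $j and $k variables suggest a sweep over map count, reduce count and container memory. A minimal sketch of generating the TeraSort command lines for such a sweep is shown below. The input and output directories are hypothetical placeholders (the slide does not show them), and setting both heap values to 0.8 of the container size is an assumption carried over from the earlier slides.

```python
import itertools

JAR = "$HADOOP_PATH/hadoop-examples.jar"


def terasort_command(maps, reduces, container_mb, input_dir, output_dir):
    """Build one terasort command line using the -D options from the slide."""
    heap_mb = container_mb * 8 // 10  # assumption: heap = 0.8 * container size
    return [
        "hadoop", "jar", JAR, "terasort",
        "-Dmapreduce.job.maps={}".format(maps),
        "-Dmapreduce.job.reduces={}".format(reduces),
        "-Dmapreduce.map.memory.mb={}".format(container_mb),
        "-Dmapreduce.map.java.opts.max.heap={}".format(heap_mb),
        "-Dmapreduce.reduce.memory.mb={}".format(container_mb),
        "-Dmapreduce.reduce.java.opts.max.heap={}".format(heap_mb),
        input_dir, output_dir,  # hypothetical paths, not from the slide
    ]


def sweep(map_counts, reduce_counts, memory_sizes_mb, input_dir, output_prefix):
    """Yield one command per (maps, reduces, memory) combination."""
    for i, j, k in itertools.product(map_counts, reduce_counts, memory_sizes_mb):
        output_dir = "{}-{}-{}-{}".format(output_prefix, i, j, k)
        yield terasort_command(i, j, k, input_dir, output_dir)


for command in sweep([4, 8], [2], [1024, 2048], "/tmp/teragen", "/tmp/terasort"):
    print(" ".join(command))
```

The commands are only printed here; each one would be timed on the cluster to compare configurations.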

Container Resource Requirement Testing


Job Characteristics

•  A container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).

•  Different jobs put different workloads on the cluster, including CPU-bound and I/O-bound ones.

•  So, what are the characteristics of the jobs running in the cluster?

Job Characteristics

•  Reference: Tian et al., 2009, investigate the characteristics of MapReduce jobs in a practical data center

•  Define a classification model to classify whether a MapReduce job is CPU-bound or I/O-bound

Job Characteristics

•  The Map-Shuffle phase performs five actions:
1) init input data
2) compute map task
3) store output result to local disk
4) shuffle map tasks result data
5) shuffle reduce input data in

Job Characteristics

•  Classification of workloads in the Map-Shuffle phase of MapReduce, according to the utilization of I/O and CPU

•  MID: map input data
•  MOD: map output data
•  SOD: Shuffle out data (= MOD)
•  SID: Shuffle in data
•  MTCT: Map task completed time
•  DIOR: Disk I/O Rate (DFSIO I/O Rate)
•  n: Number of YARN containers (concurrent processing)

Job Characteristics

•  CPU-Bound

•  I/O-Bound

•  DIOR: DFSIO
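The formulas on this slide were images and did not survive extraction. As a hedged sketch only, assuming the rule has the form I recall from Tian et al., 2009 (compare the aggregate I/O rate of n concurrent map tasks with the measured disk I/O rate), the classification could look like the following. The comparison itself, the function name and the example numbers are assumptions, not values taken from the slides.

```python
def classify_map_shuffle(mid, mod, sid, mtct, dior, n):
    """Classify a job as CPU-bound or I/O-bound from Map-Shuffle counters.

    Assumed rule: with n map tasks running concurrently, the job is
    I/O-bound when n * (MID + MOD + SOD + SID) / MTCT >= DIOR,
    and CPU-bound otherwise. Units must be consistent (e.g. bytes, seconds).
    """
    sod = mod  # the slide defines SOD (shuffle out data) as equal to MOD
    per_task_io_rate = (mid + mod + sod + sid) / mtct
    return "I/O-bound" if n * per_task_io_rate >= dior else "CPU-bound"


# Hypothetical numbers: each task moves 400 units of data in 10 time units.
print(classify_map_shuffle(100, 100, 100, 10, dior=50, n=2))
print(classify_map_shuffle(100, 100, 100, 10, dior=100, n=2))
```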

Job Characteristics

Program                    MID          MOD           MTCT
myspn_top_cve              1395184      620928        15185
myspn_top_url              54481169     52528135      9867
aggregate_url              286007534    1155960828    420225
USandbox Data Statistic    37612436     4921787       45423
file-solr-daily            75167686     4660452644    224488
aggregate_url_dedupe       639896245    561632270     73926
myspn_top_url_by_origin    499348380    506962079     53927

•  Data source: Job history log

Job Characteristics

•  Data source: Job history log
•  Test data set: 5,942
•  Test mode: split 66% train, remainder test
•  Classifier model: RandomForest
•  Attributes: MID, MOD, MTCT, n, dior, label

=== Summary ===
Correlation coefficient        0.9934
Mean absolute error            0.0099
Root mean squared error        0.0513
Relative absolute error        2.4872 %
Root relative squared error    11.4997 %
Total Number of Instances      2020
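The "Total Number of Instances" in each summary is the size of the held-out test part, not the full data set. A small sketch, assuming the train part is rounded to the nearest instance, reproduces the numbers reported on the slides (5,942 instances here; 109,736 and 122,120 in the later regression slides).

```python
def percentage_split(total_instances, train_percent=66):
    """Return (train, test) sizes for a percentage split.

    Assumption: the train size is rounded to the nearest whole instance
    and the remainder is used for testing.
    """
    train = round(total_instances * train_percent / 100)
    return train, total_instances - train


for total in (5942, 109736, 122120):
    train, test = percentage_split(total)
    print(total, "->", train, "train,", test, "test")
```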

Job Characteristics

[Chart: number of IO-bound and CPU-bound jobs per queue; x-axis: Queue Name, y-axis: Number of jobs (0 to 1200)]

Queue Type

I/O Bound

domain_census myreppathcensus

CPU Bound

alps census census-oozie data_importer domain_census-oozie domain_census_ ews hdfs magicQ myspn

platinum platinum-oozie retroscanretrosplunkrnu spnungle threatconnect threathub threathub-oozie user

Thinking

•  Besides allocating resources based on the job's SLA, what other factors should I consider?
- Job characteristics?
- Queue type?

Queue Resource Consumption

Cluster Resource Allocation

•  YARN fair scheduler
- yarn.scheduler.fair.allocation.file: fair-scheduler.xml

•  The allocation file is reloaded every 10 seconds, allowing changes to be made on the fly.

Cluster Resource Allocation

•  Fair Scheduler
- default queue: root
- Hierarchical queues
- placement policy
- preemption
- resource reserved

•  Cluster resource
- FairShare <memory: x, vcores: y>

Cluster Resource Allocation

•  Queue Properties
- minResources (soft limit)
- maxResources (hard limit)
- weight: <weight>1.0</weight>
- maxRunningApps
- schedulingPolicy: fifo, fair, drf

[Diagram: YARN queues: Research, Production Service, Marketing Report, adhoc]
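Because the allocation file is plain XML, it can be generated from a queue table. The sketch below builds a fair-scheduler.xml body with the queue properties listed above. The queue names are lowercase stand-ins for the queues in the diagram, and every number is a hypothetical value for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical queue definitions; none of these numbers come from the talk.
QUEUES = [
    {"name": "research", "min_mb": 8192, "min_vcores": 4, "max_mb": 65536,
     "max_vcores": 32, "weight": 1.0, "max_running_apps": 20, "policy": "fair"},
    {"name": "production", "min_mb": 32768, "min_vcores": 16, "max_mb": 131072,
     "max_vcores": 64, "weight": 3.0, "max_running_apps": 50, "policy": "drf"},
    {"name": "marketing", "min_mb": 8192, "min_vcores": 4, "max_mb": 32768,
     "max_vcores": 16, "weight": 1.0, "max_running_apps": 10, "policy": "fair"},
    {"name": "adhoc", "min_mb": 4096, "min_vcores": 2, "max_mb": 16384,
     "max_vcores": 8, "weight": 0.5, "max_running_apps": 5, "policy": "fifo"},
]


def build_allocation_file(queues):
    """Return the XML text of a Fair Scheduler allocation file."""
    root = ET.Element("allocations")
    for q in queues:
        queue = ET.SubElement(root, "queue", {"name": q["name"]})
        ET.SubElement(queue, "minResources").text = (
            "{} mb,{} vcores".format(q["min_mb"], q["min_vcores"]))
        ET.SubElement(queue, "maxResources").text = (
            "{} mb,{} vcores".format(q["max_mb"], q["max_vcores"]))
        ET.SubElement(queue, "weight").text = str(q["weight"])
        ET.SubElement(queue, "maxRunningApps").text = str(q["max_running_apps"])
        ET.SubElement(queue, "schedulingPolicy").text = q["policy"]
    return ET.tostring(root, encoding="unicode")


print(build_allocation_file(QUEUES))
```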

Analyze Cluster Status

•  Retrieve YARN metrics from YARN REST APIs

•  FileSystemCounter

•  JobCounters

•  Task Counters
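A minimal sketch of reading cluster metrics from the ResourceManager REST API and turning them into utilization ratios is shown below. It assumes the /ws/v1/cluster/metrics endpoint and its clusterMetrics fields; the ResourceManager address is whatever your cluster uses, and the sample payload is made up so the summary can be tried without a cluster.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch clusterMetrics from a ResourceManager, e.g. http://rm-host:8088."""
    with urlopen(rm_url + "/ws/v1/cluster/metrics") as response:
        return json.load(response)["clusterMetrics"]


def summarize_cluster(metrics):
    """Reduce clusterMetrics to pending apps and utilization ratios."""
    return {
        "appsPending": metrics["appsPending"],
        "availableVCores": metrics["availableVirtualCores"],
        "vcore_utilization":
            metrics["allocatedVirtualCores"] / metrics["totalVirtualCores"],
        "memory_utilization": metrics["allocatedMB"] / metrics["totalMB"],
    }


# Made-up sample payload in the shape of the clusterMetrics object.
SAMPLE = {
    "appsPending": 7,
    "availableVirtualCores": 2,
    "allocatedVirtualCores": 398,
    "totalVirtualCores": 400,
    "allocatedMB": 415000,
    "totalMB": 1000000,
}

print(summarize_cluster(SAMPLE))
```

Polling this summary at a fixed interval gives the kind of time series plotted in the utilization charts, and a high vcore ratio next to a low memory ratio points at vcores as the limiting resource.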

Pending apps and Available Vcore

[Chart: appsPending and availableVCores over time, 03:20 to 10:50 in 15-minute steps; y-axis: Vcore, 0% to 100%]

Vcores Utilization

[Chart: total_vCores and used_vCores over time, 03:20 to 10:50 in 15-minute steps; y-axis: Vcore, 0% to 100%]

Vmemory Utilization

[Chart: used_memory and total_memory over time, 03:20 to 10:50 in 15-minute steps; y-axis: Vmemory, 0% to 100%]

Cluster Resource Utilization

[Chart: cluster resource utilization by queue]

Bottleneck Resource

•  Vcores become the bottleneck resource

Memory Usage: 41.5%
VCores Usage: 99.5%

Over Fair Share

•  Cluster still has resources

Thinking

•  Why can't the cluster's resources be fully utilized?

•  Is there any resource limitation? (bottleneck)

•  How to reduce pending jobs when the cluster still has resources?

Thinking

•  Is it possible to predict when the cluster will have pending jobs?

•  Can I predict the resource consumption at a specific time and dynamically allocate resources to fully utilize the cluster?

Predict Resource Consumption And Allocate Resource

YARN resource consumption prediction

[Diagram: Collect Metrics → Data Processing (pre-processing) → Training Model → Evaluate (RMSE) → Model Prediction → Prediction: Queue Consumption]

Training Data

Fields                    Description                           Process
date                      date                                  Ignore
time                      hour: 0 ~ 23                          feature
working day               0: working day, 1: non-working day    feature
weekday                   week day                              feature
cluster_appsPending       Pending apps in the cluster           feature
cluster_appsRunning       Running apps in the cluster           feature
cluster_availableMB       Available vmem in the cluster         feature
cluster_allocatedMB       Allocated vmem in the cluster         feature
cluster_availableVcore    Available vcore in the cluster        feature
cluster_allocatedVcore    Allocated vcore in the cluster        feature

•  Data source: Job history log

Training Data

Fields                  Description               Process
queue_name              Queue name                feature
minResources_memory     Min vmem for queue        feature
minResources_vcores     Min vcore for queue       feature
maxResources_memory     Max vmem for queue        feature
maxResources_vcores     Max vcore for queue       feature
numPendingApps          Pending apps in queue     feature
numActiveApps           Running apps in queue     feature
usedResources.memory    Used vmem in queue        feature
usedResources.vcore     Used vcore in queue       feature
label                   label (predict target)    label
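One training row combines the time fields, the cluster-level metrics and the queue-level fields from the two tables. The sketch below shows one way to assemble such a row. The weekday encoding (Monday = 0), the weekend-only definition of a non-working day, and all sample values are assumptions.

```python
from datetime import datetime


def time_features(timestamp, holidays=frozenset()):
    """Derive the time, working day and weekday fields from a timestamp."""
    # Assumption: weekends and listed holidays count as non-working days.
    non_working = timestamp.weekday() >= 5 or timestamp.date() in holidays
    return {
        "time": timestamp.hour,             # hour: 0 ~ 23
        "working day": int(non_working),    # 0: working day, 1: non-working day
        "weekday": timestamp.weekday(),     # assumption: Monday = 0
    }


def build_row(timestamp, cluster_metrics, queue_fields, label, holidays=frozenset()):
    """Combine time, cluster and queue fields; the date itself is ignored."""
    row = time_features(timestamp, holidays)
    row.update({"cluster_" + key: value for key, value in cluster_metrics.items()})
    row.update(queue_fields)
    row["label"] = label
    return row


# Hypothetical sample taken on a Saturday afternoon.
row = build_row(
    datetime(2016, 7, 2, 14, 30),
    {"appsPending": 3, "appsRunning": 12},
    {"queue_name": "adhoc", "numPendingApps": 1},
    label=12,
)
print(row)
```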

Training Model

•  Training Model: Linear Regression
•  Predict: vcore

Training Model

•  Training model: RandomForest
•  Predict: vcore
•  Data source: Job history log
•  Test data set: 109,736
•  Test mode: split 66% train, remainder test
•  Attributes: 19

=== Summary ===
Correlation coefficient        0.999
Mean absolute error            0.1262
Root mean squared error        0.8494
Relative absolute error        1.5905 %
Root relative squared error    4.5017 %
Total Number of Instances      37,310

Training Model

•  Training Model: Linear Regression
•  Predict: vmemory

Training Model

•  Training model: RandomForest
•  Predict: vmemory
•  Data source: Job history log
•  Test data set: 109,736
•  Test mode: split 66% train, remainder test
•  Attributes: 19

=== Summary ===
Correlation coefficient        0.9995
Mean absolute error            0.0003
Root mean squared error        0.0019
Relative absolute error        1.4174 %
Root relative squared error    3.2014 %
Total Number of Instances      37,310

Training Model

•  Training model: RandomForest
•  Predict: Pending job
•  Data source: Job history log
•  Test data set: 122,120
•  Test mode: split 66% train, remainder test
•  Attributes: 19

=== Summary ===
Correlation coefficient        0.9917
Mean absolute error            0.0002
Root mean squared error        0.0054
Relative absolute error        7.9308 %
Root relative squared error    14.4934 %
Total Number of Instances      41,521
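For readers unfamiliar with the "=== Summary ===" blocks, the five figures can be computed from predicted and actual values as sketched below. This is a plain-Python illustration on made-up numbers. One simplification to note: the relative errors here use the mean of the test actuals as the baseline predictor, which may differ slightly from what the modelling tool does.

```python
from math import sqrt


def regression_summary(actual, predicted):
    """Compute correlation, MAE, RMSE, RAE (%) and RRSE (%)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Baseline: always predict the mean of the actual values (assumption).
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    covariance = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    pred_sq = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": covariance / sqrt(base_sq * pred_sq),
        "mae": abs_err / n,
        "rmse": sqrt(sq_err / n),
        "rae_percent": 100 * abs_err / base_abs,
        "rrse_percent": 100 * sqrt(sq_err / base_sq),
    }


# Made-up example: four actual vcore values and their predictions.
print(regression_summary([1, 2, 3, 4], [1, 2, 3, 5]))
```

RMSE penalizes large misses more than MAE does, which is why the two can diverge noticeably in the vcore model above.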

Attribute Evaluation

•  Predict: Pending jobs
•  Attribute Evaluator: Information Gain
•  Ranked attributes:

Attribute              Score
maxResource_memory     1.14465
maxResource_vcore      1.04186
usedResource_memory    0.53004
usedResource_vcore     0.51167
minResource_memory     0.47563
numActiveApps          0.34418
minResource_vcore      0.3179
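Information gain measures how much knowing an attribute reduces the entropy of the label. The sketch below computes it for attributes that are already discrete; numeric attributes such as maxResource_memory would first need to be binned, and that step is not shown. The toy data is invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy of the labels minus the entropy left after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


# Toy data: "pending" is perfectly explained by queue size, not by weekday.
pending = ["yes", "yes", "no", "no"]
print(information_gain(["small", "small", "large", "large"], pending))
print(information_gain(["mon", "tue", "mon", "tue"], pending))
```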

Experiment Result

•  According to the prediction result, we reallocate the resources of the queues that may have pending jobs on specific weekdays.

•  Experiment result: pending jobs reduced by 82% (1 - 0.0009 / 0.005 = 0.82)

Pending jobs ratio
Before    0.005
After     0.0009

Experiment Result

•  Something you should know:
- The total of the queues' minResources should be less than the cluster fair share
- A queue may not get its minResources immediately
- Preemption takes resources from other queues to satisfy minResources, which also means wasted resources
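The first point can be checked automatically before a new allocation file is written. This is a minimal sketch under assumptions: the queue dictionaries, their field names and the cluster totals are all hypothetical.

```python
def min_resources_fit(queues, cluster_mb, cluster_vcores):
    """Check that the summed minResources stay below the cluster totals."""
    total_min_mb = sum(q["min_mb"] for q in queues)
    total_min_vcores = sum(q["min_vcores"] for q in queues)
    return total_min_mb < cluster_mb and total_min_vcores < cluster_vcores


# Hypothetical queues and cluster size.
queues = [
    {"name": "research", "min_mb": 8192, "min_vcores": 4},
    {"name": "production", "min_mb": 32768, "min_vcores": 16},
    {"name": "adhoc", "min_mb": 4096, "min_vcores": 2},
]
print(min_resources_fit(queues, cluster_mb=131072, cluster_vcores=64))
print(min_resources_fit(queues, cluster_mb=131072, cluster_vcores=20))
```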

Experiment Result

•  Something you should know:
- Modifying fair-scheduler.xml too frequently may cause the ResourceManager to behave erratically
- A ResourceManager failover will cause jobs submitted by Oozie to be retried
- Does a cluster with tight resources need resource prediction?

Conclusion

•  Understanding the architecture deeply is the key to tuning and management.

•  Ask whether other tools, even from a different domain, could help with your daily job.

•  Machine learning has been used for prediction in many domains; it can definitely give you a different perspective.

Q & A

Thank You
