YARN Resource Management Using Machine Learning
TrendMicro 劉一正 Tony Liu
About Me
• 劉一正 Tony Liu
• TrendMicro Staff Engineer
• Big Data platform Administrator
• TSMC Big Data Consultant Project
• Continuously improving the Big Data platform
Agenda
• Questions about YARN
• The ways to find the answers
• YARN resource consumption prediction
• Conclusion
Questions about YARN
[Figure: YARN Fair Scheduler]
• What are the proper settings for containers?
• What are the characteristics of the jobs that run in the cluster?
• How should resources be allocated to queues?
• Why are there pending jobs when the cluster still has resources?
The ways to find the answers
• Appropriate configurations for containers
• CPU bound / IO bound
• Queue resource consumption in the cluster
• Predict and allocate resources
[Figure: roadmap: Container Setting, Job Characteristics, Proper Resource Allocation to Queues, Resource Prediction]
My Thinking
Container Setting
• Correct container settings
• What are the primary constraints?
• Number of containers in the cluster
• Memory calculation
Job CPU / IO bound
Queue Status
• Queue status in the cluster
• Allocate resources by job SLA
• Pending jobs and unused resources in queues
• Bottleneck resource
Prediction
• Classify job type: CPU bound or IO bound
• Predict resource consumption
• Allocate unused resources to queues according to job type
Appropriate configurations for Container
• Total available resources
  - Available vmem: total memory - reserved memory
  - Available vcores: total CPU - reserved CPU
• Number of YARN containers (concurrent processing): min(vcores, 2 * disks)
• RAM per container: max(2 GB, total available memory / number of containers)
* Reserved: for the system and HBase
[Figure: YARN component stack: Container, NodeManager, Scheduler, MapReduce, AM]
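The sizing rules above can be sketched in Python. This is a minimal sketch, assuming per-node inputs in MB and cores and reading "2 GB" as 2048 MB; the helper name `container_sizing` and the example node are hypothetical, not taken from the slides.

```python
def container_sizing(total_mem_mb, reserved_mem_mb, total_cores, reserved_cores,
                     disks, min_container_mb=2048):
    """Return (containers, ram_per_container_mb) for one NodeManager host.

    Hypothetical helper following the slide's rules of thumb; "reserved"
    covers the OS and co-located services such as HBase.
    """
    available_mem_mb = total_mem_mb - reserved_mem_mb   # total memory - reserved memory
    available_vcores = total_cores - reserved_cores     # total CPU - reserved CPU
    # Concurrent processing is capped by cores and by 2 containers per disk.
    containers = min(available_vcores, 2 * disks)
    # Never go below the minimum container size (2 GB on the slide).
    ram_per_container_mb = max(min_container_mb, available_mem_mb // containers)
    return containers, ram_per_container_mb


# Hypothetical node: 64 GB RAM (16 GB reserved), 16 cores (4 reserved), 8 disks.
print(container_sizing(65536, 16384, 16, 4, 8))  # (12, 4096)
```

When the 2 GB floor applies, containers * RAM per container can exceed the available memory, so the container count would then have to be lowered.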
Appropriate configurations for Container: NodeManager resources
• yarn.nodemanager.resource.memory-mb = containers * RAM per container = total available vmem
• yarn.nodemanager.resource.cpu-vcores = total cores - reserved cores = total available vcores
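A sketch of turning the NodeManager formulas into `yarn-site.xml` entries. It assumes the container count and RAM per container were computed as on the previous slide; `nodemanager_resources` and `to_site_xml` are hypothetical helpers, and the output uses the standard Hadoop `<configuration><property><name>/<value>` layout.

```python
import xml.etree.ElementTree as ET

def nodemanager_resources(containers, ram_per_container_mb, total_cores, reserved_cores):
    """yarn-site.xml NodeManager settings, following the slide's formulas."""
    return {
        # containers * RAM per container = total available vmem
        "yarn.nodemanager.resource.memory-mb": containers * ram_per_container_mb,
        # total cores - reserved cores = total available vcores
        "yarn.nodemanager.resource.cpu-vcores": total_cores - reserved_cores,
    }

def to_site_xml(props):
    """Render a dict as Hadoop-style <configuration><property> XML."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")

print(to_site_xml(nodemanager_resources(12, 4096, 16, 4)))
```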
Appropriate configurations for Container: YARN Scheduler
• yarn.scheduler.minimum-allocation-mb = RAM per container
• yarn.scheduler.maximum-allocation-mb = containers * RAM per container
• yarn.scheduler.minimum-allocation-vcores = 1
• yarn.scheduler.maximum-allocation-vcores = total available cores
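The scheduler limits follow from the same two inputs. A minimal sketch, assuming "total available cores" means total cores minus reserved cores; `scheduler_limits` is a hypothetical helper.

```python
def scheduler_limits(containers, ram_per_container_mb, total_cores, reserved_cores):
    """yarn-site.xml scheduler limits following the slide's formulas (hypothetical helper)."""
    return {
        # Smallest container the RM hands out: one "RAM per container".
        "yarn.scheduler.minimum-allocation-mb": ram_per_container_mb,
        # Largest container: all of a NodeManager's memory.
        "yarn.scheduler.maximum-allocation-mb": containers * ram_per_container_mb,
        "yarn.scheduler.minimum-allocation-vcores": 1,
        # Largest container: all available (non-reserved) cores.
        "yarn.scheduler.maximum-allocation-vcores": total_cores - reserved_cores,
    }

for name, value in scheduler_limits(12, 4096, 16, 4).items():
    print(f"{name} = {value}")
```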
Appropriate configurations for Container: Map
• mapreduce.map.memory.mb = RAM per container
• mapreduce.map.java.opts = 0.8 * RAM per container (JVM heap, set with -Xmx)
• mapreduce.map.cpu.vcores = 1
• mapreduce.map.disk = 0.5
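A sketch of the map-task settings. Because `mapreduce.map.java.opts` takes JVM flags, the 0.8 ratio is expressed here as an `-Xmx` value in MB; `map_settings` is a hypothetical helper, and `mapreduce.map.disk` is copied from the slide as-is.

```python
def map_settings(ram_per_container_mb, heap_ratio=0.8):
    """Per-map-task settings from the slide (hypothetical helper)."""
    heap_mb = int(heap_ratio * ram_per_container_mb)  # leave headroom for non-heap JVM memory
    return {
        "mapreduce.map.memory.mb": ram_per_container_mb,
        "mapreduce.map.java.opts": f"-Xmx{heap_mb}m",
        "mapreduce.map.cpu.vcores": 1,
        "mapreduce.map.disk": 0.5,  # value copied from the slide
    }

print(map_settings(4096))
```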
Appropriate configurations for Container: Reduce
• mapreduce.reduce.memory.mb = 2 * RAM per container
• mapreduce.reduce.java.opts = 0.8 * (2 * RAM per container) (JVM heap, set with -Xmx)
• mapreduce.reduce.cpu.vcores = 1
• mapreduce.reduce.disk = 1.33
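The reduce settings double the container size and keep the same 0.8 heap ratio. A minimal sketch with a hypothetical `reduce_settings` helper; `mapreduce.reduce.disk` is copied from the slide.

```python
def reduce_settings(ram_per_container_mb, heap_ratio=0.8):
    """Per-reduce-task settings from the slide (hypothetical helper)."""
    container_mb = 2 * ram_per_container_mb        # reducers get 2 * RAM per container
    heap_mb = int(heap_ratio * container_mb)       # 0.8 * (2 * RAM per container)
    return {
        "mapreduce.reduce.memory.mb": container_mb,
        "mapreduce.reduce.java.opts": f"-Xmx{heap_mb}m",
        "mapreduce.reduce.cpu.vcores": 1,
        "mapreduce.reduce.disk": 1.33,  # value copied from the slide
    }

print(reduce_settings(4096))
```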
Appropriate configurations for Container: AM
• yarn.app.mapreduce.am.resource.mb = 2 * RAM per container
• yarn.app.mapreduce.am.command-opts = 0.8 * (2 * RAM per container) (JVM heap, set with -Xmx)
• yarn.app.mapreduce.am.resource.cpu-vcores = 1
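The MapReduce Application Master is sized like a reducer. A sketch with a hypothetical `am_settings` helper, again expressing the 0.8 heap ratio as `-Xmx`.

```python
def am_settings(ram_per_container_mb, heap_ratio=0.8):
    """MapReduce Application Master settings from the slide (hypothetical helper)."""
    container_mb = 2 * ram_per_container_mb
    return {
        "yarn.app.mapreduce.am.resource.mb": container_mb,
        "yarn.app.mapreduce.am.command-opts": f"-Xmx{int(heap_ratio * container_mb)}m",
        "yarn.app.mapreduce.am.resource.cpu-vcores": 1,
    }

print(am_settings(4096))
```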
Container Size – Memory Calculation
r = requested memory. Example: r = 768 MB, minimum allocation = 512 MB, step factor = 512 MB, maximum allocation = 1024 MB. The logic works as follows:
a. Take the max of the requested and minimum resource: max(768, 512) = 768
b. Round up to the step factor: roundUp(768, 512) = ((768 + (512 - 1)) / 512) * 512 = (1279 / 512) * 512 = 2 * 512 = 1024, using integer division
c. Cap at the maximum resource: min(roundUp(768, stepFactor), maximum resource) = min(1024, 1024) = 1024
So the allotted memory is 1024 MB, which is what you are getting.
Container Size – Memory Calculation
[Figure: a map task running inside a map container]
• A map asks for 1500 MB of memory per map container: mapreduce.map.memory.mb = 1500
• yarn.scheduler.minimum-allocation-mb = 1024
• The RM allocates a 2048 MB container, i.e. 2 * yarn.scheduler.minimum-allocation-mb
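Both worked examples can be reproduced with a few lines of integer arithmetic. This sketch assumes, as the slides do, that the step factor equals the minimum allocation; with the Fair Scheduler the rounding increment may be configured separately (`yarn.scheduler.increment-allocation-mb` in Hadoop 2). The maximum allocation of 8192 MB in the second example is a hypothetical value, since the slide does not give one.

```python
def normalize_memory(requested_mb, minimum_mb, step_mb, maximum_mb):
    """Sketch of container-size normalization as described on the slides."""
    r = max(requested_mb, minimum_mb)               # a. at least the minimum allocation
    r = ((r + (step_mb - 1)) // step_mb) * step_mb  # b. round up to a multiple of the step
    return min(r, maximum_mb)                       # c. never above the maximum allocation


# Example 1: 768 MB requested, min = step = 512, max = 1024.
print(normalize_memory(768, 512, 512, 1024))     # 1024
# Example 2: 1500 MB requested, min = step = 1024, max assumed to be 8192.
print(normalize_memory(1500, 1024, 1024, 8192))  # 2048
```

The floor division in step b is what turns 1279 into 1024 rather than leaving an odd-sized container.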
How Many Containers Launch
[Figure: an input file split across map tasks, each in its own map container, plus reducer containers and the Application Master container]
Map tasks
• One map task per input split (HDFS block size)
• Data locality: data-local, rack-local, or any other NodeManager
• The Application Master re-attempts failed tasks; a task that fails 4 times is marked as failed
Application Master
• Requests resources from the ResourceManager
• If the AM stops sending heartbeats, the RM re-attempts it; if it fails 2 times, the whole application fails
Reduce tasks
• Number of reducers: the mapreduce.job.reduces parameter
• Reducers can be given resources before all the map tasks complete: mapreduce.job.reduce.slowstart.completedmaps
• Drawbacks of starting reducers early:
  - Wasting resources on processes that are waiting for work
  - Potentially creating a deadlock when resources are constrained in a shared environment
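A rough count of the containers one job asks for, following the slide: one map container per split, the configured number of reducers, and one AM container. This is a sketch under stated assumptions: it ignores FileInputFormat's 10% "split slop" and task re-attempts, and `estimate_containers` is a hypothetical helper. The slowstart threshold uses Hadoop's default of 0.05.

```python
import math

def estimate_containers(input_bytes, split_bytes, num_reducers, slowstart=0.05):
    """Rough container count for one MapReduce job (hypothetical helper)."""
    num_maps = max(1, math.ceil(input_bytes / split_bytes))  # one map per split
    # Reducers become eligible once roughly this many maps have completed
    # (mapreduce.job.reduce.slowstart.completedmaps).
    maps_before_reducers = math.ceil(slowstart * num_maps)
    total = num_maps + num_reducers + 1  # +1 for the Application Master container
    return {"maps": num_maps, "reducers": num_reducers,
            "maps_before_reducers": maps_before_reducers,
            "total_containers": total}

# Hypothetical job: 1 GiB input, 128 MiB splits, 2 reducers.
print(estimate_containers(1024**3, 128 * 1024**2, 2))
```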
Observe the configuration
• Observe which configuration works best for you with TeraGen and TeraSort:
  hadoop jar $HADOOP_PATH/hadoop-examples.jar teragen -Dmapreduce.job.maps=$i -Dmapreduce.map.memory.mb=$k -Dmapreduce.map.java.opts.max.heap=$MAP_MB <num-rows> <teragen-output-dir>
  hadoop jar $HADOOP_PATH/hadoop-examples.jar terasort -Dmapreduce.job.maps=$i -Dmapreduce.job.reduces=$j -Dmapreduce.map.memory.mb=$k -Dmapreduce.map.java.opts.max.heap=$MAP_MB -Dmapreduce.reduce.memory.mb=$k -Dmapreduce.reduce.java.opts.max.heap=$RED_MB <teragen-output-dir> <terasort-output-dir>
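To sweep $i, $j and $k, the TeraSort command lines can be generated rather than typed by hand. A minimal sketch that only builds (and prints) the commands; the jar name, HDFS paths, value grids and the 0.8 heap ratio used for both heap settings are assumptions, and the `-D` keys are copied from the slide.

```python
import itertools
import shlex

def terasort_commands(jar, in_dir, out_dir, maps, reduces, mem_sizes_mb, heap_ratio=0.8):
    """Build one TeraSort argv per (maps, reduces, container MB) combination."""
    commands = []
    for i, j, k in itertools.product(maps, reduces, mem_sizes_mb):
        heap_mb = int(heap_ratio * k)  # assumption: heap = 0.8 * container size
        commands.append([
            "hadoop", "jar", jar, "terasort",
            f"-Dmapreduce.job.maps={i}",
            f"-Dmapreduce.job.reduces={j}",
            f"-Dmapreduce.map.memory.mb={k}",
            f"-Dmapreduce.map.java.opts.max.heap={heap_mb}",
            f"-Dmapreduce.reduce.memory.mb={k}",
            f"-Dmapreduce.reduce.java.opts.max.heap={heap_mb}",
            in_dir,
            f"{out_dir}/m{i}_r{j}_mem{k}",  # separate output directory per run
        ])
    return commands

if __name__ == "__main__":
    for argv in terasort_commands("hadoop-examples.jar", "/tmp/teragen", "/tmp/terasort",
                                  maps=[16, 32], reduces=[8], mem_sizes_mb=[2048, 4096]):
        print(shlex.join(argv))
```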
Container Resource Requirement Testing
Job Characteristics
• A container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).
• Different jobs place different workloads on the cluster, including CPU-bound and I/O-bound workloads.
• So, what are the characteristics of the jobs running in the cluster?
Job Characteristics
• Tian et al. (2009) investigated the characteristics of MapReduce jobs in a real data center
• They defined a classification model that determines whether a MapReduce job is CPU-bound or I/O-bound
Job Characteristics
• In the Map-Shuffle phase, a job performs five actions: 1) initialize the input data, 2) compute the map tasks, 3) store the output results to local disk, 4) shuffle the map tasks' result data out, 5) shuffle the reduce input data in
Job Characteristics
• Classification of workloads in the Map-Shuffle phase of MapReduce, according to I/O and CPU utilization:
  - MID: map input data
  - MOD: map output data
  - SOD: shuffle out data (= MOD)
  - SID: shuffle in data
  - MTCT: map task completed time
  - DIOR: disk I/O rate (DFSIO I/O rate)
  - n: number of YARN containers (concurrent processing)
Job Characteristics
• CPU-bound
• I/O-bound
• DIOR: measured with DFSIO
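A minimal sketch of the classification, assuming the criterion takes the form used by Tian et al. (2009): the Map-Shuffle data rate n × (MID + MOD + SOD + SID) / MTCT is compared with DIOR, and a job at or above DIOR is treated as I/O-bound, below it as CPU-bound. Check this form against the paper before relying on it. Units must be consistent (for example bytes and seconds if DIOR is in bytes per second); the helper names are hypothetical.

```python
def map_shuffle_rate(mid, mod, sid, mtct, n):
    """Aggregate Map-Shuffle data rate for n concurrent containers."""
    sod = mod  # shuffle-out data equals map output data, per the slide
    return n * (mid + mod + sod + sid) / mtct

def classify_job(mid, mod, sid, mtct, n, dior):
    """Assumed criterion: I/O-bound when the data rate reaches the DFSIO disk rate."""
    return "I/O-bound" if map_shuffle_rate(mid, mod, sid, mtct, n) >= dior else "CPU-bound"

# Hypothetical job: 100 MB in, 50 MB out, 50 MB shuffled in, 10 s, 2 containers.
print(classify_job(100, 50, 50, 10, 2, dior=60))  # CPU-bound
```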
Job Characteristics
Program                  MID         MOD          MTCT
myspn_top_cve            1395184     620928       15185
myspn_top_url            54481169    52528135     9867
aggregate_url            286007534   1155960828   420225
USandbox Data Statistic  37612436    4921787      45423
file-solr-daily          75167686    4660452644   224488
aggregate_url_dedupe     639896245   561632270    73926
myspn_top_url_by_origin  499348380   506962079    53927
• Data source: Job history log
Job Characteristics
• Data source: Job history log
• Data set: 5,942 instances
• Test mode: split 66% train, remainder test
• Classifier model: RandomForest
• Attributes: MID, MOD, MTCT, n, DIOR, label
=== Summary ===
Correlation coefficient          0.9934
Mean absolute error              0.0099
Root mean squared error          0.0513
Relative absolute error          2.4872 %
Root relative squared error     11.4997 %
Total Number of Instances        2020
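The summary block reports regression-style metrics. This sketch shows how each one is computed from actual and predicted values, so the numbers can be reproduced outside the tool. One assumption to note: Weka computes the relative errors against a predictor that always outputs the mean of the training labels; this sketch defaults to the mean of the test labels unless `baseline_mean` is given. `regression_summary` is a hypothetical helper.

```python
import math

def regression_summary(actual, predicted, baseline_mean=None):
    """Correlation, MAE, RMSE, RAE and RRSE for numeric predictions."""
    n = len(actual)
    mean_actual = sum(actual) / n
    mean_pred = sum(predicted) / n
    base = mean_actual if baseline_mean is None else baseline_mean
    pairs = list(zip(actual, predicted))
    abs_err = sum(abs(p - a) for a, p in pairs)
    sq_err = sum((p - a) ** 2 for a, p in pairs)
    cov = sum((a - mean_actual) * (p - mean_pred) for a, p in pairs)
    var_a = sum((a - mean_actual) ** 2 for a in actual)
    var_p = sum((p - mean_pred) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(var_a * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        # Relative errors: error of the model vs. error of always predicting `base`.
        "rae_pct": 100 * abs_err / sum(abs(a - base) for a in actual),
        "rrse_pct": 100 * math.sqrt(sq_err / sum((a - base) ** 2 for a in actual)),
    }

print(regression_summary([0, 0, 1, 1], [0, 0.5, 0.5, 1]))
```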
Job Characteristics
[Figure: number of jobs per queue, split into I/O-bound and CPU-bound (x-axis: queue name; y-axis: number of jobs, 0 to 1,200)]
Queue Type
• I/O Bound: domain_census, myreppathcensus
• CPU Bound: alps, census, census-oozie, data_importer, domain_census-oozie, domain_census_ews, hdfs, magicQ, myspn, platinum, platinum-oozie, retroscanretrosplunkrnu, spnungle, threatconnect, threathub, threathub-oozie, user
Thinking
• Besides allocating resources based on each job's SLA, what other factors should I consider?
  - Job characteristics?
  - Queue type?
Queue Resource Consumption
Cluster Resource Allocation
• YARN Fair Scheduler
  - yarn.scheduler.fair.allocation.file: fair-scheduler.xml
• The allocation file is reloaded every 10 seconds, allowing changes to be made on the fly.
Cluster Resource Allocation
• Fair Scheduler
  - Root queue: root (every queue descends from it)
  - Hierarchical queues
  - Placement policy
  - Preemption
  - Resource reservation
• Cluster resource
  - FairShare <memory: x, vcores: y>
Cluster Resource Allocation
• Queue properties
  - minResources (soft limit)
  - maxResources (hard limit)
  - weight, e.g. <weight>1.0</weight>
  - maxRunningApps
  - schedulingPolicy: fifo, fair, or drf
[Figure: example queues under YARN: Research, Production Service, Marketing Report, adhoc]
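The queue properties above map onto `<queue>` elements in the allocation file. A minimal sketch that renders queue definitions as fair-scheduler.xml content; the queue names, resource values and the default of 50 running apps are hypothetical, loosely modeled on the example hierarchy. Since the file is reloaded every 10 seconds, writing this output to the configured allocation file applies it without a restart.

```python
import xml.etree.ElementTree as ET

def allocation_file(queues):
    """Render Fair Scheduler queue definitions as an allocation-file XML string."""
    root = ET.Element("allocations")
    for q in queues:
        queue = ET.SubElement(root, "queue", name=q["name"])
        ET.SubElement(queue, "minResources").text = f"{q['min_mb']} mb,{q['min_vcores']} vcores"
        ET.SubElement(queue, "maxResources").text = f"{q['max_mb']} mb,{q['max_vcores']} vcores"
        ET.SubElement(queue, "weight").text = str(q.get("weight", 1.0))
        ET.SubElement(queue, "maxRunningApps").text = str(q.get("max_running_apps", 50))
        ET.SubElement(queue, "schedulingPolicy").text = q.get("policy", "fair")
    return ET.tostring(root, encoding="unicode")

# Hypothetical queues; values are illustrative only.
queues = [
    {"name": "research", "min_mb": 40960, "min_vcores": 10,
     "max_mb": 204800, "max_vcores": 50, "policy": "drf"},
    {"name": "production", "min_mb": 81920, "min_vcores": 20,
     "max_mb": 409600, "max_vcores": 100, "weight": 2.0},
    {"name": "adhoc", "min_mb": 8192, "min_vcores": 2,
     "max_mb": 102400, "max_vcores": 25, "policy": "fifo"},
]
print(allocation_file(queues))
```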
Analyzing Cluster Status
• Retrieve YARN metrics from the YARN REST APIs
• FileSystemCounter
• JobCounters
• Task Counters
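A sketch of collecting the cluster-level metrics from the ResourceManager REST API (`/ws/v1/cluster/metrics`, whose JSON response wraps the values in a `clusterMetrics` object). The RM address and the field selection are assumptions; the selected fields correspond to the cluster_* features used later for training. Queue-level values such as numPendingApps and usedResources are exposed separately by `/ws/v1/cluster/scheduler`.

```python
import json
import urllib.request

# Cluster-level fields kept as features (selection is an assumption).
FIELDS = ("appsPending", "appsRunning", "availableMB", "allocatedMB",
          "availableVirtualCores", "allocatedVirtualCores")

def parse_cluster_metrics(payload):
    """Pick the feature fields out of a /ws/v1/cluster/metrics JSON response."""
    metrics = json.loads(payload)["clusterMetrics"]
    return {field: metrics[field] for field in FIELDS}

def fetch_cluster_metrics(rm_address):
    """Query the ResourceManager, e.g. rm_address = "rm-host:8088" (hypothetical host)."""
    url = f"http://{rm_address}/ws/v1/cluster/metrics"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_cluster_metrics(resp.read().decode("utf-8"))
```

Polling this periodically and storing each result with a timestamp yields the time series plotted in the following charts.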
Pending apps and Available Vcore
[Figure: appsPending and availableVCores over time, 03:20 to 10:50; y-axis: vcores, 0% to 100%]
Vcores Utilization
[Figure: total_vCores and used_vCores over time, 03:20 to 10:50; y-axis: vcores, 0% to 100%]
Vmemory Utilization
[Figure: used_memory and total_memory over time, 03:20 to 10:50; y-axis: vmemory, 0% to 100%]
Cluster Resource Utilization
[Figures: cluster resource utilization by queue (three charts)]
Bottleneck Resource
• Vcores become the bottleneck resource
  Memory usage: 41.5%, VCores usage: 99.5%
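This is why jobs can stay pending while memory is free: a new container needs both memory and vcores, so the scarcer resource decides how many more containers fit. A sketch with a hypothetical 100,000 MB / 1,000-vcore cluster at the slide's usage levels and a hypothetical 4096 MB, 1-vcore container; the helper names are illustrative.

```python
def usage_shares(used_mb, total_mb, used_vcores, total_vcores):
    """Fraction of each resource currently allocated."""
    return {"memory": used_mb / total_mb, "vcores": used_vcores / total_vcores}

def bottleneck(shares):
    """The resource with the highest usage share limits new containers."""
    return max(shares, key=shares.get)

def containers_that_fit(free_mb, free_vcores, container_mb, container_vcores):
    """A container needs both resources, so the scarcer one decides."""
    return min(free_mb // container_mb, free_vcores // container_vcores)

shares = usage_shares(41_500, 100_000, 995, 1_000)
print(bottleneck(shares))                        # vcores
print(containers_that_fit(58_500, 5, 4096, 1))   # 5, although memory alone would allow 14
```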
Over Fair Share
• The cluster still has resources
Thinking
• Why can't the cluster's resources be fully utilized?
• Is there a resource limitation (a bottleneck)?
• How can pending jobs be reduced while the cluster still has resources?
Thinking
• Is it possible to predict when the cluster will have pending jobs?
• Can I predict resource consumption at a specific time and allocate resources dynamically to fully utilize the cluster?
Predict Resource Consumption and Allocate Resources
YARN resource consumption prediction
Collect Metrics → Data Processing (pre-processing) → Training Model → Evaluate (RMSE) → Model Prediction → Predicted Queue Consumption
Training Data
Field                    Description                          Process
date                     date                                 ignore
time                     hour: 0 ~ 23                         feature
working day              0: working day, 1: non-working day   feature
weekday                  week day                             feature
cluster_appsPending      Pending apps in the cluster          feature
cluster_appsRunning      Running apps in the cluster          feature
cluster_availableMB      Available vmem in the cluster        feature
cluster_allocatedMB      Allocated vmem in the cluster        feature
cluster_availableVcore   Available vcore in the cluster       feature
cluster_allocatedVcore   Allocated vcore in the cluster       feature
• Data source: Job history log
Training Data
Field                    Description                          Process
queue_name               Queue name                           feature
minResources_memory      Min vmem for queue                   feature
minResources_vcores      Min vcore for queue                  feature
maxResources_memory      Max vmem for queue                   feature
maxResources_vcores      Max vcore for queue                  feature
numPendingApps           Pending apps in queue                feature
numActiveApps            Running apps in queue                feature
usedResources.memory     Used vmem in queue                   feature
usedResources.vcore      Used vcore in queue                  feature
label                    label (predict target)               label
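A sketch of assembling one training row from a timestamp plus cluster- and queue-level metrics. The slides do not say how weekday or working days are encoded, so this assumes Python's Monday = 0 convention and treats weekends plus an optional holiday set as non-working days; `time_features` and `feature_row` are hypothetical helpers.

```python
from datetime import date, datetime

def time_features(ts, holidays=frozenset()):
    """Calendar features from the Training Data table (encoding is an assumption)."""
    non_working = ts.weekday() >= 5 or ts.date() in holidays
    return {
        "time": ts.hour,                          # hour: 0 ~ 23
        "working_day": 1 if non_working else 0,   # 0: working day, 1: non-working day
        "weekday": ts.weekday(),                  # Monday = 0 ... Sunday = 6
    }

def feature_row(ts, cluster_metrics, queue_metrics, holidays=frozenset()):
    """Merge calendar, cluster-level (cluster_*) and queue-level fields into one row."""
    row = time_features(ts, holidays)
    row.update({f"cluster_{name}": value for name, value in cluster_metrics.items()})
    row.update(queue_metrics)
    return row

row = feature_row(datetime(2024, 1, 6, 9, 30),
                  {"appsPending": 3, "availableVcore": 5},
                  {"queue_name": "research", "numPendingApps": 1},
                  holidays={date(2024, 1, 1)})
print(row)
```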
Training Model
• Training model: Linear Regression
• Predict: vcore
Training Model
• Training model: RandomForest
• Predict: vcore
• Data source: Job history log
• Data set: 109,736 instances
• Test mode: split 66% train, remainder test
• Attributes: 19
=== Summary ===
Correlation coefficient          0.999
Mean absolute error              0.1262
Root mean squared error          0.8494
Relative absolute error          1.5905 %
Root relative squared error      4.5017 %
Total Number of Instances        37,310
Training Model
• Training model: Linear Regression
• Predict: vmemory
Training Model
• Training model: RandomForest
• Predict: vmemory
• Data source: Job history log
• Data set: 109,736 instances
• Test mode: split 66% train, remainder test
• Attributes: 19
=== Summary ===
Correlation coefficient          0.9995
Mean absolute error              0.0003
Root mean squared error          0.0019
Relative absolute error          1.4174 %
Root relative squared error      3.2014 %
Total Number of Instances        37,310
Training Model
• Training model: RandomForest
• Predict: Pending job
• Data source: Job history log
• Data set: 122,120 instances
• Test mode: split 66% train, remainder test
• Attributes: 19
=== Summary ===
Correlation coefficient          0.9917
Mean absolute error              0.0002
Root mean squared error          0.0054
Relative absolute error          7.9308 %
Root relative squared error     14.4934 %
Total Number of Instances        41,521
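A sketch of the evaluation setup: a random 66% / 34% split plus a single-feature least-squares baseline scored with RMSE. Rounding the training size reproduces the slides' test-set sizes (5,942 → 2,020; 109,736 → 37,310; 122,120 → 41,521). This is not the 19-attribute RandomForest from the slides; the helper names and the synthetic rows are illustrative only.

```python
import math
import random

def split_train_test(rows, train_percent=66, seed=1):
    """Random percentage split; the training size is rounded to the nearest row."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(actual, predicted):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Synthetic (x, y) rows, for illustration only.
rows = [(x, 2 * x + 1) for x in range(100)]
train, test = split_train_test(rows)
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
print(len(train), len(test), rmse([y for _, y in test], [slope * x + intercept for x, _ in test]))
```

A random split mirrors the slides' test mode; because the metrics form a time series, a chronological split (train on earlier weeks, test on later ones) is a stricter check.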
Attribute Evaluation
• Predict: Pending jobs
• Attribute evaluator: Information Gain
• Ranked attributes:
  Attribute             Score
  maxResource_memory    1.14465
  maxResource_vcore     1.04186
  usedResource_memory   0.53004
  usedResource_vcore    0.51167
  minResource_memory    0.47563
  numActiveApps         0.34418
  minResource_vcore     0.3179
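Information gain ranks an attribute by how much knowing its value reduces the entropy of the label. A minimal sketch for discrete attribute values; Weka's evaluator discretizes numeric attributes first, which this sketch does not do. With more than two label values the gain can exceed 1 bit, as the 1.14465 score above does. The helper names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Label entropy minus the weighted entropy after grouping by attribute value."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical binned maxResource_memory values vs. a pending-job label.
print(information_gain(["low", "low", "high", "high"],
                       ["pending", "pending", "none", "none"]))  # 1.0
```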
Experiment Result
• Based on the prediction results, we reallocated resources to the queues that were likely to have pending jobs on specific weekdays.
• Experiment result: pending jobs reduced by 82%
  - Pending-job ratio before: 0.005
  - Pending-job ratio after: 0.0009
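The 82% reduction follows from the two ratios:

```latex
\frac{0.005 - 0.0009}{0.005} = \frac{0.0041}{0.005} = 0.82
```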
Experiment Result
• Something you should know:
  - The total of all queues' minResources should be less than the cluster fair share
  - A queue may not get its minResources immediately
  - Preemption kills containers in other queues to satisfy minResources, which also means wasted resources
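Before pushing a new allocation file, a simple check can flag queues whose combined minResources exceed what the cluster has. A minimal sketch; the queue list and cluster totals are hypothetical, and `check_min_resources` is an illustrative helper.

```python
def check_min_resources(queues, cluster_mb, cluster_vcores):
    """Return warnings if the queues' combined minResources exceed the cluster."""
    total_mb = sum(q["min_mb"] for q in queues)
    total_vcores = sum(q["min_vcores"] for q in queues)
    warnings = []
    if total_mb > cluster_mb:
        warnings.append(f"sum of minResources memory {total_mb} MB > cluster {cluster_mb} MB")
    if total_vcores > cluster_vcores:
        warnings.append(f"sum of minResources vcores {total_vcores} > cluster {cluster_vcores}")
    return warnings

queues = [{"name": "research", "min_mb": 60000, "min_vcores": 300},
          {"name": "production", "min_mb": 50000, "min_vcores": 400}]
print(check_min_resources(queues, cluster_mb=100000, cluster_vcores=1000))
```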
Experiment Result
• Something you should know:
  - Modifying fair-scheduler.xml too frequently may cause the ResourceManager to behave unexpectedly
  - A ResourceManager failover causes jobs submitted by Oozie to be retried
  - Does a cluster with tight resources need resource prediction?
Conclusion
• A deep understanding of the architecture is the key to tuning and management.
• Consider whether tools from other domains could help with your daily work.
• Machine learning is used for prediction in many domains; it can give you a different perspective.
Q & A
Thank You