Tomer Morad, EE PhD Postdoctoral Fellow, Cornell Tech
Founder of DatArcs
Today
• Traditionally, operators manually sweep through parameters and search for the best configuration for a benchmark
Challenges
• Labor intensive
• Exploration of only a small number of knobs per workload
• Needs to be done on a regular basis
• Workload requirements vary between phases
Solution
• Dynamic Tuning!
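A minimal sketch of the manual sweep described above, assuming three hypothetical knobs and a toy scoring function in place of a real benchmark run. None of these names come from an actual tool; they only illustrate why exhaustive sweeps become labor intensive as the number of knobs grows.

```python
from itertools import product

def sweep(knobs, run_benchmark):
    """Try every knob combination and return (best_config, best_score)."""
    names = list(knobs)
    best_config, best_score = None, float("-inf")
    for values in product(*(knobs[n] for n in names)):
        config = dict(zip(names, values))
        score = run_benchmark(config)  # assumption: higher is better
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Hypothetical knobs; a real system exposes many more.
KNOBS = {
    "smt": [False, True],
    "io_scheduler": ["cfq", "deadline"],
    "hw_prefetch": [False, True],
}

def toy_benchmark(config):
    """Stand-in for a real benchmark run; the numbers are made up."""
    score = 100.0
    score += 10.0 if config["smt"] else 0.0
    score += 5.0 if config["io_scheduler"] == "deadline" else 0.0
    score -= 3.0 if config["hw_prefetch"] else 0.0
    return score

best_config, best_score = sweep(KNOBS, toy_benchmark)
num_runs = 2 ** len(KNOBS)  # 8 runs for 3 binary knobs
```

Each additional binary knob doubles the number of benchmark runs, which is why a manual sweep usually covers only a few knobs per workload.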
Hundreds of different knobs to tune!
Hardware
• SMT
• Cache partitioning
• Hardware prefetching
• Peripheral power states
• …
Firmware
• Power Management Unit (PMU), DVFS, Power States
• CPU Microcode • …
Operating System
• Task Scheduler
• IO Scheduler
• Page Cache
• File Prefetching Algorithm
• Memory Allocation Algorithm
• Affinity
• …
Application
• Apache web server
• Choice of compiler
• Choice of libraries
• …
§ Source: Zhao, Zhengji, Nicholas J. Wright, and Katie Antypas. "Effects of Hyper-Threading on the NERSC workload on Edison." Proc. Cray User Group (2013).
Hyperthreading improves performance with NAMD
Hyperthreading degrades performance with VASP
§ Source: http://www.phoronix.com/scan.php?page=article&item=linux_iosched_2012&num=3
CFQ is better for “Initial Create”
Deadline is better for “Read Compiled Tree”
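On Linux, the I/O scheduler knob is exposed per block device in `/sys/block/<device>/queue/scheduler`, where the active scheduler appears in square brackets. The sketch below parses that format; the helper names are hypothetical, and the sample line is an assumed example of what a kernel offering CFQ and Deadline might report.

```python
from pathlib import Path

def parse_schedulers(text):
    """Return (active, available) from a line such as 'noop deadline [cfq]'."""
    available, active = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active one
            active = token
        available.append(token)
    return active, available

def read_schedulers(device):
    """Read the scheduler line for a block device, e.g. device='sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_schedulers(path.read_text())

# Parse an assumed sample line rather than touching the real system.
active, available = parse_schedulers("noop deadline [cfq]")
```

Writing a scheduler name to the same file selects it (this requires root), which is how a tuner could switch between CFQ and Deadline depending on the workload.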
Run threads only when the increase in throughput justifies the extra power
Context Switch / Scheduling Interrupt
→ Sample performance counters
→ Candidate ← Next task by CFS
→ EPInew < EPIcur?
   Yes: Schedule Candidate
   No: More Candidates?
      Yes: Candidate ← Next task by CFS
      No: Schedule Idle thread
Source: Tomer Y. Morad, Noam Shalev, Idit Keidar, Avinoam Kolodny, Uri C. Weiser. “Energy-Efficient Task Assignment.” Submitted to the Journal of Parallel and Distributed Computing.
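The decision loop on this slide can be sketched as follows. This is an illustrative reading of the flowchart, not the authors' kernel code: it assumes EPI stands for energy per instruction, and `estimate_epi` is a hypothetical callback that would be derived from sampled performance counters.

```python
def pick_next(candidates, epi_cur, estimate_epi):
    """Return the first candidate (in CFS order) whose estimated EPI is
    lower than the current EPI, or None to schedule the idle thread."""
    for task in candidates:
        if estimate_epi(task) < epi_cur:
            return task
    return None  # no candidate justifies the extra power

# Toy EPI estimates per task; a real scheduler would compute these
# from performance counters at each scheduling interrupt.
toy_epi = {"a": 1.4, "b": 0.9, "c": 0.7}
cfs_order = ["a", "b", "c"]

chosen = pick_next(cfs_order, epi_cur=1.0, estimate_epi=toy_epi.get)
idle = pick_next(cfs_order, epi_cur=0.5, estimate_epi=toy_epi.get)
```

In the first call task "b" is chosen because it is the first candidate in CFS order that lowers EPI; in the second call no candidate does, so the idle thread runs.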
3.2.0
Energy-Friendly Scheduler
Quad-core Intel i5-2500
Figure: Energy Savings and Speedup (axis range -20% to 60%) for Affected Benchmarks. Series: System Energy Savings, Chip Energy Savings, Speedup.
Knows the knobs in my system
Tunes fast and automatically
Adapts
Optimizes for my metric
Architecture diagram: Workload, Workload Classifier, Metrics Estimator, Dynamic Tuner, Knob Applier, Configuration Files. Labels: Metrics, Bucket, Estimations, Knob values.
Objective: Classify workload into buckets
• Workload = currently running programs and their characteristics
• Bucket = a set of workloads that benefit from the same knob configuration
Challenges
• Number of useful buckets can’t be too low or too high
• Workload exhibits different behavior in each phase
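One simple way to picture bucket assignment is nearest-centroid classification. The slides do not say which classifier is used, so this is a swapped-in illustration: the bucket names, the two features (a normalized compute intensity and a normalized memory-bandwidth use), and the centroid values are all hypothetical.

```python
import math

# Hypothetical bucket centroids: (compute intensity, memory bandwidth use).
BUCKETS = {
    "compute_bound": (0.9, 0.1),
    "memory_bound": (0.3, 0.8),
    "io_bound": (0.1, 0.2),
}

def classify(sample, buckets=BUCKETS):
    """Return the name of the bucket whose centroid is closest to sample."""
    return min(buckets, key=lambda name: math.dist(sample, buckets[name]))

# A workload that changes phase is classified once per sample,
# so its bucket can change over time.
phases = [(0.8, 0.2), (0.25, 0.7)]
labels = [classify(sample) for sample in phases]
```

With too few buckets, workloads that need different knob settings are merged; with too many, each bucket sees too little data to tune well.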
Objective: Gather System Metrics
• Metrics = Energy, performance, etc.
Challenges
• Platform energy estimation
• Performance estimation
• Each server generation has different counters
Dynamic Tuner
Objective: Tune server for current workload
Challenges:
• Knobs are not independent
• Workload changes during the sampling period
• Environmental factors (temperature) impact the metrics
• Balance between parameter exploration and running the application normally
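The balance between exploration and normal operation can be illustrated with an epsilon-greedy strategy. This is a generic technique chosen for illustration, not necessarily what the tuner described here does; the class name, the configuration names, and the toy metric values are assumptions.

```python
import random

class EpsilonGreedyTuner:
    """Mostly run the best-known configuration; occasionally explore."""

    def __init__(self, configs, epsilon=0.1, seed=0):
        self.configs = list(configs)
        self.epsilon = epsilon  # fraction of rounds spent exploring
        self.rng = random.Random(seed)
        self.totals = {c: 0.0 for c in self.configs}
        self.counts = {c: 0 for c in self.configs}

    def choose(self):
        """Pick the configuration to apply for the next sampling period."""
        untried = [c for c in self.configs if self.counts[c] == 0]
        if untried:
            return untried[0]  # try every configuration at least once
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.configs)  # explore
        return self.best()  # exploit

    def best(self):
        """Configuration with the highest average metric so far."""
        tried = [c for c in self.configs if self.counts[c] > 0]
        return max(tried, key=lambda c: self.totals[c] / self.counts[c])

    def record(self, config, metric):
        """Store the metric measured while config was applied."""
        self.totals[config] += metric
        self.counts[config] += 1

# Toy metric (higher is better); real measurements would be noisy.
toy_metric = {"cfq": 1.0, "deadline": 2.0}
tuner = EpsilonGreedyTuner(toy_metric, epsilon=0.1)
for _ in range(50):
    config = tuner.choose()
    tuner.record(config, toy_metric[config])
```

Averaging the recorded metric per configuration is one way to dampen noise from workload changes and temperature. This sketch treats each configuration as a single unit, so it does not address knobs that interact with each other.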
§ DatArcs Optimizer is in closed beta
§ We’re looking for partners from the HPC community
§ We wish to learn about the needs of the HPC community
§ Please sign up for the beta: www.datarcs.com/downloads