Performance Prediction of HPC Applications
The SimGrid Project
A. Legrand and the SimGrid team (M. Quinson, F. Suter, A. Degomme, L. Stanisic, B. Videau, L. Genovese, S. Thibault, E. Agullo, ...)
CNRS/Inria/University of Grenoble
Maison de la simulation, May 10, 2016
1 / 30
Supercomputers
100,000 to 1,000,000 cores, accelerators (GPU, Xeon Phi) and a complex high speed interconnect
A worldwide competition (Top500)
2 / 30
Complex Applications
Larger and larger scale hybrid machines
Different programming approaches (e.g., in linear algebra applications)
"Rigid, hand tuned": SuperLU
Task-based and dynamic: MUMPS
Analysis and Comparison of Two Distributed Memory Sparse Solvers. Amestoy, Duff, L'Excellent, Li. ACM Trans. on Math. Software, Vol. 27, No. 4, 2001.
Multitude of technologies
MPI OpenMP Cilk CUDA OpenCL Charm++
KAAPI StarPU QUARK ParSEC OmpSs ...
need for performance evaluation on a regular basis
3 / 30
Toward Exascale?
Platforms and applications are already insanely complex with Peta-scale systems. Do we have a chance to understand exascale systems?
European approach to Exascale: Mont-Blanc; low-power commodity hardware such as ARM+GPUs+Ethernet
Need for application performance prediction and capacity planning
MPI simulation: what for?
1 Helping application developers
Non-intrusive tracing and repeatable execution
Classical debugging tools (gdb, valgrind) can be used
Save computing resources (runs on your laptop if possible)
2 Helping application users
How many resources should I ask for? (scaling)
Configure MPI collective operations
Provide baseline
3 Capacity planning (can we save on components? what-if analysis)
4 / 30
Flourishing State of the Art
There are many different projects:
Dimemas (BSC, probably one of the earliest)
PSINS (SDSC, used to rely on Dimemas)
BigSim (UIUC): BigNetSim or BigFastSim
LogGopSim (UIUC/ETHZ)
SST (Sandia Nat. Lab.): Micro or Macro
...
SimGrid: a 16-year-old open-source project. Collaboration between France (INRIA, CNRS, Univ. Lyon, Nancy, Grenoble, ...), USA (UCSD, U. Hawaii), UK, Austria (Vienna), ...
http://simgrid.gforge.inria.fr
Initially focused on Grid settings, we argue that the same tools and techniques can be used for P2P, HPC and cloud
5 / 30
Outline
1 Context
2 SimGrid and MPI: SMPI
  SMPI
  SimGrid and AMPI
3 SimGrid and StarPU
  Principle
  Dense Linear Algebra Applications
  Sparse Linear Algebra Applications
4 Conclusion
SMPI – Offline vs. Online Simulation
[Figure: SMPI workflow. On-line, the MPI application runs directly on top of SMPI. Off-line, the application is traced once on a simple cluster (mpirun with TAU, PAPI) and the resulting time-independent trace is replayed as many times as you want. SMPI combines simulated or emulated computations with simulated communications and takes a platform description as input ("Model the machine of your dreams"). Outputs: a timed trace, a simulated execution time (43.232 seconds) and visualizations (Paje, TRIVA).]
On-line: simulate/emulate unmodified complex applications
- Possible memory folding and shadow execution
- Handles non-deterministic applications
Off-line: trace replay
Time-independent trace:
  0 compute 1e6
  0 send 1 1e6
  0 recv 3 1e6
  1 recv 0 1e6
  1 compute 1e6
  1 send 2 1e6
  2 recv 1 1e6
  2 compute 1e6
  2 send 3 1e6
  3 recv 2 1e6
  3 compute 1e6
  3 send 0 1e6
Timed trace:
  [0.001000] 0 compute 1e6 0.01000
  [0.010028] 0 send 1 1e6 0.009028
  [0.040113] 0 recv 3 1e6 0.030085
  [0.010028] 1 recv 0 1e6 0.010028
  ...
Platform description:
  <?xml version='1.0'?>
  <!DOCTYPE platform SYSTEM "simgrid.dtd">
  <platform version="3">
    <cluster id="griffon" prefix="griffon-" suffix=".grid5000.fr" radical="1-144" power="286.087kf" bw="125MBps" lat="24us" bb_bw="1.25GBps" bb_lat="0" sharing_policy="FULLDUPLEX" />
  </platform>
[Figure: platform topology. Node groups 1-39, 40-74, 75-104 and 105-144 connected through 1G and 10G up/down links and limiters (1.5G, 13G).]
Offline simulation
1 Obtain a time-independent trace
2 Replay it on top of SimGrid as often as desired
3 Analyze with the comfort of a simulator
Fast, but requires extrapolation and is limited to non-adaptive codes
6 / 30
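The off-line replay idea can be illustrated with a toy replayer for the time-independent trace shown on this slide. This is only a sketch under strong assumptions, not SMPI's replay engine: sends are treated as eager and free for the sender, a message takes `latency + size / bandwidth` to arrive, and the `speed` value is invented. The latency and bandwidth defaults reuse the `lat` and `bw` attributes of the platform description.

```python
from collections import defaultdict, deque

TRACE = """\
0 compute 1e6
0 send 1 1e6
0 recv 3 1e6
1 recv 0 1e6
1 compute 1e6
1 send 2 1e6
2 recv 1 1e6
2 compute 1e6
2 send 3 1e6
3 recv 2 1e6
3 compute 1e6
3 send 0 1e6
"""

def parse(text):
    """Group the trace actions by rank, preserving their order."""
    programs = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        rank, action = int(fields[0]), fields[1]
        programs[rank].append((action, [float(x) for x in fields[2:]]))
    return programs

def replay(programs, speed=1e9, latency=24e-6, bandwidth=125e6):
    """Return the simulated makespan (seconds) of the traced ranks.

    speed (flop/s) is a hypothetical value; sends are eager and cost
    nothing on the sender side, which is a simplification.
    """
    clock = {r: 0.0 for r in programs}
    pc = {r: 0 for r in programs}          # next action of each rank
    mailbox = defaultdict(deque)           # (src, dst) -> arrival times
    progressed = True
    while progressed:
        progressed = False
        for r, prog in programs.items():
            while pc[r] < len(prog):
                action, args = prog[pc[r]]
                if action == "compute":
                    clock[r] += args[0] / speed
                elif action == "send":
                    dst, size = int(args[0]), args[1]
                    mailbox[(r, dst)].append(clock[r] + latency + size / bandwidth)
                elif action == "recv":
                    src = int(args[0])
                    if not mailbox[(src, r)]:
                        break              # blocked until the sender progresses
                    clock[r] = max(clock[r], mailbox[(src, r)].popleft())
                else:
                    raise ValueError(f"unknown action: {action}")
                pc[r] += 1
                progressed = True
    if any(pc[r] < len(prog) for r, prog in programs.items()):
        raise RuntimeError("deadlock: some receives never matched a send")
    return max(clock.values())

makespan = replay(parse(TRACE))
```

Because the trace carries volumes (flops, bytes) rather than timestamps, the same trace can be replayed with different `speed`, `latency` or `bandwidth` values to explore what-if scenarios.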
Online simulation
Directly run the code on top of SimGrid: source-to-source (Photran, coccinelle, f2c), GOT injection, compiler pass for TLS, mmap the data segment
Possible memory sharing between simulated processes (reduces memory footprint) and kernel sampling (reduces simulation time)
Complies with most of the MPICH3 testsuite, compatible with many C, F77 and F90 codes (NAS, LinPACK, Sweep3D, BigDFT, SpecFEM3D)
6 / 30
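Kernel sampling can be pictured as follows: a compute kernel is really executed and timed only a few times, after which its execution is skipped and the average measured duration is charged to the simulated clock. The class below is a hypothetical illustration of that idea in Python; it is not SMPI's interface, and the name `SampledKernel` and its parameters are invented for this sketch.

```python
import time

class SampledKernel:
    """Execute a kernel for real only n_samples times, then reuse the mean cost."""

    def __init__(self, kernel, n_samples, timer=time.perf_counter):
        if n_samples < 1:
            raise ValueError("at least one real execution is needed")
        self.kernel = kernel
        self.n_samples = n_samples
        self.timer = timer
        self.samples = []

    def __call__(self, *args):
        """Return the duration to inject into the simulated clock."""
        if len(self.samples) < self.n_samples:
            start = self.timer()
            self.kernel(*args)                      # real, measured execution
            self.samples.append(self.timer() - start)
            return self.samples[-1]
        # Skip the computation and charge the average measured duration.
        return sum(self.samples) / len(self.samples)
```

Skipping the kernel is only sound when its results do not influence the control flow, which is why this technique reduces simulation time but does not apply to every code.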
Performance Evaluation Through Fine Grain Simulation
Packet-level models: full simulation of the whole protocol stack, so hopefully perfect, but
complex models, hard to instantiate and unstable
  Flores Lucio, Paredes-Farrera, Jammeh, Fleury, Reed. Opnet modeler and ns-2: Comparing the accuracy of network simulators for packet-level analysis using a network testbed. WSEAS Transactions on Computers 2, no. 3 (2003)
inherently slow (parallelism won't save you here!)
sometimes wrongly implemented
who can understand the macroscopic behavior of the application?
When working at the application level, there is a need for something more high level that reflects the macroscopic characteristics of the machines
7 / 30
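LogGPS in a Nutshell
The LogP model was initially designed for complexity analysis and algorithm design. Many variations are available to account for protocol switch.
[Figure: timing diagrams between a sender Ps (starting at time ts) and a receiver Pr (starting at time tr), decomposed into segments T1 to T5, for the asynchronous and rendez-vous modes.]
Asynchronous mode ($k \le S$), rendez-vous mode ($k > S$)
The $T_i$'s are basically continuous linear functions.
$T_1 = o + k O_s$
$T_2 = \begin{cases} L + k g & \text{if } k < s \\ L + s g + (k - s) G & \text{otherwise} \end{cases}$
$T_3 = o + k O_r$
$T_4 = \max(L + o,\ t_r - t_s) + o$
$T_5 = 2o + L$
8 / 30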
Routine     Condition    Cost
MPI_Send    $k \le S$    $T_1$
            $k > S$      $T_4 + T_5 + T_1$
MPI_Recv    $k \le S$    $\max(T_1 + T_2 - (t_r - t_s),\ 0) + T_3$
            $k > S$      $\max(o + L - (t_r - t_s),\ 0) + o + T_5 + T_1 + T_2 + T_3$
MPI_Isend                $o$
MPI_Irecv                $o$
8 / 30
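The $T_i$ definitions and the per-routine costs of the LogGPS slides can be transcribed directly into code. The sketch below does only that; the class name and the parameter values used in the example are assumptions chosen for illustration, not measurements.

```python
from dataclasses import dataclass

@dataclass
class LogGPS:
    L: float   # latency
    o: float   # per-message overhead
    g: float   # per-byte gap below the threshold s
    G: float   # per-byte gap above the threshold s
    Os: float  # per-byte overhead on the sender
    Or: float  # per-byte overhead on the receiver
    s: float   # size threshold between the two slopes of T2
    S: float   # size threshold between asynchronous and rendez-vous modes

    def t1(self, k):
        return self.o + k * self.Os

    def t2(self, k):
        if k < self.s:
            return self.L + k * self.g
        return self.L + self.s * self.g + (k - self.s) * self.G

    def t3(self, k):
        return self.o + k * self.Or

    def t4(self, ts, tr):
        return max(self.L + self.o, tr - ts) + self.o

    def t5(self):
        return 2 * self.o + self.L

    def send(self, k, ts, tr):
        """Cost of MPI_Send for a k-byte message (sender at ts, receiver at tr)."""
        if k <= self.S:
            return self.t1(k)
        return self.t4(ts, tr) + self.t5() + self.t1(k)

    def recv(self, k, ts, tr):
        """Cost of MPI_Recv for a k-byte message."""
        if k <= self.S:
            return max(self.t1(k) + self.t2(k) - (tr - ts), 0) + self.t3(k)
        return (max(self.o + self.L - (tr - ts), 0) + self.o
                + self.t5() + self.t1(k) + self.t2(k) + self.t3(k))

    def isend(self):
        return self.o

    def irecv(self):
        return self.o

# Hypothetical parameter values, in seconds and bytes.
model = LogGPS(L=1e-5, o=1e-6, g=1e-8, G=5e-9, Os=1e-9, Or=1e-9, s=1024, S=65536)
```

With these definitions $T_2$ is continuous at $k = s$, since both branches evaluate to $L + s g$ there.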
May reflect the operation of specialized HPC networks from the early 1990s...
Ignores potentially confounding factors present in modern-day systems (e.g., contention, topology, complex protocol stack, ...)
Unless you have a well-tuned high-end machine, such a model is unlikely to provide accurate estimations or useful baseline comparisons
8 / 30
MPI Point-to-Point Communication
Randomized measurements (OpenMPI/TCP/Eth1GB), since we are interested in performance characterization rather than in peak performance
[Figure: duration (seconds, log scale, 1e-04 to 1e-02) of MPI_Send and MPI_Recv versus message size (bytes, log scale, 1e+01 to 1e+05). Measurements are grouped as Small, Medium1, Medium2, Detached and Large.]
The variability is quite significant
There are at least 4 different modes
The duration is piece-wise linear and discontinuous in the message size
9 / 30
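A piece-wise linear, discontinuous cost function like the one observed on this slide can be represented as a table of segments, each with its own intercept and slope. In the sketch below the segment boundaries and coefficients are invented placeholders; in practice they would come from a regression on the randomized measurements.

```python
import bisect

# (upper bound on message size in bytes, intercept in s, slope in s/byte).
# All values are hypothetical and only illustrate the shape of the model.
SEGMENTS = [
    (1420, 5.0e-5, 1.0e-8),          # "small" messages
    (32768, 8.0e-5, 9.0e-9),         # "medium 1"
    (65536, 1.2e-4, 8.5e-9),         # "medium 2"
    (float("inf"), 3.0e-4, 8.2e-9),  # "large"
]
BOUNDS = [bound for bound, _, _ in SEGMENTS]

def duration(size):
    """Predicted duration of a point-to-point operation for `size` bytes."""
    # bisect_left selects the first segment whose upper bound is >= size.
    _, intercept, slope = SEGMENTS[bisect.bisect_left(BOUNDS, size)]
    return intercept + slope * size
```

Each segment is linear, but the prediction jumps when the size crosses a boundary, which is exactly what a single continuous LogGPS-style function cannot capture.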
SMPI – Hybrid Model
SMPI combines an accurate description of the platform with both fluid and LogP family models:
LogP: measure on real nodes to accurately model pt2pt performance (discontinuities) and communication modes (asynchronous, detached, synchronous)
[Figure: timing diagrams between sender Ps and receiver Pr for the three communication modes, built from segments T1 to T4.]
Fluid model (bandwidth sharing): account for contention and network topology
[Figure: cluster topology. Node groups 1-39, 40-74, 75-104 and 105-144 connected through 1G and 10G up/down links and limiters (1.5G, 13G).]
10 / 30
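One common way to realize bandwidth sharing in a fluid model is max-min fairness, computed by progressive filling: repeatedly find the most constrained link, give its flows an equal share, and remove them. The sketch below shows that textbook algorithm on a toy network; it is an assumption made for illustration and not the exact solver used in SimGrid. Every flow is assumed to cross at least one link.

```python
def max_min_share(capacity, routes):
    """Max-min fair rates by progressive filling.

    capacity: dict link -> bandwidth.
    routes:   dict flow -> list of links crossed (at least one per flow).
    Returns a dict flow -> allocated rate.
    """
    remaining = dict(capacity)
    active = {flow: set(links) for flow, links in routes.items()}
    rate = {}
    while active:
        # Equal share each link could still offer to its active flows.
        share = {}
        for link, cap in remaining.items():
            n = sum(1 for links in active.values() if link in links)
            if n:
                share[link] = cap / n
        bottleneck = min(share, key=share.get)
        r = share[bottleneck]
        # Flows crossing the bottleneck are frozen at that share.
        for flow in [f for f, links in active.items() if bottleneck in links]:
            rate[flow] = r
            for link in active[flow]:
                remaining[link] -= r
            del active[flow]
    return rate

# Toy example: two links, three flows; f2 crosses both links.
rates = max_min_share({"A": 10.0, "B": 4.0},
                      {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]})
```

Here link B is the bottleneck, so f2 and f3 get 2 each and f1 can use the remaining 8 on link A, which is how contention on one link changes the rate seen on another.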
Collective Communications
Classical approaches:
use simple analytical formulas
benchmark everything and inject corresponding timing
trace communication pattern and replay
Real MPI implementations provide several algorithms for each collective and select the right one at runtime
2300 lines of code for the AllReduce in OpenMPI!!!
SMPI now uses
more than 100 collective algorithms from three existing implementations (MPICH, OpenMPI, STAR-MPI), which can be selected
the same selection logic as MPICH or OpenMPI to accurately simulate their behavior
Such accurate modeling is actually critical to obtain decent predictions
11 / 30
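To see why the choice of collective algorithm matters, consider two textbook allreduce algorithms under a simple latency-bandwidth cost model (per-message latency `alpha`, per-byte time `beta`, reduction cost ignored). The selector below picks the cheaper one. It is a deliberately simplified illustration: the real selection logic of MPICH or OpenMPI relies on tuned thresholds and many more algorithms, and the default `alpha` and `beta` are assumptions derived from the 24 us and 125 MBps of the platform example.

```python
import math

def recursive_doubling_cost(p, n, alpha, beta):
    """log2(p) rounds, each exchanging the full n-byte buffer."""
    return math.ceil(math.log2(p)) * (alpha + n * beta)

def ring_cost(p, n, alpha, beta):
    """Reduce-scatter then allgather on a ring: 2(p-1) steps of n/p bytes."""
    return 2 * (p - 1) * alpha + 2 * (p - 1) / p * n * beta

ALGORITHMS = {
    "recursive_doubling": recursive_doubling_cost,
    "ring": ring_cost,
}

def select_allreduce(p, n, alpha=24e-6, beta=1 / 125e6):
    """Return the name of the cheapest algorithm for p processes and n bytes."""
    return min(ALGORITHMS, key=lambda name: ALGORITHMS[name](p, n, alpha, beta))
```

Small messages are latency-bound and favor recursive doubling, while large messages are bandwidth-bound and favor the ring, so a simulator that assumes a single algorithm can be wrong by a large factor on one side or the other.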
Validation: Non-trivial Application Scaling (1)
Experiments run with several NAS parallel benchmarks to (in)validate the model for TCP platform
Non-trivial scaling
Very good accuracy (especially compared to LogP)
unless contention drives TCP into a crazy state...
[Figure: speedup of NAS CG (classes B and C) versus number of nodes (8, 16, 32, 64, 128), comparing real executions with the LogGPS and SMPI models. An annotation marks a network collapse.]
Massive switch packet drops lead to 200ms timeouts in TCP!
This is a software issue that needs to be fixed (not modeled) in reality
12 / 30
Validation: Non-trivial Application Scaling (2)
Experiments also run using real Physics code (BigDFT, SPECFEM3D) on Tibidabo (ARM cluster prototype)
The set of collective operations may completely change depending on the instance, hence the need to use online simulation
Very good accuracy (especially compared to LogP)
[Figure: speedup on Tibidabo (Small instance) versus number of nodes (8, 16, 32, 64, 128), comparing real executions with the Hybrid, LogGPS and Fluid models.]
13 / 30
Remember the Complex Applications?
Larger and larger scale hybrid machines
Different programming approaches (e.g., in linear algebra applications)
"Rigid, hand tuned": SuperLU
Task-based and dynamic: MUMPS
Analysis and Comparison of Two Distributed Memory Sparse Solvers. Amestoy, Duff, L'Excellent, Li. ACM Trans. on Math. Software, Vol. 27, No. 4, 2001.
Multitude of technologies
MPI OpenMP Cilk CUDA OpenCL Charm++
KAAPI StarPU QUARK ParSEC OmpSs ...
need for a portable and efficient way to exploit such architectures
14 / 30
StarPU and SimGrid
StarPU (Inria Bordeaux)
Dynamic runtime for hybrid architectures (CPU, GPU, MPI)
Opportunistic scheduling of a task graph guided by resource performance models
Features both dense and sparse applications. FMM ongoing.
SimGrid
Scalable simulation framework for distributed systems
Sound fluid network models accounting for heterogeneity and contention
Modeling with threads rather than only trace replay: ability to simulate dynamic applications
Portable, open source and easily extendable
StarPU was ported on top of SimGrid by S. Thibault in 1 day:
Replace synchronization and thread creation with SimGrid's equivalents
Very crude platform model
The same approach should be applicable to any task-based runtime
15 / 30
Envisioned Workflow: StarPU+SimGrid
[Figure: workflow in two phases. Calibration: run StarPU once to obtain a performance profile. Simulation: StarPU on top of SimGrid then quickly simulates many times.]
16 / 30
Implementation Principles
Emulation: executing real applications in a synthetic environment, generally slowing down the whole code
Simulation: using a performance model to determine how much time a process should wait
StarPU applications and runtime are emulated (real scheduler and dynamic decisions guided by the StarPU calibration)
All operations related to thread synchronization, actual computations, memory allocation and data transfer are simulated (need for a good kernel and communication model) and faked
Actual computation results are irrelevant and have no impact on the control flow. Only time matters
In SimGrid, all threads run in mutual exclusion (polling)
The control part of StarPU is modified to dynamically inject computation and communication tasks into the simulator
17 / 30
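The split between an emulated scheduler and simulated kernels can be sketched in a few lines: the scheduling decision is really taken, but no kernel is executed; a performance model only says how far the virtual clock of the chosen worker advances. The greedy policy and the `MODEL` coefficients below are stand-ins invented for this sketch; StarPU's schedulers and calibrated models are considerably more elaborate.

```python
import heapq

def simulate(tasks, n_workers, model):
    """Greedy list scheduling in virtual time.

    tasks: iterable of (name, kernel, size), in submission order.
    model: dict kernel -> function(size) returning a predicted duration.
    Returns (makespan, schedule) with schedule entries (name, worker, start, end).
    """
    workers = [(0.0, w) for w in range(n_workers)]  # (time the worker is free, id)
    heapq.heapify(workers)
    schedule = []
    for name, kernel, size in tasks:
        free_at, w = heapq.heappop(workers)   # real decision: earliest free worker
        end = free_at + model[kernel](size)   # simulated: the kernel is never run
        schedule.append((name, w, free_at, end))
        heapq.heappush(workers, (end, w))
    makespan = max((end for *_, end in schedule), default=0.0)
    return makespan, schedule

# Hypothetical performance models (seconds as a function of tile size).
MODEL = {
    "gemm": lambda n: 2.0 * n**3 / 1e9,
    "potrf": lambda n: n**3 / 3.0 / 1e9,
}
```

Since only time matters, a run over thousands of tasks costs a few heap operations per task, which is what makes it possible to simulate many times after a single calibration.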
Dense Linear Algebra Applications
Started with regular dense kernels and a fixed tile size
Used two different matrix decomposition algorithms:
1 Cholesky
2 LU
Used a wide diversity of machines
Name        Processor  #Cores  Memory    GPUs
hannibal    X5550      2×4     2×24GB    3×QuadroFX5800
attila      X5650      2×6     2×24GB    3×TeslaC2050
mirage      X5650      2×6     2×18GB    3×TeslaM2070
conan       E5-2650    2×8     2×32GB    3×TeslaM2075
frogkepler  E5-2670    2×8     2×16GB    2×K20
pilipili2   E5-2630    2×6     2×32GB    2×K40
idgraf      X5650      2×6     2×36GB    8×TeslaC2050
idchire     E5-4640    24×8    24×31GB   /
Table: Machines used for the dense linear algebra experiments.
18 / 30
The Path to Reliable Predictions
[Figure: GFlop/s versus matrix dimension (20K to 80K) for Cholesky on Conan and LU on Attila, comparing Native with SimGrid (initial) and SimGrid (improved).]
[Figure: GFlop/s versus matrix dimension (20K to 80K) for LU on Hannibal (QuadroFX5800), comparing Native with SimGrid (initial), SimGrid (improved) and SimGrid (smart).]
Getting excellent results (e.g., Cholesky on Conan) sometimes does not require much effort
But modeling communication heterogeneity, contention, memory operations (and sometimes even hardware/driver peculiarities) is essential
Try to be as exhaustive as possible...
19 / 30
Overview of Simulation Accuracy
[Figure: GFlop/s (0 to 4000) versus matrix dimension (20K to 80K) for Cholesky and LU, SimGrid versus Native, on hannibal (3 QuadroFX5800), attila (3 TeslaC2050) and mirage (3 TeslaM2070).]
Checking predictive capability of the simulation
[Figure: same comparison on conan (3 TeslaM2075), frogkepler (2 K20) and pilipili2 (2 K40).]
20 / 30
Beyond Simple Graphs
Comparing Different Schedulers
[Figure: Cholesky on Attila. GFlop/s (0 to 1500) versus matrix dimension (20K to 80K) for the DMDA, DMDAR and DMDAS schedulers, SimGrid versus Native.]
Investigating Details
21 / 30
Simulating Sparse Solvers
qrm_starpu
QR MUMPS multi-frontal factorization on top of StarPU
Tree parallelism: nodes in separate branches can be treated independently
Node parallelism: large nodes can be treated by multiple processes
[Figure: elimination tree with the tasks Activate, Assemble, Panel, Update and Deactivate.]
No GPU support (ongoing) in this study, only multi-core
Porting qrm_starpu on top of SimGrid
Changing main into a subroutine
Changing compilation process
Careful kernel modeling as matrix dimension keeps changing
22 / 30
Example for Modeling Kernels: GEQRT
GEQRT (Panel) duration:
$T_{GEQRT} = a + 2b\,(NB^2 \times MB) - 2c\,(NB^3 \times BK) + \frac{4d}{3}\,NB^3$
We can do a linear regression based on ad hoc calibration
GEQRT Duration
Term            Estimate (interval)
$NB^3$          $1.50 \times 10^{-5}$ ($1.30 \times 10^{-5}$, $1.70 \times 10^{-5}$) ***
$NB^2 * MB$     $5.49 \times 10^{-7}$ ($5.46 \times 10^{-7}$, $5.51 \times 10^{-7}$) ***
$NB^3 * BK$     $-5.52 \times 10^{-7}$ ($-5.57 \times 10^{-7}$, $-5.48 \times 10^{-7}$) ***
Constant        $-2.49 \times 10^{1}$ ($-2.83 \times 10^{1}$, $-2.14 \times 10^{1}$) ***
Observations    493
$R^2$           0.999
Note: *p<0.1; **p<0.05; ***p<0.01
23 / 30
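The calibration step amounts to an ordinary least-squares fit of the kernel duration against the terms $NB^3$, $NB^2 \times MB$ and $NB^3 \times BK$ plus a constant. The pure-Python sketch below fits such a model on synthetic, noise-free samples generated from the coefficients of the table above; the sample grid is invented, and a real calibration would use measured durations and a statistics package.

```python
from itertools import product

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def features(nb, mb, bk):
    return [1.0, nb**3, nb**2 * mb, nb**3 * bk]

def fit(samples):
    """samples: list of (NB, MB, BK, duration).

    Returns [constant, coef(NB^3), coef(NB^2*MB), coef(NB^3*BK)].
    """
    rows = [features(nb, mb, bk) for nb, mb, bk, _ in samples]
    y = [t for *_, t in samples]
    # Scale each column to keep the normal equations well conditioned.
    scale = [max(abs(r[j]) for r in rows) for j in range(4)]
    xs = [[r[j] / scale[j] for j in range(4)] for r in rows]
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * t for r, t in zip(xs, y)) for i in range(4)]
    beta = solve(xtx, xty)
    return [beta[j] / scale[j] for j in range(4)]

def predict(coefs, nb, mb, bk):
    return sum(c * f for c, f in zip(coefs, features(nb, mb, bk)))

# Synthetic calibration data generated from the coefficients of the table.
TRUE = [-2.49e1, 1.50e-5, 5.49e-7, -5.52e-7]
SAMPLES = [(nb, mb, bk, predict(TRUE, nb, mb, bk))
           for nb, mb, bk in product([100, 150, 200, 250],
                                     [500, 1000, 2000],
                                     [1, 2, 4, 8])]
coefs = fit(SAMPLES)
```

Once fitted, `predict` is what the simulator would call instead of running the kernel, which is why the fit has to remain accurate as the block dimensions change from one front to the next.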
Comparing Kernel Duration Distributions
        Do_subtree  INIT       GEQRT  GEMQRT  ASM
1.      #Flops      #Zeros     NB     NB      #Coeff
2.      #Nodes      #Assemble  MB     MB      /
3.      /           /          BK     BK      /
$R^2$   0.99        0.99       0.99   0.99    0.86
[Figure: histograms of kernel duration [ms] (number of occurrences) for the Do_subtree, INIT, GEQRT, GEMQRT, ASM and CLEAN kernels, Native (top row) versus SimGrid (bottom row).]
24 / 30
Overview of Simulation Accuracy
[Figure: makespan [s], Native versus SimGrid, for the matrices tp-6, karted, EternityII_E, degme, hirlam, e18, TF16, Rucci1, sls and TF17 on three machines: Fourmi with 8 cores, Riri with 10 cores and Riri with 40 cores.]
Results in a nutshell
Most of the time, simulation is slightly optimistic
With bigger and architecturally more complex machines, error increases
25 / 30
Visual Trace Comparison
Traces extremely "close"
Simulated makespan slightly optimistic
Different scheduling at the beginning and at the end
26 / 30
Studying Memory Consumption
Minimizing memory footprint is very important for such applications
Remember that scheduling is dynamic, so consecutive Native experiments produce different outputs
[Figure: allocated memory [GiB] (0 to 3) versus time [ms] (0 to 40,000) for four runs, labeled Experiment number 1 to 4.]
27 / 30
[Figure: the same four memory consumption panels, now labeled Native 1, SimGrid, Native 2 and Native 3.]
27 / 30
Extrapolating to Larger Machines
Predicting performance in idealized context
Studying the parallelization limits of the problem
[Figure: extrapolation. Duration [s] (0 to 90) versus number of threads (4, 10, 20, 40, 100, 400), Native versus SimGrid, showing the overall makespan and the idle time per thread.]
Regarding simulation speed:
Quickly and accurately evaluate the impact of various scheduling/application parameters:
qrm_starpu  Cores  RAM      Evaluation  Makespan
Native      40     58.0GiB  157s        141s
SimGrid     1      1.5GiB   57s         142s
28 / 30
Conclusion
We now have accurate baselines to compare with: whenever there is a mismatch, we can question the simulation as well as the experimental setup:
TCP RTO issue
Inaccurate platform specifications
Flawed MPI optimization
Hope it will be useful to
the Mont-Blanc project
BigDFT or Ondes3D developers
you? ...
Need to validate this approach on larger platforms, with other network types and topologies (e.g., Infiniband, torus)
Communication through shared memory is ok, but modeling the interference between memory-bound kernels is really hard
SMPI and StarPU/SimGrid are open source/science: we put a lot of effort into making them usable
http://simgrid.gforge.inria.fr
29 / 30
Ongoing Work
⊠ Seamless emulation (mmap approach works great)
⊟ Modeling IB networks, torus/fat tree topologies
⊟ Modeling energy (with A.C. Orgerie)
⊠ Runtimes for hybrid (CPU+GPU) platforms (StarPU, with S. Thibault) on top of SimGrid
  Works great for both dense (MAGMA/MORSE) and sparse (QR-MUMPS) linear algebra. Ongoing work for FMM.
  Large scale hybrid platforms (StarPU-MPI)
⊟ Formal verification by exhaustive state space exploration
⊟ Computation/communication/memory access interferences
□ OpenMP/pthreads
⊟ IOs
30 / 30