dilemma of parallel programming

Dilemma of Parallel Programming. Xinhua Lin (林新华), HPC Lab of SJTU. @XJTU, 17th Oct 2011


DESCRIPTION

Dilemma of Parallel Programming. Xinhua Lin (林新华), HPC Lab of SJTU, @XJTU, 17th Oct 2011. Disclaimers: I am not funded by Cray. Slides marked with the Chapel logo are taken from Brad Chamberlain’s talk ‘The Mother of All Chapel Talks’, with his permission.

TRANSCRIPT

Page 1: Dilemma of Parallel Programming

Dilemma of Parallel Programming

Xinhua Lin (林新华 ) HPC Lab of SJTU

@XJTU, 17th Oct 2011

Page 2: Dilemma of Parallel Programming

Disclaimers

• I am not funded by Cray

• Slides marked with the Chapel logo are taken from Brad Chamberlain’s talk ‘The Mother of All Chapel Talks’, with his permission

• Funny pictures are from the Internet

Page 3: Dilemma of Parallel Programming

About me and HPC Lab in SJTU

• Directing HPC Lab
• Co-translator of PPP
• Co-founder of HMPP CoC for AP & Japan

• An MS HPC invitation institute @SH
• Supports the HPC Center of SJTU
• Holds the SJTU HPC Seminar monthly

http://itis.grid.sjtu.edu.cn/blog

Page 4: Dilemma of Parallel Programming

Three Challenges for ParaProg in the multi/many-core era

• Revolution vs. Evolution

• Low level vs. High level
– Performance vs. Programmability

• Performance vs. Performance Portability

For more detail:
Paper version: China Education Network (中国教育网络), special issue on HPC and Cloud, Sep 2011
Online version: http://itis.grid.sjtu.edu.cn/blog

Page 5: Dilemma of Parallel Programming

Outline

• Right level to expose parallelism

• Review of ParaProg languages

• Multiresolution and Chapel

Page 6: Dilemma of Parallel Programming

Right Level to Expose Parallelism

Page 7: Dilemma of Parallel Programming

Can we stop water/parallelism?

Hardware

ISA

OS

Library

Language

Page 8: Dilemma of Parallel Programming

Performance vs. Programmability

Low level: expose implementing mechanisms
– MPI, OpenMP, pthreads, sitting directly on the target machine
– “Why is everything so tedious?”

High level: higher-level abstractions
– ZPL, HPF, sitting above the target machine
– “Why don’t I have more control?”

Page 9: Dilemma of Parallel Programming

ParaProg Education

• Tired of teaching yet another specific language
– MPI for clusters
– OpenMP for SMP, then multi-core CPUs
– CUDA for GPUs, and now OpenCL
– More on the way…

• Concepts have to be explained with different tools
– A single language to explain them all?

• Similar situation in OS education
– Production OSes: Linux, Unix and Windows
– OS for education only: Minix

Page 10: Dilemma of Parallel Programming

Review of ParaProg Languages

Page 11: Dilemma of Parallel Programming

Hybrid Programming Model

• MPI is insufficient in the multi/many-core era
– OpenMP for multi-core
– CUDA/OpenCL for many-core*

• So-called hybrid programming was invented as a temporary solution: workable but ugly
– MPI+OpenMP for multi-core clusters
– MPI+CUDA/OpenCL for GPU clusters like Tianhe-1A

• A similar two-level idea appears in CUDA (thread and thread block) and OpenCL (work-item and work-group)

* We will wait and see how OpenMP works on Intel MIC
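The two-level structure behind hybrid programming can be sketched in a few lines. The following is only a single-machine simulation in Python, not real MPI or OpenMP: the outer split stands in for the per-node (MPI) level and the inner thread pool stands in for the per-core (OpenMP or CUDA) level. The names `split`, `node_sum` and `hybrid_sum` are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Split a list into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def node_sum(chunk, threads_per_node):
    """Inner level (the OpenMP/CUDA role): threads share one node's chunk."""
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        return sum(pool.map(sum, split(chunk, threads_per_node)))


def hybrid_sum(data, num_nodes, threads_per_node):
    """Outer level (the MPI role): one chunk per simulated node, then a reduction."""
    partials = [node_sum(chunk, threads_per_node)
                for chunk in split(data, num_nodes)]
    return sum(partials)
```

Even in this toy form the programmer has to manage two decompositions and two reductions, which hints at why the slide calls the approach workable but ugly.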

Page 12: Dilemma of Parallel Programming

ParaProg from different angles

• Low level (expose implementation mechanisms)
– MPI, CUDA and OpenCL
– OpenMP

• High level
– PGAS: CAF, UPC and Titanium
– Global view: NESL, ZPL
– APGAS: Chapel, X10

• Directive based
– HMPP, PGI, Cray directives

Page 13: Dilemma of Parallel Programming

Multiresolution and Chapel

Page 14: Dilemma of Parallel Programming

What is Multiresolution?

Structure the language in a layered manner, permitting it to be used at multiple levels as required/desired
– support high-level features and automation for convenience
– provide the ability to drop down to lower, more manual levels
– use appropriate separation of concerns to keep these layers clean

[Figure: language concepts layered above the target machine: distributions, data parallelism, task parallelism, base language, locality control]

Page 15: Dilemma of Parallel Programming

Where Chapel was born: HPCS

HPCS: High Productivity Computing Systems (DARPA et al.)
– Goal: raise productivity of high-end computing users by 10×
– Productivity = Performance + Programmability + Portability + Robustness

• Phase II: Cray, IBM, Sun (July 2003 – June 2006)
– Evaluated the entire system architecture’s impact on productivity…
  • processors, memory, network, I/O, OS, runtime, compilers, tools, …
  • …and new languages: Cray: Chapel, IBM: X10, Sun: Fortress

• Phase III: Cray, IBM (July 2006 – 2010)
– Implement the systems and technologies resulting from Phase II
– (Sun also continues work on Fortress, without HPCS funding)

Page 16: Dilemma of Parallel Programming

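Global-view vs. Fragmented

Problem: “Apply 3-pt stencil to vector”

[Figure: in the global-view version, a single operation over the whole vector computes ( left + right )/2; in the fragmented version, the vector is split into per-processor chunks and each chunk computes its own ( left + right )/2]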

Page 17: Dilemma of Parallel Programming

Global-view vs. SPMD Code

Global-view:

def main() {
  var n: int = 1000;
  var a, b: [1..n] real;

  // one loop over the whole vector
  forall i in 2..n-1 {
    b(i) = (a(i-1) + a(i+1))/2;
  }
}

SPMD:

def main() {
  var n: int = 1000;
  var locN: int = n/numProcs;
  var a, b: [0..locN+1] real;   // local chunk plus two ghost cells

  // exchange boundary values with the neighbors
  if (iHaveRightNeighbor) {
    send(right, a(locN));
    recv(right, a(locN+1));
  }
  if (iHaveLeftNeighbor) {
    send(left, a(1));
    recv(left, a(0));
  }

  // each process updates only its own chunk
  forall i in 1..locN {
    b(i) = (a(i-1) + a(i+1))/2;
  }
}
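To make the contrast concrete, here is a minimal single-process Python sketch that simulates both styles and lets them be compared. It is not Chapel and performs no real communication: the "send/recv" step is just a copy into the neighbor's ghost cell. The function names are hypothetical, and the sketch assumes the vector length divides evenly by the number of simulated processes. Unlike the slide's SPMD code, it also skips the two global end points explicitly so that both versions touch the same elements.

```python
def stencil_global(a):
    """Global view: one loop over the whole vector (interior points only)."""
    n = len(a)
    b = [0.0] * n
    for i in range(1, n - 1):
        b[i] = (a[i - 1] + a[i + 1]) / 2
    return b


def stencil_fragmented(a, num_procs):
    """Fragmented view: each simulated process owns a chunk plus two ghost cells."""
    n = len(a)
    assert n % num_procs == 0, "sketch assumes n divides evenly"
    loc_n = n // num_procs
    # Local arrays are indexed 0..loc_n+1; slots 0 and loc_n+1 are ghost cells.
    local_a = [[0.0] + a[p * loc_n:(p + 1) * loc_n] + [0.0]
               for p in range(num_procs)]
    # Simulated communication: fill ghost cells from the neighboring chunk.
    for p in range(num_procs - 1):
        local_a[p + 1][0] = local_a[p][loc_n]      # p sends its last value right
        local_a[p][loc_n + 1] = local_a[p + 1][1]  # p+1 sends its first value left
    # Local computation, written back to a global array for comparison.
    b = [0.0] * n
    for p in range(num_procs):
        for i in range(1, loc_n + 1):
            g = p * loc_n + (i - 1)  # global index of local element i
            if 0 < g < n - 1:        # skip the two global end points
                b[g] = (local_a[p][i - 1] + local_a[p][i + 1]) / 2
    return b
```

For example, `stencil_global([0.0, 1.0, 4.0, 9.0])` returns `[0.0, 2.0, 5.0, 0.0]`, and `stencil_fragmented` gives the same result for any process count that divides the length. The index bookkeeping in the fragmented version is exactly the detail the global-view code hides.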

Page 18: Dilemma of Parallel Programming

Chapel Overview

• A design principle for HPC
– “Support the general case, optimize for the common case”

• Data parallel (ZPL) + task parallel (Cray MTA) + scripting language

• The latest version, 1.3.0, is available as open-source software:
– http://sourceforge.net/projects/chapel

[Figure: language concepts layered above the target machine: distributions, data parallelism, task parallelism, base language, locality control]

Page 19: Dilemma of Parallel Programming

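Chapel example: Heat Transfer

[Figure: an n × n array A with a boundary held at 1.0; each element is replaced by the average of its 4 neighbors; repeat until max change < ε]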
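The iteration on this slide reads as a Jacobi-style relaxation: replace every interior element by the average of its four neighbors and repeat until the largest change drops below a threshold. A minimal sequential Python sketch follows; it is not the Chapel code from the talk. The function name `jacobi_heat`, the parameter name `epsilon`, its default value, and the choice of which boundary row is held at 1.0 are all assumptions made for illustration.

```python
def jacobi_heat(n, epsilon=1e-5):
    """Relax an n x n interior until the largest update is below epsilon.

    Returns the (n+2) x (n+2) grid, including its boundary ring, and the
    number of sweeps performed.
    """
    # Interior cells are 1..n in each dimension; index 0 and n+1 are boundary.
    a = [[0.0] * (n + 2) for _ in range(n + 2)]
    for j in range(1, n + 1):
        a[n + 1][j] = 1.0  # assumed: the last boundary row is the hot one

    sweeps = 0
    while True:
        temp = [row[:] for row in a]  # boundary values carry over unchanged
        delta = 0.0
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                temp[i][j] = (a[i - 1][j] + a[i + 1][j]
                              + a[i][j - 1] + a[i][j + 1]) / 4
                delta = max(delta, abs(temp[i][j] - a[i][j]))
        a = temp
        sweeps += 1
        if delta < epsilon:
            return a, sweeps
```

Calling `grid, sweeps = jacobi_heat(4)` returns a grid whose interior values lie between 0 and 1 and grow toward the hot row. Both nested loops are independent across (i, j) within one sweep, which is what makes the computation a natural fit for a data-parallel `forall`.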

Page 20: Dilemma of Parallel Programming

Chapel Code For Heat Transfer

Page 21: Dilemma of Parallel Programming

Chapel as Minix in ParaProg

• If I were to offer a ParaProg class, I’d want to teach about:
– data parallelism
– task parallelism
– concurrency
– synchronization
– locality/affinity
– deadlock, livelock, and other pitfalls
– performance tuning
– …

Page 22: Dilemma of Parallel Programming

Conclusion—Major Points

• Programmability versus performance is the perennial dilemma of ParaProg

• Multiresolution sounds perfect in theory but is not yet mature enough for production

• However, Chapel could be used as Minix in ParaProg

Page 23: Dilemma of Parallel Programming

Q&A