CEE 618 Scientific Parallel Computing (Lecture 4): Message-Passing Interface (MPI)
Albert S. Kim
Department of Civil and Environmental Engineering
University of Hawai‘i at Manoa
2540 Dole Street, Holmes 383, Honolulu, Hawaii 96822
Table of Contents
1 Cluster progress
2 Introduction to MPI (Message-Passing Interface)
3 Calculation of π using MPI: Basics, Wall Time, Broadcast, Barrier, Data Types, Reduce, Operation Types, Resources
Cluster progress
Outline
1 Cluster progress
2 Introduction to MPI (Message-Passing Interface)
3 Calculation of π using MPI: Basics, Wall Time, Broadcast, Barrier, Data Types, Reduce, Operation Types, Resources
Cluster progress
My first home-made cluster, UCLA 1997
1 CPU: Pentium II 450 MHz
2 Memory: 128 MB per PC
3 Network card: Netgear FX310, 100/10 Mbps Ethernet card
4 Switch: Netgear 8-port, 100/10 Mbps Ethernet switch
5 KVM sharing device: Belkin Omni Cube 4 port
Cluster progress
UH 2001, the second home-made cluster
1 Composed of 16 PCs sharing ONE keyboard, monitor, and mouse
2 Red Hat Linux 7.2 installed (free)
3 Connected to a private network by a data switch (192.168.0.0 – 192.168.255.255)
4 More than 30 times faster than a Pentium 1.0 GHz system
Cluster progress
UH 2007, the third from Dell
1 Linux Cluster from Dell Inc. under support from NSF.
2 Initially 16 nodes, 2 Intel(R) Xeon(TM) CPU 2.80GHz, and 2 GB memory per core
3 Queuing system: Platform Lava → LSF, Platform Computing Inc.
4 Programming Language: Intel FORTRAN 77/90 and Intel C/C++
5 Libraries: BLAS, ATLAS, GotoBLAS, BLACS, LAPACK, ScaLAPACK, OPENMPI-1.2.8 (http://www.open-mpi.org/)
Cluster progress
1 GNU-4.1.2 & Intel-11.1 compilers, OpenMPI-1.4.1, PBSPro-10.2
2 Host name: jaws.mhpcc.hawaii.edu
3 IP addresses: 132.160.97.245 & 132.160.97.246
Cluster progress
UH 2013, the system updated
1 The second rack was added.
2 Additional 3 nodes, 8 Intel(R) Xeon(R) CPU E5345 2.33GHz per node, and 2 GB memory per core
3 Currently a total of 56 cores with 2 GB memory each.
4 Queuing system: PBS (Portable Batch System), Torque
5 Programming Language: Intel FORTRAN and Intel C/C++ (version 13.1.0)
6 Libraries: OPENMPI-1.6.1
Introduction to MPI (Message-Passing Interface)
Outline
1 Cluster progress
2 Introduction to MPI (Message-Passing Interface)
3 Calculation of π using MPI: Basics, Wall Time, Broadcast, Barrier, Data Types, Reduce, Operation Types, Resources
Introduction to MPI (Message-Passing Interface)
What is MPI?
MESSAGE-PASSING INTERFACE
1 A program library, NOT a language.
2 Called from FORTRAN 77/90, C/C++, and Python (and Java).
3 Most widely used parallel library, but NOT a revolutionary way for parallel computation.
4 A collection of the best features of (many) existing message-passing systems.
Introduction to MPI (Message-Passing Interface)
How MPI works
Figure: Distributed memory system using 4 nodes.
Suppose we have a cluster composed of four computers: alpha, beta, gamma, and delta (each computer has one core). Usually, the first computer (alpha) is the master machine (and file server), e.g., fractal. You have an MPI code, mympi.f90, in your working directory on alpha.
1 Compile the code¹: mpif90 mympi.f90 -o mympi.x
2 Run it using 4 nodes²: mpirun -np 4 mympi.x
¹ “mpif90” is generated when MPI is installed with specific compilers.
² We don’t execute mpirun directly. In practice, we will use a Makefile (and later the ‘qsub’ command).
Introduction to MPI (Message-Passing Interface)
TCP/IP communication through ssh
MPI uses a default machine file that contains (for example)
172.20.0.100 → compute-01-00’s IP (alpha)
172.20.0.101 → compute-01-01’s IP (beta)
172.20.0.102 → compute-01-02’s IP (gamma)
172.20.0.103 → compute-01-03’s IP (delta)
Basically, when an MPI job is submitted using mpirun, this machine file is read, and node numbers (ranks) are automatically assigned in a sequence (not always ordered).
172.20.0.100 → 0
172.20.0.101 → 1
172.20.0.102 → 2
172.20.0.103 → 3
In most cases, including ours, a queueing system (PBS, LSF, or others) takes care of job allocation to optimize computational resources.
Inter-node communication was through rsh (remote shell) in the past, but is now ssh (secure shell) (rsh + data encryption).
Introduction to MPI (Message-Passing Interface)
Basic Structure of MPI programs
program mympi
  implicit none
  include 'mpif.h'
  integer :: numprocs, rank, ierr, rc, RESULTLEN
  character(len=20) :: PNAME

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
     print *, 'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

  write (*, "('No. of procs = ',1X,i4,1x,', My rank = ',1X,i4,', pname = ',A20)") &
       numprocs, rank, PNAME

  call MPI_FINALIZE(ierr)
end
How many “MPI” calls?
5
Introduction to MPI (Message-Passing Interface)
Makefile
mpif90=/opt/openmpi-1.6.1-intel/bin/mpif90 -limf
mpirun=/opt/openmpi-1.6.1-intel/bin/mpirun
mpiopt=-mca btl tcp,self --mca btl_tcp_if_include eth0

srcroot=mympi
srcfile=$(srcroot).f90
exefile=$(srcroot).x
numprocs=6

all:
	$(mpif90) $(srcfile) -o $(exefile)

srun:
	$(mpirun) -mca btl tcp,self -np 1 ./$(exefile)

prun:
	$(mpirun) $(mpiopt) --hostfile ./mpihosts -np $(numprocs) ./$(exefile)

edit:
	vim $(srcfile)
Introduction to MPI (Message-Passing Interface)
PBS
We will discuss how to use Open MPI with PBS later.
Introduction to MPI (Message-Passing Interface)
Job output file
No. of procs = 6, My rank = 0, pname = compute-1-01
No. of procs = 6, My rank = 2, pname = compute-1-02
No. of procs = 6, My rank = 4, pname = compute-1-03
No. of procs = 6, My rank = 5, pname = compute-1-03
No. of procs = 6, My rank = 3, pname = compute-1-02
No. of procs = 6, My rank = 1, pname = compute-1-01
Note that ranks are not well ordered.
Introduction to MPI (Message-Passing Interface)
1. call MPI_INIT (ierr)
call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) then
   print *, 'Error starting MPI program. Terminating.'
   call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
end if

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Initializes the MPI execution environment. This function must be called in every MPI program, must be called before any other MPI functions, and must be called only once in an MPI program.
ierr = error code, which must be either MPI_SUCCESS (=0) or an implementation-defined error code.
Where is MPI_SUCCESS (=0) set? [Ans.] mpi.h
Introduction to MPI (Message-Passing Interface)
2. call MPI_ABORT (MPI_COMM_WORLD, rc, ierr)
call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) then
   print *, 'Error starting MPI program. Terminating.'
   call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
end if

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Terminates (aborts) all MPI processes associated with the communicator.
In most MPI implementations it terminates ALL processes regardless of the communicator specified.
MPI_COMM_WORLD = default communicator; it defines one context and the set of all processes, and is one of the items defined in ‘mpi.h’.
rc = error code
Introduction to MPI (Message-Passing Interface)
3. call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) then
   print *, 'Error starting MPI program. Terminating.'
   call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
end if

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Determines the rank of the calling process within the communicator. whoami?
Initially, each process will be assigned a unique integer rank between 0 and (number of processors − 1) (i.e., numprocs-1), within the communicator MPI_COMM_WORLD.
This rank is often referred to as a task ID.
If 8 processors are used, then ranks are (?)
0, 1, 2, 3, 4, 5, 6, and 7.
Introduction to MPI (Message-Passing Interface)
4. call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) then
   print *, 'Error starting MPI program. Terminating.'
   call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
end if

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Determines (or obtains) the number of processes in the group associated with a communicator.
Generally used within the communicator MPI_COMM_WORLD to determine the number of processes (numprocs) being used by your own application.
It matches the number after “-np” in
  mpirun -np 4 mympi.x
where “4” is read as “numprocs” by MPI_COMM_SIZE.
Introduction to MPI (Message-Passing Interface)
5. call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)
call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) then
   print *, 'Error starting MPI program. Terminating.'
   call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
end if

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Returns the name of the local processor at the time of the call, i.e., PNAME.
‘RESULTLEN’ is the character length of PNAME.
Introduction to MPI (Message-Passing Interface)
6. call MPI_FINALIZE (ierr)
write (*, "('No. of procs = ',1X,i4,1x,', My rank = ',1X,i4,', pname = ',A20)") &
     numprocs, rank, pname

call MPI_FINALIZE(ierr)
end

Terminates the MPI execution environment.
This function should be the last MPI routine called in every MPI program; NO other MPI routines may be called after it.
Note that all MPI programs
  start with MPI_INIT (ierr),
  do something in-between, and
  end with MPI_FINALIZE (ierr).
Question: Who is/are doing the “write” above? Ans. all 4 processes
Introduction to MPI (Message-Passing Interface)
Summary: rank and the number of processors
If the number of processors is Nprocs, then the Nprocs processors will have ranks of 0, 1, 2, ..., Nprocs − 2, and Nprocs − 1.
For example, if 6 processors are used for a parallel MPI calculation, then the ranks of the processors will be 0, 1, 2, 3, 4, and 5.
Introduction to MPI (Message-Passing Interface)
Random Quiz
What MPI routines are required to create a minimal parallel code? And how many?
ANS: MPI_INIT and MPI_FINALIZE; two in total.
Calculation of π using MPI
Outline
1 Cluster progress
2 Introduction to MPI (Message-Passing Interface)
3 Calculation of π using MPI: Basics, Wall Time, Broadcast, Barrier, Data Types, Reduce, Operation Types, Resources
Calculation of π using MPI Basics
Mathematical Identity
Numerically integrate

g(x) = 4 / (1 + x²)   (1)

from x = 0 to 1. Here, g(x) is f(x) in the reference book³.

∫ from x=0 to x=1 of 4/(1 + x²) dx = 4 [tan⁻¹(x)] from x=0 to x=1 = 4 (tan⁻¹(1) − tan⁻¹(0)) = 4 (π/4 − 0) = π   (2)

Make your own derivation, substituting x = tan y!
³ Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk, and Anthony Skjellum, The MIT Press, pages 21–26.
Calculation of π using MPI Basics
Integration Scheme
Figure: Integrating to find the value of π with n = 5, where n is the number of points (or rectangles) for the integration.
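The rectangle (midpoint) rule in the figure can be checked serially before parallelizing it. The lecture's code is Fortran; the following is an illustrative Python sketch (the function name approximate_pi is my own, not from the lecture):

```python
# Midpoint-rule integration of g(x) = 4/(1 + x^2) over [0, 1], which equals pi.
# Serial sketch in Python; the lecture's fpi.f90 does the same sum in Fortran.

def approximate_pi(n):
    """Midpoint rule with n rectangles of width h = 1/n."""
    h = 1.0 / n
    total = 0.0
    for i in range(1, n + 1):
        x = h * (i - 0.5)            # midpoint of the i-th rectangle
        total += 4.0 / (1.0 + x * x)
    return h * total

print(approximate_pi(100_000))       # close to 3.141592653...
```

The error of the midpoint rule shrinks like h², so n = 100,000 already agrees with π to about 10 decimal digits.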
Calculation of π using MPI Basics
The number of integration points and the number of processes (= cores)
1 If n = 100 and Nprocs = 4, then each process takes care of 25 points.
  Processor 0: n = 1–25
  Processor 1: n = 26–50
  Processor 2: n = 51–75
  Processor 3: n = 76–100
2 However, what if n/Nprocs is NOT an integer, for example, n = 100 and Nprocs = 6? You can try n/Nprocs or n/(Nprocs − 1). Then how do you handle remainders?
Calculation of π using MPI Basics
Optimum load balance
* If n/Nprocs is NOT an integer, one possible optimal way is jumping as many steps as the number of processes (i.e., 6):

Processor 0: n = 1, 7, 13, . . . , 91, 97
Processor 1: n = 2, 8, 14, . . . , 92, 98
Processor 2: n = 3, 9, 15, . . . , 93, 99
Processor 3: n = 4, 10, 16, . . . , 94, 100
Processor 4: n = 5, 11, 17, . . . , 95
Processor 5: n = 6, 12, 18, . . . , 96

This method is generally applicable without any restriction. The tasks assigned to the processes are almost identical in size.
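The cyclic ("jumping") assignment above is easy to express in code. A small Python sketch (illustrative only; the lecture's fpi.f90 does this with a Fortran do-loop stride):

```python
# Cyclic decomposition: rank r takes points r+1, r+1+P, r+1+2P, ...
# for P processes, exactly the table above for P = 6 and n = 100.

def cyclic_points(rank, nprocs, n):
    """Return the 1-based point indices owned by `rank`."""
    return list(range(rank + 1, n + 1, nprocs))

for r in range(6):
    pts = cyclic_points(r, 6, 100)
    print(r, pts[:3], "...", pts[-1], "count =", len(pts))
```

Every point from 1 to n is covered exactly once, and the per-rank counts differ by at most one.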
Calculation of π using MPI Basics
fpi.f90 – the first half
program main
  include 'mpif.h'
  double precision :: PI25DT
  parameter (PI25DT = 3.141592653589793238462643d0)
  double precision :: mypi, pi, h, sum, x, f, a
  double precision :: T1, T2
  integer :: n, myid, numprocs, i, rc
  ! function to integrate
  f(a) = 4.d0 / (1.d0 + a*a)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

  n = 100000000
  h = 1.0d0 / dble(n)
  T1 = MPI_WTIME()

  call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
Calculation of π using MPI Basics
INTEGER Types
1 INTEGER*2 (non-standard) (2 bytes = 16 bits): from −32768 to 32767; range 2¹⁶ ≈ 10^4.8
2 INTEGER*4 (4 bytes = 32 bits), MPI default: from −2147483648 to 2147483647; range 2³² ≈ 10^9.63 ≈ 10^10
3 INTEGER*8 (8 bytes = 64 bits): from −(2⁶⁴/2) to (2⁶⁴/2 − 1); range 2⁶⁴ ≈ 10^19.26 ≈ 10^20
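The two's-complement ranges quoted above can be verified quickly; a small Python check (illustrative, since Python integers are arbitrary precision and only model the fixed-width types):

```python
# Verify the signed ranges and decimal orders of magnitude for
# 16-, 32-, and 64-bit two's-complement integers.
import math

for bits in (16, 32, 64):
    lo, hi = -2**(bits - 1), 2**(bits - 1) - 1
    print(f"{bits}-bit: {lo} .. {hi}  (2^{bits} ~ 10^{math.log10(2**bits):.2f})")
```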
Calculation of π using MPI Wall Time
MPI_WTIME()
T1 = MPI_WTIME()
...
T2 = MPI_WTIME()
T2 - T1

1 Returns an elapsed wall-clock time in seconds (double precision) on the calling processor.
2 Time in seconds since an arbitrary time in the past, e.g., 1970.0.0.0.0.0.
3 Only the difference, T2 − T1, makes sense.
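The same pattern can be tried without MPI. In this sketch, Python's time.perf_counter() stands in for MPI_WTIME: it is also an arbitrary-origin clock where only differences are meaningful.

```python
# Wall-clock timing sketch: the absolute values of t1 and t2 are from an
# arbitrary origin; only the difference t2 - t1 means anything.
import time

t1 = time.perf_counter()
total = sum(i * i for i in range(100_000))   # some work to time
t2 = time.perf_counter()
print(f"elapsed = {t2 - t1:.6f} s")
```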
Calculation of π using MPI Broadcast
MPI_BCAST
call MPI_BCAST( n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr )
MPI_BCAST (buffer, count, datatype, root, comm, ierr)
broadcasts a message from the process with rank "root" to all other processes of the group.
1 buffer - name (or starting address) of the buffer, i.e., the variable (choice)
2 count - the number of entries in the buffer, i.e., how many (integer)
3 datatype - data type of the buffer, such as integer, double precision, ... (handle)
4 root - rank of the broadcast root, i.e., from whom? the broadcaster (integer)
5 comm - communicator (handle), i.e., a communication language; the default is MPI_COMM_WORLD
Note: This MPI_BCAST call is included only to explain how it works in the example code, which means π will be properly calculated without this call. This is because each process already knows the value of n.
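The semantics can be pictured with a toy simulation (plain Python, no MPI; the bcast function below is my own illustration, not an MPI binding): before the call only the root's buffer holds n, and afterwards every rank's buffer does.

```python
# Toy simulation of MPI_BCAST semantics: copy the root's buffer value
# into every rank's buffer. Real MPI does this with messages, not shared lists.

def bcast(buffers, root):
    """Copy the root's value into every rank's buffer and return the list."""
    for rank in range(len(buffers)):
        buffers[rank] = buffers[root]
    return buffers

buffers = [None] * 4        # 4 ranks; only rank 0 knows n before the call
buffers[0] = 100_000_000    # n on the root
print(bcast(buffers, root=0))   # → [100000000, 100000000, 100000000, 100000000]
```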
Calculation of π using MPI Broadcast
MPI_BCAST
call MPI_BCAST( n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr )MPI_BCAST (buffer, count, datatype, root, comm, ierr )broadcasts a message from the process with rank "root" toall other processes of the group.
1 buffer - name (or starting address) of buffer, i.e., variable (choice)
2 count - the number of entries in buffer, i.e., how many (integer)3 datatype - data type of buffer such as integer, double precision, ...
(handle)4 root - rank of broadcast root, i.e., from whom?, broadcaster (integer)5 comm - communicator (handle), i.e., a communication language,
default is MPI_COMM_WORLD
Note: This MPI_BCAST call is only to explain how it works in theexample code, which means π will be properly calculated withoutthis call. This is because each process already knows the value ofn.
33 / 48
Calculation of π using MPI Broadcast
MPI_BCAST
call MPI_BCAST( n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr )MPI_BCAST (buffer, count, datatype, root, comm, ierr )broadcasts a message from the process with rank "root" toall other processes of the group.
1 buffer - name (or starting address) of buffer, i.e., variable (choice)2 count - the number of entries in buffer, i.e., how many (integer)
3 datatype - data type of buffer such as integer, double precision, ...(handle)
4 root - rank of broadcast root, i.e., from whom?, broadcaster (integer)5 comm - communicator (handle), i.e., a communication language,
default is MPI_COMM_WORLD
Note: This MPI_BCAST call is only to explain how it works in theexample code, which means π will be properly calculated withoutthis call. This is because each process already knows the value ofn.
33 / 48
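The effect of MPI_BCAST can be sketched serially in Python. This is only an analogy, not real MPI: the `bcast` function and the `nbcast` dict of per-rank copies are hypothetical stand-ins for each process's private memory.

```python
# Serial analogy of MPI_BCAST: copy the root's value into every rank's
# private copy of the variable. Real MPI does this over the network.
def bcast(values, root):
    """values: dict mapping rank -> that rank's local copy of the variable."""
    for rank in values:
        values[rank] = values[root]
    return values

# As in the example code: every rank starts with -10, rank 0 sets 10.
nbcast = {rank: -10 for rank in range(6)}
nbcast[0] = 10
bcast(nbcast, root=0)
print(nbcast)  # every rank now sees the root's value, 10
```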
Calculation of π using MPI Broadcast
Example of MPI_BCAST
program broadcast
  implicit none
  include 'mpif.h'
  integer :: numprocs, rank, ierr, rc
  integer :: Nbcast

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
    print *, 'Error starting MPI program. Terminating.'
    call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

  Nbcast = -10
  print *, 'I am', rank, 'of', numprocs, 'and Nbcast =', Nbcast
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)

  if (rank == 0) Nbcast = 10
  call MPI_BCAST(Nbcast, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  print *, 'I am', rank, 'of', numprocs, 'and Nbcast =', Nbcast

  call MPI_FINALIZE(ierr)
end
An MPI program is executed by every process concurrently; the initial conditions may differ from process to process, i.e., be rank-specific.
34 / 48
Calculation of π using MPI Broadcast
Outcome of MPI_BCAST example
I am 0 of 6 and Nbcast = -10
I am 1 of 6 and Nbcast = -10
I am 2 of 6 and Nbcast = -10
I am 5 of 6 and Nbcast = -10
I am 3 of 6 and Nbcast = -10
I am 4 of 6 and Nbcast = -10
I am 0 of 6 and Nbcast = 10
I am 1 of 6 and Nbcast = 10
I am 2 of 6 and Nbcast = 10
I am 4 of 6 and Nbcast = 10
I am 5 of 6 and Nbcast = 10
I am 3 of 6 and Nbcast = 10
Note: Without the MPI_BARRIER call, the messages above would appear in a highly disordered sequence, because the processes compete with each other to print to stdout. Change the executable file name for each run.
35 / 48
Calculation of π using MPI Barrier
MPI_BARRIER
call MPI_BARRIER( MPI_COMM_WORLD, ierr )
MPI_BARRIER (comm, ierr)
creates a barrier synchronization within a group.
Each task, on reaching the MPI_Barrier call, blocks until all tasks in the group have reached the same MPI_Barrier call.

Let's check that everybody is on the bus, and then drive to the vacation spot. No one left home alone!
36 / 48
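The same rendezvous behavior exists in Python's standard library as threading.Barrier. This small sketch is an analogy with threads instead of MPI processes: no thread gets past the barrier until all of them have arrived.

```python
import threading

NTHREADS = 4
barrier = threading.Barrier(NTHREADS)  # all NTHREADS must arrive to release
log = []
lock = threading.Lock()

def worker(rank):
    with lock:
        log.append(("before", rank))
    barrier.wait()           # block here until every thread has arrived
    with lock:
        log.append(("after", rank))

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NTHREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "before" entry is logged ahead of every "after" entry:
# the barrier guarantees no thread proceeds until all have checked in.
print(log)
```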
Calculation of π using MPI Data Types
MPI FORTRAN Data Types
1 MPI_CHARACTER : character(1)
2 MPI_INTEGER : integer(4)
3 MPI_REAL : real(4)
4 MPI_DOUBLE_PRECISION : double precision = real(8)
5 MPI_COMPLEX : complex
6 MPI_LOGICAL : logical
7 MPI_BYTE : 8 binary digits
8 MPI_PACKED : data packed or unpacked with MPI_Pack()/MPI_Unpack
Note: C/C++ do not have complex and logical data types.
37 / 48
Calculation of π using MPI
fpi.f90 – second half
sum = 0.0d0
do i = myid+1, n, numprocs
  x = h * (dble(i) - 0.5d0)
  sum = sum + f(x)
enddo
mypi = h * sum

call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
T2 = MPI_WTIME()

if (myid == 0) then
  write (*, "('pi is approximately: ', F18.16)") pi
  write (*, "('Error is:            ', F18.16)") abs(pi - PI25DT)
  write (*,*) "Start time =   ", T1
  write (*,*) "End time =     ", T2
  write (*,*) "Elapsed time = ", T2 - T1
  write (*,*) "The number of processes = ", numprocs
endif
call MPI_FINALIZE(rc)
stop
end
38 / 48
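The loop do i = myid+1, n, numprocs stripes the quadrature points across the ranks: rank 0 takes points 1, 1+numprocs, 1+2*numprocs, ..., rank 1 takes 2, 2+numprocs, and so on. A serial Python analogy of this partitioning and the final MPI_SUM (the `mypi` dict playing the role of each rank's private partial sum):

```python
import math

def f(x):
    # Integrand 4/(1+x^2); its integral over [0,1] is pi.
    return 4.0 / (1.0 + x * x)

n = 100000          # number of midpoint-rule intervals
numprocs = 6
h = 1.0 / n

# Each "rank" sums only its stripe: i = myid+1, myid+1+numprocs, ...
mypi = {}
for myid in range(numprocs):
    s = 0.0
    for i in range(myid + 1, n + 1, numprocs):
        x = h * (i - 0.5)   # midpoint of interval i
        s += f(x)
    mypi[myid] = h * s

# MPI_REDUCE with MPI_SUM combines the partial sums into pi on rank 0;
# serially that is just a sum over the per-rank contributions.
pi = sum(mypi.values())
print(abs(pi - math.pi))    # midpoint-rule error shrinks as O(h^2)
```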
Calculation of π using MPI Reduce
MPI_REDUCE
call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0,MPI_COMM_WORLD, ierr)
MPI_REDUCE (sendbuf, recvbuf, count, datatype, op, root,comm, ierr)reduces values on all processes to a single value.
1 sendbuf - address of the send buffer on each process, i.e., the variable (choice)
2 recvbuf - address of the receive buffer on the reducing process, i.e., a different variable (choice)
3 count - number of elements in the send buffer, i.e., how many (integer)
4 datatype - data type of the elements of the send buffer, such as integer, real, ... (handle)
5 op - reducing arithmetic operation (handle)
6 root - rank of the root process, i.e., the process that receives the reduced result (integer)
7 comm - communicator, i.e., a communication context; the default is MPI_COMM_WORLD (handle)
39 / 48
Calculation of π using MPI Reduce
Example of MPI_REDUCE
program broadcast
  implicit none
  include 'mpif.h'
  integer :: numprocs, rank, ierr, rc
  integer :: Ireduced, Nreduced

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
    print *, 'Error starting MPI program. Terminating.'
    call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

  Ireduced = 1 + rank
  Nreduced = 0
  print *, 'I am', rank, 'of', numprocs, 'and Ireduced =', Ireduced
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)

  call MPI_REDUCE &
    (Ireduced, Nreduced, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
  if (rank == 0) &
    print *, 'I am', rank, 'of', numprocs, 'and Nreduced =', Nreduced

  call MPI_FINALIZE(ierr)
end
40 / 48
Calculation of π using MPI Reduce
Outcome of MPI_REDUCE example
I am 0 of 6 and Ireduced = 1
I am 1 of 6 and Ireduced = 2
I am 2 of 6 and Ireduced = 3
I am 3 of 6 and Ireduced = 4
I am 4 of 6 and Ireduced = 5
I am 5 of 6 and Ireduced = 6
I am 0 of 6 and Nreduced = 21
1 + 2 + 3 + 4 + 5 + 6 = 21

Note that rank = Ireduced - 1. Without the MPI_BARRIER call, the final answer of 21 could appear anywhere in the output.
41 / 48
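The reduction in the example above is just a sum over the per-rank values; serially it can be sketched with Python's functools.reduce (an analogy only, with a list standing in for the distributed send buffers):

```python
import functools
import operator

numprocs = 6
# Each rank contributes Ireduced = 1 + rank, exactly as in the Fortran example.
ireduced = [1 + rank for rank in range(numprocs)]

# MPI_REDUCE(..., MPI_SUM, root=0, ...) combines the values pairwise with +
# and delivers the result only to the root; functools.reduce is the same fold.
nreduced = functools.reduce(operator.add, ireduced)
print(nreduced)  # 1 + 2 + 3 + 4 + 5 + 6 = 21
```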
Calculation of π using MPI Operation Types
MPI Operations
1 MPI_MAX – maximum (integer, real, complex)
2 MPI_MIN – minimum (integer, real, complex)
3 MPI_SUM – sum (integer, real, complex)
4 MPI_PROD – product (integer, real, complex)
5 MPI_LAND – logical AND (logical)
6 MPI_LOR – logical OR (logical)
7 MPI_LXOR – logical XOR (logical)
8 MPI_BAND – bit-wise AND (integer, MPI_BYTE)
9 MPI_BOR – bit-wise OR (integer, MPI_BYTE)
10 MPI_BXOR – bit-wise XOR (integer, MPI_BYTE)
11 MPI_MAXLOC – max value and location (real, complex, double precision)
12 MPI_MINLOC – min value and location (real, complex, double precision)
42 / 48
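MPI_MAXLOC reduces (value, rank) pairs, returning the maximum value together with the rank that holds it. A serial Python sketch of that semantics (the `values` dict of per-rank data is hypothetical):

```python
# Per-rank values; MPI_MAXLOC keeps the pair with the largest value,
# breaking ties in favor of the lowest rank.
values = {0: 3.5, 1: 9.1, 2: 2.7, 3: 9.1, 4: 0.4}

# Taking max over (value, -rank) mimics "largest value, lowest rank on ties":
# between ranks 1 and 3 (both 9.1), (9.1, -1) > (9.1, -3), so rank 1 wins.
maxval, rank = max((v, -r) for r, v in values.items())
rank = -rank
print(maxval, rank)  # 9.1 1
```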
Calculation of π using MPI Resources
Resources and References
1 Gropp et al. Using MPI: Portable parallel programming with theMessage-Passing Interface. MIT Press (1997)
43 / 48
Calculation of π using MPI Resources
Makefile
mpif90=/opt/openmpi-1.6.1-intel/bin/mpif90 -limf
foropt=-diag-disable 8290 -diag-disable 8291 -diag-disable 10145
mpirun=/opt/openmpi-1.6.1-intel/bin/mpirun
mpiopt=-mca btl tcp,self --mca btl_tcp_if_include eth0

srcroot=fpi
srcfile=$(srcroot).f90
exefile=$(srcroot).x
numprocs=6

all:
	$(mpif90) $(foropt) $(srcfile) -o $(exefile)

srun:
	time $(mpirun) -mca btl tcp,self -np 1 ./$(exefile)

prun:
	time $(mpirun) $(mpiopt) --hostfile ./mpihosts -np $(numprocs) ./$(exefile)

edit:
	vim $(srcfile)
44 / 48
Calculation of π using MPI Resources
script file
We will discuss this later.
45 / 48
Calculation of π using MPI Resources
Job output file – part
pi is approximately: 3.1415926535897389
Error is:            0.0000000000000542
Start time   = 1359673097.68033
End time     = 1359673097.92917
Elapsed time = 0.248842000961304
The number of processes = 6
0.08user 0.04system 0:01.84elapsed 6%CPU (0avgtext+0avgdata 12896maxresident)k
0inputs+0outputs (0major+4125minor)pagefaults 0swaps
46 / 48
Calculation of π using MPI
Lab work
1 MPI codes are stored at /opt/cee618s13/class04/
2 Copy the examples under the subdirectories MPI-basic, MPI-bcast, MPI-reduce, and MPI-pi, and study them.
3 Type/enter "make" followed by "make prun".
4 Start your homework.
47 / 48
Calculation of π using MPI
Speed up test
1 Change the number of processes of the MPI-pi application from 1 to 8 and see how much speedup is achieved.
48 / 48