MPI (Message Passing Interface)

Page 1: MPI

MPI

Speaker: 呂宗螢    Date: 2007/06/01

Page 2: MPI


Outline

Page 3: MPI


MPI

MPI is a language-independent communications protocol used to program parallel computers.

Distributed-memory model
SPMD (Single Program Multiple Data)
Bindings for Fortran and C/C++

Page 4: MPI


MPI Requirements and Supported Environments

Cluster environment:

Windows
  Microsoft AD (Active Directory) server
  Microsoft cluster server

Linux
  NFS (Network File System)
  NIS (Network Information Services), also known as yellow pages
  SSH
  MPICH2

Page 5: MPI


MPI Installation

Download mpich2-1.0.4p1.tar.gz from http://www-unix.mcs.anl.gov/mpi/mpich/

[shell]# tar -zxvf mpich2-1.0.4p1.tar.gz
[shell]# mkdir /home/yourhome/mpich2
[shell]# cd mpich2-1.0.4p1
[shell]# ./configure --prefix=/home/yourhome/mpich2    // installing into a directory you create yourself is recommended
[shell]# make
[shell]# make install

Next:

[shell]# cd ~yourhome    // go to your own home directory
[shell]# vi .mpd.conf    // create the file

Its content is secretword=<secretword> (the secretword can be anything you like), e.g.:

secretword=abcd1234

Page 6: MPI


MPI Installation

[shell]# chmod 600 .mpd.conf
[shell]# vi .bash_profile

Change PATH=$PATH:$HOME/bin to PATH=$HOME/mpich2/bin:$PATH:$HOME/bin, then log in to the server again.

[shell]# vi mpd.hosts    // create the hosts list file in your home directory

e.g.:

cluster1
cluster2
cluster3
cluster4

Page 7: MPI


MPI Constructs

Point-to-Point Communication
  Blocking: MPI_Send(), MPI_Recv()
  Nonblocking: MPI_Isend(), MPI_Irecv()

Collective Communication
  Synchronization: MPI_Barrier()
  Data Exchange: MPI_Bcast(), MPI_Scatter(), MPI_Gather(), MPI_Alltoall()
  Collective Computation: MPI_Reduce()

Process Group
  MPI_Comm_group(), MPI_Comm_create(), MPI_Group_incl(), MPI_Group_rank(), MPI_Group_size(), MPI_Comm_free()

Virtual Topology
  MPI_Cart_create(), MPI_Cart_coords(), MPI_Cart_shift()

Page 8: MPI


Basic Structure of an MPI Program

#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* do some work or call MPI functions,
       for example MPI_Send() / MPI_Recv() */

    MPI_Finalize();
    return 0;
}

Page 9: MPI


MPI Ethernet Control and Data Flow

Source: Douglas M. Pase, "Performance of Voltaire InfiniBand in IBM 64-Bit Commodity HPC Clusters," IBM White Paper, 2005

Page 10: MPI


MPI Communicator

(Figure: nine processes, ranks 0 through 8, grouped inside the default communicator MPI_COMM_WORLD)

Page 11: MPI


MPI Function

function: int MPI_Init(int *argc, char ***argv)

purpose: Initializes the MPI execution environment. It must be called before any other MPI function, and it can pass main's command-line arguments (argc, argv) to every process.

parameters: argc: IN, number of arguments; argv: IN, the argument strings

return value: int: MPI_SUCCESS (0) on success

function: int MPI_Finalize(void)

purpose: Terminates the MPI execution environment. It must be called after all work is finished.

parameters: (none)

return value: int: MPI_SUCCESS (0) on success

Page 12: MPI


MPI Function

function: int MPI_Comm_size(MPI_Comm comm, int *size)

purpose: Gets the total number of processes in the given communicator.

parameters: comm: IN, MPI_COMM_WORLD; size: OUT, total number of processes

return value: int: MPI_SUCCESS (0) on success

function: int MPI_Comm_rank(MPI_Comm comm, int *rank)

purpose: Gets the calling process's own process ID (rank).

parameters: comm: IN, MPI_COMM_WORLD; rank: OUT, the current process ID

return value: int: MPI_SUCCESS (0) on success
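Putting the four functions above together, a minimal sketch (the file name and output wording here are illustrative, not from the slides):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's own ID */

    printf("process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}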

Page 13: MPI


MPI Function

function: int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

purpose: Sends data to the specified process, using Standard mode.

parameters:
  buf: IN, the data (variable) to send
  count: IN, how many elements to send
  datatype: IN, the datatype of the data being sent
  dest: IN, the destination process ID
  tag: IN, the message tag (channel)
  comm: IN, MPI_COMM_WORLD

return value: int: MPI_SUCCESS (0) on success

function: int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

purpose: Receives data from the specified process.

parameters:
  buf: OUT, the variable that receives the data
  count: IN, how many elements to receive
  datatype: IN, the datatype of the data being received
  source: IN, the source process ID
  tag: IN, the message tag (channel)
  comm: IN, MPI_COMM_WORLD
  status: OUT, the resulting MPI_Status

return value: int: MPI_SUCCESS (0) on success

Page 14: MPI


MPI Function

Status: identifies the source process ID and the tag of the received message. In C it is the MPI_Status datatype:

typedef struct MPI_Status {
    int count;
    int cancelled;
    int MPI_SOURCE;   // source process ID
    int MPI_TAG;      // tag sent by the source
    int MPI_ERROR;    // error code
} MPI_Status;

function: double MPI_Wtime()

purpose: Returns the current time as a floating-point number of seconds; usually used to measure a program's execution time.

parameters: (none)

return value: double: the current time
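MPI_Wtime() is typically sampled twice and the difference taken, as hello.c does later in these slides; a minimal fragment (the work in the middle is a placeholder):

double starttime, endtime;

starttime = MPI_Wtime();
/* ... the work to measure, e.g. an MPI_Send()/MPI_Recv() exchange ... */
endtime = MPI_Wtime();

printf("elapsed: %f seconds\n", endtime - starttime);   // wall-clock seconds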

Page 15: MPI


MPI Function

function: int MPI_Type_commit(MPI_Datatype *datatype)

purpose: Commits the datatype, making it usable for communication.

parameters: datatype: INOUT, the new datatype

return value: int: MPI_SUCCESS (0) on success

function: int MPI_Type_free(MPI_Datatype *datatype)

purpose: Frees the datatype.

parameters: datatype: INOUT, the datatype to free

return value: int: MPI_SUCCESS (0) on success

Page 16: MPI


MPI Function

function: int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype)

purpose: Builds a new datatype by simply resizing an existing datatype (MPI_Datatype); that is, several elements of the same type are combined into one new datatype.

parameters:
  count: IN, size of the new type (how many oldtype elements it consists of)
  oldtype: IN, the existing datatype (MPI_Datatype)
  newtype: OUT, the new datatype

return value: int: MPI_SUCCESS (0) on success
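The usual lifecycle, which hello.c below also follows, is define, commit, use, free. A minimal fragment along those lines:

MPI_Datatype strtype;

MPI_Type_contiguous(20, MPI_CHAR, &strtype);   // 20 chars form one new element
MPI_Type_commit(&strtype);                     // must be committed before use

/* ... send/receive with datatype strtype, count 1 per 20-char block ... */

MPI_Type_free(&strtype);                       // release the datatype when done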

Page 17: MPI


Steps to Write and Run an MPI Program

1. Start the MPI environment
   mpdboot -n 4 -f mpd.hosts    // -n is the number of PCs to start; mpd.hosts is the PC list

2. Write the MPI program
   vi hello.c

3. Compile
   mpicc hello.c -o hello.o

4. Run the program
   mpiexec -n 4 ./hello.o       // -n is the number of processes

5. Shut down MPI
   mpdallexit

Page 18: MPI


MPI example : hello.c

#include "mpi.h"
#include <stdio.h>
#include <string.h>
#define SIZE 20

int main(int argc, char *argv[])
{
    int numtasks, rank, dest, source, rc, count, tag = 1;
    char inmsg[SIZE];
    char outmsg[SIZE];
    double starttime, endtime;
    MPI_Status Stat;
    MPI_Datatype strtype;

    MPI_Init(&argc, &argv);                          // start the MPI environment
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);            // get own process ID

    MPI_Type_contiguous(SIZE, MPI_CHAR, &strtype);   // define the new string datatype
    MPI_Type_commit(&strtype);                       // commit the new string datatype

    starttime = MPI_Wtime();                         // get the current time

Page 19: MPI


MPI example : hello.c

    if (rank == 0) {
        dest = 1;
        source = 1;
        strcpy(outmsg, "Who are you?");

        // send a message to process 1
        rc = MPI_Send(outmsg, 1, strtype, dest, tag, MPI_COMM_WORLD);
        printf("process %d has sent message: %s\n", rank, outmsg);

        // receive the message from process 1
        rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
        printf("process %d has received: %s\n", rank, inmsg);
    } else if (rank == 1) {
        dest = 0;
        source = 0;
        strcpy(outmsg, "I am process 1");
        rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
        printf("process %d has received: %s\n", rank, inmsg);
        rc = MPI_Send(outmsg, 1, strtype, dest, tag, MPI_COMM_WORLD);
        printf("process %d has sent message: %s\n", rank, outmsg);
    }

Page 20: MPI


MPI example : hello.c

    endtime = MPI_Wtime();                           // get the end time

    // use MPI_CHAR to count how much data was actually received
    rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
    printf("Task %d: Received %d char(s) from task %d with tag %d and time used is %f\n",
           rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG, endtime - starttime);

    MPI_Type_free(&strtype);                         // free the string datatype
    MPI_Finalize();                                  // shut down MPI

    return 0;
}

process 0 has sent message: Who are you?
process 1 has received: Who are you?
process 1 has sent message: I am process 1
Task 1: Received 20 char(s) from task 0 with tag 1 and time used is 0.001302
process 0 has received: I am process 1
Task 0: Received 20 char(s) from task 1 with tag 1 and time used is 0.002133
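The slides do not show the exact commands used to produce this output, but following the steps on page 17 a run would presumably look like:

mpicc hello.c -o hello.o
mpiexec -n 2 ./hello.o    // exactly 2 processes: only ranks 0 and 1 exchange messages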

Page 21: MPI


Collective Communication Routines

function: int MPI_Barrier(MPI_Comm comm)

purpose: When a process reaches the barrier it blocks, waiting for all the other processes to reach the barrier too; once every process in the group has reached it, the block is released and all of them continue.

parameters: comm: IN, MPI_COMM_WORLD

return value: int: MPI_SUCCESS (0) on success

Types of collective operations (a barrier-based timing fragment follows the list):

Synchronization: processes wait until all members of the group have reached the synchronization point.
Data Movement: broadcast, scatter/gather, all-to-all.
Collective Computation (reductions): one member of the group collects data from the other members and performs an operation (min, max, add, multiply, etc.) on that data.
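A common use of MPI_Barrier() is lining all processes up before and after a timed phase, so the measurement covers the slowest process. A minimal fragment (assumes rank was obtained with MPI_Comm_rank()):

double start;

MPI_Barrier(MPI_COMM_WORLD);    // every process starts the phase together
start = MPI_Wtime();

/* ... parallel work ... */

MPI_Barrier(MPI_COMM_WORLD);    // wait until the slowest process finishes
if (rank == 0)
    printf("phase time: %f seconds\n", MPI_Wtime() - start);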

Page 22: MPI


MPI_Bcast

function: int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

purpose: Broadcasts a message so that every process receives the same message.

parameters:
  buffer: INOUT, the message to send, and also the buffer that receives it
  count: IN, how many elements to send
  datatype: IN, the datatype of the data being sent
  root: IN, the process responsible for sending the message
  comm: IN, MPI_COMM_WORLD

return value: int: MPI_SUCCESS (0) on success
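A minimal broadcast fragment: the root fills in a value, and after the call every rank holds the same value (variable names are illustrative; rank is assumed obtained as before):

int config = 0;

if (rank == 0)
    config = 42;                 // before the call, only root 0 has the value

MPI_Bcast(&config, 1, MPI_INT, 0, MPI_COMM_WORLD);   // root = 0

printf("rank %d sees config = %d\n", rank, config);  // every rank prints 42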

Page 23: MPI


MPI_Gather

function: int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

purpose: Collects the messages sent by the individual processes and delivers the combined result to the specified (root) process.

parameters:
  sendbuf: IN, the message to send
  sendcount: IN, how many elements to send
  sendtype: IN, the send datatype
  recvbuf: OUT, the buffer that receives the messages
  recvcount: IN, how many elements to receive from each process
  recvtype: IN, the receive datatype
  root: IN, the process responsible for receiving the messages
  comm: IN, MPI_COMM_WORLD

return value: int: MPI_SUCCESS (0) on success
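A minimal gather fragment: every rank contributes one int and the root receives them in rank order (assumes rank, size, and i are declared as in the earlier examples, and at most 64 processes):

int myval = rank * rank;         // each process's contribution
int all[64];                     // assumption: no more than 64 processes

MPI_Gather(&myval, 1, MPI_INT,   // each rank sends one int
           all,    1, MPI_INT,   // root receives one int per rank, in rank order
           0, MPI_COMM_WORLD);   // root = 0

if (rank == 0)
    for (i = 0; i < size; i++)
        printf("all[%d] = %d\n", i, all[i]);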

Page 24: MPI


MPI_Gather

Page 25: MPI


MPI_Allgather

function: int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)

purpose: Collects the messages sent by the individual processes and then broadcasts the combined result to every process.

parameters:
  sendbuf: IN, the message to send
  sendcount: IN, how many elements to send
  sendtype: IN, the send datatype
  recvbuf: OUT, the buffer that receives the messages
  recvcount: IN, how many elements to receive from each process
  recvtype: IN, the receive datatype
  comm: IN, MPI_COMM_WORLD

return value: int: MPI_SUCCESS (0) on success
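The same pattern as MPI_Gather, except that every rank ends up with the full array, so no root argument is needed (same assumptions as the gather fragment above):

int myval = rank + 1;
int all[64];                     // assumption: no more than 64 processes

MPI_Allgather(&myval, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

/* every rank, not only a root, now holds all[0..size-1] */
printf("rank %d: all[0]=%d, all[%d]=%d\n", rank, all[0], size - 1, all[size - 1]);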

Page 26: MPI


MPI_Allgather

Page 27: MPI


MPI_Reduce

function: int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

purpose: Performs an operation while the data is transmitted (e.g. MPI_SUM computes the total) and then delivers the result to the root (destination) process.

parameters:
  sendbuf: IN, the message to send
  recvbuf: OUT, the buffer that receives the result
  count: IN, how many elements to send/receive
  datatype: IN, the send/receive datatype
  op: IN, the operation to perform
  root: IN, the process ID that receives the result
  comm: IN, MPI_COMM_WORLD

return value: int: MPI_SUCCESS (0) on success
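A minimal sum-reduction fragment: every rank contributes one value and only the root receives the total (names are illustrative; rank and size as before):

int myval = rank;                // each process's contribution
int total = 0;

MPI_Reduce(&myval, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

if (rank == 0)                   // total is defined only on the root
    printf("sum of ranks 0..%d = %d\n", size - 1, total);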

Page 28: MPI


MPI_Reduce

MPI Reduction Operation    Operation                 C Data Types
MPI_MAX                    maximum                   integer, float
MPI_MIN                    minimum                   integer, float
MPI_SUM                    sum                       integer, float
MPI_PROD                   product                   integer, float
MPI_LAND                   logical AND               integer
MPI_BAND                   bit-wise AND              integer, MPI_BYTE
MPI_LOR                    logical OR                integer
MPI_BOR                    bit-wise OR               integer, MPI_BYTE
MPI_LXOR                   logical XOR               integer
MPI_BXOR                   bit-wise XOR              integer, MPI_BYTE
MPI_MAXLOC                 max value and location    float, double and long double
MPI_MINLOC                 min value and location    float, double and long double
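MPI_MAXLOC and MPI_MINLOC reduce value-and-index pairs; in C the matching pair datatypes are MPI_FLOAT_INT, MPI_DOUBLE_INT, and MPI_LONG_DOUBLE_INT. A minimal fragment that finds which rank holds the largest value (my_value is a placeholder for some local measurement):

struct { double val; int rank; } in, out;

in.val  = my_value;              // the local value being compared
in.rank = rank;                  // tag it with this process's ID

MPI_Reduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);

if (rank == 0)
    printf("max value %f is on rank %d\n", out.val, out.rank);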

Page 29: MPI


MPI example : matrix.c (1)

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define RANDOM_SEED 2882                        // random seed
#define MATRIX_SIZE 800                         // square matrix: width equals height
#define NODES 4                                 // number of nodes; minimum is 1, don't use < 1
#define TOTAL_SIZE (MATRIX_SIZE * MATRIX_SIZE)  // total size of the matrix
#define CHECK

int main(int argc, char *argv[])
{
    int i, j, k;
    int node_id;
    int AA[MATRIX_SIZE][MATRIX_SIZE];
    int BB[MATRIX_SIZE][MATRIX_SIZE];
    int CC[MATRIX_SIZE][MATRIX_SIZE];

Page 30: MPI


MPI example : matrix.c (2)

#ifdef CHECK
    int _CC[MATRIX_SIZE][MATRIX_SIZE];  // sequential result, used to check the parallel result CC
#endif
    int check = 1;
    int print = 0;
    int computing = 0;
    double time, seqtime;
    int numtasks;
    int tag = 1;
    int node_size;
    MPI_Status stat;
    MPI_Datatype rowtype;

    srand(RANDOM_SEED);

Page 31: MPI


MPI example : matrix.c (3)

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node_id);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

    if (numtasks != NODES) {
        printf("Must specify %d processors. Terminating.\n", NODES);
        MPI_Finalize();
        return 0;
    }
    if (MATRIX_SIZE % NODES != 0) {
        printf("MATRIX_SIZE %% NODES must be 0.\n");
        MPI_Finalize();
        return 0;
    }

    MPI_Type_contiguous(MATRIX_SIZE, MPI_INT, &rowtype);  // one matrix row: MATRIX_SIZE consecutive ints
    MPI_Type_commit(&rowtype);

Page 32: MPI


MPI example : matrix.c (4)

    /* create matrix A and matrix B */
    if (node_id == 0) {
        for (i = 0; i < MATRIX_SIZE; i++) {
            for (j = 0; j < MATRIX_SIZE; j++) {
                AA[i][j] = rand() % 10;
                BB[i][j] = rand() % 10;
            }
        }
    }

    /* send matrix A and B to the other nodes */
    node_size = MATRIX_SIZE / NODES;

Page 33: MPI


MPI example : matrix.c (5)

    // send AA: each node needs only its own node_size rows
    if (node_id == 0)
        for (i = 1; i < NODES; i++)
            MPI_Send(&AA[i * node_size][0], node_size, rowtype, i, tag, MPI_COMM_WORLD);
    else
        MPI_Recv(&AA[node_id * node_size][0], node_size, rowtype, 0, tag, MPI_COMM_WORLD, &stat);

    // send BB: every node needs the whole matrix
    if (node_id == 0)
        for (i = 1; i < NODES; i++)
            MPI_Send(&BB, MATRIX_SIZE, rowtype, i, tag, MPI_COMM_WORLD);
    else
        MPI_Recv(&BB, MATRIX_SIZE, rowtype, 0, tag, MPI_COMM_WORLD, &stat);

Page 34: MPI


MPI example : matrix.c (6)

    /* computing C = A * B */
    time = -MPI_Wtime();
    for (i = node_id * node_size; i < (node_id * node_size + node_size); i++) {
        for (j = 0; j < MATRIX_SIZE; j++) {
            computing = 0;
            for (k = 0; k < MATRIX_SIZE; k++)
                computing += AA[i][k] * BB[k][j];
            CC[i][j] = computing;
        }
    }
    // every node already holds its own rows at the right offset inside CC,
    // so gather in place (MPI forbids overlapping send and receive buffers)
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  CC, node_size, rowtype, MPI_COMM_WORLD);
    time += MPI_Wtime();

Page 35: MPI


MPI example : matrix.c (7)

#ifdef CHECK
    // sequential multiplication on node 0, used to verify the parallel result
    seqtime = -MPI_Wtime();
    if (node_id == 0) {
        for (i = 0; i < MATRIX_SIZE; i++) {
            for (j = 0; j < MATRIX_SIZE; j++) {
                computing = 0;
                for (k = 0; k < MATRIX_SIZE; k++)
                    computing += AA[i][k] * BB[k][j];
                _CC[i][j] = computing;
            }
        }
    }
    seqtime += MPI_Wtime();

Page 36: MPI


    /* check result */
    if (node_id == 0) {
        for (i = 0; i < MATRIX_SIZE; i++) {
            for (j = 0; j < MATRIX_SIZE; j++) {
                if (CC[i][j] != _CC[i][j]) {
                    check = 0;
                    break;
                }
            }
        }
    }

Page 37: MPI


MPI example : matrix.c (8)

    /* print result */
#endif
    if (node_id == 0) {
        printf("node_id=%d\ncheck=%s\nprocessing time:%f\n\n",
               node_id, (check) ? "success!" : "failure!", time);
#ifdef CHECK
        printf("sequential time:%f\n", seqtime);
#endif
    }

    MPI_Type_free(&rowtype);
    MPI_Finalize();

    return 0;
}
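Following the steps on page 17, a run of this example would presumably look like the commands below. Note that AA, BB, CC (and _CC with CHECK defined) are roughly 2.5 MB each and live on the stack, so the shell's stack limit may need raising first:

[shell]# ulimit -s unlimited       // the 800x800 int matrices exceed a default 8 MB stack
[shell]# mpicc matrix.c -o matrix.o
[shell]# mpiexec -n 4 ./matrix.o   // the process count must equal NODES, or the program exits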

Page 38: MPI


Reference

Top 500: http://www.top500.org/
Maarten Van Steen and Andrew S. Tanenbaum, "Distributed Systems: Principles and Paradigms"
System Threads Reference: http://www.unix.org/version2/whatsnew/threadsref.html
Semaphore: http://www.mkssoftware.com/docs/man3/sem_init.3.asp
Richard Stones and Neil Matthew, "Beginning Linux Programming"
W. Richard Stevens, "Networking APIs: Sockets and XTI"
William W.-Y. Liang, "Linux System Programming"
Michael J. Quinn, "Parallel Programming in C with MPI and OpenMP"
Introduction to Parallel Computing: http://www.llnl.gov/computing/tutorials/parallel_comp/

Page 39: MPI


Reference

Michael J. Quinn, "Parallel Programming in C with MPI and OpenMP"
Introduction to Parallel Computing: http://www.llnl.gov/computing/tutorials/parallel_comp/
MPI standard: http://www-unix.mcs.anl.gov/mpi/
MPI: http://www.llnl.gov/computing/tutorials/mpi/

Page 40: MPI


Books

Michael J. Quinn , “Parallel Programming in C with MPI and OpenMP, 1st Edition”

http://books.google.com.tw/books?id=tDxNyGSXg5IC&dq=parallel+programming+in+c+with+mpi+and+openmp&pg=PP1&ots=I0QWyWECXI&sig=YwyUkg9mKqWyxMnO1Hy7hkDc8dY&prev=http://www.google.com.tw/search%3Fsource%3Dig%26hl%3Dzh-TW%26q%3DParallel%2Bprogramming%2Bin%2BC%2Bwith%2Bmpi%2Band%2BopenMP%26meta%3D%26btnG%3DGoogle%2B%25E6%2590%259C%25E5%25B0%258B&sa=X&oi=print&ct=title#PPA529,M1