MPI (Message Passing Interface)
Speaker: 呂宗螢    Date: 2007/06/01
Embedded and Parallel Systems Lab
Outline
MPI

MPI is a language-independent communications protocol used to program parallel computers.
  Distributed-memory model
  SPMD (Single Program, Multiple Data)
  Bindings for Fortran and C / C++
MPI Requirements and Supported Environments

Cluster Environment
  Windows:
    Microsoft AD (Active Directory) server
    Microsoft Cluster Server
  Linux:
    NFS (Network File System)
    NIS (Network Information Service), also known as "yellow pages"
    SSH
    MPICH2
MPI Installation

Download mpich2-1.0.4p1.tar.gz from http://www-unix.mcs.anl.gov/mpi/mpich/

[shell]# tar -zxvf mpich2-1.0.4p1.tar.gz
[shell]# mkdir /home/yourhome/mpich2
[shell]# cd mpich2-1.0.4p1
[shell]# ./configure --prefix=/home/yourhome/mpich2   // installing into a directory you create yourself is recommended
[shell]# make
[shell]# make install

Next:

[shell]# cd ~yourhome    // go to your own home directory
[shell]# vi .mpd.conf    // create the file
with the content:
secretword=<secretword>   (choose whatever secretword you like)
Ex:
secretword=abcd1234
MPI Installation (cont.)

[shell]# chmod 600 .mpd.conf
[shell]# vi .bash_profile
Change PATH=$PATH:$HOME/bin
to     PATH=$HOME/mpich2/bin:$PATH:$HOME/bin
and log in to the server again.

[shell]# vi mpd.hosts    // create the hosts list file in your home directory
ex:
cluster1
cluster2
cluster3
cluster4
MPI constructs

MPI
  Point-to-Point Communication
    Blocking: MPI_Send(), MPI_Recv()
    Nonblocking: MPI_Isend(), MPI_Irecv()
  Collective Communication
    Synchronization: MPI_Barrier()
    Data Exchange: MPI_Bcast(), MPI_Scatter(), MPI_Gather(), MPI_Alltoall()
    Collective Computation: MPI_Reduce()
  Process Group
    MPI_Comm_group(), MPI_Comm_create(), MPI_Group_incl(), MPI_Group_rank(), MPI_Group_size(), MPI_Comm_free()
  Virtual Topology
    MPI_Cart_create(), MPI_Cart_coords(), MPI_Cart_shift()
Basic structure of an MPI program

#include "mpi.h"

MPI_Init(&argc, &argv);

/* do some work or call MPI functions,
   e.g. MPI_Send() / MPI_Recv() */

MPI_Finalize();
MPI Ethernet Control and Data Flow

[Figure] Source: Douglas M. Pase, "Performance of Voltaire InfiniBand in IBM 64-Bit Commodity HPC Clusters," IBM White Papers, 2005
MPI Communicator

[Figure: nine processes, ranks 0 through 8, all belonging to the communicator MPI_COMM_WORLD]
MPI Function

function     int MPI_Init(int *argc, char ***argv)
purpose      Initializes the MPI execution environment. It must be called before any other MPI function, and it forwards main's command-line arguments (argc, argv) to every process.
parameters   argc: number of arguments; argv: the argument vector
return value int: MPI_SUCCESS (0) on success

function     int MPI_Finalize()
purpose      Terminates the MPI execution environment. It must be called after all MPI work is finished.
parameters   (none)
return value int: MPI_SUCCESS (0) on success
MPI Function

function     int MPI_Comm_size(MPI_Comm comm, int *size)
purpose      Gets the total number of processes in the communicator
parameters   comm: IN, e.g. MPI_COMM_WORLD
             size: OUT, total number of processes
return value int: MPI_SUCCESS (0) on success

function     int MPI_Comm_rank(MPI_Comm comm, int *rank)
purpose      Gets the calling process's own process ID (rank)
parameters   comm: IN, e.g. MPI_COMM_WORLD
             rank: OUT, this process's ID
return value int: MPI_SUCCESS (0) on success
MPI Function

function     int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
purpose      Sends data to the specified process, using Standard mode
parameters   buf: IN, the data (variable) to send
             count: IN, number of elements to send
             datatype: IN, datatype of the data being sent
             dest: IN, destination process ID
             tag: IN, message tag ("channel")
             comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success

function     int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
purpose      Receives data from the specified process
parameters   buf: OUT, the variable that receives the data
             count: IN, number of elements to receive
             datatype: IN, datatype of the data being received
             source: IN, source process ID
             tag: IN, message tag ("channel")
             comm: IN, e.g. MPI_COMM_WORLD
             status: OUT, the resulting MPI_Status
return value int: MPI_SUCCESS (0) on success
MPI Function

Status: identifies the source process ID and the tag of the message. In C it is the MPI_Status datatype:

typedef struct MPI_Status {
    int count;
    int cancelled;
    int MPI_SOURCE;   // source process ID
    int MPI_TAG;      // tag sent by the source
    int MPI_ERROR;    // error code
} MPI_Status;

function     double MPI_Wtime()
purpose      Returns the current time as a floating-point number of seconds; usually used to measure how long a program runs.
parameters   (none)
return value double: the time
MPI Function

function     int MPI_Type_commit(MPI_Datatype *datatype)
purpose      Commits a datatype so it can be used for communication
parameters   datatype: INOUT, the new datatype
return value int: MPI_SUCCESS (0) on success

function     int MPI_Type_free(MPI_Datatype *datatype)
purpose      Frees a datatype
parameters   datatype: INOUT, the datatype to free
return value int: MPI_SUCCESS (0) on success
MPI Function

function     int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype)
purpose      Builds a new datatype by simply resizing an existing one (MPI_Datatype): it combines several elements of the same type into a single unit
parameters   count: IN, size of the new type (how many oldtype elements it is made of)
             oldtype: IN, the existing datatype (MPI_Datatype)
             newtype: OUT, the new datatype
return value int: MPI_SUCCESS (0) on success
Steps for writing and running an MPI program

1. Start the MPI environment
   mpdboot -n 4 -f mpd.hosts   // -n is the number of machines to start; mpd.hosts is the machine list
2. Write the MPI program
   vi hello.c
3. Compile
   mpicc hello.c -o hello.o
4. Run the program
   mpiexec -n 4 ./hello.o      // -n is the number of processes
5. Shut down MPI
   mpdallexit
MPI example: hello.c

#include "mpi.h"
#include <stdio.h>
#include <string.h>   // for strcpy
#define SIZE 20

int main(int argc, char *argv[])
{
    int numtasks, rank, dest, source, rc, count, tag = 1;
    char inmsg[SIZE];
    char outmsg[SIZE];
    double starttime, endtime;
    MPI_Status Stat;
    MPI_Datatype strtype;

    MPI_Init(&argc, &argv);                        // initialize the MPI environment
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);          // get this process's ID

    MPI_Type_contiguous(SIZE, MPI_CHAR, &strtype); // define a new "string" datatype
    MPI_Type_commit(&strtype);                     // commit the new datatype

    starttime = MPI_Wtime();                       // record the start time
MPI example: hello.c (cont.)

    if (rank == 0) {
        dest = 1;
        source = 1;
        strcpy(outmsg, "Who are you?");

        // send a message to process 1
        rc = MPI_Send(outmsg, 1, strtype, dest, tag, MPI_COMM_WORLD);
        printf("process %d has sent message: %s\n", rank, outmsg);

        // receive the message from process 1
        rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
        printf("process %d has received: %s\n", rank, inmsg);
    } else if (rank == 1) {
        dest = 0;
        source = 0;
        strcpy(outmsg, "I am process 1");
        rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
        printf("process %d has received: %s\n", rank, inmsg);
        rc = MPI_Send(outmsg, 1, strtype, dest, tag, MPI_COMM_WORLD);
        printf("process %d has sent message: %s\n", rank, outmsg);
    }
MPI example: hello.c (cont.)

    endtime = MPI_Wtime();                         // record the end time

    // use MPI_CHAR to count how much data was actually received
    rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
    printf("Task %d: Received %d char(s) from task %d with tag %d and elapsed time is %f\n",
           rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG, endtime - starttime);

    MPI_Type_free(&strtype);                       // free the string datatype
    MPI_Finalize();                                // shut down MPI
}

Sample output:
process 0 has sent message: Who are you?
process 1 has received: Who are you?
process 1 has sent message: I am process 1
Task 1: Received 20 char(s) from task 0 with tag 1 and elapsed time is 0.001302
process 0 has received: I am process 1
Task 0: Received 20 char(s) from task 1 with tag 1 and elapsed time is 0.002133
Collective Communication Routines

function     int MPI_Barrier(MPI_Comm comm)
purpose      A process blocks when it reaches the barrier and waits until every other process in the group has also reached it; once all have arrived, they are all released and continue.
parameters   comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success

Types of Collective Operations:
  Synchronization: processes wait until all members of the group have reached the synchronization point.
  Data Movement: broadcast, scatter/gather, all-to-all.
  Collective Computation (reductions): one member of the group collects data from the other members and performs an operation (min, max, add, multiply, etc.) on that data.
MPI_Bcast

function     int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
purpose      Broadcasts a message so that every process receives the same data
parameters   buffer: INOUT, the message to send, and also the buffer that receives it
             count: IN, number of elements
             datatype: IN, datatype of the elements
             root: IN, the process responsible for sending the message
             comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success
MPI_Gather

function     int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
purpose      Collects the messages sent by the individual processes and delivers the combined result to the specified receiving process
parameters   sendbuf: IN, the message to send
             sendcount: IN, number of elements to send
             sendtype: IN, datatype of the sent elements
             recvbuf: OUT, buffer that receives the gathered data
             recvcount: IN, number of elements received from each process
             recvtype: IN, datatype of the received elements
             root: IN, the process responsible for receiving the gathered data
             comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success
[Figure: MPI_Gather, data from all processes collected into the destination process's buffer in rank order]
MPI_Allgather

function     int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
purpose      Collects the messages sent by the individual processes, combines them, and then broadcasts the result to all processes
parameters   sendbuf: IN, the message to send
             sendcount: IN, number of elements to send
             sendtype: IN, datatype of the sent elements
             recvbuf: OUT, buffer that receives the gathered data
             recvcount: IN, number of elements received from each process
             recvtype: IN, datatype of the received elements
             comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success
[Figure: MPI_Allgather, the gathered data ends up in every process's buffer]
MPI_Reduce

function     int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
purpose      Applies an operation (e.g. MPI_SUM to add values up) while the data is being collected, and delivers the result to the root process
parameters   sendbuf: IN, the message to send
             recvbuf: OUT, buffer that receives the result
             count: IN, number of elements sent/received
             datatype: IN, datatype of the elements
             op: IN, the operation to perform
             root: IN, process ID that receives the result
             comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success
MPI_Reduce
MPI Reduction Operation | Meaning | C Data Types
MPI_MAX | maximum | integer, float
MPI_MIN | minimum | integer, float
MPI_SUM | sum | integer, float
MPI_PROD | product | integer, float
MPI_LAND | logical AND | integer
MPI_BAND | bit-wise AND | integer, MPI_BYTE
MPI_LOR | logical OR | integer
MPI_BOR | bit-wise OR | integer, MPI_BYTE
MPI_LXOR | logical XOR | integer
MPI_BXOR | bit-wise XOR | integer, MPI_BYTE
MPI_MAXLOC | max value and location | float, double and long double
MPI_MINLOC | min value and location | float, double and long double
MPI example: matrix.c (1)

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define RANDOM_SEED 2882                        // random seed
#define MATRIX_SIZE 800                         // square matrix: width equals height
#define NODES 4                                 // number of nodes; minimum is 1, never use < 1
#define TOTAL_SIZE (MATRIX_SIZE * MATRIX_SIZE)  // total number of matrix elements
#define CHECK

int main(int argc, char *argv[])
{
    int i, j, k;
    int node_id;
    int AA[MATRIX_SIZE][MATRIX_SIZE];
    int BB[MATRIX_SIZE][MATRIX_SIZE];
    int CC[MATRIX_SIZE][MATRIX_SIZE];
MPI example: matrix.c (2)

#ifdef CHECK
    int _CC[MATRIX_SIZE][MATRIX_SIZE]; // sequential result, used to check the parallel result CC
#endif
    int check = 1;
    int print = 0;
    int computing = 0;
    double time, seqtime;
    int numtasks;
    int tag = 1;
    int node_size;
    MPI_Status stat;
    MPI_Datatype rowtype;

    srand(RANDOM_SEED);
MPI example: matrix.c (3)

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node_id);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    if (numtasks != NODES) {
        printf("Must specify %d processors. Terminating.\n", NODES);
        MPI_Finalize();
        return 0;
    }
    if (MATRIX_SIZE % NODES != 0) {
        printf("MATRIX_SIZE must be divisible by NODES (%d). Terminating.\n", NODES);
        MPI_Finalize();
        return 0;
    }

    MPI_Type_contiguous(MATRIX_SIZE, MPI_INT, &rowtype); // one row of ints (MPI_INT matches the int matrices)
    MPI_Type_commit(&rowtype);
MPI example: matrix.c (4)

    /* create matrix A and matrix B */
    if (node_id == 0) {
        for (i = 0; i < MATRIX_SIZE; i++) {
            for (j = 0; j < MATRIX_SIZE; j++) {
                AA[i][j] = rand() % 10;
                BB[i][j] = rand() % 10;
            }
        }
    }

    /* send matrix A and matrix B to the other nodes */
    node_size = MATRIX_SIZE / NODES;
MPI example: matrix.c (5)

    // send AA: each node only needs the rows it will compute
    if (node_id == 0)
        for (i = 1; i < NODES; i++)
            MPI_Send(&AA[i * node_size][0], node_size, rowtype, i, tag, MPI_COMM_WORLD);
    else
        MPI_Recv(&AA[node_id * node_size][0], node_size, rowtype, 0, tag, MPI_COMM_WORLD, &stat);

    // send BB: every node needs the whole matrix
    if (node_id == 0)
        for (i = 1; i < NODES; i++)
            MPI_Send(&BB, MATRIX_SIZE, rowtype, i, tag, MPI_COMM_WORLD);
    else
        MPI_Recv(&BB, MATRIX_SIZE, rowtype, 0, tag, MPI_COMM_WORLD, &stat);
MPI example: matrix.c (6)

    /* compute C = A * B */
    time = -MPI_Wtime();
    for (i = node_id * node_size; i < (node_id * node_size + node_size); i++) {
        for (j = 0; j < MATRIX_SIZE; j++) {
            computing = 0;
            for (k = 0; k < MATRIX_SIZE; k++)
                computing += AA[i][k] * BB[k][j];
            CC[i][j] = computing;
        }
    }
    MPI_Allgather(&CC[node_id * node_size][0], node_size, rowtype,
                  &CC, node_size, rowtype, MPI_COMM_WORLD);
    time += MPI_Wtime();
MPI example: matrix.c (7)

#ifdef CHECK
    seqtime = -MPI_Wtime();
    if (node_id == 0) {
        for (i = 0; i < MATRIX_SIZE; i++) {
            for (j = 0; j < MATRIX_SIZE; j++) {
                computing = 0;
                for (k = 0; k < MATRIX_SIZE; k++)
                    computing += AA[i][k] * BB[k][j];
                _CC[i][j] = computing;
            }
        }
    }
    seqtime += MPI_Wtime();
    /* check the result */
    if (node_id == 0) {
        for (i = 0; i < MATRIX_SIZE; i++) {
            for (j = 0; j < MATRIX_SIZE; j++) {
                if (CC[i][j] != _CC[i][j]) {
                    check = 0;
                    break;
                }
            }
        }
    }
MPI example: matrix.c (8)

    /* print the result */
#endif
    if (node_id == 0) {
        printf("node_id=%d\ncheck=%s\nprocessing time:%f\n\n",
               node_id, (check) ? "success!" : "failure!", time);
#ifdef CHECK
        printf("sequential time:%f\n", seqtime);
#endif
    }

    MPI_Type_free(&rowtype);
    MPI_Finalize();
    return 0;
}
Reference

Top 500: http://www.top500.org/
Maarten Van Steen and Andrew S. Tanenbaum, "Distributed Systems: Principles and Paradigms"
System Threads Reference: http://www.unix.org/version2/whatsnew/threadsref.html
Semaphore: http://www.mkssoftware.com/docs/man3/sem_init.3.asp
Richard Stones and Neil Matthew, "Beginning Linux Programming"
W. Richard Stevens, "Networking APIs: Sockets and XTI"
William W.-Y. Liang, "Linux System Programming"
Michael J. Quinn, "Parallel Programming in C with MPI and OpenMP"
Introduction to Parallel Computing: http://www.llnl.gov/computing/tutorials/parallel_comp/
Reference (cont.)

MPI standard: http://www-unix.mcs.anl.gov/mpi/
MPI tutorial: http://www.llnl.gov/computing/tutorials/mpi/
Books
Michael J. Quinn , “Parallel Programming in C with MPI and OpenMP, 1st Edition”
http://books.google.com.tw/books?id=tDxNyGSXg5IC&dq=parallel+programming+in+c+with+mpi+and+openmp&pg=PP1&ots=I0QWyWECXI&sig=YwyUkg9mKqWyxMnO1Hy7hkDc8dY&prev=http://www.google.com.tw/search%3Fsource%3Dig%26hl%3Dzh-TW%26q%3DParallel%2Bprogramming%2Bin%2BC%2Bwith%2Bmpi%2Band%2BopenMP%26meta%3D%26btnG%3DGoogle%2B%25E6%2590%259C%25E5%25B0%258B&sa=X&oi=print&ct=title#PPA529,M1