MPI is a language-independent communications protocol used to program parallel computers. It lets processes communicate and synchronize through point-to-point and collective communication primitives. This document covers installing MPI on a Linux cluster, describes common MPI functions for point-to-point communication (e.g. MPI_Send, MPI_Recv) and collective operations (e.g. MPI_Bcast, MPI_Reduce), and provides example MPI programs demonstrating basic point-to-point and collective communication.
3. Embedded and Parallel Systems Lab 3
MPI
MPI is a language-independent communications
protocol used to program parallel computers
Distributed memory
SPMD (Single Program, Multiple Data)
Fortran, C / C++
4. Embedded and Parallel Systems Lab 4
MPI Requirements and Supported Environments
Cluster Environment
Windows
Microsoft AD (Active Directory) server
Microsoft cluster server
Linux
NFS (Network FileSystem)
NIS (Network Information Service), also known as yellow pages
SSH
MPICH2
5. Embedded and Parallel Systems Lab 5
MPI Installation
http://www-unix.mcs.anl.gov/mpi/mpich/
Download mpich2-1.0.4p1.tar.gz
[shell]# tar -zxvf mpich2-1.0.4p1.tar.gz
[shell]# mkdir /home/yourhome/mpich2
[shell]# cd mpich2-1.0.4p1
[shell]# ./configure --prefix=/home/yourhome/mpich2 // installing into a directory you create yourself is recommended
[shell]# make
[shell]# make install
Next:
[shell]# cd ~yourhome // go to your own home directory
[shell]# vi .mpd.conf // create the file
with the contents
secretword=<secretword> (choose any secretword you like)
Ex:
secretword=abcd1234
6. Embedded and Parallel Systems Lab 6
MPI Installation
[shell]# chmod 600 .mpd.conf
[shell]# vi .bash_profile
Change PATH=$PATH:$HOME/bin
to PATH=$HOME/mpich2/bin:$PATH:$HOME/bin
Log out of the server and back in
[shell]# vi mpd.hosts // create the host-list file in your home directory
ex:
cluster1
cluster2
cluster3
cluster4
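With the files above in place, the mpd process-manager ring that ships with MPICH2 of this vintage is started from the head node roughly as follows (a sketch; the host count of 4 and the program name `hello` are examples, and these commands assume the cluster is already reachable over SSH):

```shell
# start an mpd daemon on 4 of the hosts listed in mpd.hosts
mpdboot -n 4 -f mpd.hosts
mpdtrace                     # list the hosts that joined the ring

# compile and launch an MPI program across 4 processes
mpicc hello.c -o hello
mpiexec -n 4 ./hello

mpdallexit                   # tear the ring down when finished
```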
7. Embedded and Parallel Systems Lab 7
MPI constructs
MPI
  Point-to-Point Communication
    Blocking: MPI_Send(), MPI_Recv()
    Nonblocking: MPI_Isend(), MPI_Irecv()
  Collective Communication
    Synchronization: MPI_Barrier()
    Data Exchange: MPI_Bcast(), MPI_Scatter(), MPI_Gather(), MPI_Alltoall()
    Collective Computation: MPI_Reduce()
  Process Group: MPI_Comm_group(), MPI_Comm_create(), MPI_Group_incl(), MPI_Group_rank(), MPI_Group_size(), MPI_Comm_free()
  Virtual Topology: MPI_Cart_create(), MPI_Cart_coords(), MPI_Cart_shift()
8. Embedded and Parallel Systems Lab 8
Basic structure of an MPI program
#include "mpi.h"
MPI_Init();
Do some work or MPI function
example:
MPI_Send() / MPI_Recv()
MPI_Finalize();
9. Embedded and Parallel Systems Lab 9
MPI Ethernet Control and Data Flow
Source : Douglas M. Pase, “Performance of Voltaire InfiniBand in IBM 64-Bit Commodity HPC Clusters,” IBM White
Papers, 2005
10. Embedded and Parallel Systems Lab 10
MPI Communicator
(figure: nine processes, ranks 0 through 8, grouped in the default communicator MPI_COMM_WORLD)
11. Embedded and Parallel Systems Lab 11
MPI Function
function int MPI_Init( int *argc, char ***argv)
purpose Initializes the MPI execution environment. Must be called before any other MPI function; it also passes main's command-line arguments (argc, argv) to every process.
parameters int *argc: number of arguments
char ***argv: argument values
return value int: MPI_SUCCESS (0) on success
function int MPI_Finalize()
purpose Terminates the MPI execution environment; must be called after all work is finished.
parameters (none)
return value int: MPI_SUCCESS (0) on success
12. Embedded and Parallel Systems Lab 12
MPI Function
function int MPI_Comm_size( MPI_Comm comm, int *size)
purpose Gets the total number of processes in the given communicator.
parameters comm: IN, e.g. MPI_COMM_WORLD
size: OUT, total number of processes
return value int: MPI_SUCCESS (0) on success
function int MPI_Comm_rank ( MPI_Comm comm, int *rank)
purpose Gets this process's own process ID (rank).
parameters comm: IN, e.g. MPI_COMM_WORLD
rank: OUT, this process's rank
return value int: MPI_SUCCESS (0) on success
13. Embedded and Parallel Systems Lab 13
MPI Function
function int MPI_Send(void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
purpose Sends data to the specified process, using standard mode.
parameters buf: IN, the data (variable) to send
count: IN, number of elements to send
datatype: IN, datatype of the elements sent
dest: IN, rank of the destination process
tag: IN, message tag (acts like a channel)
comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success
function int MPI_Recv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm,
MPI_Status *status)
purpose Receives data from the specified process.
parameters buf: OUT, buffer for the received data
count: IN, maximum number of elements to receive
datatype: IN, datatype of the elements received
source: IN, rank of the sending process
tag: IN, message tag (acts like a channel)
comm: IN, e.g. MPI_COMM_WORLD
status: OUT, receives the MPI_Status
return value int: MPI_SUCCESS (0) on success
14. Embedded and Parallel Systems Lab 14
MPI Function
Status: reports the source process ID and the tag of the received message; in C it uses the MPI_Status type
typedef struct MPI_Status {
int count;
int cancelled;
int MPI_SOURCE; // source rank
int MPI_TAG; // tag the source sent
int MPI_ERROR; // error code
} MPI_Status;
function double MPI_Wtime()
purpose Returns the current time as a floating-point number of seconds; usually used to measure a program's running time.
parameters (none)
return value double: the current wall-clock time
15. Embedded and Parallel Systems Lab 15
MPI Function
function int MPI_Type_commit(MPI_Datatype *datatype)
purpose Commits (registers) the datatype so it can be used in communication.
parameters datatype: INOUT, the new datatype
return value int: MPI_SUCCESS (0) on success
function int MPI_Type_free(MPI_Datatype *datatype)
purpose Frees the datatype.
parameters datatype: INOUT, the datatype to free
return value int: MPI_SUCCESS (0) on success
16. Embedded and Parallel Systems Lab 16
MPI Function
function int MPI_Type_contiguous (int count, MPI_Datatype oldtype, MPI_Datatype *newtype)
purpose Builds a new datatype by simply replicating an existing datatype (MPI_Datatype) a fixed number of times, i.e. it combines several elements of the same type into one unit.
parameters count: IN, size of the new type (how many oldtype elements it contains)
oldtype: IN, the existing datatype (MPI_Datatype)
newtype: OUT, the new datatype
return value int: MPI_SUCCESS (0) on success
18. Embedded and Parallel Systems Lab 18
MPI example : hello.c
#include "mpi.h"
#include <stdio.h>
#include <string.h>
#define SIZE 20
int main(int argc,char *argv[])
{
int numtasks, rank, dest, source, rc, count, tag=1;
char inmsg[SIZE];
char outmsg[SIZE];
double starttime, endtime;
MPI_Status Stat;
MPI_Datatype strtype;
MPI_Init(&argc,&argv); // start the MPI environment
MPI_Comm_rank(MPI_COMM_WORLD, &rank); // get this process's ID
MPI_Type_contiguous(SIZE, MPI_CHAR, &strtype); // define a new string datatype
MPI_Type_commit(&strtype); // commit the new string datatype
starttime=MPI_Wtime(); // get the current time
19. Embedded and Parallel Systems Lab 19
MPI example : hello.c
if (rank == 0) {
dest = 1;
source = 1;
strcpy(outmsg,"Who are you?");
// send a message to process 1
rc = MPI_Send(outmsg, 1, strtype, dest, tag, MPI_COMM_WORLD);
printf("process %d has sent message: %s\n",rank, outmsg);
// receive the message from process 1
rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
printf("process %d has received: %s\n",rank, inmsg);
}
else if (rank == 1) {
dest = 0;
source = 0;
strcpy(outmsg,"I am process 1");
rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
printf("process %d has received: %s\n",rank, inmsg);
rc = MPI_Send(outmsg, 1, strtype, dest, tag, MPI_COMM_WORLD);
printf("process %d has sent message: %s\n",rank, outmsg);
}
20. Embedded and Parallel Systems Lab 20
MPI example : hello.c
endtime=MPI_Wtime(); // get the end time
// use MPI_CHAR to count how much data was actually received
rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
printf("Task %d: Received %d char(s) from task %d with tag %d and used time %f\n",
rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG, endtime-starttime);
MPI_Type_free(&strtype); // free the string datatype
MPI_Finalize(); // shut down MPI
return 0;
}
process 0 has sent message: Who are you?
process 1 has received: Who are you?
process 1 has sent message: I am process 1
Task 1: Received 20 char(s) from task 0 with tag 1 and used time 0.001302
process 0 has received: I am process 1
Task 0: Received 20 char(s) from task 1 with tag 1 and used time 0.002133
21. Embedded and Parallel Systems Lab 21
Collective Communication Routines
function int MPI_Barrier(MPI_Comm comm)
purpose When a process reaches the barrier it blocks; once every process in the group has reached the barrier, they all unblock and continue.
parameters comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success
Types of Collective Operations:
Synchronization : processes wait until all members of the group have reached
the synchronization point.
Data Movement : broadcast, scatter/gather, all to all.
Collective Computation (reductions) : one member of the group collects data
from the other members and performs an operation (min, max, add, multiply,
etc.) on that data.
22. Embedded and Parallel Systems Lab 22
MPI_Bcast
function int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
purpose Broadcasts a message so that every process receives the same data.
parameters buffer: INOUT, the message to send; also serves as the receive buffer
count: IN, number of elements to send
datatype: IN, datatype of the data sent
root: IN, the process responsible for sending the message
comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success
23. Embedded and Parallel Systems Lab 23
MPI_Gather
function int MPI_Gather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void*
recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm
comm)
purpose Collects the messages sent by the individual processes and delivers them, combined, to the designated receiving process.
parameters sendbuf: IN, the message to send
sendcount: IN, number of elements to send
sendtype: IN, type of the elements sent
recvbuf: OUT, buffer for the received messages
recvcount: IN, number of elements received from each process
recvtype: IN, type of the elements received
root: IN, the process responsible for receiving the messages
comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success
27. Embedded and Parallel Systems Lab 27
MPI_Reduce
function int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype,
MPI_Op op, int root, MPI_Comm comm)
purpose Applies an operation (e.g. MPI_SUM to add everything up) while the data is being collected, and delivers the result to the root process.
parameters sendbuf: IN, the message to send
recvbuf: OUT, buffer for the result
count: IN, number of elements sent/received
datatype: IN, datatype of the data sent/received
op: IN, the operation to apply
root: IN, rank of the process that receives the result
comm: IN, e.g. MPI_COMM_WORLD
return value int: MPI_SUCCESS (0) on success
28. Embedded and Parallel Systems Lab 28
MPI_Reduce
MPI Reduction Operation C Data Types
MPI_MAX maximum integer, float
MPI_MIN minimum integer, float
MPI_SUM sum integer, float
MPI_PROD product integer, float
MPI_LAND logical AND integer
MPI_BAND bit-wise AND integer, MPI_BYTE
MPI_LOR logical OR integer
MPI_BOR bit-wise OR integer, MPI_BYTE
MPI_LXOR logical XOR integer
MPI_BXOR bit-wise XOR integer, MPI_BYTE
MPI_MAXLOC max value and location float, double and long double
MPI_MINLOC min value and location float, double and long double
29. Embedded and Parallel Systems Lab 29
MPI example : matrix.c(1)
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define RANDOM_SEED 2882 // random seed
#define MATRIX_SIZE 800 // square matrix: width equals height
#define NODES 4 // number of nodes; minimum is 1, don't use < 1
#define TOTAL_SIZE (MATRIX_SIZE * MATRIX_SIZE) // total number of matrix elements
#define CHECK
int main(int argc, char *argv[]){
int i,j,k;
int node_id;
int AA[MATRIX_SIZE][MATRIX_SIZE];
int BB[MATRIX_SIZE][MATRIX_SIZE];
int CC[MATRIX_SIZE][MATRIX_SIZE];
30. Embedded and Parallel Systems Lab 30
MPI example : matrix.c(2)
#ifdef CHECK
int _CC[MATRIX_SIZE][MATRIX_SIZE]; // sequential result, used to check the parallel result CC
#endif
int check = 1;
int print = 0;
int computing = 0;
double time,seqtime;
int numtasks;
int tag=1;
int node_size;
MPI_Status stat;
MPI_Datatype rowtype;
srand( RANDOM_SEED );
31. Embedded and Parallel Systems Lab 31
MPI example : matrix.c(3)
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &node_id);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks != NODES){
printf("Must specify %d processors. Terminating.\n", NODES);
MPI_Finalize();
return 0;
}
if (MATRIX_SIZE%NODES != 0){
printf("MATRIX_SIZE must be divisible by NODES (%d).\n", NODES);
MPI_Finalize();
return 0;
}
MPI_Type_contiguous(MATRIX_SIZE, MPI_INT, &rowtype); // the matrices hold ints, so build rowtype from MPI_INT
MPI_Type_commit(&rowtype);
32. Embedded and Parallel Systems Lab 32
MPI example : matrix.c(4)
/*create matrix A and Matrix B*/
if(node_id == 0){
for( i=0 ; i<MATRIX_SIZE ; i++){
for( j=0 ; j<MATRIX_SIZE ; j++){
AA[i][j] = rand()%10;
BB[i][j] = rand()%10;
}
}
}
/*send the matrix A and B to other node */
node_size = MATRIX_SIZE / NODES;
33. Embedded and Parallel Systems Lab 33
MPI example : matrix.c(5)
//send AA
if (node_id == 0) {
    for (i=1; i<NODES; i++)
        MPI_Send(&AA[i*node_size][0], node_size, rowtype, i, tag, MPI_COMM_WORLD);
} else {
    MPI_Recv(&AA[node_id*node_size][0], node_size, rowtype, 0, tag, MPI_COMM_WORLD, &stat);
}
//send BB
if (node_id == 0) {
    for (i=1; i<NODES; i++)
        MPI_Send(&BB, MATRIX_SIZE, rowtype, i, tag, MPI_COMM_WORLD);
} else {
    MPI_Recv(&BB, MATRIX_SIZE, rowtype, 0, tag, MPI_COMM_WORLD, &stat);
}
34. Embedded and Parallel Systems Lab 34
MPI example : matrix.c(6)
/*computing C = A * B*/
time = -MPI_Wtime();
for( i=node_id*node_size ; i<(node_id*node_size+node_size) ; i++){
for( j=0 ; j<MATRIX_SIZE ; j++){
computing = 0;
for( k=0 ; k<MATRIX_SIZE ; k++)
computing += AA[i][k] * BB[k][j];
CC[i][j] = computing;
}
}
// the send and receive buffers overlap inside CC, so MPI requires MPI_IN_PLACE:
// each node's own rows are taken in place, and every node ends up with all of CC
MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, CC, node_size, rowtype, MPI_COMM_WORLD);
time += MPI_Wtime();
37. Embedded and Parallel Systems Lab 37
MPI example : matrix.c(8)
/*print result */
#endif
if(node_id == 0){
printf("node_id=%d\ncheck=%s\nprocessing time:%f\n\n",
node_id, (check) ? "success!" : "failure!", time);
#ifdef CHECK
printf("sequential time:%f\n", seqtime);
#endif
}
MPI_Type_free(&rowtype);
MPI_Finalize();
return 0;
}
38. Embedded and Parallel Systems Lab 38
Reference
Top 500 http://www.top500.org/
Maarten Van Steen, Andrew S. Tanenbaum, “Distributed Systems: Principles
and Paradigms ”
System Threads Reference
http://www.unix.org/version2/whatsnew/threadsref.html
Semaphore http://www.mkssoftware.com/docs/man3/sem_init.3.asp
Richard Stones. Neil Matthew, “Beginning Linux Programming”
W. Richard Stevens, “Networking APIs:Sockets and XTI“
William W.-Y. Liang , “Linux System Programming”
Michael J. Quinn, “Parallel Programming in C with MPI and OpenMP”
Introduction to Parallel Computing
http://www.llnl.gov/computing/tutorials/parallel_comp/
39. Embedded and Parallel Systems Lab 39
Reference
MPI standard http://www-unix.mcs.anl.gov/mpi/
MPI http://www.llnl.gov/computing/tutorials/mpi/
40. Embedded and Parallel Systems Lab 40
books
Michael J. Quinn, “Parallel Programming in C with MPI and OpenMP, 1st Edition”
http://books.google.com.tw/books?id=tDxNyGSXg5IC&dq=parallel+programming+in+c+with+mpi+and+openmp&pg=PP1&ots=I0QWyWECXI&sig=YwyUkg9mKqWyxMnO1Hy7hkDc8dY&prev=http://www.google.com.tw/search%3Fsource%3Dig%26hl%3Dzh-TW%26q%3DParallel%2Bprogramming%2Bin%2BC%2Bwith%2Bmpi%2Band%2BopenMP%26meta%3D%26btnG%3DGoogle%2B%25E6%2590%259C%25E5%25B0%258B&sa=X&oi=print&ct=title#PPA529,M1