2. Introduction to MPI/Ajit Nayak//2
What is MPI?
The Message Passing Interface (MPI) is a
standardised and portable message-passing
system
It was designed by a group of researchers from
academia and industry to function on a wide
variety of parallel computers
The standard defines the syntax and semantics
of core library routines useful to a wide range
of users writing portable message-passing
programs in Fortran or C
The Environment
A cluster is used as the parallel computing
environment
A cluster is simply a network of computers
configured to work as one computer
This kind of environment is cheaper (no more
than a general network environment!) than buying
a purpose-built parallel computer
A parallel computer of this kind may be
deployed on the fly
Building the Environment
Choosing a network topology
• Network of Workstations (NOW)
Choosing an operating system
• Linux
Installing and configuring the environment
• The network setup for master and slaves
Choosing, installing and configuring an MPI
implementation
• mpich-1.2.5 (free download from https://www.mpich.org/)
Network of Workstations
The cluster should be viewed as a special,
standalone entity.
It should have its own networking facilities that
are not shared with other systems.
A cluster should have one Master node that
serves as the central focus of the operational
aspects of the cluster.
It should also have some Slave nodes, which
are used as workstations.
A Pictorial View
[Figure: the Master node and the Slave nodes connected through an Ethernet hub, forming a network of workstations]
Programming using MPI
MPI assumes that the number of processes created
(for a program) is independent of the processors
(computers) available (in the cluster).
The number of computers in a cluster is
decided by the network administrator
The number of processes for a program is
decided by the parallel programmer
i.e. no new processes can be created (and no new
processors attached) during the execution of a
program.
How does it work?
Each process is identified by a unique integer
id called its rank (0 .. p-1).
Each process is assigned to a computer
(processor) in a round-robin fashion.
Process zero always runs on the Master
[Figure: processes P0 .. P6 distributed around the cluster, with process 0 on the Master node]
Process Distribution
4 processors and 8 processes
Process 0 Assigned to Master
Process 1 Assigned to Slave 1
Process 2 Assigned to Slave 2
Process 3 Assigned to Slave 3
Process 4 Assigned to Master
Process 5 Assigned to Slave 1
Process 6 Assigned to Slave 2
Process 7 Assigned to Slave 3
Process Distribution
[Figure: the Master node holds (P0, P4); Slave 1 holds (P1, P5); Slave 2 holds (P2, P6); Slave 3 holds (P3, P7)]
These programs
are called SPMD
programs!
Program Structure contd…
int MPI_Init( int *argc, char ***argv )
• It initializes the parallel environment.
• It must be called before any other MPI
routine.
int MPI_Finalize( void )
• It cleans up all MPI state.
• After this invocation no MPI routine should
be called.
Sending information
int MPI_Send( void *buf, int count, MPI_Datatype
datatype, int dest, int tag, MPI_Comm comm )
• It is used for point-to-point communication, i.e. communication
between a pair of processes: one side sending and the other receiving.
• The send call blocks until the send buffer can be reclaimed
buf       initial address of the send buffer
count     number of elements to send
datatype  datatype of each entry
dest      rank of the destination
tag       message tag
comm      communicator
Receiving information
int MPI_Recv( void *buf, int count, MPI_Datatype
datatype, int source, int tag, MPI_Comm comm,
MPI_Status *status )
• It performs a blocking receive operation
buf       initial address of the receive buffer
count     maximum number of elements to receive
datatype  datatype of each entry
source    rank of the source
tag       message tag
comm      communicator (defines a communication domain)
status    return status (a structure holding information
          about the source, tag and error)
A First Program
Problem:
• The Root process receives a greeting message
from all other processes.
• Each process other than Root sends a greeting
message to the Root process
• The Root process prints the messages received
from the other processes
We will use blocking send and recv calls to
perform this task
The Complete Program
#include <stdio.h>
#include <string.h>
#include "mpi.h"
int main(int argc, char *argv[]) {
int myRank, numProcs, dest;
int Root = 0, tag = 0; int i;
char msg[30];
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
if (myRank != Root) {
strcpy(msg, "Hello World");
Greeting Program contd..
MPI_Send(msg, strlen(msg)+1, MPI_CHAR, Root, tag,
MPI_COMM_WORLD);
} // end if
else {
for (i = 1; i < numProcs; i++) {
MPI_Recv(msg, 30, MPI_CHAR, i, tag,
MPI_COMM_WORLD, &status);
printf("%s From process %d\n", msg, i);
} // end for
} // end else
MPI_Finalize();
} // End of program!
MPI Datatypes
MPI datatype C datatype
MPI_CHAR signed char
MPI_SHORT signed short int
MPI_INT signed int
MPI_LONG signed long int
MPI_UNSIGNED_CHAR unsigned char
MPI_UNSIGNED_SHORT unsigned short int
MPI_UNSIGNED unsigned int
MPI_UNSIGNED_LONG unsigned long int
MPI_FLOAT float
MPI_DOUBLE double
MPI_LONG_DOUBLE long double
Where / How to get it done?
Log in as the specified user
Open a file in the vi editor (vim greeting.c)
Write the source code, save and exit
At the command prompt issue the following commands
To compile
mpicc -o <objFileName> <CFileName>
To execute
mpiexec -n 4 ./<objFileName>
A Sum Program
Problem:
• To find the sum of n integers on p processes
• where p = n and p = 2^q
Problem Understanding
Number of processes (p) = 8, q = 3 (p = 2^q)
Processes:  0     1     2     3     4     5     6     7
Level 1:    0+1         2+3         4+5         6+7      (held at 0, 2, 4, 6)
Level 2:    1+5                     9+13                 (held at 0, 4)
Level 3:    6+22                                         (held at 0)
Program
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "mpi.h"
#define IS_INT(X) ( ((X) == (int)(X)) ? 1 : 0 )
#define LOG2(X) ( log10(X) / log10(2) )
int main(int argc, char *argv[]) {
int myRank, numProcs, Root = 0;
int source, destination, tag = 0;
int iLevel, level, nextLevel, value, sum = 0, ans;
float height; // height of the tree
MPI_Status status;
Program contd.
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
height=LOG2(numProcs);
Program contd.
/* if p != 2^q */
if (!IS_INT(height)) {
if (myRank == Root)
printf("\n Error: Number of processes should be a power of 2...\n");
MPI_Finalize();
exit(-1);
} // end error checking
sum = myRank;
Program contd.
// find sender and receiver at each level
for (iLevel = 0; iLevel < height; iLevel++) {
level = pow(2, iLevel);
if ((myRank % level) == 0) {
nextLevel = pow(2, (iLevel + 1));
if ((myRank % nextLevel) == 0) { // if receiver
source = myRank + level;
MPI_Recv(&value, 1, MPI_INT,
source, tag, MPI_COMM_WORLD, &status);
sum += value;
}
Program contd.
else { // if sender
destination = myRank - level;
MPI_Send(&sum, 1, MPI_INT,
destination, tag, MPI_COMM_WORLD);
} // end of else
}
} // end of for loop
if (myRank == Root)
printf("\n\nMy Rank is %d and the Final Sum %d\n\n", myRank, sum);
MPI_Finalize(); } // end of the program
Collective Communications
Collective communications transmit data
among all processes in a group.
A barrier synchronizes processes without passing data.
MPI provides the following collective
communication routines:
Global communication functions
Broadcast
Gather/Scatter etc.
Global reduction functions
Reduce
Broadcast
int MPI_Bcast( void *buf, int count, MPI_Datatype
datatype, int root, MPI_Comm comm );
• It broadcasts a message from root to all processes in the
group (including itself)
[Figure: the root's buffer A0 is copied into every process's buffer]
Broadcasting How-to?
. . .
int array[100];
int Root = 0;
MPI_Bcast(array, 100, MPI_INT, Root,
MPI_COMM_WORLD);
. . .
This call will broadcast an array of 100
integers to all processes
We don't require a corresponding receive call in
the other processes to receive this broadcast
Gather / Scatter
int MPI_Gather/Scatter( void *sendbuf, int
sendcount, MPI_Datatype sendtype, void
*recvbuf, int recvcount, MPI_Datatype
recvtype, int root, MPI_Comm comm );
[Figure: Gather collects blocks A0, A1, A2, A3 from the processes into one buffer at the root; Scatter distributes the root's blocks A0 A1 A2 A3, one to each process]
Gather / Scatter
• Each process (including Root) sends the
contents of its send buffer to the root
process
• The root process receives the messages and
stores them in rank order
• The recvcount argument at the root indicates the
number of items it receives from each process
[Figure: Gather brings 100 elements from each of Proc 0, Proc 1 and Proc 2 into a 300-element buffer at Proc 0; Scatter is the reverse]
Gather / Scatter contd.
if (myRank == 0)
rbuf = (int *) malloc(size * 100 * sizeof(int));
MPI_Gather(sendArray, 100, MPI_INT, rbuf,
100, MPI_INT, root, MPI_COMM_WORLD);
• Similarly, in the case of Scatter, the Root process
distributes items to all processes in rank order.
(Note: MPI_Scatter must be called by every process;
sendbuf is significant only at the root.)
rbuf = (int *) malloc(recvCount * sizeof(int));
MPI_Scatter(sendbuf, sendCount, MPI_INT, rbuf,
recvCount, MPI_INT, root, MPI_COMM_WORLD);
Gather to All
This is like gather, except all processes receive the
same result.
[Figure: AllGather collects A0, B0, C0, D0 from the four processes, and every process ends up holding A0 B0 C0 D0]
The data received from the jth process is placed in the jth
block of rbuf.
int MPI_Allgather(void *sbuf, int scount, MPI_Datatype stype, void
*rbuf, int rcount, MPI_Datatype rtype, MPI_Comm comm);
All to All
Each process sends distinct data to each of the
receivers.
[Figure: Alltoall on four processes holding rows A0..A3, B0..B3, C0..C3, D0..D3; afterwards process j holds the column A_j B_j C_j D_j]
The jth block sent from process i is received by
process j and is placed in the ith block of rbuf.
int MPI_Alltoall(void *sbuf, int scount, MPI_Datatype stype, void
*rbuf, int rcount, MPI_Datatype rtype, MPI_Comm comm);
Next Program
Problem
• Matrix-Vector Product ( A X = B )
Input: A[1..m, 1..n], X[1..n]
Output: B[1..m]
Algorithm:
1. An 8x8 matrix is divided into 4 (2x8) sub-matrices.
2. Each sub-matrix is stored in the local memory of one
process.
3. A local matrix-vector product is performed at each
process.
4. Then the results are gathered from each process onto the
Root process.
The Program
#include <stdio.h>
#include "mpi.h"
int main( int argc, char **argv ){
int a[2][8], b[8], cpart[2], ctotal[8];
int rank, size, i, k;
MPI_Init( &argc, &argv );
MPI_Comm_rank( MPI_COMM_WORLD, &rank);
MPI_Comm_size( MPI_COMM_WORLD, &size );
Matrix vector Prog contd..
if (size != 4) {
printf("Error!: # of processes must be equal to 4\n");
printf("Program aborting....\n");
MPI_Abort(MPI_COMM_WORLD, 1); }
for (i=0; i<2; i++) // setting values for a[1..2, 1..8]
for (k=0; k<8; k++) a[i][k] = rank*(k+1);
printf("process %d:\n", rank); // printing values of 'a'
for (i=0; i<2; i++){
for (k=0; k<8; k++) printf("%d\t", a[i][k]);
printf("\n");
}
printf("\n");
Matrix vector prod. Prog. contd..
for (k=0; k<8; k++) b[k] = k+1; // values of 'b'
for (i=0; i<2; i++){
cpart[i] = 0;
for (k=0; k<8; k++) cpart[i] = cpart[i] + a[i][k] * b[k];
}
printf("After multiplication, process %d:\n", rank);
for (k=0; k<2; k++)
printf("%d\t", cpart[k]);
printf("\n");
Matrix vector prod Prog contd..
MPI_Gather(cpart, 2, MPI_INT, ctotal, 2,
MPI_INT, 0, MPI_COMM_WORLD);
if (rank==0){
printf("Vector b:\n");
for (k=0; k<8; k++) printf("%d\t", b[k]);
printf("\nresult is\n");
for (k=0; k<8; k++)
printf("%d\t", ctotal[k]);
printf("\n");
} // end of if
MPI_Finalize();
} // end of main program
Predefined Operations
Name Meaning
MPI_MAX maximum
MPI_MIN minimum
MPI_SUM sum
MPI_PROD product
MPI_LAND logical and
MPI_BAND bit-wise and
MPI_LOR logical or
MPI_BOR bit-wise or
MPI_LXOR logical xor
MPI_BXOR bit-wise xor
MPI_MAXLOC max value and location
MPI_MINLOC min value and location
Practice Programs
Find the sum of the integers in the range (0:500000) using
blocking calls (send, recv)
• Method 1: (simple)
Divide different ranges among the different processes
Each process with rank greater than 0 sends its
result to the root process. Process 0 computes and prints the
final sum
• Method 2: (linear array)
Process n-1 sends its sum to process n-2; n-2 adds its
own sum to the received sum and sends the resulting sum
to n-3.
This continues until the sum reaches process 0.
Practice Programs
• Method 3: (hypercube) use the hypercube algorithm.
Find the sum of the above range of integers using
collective calls (reduce)
Modify the matrix-vector product program to work with
a matrix of any size.
Other Utility-Calls
double MPI_Wtime( void )
• Returns a double-precision floating-point number of
seconds elapsed since some arbitrary point of time in
the past.
• It can be used to find the execution time of
programs or sections of programs. The time
interval is measured by calling this
routine at the beginning and at the end of the
program/section and subtracting the values
returned.
Other Utility-Calls
int MPI_Barrier( MPI_Comm comm )
• MPI_Barrier blocks the calling process until all processes in the
communicator have entered the function.
int MPI_Sendrecv( void *sendbuf, int sendcount,
MPI_Datatype sendtype, int dest, int sendtag, void
*recvbuf, int recvcount, MPI_Datatype recvtype, int
source, int recvtag, MPI_Comm comm, MPI_Status
*status )
• Performs both a send and a receive. Can be matched with
ordinary send and recv calls.
• No deadlock arises, as it can when two blocking sends are
posted against each other.
References
The standardization forum:
• http://mpi-forum.org/
Software:
• https://www.mpich.org/
Books:
• Peter S. Pacheco, Parallel Programming with MPI,
Morgan Kaufmann Publishers
• M. Snir, S. Otto, S. Huss-Lederman, D. Walker, J. Dongarra,
MPI: The Complete Reference, The MIT Press,
Cambridge, Massachusetts, London