Parallel program design
Speaker:呂宗螢
Date:2007/06/01
Embedded and Parallel Systems Lab 2
Outline
● Introduction
● Parallel Algorithm Design
● Parallel keyword
● pthread
● OpenMP
● MPI
● Conclusion
Introduction
■ Why Use Parallel Computing?
● Save time
● Solve larger problems
● Provide concurrency
● Cost savings
● The rise of multi-core CPUs
◆ Intel® Core™2 duo
◆ Intel® Core™2 Quad
◆ AMD Opteron
◆ AMD Phenom
◆ Xbox360
◆ PS3
Introduction
■ Parallel computing
● It is the use of a parallel computer to reduce the time
needed to solve a single computational problem.
■ Parallel programming
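● Programming in a language that allows you to explicitly indicate how
different portions of the computation may be executed
concurrently by different processors.
■ A program is divided into n parts that can execute at the same time to
reduce the execution time; the final result is the same as that of the original program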
Introduction
Serial
Source : http://www.llnl.gov/computing/tutorials/parallel_comp
Introduction
■ Who’s Doing Parallel Computing
Introduction
■ What are they using it for?
Introduction
■ Common forms of parallelism
● Pipeline
● fork
● Thread
● Symmetric multiprocessors (SMP)
● Cluster computing
● Grid computing
◆ SETI@Home
◆ Folding@Home (ps3)
Introduction
■ Common parallel programming models
■ Classified by memory model
● Distributed memory
◆ Mainly message passing
➢ PVM (Parallel Virtual Machine)
➢ MPI (Message Passing Interface)
● Shared memory
◆ DSM (distributed shared memory)
◆ Fork
◆ thread
◆ OpenMP
Introduction
■ Flynn's Classical Taxonomy
● SISD: Single Instruction, Single Data
● SIMD: Single Instruction, Multiple Data
● MISD: Multiple Instruction, Single Data
● MIMD: Multiple Instruction, Multiple Data
Introduction
[Figure: SISD and SIMD]
Source : http://www.llnl.gov/computing/tutorials/parallel_comp
Introduction
[Figure: MISD and MIMD]
Source : http://www.llnl.gov/computing/tutorials/parallel_comp
Introduction
■ Amdahl’s Law
Best you could ever hope to do:
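Amdahl's law is commonly written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized and n is the number of processors. The following is a minimal sketch of that formula; the function names `amdahl_speedup`, `amdahl_limit`, and `print_amdahl_table` are illustrative assumptions, not part of the slides.

```c
#include <stdio.h>

/* Speedup predicted by Amdahl's law.
 * p: fraction of the serial run time that can be parallelized (0.0 to 1.0)
 * n: number of processors (n >= 1) */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Upper bound on the speedup as n grows without limit (requires p < 1). */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}

/* Print the predicted speedup for a few processor counts. */
void print_amdahl_table(double p)
{
    int counts[] = {1, 2, 4, 8, 16};
    for (int i = 0; i < 5; i++)
        printf("p=%.2f n=%2d speedup=%.2f\n",
               p, counts[i], amdahl_speedup(p, counts[i]));
}
```

Calling `print_amdahl_table(0.9)` from `main` shows how the speedup flattens as n grows, because the serial 10% of the work bounds it at `amdahl_limit(0.9)`.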
Parallel Algorithm Design
■ Ian Foster
■ Four-step process for designing parallel algorithms
1. Partitioning
2. Communication
3. Agglomeration
4. Mapping
■ General principles of parallelization
● Maximize processor utilization
● Minimize communication overhead
● Load balancing
Parallel Algorithm Design
■ Partitioning
● Process of dividing the computation and the data
into pieces.
● Domain decomposition
● Functional decomposition
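Domain decomposition can be sketched as splitting n data items into p contiguous blocks of nearly equal size, one block per task. This is a minimal sketch under that assumption; the helper names `block_low`, `block_size`, and `sum_by_blocks` are hypothetical and do not come from the slides.

```c
/* First index owned by task `id` when n items are split among p tasks. */
long block_low(int id, int p, long n)
{
    return (long)id * n / p;
}

/* Number of items owned by task `id`; block sizes differ by at most one. */
long block_size(int id, int p, long n)
{
    return block_low(id + 1, p, n) - block_low(id, p, n);
}

/* Sum an array block by block, the way p tasks would each handle one block.
 * The loop over `id` is sequential here; it only shows that the blocks
 * cover every item exactly once. */
long sum_by_blocks(const int *data, long n, int p)
{
    long total = 0;
    for (int id = 0; id < p; id++) {
        long low = block_low(id, p, n);
        long high = low + block_size(id, p, n);
        for (long i = low; i < high; i++)
            total += data[i];
    }
    return total;
}
```

With n = 10 and p = 4 the blocks start at indices 0, 2, 5, and 7, so no task holds more than one item more than any other, which keeps the load balanced.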
Problem
Parallel Algorithm Design
■ Communication
● Local communication
● Global communication
Parallel Algorithm Design
■ Agglomeration
● Increasing the locality (combining tasks that are
connected by a channel eliminates the communication between them)
● Combining sending and receiving tasks
Parallel Algorithm Design
■ Mapping
● Process of assigning tasks to processors
[Figure: task graph with tasks A through I, before and after mapping to processors]
Foster’s parallel algorithm design
[Figure: a problem passes through the four stages Partitioning, Communication, Agglomeration, and Mapping, shown on a task graph with tasks A through I]
Parallel Example : matrix
[Figure: matrix product A × B = C; the multiplication is split among processes P1 to P4 and the partial results are merged]
Decision Tree
Source : Michael J. Quinn, “Parallel Programming in C with MPI and OpenMP”
Parallel keyword
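■ private data
● Each process has its own private copy of the data, which is not affected by other processes
■ shared data
● The data is shared: every process can access it, and it is affected by what the other processes do
■ barrier
● Used for synchronization: a process that reaches the barrier waits there until all processes have reached it, and only then continues
■ reduction
● Combines the results computed by all processes (e.g. sum, max, min)
■ atomic
● Makes access to a memory location atomic, meaning that an access to that location is not affected by other processes; this avoids race conditions
■ critical
● Critical section: only one process at a time may execute the region, which avoids race conditions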
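The atomic and critical keywords above can be sketched with OpenMP pragmas, which are introduced later in the deck. This is a minimal sketch, assuming an OpenMP-capable compiler (for example gcc with -fopenmp); without OpenMP the pragmas are ignored and the code runs serially with the same result. The function names `count_even` and `find_max` are illustrative.

```c
/* Count the even values in data[0..n-1].
 * `count` is shared data; the atomic directive makes each increment
 * indivisible, so concurrent updates are not lost. */
int count_even(const int *data, int n)
{
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {   /* the loop index i is private data */
        if (data[i] % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}

/* Find the largest value in data[0..n-1] (requires n >= 1).
 * The compare-and-update is more than a single memory update, so it is
 * placed in a critical section: one thread at a time executes it. */
int find_max(const int *data, int n)
{
    int best = data[0];
    #pragma omp parallel for
    for (int i = 1; i < n; i++) {
        #pragma omp critical
        {
            if (data[i] > best)
                best = data[i];
        }
    }
    return best;
}
```

The implicit barrier at the end of each parallel loop guarantees that all threads have finished before the function returns its result.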
Thread
pthread
■ What is a Thread?
● A thread is a logical flow that runs in the context of a
process.
● Multiple threads can run concurrently in a single
process.
● Each thread has its own thread context
◆ a unique integer thread ID (TID)
◆ stack
◆ stack pointer
◆ program counter
◆ general-purpose registers
◆ condition codes
Source : William W.-Y. Liang , “Linux System Programming”
Threads vs. Processes
■ Process:
● When a process executes a fork call, a new copy of the
process is created with its own variables and its own PID.
● This new process is scheduled independently, and (in
general) executes almost independently of the process that
created it.
■ Thread:
● When we create a new thread in a process, the new thread
of execution gets its own stack (and hence local variables)
but shares global variables, file descriptors, signal handlers,
and its current directory state with the process that created
it.
Source : William W.-Y. Liang , “Linux System Programming”
pthread Function
Function: int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*func)(void *), void *arg)
Purpose: Create a new thread of execution
Parameters:
thread: receives the ID of the created thread
attr: thread attribute object; NULL selects the default attributes
func: thread function
arg: argument for the thread function
Return value: 0 on success; a nonzero error number on failure
Function: int pthread_join(pthread_t tid, void **thread_return)
Purpose: Blocks the calling thread until the specified thread terminates
Parameters:
tid: ID of the thread to wait for
thread_return: buffer for the value returned by the thread
Return value: 0 on success; a nonzero error number on failure
pthread Function
Function: void pthread_exit(void *retval)
Purpose: Terminates the calling thread
Parameters:
retval: the thread's return value; a thread that calls pthread_join on this thread receives it through thread_return
Return value: none
Function: pthread_t pthread_self(void);
Purpose: Returns the ID of the calling thread
Parameters: none
Return value: thread ID (pthread_t, an unsigned long int on Linux)
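The way a value travels from pthread_exit to pthread_join can be sketched as follows. This is a minimal sketch, assuming the program is linked with the pthread library (for example gcc -pthread); the names `square_worker` and `square_in_thread` are hypothetical helpers, not part of the slides.

```c
#include <pthread.h>
#include <stdlib.h>

/* Thread function: squares the int that arg points to and hands the
 * result back through pthread_exit. The result lives on the heap because
 * the thread's own stack disappears when the thread ends. */
static void *square_worker(void *arg)
{
    int x = *(int *)arg;
    int *result = malloc(sizeof *result);
    if (result != NULL)
        *result = x * x;
    pthread_exit(result);
}

/* Computes x * x in a new thread and stores it in *out.
 * Returns 0 on success, a pthread error number if create or join fails,
 * or -1 if the worker could not allocate memory. */
int square_in_thread(int x, int *out)
{
    pthread_t tid;
    void *ret = NULL;
    int rc = pthread_create(&tid, NULL, square_worker, &x);
    if (rc != 0)
        return rc;
    rc = pthread_join(tid, &ret);   /* ret receives the pthread_exit value */
    if (rc != 0)
        return rc;
    if (ret == NULL)
        return -1;
    *out = *(int *)ret;
    free(ret);
    return 0;
}
```

Passing `&x` is safe here only because `square_in_thread` joins the thread before `x` goes out of scope.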
Example: thread.c
#include <stdio.h>
#include <pthread.h>
char message[]="Example:create new thread";
void *thread_function(void *arg){
    pthread_t tid = pthread_self();
    printf("thread_function is running\n");
    /* pthread_t is an unsigned long int on Linux; cast it before printing */
    printf("new ID:%lu Argument is %s\n", (unsigned long)tid, (char*)arg);
    pthread_exit("new thread end\n");
}
int main(void){
    pthread_t new_thread;
    pthread_t master_thread = pthread_self();
    void *thread_result;
    pthread_create(&new_thread, NULL, thread_function, (void*)message);
    /* wait for the new thread and collect the value it passed to pthread_exit */
    pthread_join(new_thread, &thread_result);
    printf("\nmaster ID:%lu the new thread return value is:%s\n",
           (unsigned long)master_thread, (char*)thread_result);
    return 0;
}
pthread Attribute
Function: int pthread_attr_init(pthread_attr_t *attr);
Purpose: Initialize a thread attributes object.
Parameters:
attr: thread attribute object
Return value: 0 on success; a nonzero error number on failure
Function: int pthread_attr_destroy(pthread_attr_t *attr)
Purpose: Destroy a thread attributes object.
Parameters:
attr: thread attribute object
Return value: 0 on success; a nonzero error number on failure
pthread Attribute
Attribute: function, followed by its possible argument values
scope: Thread's scope.
    PTHREAD_SCOPE_SYSTEM, PTHREAD_SCOPE_PROCESS (Linux supports only PTHREAD_SCOPE_SYSTEM)
detachstate: Thread's detach state.
    PTHREAD_CREATE_JOINABLE: when the thread ends, its thread ID and exit status are kept until a thread in the process calls pthread_join on it
    PTHREAD_CREATE_DETACHED: when the thread ends, all of its resources are released
schedpolicy: Thread's scheduling policy.
    SCHED_OTHER: no priority
    SCHED_RR: round robin
    SCHED_FIFO: first in, first out
schedparam: Thread's scheduling parameters.
inheritsched: Thread's scheduling inheritance.
    PTHREAD_EXPLICIT_SCHED: the thread's scheduling attributes are taken from the thread attribute object (pthread_attr_t)
    PTHREAD_INHERIT_SCHED: the thread's scheduling attributes are inherited from the creating thread
guardsize: Thread's guard size (PAGESIZE bytes).
stackaddr: Thread's stack address.
stacksize: Thread's stack size.
Get pthread Attribute
■ int pthread_attr_getdetachstate(const pthread_attr_t *attr, int *detachstate);
■ int pthread_attr_getguardsize(const pthread_attr_t *attr, size_t *guardsize);
■ int pthread_attr_getinheritsched(const pthread_attr_t *attr, int
*inheritsched);
■ int pthread_attr_getschedparam(const pthread_attr_t *attr, struct
sched_param *param);
■ int pthread_attr_getschedpolicy(const pthread_attr_t *attr, int *policy);
■ int pthread_attr_getscope(const pthread_attr_t *attr, int *scope);
■ int pthread_attr_getstackaddr(const pthread_attr_t *attr, void **stackaddr);
■ int pthread_attr_getstacksize(const pthread_attr_t *attr, size_t *stacksize);
Set pthread Attribute
■ int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate);
■ int pthread_attr_setguardsize(pthread_attr_t *attr, size_t guardsize);
■ int pthread_attr_setinheritsched(pthread_attr_t *attr, int inheritsched);
■ int pthread_attr_setschedparam(pthread_attr_t *attr, const struct
sched_param *param);
■ int pthread_attr_setschedpolicy(pthread_attr_t *attr, int policy);
■ int pthread_attr_setscope(pthread_attr_t *attr, int scope);
■ int pthread_attr_setstackaddr(pthread_attr_t *attr, void *stackaddr);
■ int pthread_attr_setstacksize(pthread_attr_t *attr, size_t stacksize);
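The attribute functions above are typically used in the order init, set, create, destroy. The following is a minimal sketch of that sequence, assuming the program is linked with the pthread library; the names `double_value` and `run_with_attributes` are illustrative and not taken from the slides.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread function: doubles the int that arg points to. */
static void *double_value(void *arg)
{
    int *value = arg;
    *value *= 2;
    return NULL;
}

/* Runs double_value in a joinable thread whose stack is stack_bytes large.
 * Returns 0 on success or the error number of the first call that failed. */
int run_with_attributes(int *value, size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    if (rc == 0)
        rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, double_value, value);

    /* the attribute object is no longer needed once the thread exists */
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, NULL);
}
```

A stack size below the system minimum (PTHREAD_STACK_MIN) makes pthread_attr_setstacksize fail, which is why every return value is checked.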
Reference
■ System Threads Reference http://www.unix.org/version2/whatsnew/threadsref.html
■ Semaphore http://www.mkssoftware.com/docs/man3/sem_init.3.asp
■ Richard Stones, Neil Matthew, “Beginning Linux Programming”
■ William W.-Y. Liang , “Linux System Programming”
OpenMP
OpenMP
■ OpenMP 2.5
■ Multi-threaded, shared memory
■ Fortran, C / C++
■ Basic syntax
● #pragma omp directive [clause]
■ OpenMP requirements and supported environments
● Windows
◆ Visual Studio 2005 Standard
◆ Intel ® C++ Compiler 9.1
● Linux
◆ gcc 4.2.0
◆ Omni
● Xbox 360 & PS3
■ Add #include <omp.h> at the top of the program
■ Visual Studio 2005 Standard
● Project / Project Properties / Configuration Properties / C/C++ / Language
◆ Set OpenMP Support to Yes
OpenMP Constructs
Types of Work-Sharing Constructs
■ Loop:shares iterations of a loop
across the team. Represents a type of
"data parallelism".
Source : http://www.llnl.gov/computing/tutorials/openMP/
■ Sections:breaks work into separate,
discrete sections. Each section is executed
by a thread. Can be used to implement a
type of "functional parallelism".
Types of Work-Sharing Constructs
■ single: the enclosed code is executed by only one thread of the team (any one
thread, not necessarily the master thread)
Source : http://www.llnl.gov/computing/tutorials/openMP/
Loop work sharing
#pragma omp parallel for
for (int i = 0; i < 10000; i++)
    for (int j = 0; j < 100; j++)
        function(i);
is equivalent to
#pragma omp parallel
{ // the opening brace must be on its own line; it cannot follow "parallel" on the #pragma line
    #pragma omp for
    for (int i = 0; i < 10000; i++)
        for (int j = 0; j < 100; j++)
            function(i);
}
parallel for can only be used when the loop index is of type int and the number of iterations is known in advance
How the loop is divided on a CPU with two threads:
Thread 0 (Master)
for (i = 0; i < 5000; i++)
    for (int j = 0; j < 100; j++)
        function(i);
Thread 1
for (i = 5000; i < 10000; i++)
    for (int j = 0; j < 100; j++)
        function(i);
OpenMP example : log.cpp
#include <omp.h>
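#pragma omp parallel for num_threads(2) private(x, z, addr, ans) //split the y loop evenly between 2 threads; the inner variables must be private to avoid a race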
for (y=2;y<BufSizeY-2;y++)
for (x=2;x<BufSizeX-2;x++)
for (z=0;z<BufSizeBand;z++) {
addr=(y*BufSizeX+x)*BufSizeBand+z;
ans = (BYTE)(*(InBuf+addr))*16+
(BYTE)(*(InBuf+((y*BufSizeX+x+1)*BufSizeBand+z)))*(-2) +
(BYTE)(*(InBuf+((y*BufSizeX+x-1)*BufSizeBand+z)))*(-2) +
(BYTE)(*(InBuf+(((y+1)*BufSizeX+x)*BufSizeBand+z)))*(-2)+
(BYTE)(*(InBuf+(((y-1)*BufSizeX+x)*BufSizeBand+z)))*(-2)+
(BYTE)(*(InBuf+((y*BufSizeX+x+2)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+((y*BufSizeX+x-2)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y+2)*BufSizeX+x)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y-2)*BufSizeX+x)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y+1)*BufSizeX+x+1)*BufSizeBand+z)))*(-1) +
(BYTE)(*(InBuf+(((y+1)*BufSizeX+x-1)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y-1)*BufSizeX+x+1)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y-1)*BufSizeX+x-1)*BufSizeBand+z)))*(-1);
*(OutBuf+addr)=abs(ans)/8;
}
[Figure: Convert LoG image, showing the source image and the output image]
Sections work sharing
int main(int argc, char* argv[]) {
#pragma omp parallel sections
{
#pragma omp section
{
toPNG();
}
#pragma omp section
{
toJPG();
}
#pragma omp section
{
toTIF();
}
}
}
[Figure: the input image is processed by toPNG, toJPG, and toTIF in parallel]
OpenMP notice
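■ Data dependence
int Fe[10];
Fe[0] = 0;
Fe[1] = 1;
// wrong: iteration i reads Fe[i-1] and Fe[i-2], which another thread may not have written yet
#pragma omp parallel for num_threads(2)
for( i = 2; i < 10; ++ i )
    Fe[i] = Fe[i-1] + Fe[i-2];
■ Race condition
// wrong: every thread updates the shared variable sum without synchronization
#pragma omp parallel
{
    #pragma omp for
    for( int i = 0; i < 1000000; ++ i )
        sum += i;
}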
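One common way to remove the race on sum is the reduction clause, while the Fibonacci loop has to stay sequential because each element depends on the two before it. The following is a minimal sketch, assuming an OpenMP-capable compiler (without OpenMP the pragma is ignored and the result is unchanged); the function names `sum_below` and `fill_fibonacci` are illustrative.

```c
/* Sum of the integers 0 .. n-1.
 * reduction(+:sum) gives every thread a private partial sum and adds the
 * partial sums together at the end of the loop, so no update is lost. */
long long sum_below(int n)
{
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Fills fe[0..n-1] with Fibonacci numbers (requires n >= 2).
 * Iteration i needs the results of iterations i-1 and i-2, so this loop
 * is deliberately left sequential. */
void fill_fibonacci(int *fe, int n)
{
    fe[0] = 0;
    fe[1] = 1;
    for (int i = 2; i < n; i++)
        fe[i] = fe[i - 1] + fe[i - 2];
}
```

The accumulator is a long long because the sum of 0 to 999999 does not fit in a 32-bit int.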
OpenMP notice
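■ Deadlock
#pragma omp parallel
{
    int me; // declared inside the region, so it is already private to each thread
    me = omp_get_thread_num();
    if (me != 0) {
        // wrong: thread 0 skips this barrier, so the other threads wait here forever
        #pragma omp barrier
    }
    #pragma omp single
    printf("done\n");
}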
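The deadlock disappears when every thread in the team reaches the barrier. The following is a minimal sketch of that arrangement, assuming an OpenMP-capable compiler (without OpenMP the pragmas are ignored and the code runs on one thread); the function name `barrier_then_single` is hypothetical.

```c
/* Returns how many times the single block ran (expected: exactly once). */
int barrier_then_single(void)
{
    int runs = 0;   /* shared by all threads of the team */
    #pragma omp parallel
    {
        /* no thread branches around the barrier, so nobody waits forever */
        #pragma omp barrier

        /* one thread executes the block; the others wait at the implicit
         * barrier at the end of the single construct */
        #pragma omp single
        {
            runs++;
        }
    }
    return runs;
}
```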
OpenMP example:matrix(1)
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strcmp */
#define RANDOM_SEED 2882 /* random seed */
#define VECTOR_SIZE 4 /* square matrix: width equals height */
#define MATRIX_SIZE (VECTOR_SIZE * VECTOR_SIZE) /* total number of matrix elements */
int main(int argc, char *argv[]){
    int i,j,k;
    int *AA; /* input matrix A, stored row by row */
    int *BB; /* input matrix B */
    int *CC; /* result matrix C */
    int computing;
    int _vector_size = VECTOR_SIZE;
    int _matrix_size = MATRIX_SIZE;
OpenMP example:matrix(2)
    if(argc > 1){
        for( i = 1 ; i < argc ;){
            /* -s must be followed by a value */
            if(strcmp(argv[i],"-s") == 0 && i + 1 < argc){
                _vector_size = atoi(argv[i+1]);
                _matrix_size =_vector_size * _vector_size;
                i+=2;
            }
            else{
                printf("the only supported argument is:\n");
                printf("-s: the size of vector ex: -s 256\n");
                return 0;
            }
        }
    }
    AA =(int *)malloc(sizeof(int) * _matrix_size);
    BB =(int *)malloc(sizeof(int) * _matrix_size);
    CC =(int *)malloc(sizeof(int) * _matrix_size);
    if(AA == NULL || BB == NULL || CC == NULL){
        printf("out of memory\n");
        return 1;
    }
OpenMP example:matrix(3)
    srand( RANDOM_SEED );
    /* create matrix A and matrix B */
    for( i=0 ; i< _matrix_size ; i++){
        AA[i] = rand()%10;
        BB[i] = rand()%10;
    }
    /* computing C = A * B */
    double start = omp_get_wtime(); /* wall-clock time before the computation */
    #pragma omp parallel for private(computing, j , k)
    for( i=0 ; i < _vector_size ; i++){
        for( j=0 ; j < _vector_size ; j++){
            computing =0;
            for( k=0 ; k < _vector_size ; k++)
                computing += AA[ i*_vector_size + k ] * BB[ k*_vector_size + j ];
            CC[ i*_vector_size + j ] = computing;
        }
    }
    double elapsed = omp_get_wtime() - start; /* time spent in the parallel loop */
    printf("\nVector_size:%d\n", _vector_size);
    printf("Matrix_size:%d\n", _matrix_size);
    printf("Processing time:%f\n", elapsed);
    free(AA);
    free(BB);
    free(CC);
    return 0;
}
OpenMP Directive Table
Directive: Description
atomic: Specifies that a memory location will be updated atomically.
barrier: Synchronizes all threads in a team; all threads pause at the barrier until all threads execute the barrier.
critical: Specifies that code is only executed on one thread at a time.
flush: Specifies that all threads have the same view of memory for all shared objects.
for: Causes the work done in a for loop inside a parallel region to be divided among threads.
master: Specifies that only the master thread should execute a section of the program.
ordered: Specifies that code under a parallelized for loop should be executed like a sequential loop.
parallel: Defines a parallel region, which is code that will be executed by multiple threads in parallel.
sections: Identifies code sections to be divided among all threads.
single: Lets you specify that a section of code should be executed on a single thread, not necessarily the master thread.
threadprivate: Specifies that a variable is private to a thread.
Source: http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
OpenMP Clause Table
Clause: Description
copyin: Allows threads to access the master thread's value, for a threadprivate variable.
copyprivate: Broadcasts the value of one or more private variables from the thread that executed a single construct to the other threads in the team.
default: Specifies the behavior of unscoped variables in a parallel region.
firstprivate: Specifies that each thread should have its own instance of a variable, and that the variable should be initialized with the value it had before the parallel construct.
if: Specifies whether a loop should be executed in parallel or in serial.
lastprivate: Specifies that the enclosing context's version of the variable is set equal to the private version of whichever thread executes the final iteration (for-loop construct) or last section (#pragma sections).
nowait: Overrides the barrier implicit in a directive.
num_threads: Sets the number of threads in a thread team.
ordered: Required on a parallel for statement if an ordered directive is to be used in the loop.
private: Specifies that each thread should have its own instance of a variable.
reduction: Specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region.
schedule: Applies to the for directive. There are four kinds: static, dynamic, guided, runtime.
shared: Specifies that one or more variables should be shared among all threads.
Source: http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
Reference
■ Michael J. Quinn, “Parallel Programming in C with MPI and OpenMP”
■ Introduction to Parallel Computing http://www.llnl.gov/computing/tutorials/parallel_comp/
■ OpenMP standard http://www.openmp.org/drupal/
■ OpenMP MSDN tutorial http://msdn2.microsoft.com/en-us/library/tt15eb9t(VS.80).aspx
■ OpenMP tutorial http://www.llnl.gov/computing/tutorials/openMP/#DO
■ Kang Su Gatlin , Pete Isensee, “Reap the Benefits of Multithreading without
All the Work” ,MSDN Magazine
MPI
MPI
■ MPI is a language-independent communications
protocol used to program parallel computers
■ Distributed memory
■ SPMD(Single Program Multiple Data )
■ Fortran , C / C++
MPI requirements and supported environments
■ Cluster Environment
● Windows
◆ Microsoft AD (Active Directory) server
◆ Microsoft cluster server
● Linux
◆ NFS (Network File System)
◆ NIS (Network Information Services), also known as Yellow Pages
◆ SSH
◆ MPICH 2
MPI installation
http://www-unix.mcs.anl.gov/mpi/mpich/
Download mpich2-1.0.4p1.tar.gz
[shell]# tar -zxvf mpich2-1.0.4p1.tar.gz
[shell]# mkdir /home/yourhome/mpich2
[shell]# cd mpich2-1.0.4p1
[shell]# ./configure --prefix=/home/yourhome/mpich2 //installing into a directory you created yourself is recommended
[shell]# make
[shell]# make install
Next:
[shell]# cd ~yourhome //go to your own home directory
[shell]# vi .mpd.conf //create the file
with the contents
secretword=<secretword> (choose any secret word you like)
Ex:
secretword=abcd1234
MPI installation
[shell]# chmod 600 .mpd.conf
[shell]# vi .bash_profile
Change PATH=$PATH:$HOME/bin
to PATH=$HOME/mpich2/bin:$PATH:$HOME/bin
Log in to the server again
[shell]# vi mpd.hosts //create the host list file in your own home directory
ex:
cluster1
cluster2
cluster3
cluster4
MPI constructs
Basic structure of an MPI program
#include "mpi.h"
MPI_Init(&argc, &argv);
/* do some work or call MPI functions, for example MPI_Send() / MPI_Recv() */
MPI_Finalize();
MPI Ethernet Control and Data Flow
Source : Douglas M. Pase, “Performance of Voltaire InfiniBand in IBM 64-Bit Commodity HPC Clusters,” IBM White
Papers, 2005
MPI Communicator
[Figure: processes with ranks 0 to 8 inside the communicator MPI_COMM_WORLD]
MPI Function
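Function: int MPI_Init(int *argc, char ***argv)
Purpose: Initializes the MPI execution environment. It must be called before any other MPI function, and it can pass the command-line arguments of main (argc, argv) to all processes.
Parameters:
argc: pointer to the number of arguments
argv: pointer to the argument vector
Return value: int; MPI_SUCCESS (0) if the call succeeds
Function: int MPI_Finalize(void)
Purpose: Terminates the MPI execution environment; it must be called after all work is finished.
Parameters: none
Return value: int; MPI_SUCCESS (0) if the call succeeds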
MPI Function
Function: int MPI_Comm_size(MPI_Comm comm, int *size)
Purpose: Gets the total number of processes in the communicator.
Parameters:
comm: IN, MPI_COMM_WORLD
size: OUT, total number of processes
Return value: int; MPI_SUCCESS (0) if the call succeeds
Function: int MPI_Comm_rank(MPI_Comm comm, int *rank)
Purpose: Gets the process ID (rank) of the calling process.
Parameters:
comm: IN, MPI_COMM_WORLD
rank: OUT, ID of the current process
Return value: int; MPI_SUCCESS (0) if the call succeeds
MPI Function
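Function: int MPI_Send(void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
Purpose: Sends data to the specified process, using standard mode.
Parameters:
buf: IN, the data (variable) to send
count: IN, number of elements to send
datatype: IN, datatype of the elements sent
dest: IN, ID of the destination process
tag: IN, message tag (channel)
comm: IN, MPI_COMM_WORLD
Return value: int; MPI_SUCCESS (0) if the call succeeds
Function: int MPI_Recv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
Purpose: Receives data from the specified process.
Parameters:
buf: OUT, the buffer (variable) that receives the data
count: IN, number of elements to receive
datatype: IN, datatype of the elements received
source: IN, ID of the sending process
tag: IN, message tag (channel)
comm: IN, MPI_COMM_WORLD
status: OUT, the resulting MPI_Status
Return value: int; MPI_SUCCESS (0) if the call succeeds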
MPI Function
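■ Status: reports the ID of the source process and the tag of the message; in C it has the type MPI_Status
typedef struct MPI_Status {
    int count;
    int cancelled;
    int MPI_SOURCE; //ID of the source process
    int MPI_TAG; //tag sent by the source
    int MPI_ERROR; //error code
} MPI_Status;
Function: double MPI_Wtime()
Purpose: Returns the current time as a floating-point number of seconds; usually used to measure how long a program runs.
Parameters: none
Return value: double; the time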
MPI Function
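Function: int MPI_Type_commit(MPI_Datatype *datatype);
Purpose: Commits the datatype so that it can be used in communication.
Parameters:
datatype: INOUT, the new datatype
Return value: int; MPI_SUCCESS (0) if the call succeeds
Function: int MPI_Type_free(MPI_Datatype *datatype);
Purpose: Frees the datatype.
Parameters:
datatype: INOUT, the datatype to free
Return value: int; MPI_SUCCESS (0) if the call succeeds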
MPI Function
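Function: int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype)
Purpose: Builds a new datatype by repeating an existing datatype (MPI_Datatype) a fixed number of times, that is, it combines several items of the same type into one.
Parameters:
count: IN, size of the new type (the number of oldtype elements it consists of)
oldtype: IN, the existing datatype (MPI_Datatype)
newtype: OUT, the new datatype
Return value: int; MPI_SUCCESS (0) if the call succeeds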
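Steps for writing and running a program
1. Start the MPI environment
mpdboot -n 4 -f mpd.hosts //-n is the number of PCs to start, mpd.hosts is the list of PCs
2. Write the MPI program
vi hello.c
3. Compile
mpicc hello.c -o hello.o
4. Run the program
mpiexec -n 4 ./hello.o //-n is the number of processes
5. Shut down MPI
mpdallexit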
MPI example : hello.c
#include "mpi.h"
#include <stdio.h>
#include <string.h> /* strcpy */
#define SIZE 20
int main(int argc,char *argv[])
{
    int numtasks, rank, dest, source, rc, count, tag=1;
    char inmsg[SIZE];
    char outmsg[SIZE];
    double starttime, endtime;
    MPI_Status Stat;
    MPI_Datatype strtype;
    MPI_Init(&argc,&argv); //initialize the MPI environment
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); //get this process's own ID
    MPI_Type_contiguous(SIZE, MPI_CHAR, &strtype); //define a new datatype: a string of SIZE chars
    MPI_Type_commit(&strtype); //commit the new string datatype
    starttime=MPI_Wtime(); //get the current time
MPI example : hello.c
    if (rank == 0) {
        dest = 1;
        source = 1;
        strcpy(outmsg,"Who are you?");
        //send the message to process 1
        rc = MPI_Send(outmsg, 1, strtype, dest, tag, MPI_COMM_WORLD);
        printf("process %d has sended message: %s\n",rank, outmsg);
        //receive the message from process 1
        rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
        printf("process %d has received: %s\n",rank, inmsg);
    }
    else if (rank == 1) {
        dest = 0;
        source = 0;
        strcpy(outmsg,"I am process 1");
        rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
        printf("process %d has received: %s\n",rank, inmsg);
        rc = MPI_Send(outmsg, 1 , strtype, dest, tag, MPI_COMM_WORLD);
        printf("process %d has sended message: %s\n",rank, outmsg);
    }
MPI example : hello.c
    endtime=MPI_Wtime(); //get the end time
    //use MPI_CHAR to count how much data was actually received
    rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
    printf("Task %d: Received %d char(s) from task %d with tag %d and use time is %f \n",
           rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG, endtime-starttime);
    MPI_Type_free(&strtype); //free the string datatype
    MPI_Finalize(); //shut down MPI
    return 0;
}
process 0 has sended message: Who are you?
process 1 has received: Who are you?
process 1 has sended message: I am process 1
Task 1: Received 20 char(s) from task 0 with tag 1 and use time is 0.001302
process 0 has received: I am process 1
Task 0: Received 20 char(s) from task 1 with tag 1 and use time is 0.002133
openMP vs. MPI
                openMP   DSM        MPI
private data    Yes      Yes        Yes
share data      Yes      Yes        No
critical        Yes      Yes        No
atomic          Yes      Yes / No   No
barrier         Yes      Yes        Yes
reduction       Yes      Yes / No   Yes
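Function: int MPI_Barrier(MPI_Comm comm)
Purpose: A process that reaches the barrier blocks and waits for all other processes to reach it; once every process in the group has reached the barrier, the block is released and execution continues.
Parameters:
comm: IN, MPI_COMM_WORLD
Return value: int; MPI_SUCCESS (0) if the call succeeds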
■ Types of Collective Operations:
● Synchronization : processes wait until all members of the group have reached
the synchronization point.
● Data Movement : broadcast, scatter/gather, all to all.
● Collective Computation (reductions) : one member of the group collects data
from the other members and performs an operation (min, max, add, multiply,
etc.) on that data.
MPI_Bcast
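Function: int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int source, MPI_Comm comm)
Purpose: Broadcasts a message so that every process receives the same message.
Parameters:
buffer: INOUT, the message to send; it is also the buffer that receives the message
count: IN, number of elements to send
datatype: IN, datatype of the elements sent
source (called root in the standard): IN, the process that sends the message
comm: IN, MPI_COMM_WORLD
Return value: int; MPI_SUCCESS (0) if the call succeeds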
MPI_Gather
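Function: int MPI_Gather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int destine, MPI_Comm comm)
Purpose: Collects the messages sent by the individual processes, combines them, and delivers them to the specified receiving process.
Parameters:
sendbuf: IN, the message to send
sendcount: IN, number of elements to send
sendtype: IN, datatype sent
recvbuf: OUT, buffer that receives the messages
recvcount: IN, number of elements to receive from each process
recvtype: IN, datatype received
destine: IN, the process that receives the messages
comm: IN, MPI_COMM_WORLD
Return value: int; MPI_SUCCESS (0) if the call succeeds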
MPI_Gather
MPI_Allgather
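Function: int MPI_Allgather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
Purpose: Collects the messages sent by the individual processes, combines them, and broadcasts the result to all processes.
Parameters:
sendbuf: IN, the message to send
sendcount: IN, number of elements to send
sendtype: IN, datatype sent
recvbuf: OUT, buffer that receives the messages
recvcount: IN, number of elements to receive from each process
recvtype: IN, datatype received
comm: IN, MPI_COMM_WORLD
Return value: int; MPI_SUCCESS (0) if the call succeeds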
MPI_Allgather
MPI_Reduce
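Function: int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int destine, MPI_Comm comm)
Purpose: Applies an operation to the data while it is being collected (e.g. MPI_SUM adds the values up), then delivers the result to the destine process.
Parameters:
sendbuf: IN, the message to send
recvbuf: OUT, buffer that receives the result
count: IN, number of elements sent and received
datatype: IN, datatype of the elements sent and received
op: IN, the operation to perform
destine: IN, ID of the process that receives the result
comm: IN, MPI_COMM_WORLD
Return value: int; MPI_SUCCESS (0) if the call succeeds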
MPI_Reduce
MPI Reduction Operation: meaning; C Data Types
MPI_MAX: maximum; integer, float
MPI_MIN: minimum; integer, float
MPI_SUM: sum; integer, float
MPI_PROD: product; integer, float
MPI_LAND: logical AND; integer
MPI_BAND: bit-wise AND; integer, MPI_BYTE
MPI_LOR: logical OR; integer
MPI_BOR: bit-wise OR; integer, MPI_BYTE
MPI_LXOR: logical XOR; integer
MPI_BXOR: bit-wise XOR; integer, MPI_BYTE
MPI_MAXLOC: max value and location; float, double and long double
MPI_MINLOC: min value and location; float, double and long double
MPI example : matrix.c(1)
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define RANDOM_SEED 2882 /* random seed */
#define MATRIX_SIZE 800 /* square matrix: width equals height */
#define NODES 4 /* number of nodes; the minimum is 1 */
#define TOTAL_SIZE (MATRIX_SIZE * MATRIX_SIZE) /* total number of matrix elements */
#define CHECK
int main(int argc, char *argv[]){
int i,j,k;
int node_id;
/* static storage: several 800 x 800 int matrices are too large for a typical stack */
static int AA[MATRIX_SIZE][MATRIX_SIZE];
static int BB[MATRIX_SIZE][MATRIX_SIZE];
static int CC[MATRIX_SIZE][MATRIX_SIZE];
MPI example : matrix.c(2)
#ifdef CHECK
static int _CC[MATRIX_SIZE][MATRIX_SIZE]; /* sequential result, used to check the parallel result CC */
#endif
int check = 1;
int print = 0;
int computing = 0;
double time,seqtime;
int numtasks;
int tag=1;
int node_size;
MPI_Status stat;
MPI_Datatype rowtype;
srand( RANDOM_SEED );
MPI example : matrix.c(3)
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &node_id);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks != NODES){
printf("Must specify %d processors. Terminating.\n", NODES);
MPI_Finalize();
return 0;
}
if (MATRIX_SIZE%NODES !=0){
printf("MATRIX_SIZE must be divisible by NODES (%d)\n", NODES);
MPI_Finalize();
return 0;
}
/* the matrices hold int values, so one row is MATRIX_SIZE contiguous MPI_INT elements */
MPI_Type_contiguous(MATRIX_SIZE, MPI_INT, &rowtype);
MPI_Type_commit(&rowtype);
MPI example : matrix.c(4)
/*create matrix A and Matrix B*/
if(node_id == 0){
for( i=0 ; i<MATRIX_SIZE ; i++){
for( j=0 ; j<MATRIX_SIZE ; j++){
AA[i][j] = rand()%10;
BB[i][j] = rand()%10;
}
}
}
/*send the matrix A and B to other node */
node_size = MATRIX_SIZE / NODES;
MPI example : matrix.c(5)
    // send AA: node 0 sends every other node its block of node_size rows
    if (node_id == 0) {
        for (i=1; i<NODES; i++)
            MPI_Send(&AA[i*node_size][0], node_size, rowtype, i, tag,
                     MPI_COMM_WORLD);
    } else {
        MPI_Recv(&AA[node_id*node_size][0], node_size, rowtype, 0, tag,
                 MPI_COMM_WORLD, &stat);
    }
    // send BB: every node needs the whole matrix
    if (node_id == 0) {
        for (i=1; i<NODES; i++)
            MPI_Send(&BB, MATRIX_SIZE, rowtype, i, tag,
                     MPI_COMM_WORLD);
    } else {
        MPI_Recv(&BB, MATRIX_SIZE, rowtype, 0, tag,
                 MPI_COMM_WORLD, &stat);
    }
MPI example : matrix.c(6)
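    /* compute C = A * B: each node computes its own block of rows */
    time = -MPI_Wtime();
    for( i=node_id*node_size ; i<(node_id*node_size+node_size) ; i++){
        for( j=0 ; j<MATRIX_SIZE ; j++){
            computing = 0;
            for( k=0 ; k<MATRIX_SIZE ; k++)
                computing += AA[i][k] * BB[k][j];
            CC[i][j] = computing;
        }
    }
    /* gather every block into CC on all nodes; MPI_IN_PLACE is used because
       each node's rows already sit at their final position in CC, and MPI
       does not allow the send and receive buffers to overlap */
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, CC,
                  node_size, rowtype, MPI_COMM_WORLD);
    time += MPI_Wtime();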
MPI example : matrix.c(7)
#ifdef CHECK
    /* sequential computation on node 0, used as the reference result */
    seqtime = -MPI_Wtime();
    if(node_id == 0){
        for( i=0 ; i<MATRIX_SIZE ; i++){
            for( j=0 ; j<MATRIX_SIZE ; j++){
                computing = 0;
                for( k=0 ; k<MATRIX_SIZE ; k++)
                    computing += AA[i][k] * BB[k][j];
                _CC[i][j] = computing;
            }
        }
    }
    seqtime += MPI_Wtime();
    /* check result: compare the parallel result with the sequential one */
    if(node_id == 0){
        for( i=0 ; i<MATRIX_SIZE; i++){
            for( j=0 ; j<MATRIX_SIZE ; j++){
                if( CC[i][j] != _CC[i][j]){
                    check = 0;
                    break;
                }
            }
        }
    }
MPI example : matrix.c(8)
#endif
    /* print result */
    if(node_id ==0){
        printf("node_id=%d\ncheck=%s\nprocessing time:%f\n\n",node_id,
               (check)?"success!":"failure!", time);
#ifdef CHECK
        printf("sequential time:%f\n", seqtime);
#endif
    }
    MPI_Type_free(&rowtype);
    MPI_Finalize();
    return 0;
}
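The row-block decomposition used by matrix.c can be tried without an MPI installation. The sketch below is a serial stand-in, not the author's code: it visits each node's block of rows in turn and fills the same rows of C that the node would compute. The helper names first_row, multiply_rows and multiply_by_blocks, and the flat row-major arrays, are assumptions introduced for illustration.

```c
#include <assert.h>

/* Hypothetical helper: first row owned by `rank` when n rows are split
   evenly across nprocs ranks (same divisibility rule as matrix.c). */
static int first_row(int rank, int nprocs, int n) {
    assert(nprocs > 0 && n % nprocs == 0);
    return rank * (n / nprocs);
}

/* Compute rows [row_begin, row_end) of the n x n product c = a * b.
   All matrices are stored row-major in flat arrays. */
static void multiply_rows(const int *a, const int *b, int *c,
                          int n, int row_begin, int row_end) {
    for (int i = row_begin; i < row_end; i++) {
        for (int j = 0; j < n; j++) {
            int sum = 0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Serial stand-in for the parallel run: process each rank's block in turn.
   In the MPI version every rank would execute only its own iteration. */
static void multiply_by_blocks(const int *a, const int *b, int *c,
                               int n, int nprocs) {
    assert(nprocs > 0 && n % nprocs == 0);
    int rows = n / nprocs; /* rows per rank, node_size in matrix.c */
    for (int rank = 0; rank < nprocs; rank++) {
        int begin = first_row(rank, nprocs, n);
        multiply_rows(a, b, c, n, begin, begin + rows);
    }
}
```

Because each block writes a disjoint set of rows of C, the blocks can run in any order or at the same time, which is what makes this decomposition suitable for MPI.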
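Conclusion
■ Coming up with a good parallel algorithm is very difficult.
■ Development and debugging tools are generally inadequate.
■ A new generation of languages
● IBM's X10, Sun's Fortress, Cray's Chapel
◆ X10 is a language built as an extension of Java 1.4
async(place.factory.place(1)){
    for (int i=1 ; i<=10 ; i+=2 )
        ans += i;
}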
Reference
■ Top 500 http://www.top500.org/
■ Maarten Van Steen, Andrew S. Tanenbaum, “Distributed Systems: Principles and Paradigms”
■ System Threads Reference http://www.unix.org/version2/whatsnew/threadsref.html
■ Semaphore http://www.mkssoftware.com/docs/man3/sem_init.3.asp
■ Richard Stones, Neil Matthew, “Beginning Linux Programming”
■ W. Richard Stevens, “Networking APIs: Sockets and XTI”
■ William W.-Y. Liang, “Linux System Programming”
■ Michael J. Quinn, “Parallel Programming in C with MPI and OpenMP”
■ Introduction to Parallel Computing http://www.llnl.gov/computing/tutorials/parallel_comp/
■ MPI standard http://www-unix.mcs.anl.gov/mpi/
■ MPI http://www.llnl.gov/computing/tutorials/mpi/
■ OpenMP standard http://www.openmp.org/drupal/
■ OpenMP MSDN tutorial http://msdn2.microsoft.com/en-us/library/tt15eb9t(VS.80).aspx
■ OpenMP tutorial http://www.llnl.gov/computing/tutorials/openMP/#DO
■ Kang Su Gatlin, Pete Isensee, “Reap the Benefits of Multithreading without All the Work”, MSDN Magazine
■ Gary Anthes, “Languages for Supercomputing Get 'Suped' Up”, Computerworld, March 12, 2007
■ IBM X10 research http://domino.research.ibm.com/comm/research_projects.nsf/pages/x10.X10-presentations.html
The End
Thank you very much!
Appendix
■ Pipeline
Pipeline

Parallel program design

  • 2. Embedded and Parallel Systems Lab 2 Outline ● Introduction ● Parallel Algorithm Design ● Parallel keyword ● pthread ● OpenMP ● MPI ● Conclusion
  • 3. Embedded and Parallel Systems Lab 3 Introduction ■ Why Use Parallel Computing? ● Save time ● Solve larger problems ● Provide concurrency ● Cost savings ● Multi-core CPU掘起 ◆ Intel® Core™2 duo ◆ Intel® Core™2 Quad ◆ AMD Opteron ◆ AMD Phenom ◆ Xbox360 ◆ PS3
  • 4. Embedded and Parallel Systems Lab 4 Introduction ■ Parallel computing ● It is the use of a parallel computer to reduce the time needed to solve a single computational problem. ■ Parallel programming ● It is a language that allows you to explicitly indicate how different portions of the computation may be executed concurrently by different processors. ■ 將一個程式分成n個不同的部份,使之能夠同時執 行降低執行時間,其最後結果與原本程式相同
  • 5. Embedded and Parallel Systems Lab 5 Introduction Serial Source : http://www.llnl.gov/computing/tutorials/parallel_comp
  • 6. Embedded and Parallel Systems Lab 6 Introduction ■ Who’s Doing Parallel Computing
  • 7. Embedded and Parallel Systems Lab 7 Introduction ■ What are the using if for?
  • 8. Embedded and Parallel Systems Lab 8 Introduction ■ 常見的平行 ● 管線(Pipeline) ● fork ● 執行緒(Thread) ● 對稱式多處理機 (Symmetric MultiProcessors, SMP) ● 叢集運算(Cluster) ● 網格運算(Grid) ◆ SETI@Home ◆ Folding@Home (ps3)
  • 9. Embedded and Parallel Systems Lab 9 Introduction ■ 常見的平行程式 ■ 以記憶體來分 ● 分散式記憶體為主(distribute shared) ◆ 訊息傳遞(message passing)為主 ➢ PVM (Parallel Virtual Machine ) ➢ MPI (Message Passing Interface) ● 以共享記憶體為主(shared memory ) ◆ DSM (distribute shared memory) ◆ Fork ◆ thread ◆ OpenMP
  • 10. Embedded and Parallel Systems Lab 10 Introduction ■ Flynn's Classical Taxonomy M I M D Multiple Instruction, Multiple Data M I S D Multiple Instruction, Single Data S I M D Single Instruction, Multiple Data S I S D Single Instruction, Single Data
  • 11. Embedded and Parallel Systems Lab 11 Introduction SISD SIMD Source : http://www.llnl.gov/computing/tutorials/parallel_comp
  • 12. Embedded and Parallel Systems Lab 12 Introduction M I S D M I M D Source : http://www.llnl.gov/computing/tutorials/parallel_comp
  • 13. Embedded and Parallel Systems Lab 13 Introduction ■ Amdahl’s Law Best you could ever hope to do:
  • 14. Parallel Algorithm Design ■ Ian Foster's four-step process for designing parallel algorithms 1. Partitioning 2. Communication 3. Agglomeration 4. Mapping ■ General principles of parallelization ● Maximize processor utilization ● Minimize communication overhead ● Load balancing
  • 15. Parallel Algorithm Design ■ Partitioning ● The process of dividing the computation and the data into pieces. ● Domain decomposition ● Functional decomposition [Figure: a problem divided into pieces]
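Domain decomposition usually comes down to index arithmetic. The following is a minimal sketch, assuming a one-dimensional block decomposition; the names `struct block` and `block_range` are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [lo, hi) owned by one task. */
struct block {
    size_t lo;
    size_t hi;
};

/*
 * Block-wise domain decomposition: split n items among p tasks so that
 * the block sizes differ by at most one. Task `rank` (0-based) owns [lo, hi).
 */
struct block block_range(size_t n, size_t p, size_t rank)
{
    assert(p > 0 && rank < p);
    size_t base = n / p;   /* every task gets at least this many items */
    size_t extra = n % p;  /* the first `extra` tasks get one more item */
    struct block b;
    b.lo = rank * base + (rank < extra ? rank : extra);
    b.hi = b.lo + base + (rank < extra ? 1 : 0);
    return b;
}
```

With 10 items and 4 tasks, the blocks are [0, 3), [3, 6), [6, 8) and [8, 10), so no task holds more than one item above any other, which serves the load-balancing principle listed earlier.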
  • 16. Parallel Algorithm Design ■ Communication ● Local communication ● Global communication
  • 17. Parallel Algorithm Design ■ Agglomeration ● Increasing locality (combining tasks that are connected by a channel eliminates the communication between them) ● Combining sending and receiving tasks
  • 18. Parallel Algorithm Design ■ Mapping ● The process of assigning tasks to processors [Figure: tasks A to I assigned to processors]
  • 19. Foster's parallel algorithm design [Figure: a problem taken through Partitioning, Communication, Agglomeration, and Mapping, using tasks A to I]
  • 20. Parallel Example: matrix [Figure: the product A × B = C split across processes P1 to P4, with the partial results merged]
  • 21. Decision Tree [Figure] Source: Michael J. Quinn, "Parallel Programming in C with MPI and OpenMP"
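  • 22. Parallel keywords ■ private data ● Each process has its own independent copy of the data, unaffected by other processes. ■ shared data ● The data is shared: every process can access it, and it is affected by what the other processes do. ■ barrier ● Used for synchronization: a process that reaches the barrier waits there until all processes have reached it, and only then do they continue. ■ reduction ● Combines the results computed by all processes (e.g. sum, max, min). ■ atomic ● Makes accesses to a memory location atomic, so that an access cannot be disturbed by other processes; this avoids race conditions. ■ critical ● A critical section: only one process at a time may execute the region; this avoids race conditions.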
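The atomic and critical keywords can be illustrated with OpenMP, which the deck introduces later. This is a minimal sketch, not code from the slides; the function names `count_even` and `max_value` are hypothetical. If the compiler's OpenMP support is not enabled, the pragmas are ignored and the functions run serially with the same results.

```c
#include <assert.h>
#include <stddef.h>

/* Count the even values in a[0..n) using an atomic update. */
long count_even(const int *a, size_t n)
{
    long count = 0;
    long i;
    assert(a != NULL || n == 0);
    #pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        if (a[i] % 2 == 0) {
            #pragma omp atomic
            count++;            /* one indivisible read-modify-write */
        }
    }
    return count;
}

/* Largest value in a[0..n), n > 0, using a critical section. */
int max_value(const int *a, size_t n)
{
    int best;
    long i;
    assert(a != NULL && n > 0);
    best = a[0];
    #pragma omp parallel for
    for (i = 1; i < (long)n; i++) {
        #pragma omp critical
        {
            /* compare-and-update must not interleave between threads */
            if (a[i] > best)
                best = a[i];
        }
    }
    return best;
}
```

`atomic` fits a single simple update such as `count++`; the compare-then-assign in `max_value` spans two steps, so it needs the heavier `critical` region.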
  • 23. Thread
  • 24. pthread ■ What is a Thread? ● A thread is a logical flow that runs in the context of a process. ● Multiple threads can run concurrently in a single process. ● Each thread has its own thread context ◆ a unique integer thread ID (TID) ◆ stack ◆ stack pointer ◆ program counter ◆ general-purpose registers ◆ condition codes Source: William W.-Y. Liang, "Linux System Programming"
  • 25. Threads vs. Processes ■ Process: ● When a process executes a fork call, a new copy of the process is created with its own variables and its own PID. ● This new process is scheduled independently, and (in general) executes almost independently of the process that created it. ■ Thread: ● When we create a new thread in a process, the new thread of execution gets its own stack (and hence local variables) but shares global variables, file descriptors, signal handlers, and its current directory state with the process that created it. Source: William W.-Y. Liang, "Linux System Programming"
  • 26. pthread Function
      Function: int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*func)(void *), void *arg)
      Purpose: Create a new thread of execution.
      Parameters: thread: receives the ID of the created thread; attr: thread attribute object (NULL selects the default attributes); func: thread function; arg: argument passed to the thread function
      Return value: 0 on success; a nonzero error number on failure

      Function: int pthread_join(pthread_t tid, void **thread_return)
      Purpose: Block the calling thread until the specified thread terminates.
      Parameters: tid: ID of the thread to wait for; thread_return: buffer for the returned value
      Return value: 0 on success; a nonzero error number on failure
  • 27. pthread Function
      Function: void pthread_exit(void *retval)
      Purpose: Terminate the calling thread.
      Parameters: retval: the thread's return value; a thread that calls pthread_join on this thread receives it through thread_return
      Return value: none

      Function: pthread_t pthread_self(void);
      Purpose: Return the ID of the current thread.
      Parameters: none
      Return value: thread ID (pthread_t, an unsigned long int on Linux)
  • 28. Example: thread.c
      #include <stdio.h>
      #include <pthread.h>

      char message[] = "Example:create new thread";

      void *thread_function(void *arg) {
          pthread_t tid = pthread_self();
          printf("thread_function is running\n");
          /* pthread_t is an unsigned long on Linux; cast so the format matches */
          printf("new ID:%lu Argument is %s\n", (unsigned long)tid, (char *)arg);
          pthread_exit("new thread end\n");
      }

      int main(void) {
          pthread_t new_thread;
          pthread_t master_thread = pthread_self();
          void *thread_result;
          pthread_create(&new_thread, NULL, thread_function, (void *)message);
          pthread_join(new_thread, &thread_result);
          printf("\nmaster ID:%lu the new thread return value is:%s\n",
                 (unsigned long)master_thread, (char *)thread_result);
          return 0;
      }
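As a variant of thread.c, the sketch below checks the return codes of `pthread_create` and `pthread_join` and passes data in and out through a struct. The names `struct square_job`, `square_worker`, and `square_in_thread` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Argument and result share one struct owned by the caller. */
struct square_job {
    int input;
    int output;
};

static void *square_worker(void *arg)
{
    struct square_job *job = arg;
    job->output = job->input * job->input;
    return job;  /* returning from the thread function acts like pthread_exit(job) */
}

/* Returns 0 on success, or the error number of the failing pthread call. */
int square_in_thread(int x, int *result)
{
    struct square_job job = { x, 0 };
    pthread_t tid;
    void *ret = NULL;
    int err;

    assert(result != NULL);
    err = pthread_create(&tid, NULL, square_worker, &job);
    if (err != 0)
        return err;
    err = pthread_join(tid, &ret);  /* job stays valid until the join returns */
    if (err != 0)
        return err;
    *result = ((struct square_job *)ret)->output;
    return 0;
}
```

Keeping the job on the caller's stack is safe here only because the caller joins before returning; a detached thread would need heap-allocated data instead.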
  • 29. pthread Attribute
      Function: int pthread_attr_init(pthread_attr_t *attr);
      Purpose: Initialize a thread attributes object.
      Parameters: attr: thread attribute object
      Return value: 0 on success; a nonzero error number on failure

      Function: int pthread_attr_destroy(pthread_attr_t *attr)
      Purpose: Destroy a thread attributes object.
      Parameters: attr: thread attribute object
      Return value: 0 on success; a nonzero error number on failure
  • 30. pthread Attribute (on the original slide the default argument was shown in blue)
      detachstate: the thread's detach state.
          PTHREAD_CREATE_JOINABLE: when the thread ends, its thread ID and exit status are kept until a thread in the process calls pthread_join on it
          PTHREAD_CREATE_DETACHED: when the thread ends, all of its resources are released
      schedpolicy: the thread's scheduling policy.
          SCHED_OTHER: no priority
          SCHED_RR: round robin
          SCHED_FIFO: first in, first out
      schedparam: the thread's scheduling parameters
      inheritsched: the thread's scheduling inheritance.
          PTHREAD_EXPLICIT_SCHED: the scheduling attributes are taken from the thread attribute object (pthread_attr_t)
          PTHREAD_INHERIT_SCHED: the scheduling attributes are inherited from the creating thread
      scope: the thread's scope. PTHREAD_SCOPE_SYSTEM or PTHREAD_SCOPE_PROCESS; Linux supports only PTHREAD_SCOPE_SYSTEM
      guardsize: the thread's guard size (PAGESIZE bytes)
      stackaddr: the thread's stack address
      stacksize: the thread's stack size
  • 31. Get pthread Attribute
      ■ int pthread_attr_getdetachstate(const pthread_attr_t *attr, int *detachstate);
      ■ int pthread_attr_getguardsize(const pthread_attr_t *attr, size_t *guardsize);
      ■ int pthread_attr_getinheritsched(const pthread_attr_t *attr, int *inheritsched);
      ■ int pthread_attr_getschedparam(const pthread_attr_t *attr, struct sched_param *param);
      ■ int pthread_attr_getschedpolicy(const pthread_attr_t *attr, int *policy);
      ■ int pthread_attr_getscope(const pthread_attr_t *attr, int *scope);
      ■ int pthread_attr_getstackaddr(const pthread_attr_t *attr, void **stackaddr);
      ■ int pthread_attr_getstacksize(const pthread_attr_t *attr, size_t *stacksize);
  • 32. Set pthread Attribute
      ■ int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate);
      ■ int pthread_attr_setguardsize(pthread_attr_t *attr, size_t guardsize);
      ■ int pthread_attr_setinheritsched(pthread_attr_t *attr, int inheritsched);
      ■ int pthread_attr_setschedparam(pthread_attr_t *attr, const struct sched_param *param);
      ■ int pthread_attr_setschedpolicy(pthread_attr_t *attr, int policy);
      ■ int pthread_attr_setscope(pthread_attr_t *attr, int scope);
      ■ int pthread_attr_setstackaddr(pthread_attr_t *attr, void *stackaddr);
      ■ int pthread_attr_setstacksize(pthread_attr_t *attr, size_t stacksize);
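The get and set functions are normally used between `pthread_attr_init` and `pthread_attr_destroy`. The sketch below shows that life cycle for two of the attributes; the function name `attr_roundtrip` and its parameters are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Builds an attribute object that requests a detached thread with the given
 * stack size, reads both settings back, then destroys the object.
 * Returns 0 on success or the first error number encountered.
 */
int attr_roundtrip(size_t want_stack, size_t *got_stack, int *got_detach)
{
    pthread_attr_t attr;
    int err;

    assert(got_stack != NULL && got_detach != NULL);
    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (err == 0)
        err = pthread_attr_setstacksize(&attr, want_stack);
    if (err == 0)
        err = pthread_attr_getstacksize(&attr, got_stack);
    if (err == 0)
        err = pthread_attr_getdetachstate(&attr, got_detach);

    pthread_attr_destroy(&attr);  /* always release the attribute object */
    return err;
}
```

To actually apply the settings, the initialized `attr` would be passed as the second argument of `pthread_create` before it is destroyed. A stack size below the system minimum is rejected with an error number.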
  • 33. OpenMP Directive Table (Directive: Description)
      atomic: Specifies that a memory location will be updated atomically.
      barrier: Synchronizes all threads in a team; all threads pause at the barrier until all threads execute the barrier.
      critical: Specifies that code is only executed on one thread at a time.
      flush: Specifies that all threads have the same view of memory for all shared objects.
      for: Causes the work done in a for loop inside a parallel region to be divided among threads.
      master: Specifies that only the master thread should execute a section of the program.
      ordered: Specifies that code under a parallelized for loop should be executed like a sequential loop.
      parallel: Defines a parallel region, which is code that will be executed by multiple threads in parallel.
      sections: Identifies code sections to be divided among all threads.
      single: Lets you specify that a section of code should be executed on a single thread, not necessarily the master thread.
      threadprivate: Specifies that a variable is private to a thread.
      Source: http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
  • 34. OpenMP Clause Table (Clause: Description)
      copyin: Allows threads to access the master thread's value, for a threadprivate variable.
      copyprivate: Specifies that one or more variables should be shared among all threads.
      default: Specifies the behavior of unscoped variables in a parallel region.
      firstprivate: Specifies that each thread should have its own instance of a variable, and that the variable should be initialized with the value of the variable, because it exists before the parallel construct.
      if: Specifies whether a loop should be executed in parallel or in serial.
      lastprivate: Specifies that the enclosing context's version of the variable is set equal to the private version of whichever thread executes the final iteration (for-loop construct) or last section (#pragma sections).
      nowait: Overrides the barrier implicit in a directive.
      num_threads: Sets the number of threads in a thread team.
      ordered: Required on a parallel for statement if an ordered directive is to be used in the loop.
      private: Specifies that each thread should have its own instance of a variable.
      reduction: Specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region.
      schedule: Applies to the for directive. There are four methods: static, dynamic, guided, runtime.
      shared: Specifies that one or more variables should be shared among all threads.
      Source: http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
  • 35. Reference ■ System Threads Reference http://www.unix.org/version2/whatsnew/threadsref.html ■ Semaphore http://www.mkssoftware.com/docs/man3/sem_init.3.asp ■ Richard Stones, Neil Matthew, "Beginning Linux Programming" ■ William W.-Y. Liang, "Linux System Programming"
  • 36. OpenMP
  • 37. OpenMP ■ OpenMP 2.5 ■ Multi-threaded & shared memory ■ Fortran, C / C++ ■ Basic syntax ● #pragma omp directive [clause] ■ OpenMP requirements and supported environments ● Windows ◆ Visual Studio 2005 Standard ◆ Intel® C++ Compiler 9.1 ● Linux ◆ gcc 4.2.0 ◆ Omni ● Xbox 360 & PS3
  • 38. ■ Put #include <omp.h> at the top of the program ■ Visual Studio 2005 Standard ● Project / Project Properties / Configuration Properties / C/C++ / Language ◆ Set OpenMP Support to Yes
  • 39. OpenMP Constructs
  • 40. Types of Work-Sharing Constructs ■ Loop: shares iterations of a loop across the team. Represents a type of "data parallelism". ■ Sections: breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism". Source: http://www.llnl.gov/computing/tutorials/openMP/
  • 41. Types of Work-Sharing Constructs ■ single: the enclosed code is executed by exactly one thread of the team (whichever thread reaches it first, not necessarily the master thread). Source: http://www.llnl.gov/computing/tutorials/openMP/
  • 42. Loop work sharing
      #pragma omp parallel for
      for (int i = 0; i < 10000; i++)
          for (int j = 0; j < 100; j++)
              function(i);

      is equivalent to

      #pragma omp parallel
      {   /* the opening brace must start on its own line; it cannot follow "parallel" on the pragma line */
          #pragma omp for
          for (int i = 0; i < 10000; i++)
              for (int j = 0; j < 100; j++)
                  function(i);
      }

      parallel for can only be used when the loop index has an integer type and the number of iterations is known before the loop starts.

      How the loop executes on a two-thread CPU:
      Thread 0 (master):
          for (i = 0; i < 5000; i++)
              for (int j = 0; j < 100; j++)
                  function(i);
      Thread 1:
          for (i = 5000; i < 10000; i++)
              for (int j = 0; j < 100; j++)
                  function(i);
  • 43. OpenMP example: log.cpp
      #include <omp.h>

      /* Split the outer for loop evenly across 2 threads. Assuming x, z, addr, and ans
         are declared outside the loop, they must be private; otherwise the threads
         overwrite each other's values (a race condition). */
      #pragma omp parallel for num_threads(2) private(x, z, addr, ans)
      for (y = 2; y < BufSizeY - 2; y++)
          for (x = 2; x < BufSizeX - 2; x++)
              for (z = 0; z < BufSizeBand; z++) {
                  addr = (y*BufSizeX + x)*BufSizeBand + z;
                  ans = (BYTE)(*(InBuf + addr))*16
                      + (BYTE)(*(InBuf + ((y*BufSizeX + x+1)*BufSizeBand + z)))*(-2)
                      + (BYTE)(*(InBuf + ((y*BufSizeX + x-1)*BufSizeBand + z)))*(-2)
                      + (BYTE)(*(InBuf + (((y+1)*BufSizeX + x)*BufSizeBand + z)))*(-2)
                      + (BYTE)(*(InBuf + (((y-1)*BufSizeX + x)*BufSizeBand + z)))*(-2)
                      + (BYTE)(*(InBuf + ((y*BufSizeX + x+2)*BufSizeBand + z)))*(-1)
                      + (BYTE)(*(InBuf + ((y*BufSizeX + x-2)*BufSizeBand + z)))*(-1)
                      + (BYTE)(*(InBuf + (((y+2)*BufSizeX + x)*BufSizeBand + z)))*(-1)
                      + (BYTE)(*(InBuf + (((y-2)*BufSizeX + x)*BufSizeBand + z)))*(-1)
                      + (BYTE)(*(InBuf + (((y+1)*BufSizeX + x+1)*BufSizeBand + z)))*(-1)
                      + (BYTE)(*(InBuf + (((y+1)*BufSizeX + x-1)*BufSizeBand + z)))*(-1)
                      + (BYTE)(*(InBuf + (((y-1)*BufSizeX + x+1)*BufSizeBand + z)))*(-1)
                      + (BYTE)(*(InBuf + (((y-1)*BufSizeX + x-1)*BufSizeBand + z)))*(-1);
                  *(OutBuf + addr) = abs(ans)/8;
              }
  • 44. [Figure: source image and output image after the Log conversion]
  • 45. Sections work sharing
      int main(int argc, char* argv[])
      {
          #pragma omp parallel sections
          {
              #pragma omp section
              { toPNG(); }
              #pragma omp section
              { toJPG(); }
              #pragma omp section
              { toTIF(); }
          }
      }
      [Figure: one input image processed by toPNG, toJPG, and toTIF in parallel]
  • 46. OpenMP notice
      ■ Data dependence: each iteration reads the results of the two previous iterations, so the iterations cannot safely run in parallel.
      int Fe[10];
      Fe[0] = 0;
      Fe[1] = 1;
      #pragma omp parallel for num_threads(2)
      for (i = 2; i < 10; ++i)
          Fe[i] = Fe[i-1] + Fe[i-2];

      ■ Race condition: all threads update the shared variable sum without synchronization.
      #pragma omp parallel
      {
          #pragma omp for
          for (int i = 0; i < 1000000; ++i)
              sum += i;
      }
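The race on `sum` has a standard remedy in the `reduction` clause from the clause table. The sketch below is one way to write it; the function name `sum_below` is hypothetical. Without OpenMP support enabled, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Sum of 0 + 1 + ... + (n - 1), computed with an OpenMP reduction. */
long long sum_below(long n)
{
    long long sum = 0;
    long i;
    assert(n >= 0);
    /* each thread accumulates into a private copy of sum;
       the copies are added together when the loop ends */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += i;
    return sum;
}
```

No clause repairs the Fibonacci loop in the same way: its dependence is carried from one iteration to the next, so that loop has to stay serial or be restructured.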
  • 47. OpenMP notice
      ■ Deadlock: thread 0 jumps over the barrier, so the other threads wait at it forever.
      int me;
      #pragma omp parallel private(me)
      {
          me = omp_get_thread_num();
          if (me == 0) goto Master;
          #pragma omp barrier
      Master:
          #pragma omp single
          printf("done\n");
      }
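A barrier only completes when every thread of the team reaches it, so the simplest repair is to remove the branch around it. The following is a minimal sketch of that structure; the function name `barrier_then_single` is hypothetical.

```c
#include <assert.h>

/* Returns 1 once the single block has run. */
int barrier_then_single(void)
{
    int done = 0;
    #pragma omp parallel shared(done)
    {
        /* per-thread work would go here */

        #pragma omp barrier   /* reached by every thread: no goto around it */

        #pragma omp single
        done = 1;             /* executed by exactly one thread of the team */
    }
    return done;
}
```

The `single` construct ends with an implicit barrier, so the value written by one thread is visible to the caller once the parallel region has finished.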
  • 48. OpenMP example: matrix (1)
      #include <omp.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>   /* strcmp */

      #define RANDOM_SEED 2882                          // random seed
      #define VECTOR_SIZE 4                             // width of the square matrix (same as its height)
      #define MATRIX_SIZE (VECTOR_SIZE * VECTOR_SIZE)   // total number of matrix elements

      int main(int argc, char *argv[]){
          int i, j, k;
          int *AA;   // matrix A
          int *BB;   // matrix B
          int *CC;   // result matrix C
          int computing;
          int _vector_size = VECTOR_SIZE;
          int _matrix_size = MATRIX_SIZE;
          double time;   // elapsed time of the multiplication, in seconds
  • 49. OpenMP example: matrix (2)
          if (argc > 1) {
              for (i = 1; i < argc; ) {
                  if (strcmp(argv[i], "-s") == 0 && i + 1 < argc) {
                      _vector_size = atoi(argv[i+1]);
                      _matrix_size = _vector_size * _vector_size;
                      i += 2;
                  }
                  else {
                      printf("the only supported argument is:\n");
                      printf("-s: the size of vector ex: -s 256\n");
                      return 0;
                  }
              }
          }
          AA = (int *)malloc(sizeof(int) * _matrix_size);
          BB = (int *)malloc(sizeof(int) * _matrix_size);
          CC = (int *)malloc(sizeof(int) * _matrix_size);
  • 50. OpenMP example: matrix (3)
          srand(RANDOM_SEED);
          /* create matrix A and matrix B */
          for (i = 0; i < _matrix_size; i++) {
              AA[i] = rand() % 10;
              BB[i] = rand() % 10;
          }
          /* computing C = A * B */
          time = -omp_get_wtime();
          #pragma omp parallel for private(computing, j, k)
          for (i = 0; i < _vector_size; i++) {
              for (j = 0; j < _vector_size; j++) {
                  computing = 0;
                  for (k = 0; k < _vector_size; k++)
                      computing += AA[i*_vector_size + k] * BB[k*_vector_size + j];
                  CC[i*_vector_size + j] = computing;
              }
          }
          time += omp_get_wtime();
  • 51. OpenMP example: matrix (4)
          printf("\nVector_size:%d\n", _vector_size);
          printf("Matrix_size:%d\n", _matrix_size);
          printf("Processing time:%f\n", time);
          free(AA);
          free(BB);
          free(CC);
          return 0;
      }
  • 54. Reference ■ Michael J. Quinn, "Parallel Programming in C with MPI and OpenMP" ■ Introduction to Parallel Computing http://www.llnl.gov/computing/tutorials/parallel_comp/ ■ OpenMP standard http://www.openmp.org/drupal/ ■ OpenMP MSDN tutorial http://msdn2.microsoft.com/en-us/library/tt15eb9t(VS.80).aspx ■ OpenMP tutorial http://www.llnl.gov/computing/tutorials/openMP/#DO ■ Kang Su Gatlin, Pete Isensee, "Reap the Benefits of Multithreading without All the Work", MSDN Magazine
  • 55. MPI
  • 56. MPI ■ MPI is a language-independent communications protocol used to program parallel computers ■ Distributed memory ■ SPMD (Single Program Multiple Data) ■ Fortran, C / C++
  • 57. MPI requirements and supported environments ■ Cluster Environment ● Windows ◆ Microsoft AD (Active Directory) server ◆ Microsoft cluster server ● Linux ◆ NFS (Network File System) ◆ NIS (Network Information Services), also known as Yellow Pages ◆ SSH ◆ MPICH 2
  • 58. MPI installation
      Download mpich2-1.0.4p1.tar.gz from http://www-unix.mcs.anl.gov/mpi/mpich/
      [shell]# tar -zxvf mpich2-1.0.4p1.tar.gz
      [shell]# mkdir /home/yourhome/mpich2
      [shell]# cd mpich2-1.0.4p1
      [shell]# ./configure --prefix=/home/yourhome/mpich2    // installing into a directory you create yourself is recommended
      [shell]# make
      [shell]# make install
      Next:
      [shell]# cd ~yourhome    // go to your home directory
      [shell]# vi .mpd.conf    // create the file
      Its content is secretword=<secretword> (pick any secret word you like), e.g. secretword=abcd1234
  • 59. MPI installation
      [shell]# chmod 600 .mpd.conf
      [shell]# vi .bash_profile
      Change PATH=$PATH:$HOME/bin to PATH=$HOME/mpich2/bin:$PATH:$HOME/bin
      Log in to the server again.
      [shell]# vi mpd.hosts    // create the host list file in your home directory
      e.g.
      cluster1
      cluster2
      cluster3
      cluster4
  • 60. MPI constructs
  • 61. Basic structure of an MPI program
      #include "mpi.h"
      MPI_Init(&argc, &argv);
      Do some work or call MPI functions, for example MPI_Send() / MPI_Recv()
      MPI_Finalize();
  • 62. MPI Ethernet Control and Data Flow [Figure] Source: Douglas M. Pase, "Performance of Voltaire InfiniBand in IBM 64-Bit Commodity HPC Clusters," IBM White Papers, 2005
  • 63. MPI Communicator [Figure: processes with ranks 0 to 8 inside the communicator MPI_COMM_WORLD]
  • 64. MPI Function
      Function: int MPI_Init(int *argc, char ***argv)
      Purpose: Initializes the MPI execution environment. It must be called before any other MPI function, and it can pass the command-line arguments of main (argc, argv) to all processes.
      Parameters: argc: pointer to the number of arguments; argv: pointer to the argument vector
      Return value: int; MPI_SUCCESS (0) on success

      Function: int MPI_Finalize(void)
      Purpose: Shuts down the MPI execution environment. It must be called after all work is complete.
      Parameters: none
      Return value: int; MPI_SUCCESS (0) on success
  • 65. MPI Function
      Function: int MPI_Comm_size(MPI_Comm comm, int *size)
      Purpose: Gets the total number of processes in the communicator.
      Parameters: comm: IN, e.g. MPI_COMM_WORLD; size: OUT, total number of processes
      Return value: int; MPI_SUCCESS (0) on success

      Function: int MPI_Comm_rank(MPI_Comm comm, int *rank)
      Purpose: Gets the calling process's own process ID (rank).
      Parameters: comm: IN, e.g. MPI_COMM_WORLD; rank: OUT, ID of the current process
      Return value: int; MPI_SUCCESS (0) on success
  • 66. MPI Function
      Function: int MPI_Send(void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
      Purpose: Sends data to the specified process, using standard mode.
      Parameters: buf: IN, the data (variable) to send; count: IN, number of elements to send; datatype: IN, type of the data sent; dest: IN, ID of the destination process; tag: IN, message tag (channel); comm: IN, e.g. MPI_COMM_WORLD
      Return value: int; MPI_SUCCESS (0) on success

      Function: int MPI_Recv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
      Purpose: Receives data from the specified process.
      Parameters: buf: OUT, buffer (variable) for the received data; count: IN, number of elements to receive; datatype: IN, type of the data received; source: IN, ID of the sending process; tag: IN, message tag (channel); comm: IN, e.g. MPI_COMM_WORLD; status: OUT, receives the MPI_Status
      Return value: int; MPI_SUCCESS (0) on success
  • 67. MPI Function
      ■ Status: identifies the source process ID and the tag of the message. In C it uses the MPI_Status data type. Only MPI_SOURCE, MPI_TAG, and MPI_ERROR are portable; the other fields depend on the MPI implementation.
      typedef struct MPI_Status {
          int count;
          int cancelled;
          int MPI_SOURCE;   // ID of the source process
          int MPI_TAG;      // tag sent by the source
          int MPI_ERROR;    // error code
      } MPI_Status;

      Function: double MPI_Wtime(void)
      Purpose: Returns the current time as a floating-point number of seconds; usually used to measure how long a program runs.
      Parameters: none
      Return value: double; the time
  • 68. MPI Function
      Function: int MPI_Type_commit(MPI_Datatype *datatype);
      Purpose: Commits a datatype so that it can be used in communication.
      Parameters: datatype: INOUT, the new datatype
      Return value: int; MPI_SUCCESS (0) on success

      Function: int MPI_Type_free(MPI_Datatype *datatype);
      Purpose: Frees a datatype.
      Parameters: datatype: INOUT, the datatype to free
      Return value: int; MPI_SUCCESS (0) on success
  • 69. MPI Function
      Function: int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype)
      Purpose: Forms a new datatype by repeating an existing datatype (MPI_Datatype), that is, it combines several items of the same type into one.
      Parameters: count: IN, size of the new type (how many oldtype items it consists of); oldtype: IN, the existing datatype (MPI_Datatype); newtype: OUT, the new datatype
      Return value: int; MPI_SUCCESS (0) on success
  • 70. Steps for writing and running a program
      1. Start the MPI environment
         mpdboot -n 4 -f mpd.hosts    // -n is the number of PCs to start; mpd.hosts is the list of PCs
      2. Write the MPI program
         vi hello.c
      3. Compile
         mpicc hello.c -o hello
      4. Run the program
         mpiexec -n 4 ./hello    // -n is the number of processes
      5. Shut down MPI
         mpdallexit
  • 71. MPI example: hello.c
      #include "mpi.h"
      #include <stdio.h>
      #include <string.h>   /* strcpy */
      #define SIZE 20

      int main(int argc, char *argv[])
      {
          int numtasks, rank, dest, source, rc, count, tag = 1;
          char inmsg[SIZE];
          char outmsg[SIZE];
          double starttime, endtime;
          MPI_Status Stat;
          MPI_Datatype strtype;

          MPI_Init(&argc, &argv);                          // initialize the MPI environment
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);            // get this process's ID
          MPI_Type_contiguous(SIZE, MPI_CHAR, &strtype);   // define the new "string" datatype
          MPI_Type_commit(&strtype);                       // commit the new "string" datatype
          starttime = MPI_Wtime();                         // get the current time
  • 72. MPI example: hello.c
          if (rank == 0) {
              dest = 1;
              source = 1;
              strcpy(outmsg, "Who are you?");
              // send the message to process 1
              rc = MPI_Send(outmsg, 1, strtype, dest, tag, MPI_COMM_WORLD);
              printf("process %d has sent message: %s\n", rank, outmsg);
              // receive the message from process 1
              rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
              printf("process %d has received: %s\n", rank, inmsg);
          }
          else if (rank == 1) {
              dest = 0;
              source = 0;
              strcpy(outmsg, "I am process 1");
              rc = MPI_Recv(inmsg, 1, strtype, source, tag, MPI_COMM_WORLD, &Stat);
              printf("process %d has received: %s\n", rank, inmsg);
              rc = MPI_Send(outmsg, 1, strtype, dest, tag, MPI_COMM_WORLD);
              printf("process %d has sent message: %s\n", rank, outmsg);
          }
  • 73. MPI example: hello.c
          endtime = MPI_Wtime();   // get the end time
          // use MPI_CHAR to count how much data was actually received
          rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
          printf("Task %d: Received %d char(s) from task %d with tag %d and use time is %f \n",
                 rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG, endtime - starttime);
          MPI_Type_free(&strtype);   // free the "string" datatype
          MPI_Finalize();            // shut down MPI
      }

      Output:
      process 0 has sent message: Who are you?
      process 1 has received: Who are you?
      process 1 has sent message: I am process 1
      Task 1: Received 20 char(s) from task 0 with tag 1 and use time is 0.001302
      process 0 has received: I am process 1
      Task 0: Received 20 char(s) from task 1 with tag 1 and use time is 0.002133
  • 74. openMP vs. MPI (feature: openMP / DSM)
      private data: Yes / Yes
      share data: Yes / Yes
      critical: Yes / Yes
      atomic: Yes / "Yes / No"
      barrier: Yes / Yes
      reduction: Yes / "Yes / No"
      MPI column (values as extracted; their row order is unclear): No, No, Yes, Yes, No, Yes
  • 75. ■ Types of Collective Operations: ● Synchronization: processes wait until all members of the group have reached the synchronization point. ● Data Movement: broadcast, scatter/gather, all to all. ● Collective Computation (reductions): one member of the group collects data from the other members and performs an operation (min, max, add, multiply, etc.) on that data.
      Function: int MPI_Barrier(MPI_Comm comm)
      Purpose: A process that reaches the barrier blocks and waits for all other processes to reach it as well; once every process in the group has reached the barrier, they are all released and continue.
      Parameters: comm: IN, e.g. MPI_COMM_WORLD
      Return value: int; MPI_SUCCESS (0) on success
  • 76. MPI_Bcast
      Function: int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
      Purpose: Broadcasts a message so that every process receives the same message.
      Parameters: buffer: INOUT, the message to send, and also the buffer that receives it; count: IN, number of elements to send; datatype: IN, type of the data sent; root: IN, the process that sends the message; comm: IN, e.g. MPI_COMM_WORLD
      Return value: int; MPI_SUCCESS (0) on success
  • 77. MPI_Gather
      Function: int MPI_Gather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
      Purpose: Collects the messages sent by the individual processes and delivers the combined data to the specified process.
      Parameters: sendbuf: IN, the message to send; sendcount: IN, number of elements sent; sendtype: IN, type of the data sent; recvbuf: OUT, buffer for the received messages; recvcount: IN, number of elements received from each process; recvtype: IN, type of the data received; root: IN, the process that receives the messages; comm: IN, e.g. MPI_COMM_WORLD
      Return value: int; MPI_SUCCESS (0) on success
  • 78. MPI_Gather [Figure]
  • 79. MPI_Allgather
      Function: int MPI_Allgather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
      Purpose: Collects the messages sent by the individual processes and then broadcasts the combined data to all processes.
      Parameters: sendbuf: IN, the message to send; sendcount: IN, number of elements sent; sendtype: IN, type of the data sent; recvbuf: OUT, buffer for the received messages; recvcount: IN, number of elements received from each process; recvtype: IN, type of the data received; comm: IN, e.g. MPI_COMM_WORLD
      Return value: int; MPI_SUCCESS (0) on success
  • 80. MPI_Allgather [Figure]
  • 81. MPI_Reduce
      Function: int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
      Purpose: Applies an operation to the data while it is being transferred (e.g. MPI_SUM adds it up) and delivers the result to the root process.
      Parameters: sendbuf: IN, the message to send; recvbuf: OUT, buffer for the result; count: IN, number of elements sent and received; datatype: IN, type of the data sent and received; op: IN, the operation to perform; root: IN, ID of the process that receives the result; comm: IN, e.g. MPI_COMM_WORLD
      Return value: int; MPI_SUCCESS (0) on success
  • 82. MPI_Reduce (MPI reduction operation: meaning; C data types)
      MPI_MAX: maximum; integer, float
      MPI_MIN: minimum; integer, float
      MPI_SUM: sum; integer, float
      MPI_PROD: product; integer, float
      MPI_LAND: logical AND; integer
      MPI_BAND: bit-wise AND; integer, MPI_BYTE
      MPI_LOR: logical OR; integer
      MPI_BOR: bit-wise OR; integer, MPI_BYTE
      MPI_LXOR: logical XOR; integer
      MPI_BXOR: bit-wise XOR; integer, MPI_BYTE
      MPI_MAXLOC: max value and location; float, double and long double
      MPI_MINLOC: min value and location; float, double and long double
  • 83. MPI example: matrix.c (1)
      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define RANDOM_SEED 2882                          // random seed
      #define MATRIX_SIZE 800                           // width of the square matrix (same as its height)
      #define NODES 4                                   // number of nodes; the minimum is 1
      #define TOTAL_SIZE (MATRIX_SIZE * MATRIX_SIZE)    // total number of matrix elements
      #define CHECK

      /* static storage: 800 x 800 int arrays are too large for a typical stack */
      static int AA[MATRIX_SIZE][MATRIX_SIZE];
      static int BB[MATRIX_SIZE][MATRIX_SIZE];
      static int CC[MATRIX_SIZE][MATRIX_SIZE];

      int main(int argc, char *argv[]){
          int i, j, k;
          int node_id;
  • 84. MPI example: matrix.c (2)
      #ifdef CHECK
          static int _CC[MATRIX_SIZE][MATRIX_SIZE];   // sequential result, used to check the parallel result CC
      #endif
          int check = 1;
          int computing = 0;
          double time, seqtime;
          int numtasks;
          int tag = 1;
          int node_size;
          MPI_Status stat;
          MPI_Datatype rowtype;

          srand(RANDOM_SEED);
  • 85. MPI example: matrix.c (3)
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &node_id);
          MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
          if (numtasks != NODES) {
              printf("Must specify %d processors. Terminating.\n", NODES);
              MPI_Finalize();
              return 0;
          }
          if (MATRIX_SIZE % NODES != 0) {
              printf("MATRIX_SIZE %% NODES must be 0\n");
              MPI_Finalize();
              return 0;
          }
          /* one row of the matrix; the elements are int, so the base type is MPI_INT */
          MPI_Type_contiguous(MATRIX_SIZE, MPI_INT, &rowtype);
          MPI_Type_commit(&rowtype);
  • 86. MPI example: matrix.c (4)
          /* create matrix A and matrix B */
          if (node_id == 0) {
              for (i = 0; i < MATRIX_SIZE; i++) {
                  for (j = 0; j < MATRIX_SIZE; j++) {
                      AA[i][j] = rand() % 10;
                      BB[i][j] = rand() % 10;
                  }
              }
          }
          /* send the matrices A and B to the other nodes */
          node_size = MATRIX_SIZE / NODES;
  • 87. MPI example: matrix.c (5)
          // send AA: each node receives only its own block of rows
          if (node_id == 0)
              for (i = 1; i < NODES; i++)
                  MPI_Send(&AA[i*node_size][0], node_size, rowtype, i, tag, MPI_COMM_WORLD);
          else
              MPI_Recv(&AA[node_id*node_size][0], node_size, rowtype, 0, tag, MPI_COMM_WORLD, &stat);
          // send BB: every node receives the whole matrix
          if (node_id == 0)
              for (i = 1; i < NODES; i++)
                  MPI_Send(BB, MATRIX_SIZE, rowtype, i, tag, MPI_COMM_WORLD);
          else
              MPI_Recv(BB, MATRIX_SIZE, rowtype, 0, tag, MPI_COMM_WORLD, &stat);
  • 88. MPI example: matrix.c (6)
          /* computing C = A * B */
          time = -MPI_Wtime();
          for (i = node_id*node_size; i < (node_id*node_size + node_size); i++) {
              for (j = 0; j < MATRIX_SIZE; j++) {
                  computing = 0;
                  for (k = 0; k < MATRIX_SIZE; k++)
                      computing += AA[i][k] * BB[k][j];
                  CC[i][j] = computing;
              }
          }
          /* each node's rows already sit at the right place in CC; the send buffer
             may not overlap the receive buffer, so use MPI_IN_PLACE */
          MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, CC, node_size, rowtype, MPI_COMM_WORLD);
          time += MPI_Wtime();
  • 89. MPI example: matrix.c (7)
      #ifdef CHECK
          seqtime = -MPI_Wtime();
          if (node_id == 0) {
              for (i = 0; i < MATRIX_SIZE; i++) {
                  for (j = 0; j < MATRIX_SIZE; j++) {
                      computing = 0;
                      for (k = 0; k < MATRIX_SIZE; k++)
                          computing += AA[i][k] * BB[k][j];
                      _CC[i][j] = computing;
                  }
              }
          }
          seqtime += MPI_Wtime();
  • 90.
          /* check result */
          if (node_id == 0) {
              for (i = 0; i < MATRIX_SIZE; i++) {
                  for (j = 0; j < MATRIX_SIZE; j++) {
                      if (CC[i][j] != _CC[i][j]) {
                          check = 0;
                          break;
                      }
                  }
              }
          }
  • 91. MPI example: matrix.c (8)
      #endif
          /* print result */
          if (node_id == 0) {
              printf("node_id=%d\ncheck=%s\nprocessing time:%f\n\n", node_id, (check) ? "success!" : "failure!", time);
      #ifdef CHECK
              printf("sequential time:%f\n", seqtime);
      #endif
          }
          MPI_Type_free(&rowtype);
          MPI_Finalize();
          return 0;
      }
  • 92. Embedded and Parallel Systems Lab 92 Reference ■ Top 500 http://www.top500.org/ ■ Maarten Van Steen, Andrew S. Tanenbaum, “Distributed Systems: Principles and Paradigms ” ■ System Threads Reference http://www.unix.org/version2/whatsnew/threadsref. html ■ Semaphone http://www.mkssoftware.com/docs/man3/sem_init.3.asp ■ Richard Stones. Neil Matthew, “Beginning Linux Programming” ■ W. Richard Stevens, “Networking APIs:Sockets and XTI“ ■ William W.-Y. Liang , “Linux System Programming” ■ Michael J. Quinn, “Parallel Programming in C with MPI and OpenMP” ■ Introduction to Parallel Computing http://www.llnl. gov/computing/tutorials/parallel_comp/
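  • 94. Embedded and Parallel Systems Lab 94 Conclusion
    ■ Devising a good parallel algorithm is very difficult.
    ■ Development tools and debugging tools are generally lacking.
    ■ New-generation languages
      ● IBM's X10, Sun's Fortress, Cray's Chapel
        ◆ X10 is a language that extends Java 1.4
          async(place.factory.place(1)){
              for (int i=1 ; i<=10 ; i+=2 )
                  ans += i;
          }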
  • 96. Embedded and Parallel Systems Lab 96 Reference
    ■ MPI standard http://www-unix.mcs.anl.gov/mpi/
    ■ MPI http://www.llnl.gov/computing/tutorials/mpi/
    ■ OpenMP standard http://www.openmp.org/drupal/
    ■ OpenMP MSDN tutorial http://msdn2.microsoft.com/en-us/library/tt15eb9t(VS.80).aspx
    ■ OpenMP tutorial http://www.llnl.gov/computing/tutorials/openMP/#DO
    ■ Kang Su Gatlin, Pete Isensee, “Reap the Benefits of Multithreading without All the Work”, MSDN Magazine
    ■ Gary Anthes, “Languages for Supercomputing Get 'Suped' Up”, Computerworld, March 12, 2007
    ■ IBM X10 research http://domino.research.ibm.com/comm/research_projects.nsf/pages/x10.X10-presentations.html
  • 97. Embedded and Parallel Systems Lab 97 The End Thank you very much!
  • 98. Embedded and Parallel Systems Lab 98 Appendix ■ Pipeline
  • 99. Embedded and Parallel Systems Lab 99 Pipeline