OpenMP
Speaker: 呂宗螢    Date: 2007/06/01
Embedded and Parallel Systems Lab
Outline
OpenMP
OpenMP 2.5: multi-threaded, shared-memory programming for Fortran and C/C++
Basic syntax: #pragma omp directive [clause]
Requirements and supported environments:
  Windows: Visual Studio 2005 Standard, Intel® C++ Compiler 9.1
  Linux: gcc 4.2.0, Omni
  Xbox 360 & PS3
Windows
Add #include <omp.h> at the top of the program.
Visual Studio 2005 Standard: under Project / Project Properties / Configuration Properties / C/C++ / Language, set "OpenMP Support" to Yes.
Linux
Requires gcc 4.2; if it is not installed, download it from GNU (http://gcc.gnu.org/). Using gcc 4.2.1 as an example:
1. Extract gcc: tar -zxvf gcc-4.2.1.tar.gz
2. Enter the directory: cd gcc-4.2.1
3. Configure, installing to /opt/gcc-4.2.1: ./configure --prefix=/opt/gcc-4.2.1/
4. Build: make
5. Install: make install
OpenMP Constructs
Types of Work-Sharing Constructs
Loop: shares iterations of a loop across the team. Represents a type of "data parallelism".
Source : http://www.llnl.gov/computing/tutorials/openMP/
Sections : breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".
Types of Work-Sharing Constructs
single: the enclosed block is executed by exactly one thread in the team (any one thread, not necessarily the master thread); the other threads wait at the implicit barrier at the end of the construct.
Source : http://www.llnl.gov/computing/tutorials/openMP/
Loop work-sharing

#pragma omp parallel for
for (int i = 0; i < 10000; i++)
    for (int j = 0; j < 100; j++)
        function(i);

is equivalent to:

#pragma omp parallel
{   // the opening brace must be on a new line; it cannot follow "parallel" on the same line
    #pragma omp for
    for (int i = 0; i < 10000; i++)
        for (int j = 0; j < 100; j++)
            function(i);
}

parallel for requires the loop index to be of type int, and the number of iterations must be known before the loop starts.
Execution on a CPU with two hardware threads:

Thread 0 (Master):
for (i = 0; i < 5000; i++)
    for (int j = 0; j < 100; j++)
        function(i);

Thread 1:
for (i = 5000; i < 10000; i++)
    for (int j = 0; j < 100; j++)
        function(i);
OpenMP example: log.cpp

#include <omp.h>
...
// divide the for loop evenly between 2 threads
// (x, z, addr, ans are written by every thread, so they must be private)
#pragma omp parallel for num_threads(2) private(x, z, addr, ans)
for (y = 2; y < BufSizeY-2; y++)
    for (x = 2; x < BufSizeX-2; x++)
        for (z = 0; z < BufSizeBand; z++) {
            addr = (y*BufSizeX + x)*BufSizeBand + z;
            ans = (BYTE)(*(InBuf + addr)) * 16
                + (BYTE)(*(InBuf + ((y*BufSizeX + x+1)*BufSizeBand + z))) * (-2)
                + (BYTE)(*(InBuf + ((y*BufSizeX + x-1)*BufSizeBand + z))) * (-2)
                + (BYTE)(*(InBuf + (((y+1)*BufSizeX + x)*BufSizeBand + z))) * (-2)
                + (BYTE)(*(InBuf + (((y-1)*BufSizeX + x)*BufSizeBand + z))) * (-2)
                + (BYTE)(*(InBuf + ((y*BufSizeX + x+2)*BufSizeBand + z))) * (-1)
                + (BYTE)(*(InBuf + ((y*BufSizeX + x-2)*BufSizeBand + z))) * (-1)
                + (BYTE)(*(InBuf + (((y+2)*BufSizeX + x)*BufSizeBand + z))) * (-1)
                + (BYTE)(*(InBuf + (((y-2)*BufSizeX + x)*BufSizeBand + z))) * (-1)
                + (BYTE)(*(InBuf + (((y+1)*BufSizeX + x+1)*BufSizeBand + z))) * (-1)
                + (BYTE)(*(InBuf + (((y+1)*BufSizeX + x-1)*BufSizeBand + z))) * (-1)
                + (BYTE)(*(InBuf + (((y-1)*BufSizeX + x+1)*BufSizeBand + z))) * (-1)
                + (BYTE)(*(InBuf + (((y-1)*BufSizeX + x-1)*BufSizeBand + z))) * (-1);
            *(OutBuf + addr) = abs(ans) / 8;
        }
Converting an image with the LoG filter
[Figure: source image (left) and filtered output image (right)]
Sections work-sharing

int main(int argc, char* argv[])
{
    #pragma omp parallel sections
    {
        #pragma omp section
        { toPNG(); }

        #pragma omp section
        { toJPG(); }

        #pragma omp section
        { toTIF(); }
    }
}
[Diagram: the input image is handed to toPNG, toJPG, and toTIF, each section executed by its own thread]
OpenMP pitfalls
int Fe[10];
Fe[0] = 0;
Fe[1] = 1;
#pragma omp parallel for num_threads(2)
for (i = 2; i < 10; ++i)
    Fe[i] = Fe[i-1] + Fe[i-2];

Data dependence: each iteration reads values written by the previous two iterations, so the loop produces wrong results when parallelized.
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 1000000; ++i)
        sum += i;
}

Race condition: every thread updates the shared variable sum without synchronization, so updates can be lost.
OpenMP pitfalls
Deadlock

int me;
#pragma omp parallel private(me)
{
    me = omp_get_thread_num();
    if (me == 0) goto Master;   /* thread 0 skips the barrier ... */
    #pragma omp barrier          /* ... so the other threads wait here forever */
Master:
    #pragma omp single
    printf("done\n");
}
OpenMP example: matrix (1)

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>                              /* for strcmp() used below */

#define RANDOM_SEED 2882                         /* random seed */
#define VECTOR_SIZE 4                            /* square matrix: width equals height */
#define MATRIX_SIZE (VECTOR_SIZE * VECTOR_SIZE)  /* total number of matrix elements */

int main(int argc, char *argv[])
{
    int i, j, k;
    int node_id;    /* unused in this excerpt */
    int *AA;        /* matrix A; also used by the sequential version to check the result */
    int *BB;        /* matrix B */
    int *CC;        /* result matrix C */
    int computing;
    int _vector_size = VECTOR_SIZE;
    int _matrix_size = MATRIX_SIZE;
    char c[10];     /* unused in this excerpt */
OpenMP example: matrix (2)

    if (argc > 1) {
        for (i = 1; i < argc; ) {
            if (strcmp(argv[i], "-s") == 0) {
                _vector_size = atoi(argv[i+1]);
                _matrix_size = _vector_size * _vector_size;
                i += 2;
            } else {
                printf("the arguments are:\n");
                printf("-s: the width of the matrix, e.g. -s 256\n");
                return 0;
            }
        }
    }

    AA = (int *)malloc(sizeof(int) * _matrix_size);
    BB = (int *)malloc(sizeof(int) * _matrix_size);
    CC = (int *)malloc(sizeof(int) * _matrix_size);
OpenMP example: matrix (3)

    srand(RANDOM_SEED);

    /* create matrix A and matrix B */
    for (i = 0; i < _matrix_size; i++) {
        AA[i] = rand() % 10;
        BB[i] = rand() % 10;
    }

    /* compute C = A * B */
    #pragma omp parallel for private(computing, j, k)
    for (i = 0; i < _vector_size; i++) {
        for (j = 0; j < _vector_size; j++) {
            computing = 0;
            for (k = 0; k < _vector_size; k++)
                computing += AA[i*_vector_size + k] * BB[k*_vector_size + j];
            CC[i*_vector_size + j] = computing;
        }
    }
OpenMP example: matrix (4)

    printf("\nVector_size:%d\n", _vector_size);
    printf("Matrix_size:%d\n", _matrix_size);
    printf("Processing time:%f\n", time);   /* time measurement is omitted on this slide */
    return 0;
}
OpenMP Directive Table

Directive      Description
atomic         Specifies a memory location that will be updated atomically.
barrier        Synchronizes all threads in a team; every thread pauses at the barrier until all threads have reached it.
critical       Specifies that the code is executed by only one thread at a time.
flush          Specifies that all threads have the same view of memory for all shared objects.
for            Causes the work done in a for loop inside a parallel region to be divided among threads.
master         Specifies that only the master thread should execute a section of the program.
ordered        Specifies that code under a parallelized for loop should be executed like a sequential loop.
parallel       Defines a parallel region, which is code that will be executed by multiple threads in parallel.
sections       Identifies code sections to be divided among all threads.
single         Lets you specify that a section of code should be executed on a single thread, not necessarily the master thread.
threadprivate  Specifies that a variable is private to a thread.

Source: http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
OpenMP Clause Table

Clause        Description
copyin        Allows threads to access the master thread's value, for a threadprivate variable.
copyprivate   Specifies that one or more variables should be shared among all threads.
default       Specifies the behavior of unscoped variables in a parallel region.
firstprivate  Specifies that each thread should have its own instance of a variable, initialized with the value the variable has before the parallel construct.
if            Specifies whether a loop should be executed in parallel or in serial.
lastprivate   Specifies that the enclosing context's version of the variable is set equal to the private version of whichever thread executes the final iteration (for-loop construct) or last section (#pragma sections).
nowait        Overrides the barrier implicit in a directive.
num_threads   Sets the number of threads in a thread team.
ordered       Required on a parallel for statement if an ordered directive is to be used in the loop.
private       Specifies that each thread should have its own instance of a variable.
reduction     Specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region.
schedule      Applies to the for directive; has four kinds: static, dynamic, guided, runtime.
shared        Specifies that one or more variables should be shared among all threads.

Source: http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
Reference
- Michael J. Quinn, "Parallel Programming in C with MPI and OpenMP"
- Introduction to Parallel Computing: http://www.llnl.gov/computing/tutorials/parallel_comp/
- OpenMP standard: http://www.openmp.org/drupal/
- OpenMP MSDN tutorial: http://msdn2.microsoft.com/en-us/library/tt15eb9t(VS.80).aspx
- OpenMP tutorial: http://www.llnl.gov/computing/tutorials/openMP/#DO
- Kang Su Gatlin, Pete Isensee, "Reap the Benefits of Multithreading without All the Work", MSDN Magazine