
Page 1: OpenMP

OpenMP

Speaker: 呂宗螢
Date: 2007/06/01

Page 2: OpenMP


Outline

Page 3: OpenMP


OpenMP

OpenMP 2.5: a multi-threaded, shared-memory programming model for Fortran and C/C++.

Basic syntax:

#pragma omp directive [clause]

Requirements and supported environments:

Windows: Visual Studio 2005 Standard, Intel® C++ Compiler 9.1

Linux: gcc 4.2.0, Omni

Xbox 360 & PS3
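As a first taste of the directive syntax, here is a minimal sketch of an OpenMP program (the printed message is illustrative, not from the slides):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    // fork a team of threads; every thread executes the block
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

With OpenMP enabled, this prints one line per thread in the team.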

Page 4: OpenMP


Windows

Add #include <omp.h> at the top of the program. In Visual Studio 2005 Standard:

Project / Project Properties / Configuration Properties / C/C++ / Language: set "OpenMP Support" to Yes.
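To check that the option took effect, one possibility (not on the slides) is to test the standard _OPENMP macro, which conforming compilers define when OpenMP is enabled:

#include <stdio.h>

int main(void)
{
#ifdef _OPENMP
    // _OPENMP expands to the release date (yyyymm) of the supported spec
    printf("OpenMP enabled, _OPENMP = %d\n", _OPENMP);
#else
    printf("OpenMP is not enabled\n");
#endif
    return 0;
}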

Page 5: OpenMP


Linux

gcc 4.2 is required; if it is not installed, download gcc from GNU:

http://gcc.gnu.org/ (using gcc 4.2.1 as the example)

1. Extract gcc: tar -zxvf gcc-4.2.1.tar.gz
2. Enter the directory: cd gcc-4.2.1
3. Configure, installing to /opt/gcc-4.2.1: ./configure --prefix=/opt/gcc-4.2.1/
4. Build: make
5. Install: make install
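The slides do not show the compile step; with gcc 4.2 and later, OpenMP support is enabled by the -fopenmp flag, for example (file name hypothetical):

/opt/gcc-4.2.1/bin/gcc -fopenmp matrix.c -o matrix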

Page 6: OpenMP


OpenMP Constructs

Page 7: OpenMP


Types of Work-Sharing Constructs

Loop: shares the iterations of a loop across the team. Represents a type of "data parallelism".

Sections: breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".

Source: http://www.llnl.gov/computing/tutorials/openMP/

Page 8: OpenMP


Types of Work-Sharing Constructs

single: the enclosed code is executed by exactly one thread in the team (not necessarily the master thread).

Source: http://www.llnl.gov/computing/tutorials/openMP/
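The slide gives no code for single; a minimal sketch (the message is illustrative):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        // every thread reaches this point, but only one executes the block;
        // the rest wait at the implicit barrier at the end of single
        #pragma omp single
        printf("executed once, by thread %d\n", omp_get_thread_num());
    }
    return 0;
}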

Page 9: OpenMP


Loop work-sharing

#pragma omp parallel for
for (int i = 0; i < 10000; i++)
    for (int j = 0; j < 100; j++)
        function(i);

is equivalent to:

#pragma omp parallel
{   // the opening brace must be on its own line; it cannot follow "parallel"
    #pragma omp for
    for (int i = 0; i < 10000; i++)
        for (int j = 0; j < 100; j++)
            function(i);
}

parallel for requires the loop index to be of type int, and the number of iterations must be known in advance.

When this runs on a CPU with two threads, the iterations are divided as follows:

Thread 0 (master):

for (i = 0; i < 5000; i++)
    for (int j = 0; j < 100; j++)
        function(i);

Thread 1:

for (i = 5000; i < 10000; i++)
    for (int j = 0; j < 100; j++)
        function(i);

Page 10: OpenMP


OpenMP example: log.cpp

#include <omp.h>

// caution: as written, x, z, addr, and ans are declared outside the loops and
// therefore shared; they must be made private (e.g. declared inside the loops)
// for the parallel version to be race-free
#pragma omp parallel for num_threads(2)  // divide the for loop evenly between 2 threads
for (y = 2; y < BufSizeY-2; y++)
    for (x = 2; x < BufSizeX-2; x++)
        for (z = 0; z < BufSizeBand; z++) {
            addr = (y*BufSizeX + x)*BufSizeBand + z;
            ans = (BYTE)(*(InBuf+addr))*16 +
                  (BYTE)(*(InBuf+((y*BufSizeX+x+1)*BufSizeBand+z)))*(-2) +
                  (BYTE)(*(InBuf+((y*BufSizeX+x-1)*BufSizeBand+z)))*(-2) +
                  (BYTE)(*(InBuf+(((y+1)*BufSizeX+x)*BufSizeBand+z)))*(-2) +
                  (BYTE)(*(InBuf+(((y-1)*BufSizeX+x)*BufSizeBand+z)))*(-2) +
                  (BYTE)(*(InBuf+((y*BufSizeX+x+2)*BufSizeBand+z)))*(-1) +
                  (BYTE)(*(InBuf+((y*BufSizeX+x-2)*BufSizeBand+z)))*(-1) +
                  (BYTE)(*(InBuf+(((y+2)*BufSizeX+x)*BufSizeBand+z)))*(-1) +
                  (BYTE)(*(InBuf+(((y-2)*BufSizeX+x)*BufSizeBand+z)))*(-1) +
                  (BYTE)(*(InBuf+(((y+1)*BufSizeX+x+1)*BufSizeBand+z)))*(-1) +
                  (BYTE)(*(InBuf+(((y+1)*BufSizeX+x-1)*BufSizeBand+z)))*(-1) +
                  (BYTE)(*(InBuf+(((y-1)*BufSizeX+x+1)*BufSizeBand+z)))*(-1) +
                  (BYTE)(*(InBuf+(((y-1)*BufSizeX+x-1)*BufSizeBand+z)))*(-1);
            *(OutBuf+addr) = abs(ans)/8;
        }

Page 11: OpenMP


[Figure: source image (left) and the converted output image (right) of the Log image conversion]

Page 12: OpenMP


Sections work-sharing

int main(int argc, char* argv[])
{
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            toPNG();
        }
        #pragma omp section
        {
            toJPG();
        }
        #pragma omp section
        {
            toTIF();
        }
    }
}

[Diagram: the input image is handed to toPNG, toJPG, and toTIF, each section executed by a different thread]

Page 13: OpenMP


OpenMP notice

int Fe[10];
Fe[0] = 0;
Fe[1] = 1;
#pragma omp parallel for num_threads(2)
for (i = 2; i < 10; ++i)
    Fe[i] = Fe[i-1] + Fe[i-2];

Data dependence: each iteration reads values written by earlier iterations, so the loop produces wrong results when split across threads.

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 1000000; ++i)
        sum += i;
}

Race condition: every thread updates the shared variable sum concurrently, so updates can be lost.
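The race above is the textbook case for the reduction clause (see the clause table on slide 20); a minimal sketch of the fix:

#include <stdio.h>

int main(void)
{
    long sum = 0;
    // each thread accumulates into its own private copy of sum;
    // the copies are combined with + when the region ends
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < 1000000; ++i)
        sum += i;
    printf("sum = %ld\n", sum);   // always 499999500000
    return 0;
}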

Page 14: OpenMP


OpenMP notice

Deadlock

int me;
#pragma omp parallel private(me)
{
    me = omp_get_thread_num();

    if (me == 0) goto Master;   // thread 0 jumps past the explicit barrier

    #pragma omp barrier          // the other threads wait here forever

Master:
    #pragma omp single
    printf("done\n");
}

Thread 0 reaches the single construct (which has its own implied barrier) while the rest of the team is blocked at the explicit barrier; the threads are waiting at different barriers, so the program hangs.
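A corrected sketch (not on the slides): let every thread reach the same barrier, then let one thread print:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        // ... per-thread work ...
        #pragma omp barrier   // every thread in the team reaches the same barrier
        #pragma omp single    // then exactly one thread prints
        printf("done\n");
    }
    return 0;
}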

Page 15: OpenMP


OpenMP example: matrix (1)

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>                 // for strcmp() used below

#define RANDOM_SEED 2882            // random seed
#define VECTOR_SIZE 4               // square matrix width (same as the height)
#define MATRIX_SIZE (VECTOR_SIZE * VECTOR_SIZE)  // total size of the matrix

int main(int argc, char *argv[])
{
    int i, j, k;
    int node_id;
    int *AA;                        // sequential use & check whether d2mce is right or faulty
    int *BB;                        // sequential use
    int *CC;                        // sequential use
    int computing;
    int _vector_size = VECTOR_SIZE;
    int _matrix_size = MATRIX_SIZE;
    char c[10];

Page 16: OpenMP


OpenMP example: matrix (2)

    if (argc > 1) {
        for (i = 1; i < argc; ) {
            if (strcmp(argv[i], "-s") == 0) {
                _vector_size = atoi(argv[i+1]);
                _matrix_size = _vector_size * _vector_size;
                i += 2;
            }
            else {
                printf("the arguments are only:\n");
                printf("-s: the size of the vector, ex: -s 256\n");
                return 0;
            }
        }
    }

    AA = (int *)malloc(sizeof(int) * _matrix_size);
    BB = (int *)malloc(sizeof(int) * _matrix_size);
    CC = (int *)malloc(sizeof(int) * _matrix_size);

Page 17: OpenMP


OpenMP example: matrix (3)

    srand(RANDOM_SEED);

    /* create matrix A and matrix B */
    for (i = 0; i < _matrix_size; i++) {
        AA[i] = rand() % 10;
        BB[i] = rand() % 10;
    }

    /* compute C = A * B; j, k, and the accumulator must be private per thread */
    #pragma omp parallel for private(computing, j, k)
    for (i = 0; i < _vector_size; i++) {
        for (j = 0; j < _vector_size; j++) {
            computing = 0;
            for (k = 0; k < _vector_size; k++)
                computing += AA[i*_vector_size + k] * BB[k*_vector_size + j];
            CC[i*_vector_size + j] = computing;
        }
    }

Page 18: OpenMP


OpenMP example: matrix (4)

    printf("\nVector_size:%d\n", _vector_size);
    printf("Matrix_size:%d\n", _matrix_size);
    printf("Processing time:%f\n", time);   // "time" is never defined in the code shown on these slides

    return 0;
}
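The slides never show where time is computed; a minimal sketch (an assumption, not the original code) using OpenMP's portable wall-clock timer would bracket the multiplication like this:

    double start = omp_get_wtime();     // wall-clock seconds

    /* ... the parallel matrix multiplication from slide 17 ... */

    double time = omp_get_wtime() - start;
    printf("Processing time:%f\n", time);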

Page 19: OpenMP


OpenMP Directive Table

atomic: Specifies that a memory location will be updated atomically.

barrier: Synchronizes all threads in a team; all threads pause at the barrier until every thread reaches it.

critical: Specifies that the code is only executed on one thread at a time.

flush: Specifies that all threads have the same view of memory for all shared objects.

for: Causes the work done in a for loop inside a parallel region to be divided among threads.

master: Specifies that only the master thread should execute a section of the program.

ordered: Specifies that code under a parallelized for loop should be executed like a sequential loop.

parallel: Defines a parallel region, which is code that will be executed by multiple threads in parallel.

sections: Identifies code sections to be divided among all threads.

single: Lets you specify that a section of code should be executed on a single thread, not necessarily the master thread.

threadprivate: Specifies that a variable is private to a thread.

Source: http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
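As an illustration of two directives from the table (a sketch, not from the slides): atomic protects a single memory update, critical a whole block:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int hits = 0, max_id = 0;
    #pragma omp parallel
    {
        int me = omp_get_thread_num();

        #pragma omp atomic        // one memory update, performed atomically
        hits++;

        #pragma omp critical      // a larger block, entered one thread at a time
        {
            if (me > max_id)
                max_id = me;
        }
    }
    printf("hits=%d max_id=%d\n", hits, max_id);
    return 0;
}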

Page 20: OpenMP


OpenMP Clause Table

copyin: Allows threads to access the master thread's value for a threadprivate variable.

copyprivate: Specifies that one or more variables should be shared among all threads.

default: Specifies the behavior of unscoped variables in a parallel region.

firstprivate: Specifies that each thread should have its own instance of a variable, initialized with the value the variable has before the parallel construct.

if: Specifies whether a loop should be executed in parallel or in serial.

lastprivate: Specifies that the enclosing context's version of the variable is set equal to the private version of whichever thread executes the final iteration (for-loop construct) or last section (#pragma sections).

nowait: Overrides the barrier implicit in a directive.

num_threads: Sets the number of threads in a thread team.

ordered: Required on a parallel for statement if an ordered directive is to be used in the loop.

private: Specifies that each thread should have its own instance of a variable.

reduction: Specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region.

schedule: Applies to the for directive. There are four methods: static, dynamic, guided, and runtime.

shared: Specifies that one or more variables should be shared among all threads.

Source: http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
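A sketch of the schedule clause (not from the slides), useful when iterations take unequal time; dynamic scheduling deals out chunks of 4 iterations to threads on demand:

#include <stdio.h>

// a stand-in workload whose cost grows with i (hypothetical)
static void work(int i)
{
    volatile long s = 0;
    for (int k = 0; k < i; k++)
        s += k;
}

int main(void)
{
    // chunks of 4 iterations are handed to threads as they become free,
    // balancing the load across the team
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < 1000; i++)
        work(i);
    printf("done\n");
    return 0;
}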

Page 21: OpenMP


Reference

Michael J. Quinn, "Parallel Programming in C with MPI and OpenMP"

Introduction to Parallel Computing, http://www.llnl.gov/computing/tutorials/parallel_comp/

OpenMP standard, http://www.openmp.org/drupal/

OpenMP MSDN tutorial, http://msdn2.microsoft.com/en-us/library/tt15eb9t(VS.80).aspx

OpenMP tutorial, http://www.llnl.gov/computing/tutorials/openMP/#DO

Kang Su Gatlin, Pete Isensee, "Reap the Benefits of Multithreading without All the Work", MSDN Magazine