OpenMP 3.1
University of Minho, Informatics Department
Bruno Medeiros (id 3985) • Prof. João Luís Ferreira Sobral
Parallel Computing Paradigms (UCE CPD)
Source: alfa.di.uminho.pt/~brunom/ApresentacaoOpenMPFinal3.1.pdf (2015-02-26)
Introduction to OpenMP
• OpenMP is an API to support Shared Memory (SM) parallelization on multi-core machines
o Based on compiler directives, library routines and environment variables
o Supports the C/C++ and Fortran programming languages
• Uses multithreading based on the fork-join model of parallel execution
[Figure: fork-join model. Image reference: http://www.ceemple.com/wp-content/uploads/2014/08/Fork_join.svg1.png]
• It is through directives, added by the programmer to the code, that the compiler adds parallelism to the application
Considerations
• OpenMP itself does not solve problems such as
o Starvation, deadlock or poor load balancing (among others)
o It does offer routines to address problems such as load balancing and memory consistency
o Starvation and deadlock, however, remain the programmer's responsibility
• Thread creation and management are delegated to the compiler and the OpenMP runtime
o + Easier to parallelize applications
o - Less control over the threads' behaviour
• By default, the number of parallel activities is defined at runtime according to the available resources
o e.g. 2 cores -> 2 threads
o A hyper-threading logical core counts as a core
• OpenMP does not support distributed memory systems; more complex parallelization must resort to library calls
Programming Model
• The OpenMP program starts as a single thread (master thread)
• Parallel regions create a team of parallel activities
• Work-sharing generates work for the team to process
• Data-sharing clauses specify how variables are shared within a parallel region
Constructs (1)
• OpenMP directive format for C/C++ applications
o #pragma omp directive-name [clause[ [,] clause]...] new-line
• Parallel construct
o #pragma omp parallel : creates a team of threads
• Work-sharing constructs
o #pragma omp for : assignment of loop iterations to threads
o #pragma omp sections : assignment of blocks of code (sections) to threads
o #pragma omp single : restricts a block of code to be executed by only one thread
Constructs (2)
• Tasking construct
o #pragma omp task : creation of a pool of tasks to be executed by threads
• Master and synchronization constructs
o #pragma omp master : restricts a block of code to be executed only by the master thread of the team
o #pragma omp critical : restricts the execution of a given block of code to a single thread at a time
o #pragma omp barrier : makes each thread in a team wait until all the other threads arrive
o #pragma omp taskwait : waits for the completion of the child tasks of the current task
o #pragma omp atomic : ensures that a specific storage location is accessed atomically
o #pragma omp flush : makes a thread's temporary view of memory consistent with memory
o #pragma omp ordered : specifies a block of code in a loop region that will be executed in the order of the loop iterations
Data Sharing
• What happens to variables inside parallel regions?
o Variables declared inside are local to each thread
o Variables declared outside are shared
• Data-sharing clauses
o private(varlist) : each variable in varlist becomes private to each thread; initial values are unspecified
o firstprivate(varlist) : same as private, but each copy is initialized with the value the variable had outside the parallel region
o lastprivate(varlist) : same as private, but the final value is the last loop iteration's value
o reduction(op:var) : same as lastprivate, but the final value is the result of a reduction using the operator "op"
• Directives for data sharing
o #pragma omp threadprivate : each thread gets a local copy of the variable
o the copyin clause copies the values from the master thread to the other threads
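The clauses above can be combined on one loop. A small sketch (variable names are illustrative, not from the slides); with the pragma ignored the serial result is identical:

```c
/* firstprivate: 'base' is copied into each thread with its value 10;
   lastprivate:  'last' gets the value from the final iteration (i == 99);
   reduction:    per-thread partial sums of 'sum' are combined at the end. */
int demo(void) {
    int sum = 0;
    int last = -1;
    int base = 10;
    int i;
    #pragma omp parallel for firstprivate(base) lastprivate(last) reduction(+:sum)
    for (i = 0; i < 100; i++) {
        sum += base + i;   /* each thread accumulates into its private sum */
        last = i;          /* only the last iteration's value survives */
    }
    /* sum = 100*10 + (0+...+99) = 5950, last = 99 */
    return sum + last;
}
```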
Parallel Region
• When a thread encounters a parallel construct, a team of threads is created (FORK)
• The thread that encounters the parallel region becomes the master of the new team
• All threads in the team (including the master) execute the parallel region
• At the end of a parallel region all threads synchronize and join the master thread (JOIN)
Parallel Region Syntax

#pragma omp parallel [clauses]
{
    code_block;
}

• Where clauses can be
o if(scalar-expression)
o num_threads(integer-expression)
o private(list)
o firstprivate(list)
o shared(list)
o reduction(operator : list)
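The if and num_threads clauses can be seen in a small sketch (illustrative names; the threshold 1000 is an arbitrary assumption): the region is parallelized only when the amount of work justifies it.

```c
/* Run a parallel region only when 'work' is large enough (if clause),
   with a team of at most 4 threads (num_threads clause). */
int count_threads(int work) {
    int n = 0;
    #pragma omp parallel num_threads(4) if(work > 1000)
    {
        #pragma omp atomic
        n++;   /* each thread of the team increments once */
    }
    return n;  /* 1 when the if() condition is false, up to 4 otherwise */
}
```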
Nested Parallel Region
• If a thread in a team executing a parallel region encounters another parallel directive, it creates a new team and becomes the master of that team
• If nested parallelism is disabled, no additional team of threads is created
• To enable/disable it: omp_set_nested(x);
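A sketch of nesting (illustrative; the #ifdef guards let the code also build without OpenMP, where it simply runs serially):

```c
#ifdef _OPENMP
#include <omp.h>
#endif

int inner_executions(void) {
    int count = 0;
#ifdef _OPENMP
    omp_set_nested(1);              /* allow nested parallel regions */
#endif
    #pragma omp parallel num_threads(2)
    {
        /* each thread of the outer team opens its own inner team,
           becoming that team's master */
        #pragma omp parallel num_threads(2)
        {
            #pragma omp atomic
            count++;
        }
    }
    return count;  /* up to 2*2 = 4 with nesting enabled; 1 serially */
}
```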
Nested Region (example)

[Figure: two levels of parallel regions. At level 0, Team 1 consists of T0 (master) and T1 (slave). At level 1, each of them becomes the master of a new two-thread team: Team 2 and Team 3.]
• The thread marked in red is a slave in team 1 but the master of team 3.
Loop Construct
• The for loop iterations are distributed across the threads in the team
o The distribution is based on
• the chunk_size, 1 by default
• the parallel for schedule, static by default
• Loop schedules
o static : iterations are divided into chunks of size chunk_size, assigned to the threads in the team in a round-robin fashion
o dynamic : chunks are assigned to threads in the team as they are requested
o guided : similar to dynamic, but the chunk size decreases during execution
o auto : the choice of schedule is delegated to the compiler
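The schedule clause changes how iterations are assigned to threads, not the result. A sketch (illustrative names) summing 0..n-1 under two schedules:

```c
/* Contiguous chunks handed out round-robin up front. */
int sum_static(int n) {
    int s = 0;
    #pragma omp parallel for schedule(static) reduction(+:s)
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}

/* Chunks of 4 iterations handed out on demand as threads finish. */
int sum_dynamic(int n) {
    int s = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:s)
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}
```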
Loop Construct Syntax

#pragma omp for [clauses]
{
    code_block;
}

• Where clauses can be
o private(list)
o firstprivate(list)
o lastprivate(list)
o shared(list)
o reduction(operator : list)
o schedule(kind[, chunk_size])
o collapse(n)
o ordered
o nowait
Loop Construct
• schedule(static) vs. schedule(dynamic)
o static has lower overhead
o dynamic has a better load-balancing approach
o Increasing the chunk size in the dynamic for
• diminishes the scheduling overhead
• increases the possibility of load-balancing problems
• Let's consider the following loop that we want to parallelize using 2 threads, where void f(int i) is a given function

o #pragma omp parallel for schedule (?)
   for (i = 0; i < 100; i++)
       f(i);

o What is the most appropriate type of scheduling?
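One way to explore the question empirically is to give f(i) a cost that depends on i, so that static chunks receive unequal amounts of work, and then time the loop under each schedule. A sketch (f and run are hypothetical names chosen here; the timing itself is left to the reader, and the check below only confirms the result is schedule-independent):

```c
/* Hypothetical work function whose cost grows with i:
   f(i) = 0 + 1 + ... + i = i*(i+1)/2. */
long f(int i) {
    long acc = 0;
    for (int k = 0; k <= i; k++)
        acc += k;
    return acc;
}

/* With per-iteration cost varying like this, schedule(dynamic)
   tends to balance better than schedule(static). */
long run(void) {
    long total = 0;
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < 100; i++)
        total += f(i);
    return total;  /* sum of i*(i+1)/2 for i = 0..99 */
}
```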
Parallel for with ordered clause

#pragma omp for schedule(static) ordered
for (int i = 0; i < N; ++i) {
    // do something here
    #pragma omp ordered
    {
        printf("test() iteration %d\n", i);
    }
}
Parallel execution of code sections
• Supports heterogeneous tasks

#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
        { taskA(); }
        #pragma omp section
        { taskB(); }
        #pragma omp section
        { taskC(); }
    }
}

• section blocks are divided among the threads in the team
• Each section is executed only once, by one thread in the team
• There is an implicit barrier at the end of the sections construct unless a nowait clause is specified
• Allows the following clauses
• private(list)
• firstprivate(list)
• lastprivate(list)
• reduction(operator:list)
Task construct

int fib(int n) {
    int i, j;
    if (n < 2)
        return n;
    else {
        #pragma omp task shared(i) firstprivate(n)
        i = fib(n - 1);
        #pragma omp task shared(j) firstprivate(n)
        j = fib(n - 2);
        #pragma omp taskwait
        return i + j;
    }
}

int main() {
    int n = 10;
    omp_set_num_threads(4);
    #pragma omp parallel shared(n)
    {
        #pragma omp single
        printf("fib(%d) = %d\n", n, fib(n));
    }
}

• When a thread encounters a task construct, a task is generated
• The task can be executed immediately, or later by any thread in the team
• OpenMP creates a pool of tasks to be executed by the active threads in the team
• The taskwait directive ensures that the 2 tasks generated are completed before the return statement
• Although only one thread executes the single directive, and hence the call to fib(n), all four threads participate in executing the generated tasks
Execution Tree Exemplified

[Figure: task tree for fib(5). Each fib(n) node spawns child tasks fib(n-1) and fib(n-2) down to the base cases fib(1) and fib(0); the labels T0–T3 show which of the four threads executed each task.]
Synchronization Constructs
• Critical regions (executed in mutual exclusion)
o #pragma omp critical [name]
   updateParticles();
o Restricts execution of the associated structured block to a single thread at a time
o Works across teams
o An optional name may be used to identify the critical construct. All critical constructs without a name are considered to have the same unspecified name
• Atomic operations (fine-grain synchronization)
o #pragma omp atomic
   A[i] += x;
o The memory update will be performed atomically. It does not make the entire statement atomic; only the memory update is atomic
o A compiler might use special hardware instructions for better performance than with critical
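The difference matters for simple shared counters, where atomic suffices and is typically cheaper than a critical section. A sketch (the function name is chosen here for illustration):

```c
/* Count the even numbers in 0..n-1 across threads; the shared
   counter update is protected with atomic rather than critical. */
int count_even(int n) {
    int even = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0) {
            #pragma omp atomic
            even++;   /* only this memory update is made atomic */
        }
    }
    return even;
}
```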
Avoid/reduce synchronisation
• Reduction of multiple values (in parallel)

sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < 100; i++) {
    sum += array[i];
}

• Thread reuse across parallel regions

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 100; i++)
        …
    #pragma omp for
    for (int j = 0; j < 100; j++)
        …
}
Environment variables
• OMP_SCHEDULE
o sets the run-sched-var ICV for the runtime schedule type and chunk size. It can be set to any of the valid OpenMP schedule types (i.e., static, dynamic, guided, and auto)
• OMP_NUM_THREADS
o sets the nthreads-var ICV for the number of threads to use for parallel regions
• OMP_DYNAMIC
o sets the dyn-var ICV for the dynamic adjustment of the number of threads to use for parallel regions
• OMP_NESTED
o sets the nest-var ICV to enable or to disable nested parallelism
• OMP_STACKSIZE
o sets the stacksize-var ICV that specifies the size of the stack for threads created by the OpenMP implementation
• OMP_WAIT_POLICY
o sets the wait-policy-var ICV that controls the desired behavior of waiting threads
• OMP_MAX_ACTIVE_LEVELS
o sets the max-active-levels-var ICV that controls the maximum number of nested active parallel regions
• OMP_THREAD_LIMIT
o sets the thread-limit-var ICV that controls the maximum number of threads participating in the OpenMP program
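These variables are read when the program starts, so they are typically set in the shell before launching the binary. A sketch (./md_sim is a hypothetical program name, not from the slides):

```shell
# Set ICVs before launching a (hypothetical) OpenMP binary ./md_sim
export OMP_NUM_THREADS=4          # team size for parallel regions
export OMP_SCHEDULE="dynamic,8"   # picked up by schedule(runtime) loops
export OMP_NESTED=false           # disable nested parallelism
# ./md_sim                        # then launch the program
```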
Routines
• omp_set_num_threads / omp_get_num_threads
• omp_get_max_threads
• omp_get_thread_num
• omp_get_num_procs
• omp_in_parallel
• omp_set_dynamic / omp_get_dynamic
• omp_set_nested / omp_get_nested
• omp_set_schedule / omp_get_schedule
• omp_get_thread_limit
• omp_set_max_active_levels / omp_get_max_active_levels
• omp_get_level
• omp_get_ancestor_thread_num
• omp_get_team_size
• omp_get_active_level

Locks
• void omp_init_lock(omp_lock_t *lock);
• void omp_destroy_lock(omp_lock_t *lock);
• void omp_set_lock(omp_lock_t *lock);
• void omp_unset_lock(omp_lock_t *lock);
• int omp_test_lock(omp_lock_t *lock);

Timers
• double omp_get_wtime(void);
• double omp_get_wtick(void);
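The lock routines follow an init / set / unset / destroy lifecycle. A sketch protecting a shared counter (illustrative names; the #ifdef guards make the code also compile serially without OpenMP, where the loop needs no lock):

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Increment a shared counter n times under an explicit lock. */
int locked_count(int n) {
    int count = 0;
#ifdef _OPENMP
    omp_lock_t lock;
    omp_init_lock(&lock);            /* lock must be initialized first */
#endif
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
#ifdef _OPENMP
        omp_set_lock(&lock);         /* blocks until the lock is acquired */
#endif
        count++;                     /* the protected update */
#ifdef _OPENMP
        omp_unset_lock(&lock);
#endif
    }
#ifdef _OPENMP
    omp_destroy_lock(&lock);         /* release lock resources */
#endif
    return count;
}
```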
OpenMP versions and compiler support

[Figure: table of OpenMP versions and the compiler releases that support them.]
Molecular Dynamics (MD)
• Simulation of particle interactions
• Uses mathematical models such as the Lennard-Jones potential
• Interaction calculation is based on
o Position
o Velocity
o Force
Call Graph

[Figure: call graph of the MD simulation.]
Code: MD Simulation

for (int move = 0; move < MaxMoves; move++) {
    cicleMove(md, particles);
    cicleForces(md, particles);
    cicleMkekin(md, particles);
    cicleVelavg(md, particles);
    scaleTemperature(md, particles);
    getFullPotentialEnergy(md);
}
Code: Force Calculation of all Particles

void cicleForces(…) {
    for (int i = 0; i < MAXPARTICLES; i++) {
        calForce(i);
    }
}
Code: Force Calculation of one Particle with the remaining

void calForce(int particleI) {
    for (int i = particleI + 1; i < MAXPARTICLES; i++) {
        int distance = calDist(i, particleI);
        if (distance <= RADIUS) {
            forceAcc += forceBetween(particleI, i);
            apply3NewtonLaw(i, forceAcc);
            updateControlVariables(..);
        }
    }
    updateForces(particleI, forceAcc);
}
MD: Parallelizing

void cicleForces(…) {
    #pragma omp parallel for
    for (int i = 0; i < MAXPARTICLES; i++) {
        calForce(i);
    }
}

• Load-balancing problems (due to Newton's third law: later particles have shorter inner loops)
• Solution?
MD: Parallelizing

void cicleForces(…) {
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < MAXPARTICLES; i++) {
        calForce(i);
    }
}

• The load-balancing problems (due to Newton's third law) are addressed
• But now there is a scheduling-overhead problem
• Solution?
MD: Parallelizing

void cicleForces(…) {
    int a, b;
    int halfPart = MAX_PARTICLES / 2;
    #pragma omp parallel private(a, b)
    {
        int thrID = omp_get_thread_num();
        int numThr = omp_get_num_threads();
        for (a = thrID; a < halfPart; a += numThr)
            calForce(a);
        for (b = MAX_PARTICLES - thrID - 1; b >= halfPart; b -= numThr)
            calForce(b);
    }
}

• Each thread pairs a cheap iteration from the first half with an expensive one from the second half
• No task-distribution synchronization overhead!
• But there are still cases of load imbalance: when halfPart % numThr != 0
Checking data dependencies (critical clause)

void calForce(int particleI) {
    for (int i = particleI + 1; i < MAXPARTICLES; i++) {
        int distance = calDist(i, particleI);
        if (distance <= RADIUS) {
            forceAcc += forceBetween(particleI, i);
            #pragma omp critical
            {
                apply3NewtonLaw(i, forceAcc);
                updateControlVariables(..);
            }
        }
    }
    #pragma omp critical
    updateForces(particleI, forceAcc);
}

• Mutual exclusion is ensured
• Very high synchronization overhead
• Unnecessary synchronization: threads serialize even when updating different particles
• Solution? Fine-grain synchronization
Checking data dependences (lock per particle)

omp_lock_t locks[MAX_PARTICLES];

void calForce(int particleI) {
    for (int i = particleI + 1; i < MAXPARTICLES; i++) {
        int distance = calDist(i, particleI);
        if (distance <= RADIUS) {
            forceAcc += forceBetween(particleI, i);
            omp_set_lock(&locks[i]);
            apply3NewtonLaw(i, forceAcc);
            updateControlVariables(..);
            omp_unset_lock(&locks[i]);
        }
    }
    omp_set_lock(&locks[particleI]);
    updateForces(particleI, forceAcc);
    omp_unset_lock(&locks[particleI]);
}

• Mutual exclusion is ensured
• A lot less synchronization: threads only contend when touching the same particle
• But there is still locking overhead
• Solution? Data redundancy
Removing some synchronization overhead (1)

#pragma omp parallel private(move)
for (move = 0; move < MaxMoves; move++) {
    #pragma omp master
    cicleMove(md, particles);
    #pragma omp barrier
    cicleForces(md, particles);
    #pragma omp barrier
    #pragma omp master
    {
        cicleMkekin(md, particles);
        cicleVelavg(md, particles);
        scaleTemperature(md, particles);
        getFullPotentialEnergy(md);
    }
}
Removing some synchronization overhead (2)

void cicleForces(…) {
    int a, b;
    int halfPart = MAX_PARTICLES / 2;
    #pragma omp for private(a, b)
    {
        …
    }
}
References
• OpenMP 3.1 Reference Manual
o http://www.openmp.org/mp-documents/OpenMP3.1.pdf
Further Reading
• OpenMP 3.1 Reference Manual
o http://www.openmp.org/mp-documents/OpenMP3.1.pdf
• 32 OpenMP traps for C++ Developers
o https://software.intel.com/en-us/articles/32-openmp-traps-for-c-developers
• Intel Guide for Developing Multithreaded Applications
o https://software.intel.com/en-us/articles/intel-guide-for-developing-multithreaded-applications