ccs hpc
TRANSCRIPT
2
3
• parallel processing vs. concurrent processing
– concurrent processing
– parallel processing
• CPU, memory, disk, network
– CPU
4
• O(N)
– N: O(N^3)
– web: N, O(N^2)
5
• 1.5, 2
• IC: 1.5, 2
• Intel
6
TOP500 http://www.top500.org
[Figure: TOP500 performance of the #1 and #500 systems]
7
– domain decomposition
– parameter search

– EP (Embarrassingly Parallel): parameter search
– data parallel: domain decomposition
– pipeline
– master/worker
8
EP
– 1/4
– (x,y)
– C, C/N
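The surviving fragments on this slide (1/4, (x,y), C, C/N) read like the classic Monte Carlo estimate of pi: throw N random points (x,y) into the unit square, count the C points that land inside the quarter circle, and use C/N as an estimate of pi/4. That reading is an assumption, since the slide text is lost. Under that assumption, a minimal sketch of the EP structure might look as follows. The names `next_uniform`, `count_hits`, and `estimate_pi`, the seeds, and the small random number generator are all hypothetical choices made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit linear congruential generator that returns
   a uniform double in [0,1). Not taken from the slides. */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0; /* divide by 2^53 */
}

/* One independent job: count the points (x,y) that land inside the
   quarter circle x^2 + y^2 <= 1. No data is shared with other jobs. */
long count_hits(uint64_t seed, long n)
{
    assert(n >= 0);
    uint64_t state = seed;
    long c = 0;
    for (long i = 0; i < n; i++) {
        double x = next_uniform(&state);
        double y = next_uniform(&state);
        if (x * x + y * y <= 1.0)
            c++;
    }
    return c;
}

/* Split n_total points over n_jobs jobs. Here the jobs run one after
   another, but each call could run on a different processor because the
   only combination step is the final sum of the counts. */
double estimate_pi(long n_total, int n_jobs)
{
    assert(n_jobs > 0 && n_total >= n_jobs);
    long per_job = n_total / n_jobs;
    long c = 0;
    for (int j = 0; j < n_jobs; j++)
        c += count_hits(1000u + (uint64_t)j, per_job); /* distinct seed per job */
    return 4.0 * (double)c / (double)(per_job * n_jobs);
}
```

Calling `estimate_pi(1000000, 4)` gives a value close to 3.14. The point of the sketch is that the jobs never talk to each other, which is what makes the problem embarrassingly parallel.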
9
• domain decomposition
– EP
– interaction
for(t=0; t < T; t++){
    for(i=1; i < N-1; i++)        /* skip the end cells so b[i-1] and b[i+1] stay in range */
        a[i] = b[i-1] + 2*b[i] + b[i+1];
    for(i=1; i < N-1; i++)
        b[i] = a[i];
}
b[…] (i)
10
domain decomposition
for(t=0; t < T; t++){
    for(i=1; i < N-1; i++)
        a(i) = b(i-1) + 2*b(i) + b(i+1);   // compute the new values from the neighbours
    for(i=1; i < N-1; i++)
        b(i) = a(i);                       // copy the new values back
}
.... b(5) b(6) b(7) b(8) b(9) ....
.... a(5) a(6) a(7) a(8) a(9) ....
[Figure labels: t; j, j+1]
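To make the decomposition of this loop concrete, here is a minimal sketch that splits the array into two subdomains and checks the result against the serial loop. It is an illustration under stated assumptions, not the slides' implementation: the "communication" is simulated by plain copies inside one program, the end cells b(0) and b(N-1) are held fixed, and the names `stencil_serial`, `stencil_decomposed`, `step_local`, `N`, and `HALF` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define N 16          /* global array size, chosen for illustration */
#define HALF (N / 2)  /* cells owned by each of the two subdomains */

/* Reference: the serial loop; the end cells b[0] and b[N-1] stay fixed. */
void stencil_serial(double *b, int steps)
{
    double a[N];
    for (int t = 0; t < steps; t++) {
        for (int i = 1; i < N - 1; i++)
            a[i] = b[i - 1] + 2 * b[i] + b[i + 1];
        for (int i = 1; i < N - 1; i++)
            b[i] = a[i];
    }
}

/* One time step on a subdomain stored in loc[1..HALF], with ghost cells
   loc[0] and loc[HALF+1]. Only cells first..last are updated. */
static void step_local(double *loc, int first, int last)
{
    double a[HALF + 2];
    for (int i = first; i <= last; i++)
        a[i] = loc[i - 1] + 2 * loc[i] + loc[i + 1];
    for (int i = first; i <= last; i++)
        loc[i] = a[i];
}

/* Two-subdomain version of the same computation. */
void stencil_decomposed(double *b, int steps)
{
    double lo[HALF + 2] = {0}, hi[HALF + 2] = {0};
    memcpy(&lo[1], &b[0], HALF * sizeof(double));
    memcpy(&hi[1], &b[HALF], HALF * sizeof(double));

    for (int t = 0; t < steps; t++) {
        /* Simulated communication: each side receives the neighbour's
           boundary cell into its ghost cell before computing. */
        lo[HALF + 1] = hi[1];
        hi[0] = lo[HALF];
        step_local(lo, 2, HALF);      /* lo[1] is global b[0]: fixed */
        step_local(hi, 1, HALF - 1);  /* hi[HALF] is global b[N-1]: fixed */
    }
    memcpy(&b[0], &lo[1], HALF * sizeof(double));
    memcpy(&b[HALF], &hi[1], HALF * sizeof(double));
}
```

Each subdomain needs only one value from its neighbour per time step, so the data exchanged stays constant while the computation grows with the subdomain size. In a real message-passing program the two copies at the top of the time loop would become a send and a receive.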
11
• for(i=0; i < N; i++){ a[i] = b[i] * c[i] + s * d[i]; }
[Figure: inputs b[i], c[i], d[i] and the scalar s; output a[i]]
12
master/worker
• master, worker: master >> worker
• master, worker
• worker, master

master::
    // give a job to each worker
    while(1){
        // receive a worker's result
        // give the next job to that worker
    }

worker::
    while(1){
        // receive a job from master
        // process the job
        // send the result to master
    }
13
master/worker
• EP
[Figure: the master hands jobs from a job pool (EP) to worker#1, worker#2, worker#3, ..., worker#N]
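The master/worker loop above hands the next job to whichever worker reports back first. The following sketch simulates that policy in a single program and compares it with a fixed, up-front split of the jobs. It is a simplified model, assuming known job run times and zero communication cost. The function names and the `MAX_WORKERS` limit are hypothetical.

```c
#include <assert.h>

#define MAX_WORKERS 64  /* arbitrary limit for this sketch */

/* Master/worker (dynamic) assignment: each job goes to the worker that
   becomes free first. cost[j] is the run time of job j.
   Returns the time at which the last worker finishes. */
double makespan_dynamic(const double *cost, int n_jobs, int n_workers)
{
    assert(n_workers > 0 && n_workers <= MAX_WORKERS);
    double busy_until[MAX_WORKERS] = {0};
    for (int j = 0; j < n_jobs; j++) {
        int w = 0;
        for (int k = 1; k < n_workers; k++)
            if (busy_until[k] < busy_until[w])
                w = k;                 /* earliest-free worker */
        busy_until[w] += cost[j];
    }
    double end = 0.0;
    for (int k = 0; k < n_workers; k++)
        if (busy_until[k] > end)
            end = busy_until[k];
    return end;
}

/* Static assignment for comparison: worker w gets one contiguous block
   of jobs, decided before any job runs. */
double makespan_static(const double *cost, int n_jobs, int n_workers)
{
    assert(n_workers > 0);
    double end = 0.0;
    for (int w = 0; w < n_workers; w++) {
        int first = (int)((long)n_jobs * w / n_workers);
        int last = (int)((long)n_jobs * (w + 1) / n_workers);
        double t = 0.0;
        for (int j = first; j < last; j++)
            t += cost[j];
        if (t > end)
            end = t;
    }
    return end;
}
```

With job costs {8, 1, 1, 1, 1, 1, 1, 2} and two workers, the static split finishes at time 11 while the dynamic policy finishes at time 8, because the second worker keeps taking short jobs while the first is busy with the long one.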
14
• domain decomposition
• communication
– synchronization, load imbalance
15
Proc-A
computation
send data to proc-B & receive data from proc-B
computation
… … …
Proc-B
computation
send data to proc-A & receive data from proc-A
computation
… … …
Proc-A, Proc-B
16
Proc-A
computation
send data to proc-B & receive data from proc-B
computation
… … …
Proc-B
computation
send data to proc-A & receive data from proc-A
computation
… … …
Proc-A, Proc-B
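The two timelines alternate computation with a data exchange. One way to see why that pattern costs time is a small model in which neither process can start the exchange until both have finished computing. This is a sketch under that assumption, with a constant exchange cost. The function names and parameters are hypothetical and do not come from the slides.

```c
#include <assert.h>

/* Total wall-clock time for two processes that alternate computation and
   a blocking exchange. comp_a[k] and comp_b[k] are the computation times
   of step k; t_comm is the (assumed constant) cost of one exchange. */
double lockstep_time(const double *comp_a, const double *comp_b,
                     int steps, double t_comm)
{
    assert(steps >= 0 && t_comm >= 0.0);
    double total = 0.0;
    for (int k = 0; k < steps; k++) {
        /* The exchange cannot start until the slower side is done. */
        double slower = comp_a[k] > comp_b[k] ? comp_a[k] : comp_b[k];
        total += slower + t_comm;
    }
    return total;
}

/* Time that process A spends waiting for process B: the cost of load
   imbalance as seen from A. */
double idle_time_a(const double *comp_a, const double *comp_b, int steps)
{
    double idle = 0.0;
    for (int k = 0; k < steps; k++)
        if (comp_b[k] > comp_a[k])
            idle += comp_b[k] - comp_a[k];
    return idle;
}
```

For example, with computation times {1, 1, 1} for Proc-A, {2, 1, 3} for Proc-B, and an exchange cost of 0.5, the run takes 7.5 time units and Proc-A sits idle for 3 of them.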
17
– message passing (send, receive, ...)
– shared memory access (write, read, ...)

– message passing
– shared memory access
18
19
• scalability
• degree of parallelism
20
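• Speed-up
– T: execution time on 1 processor
– T(p): execution time on p processors
– s(p) = T/T(p)
– s(p) = p: p processors run p times faster (linear speed-up)
[Figure: s(p) versus p, with the line s(p) = p (linear speed-up) and saturation at large p]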
21
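• Efficiency
– s(p) = p is the ideal case
– e(p) = s(p)/p; e(p) = 1 in the ideal case
[Figure: e(p) versus p, with e(p) = 1 for linear speed-up and a drop below 1 where the speed-up saturates]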
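The two definitions, s(p) = T/T(p) and e(p) = s(p)/p, translate directly into code. The sketch below computes both from measured run times. The function names and the sample timings in the note are made up for illustration.

```c
#include <assert.h>

/* Speed-up s(p) = T / T(p): t1 is the time on 1 processor,
   tp the time on p processors. */
double speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Efficiency e(p) = s(p) / p. A value of 1 means linear speed-up. */
double efficiency(double t1, double tp, int p)
{
    assert(p > 0);
    return speedup(t1, tp) / (double)p;
}
```

For instance, if a run takes 100 seconds on 1 processor and 25 seconds on 8 processors, then s(8) = 4 and e(8) = 0.5: only half of the added processing power shows up as reduced run time.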
22
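– T: total execution time; TP: the part that can run in parallel; TS: the serial part; T = TP + TS
– Running TP on p processors: T(p) = TS + TP/p
– As p → ∞: T(p) → TS
– s(p) = T/T(p) → T/TS, so e(p) = s(p)/p → 0 as p → ∞
– However large p is, T(p) cannot drop below TS unless TS = 0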
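The model T(p) = TS + TP/p is commonly known as Amdahl's law. A short sketch makes the saturation visible: the speed-up approaches T/TS and never exceeds it. The function names and the example values of TS and TP are assumptions chosen for illustration.

```c
#include <assert.h>

/* Run time on p processors when only the parallel part tp_part scales:
   T(p) = TS + TP/p. */
double model_time(double ts, double tp_part, int p)
{
    assert(p > 0 && ts >= 0.0 && tp_part >= 0.0);
    return ts + tp_part / (double)p;
}

/* Speed-up s(p) = T / T(p), with T = TS + TP. */
double model_speedup(double ts, double tp_part, int p)
{
    return (ts + tp_part) / model_time(ts, tp_part, p);
}

/* Efficiency e(p) = s(p) / p. */
double model_efficiency(double ts, double tp_part, int p)
{
    return model_speedup(ts, tp_part, p) / (double)p;
}
```

With TS = 10 and TP = 90, nine processors give s(9) = 5, and a million processors still give less than 10, the limit T/TS. The efficiency at that point is close to zero.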
23
Scalable
• TS, scalability
• TS, TP: scalable
• scalable: TS(p)
• scale
granularity
24
granularity
– coarse grain
– fine grain
scalable
25
domain decomposition
• 2
• 2, 2
.... b(5) b(6) b(7) b(8) b(9) ....
.... a(5) a(6) a(7) a(8) a(9) ....
• 1
• 2, 2
.... b(5) b(6) b(7) b(8) b(9) ....
.... a(5) a(6) a(7) a(8) a(9) ....
26
sensitivity
• weak scaling, strong scaling
27
– domain
– parameter search: parameter
28
• domain decomposition
29
cut-off MD
• MD (Molecular Dynamics)
– n, P
– cut-off
• interaction: cut-off
– domain decomposition: cell
– cell, cut-off: cell mapping method
30
cut-off MD
• cell
– 1) cell
– 2) cell: block, cyclic
– 3) cell
31
cut-off MD
• 1) cell
• 2) cyclic: cell
• 3) cell
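The slides name block and cyclic as ways to map cells to processes. The sketch below shows both mappings for a one-dimensional row of cells and compares the resulting load. It is an illustration only: the one-dimensional layout, the function names, and the particle counts in the note are assumptions, and a real cut-off MD code would use three-dimensional cells.

```c
#include <assert.h>

/* Block mapping: consecutive cells go to the same process. */
int owner_block(int cell, int n_cells, int n_procs)
{
    int chunk = (n_cells + n_procs - 1) / n_procs; /* cells per process, rounded up */
    return cell / chunk;
}

/* Cyclic mapping: cells are dealt out round-robin. */
int owner_cyclic(int cell, int n_procs)
{
    return cell % n_procs;
}

/* Largest number of particles assigned to any one process.
   particles[c] is the particle count of cell c; use_cyclic selects
   the mapping (0 = block, nonzero = cyclic). */
int max_load(const int *particles, int n_cells, int n_procs, int use_cyclic)
{
    assert(n_procs > 0 && n_procs <= 64);
    int load[64] = {0};
    for (int c = 0; c < n_cells; c++) {
        int owner = use_cyclic ? owner_cyclic(c, n_procs)
                               : owner_block(c, n_cells, n_procs);
        load[owner] += particles[c];
    }
    int worst = 0;
    for (int k = 0; k < n_procs; k++)
        if (load[k] > worst)
            worst = load[k];
    return worst;
}
```

With particle counts {9, 9, 1, 1, 1, 1, 1, 1} and two processes, the block mapping puts 20 particles on one process, while the cyclic mapping puts 12 on each. The trade-off is that a cyclic mapping places neighbouring cells on different processes, so more of the cut-off interactions cross a process boundary.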
32
• scalability
1. T1 [sec]
pp Tcomm(p) [sec]
p(1) p s(p) e(p)
(2) Tcomm(p) p Tcomm(p)= p 80%T1 p
(3) T1 p
2. N3 n3 domain decomposition 1
325
1 3
33