introduction 因為需要 large amount of computation. why parallel?

56
Introduction 因因因因 large amount of computation. Why parallel?

Upload: wilfred-edwards

Post on 30-Dec-2015

279 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Introduction 因為需要 large amount of computation. Why parallel?

Introduction

因為需要 large amount of computation.

Why parallel?

Page 2: Introduction 因為需要 large amount of computation. Why parallel?

Example

1015/107 10≒ 8 秒108/105 10≒ 3 天≒ 3 年

用在外科手術時,每秒約需 1015 個計算3D 立體影像

一般的電腦每秒約可算 107 個計算

Page 3: Introduction 因為需要 large amount of computation. Why parallel?

其他需要大量計算的領域Aircraft TestingNew Drug DevelopmentOil ExplorationModeling Fusion ReactorsEconomic PlanningCryptanalysisManaging Large DatabasesAstronomyBiomedical Analysis…

Page 4: Introduction 因為需要 large amount of computation. Why parallel?

電腦的速度

光速 : 3×108 m/s

Components 太近 : 會互相干擾

每 3 、 4 年增快一倍

Unfortunately

It is evident that this trend will soon come to an end.

A simple law of physics

Page 5: Introduction 因為需要 large amount of computation. Why parallel?

Parallelism

人 (CPU) 海戰術

目前許多電腦中都不只一顆 CPU

Page 6: Introduction 因為需要 large amount of computation. Why parallel?

Models of Computation

Flynn's classification:SISD(Single Instruction, Single Data)MISD(Multiple Instruction, Single Data)SIMD(Single Instruction, Multiple Data)MIMD(Multiple Instruction, Multiple Data)

Page 7: Introduction 因為需要 large amount of computation. Why parallel?

SISD Computers

Sequential(serial) Computation

Page 8: Introduction 因為需要 large amount of computation. Why parallel?

Example

summation of n numbers:

Sum = 0For I = 1 to n Sum = Sum + INext

need n-1 additionsO(n) time

Page 9: Introduction 因為需要 large amount of computation. Why parallel?

MISD Computers

Page 10: Introduction 因為需要 large amount of computation. Why parallel?

Example

test whether a positive integer is prime or not.n is prime: no divisors except 1 & n.n is composite: otherwiseEach processor tests a number between 1 and n.

Each processor needs O(1) operation. O(1) time

Page 11: Introduction 因為需要 large amount of computation. Why parallel?

SIMD Computers

Page 12: Introduction 因為需要 large amount of computation. Why parallel?

PRAM(Parallel Random Access Machine)

Share-Memory(SM) SIMD computers

Page 13: Introduction 因為需要 large amount of computation. Why parallel?

How does a processor i pass a number to another processor j?

O(1) time

Page 14: Introduction 因為需要 large amount of computation. Why parallel?

Write Read

1. EREW (Exclusive Read, Exclusive Write)2. CREW (Concurrent Read, Exclusive Write)3. ERCW (Exclusive Read, Concurrent Write)4. CRCW (Concurrent Read, Concurrent

Write)

Page 15: Introduction 因為需要 large amount of computation. Why parallel?

How to resolve conflicts in CRCW?

(b) the values to be written in all processorsare equal, otherwise access is denied to all processor.

(c) sum the values to be written.

(a) the smallest-numbered processor is allowed to write, and access is denied to all other processor.

Page 16: Introduction 因為需要 large amount of computation. Why parallel?

Simulating multiple accesses on an EREW computer:

(i) N multiple accesses (read) : broadcasting

procedure:1. P1→P22. P1,P2, → P3,P43. P1,P2,P3,P4 → P5,P6,P7,P8 : :

O(log N) steps

Page 17: Introduction 因為需要 large amount of computation. Why parallel?

1 2 3 4 5 6 7 8

5

D

P1 P2 P3 P4 P5 P6 P7 P8

A5

5 5

5

5 5

5 5

5 5 5 5

5 5 5 5

Page 18: Introduction 因為需要 large amount of computation. Why parallel?

(ii) m out of N multiple access(read), m N≦ Each memory location is associated with a binary

tree.

Page 19: Introduction 因為需要 large amount of computation. Why parallel?

[1] [2] [4] [7]

[1] [4] [7]

[1] [7]

d

P1 P2 P4P7

[1] [2] [4] [7][1] [2] [4]

Page 20: Introduction 因為需要 large amount of computation. Why parallel?

[1] [4]

[2]

[1]

[7]

d

P1 P2 P4P7

d[1]

[1]d

[7]d

[4]

[7]

d d

d dd d

Page 21: Introduction 因為需要 large amount of computation. Why parallel?

Dividing a shared memory into blocks

Page 22: Introduction 因為需要 large amount of computation. Why parallel?

Interconnection networks SIMD computers

The memory is distributed among the N processors.

Every pair of processors can communicate with each other if they are connected by a line(link).

Instantaneous communication between severalpairs of processors is allowed.

Page 23: Introduction 因為需要 large amount of computation. Why parallel?

(1) fully connected networks

There are 2

)1( NN links.

Page 24: Introduction 因為需要 large amount of computation. Why parallel?

(2) linear array(1-dimensional array) 1-D array

Pi-1 Pi Pi+1 , 2≦i≦ N–1

Page 25: Introduction 因為需要 large amount of computation. Why parallel?

(3) two-dimensional array (2-D mesh-connected)

Page 26: Introduction 因為需要 large amount of computation. Why parallel?

P(j,k) has links connecting to P(j+1,k), P(j-1,k), P(j,k+1) and P(j,k-1).

Page 27: Introduction 因為需要 large amount of computation. Why parallel?

(4) tree connection (tree machines)

Page 28: Introduction 因為需要 large amount of computation. Why parallel?

The sons of Pi: P2i and P2i+1

The parent of Pi: Pi/2

Page 29: Introduction 因為需要 large amount of computation. Why parallel?

(5) shuffle-exchange connection

shuffle links:Pi Pj

j=2i if 0 i 12N

j=2i-(N-1) if 2N

exchange links:Pi Pi+1 if i is even

i N-1

Page 30: Introduction 因為需要 large amount of computation. Why parallel?

(6) cube connection (n-cube,hypercube)

3-cube:

Page 31: Introduction 因為需要 large amount of computation. Why parallel?
Page 32: Introduction 因為需要 large amount of computation. Why parallel?

4-cube:

Page 33: Introduction 因為需要 large amount of computation. Why parallel?

An n-cube connected network consists of 2n node. (N=2n)

binary representation of node i:in-1 in-2 ... ij ... i1 i0

connects toin-1 in-2 ... ... i1 i0

Each processor has n links.

ji

Page 34: Introduction 因為需要 large amount of computation. Why parallel?

adding 8 numbers:

Page 35: Introduction 因為需要 large amount of computation. Why parallel?

This concept can be implemented on a tree machine.

This concept can also be implemented on other models.

time: O(log n) , n:# of inputs

m sets, each of n numbers:pipelining: log n+(m-1) steps.

Page 36: Introduction 因為需要 large amount of computation. Why parallel?

MIMD Computers

Page 37: Introduction 因為需要 large amount of computation. Why parallel?

MIMD computers sharing a common memory:multiprocessors, tightly coupled machines

MIMD computers with an interconnection network:multicomputers, loosely coupled machines, distributed system

Page 38: Introduction 因為需要 large amount of computation. Why parallel?

Analyzing Algorithms

● running time● speedup ● cost● efficiency

以工程為例 :

● 到完工所花的時間● 和一個人所花的時間比較● 人年 ( 或人月 )● 一人之人年 /n 人之人年

Page 39: Introduction 因為需要 large amount of computation. Why parallel?

Running Time

從 the first processor begins computing 到 the last processor ends computing

的時間

Page 40: Introduction 因為需要 large amount of computation. Why parallel?

Running Time

Sum = 0For I = 1 to n Sum = Sum + INext

1 次N+1 次N 次N+1 次

共 3N+3 次 = O(N)

Page 41: Introduction 因為需要 large amount of computation. Why parallel?

g(n)=O(f(n)): at most f(n),

,, 0nc

0nnfor cf(n),g(n) g(n) = 3n + 3 f(n) = n

3n+3 5≦ n 當 n 2≧ 時

c n0

Page 42: Introduction 因為需要 large amount of computation. Why parallel?

Upper Bound

g(n)=O(f(n)): at most f(n),

,, 0nc

0nnfor cf(n),g(n)

Page 43: Introduction 因為需要 large amount of computation. Why parallel?

g(n)cf(n)

n0

Page 44: Introduction 因為需要 large amount of computation. Why parallel?

g(n) = 3n + 3 f(n) = n

3n+3 ≧ 2 n 當 n 1≧ 時

c n0

g(n)= Ω(f(n)): at least f(n),

,, 0nc0nnfor ),()( ncfng

Page 45: Introduction 因為需要 large amount of computation. Why parallel?

Lower Bound

g(n)= Ω(f(n)): at least f(n),

,, 0nc

0nnfor ),()( ncfng

Page 46: Introduction 因為需要 large amount of computation. Why parallel?

g(n)

cf(n)

n0

Page 47: Introduction 因為需要 large amount of computation. Why parallel?

adding 8 numbers:

time: O(log n)

Page 48: Introduction 因為需要 large amount of computation. Why parallel?

The time complexity of heapsort is O(n log n).

The lower bound of sorting is Ω(n log n).

(why?)

Thus, heapsort is optimal.

The upper bound of sorting is O(n log n).

Page 49: Introduction 因為需要 large amount of computation. Why parallel?

For matrix multiplication, the lower bound is Ω(n2) and the upper bound is O(n2.5). (There exists such an algorithm.)

Page 50: Introduction 因為需要 large amount of computation. Why parallel?

speedup =

algo. parallelfor time

algo. sequentialfastest for the time

即 : 一個 CPU 所花的時間 / 平行所花的時間

Page 51: Introduction 因為需要 large amount of computation. Why parallel?

Adding n number on a tree machine

Time: O(log n)Processor: n-1

speedup=O(n/log n)

Page 52: Introduction 因為需要 large amount of computation. Why parallel?

Bitonic sorting

speedup=O(nlogn/log2 n) =O(n/log n)

Time: O(log2 n)Processor: O(n)

Page 53: Introduction 因為需要 large amount of computation. Why parallel?

cost = parallel time * # of processor = t(n) * p(n)

Page 54: Introduction 因為需要 large amount of computation. Why parallel?

s(n): time for the fastest sequential algo.

If s(n)=t(n)*p(n), the algorithm is cost optimal (achieves optimal speedup).

Page 55: Introduction 因為需要 large amount of computation. Why parallel?

adding n numbersprocessor: O(n/log n)}cost optimaltime: O(log n)Each processor first adds logn numbers sequentially.

Page 56: Introduction 因為需要 large amount of computation. Why parallel?

efficiency =

algo. parallel oft cos

algo. sequentialfastest for the time

e.g. bitonic sorting efficiency:

n log

1