Improving page migration (distributed processing)

Post on 12-Apr-2017


Improving Performance of OpenMP for SMP Clusters through Overlapped Page Migrations

Seoul National University 2008

Abstract
• Costly page migration is a major obstacle to integrating OpenMP and page-based software distributed shared memory (S-DSM).
• We propose the 'collective-prefetch' technique, which overlaps page migrations themselves.
– Execution time was reduced to 57%~79%.

Introduction
• OpenMP started in 1998.
• SMP clusters have become an attractive platform for high performance computing.
• Most OpenMP implementations for SMP clusters utilize page-based SDSM systems, which keep memory consistent with a user-level page fault signal handler.

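OpenMP is a standard specification for multithreaded parallel programming in environments where multiple processes reference shared memory. Through the API it provides, users can improve the performance of their applications.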

Execution time
• The execution time of an application on a page-based SDSM system decomposes into:
– page migration overhead
– computation time
– signal handler overhead
– synchronization overhead
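The four components listed above can be written as a single sum. The symbol names are assumptions introduced here for illustration; the slides do not define any notation.

```latex
T_{\text{exec}} = T_{\text{migration}} + T_{\text{computation}} + T_{\text{handler}} + T_{\text{sync}}
```

In this reading, the technique proposed later in the deck targets only the first term.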

Problem & motivation
• Page migration
– the procedure that sends a page request to the home node and receives the page reply for the request
• Page migration overhead
– the net time to complete all page migrations
• SDSM systems suffer from poor performance due to high synchronization overhead and excessive accesses to remote pages.

Solution techniques (5)
1. Implementing synchronization directives efficiently
2. Reducing the shared address space
3. Lessening the number of page migrations
4. Reducing the page migration delay with fast communication HW and a page update protocol
5. Hiding the page migration overhead by overlapping computation time and page migration overhead

Motivational Example
• Target system: ParADE, an OpenMP-based parallel programming environment
• Application: FT, which contains a computational kernel of a 3-D Fast Fourier Transform (FFT)-based spectral method

Time Table
• The number of page migrations and the execution time breakdown of FT class A on ParADE
• Number of nodes and number of migrations (proportion)

Memory consistency mechanism
• Any unprivileged access by a computation thread (at Node 0) to a page invokes the segmentation fault signal handler; the handler then fetches the valid page by negotiating with the owner (Node 1) of the page.
• The problem is that the handler is blocked for a long time, waiting for the reply.
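The fault-driven fetch described above can be illustrated with a toy model. This is a minimal sketch, not ParADE code: the class and method names (HomeNode, Node, _fault_handler) are hypothetical, and a dictionary lookup stands in for the real memory-protection check and SIGSEGV handler.

```python
# Toy model of fault-driven page fetching in a page-based SDSM.
# All names here are hypothetical; a real system uses memory protection
# and a SIGSEGV handler instead of a dictionary lookup.

PAGE_SIZE = 4096


class HomeNode:
    """Owner of the valid copies of the shared pages."""

    def __init__(self, num_pages):
        # Fill page p with the byte value p so the reads are easy to check.
        self.pages = {p: bytes([p % 256]) * PAGE_SIZE for p in range(num_pages)}
        self.requests_served = 0

    def serve(self, page_no):
        self.requests_served += 1
        return self.pages[page_no]


class Node:
    """A node whose local copies all start out invalid."""

    def __init__(self, home):
        self.home = home
        self.local = {}  # page_no -> bytes; only valid pages are present
        self.faults = 0

    def _fault_handler(self, page_no):
        # Stands in for the signal handler: it blocks until the reply arrives.
        self.faults += 1
        self.local[page_no] = self.home.serve(page_no)

    def read(self, addr):
        page_no, offset = divmod(addr, PAGE_SIZE)
        if page_no not in self.local:  # access to an invalid page "faults"
            self._fault_handler(page_no)
        return self.local[page_no][offset]


home = HomeNode(num_pages=4)
node0 = Node(home)
values = [node0.read(a) for a in (0, 1, PAGE_SIZE, 3 * PAGE_SIZE + 5)]
```

Only the first access to each page reaches the home node. In the model each of those accesses blocks inside `_fault_handler`, which is the waiting time the slide identifies as the problem.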

Overlapping Page Migrations
• The master thread creates a computation thread, which issues page requests one by one.
• The computation thread performs computation and page migrations sequentially.
• The computation thread joins the master thread after the parallel region ends.

Proposed solution: Collective Prefetch Technique
• All page requests initially arrive at a home node.
• The collective page server thread can send page replies continuously, without idle time between page replies.
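A back-of-the-envelope cost model shows why back-to-back replies help. This is a simplified sketch under stated assumptions, not a model from the slides: one-way network latency and per-page transfer time are treated as constants, and the input numbers are made up for illustration rather than measured.

```python
# Simplified cost model (an assumption, not taken from the slides):
#   latency  = one-way network latency per message
#   transfer = time for the home node to send one page
# Any consistent time unit works.


def one_by_one_time(n_pages, latency, transfer):
    # Each request waits for its own reply before the next one is sent.
    return n_pages * (2 * latency + transfer)


def collective_time(n_pages, latency, transfer):
    # One request list goes out; the replies stream back with no idle gaps.
    return 2 * latency + n_pages * transfer


# Hypothetical inputs: 10 pages, latency 100, transfer 40.
sequential = one_by_one_time(10, 100, 40)
overlapped = collective_time(10, 100, 40)
```

With these made-up inputs the sequential case costs 2400 time units and the collective case 600. The round-trip latency is paid once per list instead of once per page.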

Proposed solution: Collective Prefetch Technique (continued)
• The aim is to lessen the page migration overhead.
• The technique analyzes the page access patterns and prefetches remote pages by overlapping page migrations themselves.
• Before: page requests are sent one by one.
• Collective: a list of page requests is sent once.
• The page server thread creates a collective page server thread.
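The request-list protocol above can be sketched with two threads and a queue. This is a minimal sketch, not the ParADE implementation: the function names are hypothetical, an in-process queue stands in for the network, and the page access pattern analysis that produces the request list is assumed to have already run.

```python
import queue
import threading


def collective_page_server(pages, request_list, replies):
    # Hypothetical collective page server thread: it sends the replies
    # back to back, without waiting for a new request between pages.
    for page_no in request_list:
        replies.put((page_no, pages[page_no]))
    replies.put(None)  # end-of-list marker


def collective_prefetch(pages, request_list):
    replies = queue.Queue()  # stands in for the network connection
    # The whole list is handed over once; a dedicated thread serves it.
    server = threading.Thread(
        target=collective_page_server, args=(pages, request_list, replies)
    )
    server.start()
    local_copies = {}
    while True:
        item = replies.get()
        if item is None:
            break
        page_no, data = item
        local_copies[page_no] = data
    server.join()
    return local_copies


home_pages = {n: ("page-%d" % n).encode() for n in range(8)}
fetched = collective_prefetch(home_pages, [2, 5, 7])
```

The requester sends one message and then only receives, so the per-page round trip of the one-by-one scheme does not appear.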

Experiment Environment
• OS
– Red Hat 8.0, Linux 2.4.18-14 SMP kernel, for ParADE
– Red Hat 7.3, Linux 2.4.18-3 SMP kernel, for Omni/SCASH
– The SCore 5.6.1 package [18] installs Red Hat 7.3 and Omni/SCASH together.
• Each node
– dual 2.4 GHz Intel Pentium 4 Xeon processors, 1 GB memory
• Linux cluster consisting of 8 nodes
– interconnected by a Gigabit Ethernet switch

Results
• one computation thread

Results
• two computation threads

Translation
• http://blog.naver.com/sogangori/220479200673
