Improving page migration (distributed processing)

Improving Performance of OpenMP for SMP Clusters through Overlapped Page Migrations Seoul National University 2008

Upload: park-chunduck

Post on 12-Apr-2017


TRANSCRIPT

Page 1: Improving page migration (distributed processing)

Improving Performance of OpenMP for SMP Clusters through Overlapped Page Migrations

Seoul National University 2008

Page 2

Abstract
• Costly page migration is a major obstacle to integrating OpenMP and page-based software distributed shared memory (S-DSM).
• We propose the ‘collective-prefetch’ technique, which overlaps page migrations with one another.
  – Execution time was reduced to 57%~79%.

Page 3

Introduction
• OpenMP started in the late 1990s (the C/C++ specification dates from 1998).
• SMP clusters have become an attractive platform for high performance computing.
• Most OpenMP implementations for SMP clusters use page-based SDSM systems, which keep memory consistent with a user-level page fault signal handler.

OpenMP is a standard specification for multithreaded parallel programming in an environment where multiple processes reference shared memory. Through the API it provides, users can improve the performance of their applications.

Page 4

Execution time
• The execution time of an application on a page-based SDSM system decomposes into:
  – page migration overhead
  – computation time
  – signal handler overhead
  – synchronization overhead
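The decomposition above can be read as a plain sum of four terms. The sketch below only illustrates that arithmetic: the class name `ExecutionBreakdown`, its field names, and the sample numbers are hypothetical and are not measurements from the slides.

```python
from dataclasses import dataclass


@dataclass
class ExecutionBreakdown:
    """Hypothetical container for the four components listed above (seconds)."""
    page_migration: float
    computation: float
    signal_handler: float
    synchronization: float

    def total(self) -> float:
        # Execution time is modeled as the plain sum of the four components.
        return (self.page_migration + self.computation
                + self.signal_handler + self.synchronization)

    def share(self, component: str) -> float:
        # Fraction of the total spent in one named component.
        return getattr(self, component) / self.total()


# Made-up numbers, chosen only to show the arithmetic.
example = ExecutionBreakdown(page_migration=4.0, computation=3.0,
                             signal_handler=1.0, synchronization=2.0)
print(example.total())
print(example.share("page_migration"))
```

With these made-up inputs, page migration accounts for 4 of 10 seconds, which is the kind of proportion an execution time breakdown reports.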

Page 5

Problem & motivation
• Page migration
  – The procedure that sends a page request to the home node and receives the page reply for that request
• Page migration overhead
  – The net time to complete all page migrations
• SDSM systems suffer from poor performance due to high synchronization overhead and excessive accesses to remote pages

Page 6

Five solution techniques
1. Implementing synchronization directives efficiently
2. Reducing the shared address space
3. Lessening the number of page migrations
4. Reducing the page migration delay with fast communication HW and a page update protocol
5. Hiding the page migration overhead by overlapping computation time and page migration overhead

Page 7

Motivational Example
• Target system: ParADE, an OpenMP-based parallel programming environment
• Application: FT, which contains the computational kernel of a 3-D Fast Fourier Transform (FFT)-based spectral method

Page 8

Time Table
• The number of page migrations and the execution time breakdown of FT class A on ParADE
• Nodes and migrations: proportional

Page 9

Memory consistency mechanism
• Any unprivileged access of a computation thread (at Node 0) to a page invokes the segmentation fault signal handler, and the handler then fetches the valid page by negotiating with the owner (Node 1) of the page.
• The problem is that the handler is blocked for a long time, waiting for the reply.
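A toy model can make the fault-and-fetch path concrete. This is a sketch under stated assumptions, not ParADE code: a real page-based SDSM relies on memory protection and a segmentation fault signal handler, while here a dictionary lookup stands in for the protection check. The names `HomeNode`, `LocalNode`, `fetch`, and `read` are hypothetical.

```python
class HomeNode:
    """Hypothetical owner of the valid page copies."""

    def __init__(self, pages):
        self.pages = dict(pages)      # page id -> contents
        self.requests_served = 0

    def fetch(self, page_id):
        # In a real system this is a network round trip, and the caller
        # stays blocked until the page reply arrives.
        self.requests_served += 1
        return self.pages[page_id]


class LocalNode:
    """Hypothetical node whose access to an invalid page triggers a fetch."""

    def __init__(self, home):
        self.home = home
        self.valid = {}               # pages that are valid locally
        self.faults = 0

    def read(self, page_id):
        if page_id not in self.valid:
            # Stands in for the segmentation fault signal handler.
            self.faults += 1
            self.valid[page_id] = self.home.fetch(page_id)
        return self.valid[page_id]


home = HomeNode({0: "a", 1: "b"})
node0 = LocalNode(home)
values = [node0.read(0), node0.read(1), node0.read(0)]
print(values, node0.faults)
```

Only the first access to each page faults; the repeated read of page 0 is served locally. Each fault costs one blocking round trip, which is the waiting time the slide identifies as the problem.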

Page 10

Overlapping Page Migrations
• The master thread creates a computation thread, which issues page requests one by one.
• The computation thread performs computation and page migrations sequentially.
• The computation thread joins the master thread after the parallel region ends.

Page 11

Overlapping Page Migrations

Page 12

Proposed solution: Collective Prefetch Technique
• All page requests initially arrive at a home node.
• The collective page server thread can send page replies continuously, without idle time between page replies.

Page 13

Proposed solution: Collective Prefetch Technique
• The goal is to lessen the page migration overhead.
• The technique analyzes the page access patterns and prefetches remote pages by overlapping page migrations with one another.
• Before: page requests are sent one by one.
• Collective: a list of page requests is sent at once.
• The page server thread creates a collective page server thread.
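The difference between the two request styles can be sketched with a deliberately simple cost model. The model is an assumption made for illustration and does not come from the slides: each message pays a fixed one-way latency, the home node needs a fixed service time per page, and in the collective case the replies are sent back to back. The function names and numbers are hypothetical.

```python
def one_by_one_time(num_pages, one_way_latency, service_time):
    # Each page waits for a full round trip before the next request is sent.
    return num_pages * (2 * one_way_latency + service_time)


def collective_time(num_pages, one_way_latency, service_time):
    # One request carries the whole page list; the server then sends the
    # replies back to back, so latency is paid only once in each direction.
    if num_pages == 0:
        return 0
    return 2 * one_way_latency + num_pages * service_time


pages, latency, service = 100, 50, 10   # made-up units
print(one_by_one_time(pages, latency, service))
print(collective_time(pages, latency, service))
```

Under this model the two styles cost the same for a single page, and the gap grows with the number of pages because the per-page round-trip latency disappears from the collective case.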

Page 14

Experiment Environment
• OS
  – Red Hat 8.0 Linux 2.4.18-14 SMP kernel for ParADE
  – Red Hat 7.3 Linux 2.4.18-3 SMP kernel for Omni/SCASH
  – The SCore 5.6.1 package [18] installs Red Hat 7.3 and Omni/SCASH together.
• Each node
  – Dual 2.4GHz Intel Pentium 4 Xeon processors, 1GB memory
• Linux cluster consisting of 8 nodes
  – Interconnected by a Gigabit Ethernet switch

Page 15

Results
• One computation thread

Page 16

Results
• Two computation threads

Page 17

Translation
• http://blog.naver.com/sogangori/220479200673