Upload: zongying-lyu

Post on 09-Apr-2017

Performance Improvement Techniques for Software Distributed Shared Memory

Speaker: 呂宗螢
Adviser: Prof. 梁文耀
Date: 2007/3/9

Embedded and Parallel Systems Lab

Paper

Byung-Hyun Yu, P. Werstein, M. Purvis, and S. Cranefield, "Performance improvement techniques for software distributed shared memory," in Proc. 11th International Conference on Parallel and Distributed Systems, vol. 1, 20-22 July 2005, pp. 119-125.

Reference

L. Iftode, J. P. Singh, and K. Li, "Scope Consistency: A Bridge between Release Consistency and Entry Consistency," in Proc. 8th Annual ACM Symposium on Parallel Algorithms and Architectures, 1996.

Outline

Introduction
Implementation of ScC model
Diff Integration Technique
Dynamic Home Migration
Performance Evaluation Environment
Performance Evaluation

Introduction

Implementing parallel algorithms with shared variables is more convenient than with message passing, in which the programmer must explicitly send and receive data between nodes.

However, distributed shared memory (DSM) has not attracted much interest from the parallel computing community because of its slow performance.

Lazy home-based (LHB) protocol for scope consistency (ScC)
A diff integration technique, which solves most diff accumulation problems
A dynamic home migration protocol, which solves the static home assignment problem of the original home-based protocol

The techniques are evaluated using well-known DSM benchmark applications.

Implementation of ScC model

The LHB protocol does not send diffs to home nodes between two consecutive barriers.

It uses an update protocol during lock synchronization and an invalidation protocol for the global scope during barrier synchronization.
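A minimal sketch of the two coherence actions described above, assuming a toy model in which a page holds a single value, all writes happen inside one critical section, and the home is a plain dictionary. The names `Node`, `release_lock`, `acquire_lock`, and `barrier` are hypothetical and not taken from the paper; the value comparison in `barrier` is only a stand-in for real invalidation notices.

```python
class Node:
    """Toy DSM node: local page copies, validity flags, and unsent diffs."""

    def __init__(self, name):
        self.name = name
        self.pages = {}      # page id -> locally cached value
        self.valid = set()   # page ids whose local copy is usable
        self.pending = {}    # diffs not yet sent to the home node

    def write(self, page, value):
        self.pages[page] = value
        self.valid.add(page)
        self.pending[page] = value

    def release_lock(self):
        # Update protocol: the diffs travel with the lock, not to the home.
        diffs, self.pending = self.pending, {}
        return diffs

    def acquire_lock(self, diffs):
        # The new owner applies the previous owner's diffs directly...
        for page, value in diffs.items():
            self.pages[page] = value
            self.valid.add(page)
        # ...and now carries them until the next release or barrier.
        self.pending.update(diffs)


def barrier(nodes, home):
    """Invalidation protocol: flush diffs to the home, invalidate stale copies."""
    flushed = {}
    for node in nodes:
        flushed.update(node.pending)
        node.pending = {}
    home.update(flushed)
    for node in nodes:
        for page, value in flushed.items():
            if node.pages.get(page) != value:
                node.valid.discard(page)   # refetch from home on next access


a, b, c = Node("a"), Node("b"), Node("c")
home = {}
a.write("p", 1)
b.acquire_lock(a.release_lock())   # home is still untouched here
b.write("p", 2)
barrier([a, b, c], home)           # home receives p = 2; a's copy is invalidated
```

In this sketch nothing reaches the home between barriers, which is the "lazy" part of the protocol; only the barrier pays the cost of updating the home.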

Diff Integration Technique

Twinning is performed before diff application rather than after a write page fault.

In this way, all previous diffs on the same page made in the same critical section are preserved and integrated into a single integrated diff.
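The effect of twinning before diff application can be sketched as follows, assuming a page is a `bytearray`, a diff is an `{offset: byte}` mapping, and every node starts the critical section with identical page contents. The helper names (`make_diff`, `apply_diff`, `critical_section`) are hypothetical illustrations, not the paper's code.

```python
def make_diff(twin, page):
    """Return {offset: new_byte} for every byte that differs from the twin."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(page, diff):
    """Write each (offset, byte) pair of a diff into the page."""
    for offset, value in diff.items():
        page[offset] = value


def critical_section(page, incoming_diff, writes):
    """Twin first, then apply the previous owner's diff, then write locally."""
    twin = bytearray(page)            # twin taken BEFORE diff application
    apply_diff(page, incoming_diff)   # modifications of earlier lock owners
    apply_diff(page, writes)          # this node's own writes (same format)
    return make_diff(twin, page)      # one diff covering both


page_a, page_b, page_c = bytearray(8), bytearray(8), bytearray(8)
diff_a = critical_section(page_a, {}, {0: 1})
diff_b = critical_section(page_b, diff_a, {3: 2})   # contains A's and B's writes
diff_c = critical_section(page_c, diff_b, {3: 7})   # still a single diff
```

Because the twin predates the incoming diff, each lock owner hands the next one a single diff instead of a growing chain of per-owner diffs.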

Dynamic Home Migration

The home-based protocol performs poorly when the home node of a page never accesses that page, or accesses it less frequently than other nodes do.

Previously proposed home migration techniques provide a solution only for single-writer DSM applications.

To migrate homes at the time of lock synchronization (acq & rel)

The paper proposes a home migration technique that can determine the optimum home nodes for multiple-writer as well as single-writer applications.

It uses the barrier process, in which the best home nodes are piggybacked on other coherence-related data, thus minimizing the home-finding and data communication overheads.

1. All nodes record their dirty pages between two consecutive barriers.

2. Upon arrival at a barrier, all nodes create their final non-critical-section (NCS) diffs.

3. All nodes except the barrier manager node send their invalidation notices including each dirty page diff size to the manager node.

4. Barrier manager receives a barrier arrival notice including a dirty page list and the size of each dirty page diff from every node.

5. Whenever the manager receives a notice, it accumulates the dirty pages into a global dirty page list and, for each dirty page, sets as home the node with the maximum diff size.

6. On receiving the new home node list, all nodes update the home nodes by sending their diffs to the corresponding homes.

Note that only the last lock owner updates the home nodes with the integrated diffs it made during lock synchronization, and only if it is not itself the home of the CS diff.
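The manager's decision in steps 4 to 6 can be sketched as below, assuming each barrier arrival notice is a `{page: diff_size}` mapping keyed by node name. The function names (`choose_homes`, `diffs_to_send`) are hypothetical, and the tie-breaking rule (the first node seen keeps the page) is an assumption, since the slides do not specify one.

```python
def choose_homes(notices, current_homes):
    """Pick, for each dirty page, the node that reported the largest diff.

    notices: {node: {page: diff_size}} collected by the barrier manager.
    current_homes: {page: node} before the barrier.
    Returns the updated {page: node} map; clean pages keep their home.
    """
    best = {}  # page -> (diff_size, node)
    for node, dirty in notices.items():
        for page, size in dirty.items():
            if page not in best or size > best[page][0]:
                best[page] = (size, node)
    new_homes = dict(current_homes)
    for page, (_, node) in best.items():
        new_homes[page] = node
    return new_homes


def diffs_to_send(node, dirty, homes):
    """Pages whose diff this node must ship: those now homed elsewhere."""
    return [page for page in dirty if homes[page] != node]


notices = {
    "n0": {"p1": 40},
    "n1": {"p1": 900, "p2": 16},
    "n2": {"p2": 512},
}
homes = choose_homes(notices, {"p1": "n0", "p2": "n0", "p3": "n0"})
# p1 moves to n1 and p2 moves to n2; p3 was not dirty and stays at n0.
```

Making the node with the largest diff the new home means the largest diff for each page never crosses the network; only the smaller diffs are sent.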

Performance Evaluation Environment

TM: TreadMarks, a homeless LRC system
CHBLRC: conventional home-based LRC (eager, no diff integration, static homes)
LHB (or LHB ScC): lazy home-based scope consistency

Network of 32 nodes
100 Mbit switched Ethernet
350 MHz Pentium II CPU
192 MB of memory
Gentoo Linux with gcc 3.3.2

PNN: parallel neural network application (lock and barrier)
Barnes-Hut: Barnes-Hut N-body algorithm (barrier)
IS: integer sort (barrier)
Water: simulation of water molecular dynamics (lock and barrier)
SOR: successive over-relaxation (barrier)

Performance Evaluation

[Figure: Diff integration effect on PNN and Water]

Thank you!