
Performance Improvement Techniques for Software Distributed Shared Memory

Speaker: 呂宗螢
Adviser: Prof. 梁文耀
Date: 2007/3/9

Embedded and Parallel Systems Lab


Paper

Byung-Hyun Yu, P. Werstein, M. Purvis, and S. Cranefield, "Performance improvement techniques for software distributed shared memory," Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005, Volume 1, 20-22 July 2005, pp. 119-125.



Reference

L. Iftode, J.P. Singh and K. Li: "Scope Consistency: A Bridge between Release Consistency and Entry Consistency," In Proc. of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, 1996.



Outline

Introduction
Implementation of ScC model
Diff Integration Technique
Dynamic Home Migration
Performance Evaluation Environment
Performance Evaluation



Introduction

It is more convenient to implement parallel algorithms with shared variables than with message passing, in which a programmer explicitly sends and receives data between nodes.

DSM hasn’t been a major attraction to the parallel computing community due to its slow performance.



Introduction

Lazy home-based (LHB)

Scope consistency (ScC)

Diff integration technique, which can solve most diff accumulation problems

A dynamic home migration protocol that solves the static home assignment problem in the original home-based protocol.

The techniques are evaluated using well-known DSM benchmark applications.



Implementation of ScC model

The LHB protocol does not send diffs to home nodes between two consecutive barriers.

It uses the update protocol during lock synchronization and the invalidation protocol for the global scope during barrier synchronization.





Diff Integration Technique

Twinning occurs before diff application and not after a write page fault.

In this way, all previous diffs on the same page made in the same critical section are preserved and integrated into a single integrated diff.
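The effect of twinning before diff application can be illustrated with a minimal sketch. It assumes a page is a plain byte array and a diff is a map from byte offset to new value; the function names (`make_diff`, `apply_diff`, `critical_section`) are hypothetical and are not taken from the paper's implementation.

```python
def make_diff(twin, page):
    """Return {offset: new_byte} for every byte that differs from the twin."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(page, diff):
    """Write each (offset, value) pair of a diff into the page."""
    for offset, value in diff.items():
        page[offset] = value


def critical_section(page, incoming_diff, local_writes):
    """Model one lock holder (assumed, simplified flow).

    The twin is taken BEFORE the incoming diff is applied, so the diff
    computed at release contains the previous holders' changes as well
    as the local writes: one integrated diff instead of a growing list.
    """
    twin = bytes(page)                # twin created before diff application
    apply_diff(page, incoming_diff)   # changes passed along with the lock
    apply_diff(page, local_writes)    # local writes, modeled as offset -> value
    return make_diff(twin, page)


page = bytearray(8)
integrated = critical_section(page, incoming_diff={1: 7}, local_writes={3: 9})
print(integrated)
```

If the twin were instead created after the write page fault, the diff at release would hold only the local write at offset 3, and the earlier diff would have to be forwarded separately.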





Dynamic Home Migration

The home-based protocol has a weakness when a home node is allocated for pages that are not accessed or are less frequently accessed by the home node compared with other nodes.

Previously proposed home migration techniques generally provide a solution only for single-writer DSM applications.

To migrate homes at the time of lock synchronization (acq & rel)




This paper proposes a home migration technique that can decide the optimum home nodes for multiple-writer applications as well as single-writer applications.

It uses the barrier process, in which the best home nodes are piggybacked on other coherence-related data, thus minimizing the home-finding and data communication overheads.







1. All nodes record their dirty pages between two consecutive barriers.

2. Upon arrival at a barrier, all nodes create final NCS diffs.

3. All nodes except the barrier manager node send their invalidation notices, including the diff size of each dirty page, to the manager node.

4. The barrier manager receives a barrier arrival notice, including a dirty page list and the size of each dirty page diff, from every node.




5. Whenever the manager receives a notice, it accumulates the dirty pages, builds the global dirty page list, and selects as the home of each dirty page the node with the maximum diff size.

6. On receiving the new home node list, all nodes update the home nodes by sending their diffs to the corresponding homes.

Note that only the last lock owner updates the home nodes with the integrated diffs it made during lock synchronization, and only if the last lock owner is not the home of the CS diff.
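The home selection in step 5 can be sketched as follows. This is a minimal illustration, assuming each barrier arrival notice is a map from page id to diff size; the function name `choose_homes` and the tie-breaking rule (the lowest node id wins) are assumptions, not details from the paper.

```python
def choose_homes(notices):
    """Pick a home node for every dirty page.

    notices: {node_id: {page_id: diff_size}}, one entry per barrier
    arrival notice (assumed representation).
    Returns {page_id: node_id}, where the chosen node reported the
    largest diff for that page. Ties go to the lowest node id, which
    is an assumption made for this sketch.
    """
    best = {}  # page_id -> (diff_size, node_id)
    for node in sorted(notices):
        for page, size in notices[node].items():
            if page not in best or size > best[page][0]:
                best[page] = (size, node)
    return {page: node for page, (size, node) in best.items()}


# Three nodes report the dirty pages they wrote since the last barrier.
notices = {
    0: {10: 4, 11: 100},
    1: {10: 64},
    2: {11: 100},
}
print(choose_homes(notices))
```

Making the largest writer the home means the largest diff never has to cross the network; only the smaller diffs from the other writers are sent in step 6.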



Performance Evaluation Environment

TM: TreadMarks, which is a homeless LRC
CHBLRC: conventional home-based LRC (eager, no diff integration, static home)
LHB (or LHB ScC): lazy home-based Scope consistency

Network has 32 nodes
100 Mbit switched Ethernet
350 MHz Pentium II CPU
192 MB of memory
Gentoo Linux with gcc 3.3.2



Performance Evaluation Environment

PNN: parallel neural network application (lock & barrier)
Barnes-Hut: Barnes-Hut N-body algorithm (barrier)
IS: integer sort (barrier)
Water: simulates the molecular dynamics of water (lock & barrier)
SOR: Successive Over-Relaxation (barrier)







Performance Evaluation

Diff integration effect on PNN and Water



Thank you!
