Performance Improvement Techniques for Software Distributed Shared Memory
Speaker: 呂宗螢
Adviser: Prof. 梁文耀
Date: 2007/3/9
Embedded and Parallel Systems Lab
Paper
Byung-Hyun Yu, P. Werstein, M. Purvis, and S. Cranefield, "Performance improvement techniques for software distributed shared memory," Proceedings of the 11th International Conference on Parallel and Distributed Systems, Vol. 1, 20-22 July 2005, pp. 119-125.
Reference
L. Iftode, J. P. Singh, and K. Li, "Scope Consistency: A Bridge between Release Consistency and Entry Consistency," Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, 1996.
Outline
Introduction
Implementation of ScC model
Diff Integration Technique
Dynamic Home Migration
Performance Evaluation Environment
Performance Evaluation
Introduction
Implementing parallel algorithms with shared variables is more convenient than message passing, in which the programmer must explicitly send and receive data between processes.
However, DSM has not attracted much interest from the parallel computing community because of its poor performance.
Introduction
Lazy home-based (LHB) scope consistency (ScC)
A diff integration technique that solves most diff accumulation problems
A dynamic home migration protocol that solves the static home assignment problem of the original home-based protocol
The techniques are evaluated using well-known DSM benchmark applications.
Implementation of ScC model
The LHB protocol does not send diffs to home nodes between two consecutive barriers.
It uses an update protocol during lock synchronization and an invalidation protocol for the global scope during barrier synchronization.
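The split between lock-time updates and barrier-time invalidation can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's implementation: a single lock, pages modelled as dicts of {offset: value}, diffs as dicts of written offsets, and direct function calls in place of messages. `Node`, `Lock`, `fetch`, `read`, and `barrier` are hypothetical names; twins and diff sizes are ignored, and the acquirer simply inherits the previous owner's CS diffs so a chain of lock owners stays correct (a simplification of ours).

```python
from __future__ import annotations


class Node:
    def __init__(self, nid: int, num_pages: int):
        self.nid = nid
        self.memory = [dict() for _ in range(num_pages)]  # page -> {offset: value}
        self.valid = [True] * num_pages
        self.diffs: dict[int, dict] = {}     # all writes since the last barrier, kept local (lazy)
        self.cs_diffs: dict[int, dict] = {}  # writes made inside the critical section

    def write(self, page: int, offset: int, value, in_cs: bool = False) -> None:
        # Assumes the page is valid; a real DSM would fetch it on the write fault.
        self.memory[page][offset] = value
        self.diffs.setdefault(page, {})[offset] = value
        if in_cs:
            self.cs_diffs.setdefault(page, {})[offset] = value


def fetch(node: Node, page: int, nodes: list[Node], home_of: list[int]) -> None:
    """Fault on an invalid page: copy the page from its home node."""
    node.memory[page] = dict(nodes[home_of[page]].memory[page])
    node.valid[page] = True


def read(node: Node, page: int, offset: int, nodes: list[Node], home_of: list[int]):
    if not node.valid[page]:
        fetch(node, page, nodes, home_of)
    return node.memory[page].get(offset)


class Lock:
    """Lock scope: update protocol (CS diffs go straight to the next acquirer)."""

    def __init__(self):
        self.last_owner: Node | None = None

    def acquire(self, node: Node, nodes: list[Node], home_of: list[int]) -> None:
        prev = self.last_owner
        if prev is None or prev is node:
            return  # nothing new to receive
        for page, diff in prev.cs_diffs.items():
            if not node.valid[page]:
                fetch(node, page, nodes, home_of)
            node.memory[page].update(diff)  # update the copy instead of invalidating it
        # Single-lock simplification: inherit the CS diffs so later acquirers see them too.
        node.cs_diffs = {p: dict(d) for p, d in prev.cs_diffs.items()}

    def release(self, node: Node) -> None:
        self.last_owner = node


def barrier(nodes: list[Node], home_of: list[int]) -> None:
    """Global scope: homes are updated here, then stale copies are invalidated."""
    write_notices = {n.nid: set(n.diffs) for n in nodes}
    for n in nodes:  # lazy: diffs reach the home nodes only at the barrier
        for page, diff in n.diffs.items():
            nodes[home_of[page]].memory[page].update(diff)
    for n in nodes:  # invalidate non-home copies of pages written by another node
        for writer, pages in write_notices.items():
            for page in pages:
                if writer != n.nid and home_of[page] != n.nid:
                    n.valid[page] = False
    for n in nodes:
        n.diffs, n.cs_diffs = {}, {}
```

In this model a page's home sees a remote write only after the next barrier, while a lock acquirer sees the previous owner's critical-section writes as soon as it acquires the lock.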
Implementation of ScC model
Diff Integration Technique
Twinning occurs before an incoming diff is applied, not after a write page fault.
As a result, all previous diffs made to the same page in the same critical section are preserved and merged into a single integrated diff.
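The effect of moving the twin can be shown with a minimal sketch. It assumes a page is a small `bytearray` and a diff is a dict mapping changed byte offsets to new values; `PAGE_SIZE`, `make_diff`, `apply_diff`, `conventional`, and `integrated` are hypothetical names, and this is not the paper's diff encoding. The only difference between the two functions is whether the twin is taken before or after the previous lock owner's diff is applied.

```python
from __future__ import annotations

PAGE_SIZE = 8  # hypothetical tiny page, in bytes


def make_diff(twin: bytes, page: bytearray) -> dict[int, int]:
    """Diff = offset -> new byte value for every byte that differs from the twin."""
    return {i: page[i] for i in range(len(page)) if page[i] != twin[i]}


def apply_diff(page: bytearray, diff: dict[int, int]) -> None:
    for offset, value in diff.items():
        page[offset] = value


def conventional(page: bytearray, incoming: dict[int, int], writes: dict[int, int]) -> dict[int, int]:
    apply_diff(page, incoming)    # diff from the previous lock owner is applied first
    twin = bytes(page)            # twin made at the first write fault, AFTER that diff
    apply_diff(page, writes)      # local writes inside the critical section
    return make_diff(twin, page)  # holds only this node's own writes


def integrated(page: bytearray, incoming: dict[int, int], writes: dict[int, int]) -> dict[int, int]:
    twin = bytes(page)            # twin made BEFORE the incoming diff is applied
    apply_diff(page, incoming)
    apply_diff(page, writes)
    return make_diff(twin, page)  # incoming changes and local writes in one diff


if __name__ == "__main__":
    print(conventional(bytearray(PAGE_SIZE), {0: 1}, {3: 9}))  # {3: 9}
    print(integrated(bytearray(PAGE_SIZE), {0: 1}, {3: 9}))    # {0: 1, 3: 9}
```

With an incoming diff `{0: 1}` and a local write `{3: 9}`, the conventional order yields only `{3: 9}`, so a later lock owner would need both diffs; the integrated order yields one diff `{0: 1, 3: 9}` that a later owner can apply on its own.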
Diff Integration Technique
Dynamic Home Migration
The home-based protocol performs poorly when a page's home node accesses that page rarely or not at all compared with other nodes.
Previously proposed home migration techniques only work for single-writer DSM applications.
Homes are migrated at lock synchronization time (acquire and release).
Dynamic Home Migration
This paper proposes a home migration technique that can choose optimal home nodes for multiple-writer as well as single-writer applications.
It piggybacks the chosen home nodes on the other coherence-related data exchanged during the barrier, which minimizes home-finding and data communication overhead.
Dynamic Home Migration
Dynamic Home Migration
1. All nodes record their dirty pages between two consecutive barriers.
2. Upon arrival at a barrier, all nodes create final NCS diffs.
3. All nodes except the barrier manager send their invalidation notices, including the diff size of each dirty page, to the manager.
4. The barrier manager receives from every node a barrier arrival notice containing that node's dirty page list and the diff size of each dirty page.
Dynamic Home Migration
5. Each time the manager receives a notice, it adds that node's dirty pages to a global dirty page list and, for each dirty page, sets the home to the node with the largest diff for that page.
6. On receiving the new home node list, all nodes update the home nodes by sending their diffs to the corresponding homes.
Note that for diffs made inside a critical section (CS diffs), only the last lock owner sends its integrated diff to the home, and only if it is not itself the home of that page.
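Steps 4 to 6 and the note above can be sketched as two small functions. This is a hedged illustration, assuming each barrier notice maps page ids to diff sizes in bytes and that the manager's own dirty list is included as an ordinary entry. The function names, the `cs_pages` set, and the `last_lock_owner` map are hypothetical bookkeeping rather than the paper's data structures, and breaking ties in favor of the earlier notice is our assumption.

```python
from __future__ import annotations


def choose_homes(notices: dict[int, dict[int, int]]) -> dict[int, int]:
    """Barrier manager (steps 4-5).

    notices: node id -> {page id: diff size in bytes}, in arrival order.
    Returns page id -> new home: the node with the largest diff for that page.
    """
    best: dict[int, tuple[int, int]] = {}  # page -> (largest diff size so far, node id)
    for node_id, dirty in notices.items():  # accumulate each notice as it arrives
        for page, size in dirty.items():
            if page not in best or size > best[page][0]:  # strict '>': earlier notice wins ties
                best[page] = (size, node_id)
    return {page: node_id for page, (_, node_id) in best.items()}


def diffs_to_send(node_id: int, dirty_pages: set[int], new_homes: dict[int, int],
                  cs_pages: set[int], last_lock_owner: dict[int, int]) -> dict[int, int]:
    """Each node after receiving the new home list (step 6 plus the CS-diff note).

    Returns page id -> destination home for every diff this node must send.
    """
    sends = {}
    for page in dirty_pages:
        home = new_homes[page]
        if home == node_id:
            continue  # this node is the new home: no need to send the diff to itself
        if page in cs_pages and last_lock_owner.get(page) != node_id:
            continue  # CS diffs: only the last lock owner sends its integrated diff
        sends[page] = home
    return sends
```

For example, with notices `{0: {10: 100, 11: 4000}, 1: {10: 2500}, 2: {11: 300, 12: 64}}`, the manager makes node 1 the home of page 10, node 0 the home of page 11, and node 2 the home of page 12; node 2 then sends only its page-11 diff, to node 0. Because the new home is the node that wrote the most, the largest diff for each page never has to cross the network.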
Performance Evaluation Environment
TM: TreadMarks, a homeless LRC implementation
CHBLRC: conventional home-based LRC (eager, no diff integration, static homes)
LHB (or LHB ScC): lazy home-based scope consistency
Cluster: 32 nodes on a 100 Mbit/s switched Ethernet, each with a 350 MHz Pentium II CPU and 192 MB of memory, running Gentoo Linux with gcc 3.3.2
Performance Evaluation Environment
PNN: parallel neural network application (locks and barriers)
Barnes-Hut: Barnes-Hut N-body algorithm (barriers)
IS: integer sort (barriers)
Water: water molecular dynamics simulation (locks and barriers)
SOR: Successive Over-Relaxation (barriers)
Performance Evaluation
Performance Evaluation
Performance Evaluation: Effect of Diff Integration on PNN and Water
Thank you!