
Performance improvement techniques for software distributed shared memory



Byung-Hyun Yu; Werstein, P.; Purvis, M.; Cranefield, S., "Performance improvement techniques for software distributed shared memory," 11th International Conference on Parallel and Distributed Systems, 2005. Proceedings. Volume 1, 20-22 July 2005, Page(s): 119-125, Vol. 1



  1. Performance Improvement Techniques for Software Distributed Shared Memory
     • Speaker: 呂宗螢
     • Adviser: 梁文耀 (instructor)
     • Date: 2007/3/9
  2. Paper (Embedded and Parallel Systems Lab)
     • Byung-Hyun Yu; Werstein, P.; Purvis, M.; Cranefield, S., "Performance improvement techniques for software distributed shared memory," 11th International Conference on Parallel and Distributed Systems, 2005. Proceedings. Volume 1, 20-22 July 2005, Page(s): 119-125, Vol. 1
  3. Reference
     • L. Iftode, J.P. Singh and K. Li, "Scope Consistency: A Bridge between Release Consistency and Entry Consistency," in Proc. of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, 1996.
  4. Outline
     • Introduction
     • Implementation of ScC model
     • Diff Integration Technique
     • Dynamic Home Migration
     • Performance Evaluation Environment
     • Performance Evaluation
  5. Introduction
     • Implementing parallel algorithms with shared variables is more convenient than message passing, in which the programmer explicitly sends and receives data.
     • DSM has not attracted much interest from the parallel computing community because of its slow performance.
  6. Introduction
     • Lazy home-based (LHB) protocol
     • Scope consistency (ScC)
     • A diff integration technique that solves most diff accumulation problems
     • A dynamic home migration protocol that solves the static home assignment problem of the original home-based protocol
     • The techniques are evaluated with well-known DSM benchmark applications.
  7. Implementation of ScC model
     • The LHB protocol does not send diffs to home nodes between two consecutive barriers.
     • It uses the update protocol during lock synchronization and the invalidation protocol for the global scope during barrier synchronization.
  8. Implementation of ScC model
  9. Diff Integration Technique
     • Twinning occurs before diff application rather than after a write page fault.
     • In this way, all previous diffs on the same page made in the same critical section are preserved and merged into a single integrated diff.
  10. Diff Integration Technique
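
The twin-before-apply idea from the Diff Integration Technique slides can be illustrated with a minimal Python sketch. It is not the paper's implementation: the byte-offset dictionary used as a diff, the tiny `PAGE_SIZE`, and the function names `make_diff`, `apply_diff`, and `critical_section` are all assumptions made for illustration. The point it shows is that when the twin is taken before the received diff is applied, the diff computed at release already contains both the earlier writer's changes and the local writes.

```python
from typing import Dict

PAGE_SIZE = 8  # assumption: tiny page for illustration only

Diff = Dict[int, int]  # assumed diff format: byte offset -> new byte value


def make_diff(twin: bytes, page: bytes) -> Diff:
    """Return every byte of `page` that differs from its pristine copy `twin`."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(page: bytearray, diff: Diff) -> None:
    """Write the recorded byte values into `page` in place."""
    for offset, value in diff.items():
        page[offset] = value


def critical_section(page: bytearray, received: Diff, writes: Diff) -> Diff:
    """Hypothetical lock holder: twin first, then apply the previous holder's diff."""
    twin = bytes(page)          # twin taken BEFORE diff application
    apply_diff(page, received)  # changes made by the previous lock holder
    apply_diff(page, writes)    # this node's own writes inside the critical section
    return make_diff(twin, page)  # one integrated diff covering both


# Node B receives node A's diff {0: 1}, then writes byte 3.
page_b = bytearray(PAGE_SIZE)
integrated = critical_section(page_b, received={0: 1}, writes={3: 7})
print(integrated)  # {0: 1, 3: 7}
```

A later lock holder then needs only `integrated` to bring its copy up to date, instead of one diff per earlier holder. The sketch ignores the case where a write restores a byte to its twin value.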
  11. Dynamic Home Migration
     • The home-based protocol has a weakness when the home node assigned to a page does not access that page, or accesses it less frequently than other nodes do.
     • Previously proposed home migration techniques provide a solution only for single-writer DSM applications.
     • To migrate homes at the time of lock synchronization (acquire and release)
  12. Dynamic Home Migration
     • This paper proposes a home migration technique that can choose the optimum home nodes for multiple-writer applications as well as single-writer applications.
     • It uses the barrier process, in which the best home nodes are piggybacked on other coherence-related data, minimizing the home-finding and data communication overheads.
  13. Dynamic Home Migration
  14. Dynamic Home Migration
     1. All nodes record their dirty pages between two consecutive barriers.
     2. Upon arrival at a barrier, all nodes create their final NCS diffs.
     3. All nodes except the barrier manager node send their invalidation notices, including the diff size of each dirty page, to the manager node.
     4. The barrier manager receives from every node a barrier arrival notice that includes a dirty page list and the size of each dirty page diff.
  15. Dynamic Home Migration
     5. Each time the manager receives a notice, it accumulates the dirty pages into a global dirty page list and, for each dirty page, sets the home to the node with the maximum diff size.
     6. On receiving the new home node list, all nodes update the home nodes by sending their diffs to the corresponding homes.
     • Note that only the last lock owner updates the home nodes with the integrated diffs it made during lock synchronization, if the last lock owner is not the home of the CS diff.
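
Steps 4 to 6 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the notice format (a node id plus a map from page id to diff size), the function names `choose_homes` and `pages_to_send`, and the tie-breaking rule (the first notice received keeps the home when diff sizes are equal) are hypothetical and not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed notice format: (node id, {page id: diff size in bytes}).
Notice = Tuple[int, Dict[int, int]]


def choose_homes(notices: Iterable[Notice]) -> Dict[int, int]:
    """Barrier manager: for each dirty page, pick the node with the largest diff."""
    best: Dict[int, Tuple[int, int]] = {}  # page -> (largest diff size, node)
    for node, dirty in notices:
        for page, size in dirty.items():
            # Assumption: on a tie, the earlier notice keeps the home.
            if page not in best or size > best[page][0]:
                best[page] = (size, node)
    return {page: node for page, (_size, node) in best.items()}


def pages_to_send(node: int, dirty: Dict[int, int], homes: Dict[int, int]) -> List[int]:
    """Pages whose diffs `node` must send because another node is now the home."""
    return sorted(page for page in dirty if homes[page] != node)


notices = [
    (0, {10: 64, 11: 8}),
    (1, {10: 512}),
    (2, {11: 8, 12: 100}),
]
homes = choose_homes(notices)
print(homes)                                  # {10: 1, 11: 0, 12: 2}
print(pages_to_send(0, notices[0][1], homes))  # [10]
```

Choosing the node with the largest diff as home means the largest diff for each page never has to cross the network; only the smaller diffs are sent to the new home.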
  16. Performance Evaluation Environment
     • TM: TreadMarks, a homeless LRC (lazy release consistency) system
     • CHBLRC: conventional home-based LRC (eager, no diff integration, static homes)
     • LHB (or LHB ScC): lazy home-based scope consistency
     • Network of 32 nodes
     • 100 Mbit switched Ethernet
     • 350 MHz Pentium II CPU
     • 192 MB of memory
     • Gentoo Linux with gcc 3.3.2
  17. Performance Evaluation Environment
     • PNN: parallel neural network application (lock & barrier)
     • Barnes-Hut: Barnes-Hut N-body algorithm (barrier)
     • IS: integer sort (barrier)
     • Water: simulation of water molecular dynamics (lock & barrier)
     • SOR: successive over-relaxation (barrier)
  18. Performance Evaluation
  19. Performance Evaluation
  20. Performance Evaluation
     • Diff integration effect on PNN and Water
  21. Thank you!