Publication Type |
technical report |
School or College |
College of Engineering |
Department |
School of Computing |
Creator |
Carter, John B. |
Other Author |
Kuo, Chen-Chi; Kuramkote, Ravindra |
Title |
A comparison of software and hardware synchronization mechanisms for distributed shared memory multiprocessors |
Date |
1996 |
Description |
Efficient synchronization is an essential component of parallel computing. The designers of traditional multiprocessors have included hardware support only for simple operations such as compare-and-swap and load-linked/store-conditional, while high-level synchronization primitives such as locks, barriers, and condition variables have been implemented in software [9,14,15]. With the advent of directory-based distributed shared memory (DSM) multiprocessors with significant flexibility in their cache controllers [7,12,17], it is worthwhile considering whether this flexibility should be used to support higher-level synchronization primitives in hardware. In particular, as part of maintaining data consistency, these architectures maintain lists of processors with a copy of a given cache line, which is most of the hardware needed to implement distributed locks. We studied two software and four hardware implementations of locks and found that hardware implementations can reduce lock acquire and release times by 25-94% compared to well-tuned software locks. In terms of macrobenchmark performance, hardware locks reduce application running times by up to 75% on a synthetic benchmark with heavy lock contention and by 3-6% on a suite of SPLASH-2 benchmarks. In addition, emerging cache coherence protocols promise to increase the time spent synchronizing relative to the time spent accessing shared data, and our study shows that hardware locks can reduce SPLASH-2 execution times by up to 10-13% if the time spent accessing shared data is small. Although the overall performance impact of hardware lock mechanisms varies tremendously depending on the application, the added hardware complexity on a flexible architecture like FLASH [12] or Avalanche [7] is negligible, and thus hardware support for high-level synchronization operations should be provided. |
Type |
Text |
Publisher |
University of Utah |
Subject |
Hardware locks |
Subject LCSH |
Parallel programming (Computer science); Synchronization; Synchronous circuits |
Language |
eng |
Bibliographic Citation |
Carter, J. B., Kuo, C.-C., & Kuramkote, R. (1996). A comparison of software and hardware synchronization mechanisms for distributed shared memory multiprocessors. 1-24. UUCS-96-011. |
Series |
University of Utah Computer Science Technical Report |
Relation is Part of |
ARPANET |
Rights Management |
©University of Utah |
Format Medium |
application/pdf |
Format Extent |
9,113,743 bytes |
Identifier |
ir-main,16231 |
ARK |
ark:/87278/s6223c1z |
Setname |
ir_uspace |
ID |
703945 |
Reference URL |
https://collections.lib.utah.edu/ark:/87278/s6223c1z |