LOT-ECC: LOcalized and tiered reliability mechanisms for commodity memory systems

Publication Type pre-print
School or College College of Engineering
Department Computing, School of
Creator Balasubramonian, Rajeev
Other Author Udipi, Aniruddha N.; Muralimanohar, Naveen; Davis, Al; Jouppi, Norman P.
Title LOT-ECC: LOcalized and tiered reliability mechanisms for commodity memory systems
Date 2012-01-01
Description Memory system reliability is a serious and growing concern in modern servers. Existing chipkill-level memory protection mechanisms suffer from several drawbacks. They activate a large number of chips on every memory access; this increases energy consumption, and reduces performance due to the reduction in rank-level parallelism. Additionally, they increase access granularity, resulting in wasted bandwidth in the absence of sufficient access locality. They also restrict systems to use narrow-I/O x4 devices, which are known to be less energy-efficient than the wider x8 DRAM devices. In this paper, we present LOT-ECC, a localized and multi-tiered protection scheme that attempts to solve these problems. We separate error detection and error correction functionality, and employ simple checksum and parity codes effectively to provide strong fault-tolerance, while simultaneously simplifying implementation. Data and codes are localized to the same DRAM row to improve access efficiency. We use system firmware to store correction codes in DRAM data memory and modify the memory controller to handle data mapping. We thus build an effective fault-tolerance mechanism that provides strong reliability guarantees, activates as few chips as possible (reducing power consumption by up to 44.8% and reducing latency by up to 46.9%), and reduces circuit complexity, all while working with commodity DRAMs and operating systems. Finally, we propose the novel concept of a heterogeneous DIMM that enables the extension of LOT-ECC to x16 and wider DRAM parts.
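The separation of error detection from error correction described above can be illustrated with a toy two-tier sketch. This is a simplified illustration, not the paper's exact code layout: it assumes a hypothetical rank of 8 data chips each supplying 8 bytes per cache line, uses an additive mod-256 checksum per chip as the local detection tier, and uses XOR parity across chips as the global correction tier. The checksum identifies which chip is faulty; the parity then rebuilds that chip's data.

```python
from functools import reduce
from typing import List, Tuple

CHIPS = 8        # hypothetical number of data chips per rank
WORD_BYTES = 8   # hypothetical bytes contributed by each chip per line


def checksum(data: bytes) -> int:
    """Tier 1 (local detection): additive checksum, illustrative only."""
    return sum(data) % 256


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length segments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(segments: List[bytes]) -> Tuple[List[int], bytes]:
    # One checksum per chip, stored alongside that chip's own data.
    checksums = [checksum(s) for s in segments]
    # Tier 2 (global correction): XOR parity over all chips' data.
    parity = reduce(xor_bytes, segments)
    return checksums, parity


def read_and_correct(segments: List[bytes], checksums: List[int],
                     parity: bytes) -> List[bytes]:
    # Detection: find chips whose data no longer matches its checksum.
    bad = [i for i, s in enumerate(segments) if checksum(s) != checksums[i]]
    if not bad:
        return list(segments)
    if len(bad) > 1:
        raise ValueError("multiple failed chips are beyond this sketch")
    failed = bad[0]
    # Correction: parity XOR every surviving segment yields the lost one.
    survivors = [s for j, s in enumerate(segments) if j != failed]
    fixed = list(segments)
    fixed[failed] = reduce(xor_bytes, survivors, parity)
    return fixed


if __name__ == "__main__":
    data = [bytes(c * 16 + k for k in range(WORD_BYTES)) for c in range(CHIPS)]
    checks, par = encode(data)
    corrupted = list(data)
    corrupted[3] = bytes([0xFF] * WORD_BYTES)  # simulate a whole-chip failure
    assert read_and_correct(corrupted, checks, par) == data
```

Because only the chip flagged by its local checksum needs reconstruction, the correction data is consulted only after an error is detected; an error-free read needs nothing beyond the per-chip check.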
Type Text
Publisher Institute of Electrical and Electronics Engineers (IEEE)
First Page 285
Last Page 296
Dissertation Institution University of Utah
Language eng
Bibliographic Citation Udipi, A. N., Muralimanohar, N., Balasubramonian, R., Davis, A., & Jouppi, N. P. (2012). LOT-ECC: LOcalized and tiered reliability mechanisms for commodity memory systems. Proceedings - International Symposium on Computer Architecture, no. 6237025, 285-296.
Rights Management (c)2012 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Format Medium application/pdf
Format Extent 415,196 bytes
Identifier uspace,17708
ARK ark:/87278/s6k93s9k
Setname ir_uspace
ID 708077
Reference URL https://collections.lib.utah.edu/ark:/87278/s6k93s9k