Page 121

Publication Type	technical report
School or College	College of Engineering
Department	Computing, School of
Creator	Chandramouli, V.
Title	Design of a Self-Timed, Pipelined, Floating Point Multiplier in Gallium Arsenide
Date	1994-06
Description	This thesis presents the design of a self-timed, floating point multiplier in Gallium Arsenide (GaAs) technology. It implements the Institute of Electrical and Electronics Engineers (IEEE) single precision standard. Self-timed design methodology offers some advantages over a synchronous approach, especially at higher clock frequencies, which are quite common in technologies like GaAs. This thesis looks at the various issues involved by undertaking the design of a system of reasonable complexity. It begins with a study of existing techniques for parallel binary multiplication. Based on this study, an architecture comparison is presented. Then a new architecture is obtained by modifying an existing architecture and compared against other competing approaches in terms of area and latency. Also, as part of this thesis, a new family of precharged circuits has been developed that allows for a delay insensitive implementation in GaAs. Since delay insensitive systems tend to reflect the average case behavior rather than the worst case, this approach will improve performance in cases where applicable. Finally, based on SPICE simulations, the proposed multiplier was found to have a latency of 24ns and a peak throughput of 76 MFLOPS. It is also shown to have an area about one-third and consume about half the power of an existing GaAs floating point multiplier, described in the 1992 IEEE GaAs Symposium.
Type	Text
Subject	gallium arsenide; floating point multiplier; GaAs; GaAs technology
Language	eng
Bibliographic Citation	Chandramouli, V. (1994). Design of a self-timed, pipelined, floating point multiplier in gallium arsenide.
Series	University of Utah Computer Science Technical Report
Relation is Part of	ARPANET
Format Medium	application/pdf
Format Extent	74,241,242 bytes
File Name	Chandramouli-Design_Of.pdf
Conversion Specifications	Original scanned with Kirtas 2400 and saved as 400 ppi uncompressed TIFF. PDF generated by Adobe Acrobat Pro X for CONTENTdm display
ARK	ark:/87278/s60g5mcm
Setname	ir_computersa
ID	97163
Reference URL	https://collections.lib.utah.edu/ark:/87278/s60g5mcm

Page Metadata

Title	Page 121
Setname	ir_computersa
ID	97148
OCR Text	5.6 Estimating the Delays

The worst case for this stage of the multiplier is as follows. The carry propagate addition (CPA) itself takes its worst-case time. Initially, the overflow bit is reset and the result passes through the mux. In the worst case, however, this choice of the final result may not be correct, so the signal must go through another two gate delays (which is the same as a mux delay in DCFL) followed by another pass through the mux. This is followed by another mux delay in the shifter. Thus there are about four mux delays after the CPA in the worst case.

At 25°C and the tt process corner, the delay of a mux is about 200ps (one inverter and 100f load). After layout, etc., this may be about 300ps. From the previous section, the delay for the CPA is 2.7ns. Therefore, in the worst case the total delay would be about 3.9ns (2.7ns for the CPA plus four mux delays of about 300ps each). To allow for a margin of error and delays through the control logic, this delay can be rounded up to 7ns. Thus, the worst case time for the final stage of the multiplier is expected to be 7ns at the tt process corner and 25°C. Also note that, even in the best case, there will be two mux delays after the CPA. Thus, a bundled implementation will not be significantly worse.

5.7 Summary

In this chapter, some techniques for implementing round to nearest/even, which is the default rounding mode for the IEEE standard, were presented. It was shown how this could be merged with the final CPA, thus saving area and delay. It was also shown how this could be implemented with an iterative architecture. This stage of the multiplier saw a possible use of the precharged circuits discussed in Chapter 3 of this thesis. The worst case delay of this stage was expected to be about 7ns at the tt process corner and 25°C. Using the results of Chapter 4 and this chapter, the total latency for a single multiply will be about 24ns at the tt process corner and 25°C.
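The delay arithmetic of Section 5.6 can be checked with a short calculation. The sketch below is only an illustration: it uses the figures quoted in the text (a 2.7ns CPA, about 300ps per mux after layout, four mux delays in the worst case and two in the best case). The function and constant names are hypothetical and do not come from the thesis.

```python
# Illustrative check of the Section 5.6 delay estimate.
# All figures are taken from the text; names are hypothetical.
# Integer picoseconds are used to avoid floating point rounding.

CPA_DELAY_PS = 2700             # carry propagate adder delay (2.7ns)
MUX_DELAY_POST_LAYOUT_PS = 300  # estimated mux delay after layout
WORST_CASE_MUX_STAGES = 4       # mux, two gate delays (= one mux), mux, shifter mux
BEST_CASE_MUX_STAGES = 2        # the text notes two mux delays even in the best case


def stage_delay_ps(mux_stages: int,
                   mux_delay_ps: int = MUX_DELAY_POST_LAYOUT_PS,
                   cpa_delay_ps: int = CPA_DELAY_PS) -> int:
    """Return the CPA delay plus the given number of mux delays, in ps."""
    return cpa_delay_ps + mux_stages * mux_delay_ps


worst_ps = stage_delay_ps(WORST_CASE_MUX_STAGES)
best_ps = stage_delay_ps(BEST_CASE_MUX_STAGES)

print(f"worst case: {worst_ps / 1000:.1f} ns")
print(f"best case:  {best_ps / 1000:.1f} ns")
print(f"difference: {worst_ps - best_ps} ps")
```

Under these assumptions the best and worst cases differ by only two mux delays, which is consistent with the remark that a bundled implementation will not be significantly worse. The 7ns figure in the text adds a margin for error and control logic that this sketch does not model.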
Reference URL	https://collections.lib.utah.edu/ark:/87278/s60g5mcm/97148