Page 123

Publication Type	technical report
School or College	College of Engineering
Department	Computing, School of
Creator	Chandramouli, V.
Title	Design of a Self-Timed, Pipelined, Floating Point Multiplier in Gallium Arsenide
Date	1994-06
Description	This thesis presents the design of a self-timed, floating point multiplier in Gallium Arsenide (GaAs) technology. It implements the Institute of Electrical and Electronics Engineers (IEEE) single precision standard. Self-timed design methodology offers some advantages over a synchronous approach, especially at higher clock frequencies, which are quite common in technologies like GaAs. This thesis looks at the various issues involved by undertaking the design of a system of reasonable complexity. It begins with a study of existing techniques for parallel binary multiplication. Based on this study, an architecture comparison is presented. Then a new architecture is obtained by modifying an existing architecture and is compared against other competing approaches in terms of area and latency. Also, as part of this thesis, a new family of precharged circuits has been developed that allows for a delay insensitive implementation in GaAs. Since delay insensitive systems tend to reflect the average case behavior rather than the worst case, this approach will improve performance where applicable. Finally, based on SPICE simulations, the proposed multiplier was found to have a latency of 24 ns and a peak throughput of 76 MFLOPS. It is also shown to have about one-third the area and consume about half the power of an existing GaAs floating point multiplier, described in the 1992 IEEE GaAs Symposium.
Type	Text
Subject	gallium arsenide; floating point multiplier; GaAs; GaAs technology
Language	eng
Bibliographic Citation	Chandramouli, V. (1994). Design of a self-timed, pipelined, floating point multiplier in gallium arsenide.
Series	University of Utah Computer Science Technical Report
Relation is Part of	ARPANET
Format Medium	application/pdf
Format Extent	74,241,242 bytes
File Name	Chandramouli-Design_Of.pdf
Conversion Specifications	Original scanned with Kirtas 2400 and saved as 400 ppi uncompressed TIFF. PDF generated by Adobe Acrobat Pro X for CONTENTdm display
ARK	ark:/87278/s60g5mcm
Setname	ir_computersa
ID	97163
Reference URL	https://collections.lib.utah.edu/ark:/87278/s60g5mcm

Page Metadata

Title	Page 123
Setname	ir_computersa
ID	97150
OCR Text	whereas the Vitesse process is a commercially available one. The architecture used in the existing multiplier is based on a full (4,2) tree approach and also uses Booth recoding. In Chapter 2, it was shown that the full tree approach (with Booth recoding) achieves the minimum possible latency, but at a considerable investment in area. This is clearly reflected in the table too. For n = 24, the optimal approach would give an array latency of 5 CSA delays, whereas our approach has a latency of 10 CSA delays. In the table, the latency for the new architecture is more than double, possibly because of differences in the implementation of the Rounding/CPA stage. The implementation details of this stage for the existing multiplier were not available. It should be noted that under maximum utilization the new approach fares much better, with a throughput of one result every 13 ns. However, the area of the existing multiplier is more than three times the area of the new multiplier, commensurate with the results of Chapter 2. This is in spite of the fact that the new design has not used metal3 for routing outside of the cells because of an ACME constraint.
The area estimate for the new multiplier was obtained as follows. From the ACME layout of the 8 x 24 multiplier, an accurate estimate of the area of the 24 x 24 array multiplier was obtained. Then the area of the Rounding/CPA stage was estimated by using the sizes of the precharged adder, other ACME cells, etc. The total area of the complete multiplier then came out to be about 18 sq. mm. However, to allow for a margin of error, this area was rounded up to 25 sq. mm.
The logic family used in the existing multiplier was Source Follower FET Logic [10], a logic family that offers higher speed and better noise margins but consumes more power (about twice that of DCFL). Our approach uses DCFL, whose merits are discussed in Chapter 3.
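The latency and throughput figures quoted above can be sanity-checked with a short calculation. The sketch below is only an illustration under stated assumptions, not the thesis's own derivation: it assumes radix-4 Booth recoding leaves n/2 partial-product rows and that the tree reduces them with 3:2 carry-save adders, and the function names are hypothetical. It also converts a 13 ns issue interval into MFLOPS for comparison with the peak throughput given in the record's description.

```python
def csa_levels(rows: int) -> int:
    """Count the 3:2 carry-save adder levels needed to reduce `rows` operands to 2."""
    levels = 0
    while rows > 2:
        # Each level turns every complete group of three rows into two rows.
        rows -= rows // 3
        levels += 1
    return levels


def mflops_from_interval_ns(interval_ns: float) -> float:
    """Convert one result every `interval_ns` nanoseconds to millions of results per second."""
    return 1e3 / interval_ns


n = 24                    # mantissa width used in the text
booth_rows = n // 2       # assumption: radix-4 Booth recoding halves the row count
tree_levels = csa_levels(booth_rows)
peak_mflops = mflops_from_interval_ns(13)  # 13 ns interval from the text

print(f"{booth_rows} rows need {tree_levels} CSA levels")
print(f"13 ns per result is about {peak_mflops:.1f} MFLOPS")
```

Under these assumptions the tree depth matches the 5 CSA delays quoted for the optimal approach, and the 13 ns interval is consistent with the roughly 76 MFLOPS peak throughput stated in the description.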
The power estimate for our multiplier was obtained by counting the total number of gates and assuming 0.4 mW/gate. The various superbuffers used were also taken into account, and a margin for error was allowed.
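The estimation procedure described here amounts to a simple formula: gate count times power per gate, plus the superbuffer contribution, scaled by an error margin. The sketch below shows that arithmetic. Only the 0.4 mW/gate figure comes from the text; the gate count, superbuffer power, and margin used in the example call are hypothetical placeholders, since this page does not give them.

```python
MW_PER_GATE = 0.4  # per-gate power assumed in the text (mW)


def estimate_power_mw(gate_count: int,
                      superbuffer_mw: float = 0.0,
                      margin: float = 0.0) -> float:
    """Estimate total power in mW.

    gate_count     -- total number of logic gates
    superbuffer_mw -- additional power drawn by superbuffers (mW)
    margin         -- fractional allowance for error, e.g. 0.1 for 10%
    """
    base_mw = gate_count * MW_PER_GATE + superbuffer_mw
    return base_mw * (1.0 + margin)


# Hypothetical inputs, chosen only to exercise the formula.
example_mw = estimate_power_mw(gate_count=10_000, superbuffer_mw=500.0, margin=0.10)
print(f"Estimated power: {example_mw / 1000:.2f} W")
```

Keeping the margin as a separate multiplicative factor makes it easy to see how much of the final figure is measured gate power and how much is allowance.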
Reference URL	https://collections.lib.utah.edu/ark:/87278/s60g5mcm/97150