Page 38

Publication Type	technical report
School or College	College of Engineering
Department	Computing, School of
Creator	Chandramouli, V.
Title	Design of a Self-Timed, Pipelined, Floating Point Multiplier in Gallium Arsenide
Date	1994-06
Description	This thesis presents the design of a self-timed, floating point multiplier in Gallium Arsenide (GaAs) technology. It implements the Institute of Electrical and Electronics Engineers (IEEE) single precision standard. Self-timed design methodology offers some advantages over a synchronous approach, especially at higher clock frequencies, which are quite common in technologies like GaAs. This thesis looks at the various issues involved by undertaking the design of a system of reasonable complexity. It begins with a study of existing techniques for parallel binary multiplication. Based on this study, an architecture comparison is presented. Then a new architecture is obtained by modifying an existing architecture and compared against other competing approaches in terms of area and latency. Also, as part of this thesis, a new family of precharged circuits has been developed that allows for a delay insensitive implementation in GaAs. Since delay insensitive systems tend to reflect the average case behavior rather than the worst case, this approach will improve performance in cases where applicable. Finally, based on SPICE simulations, the proposed multiplier was found to have a latency of 24ms and a peak throughput of 76 MFLOPS. It is also shown to have an area about one-third and consume about half the power of an existing GaAs floating point multiplier, described in the 1992 IEEE GaAs Symposium.
Type	Text
Subject	gallium arsenide; floating point multiplier; GaAs; GaAs technology
Language	eng
Bibliographic Citation	Chandramouli, V. (1994). Design of a self-timed, pipelined, floating point multiplier in gallium arsenide.
Series	University of Utah Computer Science Technical Report
Relation is Part of	ARPANET
Format Medium	application/pdf
Format Extent	74,241,242 bytes
File Name	Chandramouli-Design_Of.pdf
Conversion Specifications	Original scanned with Kirtas 2400 and saved as 400 ppi uncompressed TIFF. PDF generated by Adobe Acrobat Pro X for CONTENTdm display
ARK	ark:/87278/s60g5mcm
Setname	ir_computersa
ID	97163
Reference URL	https://collections.lib.utah.edu/ark:/87278/s60g5mcm

Page Metadata

Title	Page 38
Setname	ir_computersa
ID	97065
OCR Text	2.6.1 Higher Order Counters

The area can be made smaller without sacrificing speed by using higher order counters such as the (6,2) counters and the (8,2) counters [31]. Figure 2.8(a) and Figure 2.8(b) show how to construct such counters using (3,2) and (4,2) counters. For a 24 x 24 partial product reduction, the latency is still 7 CSA stage delays, which is the same as the Wallace tree approach. A rough estimate of the area indicates this scheme would take 130 x 320 PPL cells (2.0 mm x 8.9 mm), which is 16 percent smaller than the earlier approach and about three times larger than the full array based approach.

The approach using (8,2) counters does not seem to be significantly better than the one using (6,2) counters, so it is not described here. This is because, as shown in Figure 2.8(b), it requires two full adders and a half adder, which increases the width of the bit slice and offsets the advantage of reducing more partial products. We could build the (8,2) counter using two (4,2) counters, but that scheme would be slower.

We could pipeline the multiplier by putting a latch at the end, so that while the final carry propagate addition (CPA) is taking place we can use the tree to start another multiply. Since we get more than three times the speed of the array approach, this seems like a reasonable approach in technologies such as CMOS, where chip real estate is not very expensive. However, in GaAs, chip area is more expensive and we should probably look at other schemes.

2.6.2 Variations of the Theme

Ideally we would like to have the area of the array scheme and the latency of the tree scheme. This is possible to a great extent if we use partial trees. Instead of reducing all of the partial products at once, we can reduce them a few at a time, thus iterating over the tree more than once. Santoro has studied and implemented this approach using (4,2) counters in his PhD thesis [34]. His is a pipelined approach
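
Editorial sketch: the two reduction styles discussed on this page can be illustrated with a small word-level model in Python. This is a minimal sketch, not the thesis's circuit. It models a layer of (3,2) counters as a bitwise carry-save adder and counts how many layers a full tree needs for 24 partial products. It then shows a partial tree that folds a few partial products per pass into a running carry-save pair. All function names are hypothetical. The chunk size of 4 is an assumption chosen for illustration and does not describe Santoro's design or the (6,2) counter layout of Figure 2.8.

```python
def csa(a, b, c):
    """Word-level (3,2) counter: three addends in, a sum word and a carry word out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry  # invariant: s + carry == a + b + c


def partial_products(x, y, width=24):
    """One shifted copy of x (or zero) for each bit of y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def tree_reduce(operands):
    """Reduce operands to two using layers of (3,2) counters.

    Returns the remaining operands and the number of layers (CSA stage delays).
    """
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3      # operands that form complete triples
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])              # leftovers pass through to the next layer
        ops = nxt
        levels += 1
    return ops, levels


def multiply_tree(x, y, width=24):
    """Full tree: reduce all partial products at once, then one carry-propagate add."""
    ops, levels = tree_reduce(partial_products(x, y, width))
    return sum(ops), levels


def multiply_partial_tree(x, y, width=24, chunk=4):
    """Partial tree (assumed scheme): fold `chunk` partial products per pass
    into a running carry-save pair, reusing the same small tree each pass."""
    pps = partial_products(x, y, width)
    s, c = 0, 0
    passes = 0
    for i in range(0, width, chunk):
        (s, c), _ = tree_reduce([s, c] + pps[i:i + chunk])
        passes += 1
    return s + c, passes                    # final carry-propagate addition


if __name__ == "__main__":
    x, y = 0xABCDEF, 0x123456
    print(multiply_tree(x, y))              # product and number of CSA layers
    print(multiply_partial_tree(x, y))      # product and number of passes
```

For 24 operands the layer sizes in this model go 24, 16, 11, 8, 6, 4, 3, 2, which gives the 7 CSA stage delays quoted above for the Wallace tree. The partial tree trades that single wide tree for a much smaller one used over several passes, which is the area versus latency compromise the section describes.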
Reference URL	https://collections.lib.utah.edu/ark:/87278/s60g5mcm/97065