2.6.1 Higher Order Counters

The area can be made smaller without sacrificing speed by using higher order counters such as the (6,2) counters and the (8,2) counters [31]. Figure 2.8(a) and Figure 2.8(b) show how to construct such counters using (3,2) and (4,2) counters. For a 24 x 24 partial product reduction, the latency is still 7 CSA stage delays, which is the same as the Wallace tree approach. A rough estimate of the area indicates that this scheme would take 130 x 320 PPL cells (2.0 mm x 8.9 mm), which is 16 percent smaller than the earlier approach and about three times bigger than the full array based approach.

The approach using (8,2) counters does not seem to be significantly better than the one using (6,2) counters, so it is not described here. This is because, as shown in Figure 2.8(b), it requires two full adders and a half adder, which increases the width of the bit slice and offsets the advantage of reducing more partial products. We could build the (8,2) counter using two (4,2) counters, but that scheme would be slower.

We could pipeline the multiplier by putting a latch at the end of the tree, so that while the final carry propagate addition (CPA) is taking place the tree can start another multiply. Since we get more than three times the speed of the array approach, this seems like a reasonable approach in technologies such as CMOS, where chip real estate is not very expensive. In GaAs, however, chip area is more expensive, and we should probably look at other schemes.

2.6.2 Variations of the Theme

Ideally we would like to have the area of the array scheme and the latency of the tree scheme. This is possible to a great extent if we use partial trees. Instead of reducing all of the partial products at once, we can reduce them a few at a time, iterating over the tree more than once. Santoro has studied and implemented this approach using (4,2) counters in his PhD thesis [34]. His is a pipelined approach