| OCR Text |
cessor arrays. The complete architecture can be globally synchronized or self-timed. The parallel mDRA algorithm running on the parallel mDRA architecture is illustrated in Figure 6.40. It has optimal time complexity, i.e., O(nm), and its convergence behavior is greatly improved. Actual algorithm runs and simulations indicate that this algorithm is many orders of magnitude faster than the parallel DRA5 algorithm. Three advanced parallel mDRA architectures were designed during 1988 [68]. Some implementation issues for the parallel mDRA architecture are discussed in the next sections. 6.7 Wafer-Scale Integration of Parallel DRA Architectures VLSI circuits offer a computing medium with enormous computing power and permit a high degree of spatial parallelism within a 2-dimensional plane, whereas in any sequential uniprocessor machine only 1-dimensional serial computation is possible. In order to map the optimal parallel DRA5 and mDRA algorithms onto a VLSI architecture to solve large-scale engineering problems, one has to deal with the following two critical challenges: (1) I/O Problem. I/O design in the DRA5 and mDRA implementations is important. It may become a bottleneck for the entire system if we continue to follow conventional chip-level design. (2) Extension to Large-Scale Computation. When the problem size (n and m) increases, or when one selects a very large granule size in the processor implementation, the complete design has to be implemented on separate chips, which increases the performance penalties resulting from off-chip communication. This is due mainly to the time required to drive the package pins and also the expense of initializing |