| OCR Text |
Show 94 the impact of communication patterns during an execution. Time spent in remapping the Butterfly's memoxy is time during which the process doing the mapping cannot put any load on the interconnect. If this overhead is too high it will mask communication overhead. Table 4 presents SAR-smashing statistics for the 8 -queens program on 16 nodes. Using SAR-smashing, Boplog asserts direct control over the memoxy-mapping process, using kernel-mode pr1vileges to write directly into the hardware mapping registers on the PNC. By doing so, memoxy map changes are made in the minimum possible time. Each change involves only a 32-bit write to a fixed location in the PNC. This undoubtedly takes just a few microseconds, and the remaining time for each SAR-smashing operation is spent obtaining the necessaxy information for determining the correct SAR value. The information is extracted from a reference by masking and shifting to get a SAR table index and an object offset. The index is added to the base address of the SAR table to find the appropriate SAR value, and the object offset is left-shifted to convert the four-byte offset from the reference into a byte offset. If this information could be found more easily, each SAR-smashing operation would be correspondingly faster. Alternatively, another mapping scheme could be used in conjunction with SAR-smashing. Since SAR-smashing uses only one SAR, many others are still available. Even for reasonably large programs, it should be possible to keep the data areas for all ancestor processes mapped in at all times, elim1nating most memory map changes altogether. Such a scheme would require three SARs for each ancestor (one each for the ancestor's heap, stack, and binding stack), which should allow support for some 60+ ancestor processes. Use of this scheme would require fast determination of the identity of the "owning" process of a reference, so that the proper SAR number could be used in computing the virtual address. Yet another scheme would be to use the same SAR for mapping all three data areas of a particular process, changing its contents to gain access to the different data areas. In this scenario, each process could be assigned a SAR a priori, so that |