3.1.3 Predictable processor behavior

The behavior of a faulty node has a significant impact on fault-tolerant techniques. A very liberal fault behavior model is the Byzantine generals problem [41, 59]: a faulty node may pretend that it is healthy, and the failed processor may respond to inquiries from other nodes with erroneous but valid messages. At the other extreme, malfunctioning processors are required to inform the rest of the system as soon as an internal fault has been identified. This assumption is normally enforced by extensive hardware redundancy.

It is assumed here that processor behavior is "predictable": if a processor fails, it will not transmit any valid message. This assumption can be enforced by commanding a faulty node to keep silent and not respond to any inquiry. Alternatively, a faulty node may answer an inquiry with an invalid message.

Several techniques are available for a processor to detect that its node is malfunctioning. A parity error on the system bus or in resident memory, an illegal-instruction trap, a protection violation, or a subsystem breakdown may trigger the CPU to report a processor failure. Duplicating processors within a node, or passive node diagnosis [47], is another common technique for building a self-checking node.

3.1.4 Reliable communication channel

The switching network is another possible source of system failures. Fault tolerance of computer networks has been an active research subject in parallel processing. Most fault-tolerant networks [48] were designed to provide redundant paths so that a healthy processor can easily reroute messages through an alternate communication channel. In this study, it is assumed that a processor makes a best effort to communicate with a destination node. If the destination cannot be reached because of a network problem, that node is considered faulty. Problems with the interconnection network can be detected via coding or timeout mechanisms [5].
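The two assumptions of Sections 3.1.3 and 3.1.4 fit together: a failed node never sends a valid message, so a requester can treat "no reply before a timeout" and "reply that fails its check" identically, as evidence that the destination is faulty. The Python below is a minimal sketch of that combination under assumptions that are not in the text: a node is modeled as two duplicated processors whose results are compared (one of the self-checking techniques named above), message validity is a CRC-32 checksum standing in for a "coding" mechanism, and the network is a local queue read with a timeout. The names `Message`, `SelfCheckingNode`, `inquire`, and `silent_on_fault` are hypothetical, chosen for illustration only.

```python
# Hypothetical sketch: class/function names, CRC-32, and the queue-based
# "network" are illustrative assumptions, not a prescribed implementation.
import queue
import threading
import zlib
from dataclasses import dataclass
from typing import Callable, Optional

Processor = Callable[[bytes], bytes]


@dataclass(frozen=True)
class Message:
    payload: bytes
    checksum: int  # CRC-32 of the payload; a mismatch marks the message invalid

    @staticmethod
    def valid(payload: bytes) -> "Message":
        return Message(payload, zlib.crc32(payload))

    @staticmethod
    def invalid() -> "Message":
        # Deliberately wrong checksum, so every receiver must reject it.
        return Message(b"", zlib.crc32(b"") ^ 0xFFFFFFFF)

    def is_valid(self) -> bool:
        return zlib.crc32(self.payload) == self.checksum


class SelfCheckingNode:
    """Hypothetical node built from two duplicated processors.

    Each inquiry runs on both processors. If the results disagree, the node
    latches into a failed state and from then on either stays silent or
    answers with an invalid message: it never sends a valid message again.
    """

    def __init__(self, proc_a: Processor, proc_b: Processor,
                 silent_on_fault: bool = True):
        self.proc_a = proc_a
        self.proc_b = proc_b
        self.silent_on_fault = silent_on_fault
        self.failed = False

    def handle(self, request: bytes, reply_to: "queue.Queue[Message]") -> None:
        if not self.failed:
            a, b = self.proc_a(request), self.proc_b(request)
            if a == b:
                reply_to.put(Message.valid(a))
                return
            self.failed = True  # duplicates disagree: internal fault detected
        # Failed node: stay silent, or (optionally) send only invalid messages.
        if not self.silent_on_fault:
            reply_to.put(Message.invalid())


def inquire(node: SelfCheckingNode, request: bytes,
            timeout: float = 0.2) -> Optional[bytes]:
    """Best-effort inquiry. Returns the reply payload, or None when the
    destination must be considered faulty: no reply within `timeout`
    seconds, or a reply that fails its checksum."""
    replies: "queue.Queue[Message]" = queue.Queue()
    # The node runs in its own thread, standing in for a remote processor.
    threading.Thread(target=node.handle, args=(request, replies),
                     daemon=True).start()
    try:
        msg = replies.get(timeout=timeout)
    except queue.Empty:
        return None  # timeout: silent or unreachable node is treated as faulty
    return msg.payload if msg.is_valid() else None  # invalid reply == no reply


if __name__ == "__main__":
    healthy = SelfCheckingNode(bytes.upper, bytes.upper)
    print(inquire(healthy, b"ping"))  # b'PING'
    faulty = SelfCheckingNode(bytes.upper, lambda r: r[::-1])
    print(inquire(faulty, b"ping"))   # None (after the timeout)
```

Latching the failed state means that once a disagreement is seen, the node stays out of the system even on later inputs where its two processors happen to agree. Note that comparison of duplicates only detects faults that make the two copies produce different results; an identical error in both copies would pass unnoticed.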