| OCR Text |
processor B, which corresponds to task A 1/ab. Task A 1/ab is then reinstantiated by processor A with the answer of task A 1/ab/xyz/daa ready to be utilized. 3.6 Related research Fault tolerance in data-driven systems has been studied in [44, 28, 54]. Misunas proposed a triple modular redundancy implementation of a dataflow machine [44, 15]. Three complete copies of the program are stored in memory. Copies of each instruction are carefully distributed so that each copy is executed by a different processor and uses different communication paths. Thus, the failure of any single block affects at most one copy of the program. Hughes [28] described a variation of periodic checkpointing, in which a host processor periodically stores the whole system state. Also discussed was a recovery technique, node-by-node correction, which used a control unit of the system as a monitoring device. Erroneous packets were recomputed and resent. Srini [54] suggested a node reassignment algorithm for error recovery. The algorithm depends on a global system memory for collecting and communicating recovery messages. The checkpointed node state is stored in the global memory. Grit [22] proposed a structural recovery method in which each node in the system is limited to spawning child tasks on its immediate neighbors. At system initialization time, a node receives a list of recovery sites for each of its immediate neighbors. When a node fails, a neighbor notifies the recovery site. The recovery node polls all possible parent nodes of the failed processor for the following information: address of parent, address and work status of each child, and addresses of argument list templates. The recovery node also polls all possible child nodes of the failed processor for this additional information: address of child, address of its parent, address of result, work status of the task, |