| OCR Text |
3.2 Functional checkpoint Checkpoint is a familiar term in the fault-tolerant computing literature [1, 67]. In a uniprocessor system, checkpointing is normally performed by periodically storing the machine state on nonvolatile devices. Definition 3-1: A checkpoint is periodic if it is activated at a regular interval. A periodic checkpoint stores the entire system state of all tasks such that recovery of the system is possible. The same technique has been extended to multiprocessor systems, where synchronization among processors becomes a problem [18, 7, 28, 62]. The basic idea is to virtually stop computational operations while periodic global checkpointing is taking place. Periodic global checkpointing may not serve the best interest of fault-tolerant applicative systems. For example, nonvolatile storage for storing the system states may not be necessary. If recovery of a faulty processor is accomplished outside the node, the role of nonvolatile storage becomes dispensable. Checkpoint information may be stored on one or more peer processors. Furthermore, the periodic synchronization effort among a large number of processors is potentially inefficient. We proposed a distributed checkpointing strategy for applicative systems. The approach attempts to exploit the determinacy property of applicative programs. Definition 3-2: A functional checkpoint is a recovery point for a function in an applicative system. A partial state of the system is stored so that recovery of the function is possible. The partial system state used in a functional checkpoint is related to a single function only. Normally, a functional checkpoint does not have enough information to recover a node, not to mention recovering a system. The sole purpose of the partial state is just to back up a function. The idea of functional checkpointing is to disseminate the responsibilities of recovering a faulty node to processors which have immediate relationship |
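
As a rough illustration of Definition 3-2, the following Python sketch shows how a checkpoint that holds only one function and its arguments can be enough to recover that function after a failure. It relies on the determinacy property: re-applying a side-effect-free function to the same arguments yields the same value. All names here (`FunctionalCheckpoint`, `Node`, `NodeFailure`, `apply_with_checkpoint`, `square_sum`) are hypothetical and do not come from the source, the node failure is simulated with an exception, and the choice to keep the checkpoint with the caller is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


class NodeFailure(Exception):
    """Raised by the simulated node when it 'crashes' (hypothetical failure model)."""


@dataclass(frozen=True)
class FunctionalCheckpoint:
    """Partial state for ONE function application: the function and its arguments."""
    func: Callable[..., Any]
    args: Tuple[Any, ...]

    def recover(self) -> Any:
        # Determinacy: the same function applied to the same arguments gives
        # the same result, so no other system state is needed to redo it.
        return self.func(*self.args)


class Node:
    """Simulated processor that fails on its first `failures` evaluations."""

    def __init__(self, name: str, failures: int = 0) -> None:
        self.name = name
        self.failures = failures

    def evaluate(self, func: Callable[..., Any], args: Tuple[Any, ...]) -> Any:
        if self.failures > 0:
            self.failures -= 1
            raise NodeFailure(self.name)
        return func(*args)


def apply_with_checkpoint(func, args, primary: Node, backup: Node) -> Any:
    # Assumption: the caller (a peer processor, not nonvolatile storage)
    # holds the checkpoint while the function is evaluated elsewhere.
    checkpoint = FunctionalCheckpoint(func, tuple(args))
    try:
        return primary.evaluate(checkpoint.func, checkpoint.args)
    except NodeFailure:
        # Recover only this function by re-evaluating it on another node.
        return backup.evaluate(checkpoint.func, checkpoint.args)


def square_sum(a: int, b: int) -> int:
    # A pure function, standing in for an applicative-program function.
    return a * a + b * b


result = apply_with_checkpoint(square_sum, (3, 4), Node("n1", failures=1), Node("n2"))
print(result)  # prints 25
```

Note that the checkpoint cannot restore the failed node itself; it only backs up the single function application, which matches the limited purpose described in the text.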