| OCR Text |
Show 112 and its current status. The recovery node reconstructs the lost task descriptor with this information. Other fault-tolerant research concerning nonapplicative multiprocessor systems includes nested atomic remote procedure call discussed in section 3.4.6.4, general load redistribution techniques [6], and various fault-tolerant interconnections. The load redistribution approach of [6] keeps an allocation table in each processor. When a node fails, other healthy processors communicate with each other to settle the redistribution. 3.7 Summary This chapter discusses the reliability aspect of applicative multiprocessor systems and suggests means for fail-soft treatment. The concept of functional checkpointing is proposed. Unlike conventional checkpoint schemes, functional checkpointing is concise, distributed and asynchronous. Furthermore, the functional checkpointing method is applicable over a wide range of static and dynamic load distribution techniques, including the gradient balancing model described in the previous chapter. Two fault recovery techniques based on the notion of functional checkpointing are suggested. The thrust of these recovery models is to minimize the overhead while the system is in a normal, fault-free operation. This is a reasonable philosophy since the reliability of electronic components has made great improvements during the years. The simple rollback recovery method attempts to reconstruct the faulty section of the program structure by redoing the functions from the most efficient parent task or tasks. In other words, the recovery starts from the most recent functional checkpoints. The scheme is simple and has very little overhead in a normal operation. But, if a fault happens at a later stage of the evaluation, the rollback recovery may be costly. The splice recovery scheme also uses the most recent functional checkpoints for error recovery as in the rollback method. In addition, the splicing |