CHAPTER 3
FAULT TOLERANCE

An important feature of a multiprocessor system is the ability to sustain partial system failures, and an applicative system is no exception. As discussed in earlier chapters, scalability is one of the most desirable features of a large-scale system. An upward scalable system becomes linearly, or nearly linearly, more powerful as the number of processors increases. Conversely, the performance of a downward scalable system degrades in proportion to the decrease in the number of processors. In general, a downward scalable system is not automatically upward scalable, and vice versa.

Most research on applicative systems [15, 31, 10, 13, 35] has centered on upward scalability. Downward scalability, that is, the fault-tolerance problems of applicative systems, has seldom been reported [28, 22, 54]. Many fault-tolerance techniques for general multiprocessor systems have been proposed [1, 67], and some of these schemes can be adapted to applicative multiprocessor systems. However, applicative systems possess some interesting characteristics, e.g., determinacy, that merit distinct fault-recovery considerations.

In this chapter, the fault-tolerance issues of applicative systems are studied. Although the following discussion assumes that the dynamic load balancing scheme advocated in the previous chapter is used, the approaches proposed in this chapter are equally applicable to many other task allocation methods. The program evaluation model and the fault assumptions are described in section 3.1. A distributed checkpointing scheme, functional