Page 86


Publication Type	technical report
School or College	College of Engineering
Department	Computing, School of
Creator	Lin, Frank Chung Huei
Title	Load balancing and fault tolerance in applicative systems
Date	1985-08
Description	Applicative systems are promising candidates for achieving high-performance computing through aggregation of processors. This dissertation studies two important issues in building scalable applicative systems: load balancing and fault tolerance. A dynamic load balancing scheme is proposed for large-scale applicative systems. The method is based on a demand-driven approach, the gradient model, which transfers excess tasks to the nearest idle processor via a gradient surface. The gradient surface is established by the demands from idle processors. The algorithm is fully distributed and asynchronous. A global balance is achieved by successive refinements of many localized balances. The gradient model is independent of system topology and can easily accommodate heterogeneous multiprocessor systems. Simulations have shown that the gradient model performs reasonably well. The concept of functional checkpointing is proposed as the nucleus of a distributed recovery mechanism. This entails incrementally building a resilient structure as the evaluation of an applicative program proceeds. A simple rollback algorithm is suggested to regenerate the corrupted structure from the most effective functional checkpoints. Another algorithm, which attempts to recover all intermediate results, is also presented. The parent of a faulty task reproduces a functional twin of the failed task. The regenerated task inherits all offspring of the faulty task so that partial results can be salvaged.
Type	Text
Subject	computer architecture; load balancing; fault tolerance; computer science; applicative systems
Language	eng
Bibliographic Citation	Lin, FCH. (1985). Load balancing and fault tolerance in applicative systems. UUCS-85-118.
Series	University of Utah Computer Science Technical Report
Relation is Part of	ARPANET
Format Medium	application/pdf
Format Extent	52,468,022 bytes
File Name	Lin-Load_Balancing.pdf
Conversion Specifications	Original scanned with Kirtas 2400 and saved as 400 ppi uncompressed TIFF. PDF generated by Adobe Acrobat Pro X for CONTENTdm display
ARK	ark:/87278/s6ck0fp4
Setname	ir_computersa
ID	99648
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6ck0fp4

Page Metadata

Title	Page 86
Setname	ir_computersa
ID	99598
OCR Text	3.2 Functional checkpoint

Checkpoint is a familiar term in the fault-tolerant computing literature [1, 67]. In a uniprocessor system, checkpointing is normally performed by periodically storing the machine state on nonvolatile devices.

Definition 3-1: A checkpoint is periodic if it is activated at a regular interval. A periodic checkpoint stores the entire system state of all tasks such that recovery of the system is possible.

The same technique has been extended to multiprocessor systems, where synchronization among processors becomes a problem [18, 7, 28, 62]. The basic idea is to virtually stop computational operations while periodic global checkpointing is taking place.

Periodic global checkpointing may not serve the best interest of fault-tolerant applicative systems. For example, nonvolatile storage for storing the system states may not be necessary. If recovery of a faulty processor is accomplished outside the node, the role of nonvolatile storage becomes dispensable. Checkpoint information may be stored on one or more peer processors. Furthermore, the periodic synchronization effort among a large number of processors is potentially inefficient.

We propose a distributed checkpointing strategy for applicative systems. The approach attempts to exploit the determinacy property of applicative programs.

Definition 3-2: A functional checkpoint is a recovery point for a function in an applicative system. A partial state of the system is stored so that recovery of the function is possible.

The partial system state used in a functional checkpoint is related to a single function only. Normally, a functional checkpoint does not have enough information to recover a node, not to mention recovering a system. The sole purpose of the partial state is just to back up a function.

The idea of functional checkpointing is to disseminate the responsibilities of recovering a faulty node to processors which have immediate relationship
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6ck0fp4/99598
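
The functional checkpoint of Definition 3-2 can be illustrated with a small sketch. It is not the dissertation's algorithm. It only assumes that the partial state for one function is the function itself plus its arguments, held by the task that spawned it, and that determinacy makes re-evaluation on another node safe. All names (`FunctionalCheckpoint`, `ParentTask`, `run_on_node`, `evaluate_with_recovery`) and the boolean "node alive" failure model are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Tuple


class NodeFailure(Exception):
    """Raised when the (simulated) node running a task has failed."""


@dataclass(frozen=True)
class FunctionalCheckpoint:
    """Assumed partial state for one function: the function and its arguments."""
    func: Callable[..., Any]
    args: Tuple[Any, ...]


@dataclass
class ParentTask:
    """Hypothetical parent that keeps one checkpoint per child it spawns."""
    checkpoints: Dict[int, FunctionalCheckpoint] = field(default_factory=dict)
    next_id: int = 0

    def spawn(self, func: Callable[..., Any], *args: Any) -> int:
        # Record the checkpoint before the child is dispatched anywhere.
        task_id = self.next_id
        self.next_id += 1
        self.checkpoints[task_id] = FunctionalCheckpoint(func, args)
        return task_id

    def release(self, task_id: int) -> None:
        # Once the result is back, the checkpoint is no longer needed.
        del self.checkpoints[task_id]


def run_on_node(cp: FunctionalCheckpoint, node_alive: bool) -> Any:
    """Simulate remote evaluation; a dead node never returns a result."""
    if not node_alive:
        raise NodeFailure("node failed before returning a result")
    return cp.func(*cp.args)


def evaluate_with_recovery(parent: ParentTask, task_id: int,
                           nodes: Iterable[bool]) -> Any:
    """Try each node in turn, regenerating the task from its checkpoint."""
    cp = parent.checkpoints[task_id]
    for node_alive in nodes:
        try:
            result = run_on_node(cp, node_alive)
        except NodeFailure:
            continue  # determinacy: re-evaluating elsewhere gives the same value
        parent.release(task_id)
        return result
    raise NodeFailure("no surviving node could evaluate the task")


if __name__ == "__main__":
    parent = ParentTask()
    tid = parent.spawn(lambda x, y: x + y, 2, 3)
    # The first node fails, the second succeeds.
    print(evaluate_with_recovery(parent, tid, [False, True]))  # prints 5
```

The checkpoint here holds nothing about the node or the rest of the system, which mirrors the point that a functional checkpoint backs up a single function only. The sketch stores checkpoints in the parent's memory rather than on nonvolatile storage, in line with the remark that checkpoint information may be kept on peer processors.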