Page 93


Publication Type	technical report
School or College	College of Engineering
Department	Computing, School of
Creator	Lin, Frank Chung Huei
Title	Load balancing and fault tolerance in applicative systems
Date	1985-08
Description	Applicative systems are promising candidates to achieve high performance computing through aggregation of processors. This dissertation studies two important issues in building scalable applicative systems: load balancing problem and fault tolerance.; A dynamic load balancing scheme is proposed for large scale applicative systems. The method is based on a demand-driven approach, the gradient model, which transfers excessive tasks to the nearest idle processor via a gradient surface. The gradient surface is established by the demands from idle processors. The algorithm is fully distributed and asynchronous. A global balance is achieved by successive refinements of many localized balances. The gradient model is independent of system topology and can easily accommodate heterogeneous multiprocessor systems. Simulations have shown that the gradient model performs reasonably well.; The concept of functional checkpointing is proposed as the nucleus of a distributed recovery mechanism. This entails incrementally building a resilient structure as the evaluation of an applicative program proceeds. A simple rollback algorithm is suggested to regenerate the corrupted structure by the most effective functional checkpoints. Another algorithm, which attempts to recover all intermediate results, is also presented. The parent of a faulty task reproduces a functional twin of the failed task. The regenerated task inherits all offspring of the faulty task so that partial results can be salvaged.
Type	Text
Subject	computer architecture; load balancing; fault tolerance; computer science; applicative systems
Language	eng
Bibliographic Citation	Lin, FCH. (1985). Load balancing and fault tolerance in applicative systems. UUCS-85-118.
Series	University of Utah Computer Science Technical Report
Relation is Part of	ARPANET
Format Medium	application/pdf
Format Extent	52,468,022 bytes
File Name	Lin-Load_Balancing.pdf
Conversion Specifications	Original scanned with Kirtas 2400 and saved as 400 ppi uncompressed TIFF. PDF generated by Adobe Acrobat Pro X for CONTENTdm display
ARK	ark:/87278/s6ck0fp4
Setname	ir_computersa
ID	99648
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6ck0fp4

Page Metadata

Title	Page 93
Setname	ir_computersa
ID	99605
OCR Text	creates a new substructure and establishes linkages between the parent and children. Return packets from a child task normally eliminate the child structures that are no longer needed. The previous section discussed a simple rollback scheme, which cuts off the branch or branches originating from a faulty node and regrows new branches. The method basically abandons all intermediate results computed by the child tasks of a faulty node. The scheme is simple, with very little overhead if no fault has occurred. However, recovery could be inefficient because all partial results are neglected. This section suggests a different approach, splice recovery, which attempts to retrieve all possible intermediate results. First, we describe the fundamental principles for retrieving orphan tasks. Then, we apply the method to applicative programs without forward chaining (section 3.4.3). The method is later extended to general program structures. A proof of correctness is also presented. 3.4.1 Resilient evaluation structure The splice approach toward a fault-tolerant applicative system is to continuously establish a resilient evaluation structure during program computations. A resilient structure is one containing redundant information which allows a system to rebuild the original structure after a failure has been identified. By rebuilding the structure, the system may salvage many partial results. 3.4.1.1 Resilient applicative tree. It is obvious that every child task has a pointer to its parent. The pointer is needed for returning results. There is also an "implied" pointer from every parent task to its child, since a parent can recreate any child. In other words, a child task is replaceable as long as the parent task is alive. This property suggests that the applicative call tree itself is resilient. When a processor fails, the call tree may break into several pieces. The idea of the splice recovery is to provide necessary bridging information such
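The parent and child linkage described in the OCR text above can be illustrated with a minimal Python sketch. It is not the report's implementation: the `Task` class, its `calls` and `children` fields, and the `spawn` and `regrow` methods are hypothetical names chosen for illustration. The sketch models only the simple rollback scheme, in which a live parent recreates a faulty child and the child's old subtree is abandoned; it does not attempt splice recovery.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# A call is remembered as (function, arguments); this is an assumed representation.
Call = Tuple[Callable[..., int], Tuple[int, ...]]


@dataclass(eq=False)
class Task:
    """One node of an applicative call tree (hypothetical model)."""
    name: str
    parent: Optional["Task"] = None  # explicit pointer, used to return results
    calls: Dict[str, Call] = field(default_factory=dict)
    children: Dict[str, "Task"] = field(default_factory=dict)

    def spawn(self, name: str, func: Callable[..., int], args: Tuple[int, ...]) -> "Task":
        # Remembering the call is the "implied" pointer: the parent keeps
        # enough information to recreate this child later.
        self.calls[name] = (func, args)
        child = Task(name, parent=self)
        self.children[name] = child
        return child

    def regrow(self, name: str) -> "Task":
        # Simple rollback: replace the faulty child with a fresh task.
        # The old child's descendants are abandoned, not salvaged.
        if name not in self.calls:
            raise KeyError(f"no recorded call for child {name!r}")
        twin = Task(name, parent=self)
        self.children[name] = twin
        return twin


def add(a: int, b: int) -> int:
    return a + b


root = Task("root")
left = root.spawn("left", add, (1, 2))
leaf = left.spawn("leaf", add, (3, 4))

# Suppose the processor holding `left` fails; the parent regrows the branch.
twin = root.regrow("left")
```

After `regrow`, `leaf` still points at the old `left` task, which is no longer part of the tree. That orphaned work is the partial result that the simple rollback scheme discards and that splice recovery aims to retrieve.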
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6ck0fp4/99605