Page 70


Publication Type	technical report
School or College	College of Engineering
Department	Computing, School of
Creator	Lin, Frank Chung Huei
Title	Load balancing and fault tolerance in applicative systems
Date	1985-08
Description	Applicative systems are promising candidates to achieve high performance computing through aggregation of processors. This dissertation studies two important issues in building scalable applicative systems: load balancing and fault tolerance. A dynamic load balancing scheme is proposed for large scale applicative systems. The method is based on a demand-driven approach, the gradient model, which transfers excessive tasks to the nearest idle processor via a gradient surface. The gradient surface is established by the demands from idle processors. The algorithm is fully distributed and asynchronous. A global balance is achieved by successive refinements of many localized balances. The gradient model is independent of system topology and can easily accommodate heterogeneous multiprocessor systems. Simulations have shown that the gradient model performs reasonably well. The concept of functional checkpointing is proposed as the nucleus of a distributed recovery mechanism. This entails incrementally building a resilient structure as the evaluation of an applicative program proceeds. A simple rollback algorithm is suggested to regenerate the corrupted structure by the most effective functional checkpoints. Another algorithm, which attempts to recover all intermediate results, is also presented. The parent of a faulty task reproduces a functional twin of the failed task. The regenerated task inherits all offspring of the faulty task so that partial results can be salvaged.
Type	Text
Subject	computer architecture; load balancing; fault tolerance; computer science; applicative systems
Language	eng
Bibliographic Citation	Lin, FCH. (1985). Load balancing and fault tolerance in applicative systems. UUCS-85-118.
Series	University of Utah Computer Science Technical Report
Relation is Part of	ARPANET
Format Medium	application/pdf
Format Extent	52,468,022 bytes
File Name	Lin-Load_Balancing.pdf
Conversion Specifications	Original scanned with Kirtas 2400 and saved as 400 ppi uncompressed TIFF. PDF generated by Adobe Acrobat Pro X for CONTENTdm display
ARK	ark:/87278/s6ck0fp4
Setname	ir_computersa
ID	99648
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6ck0fp4

Page Metadata

Title	Page 70
Setname	ir_computersa
ID	99582
OCR Text	fel();
{ result print:dc[1,512]
  dc[m,n] =
    { result if m >= n then m else dc[m,med]+dc[med+1,n]
      med = (m+n) div 2 } }

2.6.3.3 Topology. The program DC512 is run on the Rediflow simulator with an increasing number of xputers. Given a fixed number of xputers, DC512 is exercised on several configurations. Different topologies used in the simulation are depicted in Figure 15. The speedup of the simulation versus the size and topology of the system is shown in Figure 16. It is no surprise that the wrapped topology performs better than the non-wrapped configuration, since the average distance between any two xputers is only about half in the wrapped configuration. Both the task packets and status functions benefit from the shorter communication distance. The simulation shows that the speedup increases almost linearly as the number of xputers increases through 16 processors. The system efficiency declines sharply as the system size exceeds 32 xputers. Closely examining the simulation shows that the algorithm DC512 does not have enough concurrent tasks to keep large numbers of xputers busy. The result in Figure 16 reflects the simple fact that merely adding processors is a pure waste if there is no concurrency in a program.

2.6.3.4 Program size. In order to show the scalability beyond 16 xputers, we increase the size of the DC512 program to DC1024. Figure 17 summarizes the speedup of the new program, with simple cube and hypercube configurations added. Note that the system with only 4 xputers cannot run through completion because of insufficient storage in the xputer. This observation says that scale-down of a problem is not always possible unless the amount of memory
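
The divide-and-conquer fragment at the top of this page's OCR text can be read as a recursive sum over the integer range m..n. The Python sketch below is one interpretation of it, not the author's code or the Rediflow simulator: it assumes `div` means integer division, and it treats every call to `dc` as one task, which is an assumption made only to give a rough feel for how many tasks the program creates and how deep the chain of dependent calls is. The helper name `task_tree` is hypothetical.

```python
def dc(m: int, n: int) -> int:
    """Divide-and-conquer sum of the integers m..n, following the fragment:
    result if m >= n then m else dc[m,med] + dc[med+1,n], med = (m+n) div 2.
    """
    if m >= n:
        return m
    med = (m + n) // 2  # assumption: 'div' is integer division
    return dc(m, med) + dc(med + 1, n)


def task_tree(m: int, n: int) -> tuple[int, int]:
    """Return (number of dc calls, depth of the call tree) for dc(m, n).

    Assumption: one call corresponds to one task. The depth is the longest
    chain of calls that must wait on each other, so it bounds the speedup
    that any number of processors could reach under this simple model.
    """
    if m >= n:
        return 1, 1
    med = (m + n) // 2
    left_tasks, left_depth = task_tree(m, med)
    right_tasks, right_depth = task_tree(med + 1, n)
    return 1 + left_tasks + right_tasks, 1 + max(left_depth, right_depth)


if __name__ == "__main__":
    for size in (512, 1024):
        total = dc(1, size)
        tasks, depth = task_tree(1, size)
        print(f"DC{size}: result={total}, calls={tasks}, depth={depth}")
```

Under these assumptions, doubling the range from 512 to 1024 roughly doubles the number of calls while adding only one level to the call tree, which is consistent with the text's motivation for moving to the larger program when studying larger systems.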
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6ck0fp4/99582