A wrapper generation tool for the creation of scriptable scientific applications

Beazley, David M.

Publication Type	journal article
School or College	College of Engineering
Department	Kahlert School of Computing
Creator	Beazley, David M.
Title	A wrapper generation tool for the creation of scriptable scientific applications
Date	1998
Description	In recent years, there has been considerable interest in the use of scripting languages as a mechanism for controlling and developing scientific software. Scripting languages allow scientific applications to be encapsulated in an interpreted environment similar to that found in commercial scientific packages such as MATLAB, Mathematica, and IDL. This improves the usability of scientific software by providing a powerful mechanism for specifying and controlling complex problems as well as giving users an interactive and exploratory problem solving environment. Scripting languages also provide a framework for building and integrating software components that allows tools to be used in a more efficient manner. This streamlines the problem solving process and enables scientists to be more productive.
Type	Text
Publisher	University of Utah
First Page	1
Last Page	174
Language	eng
Bibliographic Citation	Beazley, D. M. (1998). A wrapper generation tool for the creation of scriptable scientific applications. 1-174. UUCS-98-018.
Rights Management	©University of Utah
Format Medium	application/pdf
Format Extent	26,938,363 bytes
Identifier	ir-main,15971
ARK	ark:/87278/s6vq3kwb
Setname	ir_uspace
ID	703491
OCR Text	A WRAPPER GENERATION TOOL FOR THE CREATION OF SCRIPTABLE SCIENTIFIC APPLICATIONS

by

David M. Beazley

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Computer Science

The University of Utah

August 1998

Copyright © David M. Beazley 1998

All Rights Reserved

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

FINAL READING APPROVAL

To the Graduate Council of the University of Utah:

I have read the dissertation of David M. Beazley in its final form and have found that (1) its format, citations, and bibliographic style are consistent and acceptable; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the Supervisory Committee and is ready for submission to The Graduate School.

Christopher H. Johnson, Chair, Supervisory Committee

Approved for the Major Department: Robert Kessler, Chair/Dean

Approved for the Graduate Council: Ann W. Hart, Dean of The Graduate School

ABSTRACT

In recent years, there has been considerable interest in the use of scripting languages as a mechanism for controlling and developing scientific software. Scripting languages allow scientific applications to be encapsulated in an interpreted environment similar to that found in commercial scientific packages such as MATLAB, Mathematica, and IDL. This improves the usability of scientific software by providing a powerful mechanism for specifying and controlling complex problems as well as giving users an interactive and exploratory problem solving environment. Scripting languages also provide a framework for building and integrating software components that allows tools to be used in a more efficient manner.
This streamlines the problem solving process and enables scientists to be more productive.

One of the most powerful features of modern scripting languages is their ability to be extended with code written in C, C++, or Fortran. This allows scientists to integrate existing scientific applications into a scripting language environment. Unfortunately, this integration is not easily accomplished due to the complexity of combining scripting languages with compiled code. To simplify the use of scripting languages, a compiler, SWIG (Simplified Wrapper and Interface Generator), has been developed. SWIG automates the construction of scripting language extension modules and allows existing programs written in C or C++ to be easily transformed into scriptable applications. This, in turn, improves the usability and organization of those programs. The design and implementation of SWIG are described as well as strategies for building scriptable scientific applications. A detailed case study is presented in which SWIG has been used to transform a high-performance molecular dynamics code at Los Alamos National Laboratory into a highly flexible scriptable application. This transformation revolutionized the use of this application and allowed scientists to perform large-scale materials simulations on a day-to-day basis. In addition, a user survey is presented in which SWIG is shown to greatly simplify the creation of scriptable applications, improve productivity, and enhance the usability of scientific programs.

parents.

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTERS
1. INTRODUCTION
  1.1 The Problems Facing Computational Scientists
  1.2 Technical and Cultural Challenges
  1.3 The Need for Evolutionary Improvement
  1.4 Scripting Languages
  1.5 Research Goals
    1.5.1 Making Scripting Languages Simple to Use
    1.5.2 Simplifying Software Development
    1.5.3 Increasing the Usability of Scientific Programs
  1.6 Methodology
  1.7 Results
  1.8 Organization
2. SCIENTIFIC SOFTWARE
  2.1 The Culture of Scientific Computing
  2.2 Scientific Software
    2.2.1 Piecemeal Growth
    2.2.2 User Interfaces
  2.3 The Search for Better Scientific Software
    2.3.1 Object-Oriented Frameworks
    2.3.2 Computational Steering
    2.3.3 Heterogeneous Computing
    2.3.4 Computational Proxies
    2.3.5 Components and Distributed Objects
  2.4 Limitations of Other Approaches
    2.4.1 Poor Performance
    2.4.2 Closed Systems
    2.4.3 Programming in the Large
    2.4.4 Poor Adaptation to Change
    2.4.5 Conceptual Difficulties
  2.5 Scripting Languages and SWIG
3. SCRIPTING LANGUAGES
  3.1 What Is a Scripting Language?
  3.2 Component Gluing
  3.3 High-Level Programming
  3.4 Scripting and Scientific Computing
  3.5 Scripting Language Extension Programming
    3.5.1 Extension Modules
      3.5.1.1 Wrapper Functions
      3.5.1.2 Variable Linking
      3.5.1.3 Creating Constants
      3.5.1.4 Object Manipulation
    3.5.2 Compiling an Extension Module
  3.6 Scripting Versus Commercial Packages
  3.7 Scientific Computing and the Problems with Scripting
4. SWIG
  4.1 Compilation of Scripting Components
  4.2 Related Work
  4.3 Design Goals
    4.3.1 Simplicity
    4.3.2 Applicability to Existing Software
    4.3.3 Support for Rapid Change
    4.3.4 Separation of Interface and Implementation
    4.3.5 Extensibility
    4.3.6 Support for Multiple Scripting Languages
  4.4 Implementation
    4.4.1 Parsing
    4.4.2 Code Generation
  4.5 SWIG Directives
  4.6 SWIG Input Files
  4.7 A Simple SWIG Example
  4.8 Datatypes and Data Representation
    4.8.1 Fundamental Types
    4.8.2 Pointers, Arrays, and Objects
      4.8.2.1 Typed Pointers
      4.8.2.2 Arrays
      4.8.2.3 Structures and Objects
    4.8.3 Unsupported Datatypes
  4.9 Objects, Classes, and Structures
    4.9.1 Objects as Typed Pointers
    4.9.2 Accessor Functions
    4.9.3 Wrapper Classes
    4.9.4 Class Extension
    4.9.5 Type Checking and Inheritance
  4.10 Type Management With Typemaps
    4.10.1 Typemaps
    4.10.2 Typemap Rules
    4.10.3 Advantages of Typemaps
  4.11 Exception Handling
  4.12 Mixed-Language Programming Issues
    4.12.1 Namespace Management
    4.12.2 Memory Management
      4.12.2.1 Garbage Collection and Pointers
      4.12.2.2 Implicit Memory Allocation
      4.12.2.3 Objects and Wrapper Classes
    4.12.3 Callbacks
    4.12.4 Process and Resource Management
  4.13 The SWIG Library
  4.14 Limitations
  4.15 Summary
5. INTERFACE CONSTRUCTION
  5.1 First Use of SWIG
  5.2 Evolutionary Interface Development
  5.3 Helper Functions
  5.4 Type Management
    5.4.1 Type Conversion
    5.4.2 Containers
    5.4.3 Aliasing
  5.5 Object-Based Interfaces
  5.6 Improving Reliability
    5.6.1 Execution Order Dependencies
    5.6.2 Argument Checking
  5.7 Data Management
  5.8 Performance Considerations
    5.8.1 The Performance of Scripting Languages
    5.8.2 The Performance of Compiled Extensions
    5.8.3 Designing for Performance
6. SOFTWARE COMPONENTS
  6.1 Scripting Language Components
  6.2 Splitting Applications into Components
  6.3 Systems Integration
  6.4 Component Design
    6.4.1 Libraries
    6.4.2 Adapters
    6.4.3 Bridges
    6.4.4 Facades
    6.4.5 Building a Component Library
  6.5 SWIG and Component Building
7. CASE STUDY: MOLECULAR DYNAMICS
  7.1 The SPaSM Code
  7.2 Before SWIG
    7.2.1 Development of SPaSM
    7.2.2 User Interfaces
    7.2.3 Data Analysis and Visualization Woes
    7.2.4 The Need for a New Approach
  7.3 The SWIG Prototype
    7.3.1 A Scripting Language and Compiler
    7.3.2 Building the Initial System
    7.3.3 Using the Scripted Version
    7.3.4 Dead Code Elimination
    7.3.5 Improving Reliability
    7.3.6 Integrated Data Analysis and Visualization
    7.3.7 Lessons Learned
    7.3.8 Limitations
  7.4 SWIG and Python
    7.4.1 Building a Python Interface
    7.4.2 Splitting SPaSM into C Libraries
    7.4.3 Creation of Python Modules
    7.4.4 Object-Oriented Extensions
    7.4.5 Exception Handling
  7.5 The Current Implementation
    7.5.1 Components
    7.5.2 Using the System
    7.5.3 Writing User Code
    7.5.4 Python Programming
      7.5.4.1 Web Based Simulation Monitoring
      7.5.4.2 Code Browsing
      7.5.4.3 Distributed Objects
  7.6 Performance
    7.6.1 Scripting for Control, C for Performance
    7.6.2 A Recent Performance Study
  7.7 Results
8. USER STUDY
  8.1 Survey Methodology
  8.2 User Profile
  8.3 Languages
  8.4 Using SWIG
  8.5 Evaluation
  8.6 Application Areas
  8.7 Benefits of Using SWIG
    8.7.1 Ease of Use
    8.7.2 Productivity
    8.7.3 Software Development
    8.7.4 Usability
  8.8 Limitations
    8.8.1 Survey Results
    8.8.2 Array Handling
    8.8.3 Overloaded Functions
    8.8.4 Better C++ Support
    8.8.5 Code Optimization
    8.8.6 Is SWIG Automatic?
    8.8.7 Conceptual Barriers
  8.9 Summary
9. RESULTS AND CONCLUSIONS
  9.1 Evaluation of SWIG
  9.2 The Impact of Scripting Environments
  9.3 The Role of SWIG
  9.4 Scientific Software Development
  9.5 Future Challenges
  9.6 Conclusion
APPENDICES
  A. SCRIPTING LANGUAGE EXTENSIONS
  B. SWIG DIRECTIVES
  C. USER SURVEY
  D. SOFTWARE AVAILABILITY
REFERENCES

LIST OF FIGURES
3.1 Extension module organization
4.1 SWIG organization
4.2 Layered approach to objects
5.1 Creation of a scriptable application
5.2 Execution order dependencies
6.1 Splitting an application into libraries and components
6.2 Structure of a scripting language component
6.3 Providing a common scripting interface to different packages
6.4 Direct integration of packages into a shared environment
6.5 A poorly designed set of components
6.6 A library component
6.7 An adapter component
6.8 A bridge component
6.9 A facade component
6.10 A designed component library
7.1 SPaSM component architecture
7.2 Sample SPaSM session
9.1 User interface ease of use versus implementation difficulty with SWIG

LIST OF TABLES
4.1 Commonly used SWIG directives
4.2 Fundamental C datatypes
4.3 Scripting datatypes
4.4 Datatype conversion
4.5 SWIG typemap rules
5.1 Performance penalties of scripting
7.1 SPaSM component implementation
7.2 Execution time (seconds) of C versus C with scripting
8.1 User programming experience and background
8.2 User programming experience (applications)
8.3 SWIG experience
8.4 Languages being used with SWIG
8.5 SWIG usage
8.6 SWIG feature usage
8.7 Compilation of SWIG generated extensions
8.8 SWIG evaluation
8.9 General uses of SWIG
8.10 SWIG application areas
8.11 Areas in which SWIG could be improved
8.12 C++ features being used by SWIG C++ users

ACKNOWLEDGEMENTS

This research would not have been possible without the contributions and support of many people. First, I would like to thank all of the SWIG users who have provided bug reports, feedback, and suggestions for improvement. There are far too many people to thank individually, but you know who you are. Second, I would like to thank my collaborators Tim Germann, Brad Holian, Shujia Zhou, Ralf Makkula, Niels Jensen, and Wanshu Huang in the Theoretical Physics Division at Los Alamos National Laboratory. Paul Dubois and Brian Yang at Lawrence Livermore National Laboratory also provided many interesting discussions concerning the use of scripting languages and scientific applications. I would also like to acknowledge Chris Johnson and the Scientific Computing and Imaging group at the University of Utah for their generous support of this work. Finally, I would like to offer a special thanks to Peter Lomdahl at Los Alamos National Laboratory, who has supported my efforts throughout graduate school and allowed me to pursue crazy ideas. This research has been performed under the auspices of the Department of Energy, the National Science Foundation, and a University of Utah Graduate Research Fellowship.

CHAPTER 1

INTRODUCTION

Scripting languages such as Perl, Python, and Tcl are becoming an increasingly popular tool for the development and use of modern software. In fact, John Ousterhout, the creator of Tcl,
writes:

For the past 15 years, a fundamental change has been occurring in the way people write computer programs. The change is a transition from system programming languages such as C or C++ to scripting languages such as Perl or Tcl. Although many people are participating in the change, few realize that the change is occurring and even fewer know why it is happening [77, p. 23].

Although scripting languages have been used in a variety of computing applications, this dissertation focuses primarily on their use with scientific software. A tool, SWIG, has been developed to simplify the integration of scripting languages with existing software written in C and C++. Furthermore, the use of SWIG and scripting languages is shown to have a tremendous impact on the development, organization, and use of scientific software.

Traditionally, scientific computing has been ignored by most of the computer science and software engineering community. Likewise, computational scientists often give little attention to modern software practice. This dissertation illustrates the practical application and impact on scientific programs of many modern software construction techniques, including scripting languages, software components, design patterns, software re-engineering, and interface building tools. While the emphasis is on scientific applications, many of the techniques and results presented are applicable to other areas of software development.

1.1 The Problems Facing Computational Scientists

Computational scientists have recently witnessed an unprecedented change in the environment in which scientific simulations are performed. This change has been fueled by a number of developments, including huge increases in simulation sizes due to increased computing power, a shift in the types of scientific simulations being performed, and a variety of new software development techniques such as object-oriented programming and component frameworks.
Unfortunately, these developments have greatly increased the complexity of developing and performing scientific computations. This complexity manifests itself in a number of ways. For example, the fact that a scientific program might run on workstations, shared memory multiprocessors, distributed memory parallel computers, and clusters greatly complicates software development and has led some researchers to call for better language, software development, debugging, and tool support [78]. Large-scale simulations have produced amounts of data that overwhelm existing hardware and software, a problem often referred to as the "data glut" [20]. The increased interest in complex unstructured three-dimensional simulations has created a need for new data analysis and visualization tools. When these needs are combined with the data glut, researchers often talk about "visual supercomputing" and the construction of highly interactive data analysis systems [80, 35]. Even though each of these problems is unique, they are all symptoms of the increasingly complex nature of scientific computing and the breakdown of traditional approaches.

Although there are many facets to the complexity puzzle, one of the biggest problems facing computational scientists is the process by which scientific software is developed, assembled, and controlled. Not only is the development of new software more complicated, but scientists must work with a wide variety of existing packages, libraries, and tools. These components are often written in different languages, use a variety of programming styles, and make different assumptions about data layout, file formats, and user interfaces. As a result, many computational scientists find themselves spending a large amount of time fighting with a "witches' brew" of different programs, tools, and packages. To address these problems, there has been considerable interest in improving the development, structure, and usability of scientific programs.
The use of advanced software development techniques such as object-oriented programming is becoming increasingly common in scientific projects [31, 52, 86]. To provide better integration between tools, developers have been working on the creation of integrated problem solving environments and component frameworks [80, 84]. To improve usability, a number of efforts have focused on user interfaces and the way in which scientific programs are driven [80, 49]. If a sensible solution to these problems can be devised, it will greatly streamline both the problem solving process and the way in which scientific programs are developed.

1.2 Technical and Cultural Challenges

Although there are many benefits to building better scientific software, solutions need to overcome a number of cultural and technical obstacles. Scientific computing is largely practiced by people trained in disciplines other than computer science, who generally pay little attention to software engineering and design. As a result, tools and techniques designed for large software engineering projects have largely been ignored by the scientific community. To be useful to scientists, solutions need to be easy to use and well adapted to the scientific computing culture. Furthermore, scientists are unlikely to abandon years of previous work or radically change their programming methodology in favor of unproven software technology. Therefore, tools must not only be simple to use, but must also work with a diverse range of software that is often idiosyncratic, difficult to use, and poorly designed.

1.3 The Need for Evolutionary Improvement

When faced with the prospect of improving scientific software, software engineers tend to abandon existing scientific software and development techniques in favor of seemingly revolutionary improvements or new software technology.
Unfortunately, this practice risks producing a second-system effect, in which a software environment is created with the goal of eliminating every possible shortcoming found in existing systems [17]. All too often, users are frustrated to find that such efforts result in systems that are too complicated and general purpose to solve any problem effectively. Although improving the usability and structure of scientific programs is beneficial, it is important for software developers to realize that it is rarely necessary to throw existing software away and start over. In fact, many existing systems can be greatly improved by making a series of small modifications. Such an approach is attractive to scientists since they often develop a familiarity with their software and are reluctant to abandon previous work. Therefore, tools designed to improve scientific software are more likely to succeed if they embrace existing software and allow developers to make incremental and evolutionary improvements.

1.4 Scripting Languages

Scripting languages are a powerful tool for building better scientific software because they provide scientists with an interpreted environment that can be used to specify problems, control complex applications, and solve problems in an exploratory manner. In addition, scripting languages provide a framework for building and assembling software components. A component-based approach greatly improves the organization of scientific programs and allows different systems to be integrated. Such integration allows tools to work together more efficiently and streamlines the problem-solving process. Finally, scripting languages can interact with code written in compiled languages such as C, C++, and Fortran. This allows existing applications, as well as performance-critical operations, to be incorporated as extensions to a scripting environment.
This, in turn, provides an evolutionary path for improving the organization and use of existing software, as described in the previous section. The benefits of scripting languages have even led some researchers to make bold claims about the future. Paul Dubois writes:

Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments [33, p. 171].

Although scripting languages have much to offer computational scientists, it is unlikely that scientists will abandon compiled code, given the computationally intensive nature of scientific applications and the relatively slow performance of interpreters (which can be more than three orders of magnitude slower than compiled C or C++). Therefore, the integration of scripting environments with extensions written in compiled languages such as C, C++, and Fortran will be critical if scripting languages are to succeed in the computational science community.

Unfortunately, incorporating compiled code into a scripting environment is difficult because scripting languages provide no automated mechanism for accessing compiled code. As a result, scientists are forced to write wrapper code that acts as a glue layer between their application and the scripting language interpreter. Creating this wrapper code is complicated, tedious, and error-prone. Therefore, scripting languages currently require too much time and effort to be used with most scientific computing projects.

If the process of integrating scripting languages and compiled code can be simplified, computational scientists will be able to use scripting effectively in a wide range of applications. Such simplification has even been discussed in the literature.
Paul Dubois also writes:

The specification of information in order to run a significant physics calculation is a complex task; the use of scripting languages for making such specifications will become universal. We shall have good tools that automatically connect a scripting language to compiled modules [33, p. 171].

Although a number of existing tools can be used to create scripting language extensions, these tools are special purpose, limited in their capabilities, and somewhat difficult to use. As a result, they have seen only limited use in the computational science community.

1.5 Research Goals

The goal of this research is to develop a general purpose tool for building scripting language extensions and to demonstrate the impact of such a tool on the development, organization, and use of scientific software. In particular, the research will show how such a tool makes it easier for scientists to use scripting languages and how the use of a scripting environment fundamentally improves the way in which scientific software can be used to solve scientific problems.

1.5.1 Making Scripting Languages Simple to Use

The research will show how an automated extension building tool can simplify the way in which scientists currently use scripting languages. First, such a tool would allow scientists to easily retrofit existing applications with a scripting language interface. This would improve the usability of those applications and allow them to be used in a much more flexible manner than previously possible. Second, because the creation of scripting interfaces is automated, scriptable applications would be largely insensitive to changes in the underlying implementation, making such applications more adaptable to change.
Finally, automating the process of extension building will allow scientists to use scripting languages in situations where scripting might otherwise not be considered.

1.5.2 Simplifying Software Development

Scripting languages provide a highly flexible environment for controlling applications as well as integrating software components. If the construction of scripting interfaces can be sufficiently simplified, it will be possible for scientists to easily incorporate software into a scripting environment. This, in turn, can have a dramatic impact on the continued development and organization of that software. In particular, the research will show how scripting languages lead to greater flexibility, better reliability, and improved modularity. Furthermore, it will be shown that such an approach allows different software systems to be packaged as collections of components and combined with other systems. This integration allows different programs to work together more efficiently than previously possible.

1.5.3 Increasing the Usability of Scientific Programs

Finally, the primary purpose of using scripting languages is to improve the usability of scientific programs. Scripting languages are particularly appropriate for scientific applications because they provide a flexible interpreted environment that can be used to specify complex problems, run simulations, and interact with programs in an exploratory manner. Currently, these qualities are usually found only in large commercial systems such as MATLAB, Mathematica, Maple, and IDL [53, 108, 22, 83]. However, the research will show that extension building tools and scripting languages make it easy for scientists to construct their own applications of comparable power and flexibility.

1.6 Methodology

A general purpose scripting language extension tool will be developed and freely distributed to the software development community.
The purpose of this tool will be to automatically construct scripting language interfaces to existing programs written in C and C++. Although a large number of scientific programs are currently implemented in Fortran, the use of Fortran will not be considered. First, an increasing number of scientific programs are now being written in C or C++. Second, scripting languages require compiled extensions to be accessed through a C interface. At this time, the C interface to Fortran varies by compiler and is highly nonstandard (making automatic extension building difficult). Finally, since scripting language access to Fortran already requires a C interface, this interface can be used with tools designed for C and C++ code.

In addition to developing an extension building tool, a number of interface construction and design techniques for migrating existing applications to a scripting environment will be developed. These include methods for data management, error handling, type management, and the creation of scripting language components. To demonstrate the impact of the tool and interface building techniques, a detailed case study will be conducted. The case study will describe the process of transforming an existing scientific program into a scriptable application and how that application improves as a result of operating in a scripting environment. Finally, a user survey will be used to determine the effectiveness of the extension building tool with other applications. The survey will also help identify strengths and weaknesses of this approach as well as its impact on the application building process.

Results will be validated through the case study and user survey. In particular, success will be based on the following criteria:

Ease of use. Unless an extension building tool is easy to use, it is unlikely to be of much use to the scientific community.

Applicability to real software.
To be successful, an extension building tool must be able to operate with the software developed and used by scientists.

Productivity. Tools must make scientists more productive by simplifying the development of scientific software and streamlining the way in which that software is used to solve scientific problems (i.e., improving the "usability" of scientific software).

Performance. Given that most scientific programs are computationally intensive, solutions must not introduce large performance penalties.

1.7 Results

A freely available scripting tool, SWIG (Simplified Wrapper and Interface Generator), has been developed and distributed [8, 5]. SWIG allows developers to create scripting interfaces to programs written in C, C++, and Objective-C. To simplify use, SWIG constructs scripting interfaces directly from ANSI C/C++ declarations rather than from a formal interface definition language. Thus, using only C header files, a scientist can often construct a simple scripting interface to an application in a matter of minutes. In addition, SWIG has an extensible design that allows it to support multiple scripting languages and to be customized. Currently, SWIG is being used by several thousand users to construct extensions to Perl, Python, Tcl, and Guile on Unix, Windows,

In addition, a detailed case study is presented in which SWIG has been used to transform the SPaSM molecular dynamics code at Los Alamos National Laboratory into a highly flexible and efficient scriptable application [10]. In the process, the case study examines the use of SWIG and scripting languages with a real application over a 3-year period.
As a result, the study describes how an existing application can be incorporated into a scripting environment and how that application has improved over

In the case study, it will be shown that SWIG enabled scientists to build a scripting interface to the SPaSM code in a relatively short amount of time, and that the resulting scripting interface indirectly led to a series of incremental changes that improved reliability, organization, and modularity. Furthermore, the use of SWIG and scripting languages eventually resulted in a high-performance, highly flexible, component-based system capable of integrated simulation, data analysis, and visualization. In addition, the scripting environment created with SWIG revolutionized the use of the code and made it possible for scientists to perform large-scale simulations of materials on a day-to-day basis.

Finally, a user survey consisting of 119 responses from current SWIG users is presented. The survey shows that SWIG is being used with a wide variety of scientific and nonscientific applications. Furthermore, survey responses indicate that SWIG greatly simplifies the creation of scripting language interfaces, improves productivity, and has a

Based on the results of the case study and user survey, SWIG is shown to have a positive impact on the development and use of scientific applications. First, SWIG greatly simplifies the integration of scripting languages and compiled code. This makes it possible to easily incorporate existing applications into a scripting environment and allows scientists to use scripting languages in situations where they might otherwise not have been considered. Second, the use of SWIG and scripting languages simplifies the development and organization of scientific software, resulting in greater reliability, flexibility, and modularity.
Finally, the use of scripting environments substantially improves the

Even though this dissertation primarily focuses on the development of scientific software, SWIG is also applicable to other areas of software development. In particular, the user survey reveals that nearly 40% of SWIG users are working on nonscientific projects, including industrial and commercial software development.

1.8 Organization

This dissertation primarily describes SWIG and the process of creating scriptable scientific applications. Chapter 2 describes some of the software problems faced by computational scientists and related research on scientific software environments. Chapter 3 describes scripting languages and the mechanisms by which they are extended with compiled code. The design and implementation of SWIG are described in Chapter 4. Chapters 5 and 6 describe strategies for migrating existing applications to a scripting environment as well as aspects of component-based scripting applications. Chapter 7 presents a detailed case study describing the use of SWIG and scripting languages with the SPaSM molecular dynamics code at Los Alamos National Laboratory. Finally, a user survey is presented in Chapter 8. This survey provides statistical data about who is using SWIG as well as anecdotal evidence describing how SWIG simplifies the creation of scriptable applications, improves productivity, and improves the development and organization of scientific applications.

CHAPTER 2
SCIENTIFIC SOFTWARE

2.1 The Culture of Scientific Computing

Scientific computing has a unique culture that is quite different from that found in a commercial or industrial setting. In a nonscientific setting, the primary goal of a software project is usually the construction of a well-defined product such as a billing system, a CAD system, or a database. There are a variety of software engineering techniques that can be used to design, specify, and implement such projects.
Furthermore, there are a variety of metrics for measuring the success or failure of these efforts. The primary goal of most scientific projects, however, is not to build a specific product but to gain understanding and knowledge about a scientific problem of interest. Understanding this difference is important if successful tools are to be developed.

Most scientific computing projects are started by a small group of scientists (physicists, chemists, mathematicians, etc.) who are interested in studying a particular problem. More often than not, programs start small and are written to address a particular class of problems. Few computational scientists start with the goal of writing a large general purpose software package. However, programs that prove to be useful may evolve into larger systems over time.

When creating a scientific program, scientists are unlikely to use many (if any) of the software engineering methodologies that might be found in a large programming effort [109, 16, 17, 37]. The use of "requirements" documents, program analysis, CASE tools, and so forth is virtually unheard of. One reason for this is that scientific programs are almost always experimental and unproven. More often than not, the scientists may not know exactly how to solve the problem in advance. In fact, the entire "design" phase of a project may just be a discussion of the scientific problem (initial conditions, numerical methods, physical models, etc.). As a result, it is extremely difficult, if not impossible, to formally describe in advance the structure that a scientific program will take. Paul Dubois writes:

A scientific program is usually the product of one or two people, who write it initially to solve a class of problems faced by themselves and perhaps a few friends. It is much rarer for a decision to be made early to write a large program; rather, the programs that prove to be useful are added to, and evolve into, large programs over time.
Such programs have not been suitable subjects for a massive analysis and design effort. In fact, scientists would not dream of doing such a thing even if they were to have the skill. Usually, it is not even known if the approach being taken will actually work. Anything remotely like a "Requirements" document is of questionable value to the scientist. Generally, the author has a class of problems in mind and an algorithmic idea that he or she believes will do the modeling job. The entire Requirements Phase usually involves a little muttering to oneself about what kinds of geometry and boundary conditions to allow for [29, p. 4].

Even though traditional software engineering techniques arguably might result in "better" scientific software, the inherently unpredictable nature of scientific problems makes the application of such techniques difficult. Genevieve Dazzo writes:

Scientific programs tend to undergo more revision than their business counterparts because the needs of their users change more drastically over a short period of time. Users of scientific programs are anxious to explore new areas and expand existing knowledge [26, p. 52].

Finally, performance is an important part of the scientific computing culture and often one of the top priorities when developing scientific applications. Scientific problems routinely push the limits of available hardware and software. The need for performance is primarily motivated by the need for scientists to have an adequate turnaround time while still obtaining useful information about a problem. Simulations that take too long to complete are of limited value because they do not provide a large enough sample from which to draw conclusions (simulations often need to be run dozens to hundreds of times with different parameters to be useful). Likewise, simulations of insufficient size may not be accurate enough to yield interesting results.
Interestingly enough, faster computing hardware does not seem to have had a large effect on simulation time. Rather, scientists have used increased computing power to improve the accuracy or size of their simulations. In fact, some authors have even observed that simulation times have remained relatively constant over the last 20 years despite huge gains in computing performance [29].

The performance focus of most projects does not necessarily mean that scientists ignore other software development issues. Portability is also a concern but is often not addressed until a machine is about to disappear. Making programs easier to use is also of interest, but not always a high priority. When these issues are considered, it is often within the context of performance; solutions with severe performance penalties will usually be dismissed. However, scientists might also want to consider a quote attributed to John Ousterhout: "The best performance improvement is the transition from the nonworking state to the working state" [104, p. 447].

2.2 Scientific Software

The lack of formal design and the piecemeal growth of scientific programs present a number of technical challenges to framework and tool designers. Even though it is common for scientists to write software, they tend to do so by following the "principle of least action." In other words, scientists tend to favor techniques that are conceptually simple and require the least amount of effort on their part (although this phenomenon does not appear to be isolated to computational science). As a result, most scientific systems tend to be simple and minimalistic in nature. Unfortunately, approaches that make a program easy to write can come back to haunt users and developers. For example, a program that starts small and grows in an ad hoc manner can become a nightmare to maintain. Likewise, a program that is easy to write might not be easy to use due to the difficulty of writing a user interface.
This section describes some of the common problems associated with scientific software.

2.2.1 Piecemeal Growth

When a scientific program is first written, it usually addresses a specific scientific problem. For example, a program might be written to perform a three-dimensional molecular dynamics simulation of an elliptical crack in a periodic face-centered-cubic (fcc) crystal using a Lennard-Jones interatomic potential [3]. However, most programs can be generalized to look at other related cases, so a scientist may start modifying the code with new boundary conditions, new interatomic potentials, a variety of numerical integration algorithms, and features for data management. When new features are added, conditional statements are often added to the program as follows:

    if (boundary == FREE) {
        /* Use free boundary conditions */
    } else if (boundary == PERIODIC) {
        /* Use periodic boundary conditions */
    } else if (boundary == DAMPED) {
        /* Use damped boundary conditions */
    }

Even though adding new features to small programs is relatively simple, it becomes increasingly difficult as programs grow in size. In fact, after several years of this kind of development, scientists may find that a substantial portion of their program has become a tangled web of control logic, special cases, and obscure functions. Worse still, changing any part of the code may have far-reaching consequences and unforeseen side effects.

2.2.2 User Interfaces

Closely related to the growth and development of scientific software are the user interface mechanisms used to control such software. The simplest user interface is none at all. For small programs, parameters can be hard-coded into the program itself. This approach works fine for very simple problems, but scientific computing is an inherently exploratory activity. Scientists want to change parameters and see what happens. This becomes tedious if the code must be recompiled after every change.
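The boundary-condition chain from Section 2.2.1 and the hard-coded parameter just described can be combined into one small compilable sketch. It is only an illustration under stated assumptions: the enum values follow the fragment in Section 2.2.1, while `apply_boundary`, `boundary_from_name`, and the `BOUNDARY` macro are hypothetical names, not code from any real simulation, and each branch returns a label in place of the actual physics.

```c
#include <assert.h>
#include <string.h>

/* Boundary-condition options, following the fragment in Section 2.2.1. */
typedef enum { FREE, PERIODIC, DAMPED } boundary_t;

/* Simplest "user interface": the choice is hard-coded, so trying a
   different boundary condition means editing this line and recompiling. */
#define BOUNDARY PERIODIC

/* Hypothetical dispatch routine: every feature added over time becomes
   one more branch. It returns a label in place of the real physics. */
const char *apply_boundary(boundary_t boundary)
{
    if (boundary == FREE) {
        return "free";        /* ... use free boundary conditions ... */
    } else if (boundary == PERIODIC) {
        return "periodic";    /* ... use periodic boundary conditions ... */
    } else if (boundary == DAMPED) {
        return "damped";      /* ... use damped boundary conditions ... */
    }
    assert(0 && "unhandled boundary condition");
    return NULL;
}

/* Hypothetical parser for a name read from a prompt, input file, or
   command line option. It repeats the same chain in a second place
   that must be kept in sync with apply_boundary(). */
boundary_t boundary_from_name(const char *name)
{
    if (strcmp(name, "free") == 0)
        return FREE;
    if (strcmp(name, "damped") == 0)
        return DAMPED;
    return PERIODIC;          /* "periodic" and any unrecognized name */
}
```

Calling `apply_boundary(BOUNDARY)` returns `"periodic"`. Trying another boundary condition means editing the macro and recompiling, and adding a fourth option means touching both functions, which is the kind of duplicated control logic that tends to accumulate as such programs grow.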
An alternative approach is to modify the program to interactively prompt the user for various program parameters. This allows a user to change parameters at run time, but many scientific problems are solved by changing just one or two interesting parameters and observing the outcome of repeated simulations. Since answering the same series of questions quickly becomes repetitive, scientists eventually write an input file containing the answers to all of the questions and run programs as batch processing jobs. Finally, scientific programs are sometimes controlled through a collection of command line options. However, users quickly become annoyed if they have to specify several dozen command line options each time a program is run.

Although all of these user interface schemes are easy to implement, they break down as programs grow in size and capabilities. As more features are added, the development of the user interface and the control of the program become increasingly complex. At some point, it becomes unreasonable to explicitly ask the user hundreds of questions or to provide a hundred different options on the command line. The problem is further compounded by the desire to integrate different packages and provide a more interactive problem solving environment. For example, none of the user interface techniques described so far would be appropriate for driving an integrated and interactive simulation, data analysis, and visualization environment.

The simplicity of existing user interfaces raises the question of why scientists don't use more sophisticated user interface strategies. One such strategy, often seen in scientific systems, is to use a simple command interpreter similar to what might be found in a commercial package such as MATLAB or Mathematica [53, 108]. Using an interpreter, a scientist would control an application by writing a simple script or typing commands that the application would interpret at run time.
This provides a great deal of flexibility and appears remarkably similar to other techniques (especially since scientists are already accustomed to writing scripts and input files). However, making scientific programs interpret commands requires an interpreter. Writing a new interpreter from scratch is a time-consuming and difficult endeavor for scientists. On the other hand, using an existing interpreter can be equally difficult since a scientist may not know how to integrate it into an existing program and use it effectively.

Finally, scientists might consider the use of a graphical user interface (GUI). This is often a "popular" notion until scientists discover the difficulty of creating a GUI. The development of a GUI is substantially more difficult than any of the schemes described so far, requiring detailed knowledge of graphical user interface libraries, event-driven programming, widget libraries, and so forth. Furthermore, the implementation of a usable GUI is a nontrivial task. One would certainly not want to present the user with a dialogue box containing hundreds of buttons and entry fields because that would not be much different from just asking the user a series of questions. To further complicate matters, GUIs are often highly nonportable and difficult to manage on experimental platforms. In extreme cases, a machine might only support batch-processing jobs and have no support for graphical display. Finally, promoters of graphical user interfaces often assume that the user wants to constantly interact with their programs. Although interaction is clearly important, some scientific programs can run for tens to hundreds of hours. Therefore, a scripting and batch processing capability is almost always necessary. As a result, a graphical user interface is most useful when combined with a command interpreter or other batch-oriented interface.
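To make the command-interpreter strategy concrete, the following is a minimal sketch of the kind of hand-written, line-oriented interpreter a scientist might attach to a program. Everything in it is hypothetical: the `sim_params` structure, the `set`/`run` command syntax, and the `run_simulation` stub stand in for a real application. The same function serves both typed commands and batch scripts, which addresses the batch-processing requirement above, but every new parameter or command still has to be added by hand, which suggests why writing and maintaining even a small interpreter becomes a burden.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical simulation parameters controlled by the interpreter. */
typedef struct {
    double temperature;
    double timestep;
    int steps_run;
} sim_params;

/* Hypothetical stand-in for the compiled simulation kernel. */
static void run_simulation(sim_params *p, int nsteps)
{
    p->steps_run += nsteps;     /* a real code would integrate here */
}

/* Interpret one command line; returns 0 on success and -1 on an error.
   Supported (hypothetical) commands:
     set temperature <value>
     set timestep <value>
     run <nsteps>
   Blank lines and lines starting with '#' are ignored. */
int interpret_line(sim_params *p, const char *line)
{
    char name[64];
    double value;
    int nsteps;

    assert(p != NULL && line != NULL);
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '\n' || *line == '#')
        return 0;

    if (sscanf(line, "set %63s %lf", name, &value) == 2) {
        if (strcmp(name, "temperature") == 0) { p->temperature = value; return 0; }
        if (strcmp(name, "timestep") == 0)    { p->timestep = value;    return 0; }
        return -1;              /* unknown parameter name */
    }
    if (sscanf(line, "run %d", &nsteps) == 1 && nsteps >= 0) {
        run_simulation(p, nsteps);
        return 0;
    }
    return -1;                  /* unrecognized command */
}

/* Run a whole script (a file, or stdin for interactive use) line by line.
   Returns the number of lines that could not be interpreted. */
int interpret_stream(sim_params *p, FILE *in)
{
    char buf[256];
    int errors = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (interpret_line(p, buf) != 0) {
            fprintf(stderr, "error: %s", buf);
            errors++;
        }
    }
    return errors;
}
```

For example, `interpret_stream(&params, stdin)` accepts commands typed at a terminal, and the same call with a script file opened by `fopen` runs the program as a batch job. Keeping all parsing in `interpret_line` is a deliberate choice so that interactive and batch use cannot drift apart.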
Overall, scientists tend to prefer user interface schemes that are simple to implement even though more sophisticated techniques are available. Since scientific programs start small, there is initially little need for a highly sophisticated user interface. Furthermore, usability is only a minor concern since the initial developers of a system tend to be its primary users (also, the goal of most scientific projects is not to deliver a polished product). As a result, user interface problems tend to "sneak up" on developers as programs grow in size. In fact, it is not unusual for scientific programs to adopt a number of increasingly complex interface schemes over their lifetime.

2.3 The Search for Better Scientific Software

In later sections, the use of scripting languages and automated extension building tools will be described as a means for improving the usability and organization of scientific software. However, this is not the only approach being pursued in the scientific community. This section briefly describes a number of other development efforts. The primary goal of this section is to call attention to related work that is aimed at changing the way in which scientific software is developed and used.

2.3.1 Object-Oriented Frameworks

Some scientists have been adopting the techniques of object-oriented programming to provide an application development environment for solving science and engineering problems. Some efforts include POOMA, A++/P++, PETSc, and Diffpack [84, 81, 4, 18]. The idea behind these systems is to provide scientists with a useful collection of objects and an environment in which the objects can be used to solve problems. For example, a system might provide basic objects for matrices, unstructured meshes, vectors, complex numbers, particles, vector fields, and so on. A number of operations and methods such as basic arithmetic, linear solvers, preconditioners, visualization, and error analysis could then be applied to the objects as needed.

To solve a problem with one of these systems, a scientist assembles an appropriate collection of objects and applies a series of "interesting" operations to them. This approach is attractive for a number of reasons. First, it provides a tightly integrated environment that allows objects to interact with each other. Second, it allows software designers to hide much of the complexity from users. For example, on a parallel machine, the parallelism could be hidden away in abstract base classes and lower levels of the framework. At the highest level, users might not even be aware of such parallelism or the technical details involving its implementation. In addition, this approach can result in very compact and "simple" formulations of scientific problems. For example, a scientist might be able to solve a problem by simply creating a few objects and writing a few mathematical equations. Operator overloading and other advanced language features can often hide much of the underlying complexity while greatly reducing the amount of code that must be written by the user. Finally, such approaches attempt to capitalize on the general benefits of object-oriented programming, including management of large software systems, controlling complexity, code reuse, and encapsulation.

2.3.2 Computational Steering

Computational steering is an emerging field that attempts to provide integration between simulation, data analysis, and visualization [49, 79]. User interaction is a key feature because steering systems provide scientists with a highly flexible and interactive data exploration and simulation environment. That is, they allow scientists to interact with their data in real time, guide simulations, and play out different scenarios. Steering systems are primarily focused on the way in which a scientist performs and interacts with simulations.
Much of the work is focused on issues of data locality, moving data between machines, visualization techniques, and mechanisms for presenting the data to the user. Some recent steering efforts include the SCIRun system developed at Utah, program instrumentation tools at Georgia Tech, and integration of visualization systems such as AVS with simulation codes [80, 100, 99, 98, 97, 21, 59, 44]. The interesting aspect of steering systems is that in providing integrated simulation and visualization to the user, they also address complex software construction issues. In order to make a steering system work, the different subsystems need to be combined and controlled in an effective manner. In many cases, the components are third-party packages and libraries. Therefore, developers need to worry about the interfaces between modules, frameworks for combining and using modules, and the difficulties of using existing software. Since these issues also arise in scripting environments, many of the techniques utilized by steering systems also apply to scriptable applications.

2.3.3 Heterogeneous Computing

A number of researchers have been interested in the problem of providing software and infrastructure for heterogeneous computing. Some efforts include the I-WAY, Globus, the Grid, and Legion [27, 40, 92, 47]. Significant portions of these projects are devoted to infrastructure issues such as faster networks, high performance computing platforms, and high-end visualization systems, but there is also a fundamental software problem that needs to be addressed. In particular, how are scientists going to go about hooking all of these pieces together? How will they write software to run in such a heterogeneous environment? How can existing systems be incorporated into such an environment? Like efforts in computational steering and scripting environments, success depends upon finding schemes for building, controlling, and using scientific software components.
2.3.4 Computational Proxies

The integration and control of scientific software components have also been accomplished using object-oriented databases and computational proxies [24]. With a proxy system, the original scientific applications remain unmodified while a proxy system is used to provide a generalized interface to users. The proxy system manages the execution and transfer of data between different components while hiding details from the users. In order to do this, the proxy server knows how each program is controlled as well as the data formats used for input and output. The proxy approach is primarily used to encapsulate a variety of legacy applications into a unified environment. It does not change the way in which each individual application is structured or used, nor does it address the problem of moving massive amounts of data between subsystems (although it may hide the process from users). An approach similar to computational proxies can sometimes be accomplished using scripting languages. For example, Expect is a Tcl-based extension that is often used to drive existing applications by mimicking the input of users [63]. Likewise, Perl and Python can be used to drive legacy applications from a scripting environment [101, 66].

2.3.5 Components and Distributed Objects

The creation and integration of software components are also of great interest to commercial and industrial software development efforts. The primary difficulty in this case is that programming projects are often undertaken by large teams of programmers who are working on very large and complex systems. Since individual components may be developed by different groups of programmers, frameworks for integrating these components are of critical importance. Two of the most common component architectures are CORBA (the Common Object Request Broker Architecture) and Microsoft COM (the Component Object Model) [74, 87].
CORBA is a specification created by the Object Management Group (OMG), a consortium of computer companies including Sun, HP, DEC, and IBM. COM is a competing component architecture developed by Microsoft and is the basis for most applications developed in the Windows environment. When using COM or CORBA, applications are built by assembling components. Each component can be thought of as providing a specific service such as access to a database, computationally intensive operations, or a user interface. Although these services may all exist on a single machine, they may also be distributed across a network of machines. Thus, a database server could provide database access to other machines on the network and be used as a component in various other software packages. Component architectures allow components to be completely decoupled, written in different languages, or located on different machines. However, the key to using CORBA and COM is that the interfaces between components are precisely defined. Interfaces are specified using an interface definition language (IDL) such as CORBA IDL. The IDL specification provides a language- and platform-independent description of all of the available objects, datatypes, and operations supported by a component. An IDL compiler turns the interface description into client and server stubs that must then be completed by the developer. After a developer fills in these stubs, the component can be made available for general use by other software clients. Even though CORBA and COM are being used in an increasing number of commercial applications, these systems are unlikely to have a large appeal to computational scientists since they are viewed as being too cumbersome and difficult to use in a scientific setting. The primary benefit that scientists would receive from a component architecture is a well-defined mechanism for gluing software components together.
However, given the nature of scientific software and the culture of scientific computing projects, this task can often be accomplished through other means such as object-oriented frameworks or scripting languages.

2.4 Limitations of Other Approaches

Although many of the approaches described improve scientific software, they also suffer from a number of drawbacks that have prevented their widespread use in the scientific computing community. This section briefly describes some of these problems.

2.4.1 Poor Performance

Scientific applications routinely push the limits of the machines that they run on. Yet object-oriented frameworks and component architectures have a number of well-known performance problems. In C++, when objects are manipulated through an inheritance hierarchy, there is a performance penalty due to virtual function calls. Operator overloading and other advanced features often result in the creation and destruction of large numbers of temporary objects [28]. The creation of temporaries is especially problematic for very large objects such as million-element arrays (particularly when memory utilization is critical). Component architectures such as CORBA suffer additional performance penalties since they are often built around RPC-like mechanisms for invoking procedures and methods. A widely cited article in 1994 reported results in which C++ was as much as 700% slower than Fortran [50]. As a result, there has been considerable interest in techniques designed for improving C++ performance. One highly publicized technique involves the use of expression templates [51, 86, 95, 96]. Using expression templates, run-time performance comparable with C and Fortran can be achieved for certain operations [95]. However, this performance improvement is achieved by expanding arithmetic expressions into nested template definitions. This grossly inflates compilation time and makes debugging extremely difficult since most debuggers do not fully support templates.
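To make the expression-template idea concrete, the following is a minimal C++ sketch written for this discussion; the Vec and Sum types are hypothetical and are not taken from POOMA, Blitz++, or any other cited library. Building the expression a + b + c computes nothing. It produces a nested object of type Sum<Sum<Vec, Vec>, Vec>, and the single evaluation loop runs only when that object is assigned to a Vec, so no temporary vectors are created.

```cpp
#include <cstddef>
#include <vector>

// Expression node for l + r. It stores references to its operands and
// computes element i on demand; no temporary vector is allocated.
// Because it holds references, an expression must be consumed in the
// same statement that builds it.
template <typename L, typename R>
struct Sum {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// A minimal vector type (hypothetical, for illustration only).
struct Vec {
    std::vector<double> data;

    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning any expression evaluates it in a single loop,
    // writing each element directly into this vector.
    template <typename E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Vec + Vec builds a Sum<Vec, Vec> node.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

// (expression) + Vec nests the existing node inside a new one, so
// a + b + c has type Sum<Sum<Vec, Vec>, Vec>.
template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) {
    return {a, b};
}
```

Even this toy version shows where the costs come from: every distinct shape of arithmetic expression instantiates a new nested type that the compiler must generate and a debugger must display. A production library would also handle subtraction, scalars, and storage of nested nodes by value; those are omitted here.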
Given the rapidly changing and experimental nature of scientific applications, this is an unacceptable solution to many computational scientists. In criticizing object-oriented frameworks, it is important to point out that differences in design have a large impact on performance and that not all frameworks suffer from performance problems. The state of C++ compilers also appears to be improving [86].

2.4.2 Closed Systems

Most frameworks enforce a rigid set of rules that must be followed by software developers. For example, an object-oriented framework typically provides an extensive inheritance hierarchy that must be used by developers when developing new code. Likewise, component architectures such as CORBA and COM precisely define the mechanisms by which software components are constructed and interact with each other. Although the formality provided by these approaches may be appropriate for large programming efforts, it also results in closed systems that complicate the use of existing software. For example, if a scientist wanted to incorporate an existing application into such an environment, they would be forced to encapsulate the application inside an adapter class that was compatible with the target framework [43]. Depending on the nature of the original application, this process could be quite complicated. Closed systems also discourage reuse since the components of one system are generally not usable within other systems. For example, COM and CORBA components cannot be easily used together. Likewise, C++ systems based on implementation inheritance make it almost impossible for components to be extracted and used in other systems.

2.4.3 Programming in the Large

Many frameworks are designed for large-scale programming efforts and the development of large software packages.
For example, a recent article from Los Alamos National Laboratory about the ASCI (Accelerated Strategic Computing Initiative) project stated, "We will be forming code development teams larger than any we have ever attempted to manage, with as many as 20 to 30 staff members each. And these teams will be developing extremely complex software that must run on the world's largest massively parallel computers" [82, p. 3]. Rather than attempting to investigate new scientific problems, these efforts are primarily oriented towards developing production software for solving engineering problems. The formality provided by frameworks is likely to be a suitable mechanism for managing such projects. However, computational scientists rarely set out to create massive software packages. As a result, the formalism and methodology used to manage large software projects often become an obstacle in small projects.

2.4.4 Poor Adaptation to Change

Frameworks tend to enforce a particular design model on users. Scientific programs, on the other hand, are usually grown in a piecemeal and ad hoc manner. This difference creates two fundamental problems. First, a system that is too rigid may be too difficult to modify and extend with new features (or so difficult to understand that scientists do not know where to start). Second, when changes are made, they may be difficult to incorporate. For example, in a component architecture, the addition of new features would require modifications to interface definition files, regeneration of stubs, and so forth. In a scientific setting where software changes rapidly, this clearly presents a problem.

2.4.5 Conceptual Difficulties

Finally, for a scientist who has only written Fortran or C programs, jumping into a large object-oriented framework can be overwhelming. Not only must scientists learn a new language to use these systems, they need to learn a whole new vocabulary and mindset for thinking about problems.
To further complicate matters, general-purpose systems such as CORBA and COM are often bloated with features such as security, quality of service, garbage collection, version control, and fault tolerance. Few scientists have much interest in (or need for) such features, and they are easily overwhelmed by the complexity that these features introduce.

2.5 Scripting Languages and SWIG

In the following chapters, the use of scripting languages and SWIG will be presented as a new approach for building, managing, and using scientific software. This approach is attractive because it solves many of the practical software problems encountered by computational scientists while addressing each of the limitations above. In particular:

Performance. Scripting languages can interact with code written in compiled languages. This allows performance-critical operations to be easily written in C, C++, or Fortran.

Open systems. Rather than enforcing a rigid structure, scripting languages make it easy to work with a wide variety of software components. Furthermore, SWIG simplifies the process of incorporating existing packages into a scripting environment regardless of their underlying implementation or design.

Programming in the large and small. Although scripting languages have been used in large-scale programming projects, they enforce very few rules and can be easily used in small projects. This makes them appropriate for a wide variety of scientific programs.

Adaptation to change. Using extension building tools such as SWIG, scripting languages can easily respond to rapid changes in the underlying implementation of scientific programs. Furthermore, scripting languages can be easily used with programs that are under development or in an unfinished state.

Conceptual simplicity. Scripting languages are simple to learn and use. Furthermore, they can be added to the software already being used by scientists. Thus, scripting is an evolutionary improvement rather than a revolutionary change.
In addition, it will be shown that scripting languages and SWIG enable scientists to achieve many of the same benefits associated with other approaches. These include improved modularity and encapsulation, systems integration, development of component architectures, and interactive exploratory problem solving.

CHAPTER 3
SCRIPTING LANGUAGES

Scripting languages have much to offer scientists because they provide a powerful mechanism for specifying scientific problems, integrating software components, and controlling scientific systems. Furthermore, scientists already use simple scripts and scripting languages for a number of other tasks. This chapter discusses scripting languages, the benefits they bring to scientific computing applications, and the methods by which scripting languages are extended.

3.1 What Is a Scripting Language?

It is surprisingly difficult to give a precise definition of a scripting language. However, scripting languages share a number of qualities:

Component gluing. Rather than building programs from scratch, scripting languages are primarily designed to glue components together. For example, the Unix shell provides an environment for executing and controlling programs as well as moving data between programs using files and pipes. In a similar spirit, scripting languages can also be used in a more fine-grained manner by gluing software libraries together, passing data between individual functions, creating collections of widgets for user interfaces, and so forth.

Interpreted. Unlike compiled languages such as C, C++, or Fortran, scripting language programs are interpreted. This eliminates the need for a separate compilation step and allows scripting languages to be run interactively.

High-level. Scripting languages provide a variety of useful data structures along with techniques such as dynamic typing.
This results in programs that are smaller and easier to develop than programs written in compiled languages.

Traditionally, scripting languages have been dismissed as being too simplistic to solve real problems. However, almost anything that can be done in a compiled language can also be accomplished in a scripting language. Many modern scripting languages also support object-oriented programming as well as aspects of functional programming found in languages such as Lisp and Scheme [94, 42]. In addition, most scripting languages also provide high-level access to operating system services such as the file system, sockets, and threads. Much of the confusion regarding scripting languages is due to a misunderstanding of the role scripting languages play in relation to systems programming languages such as C and C++. John Ousterhout writes:

System programming languages were designed for building data structures and algorithms from scratch, starting from the most primitive computer elements such as words of memory. In contrast, scripting languages are designed for gluing: They assume the existence of a set of powerful components and are intended primarily for connecting components [77, p. 23].

3.2 Component Gluing

One of the most powerful features of scripting languages is their ability to glue software components together. John Ousterhout writes:

I concluded that the only hope for us was a component approach. Rather than building a new application as a self-contained monolith with hundreds of thousands of lines of code, we needed to find a way to divide applications into many smaller reusable components. Ideally, each component would be small enough to be implemented by a small group, and interesting applications could be created by assembling components. In this environment it should be possible to create an exciting new application by developing one new component and then combining it with existing components.
The component based approach requires a powerful and flexible "glue" for assembling the components, and it occurred to me that perhaps a shared scripting language could provide that glue [76, p. xviii].

The nature of scripting language "components" can vary widely. At a minimal level, a component might be a stand-alone program, with a scripting language used for job control as in a Unix shell. Packages such as Expect can also be used to script executables and mimic the input of users [63]. However, most scripting languages can also be extended with functions written in compiled languages such as C, C++, and Fortran. In this role, scripting languages can be used to interact with compiled libraries and programs at a functional level. This makes it possible to use scripting languages as a framework for interacting with compiled code and building software components.

3.3 High-Level Programming

An important aspect of using scripting languages is their support for high-level programming. To understand this, it is helpful to contrast scripting languages with low-level systems programming languages like C, C++, and Fortran. In compiled languages there are a few basic datatypes, a set of basic operations, and programming constructs such as loops and conditionals. Programs and data structures are generally built from scratch using these primitive features. Scripting languages, on the other hand, supply a rich variety of objects such as lists, associative arrays (i.e., hash tables), arrays, arbitrary-precision integers, and so forth. They also assume the existence of a large set of components. Thus, rather than building applications from scratch, scripting languages allow applications to be built by gluing different components together and managing data with powerful data structures. A second area where scripting languages differ is in their treatment of datatypes.
Languages such as C and C++ have a strict type-checking mechanism that checks the validity of code during compilation. Violations of the type system result in compile-time errors. Scripting languages, on the other hand, defer type checking until run time. Thus, the Python function

    def add(a, b):
        return a + b

can be used for any two objects that can be legally added. For example,

    >>> add(3, 4)               # Integers
    7
    >>> add("Hello", "World")   # Strings
    'HelloWorld'
    >>> add([3, 4, 5], [6])     # Lists
    [3, 4, 5, 6]

Dynamic typing also benefits component gluing because it makes it possible to combine and utilize components and objects in ways that are simply not possible (or not easily implemented) in a compiled language. To illustrate this, consider the following Python function:

    def plot_data(x, y, npoints, color, img):
        for i in range(0, npoints):
            img.plot(x[i], y[i], color)

This function would work properly with any kind of object that defined a "plot" method. This would be checked at run time, and the use of an object without this method would simply result in a run-time error. In contrast, the statically typed nature of C++ would greatly restrict the use of a similar function by forcing it to operate only on specific types of objects or on objects derived from a common base class. As a result, systems programming languages tend to be much more rigid and formal with respect to the use of objects and the mechanisms used to glue components together. This often makes it more difficult to glue components together and reuse software components. Critics are quick to point out that run-time checking can lead to hidden errors because errors are not detected until code is actually executed. Although this claim has some merit, run-time typing often results in code that is easier to write, more flexible, and highly reusable. Run-time checking has also been used successfully in other object-oriented languages such as Objective-C and Smalltalk [23, 46].
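For contrast, here is a minimal C++ sketch of a statically typed counterpart to plot_data(). The Plottable base class and the PointRecorder class are hypothetical, and color is assumed to be an int. The compiler accepts only objects whose class derives from Plottable:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical abstract base class: the compiler lets plot_data()
// receive only objects whose class derives from Plottable.
class Plottable {
public:
    virtual ~Plottable() = default;
    virtual void plot(double x, double y, int color) = 0;
};

// Statically typed counterpart of the Python plot_data() function.
void plot_data(const double* x, const double* y, std::size_t npoints,
               int color, Plottable& img) {
    for (std::size_t i = 0; i < npoints; ++i)
        img.plot(x[i], y[i], color);  // dispatched through a virtual call
}

// One possible Plottable: it records points instead of drawing them.
class PointRecorder : public Plottable {
public:
    struct Point { double x, y; int color; };
    std::vector<Point> points;

    void plot(double x, double y, int color) override {
        points.push_back({x, y, color});
    }
};
```

A class that happens to provide a plot() method but does not derive from Plottable is rejected at compile time, which is the restriction described above. C++ templates can relax this by accepting any type with a suitable plot() method, but that moves the check into template instantiation rather than eliminating it.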
Finally, scripting languages excel at simplifying complicated programming tasks. For example, consider the process of writing a graphical user interface. If written in C or C++, it can take hundreds of lines of code to open a window and place a button on the screen. In contrast, this is easily accomplished with a simple two-line Tcl/Tk script [77]. The high-level nature of scripting languages makes it easier to develop significant applications in a short amount of time. In fact, recent reports cite large reductions in code size and development time [77]. The effectiveness of high-level languages has also been described by Frederick Brooks in The Mythical Man-Month:

Surely the most powerful stroke for software productivity, reliability, and simplicity has been the progressive use of high-level languages for programming. Most observers credit that development with at least a factor of five in productivity, and with concomitant gains in reliability, simplicity, and comprehensibility [17, p. 186].

These benefits apply to scripting languages in general, and they apply equally to scientific computing applications. In fact, the problems of traditional software development have already appeared in the scientific literature:

Frankly, the limiting factor for future [scientific] systems may well be writing the software itself. Few hard, reliable data points exist for trends in software productivity, but the perception persists that productivity increases have been glacially slow for programs written in conventional languages such as Fortran, C, Ada, or Java [85, p. 45].

Scripting languages may provide scientists with an alternative approach.

3.4 Scripting and Scientific Computing

Scripting techniques have already been used in a variety of scientific applications. Commercial systems such as MATLAB, Mathematica, Maple, and IDL provide interactive command-driven interfaces that are remarkably similar to scripting languages [53, 108, 22, 83].
A number of specialized languages such as Yorick and Basis have also been developed for building scientific applications [71, 32]. More recently, the Python scripting language has seen increased use in a variety of scientific applications [66, 30, 55, 13]. Scripting languages are also widely used in the tools used by scientists. For example, the Visualization Toolkit includes a Tcl/Tk interface [89]. Plotting packages, performance analysis tools, and computational steering systems such as SCIRun also make extensive use of scripting languages, although this may not be apparent to the user [2, 48, 80]. In many cases, scientists may not be aware that their tools are using scripting languages in a substantial way. To understand the benefits that scripting brings to these systems, consider the fact that many scientific applications are monolithic packages with limited flexibility. More often than not, they are controlled by a series of command line switches or a simple command processor. Furthermore, programs are typically used in a batch processing mode with little if any user involvement. Scripting changes this by encapsulating applications in a highly flexible interpreted environment. This provides a better mechanism for controlling scientific software and allows users to interact with programs and data. Scripting also has a positive impact on the development of scientific software [32]. In particular:

Faster development. A surprising portion of many scientific applications is devoted to the handling of input parameters and control flow. Scripting languages already provide this kind of infrastructure. As a result, development can focus on the creation of modules, not the mechanism by which those modules are controlled. Systems in which scripting is applied may experience a reduction in code size [32].

Reduced debugging time. Scripting provides an interpreted and interactive environment for working with scientific programs.
Scientists can query values, execute functions, and perform operations in a manner similar to that found in a debugger. If data analysis and visualization components are available, these can also be used in the search for bugs. Since this capability is always available, much less time is spent using debuggers.

Rapid prototyping. New features can often be implemented in the scripting language interface first and moved to compiled code later. Given the long compile times associated with many systems, having an interpreted development environment tends to reduce development time (since new features can be implemented and tested without recompilation).

Portability. Most scripting languages can operate on a variety of platforms, including Unix, Windows, and Macintosh systems. By implementing an application within a scripting environment, cross-platform support can be achieved with much less effort than before. This is because the scripting environment provides generalized support for platform-dependent operations such as I/O, graphical user interfaces, and process management.

Reuse. Scripting encourages the development of modular and reusable code. If a suitable collection of modules can be created, they can be reused in other applications.

Virtually every computational scientist has utilized packages that make use of interpreted interfaces. Furthermore, such interfaces have proven to be highly successful in a variety of commercial systems. Therefore, it is surprising that scripting techniques are not used more frequently in scientific applications.

3.5 Scripting Language Extension Programming

Although scripting languages have a number of practical benefits, it is unlikely that scientists will abandon compiled languages any time in the foreseeable future. This is primarily because scripting languages are sometimes more than three orders of magnitude slower than a compiled language [88].
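A simple cost model shows why mixing compiled and scripted code changes this picture. As an illustrative assumption (not a measurement from [88]), suppose a fraction f of a program's original run time is spent in control code that is moved into an interpreter running k times slower, while the remaining 1 - f stays in compiled code. The run time relative to the all-compiled program is then

```latex
S = (1 - f) + f\,k
```

With k = 1000, rewriting the entire program in the scripting language (f = 1) gives S = 1000. Moving only a thin control layer, say f = 10^-4, gives S = 0.9999 + 0.1, or about 1.1, an overhead of roughly 10%.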
Despite the other benefits of scripting languages, they are not enough to offset the performance penalty that would be incurred by entirely giving up a compiled language like Fortran or C. However, scripting languages can interact with compiled extensions written in C, C++, or Fortran. This largely eliminates the performance penalty by allowing performance-critical code to be written in a compiled language and merely controlled through scripting. In such systems, the underlying application may rely upon high-performance numerical libraries while scripting languages would be used at the highest level of the system for control, problem setup, and user interaction. In this role, scripting languages account for only a tiny portion of the overall execution time, while computationally intensive operations are still executed in compiled code and dominate the overall execution time. Therefore, the fact that a scripting language runs much slower than compiled code may be of minimal concern.

3.5.1 Extension Modules

To extend a scripting language with compiled code, it is necessary to create an "extension module." An extension module consists of three parts, as shown in Figure 3.1. First, there is the C/C++ code that implements the functionality of the module or that corresponds to an existing application that is to be incorporated into a scripting environment. Second, there is wrapper code that provides the glue connecting the scripting interpreter and the underlying C code. Finally, there is a module initialization function. This function registers the contents of an extension module with the scripting language interpreter when the module is loaded. When creating an extension module, it is necessary to write the wrapper code and module initialization function.
To do this, scripting languages provide a C-level API that developers can use to access the scripting interpreter, convert data to and from a C representation, report errors, register new commands, create variables, and so forth.

Figure 3.1. Extension module organization (initialization function, wrappers, and C/C++ code).

3.5.1.1 Wrapper Functions

To execute functions and procedures in a compiled language, it is necessary to write wrapper functions. The role of a wrapper function is to convert datatypes between languages, provide the logic needed to make the function call, and handle errors. To illustrate the process, consider a simple C function such as the following:

    /* Compute n-factorial */
    int fact(int n) {
        if (n <= 1) return 1;
        else return n*fact(n-1);
    }

A wrapper function used to access this function from Tcl is shown below [76].

    /* A Tcl wrapper function */
    int wrap_fact(ClientData clientData, Tcl_Interp *interp,
                  int argc, char *argv[]) {
        int result;
        int arg0;
        if (argc != 2) {
            Tcl_SetResult(interp, "Wrong # args. fact { int }", TCL_STATIC);
            return TCL_ERROR;
        }
        arg0 = (int) atol(argv[1]);
        result = fact(arg0);
        sprintf(interp->result, "%ld", (long) result);
        return TCL_OK;
    }

For Tcl to access the wrapper function, it must first be registered with the Tcl interpreter. This is done in the module initialization function as follows:

    /* A simple Tcl module initialization function */
    int Example_Init(Tcl_Interp *interp) {
        if (interp == 0) return TCL_ERROR;
        /* Create a new command 'fact' */
        Tcl_CreateCommand(interp, "fact", wrap_fact, (ClientData) NULL,
                          (Tcl_CmdDeleteProc *) NULL);
        return TCL_OK;
    }

When the extension module is loaded, the module initialization function is executed. This function registers a new command "fact" with the Tcl interpreter. When this command subsequently appears in a script, execution is passed to the wrapper function. The wrapper function collects the arguments passed to the command and converts them to a C representation.
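The registration-and-dispatch cycle just described can be modeled without the real Tcl library. The following C code is a minimal, hypothetical sketch, not Tcl's API: Interp, create_command, eval_command, CMD_OK, and CMD_ERROR are invented stand-ins. An init function registers wrap_fact, and a dispatcher then routes a command given as string arguments to its wrapper. The wrapper also parses integers more strictly (strtol with an end-pointer check) than the atol call above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for an interpreter's C API; these are NOT Tcl's
   real types or functions. */
#define CMD_OK    0
#define CMD_ERROR 1
#define MAX_CMDS  16

typedef struct Interp Interp;
typedef int (*CmdProc)(Interp *interp, int argc, char *argv[]);

struct Interp {
    const char *names[MAX_CMDS];   /* registered command names */
    CmdProc     procs[MAX_CMDS];   /* wrapper function for each name */
    int         ncmds;
    char        result[64];        /* results are passed back as strings */
};

/* Register a new command name -> wrapper function mapping. */
static int create_command(Interp *interp, const char *name, CmdProc proc) {
    if (interp->ncmds == MAX_CMDS) return CMD_ERROR;
    interp->names[interp->ncmds] = name;
    interp->procs[interp->ncmds] = proc;
    interp->ncmds++;
    return CMD_OK;
}

/* Dispatch a command: find argv[0] and pass control to its wrapper. */
static int eval_command(Interp *interp, int argc, char *argv[]) {
    for (int i = 0; i < interp->ncmds; i++) {
        if (strcmp(interp->names[i], argv[0]) == 0)
            return interp->procs[i](interp, argc, argv);
    }
    snprintf(interp->result, sizeof interp->result,
             "invalid command name \"%s\"", argv[0]);
    return CMD_ERROR;
}

/* The underlying C function being wrapped. */
static int fact(int n) { return (n <= 1) ? 1 : n * fact(n - 1); }

/* Wrapper: check the argument count, convert string -> int,
   call fact(), and convert the int result back to a string. */
static int wrap_fact(Interp *interp, int argc, char *argv[]) {
    char *end;
    long arg0;
    if (argc != 2) {
        snprintf(interp->result, sizeof interp->result,
                 "wrong # args. fact { int }");
        return CMD_ERROR;
    }
    arg0 = strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0') {   /* reject "", "abc", "5x" */
        snprintf(interp->result, sizeof interp->result,
                 "expected integer but got \"%s\"", argv[1]);
        return CMD_ERROR;
    }
    snprintf(interp->result, sizeof interp->result, "%d", fact((int) arg0));
    return CMD_OK;
}

/* Module initialization: register every command the module provides. */
static int Example_Init(Interp *interp) {
    return create_command(interp, "fact", wrap_fact);
}
```

After Example_Init, dispatching the arguments "fact" and "5" leaves the string "120" in the result buffer. By contrast, "fact abc", a missing argument, or an unregistered command name returns CMD_ERROR with a message. The linear search only keeps the sketch short; a real interpreter would use a hash table.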
Since Tcl passes all arguments as strings, the wrapper function converts arguments from strings to the appropriate C representation. After conversion, the real C function is executed. Finally, the return value of the function is converted back into a string and returned to Tcl. Although the process has been illustrated for Tcl, a similar procedure is used for all scripting languages; detailed examples are shown in Appendix A.

3.5.1.2 Variable Linking

Variable linking is the process of accessing global variables in a compiled program from a scripting language. Even though the use of global variables is highly discouraged in software engineering circles, they are used quite frequently in scientific applications to store the values of various simulation parameters. The simplest way to support global variables is through functions such as the following:

    // A global variable
    double Dt;

    // Get and set the value
    double Dt_get() { return Dt; }
    void Dt_set(double d) { Dt = d; }

These functions can then be added to the scripting interface as ordinary wrapper functions. Some scripting languages, such as Tcl, provide an alternative mechanism that can be used to make global variables appear as ordinary scripting language variables. For example, executing the following C code in the module initialization function

    Tcl_LinkVar(interp, "Dt", (char *) &Dt, TCL_LINK_DOUBLE);

turns Dt into a Tcl variable that is mapped directly onto a C global variable. When this variable is accessed or modified from the scripting interpreter, the underlying C variable is accessed directly. Other scripting languages can create special variables whose read and write operations are mapped onto functions written in C. For example, in Perl, the following functions can be written.
    int wrap_set_Dt(SV *sv, MAGIC *mg) {
        Dt = (double) SvNV(sv);
        return 1;
    }
    int wrap_get_Dt(SV *sv, MAGIC *mg) {
        sv_setnv(sv, (double) Dt);
        return 1;
    }

When a new value is assigned to Dt, the set method is used to change the value. When the value of Dt is read, the get method is used to retrieve the value. Thus, in a Perl script, Dt would appear, for all practical purposes, to be an ordinary variable.1

    # Change Dt
    $Dt = 0.0001;          # Calls wrap_set_Dt
    # Print out the value
    print $Dt, "\n";       # Calls wrap_get_Dt

1 In Perl, these are known as magic variables.

Support for variable linking varies widely between scripting languages. Global variables can always be accessed through a functional interface. However, if a scripting language offers an alternative mechanism, it can be used to make the scripting interface more convenient to the user.

3.5.1.3 Creating Constants

Most interesting programs, especially scientific ones, define a variety of constants for setting modes, physical constants, and so forth. In a C program, these might be defined as follows:

    #define PI 3.14159265359
    const double E = 2.71828182846;

Constants can be made available to a scripting language interpreter by creating scripting variables that contain the corresponding values. This is done by placing special function calls in the module initialization function that create the constants when an extension module is loaded. For example, in Python, placing the following function calls in the initialization function would create two constants:

    PyDict_SetItemString(d, "PI", PyFloat_FromDouble(PI));
    PyDict_SetItemString(d, "E", PyFloat_FromDouble(E));

3.5.1.4 Object Manipulation

Although the interfaces to functions, variables, and constants are relatively straightforward, C structures, unions, and classes present a more difficult problem. When working with objects, there are three fundamental problems. First, there is the issue of representation. Second, there is the problem of object creation and destruction.
Finally, one must devise a mechanism for executing methods and operations on objects.

A common approach to the representation problem is to generate object handles. A handle is simply a name that is assigned to an object and used in the scripting language interface. Internally, a hash table maps handle names to pointers of the appropriate object type. When a wrapper function expects an object or a pointer to an object, the handle name is used as the key in a hash table lookup. If a match is found, a pointer to the object is extracted and passed to the C function. If not, an error is generated.

To create and destroy objects, it is necessary to create and destroy handles. This is accomplished using special constructor and destructor functions that are added to the scripting language interface. For example, functions to create and destroy Vector objects might look like the following:

    char *create_Vector() {
        Vector *v = new Vector();
        char *name = create_handle_name();
        add_handle(name, v);
        return name;
    }

    void delete_Vector(char *name) {
        Vector *v = (Vector *) lookup_handle(name);
        if (!v) {
            error("Not a valid object!");
            return;
        }
        delete v;
        remove_handle(name);
    }

Although handles allow objects to be created, destroyed, and passed between different C/C++ functions, they do not allow a program to examine the internals of an object. Therefore, accessor functions can be written to invoke methods and extract internal information. An accessor function provides a functional interface that can be used to manipulate an object given a handle. For example, if the definition of a Vector is

    struct Vector {
        double x, y, z;
        void normalize();
    };

the following accessor functions could be used to examine and modify member data:

    double Vector_x_get(Vector *v) { return v->x; }
    void Vector_x_set(Vector *v, double x) { v->x = x; }

Likewise, the following accessor function could be used to invoke a member function.
    void Vector_normalize(Vector *v) { v->normalize(); }

Using accessor functions, access to objects is controlled entirely through function calls. As a result, a scripting interface can be built by simply creating wrappers around these function calls using the techniques described earlier.

Most modern scripting languages also provide support for object-oriented programming. An alternative approach to wrapping C and C++ objects is to encapsulate them inside a scripting language wrapper (or adapter) class. This class provides a natural object-oriented interface to the underlying objects and hides implementation details from users. For example, the following Python code illustrates the use of Vector objects when incorporated into a wrapper class.

    v1 = Vector()
    v1.x = 2
    v1.y = 3
    v1.z = 4
    v2 = Vector()
    v2.x = -1.5
    v2.y = 4
    v2.z = 5
    v2.normalize()
    d = dot_product(v1, v2)

The process of writing scripting language wrapper classes varies widely and is omitted here for the sake of clarity. One approach, based on accessor functions, is discussed in Chapter 4. A variety of other object-oriented wrapping techniques can be found in [76, 66, 101, 106, 67, 39].

3.5.2 Compiling an Extension Module

To use a module, it must be compiled into a form that the scripting language understands. Most modern scripting languages support dynamic linking of extensions [41]. With dynamic linking, extension modules are compiled into shared libraries or dynamic link libraries (DLLs). These libraries can then be loaded by the scripting language at run time. To load a module, a user simply starts the scripting language interpreter and issues a command such as "import foo." This command loads the module into memory as a shared library. Immediately after loading, the module initialization function is executed and control is returned to the scripting interpreter. At this point, the contents of the module can be used.
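After a shared library has been loaded (on POSIX systems, for example, with dlopen), the interpreter still has to locate the module initialization function by name, typically with dlsym. The sketch below shows only the name derivation, under two assumed conventions. The first is that of Tcl's load command: the first letter of the package name is capitalized, the rest are lowercased, and "_Init" is appended. The second is the init<module> convention of Python 1.x/2.x. The helper names tcl_init_symbol and python_init_symbol are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: Tcl-style init symbol, "example" -> "Example_Init".
   Returns 0 on success, -1 on an empty name or a too-small buffer.
   A loader would pass the result to dlsym() on the dlopen() handle. */
static int tcl_init_symbol(const char *module, char *buf, size_t len) {
    size_t n = strlen(module);
    if (n == 0 || n + sizeof "_Init" > len) return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char) module[i];
        /* first letter upper case, all other letters lower case */
        buf[i] = (char) (i == 0 ? toupper(c) : tolower(c));
    }
    strcpy(buf + n, "_Init");
    return 0;
}

/* Hypothetical helper: Python 1.x/2.x-style symbol,
   "example" -> "initexample". Returns -1 if the buffer is too small. */
static int python_init_symbol(const char *module, char *buf, size_t len) {
    int w = snprintf(buf, len, "init%s", module);
    return (w < 0 || (size_t) w >= len) ? -1 : 0;
}
```

Under these conventions, a module named example is initialized through Example_Init under Tcl, which matches the function shown in Section 3.5.1.1. Under Python, the same module is initialized through initexample.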
Although dynamic linking is supported on most machines, it may not work in all cases. If building modules as shared libraries is not an option (or is undesirable), it is also possible to integrate an extension module directly into the scripting language interpreter. To do this, the extension module and the scripting language interpreter are linked together to form a new executable. In the process, a new main program is written. This program initializes the scripting language interpreter and the extension module upon startup. Thus, when the user runs the new version of the interpreter, the extension module is automatically available for use.

3.6 Scripting Versus Commercial Packages

Many commercial packages such as MATLAB and IDL can be used as a framework for solving scientific problems [53, 83]. Not only do these systems have significant functionality, but they also have a foreign function interface. This allows a scientist to extend the package with new functionality while still using the functionality already provided by the system. For example, MATLAB can be extended with new functions by writing special wrapper functions in C [68]. In reality, the process of writing these wrappers is identical to that used for scripting language extensions. In many respects, these packages can be viewed as domain-specific scripting languages. The system is controlled by an interpreted and interactive language that glues components together and can be extended by writing special wrappers (the same technique used by scripting languages). The main limitations of commercial packages are their lack of generality and the fact that they are closed systems. For example, the only datatype supported in MATLAB is a matrix. This limited representation makes it difficult to represent nonmatrix objects and to apply MATLAB to other domains.
Despite their limited generality, packages like MATLAB are examples of what a scriptable scientific application might look like: a collection of compiled modules controlled by an interpreted and interactive language. Since such systems are so similar to scripting languages in both use and design, they will be included in further discussion. Thus, techniques described for extending Perl, Python, or Tcl could also be applied to a number of commercial scientific computing packages.

3.7 Scientific Computing and the Problems with Scripting

Despite the potential benefits that scripting languages offer scientists, they are not widely used in the scientific computing community. Although much of this may be due to a perception of poor performance, it is most likely due to the difficulty of integrating scripting languages with existing applications. In particular, there are the following problems.

The complexity of extension building. Building a scripting language extension is an extremely tedious and complex chore that requires an intimate knowledge of the target scripting language. Most scientists are simply not interested in this task.

The choice of scripting language. Given the complexity of building a scripting interface, the logical next step is to pick the "best" scripting language and use it for everything. Unfortunately, there is no such thing, since every scripting language has strengths and weaknesses depending on the application. For example, Tcl/Tk is primarily used in the construction of graphical user interfaces, Perl is used extensively for text processing, and Python is used for object-oriented programming. In many cases, the choice of language may be a matter of personal preference. In any case, it is not inconceivable that one would want to use different scripting interfaces for different tasks. Unfortunately, the heavyweight extension mechanism all but prohibits this.

Rapid change. Scientific applications often change to address new problems.
Unfortunately, the extension building process is not well adapted to this environment, since new features and changes to interfaces require changes to the underlying wrapper code.

Unless these problems can be addressed, it is unlikely that scripting languages will be of much use to scientists. Scientists must be convinced that scripting is simple to use and incurs few performance penalties.

CHAPTER 4
SWIG

4.1 Compilation of Scripting Components

In this chapter, SWIG (Simplified Wrapper and Interface Generator) is described [5, 6, 8]. SWIG is a compiler that has been developed to automatically construct scripting language interfaces to compiled code written in C, C++, and Objective-C [61, 31, 23]. Versions of SWIG have been available for public use since February 1996, and development is ongoing. SWIG currently supports Perl, Python, Tcl, and Guile extension building on Unix, Windows NT, and Macintosh systems [101, 66, 76, 65]. Experimental modules are also available for Java and MATLAB [38, 53]. This chapter is not intended to serve as a detailed description covering all of SWIG's features. Detailed information about using SWIG can be found in the SWIG Users Manual [9]. This chapter primarily focuses on the design, implementation, and operation of the SWIG compiler, as well as a variety of associated language issues.

4.2 Related Work

Given the difficulty of building scripting extensions, there has been considerable interest in the creation of tools that simplify the task. Rather than writing glue code by hand, the user of an extension building tool specifies the contents of a scripting language component in an interface definition language (IDL). Interface descriptions are written in this language and compiled into scripting language components. Most scripting-related extension tools fall into the following categories:

Stub generators.
A stub generator compiles an IDL file into a file containing a collection of empty function definitions known as "stubs." The stubs contain all of the pieces needed to build a module, but it is up to the user to fill in the stub bodies with the appropriate glue code. Such a technique is most commonly found with distributed applications involving RPC, ILU, and CORBA. It can also be found in scripting generators such as the Modulator tool used for building Python extensions [93, 25, 74, 66].

Language-specific module builders. Most scripting languages have specialized tools for building extensions. For example, h2xs and xsubpp are tools for building Perl extensions, Modulator can be used for building Python extensions, and Tcl has a number of tools such as Itcl++ and ObjectTcl [91, 66, 54, 106].

Application-specific generators. Large applications with scripting interfaces may include specialized interface construction tools. For example, the Visualization Toolkit (VTK) includes a YACC-based parser that compiles VTK C++ class definitions into Tcl, Python, and Java interfaces [67, 89].

Embedding tools. Embedding tools, such as Embedded Tk (ET) for Tcl, provide a mechanism for embedding scripting languages in compiled code [56]. This is a fundamentally different problem from controlling C/C++ code with a scripting language. Rather, these tools address the problem of accessing scripting languages from a compiled language.

Although extension building tools can simplify the interface generation process, they vary widely in capabilities and support. Most tools use their own interface definition format, making it nearly impossible to change tools or languages. In some cases, using a tool may be nearly as difficult as writing an extension by hand. Finally, most extension building tools offer little in the way of documentation and support, and they are often labeled as obscure and magical tools for hackers and gurus.
In fact, popular scripting language books make almost no mention of such tools [76, 105, 101, 66]. This is unfortunate, since extension building tools greatly enhance the usefulness of most scripting languages. Very little work appears to have been done on general-purpose scripting language extension tools that support both multiple scripting languages and a wide range of C/C++ code. The closest approximation is the interface builder packaged with the Visualization Toolkit, which is able to build interfaces to Tcl, Java, and Python [89]. The ILU system also provides support for multiple languages, but it is primarily used for distributed computing applications [25].

4.3 Design Goals

SWIG shares many of the features found in other interface generation tools, but it attempts to address many of the limitations that make those tools difficult to use. Simply stated, the primary design goals of SWIG are as follows:

• Simplicity.
• Applicability to existing software.
• Support for rapid change.
• Separation of interface and implementation.
• Extensibility.
• Support for multiple scripting languages.

Meeting these goals involves a number of tradeoffs and considerations. For example, a tool that is simple to use might not provide the formality required in a very large software project. Likewise, a tool that is too general purpose might not be able to produce quality interfaces to each scripting language. For a better understanding of the design, each goal is now described in some detail.

4.3.1 Simplicity

To computational scientists, a tool is simple if it requires minimal effort to use effectively. Ideally, tools designed to help scientists should not interfere with the problem solving process. In other words, the use of a software tool should not become the primary focus of a project.
For scripting extension building tools, this can be achieved in three ways: by fully automating the extension building process, by making it as easy as possible for users to specify scripting interfaces, and by producing scripting interfaces that map closely onto the underlying compiled code.

To automate extension building, a compiler should produce a fully functional scripting language module, not a collection of stubs. Ideally, the user should not have to write any of the scripting wrapper code described in Chapter 3, nor should they be required to modify the output of the compiler.

To simplify the specification of interfaces, a compiler should make it as easy as possible for users to integrate scripting seamlessly with their programs. One problem with many interface generation tools is their reliance upon special interface definition languages (IDLs) that require the user to precisely specify almost all aspects of their application. Although such an approach provides more formality and precision, it also makes such tools hard to use in the experimental and exploratory environment associated with scientific projects. In such cases, developing the interface specification may be only slightly less cumbersome than writing wrapper functions by hand. Furthermore, the rapidly changing nature of scientific software complicates the maintenance of interface specifications and may leave interfaces inconsistent with the actual implementation. Instead, interfaces could be specified using the ANSI C/C++ declarations already found in header files and source files. By specifying interfaces in this manner, scientists would not have to learn a special interface definition language and would be able to quickly build scripting interfaces to existing programs. Such an approach also works well in a rapidly changing software environment, since changes to the underlying C implementation are easily propagated to the scripting interface.
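Ordinary C declarations can serve as the interface specification because they already carry what a wrapper such as wrap_fact needs: the return type, the function name, and the parameter types. The deliberately tiny sketch below extracts exactly that information from a simple prototype. It is not SWIG's YACC-based parser. FuncDecl and parse_decl are hypothetical, and only single-word types are handled (no pointers, qualifiers, or arrays).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PARAMS 8
#define MAX_TOK    32

/* Hypothetical record describing one function to be wrapped. */
typedef struct {
    char ret[MAX_TOK];                  /* return type, e.g. "int"    */
    char name[MAX_TOK];                 /* function name, e.g. "fact" */
    char params[MAX_PARAMS][MAX_TOK];   /* parameter types only       */
    int  nparams;
} FuncDecl;

/* Parse a prototype such as "int fact(int n);". Only single-word types
   and names are understood. Returns 0 on success, -1 otherwise. */
static int parse_decl(const char *decl, FuncDecl *fd) {
    int pos = 0;
    memset(fd, 0, sizeof *fd);
    /* return type, name, then a literal '('; %n records where we stopped */
    if (sscanf(decl, " %31[A-Za-z0-9_] %31[A-Za-z0-9_] (%n",
               fd->ret, fd->name, &pos) != 2 || pos == 0)
        return -1;
    const char *p = decl + pos;
    const char *close = strchr(p, ')');
    if (close == NULL) return -1;
    while (p < close) {
        /* each parameter runs up to the next ',' or the closing ')' */
        const char *comma = memchr(p, ',', (size_t) (close - p));
        const char *end = comma ? comma : close;
        char piece[2 * MAX_TOK];
        char type[MAX_TOK];
        size_t len = (size_t) (end - p);
        if (len >= sizeof piece) return -1;
        memcpy(piece, p, len);
        piece[len] = '\0';
        /* the first word is the type; "(void)" means no parameters */
        if (sscanf(piece, " %31[A-Za-z0-9_]", type) == 1 &&
            strcmp(type, "void") != 0) {
            if (fd->nparams == MAX_PARAMS) return -1;
            strcpy(fd->params[fd->nparams++], type);
        }
        p = end + 1;
    }
    return 0;
}
```

For "int fact(int n);" this yields return type int, name fact, and one int parameter. A variable declaration such as "double Dt;" is rejected because it has no parameter list, so a real tool would route it to variable linking instead.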
Finally, the scripting interfaces produced by the compiler should closely match the underlying C and C++ code. For example, a C function should be mapped to a scripting language command of the same name, variables mapped to scripting variables, and so forth. In other words, the scripting interface should merely be an extension of the compiled code. This is important because computational scientists are likely to work with both C/C++ code and scripts. Therefore, the scripting interface should expose the underlying functionality to the user in a straightforward manner, as opposed to hiding or obscuring it.

4.3.2 Applicability to Existing Software

Scientific programs vary widely in both implementation and design. Furthermore, such programs may be quite complex, using sophisticated data structures and algorithms. To successfully build scripting interfaces, tools must support a wide range of programming styles and techniques. To accomplish this, the compiler must support a large subset of the programming features found in scientific programs, including functions, global variables, constants, and classes. The compiler also needs to support a wide range of datatypes, including fundamental types (integers, floating point, strings), structures and objects, arrays, and pointers. Finally, the compiler needs to be highly adaptable. Rather than requiring users to structure interfaces and components in a precise manner, it should let users add scripting interfaces to existing software without substantial modifications to that software.

4.3.3 Support for Rapid Change

Scientific applications change more rapidly than their commercial counterparts. Interface generation tools must keep pace with this change without becoming a burden. The best way to support rapid change is to automate the interface generation process while making it nearly invisible to the user.
By fully automating the compilation of scripting modules and using the same language syntax as the original application, interface generation can be hidden away in the compilation of a program. Thus, when changes are made to that program, they can automatically be reflected in the scripting interface.

4.3.4 Separation of Interface and Implementation

One problem with modifying existing applications to operate in a new environment is that those applications may be modified in a way that prevents their use in other settings. For example, if a scientist builds a Tcl interface to a scientific application by hand, Tcl-specific C code tends to creep into the original application. As a result, the program eventually becomes inseparable from its Tcl interface. To prevent this, a compiler should strive to maintain a strict separation between the compiled code and its scripting interface. By doing so, the original application remains general purpose and usable in other settings (including those that do not involve scripting languages).

4.3.5 Extensibility

Just as scientific programs and problems change, the compiler should be extensible in order to handle new situations. There are two cases that need to be considered. First, a user may want to extend or alter the behavior of the compiler to provide a "better" interface to their program. Ideally, special directives or commands should be available for this purpose and be placed directly in interface description files. A second important area of extensibility is support for new scripting languages. A variety of scripting languages are currently available, and new ones may appear in the future. Thus, the compiler should be general purpose and easily extended to support different languages as appropriate.

4.3.6 Support for Multiple Scripting Languages

When it comes to C extension building, scripting languages are surprisingly similar.
They are all extended with wrapper code, and the techniques for writing this wrapper code, building modules, and using extensions are essentially the same. A compiler that exploits this similarity and supports multiple languages has several interesting consequences. First, it largely eliminates the problem of choosing the "best" scripting language. Instead, different languages can easily be used and evaluated for the job at hand (or according to personal preference). Second, it allows applications to simultaneously support a variety of different interfaces. This generally improves the usefulness of an application and allows it to be used in a wide variety of settings. Finally, a compiler supporting multiple scripting languages would unify a number of extension building efforts. It would also provide a general-purpose tool for building scriptable applications regardless of the scripting language being used. This, in turn, allows developers to focus their attention on the creation of scriptable applications, not on the specific scripting language that will be used.

4.4 Implementation

SWIG is implemented in C++ and consists of three primary components, as shown in Figure 4.1: an ANSI C/C++ parser, a scripting language wrapper code generator, and a documentation generator. The input to SWIG is a subset of the ANSI C/C++ language extended with special directives. The output of SWIG is a C or C++ source file that is compiled and linked with the rest of an application to create a scripting language module. The code generator and documentation generator are extensible to support different scripting languages and documentation formats, respectively. Currently, scripting language modules are available for Perl, Python, Tcl, and Guile, whereas documentation can be generated in HTML, plain text, and LaTeX. Further discussion will focus exclusively on the code generation process; details about the documentation system can be found in the SWIG Users Manual [9].
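SWIG achieves this extensibility with a C++ class per target language. As a rough C analogue (not SWIG's code), a back end can be modeled as a table of function pointers that the front end calls identically for every language. The LanguageModule type, its hooks, and find_language are hypothetical; the hook names loosely follow the create_function and declare_const methods of SWIG's language classes. The emitted strings use Tcl and Python C API calls (Tcl_CreateCommand, Tcl_SetVar, a PyMethodDef entry, PyDict_SetItemString).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical back-end interface: one table of hooks per target language.
   The front end makes the same calls whichever language is selected. */
typedef struct {
    const char *lang;   /* name used to select the back end, e.g. "tcl" */
    /* Emit code that registers the wrapper wrap_<func> under <func>. */
    int (*create_function)(const char *func, char *out, size_t len);
    /* Emit code that defines a floating-point constant. */
    int (*declare_const)(const char *name, const char *value,
                         char *out, size_t len);
} LanguageModule;

static int tcl_function(const char *func, char *out, size_t len) {
    return snprintf(out, len,
                    "Tcl_CreateCommand(interp, \"%s\", wrap_%s, "
                    "(ClientData) NULL, (Tcl_CmdDeleteProc *) NULL);",
                    func, func);
}
static int tcl_const(const char *name, const char *value,
                     char *out, size_t len) {
    /* stored as a global string variable (not made read-only here) */
    return snprintf(out, len,
                    "Tcl_SetVar(interp, \"%s\", \"%s\", TCL_GLOBAL_ONLY);",
                    name, value);
}
static int py_function(const char *func, char *out, size_t len) {
    /* one entry in the module's PyMethodDef table */
    return snprintf(out, len, "{\"%s\", wrap_%s, METH_VARARGS},", func, func);
}
static int py_const(const char *name, const char *value,
                    char *out, size_t len) {
    return snprintf(out, len,
                    "PyDict_SetItemString(d, \"%s\", PyFloat_FromDouble(%s));",
                    name, value);
}

static const LanguageModule TCL_MODULE    = {"tcl", tcl_function, tcl_const};
static const LanguageModule PYTHON_MODULE = {"python", py_function, py_const};

/* Select a back end by name, much as a command-line option would. */
static const LanguageModule *find_language(const char *name) {
    static const LanguageModule *const all[] = {&TCL_MODULE, &PYTHON_MODULE};
    for (size_t i = 0; i < sizeof all / sizeof all[0]; i++)
        if (strcmp(all[i]->lang, name) == 0) return all[i];
    return NULL;
}
```

In this design, adding a target language amounts to writing one more table of hooks. The code that walks the parsed declarations does not change.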
4.4.1 Parsing

The SWIG parser accepts a subset of ANSI C, C++, and Objective-C and is implemented using YACC [62]. Before parsing, all input files are passed through a C preprocessor that handles conditional compilation and macro expansion. In addition to normal C code, SWIG understands a number of special directives that are used to …

To generate code, an instantiation of a particular language class (Tcl, Perl, Python, etc.) is created and given to the parser. The set_module method is used to set the name of the scripting language extension module. Afterwards, the parser executes methods such as create_function, link_variable, and declare_const to generate wrappers. To illustrate, suppose that the following C declarations were to be encapsulated in a module "Foo."

    int fact(int);
    void plot(Image *img, double x, double y, int color);
    double Dt;
    #define PI 3.14159265359

To construct the scripting language module, SWIG performs the following operations:

1. Create a new language object.
       lang = new LANG();
2. Set the module name.
       lang->set_module("Foo");
3. Create wrappers.
       lang->create_function("fact", int, (int));
       lang->create_function("plot", void, (Image *, double, double, int));
       lang->link_variable("Dt", double);
       lang->declare_const("PI", double, 3.14159265359);

4.5 SWIG Directives

Although the input to SWIG primarily consists of ANSI C/C++ declarations, a number of special directives are also available, as shown in Table 4.1. These directives are used to guide the compilation process, provide hints, and customize SWIG's behavior. A full description of the directives can be found in the SWIG Users Manual [9], and a brief description of the most commonly used directives can be found in Appendix B. A number of the more interesting directives are also described in later sections.

Table 4.1. Commonly used SWIG directives

    %{ ... %}           %addmethods          %apply
    %checkout           %clear               %disabledoc
    %echo               %enabledoc           %except
    %extern             %import              %include
    %init %{ ... %}     %inline %{ ... %}    %module
    %native             %name                %new
    %pragma             %readonly            %readwrite
    %rename             %typedef             %typemap
    %wrapper %{ ... %}

4.6 SWIG Input Files

Since SWIG interfaces are built using a mix of ANSI C/C++ declarations and special directives, there are several approaches for constructing an input file. The most common approach is to use a separate "interface file." This file contains a selective list of the C/C++ declarations to be wrapped, along with special directives. Another common approach is to insert SWIG directives directly into a C header file and to use conditional compilation. SWIG defines a symbol SWIG that can be used by the preprocessor for this purpose. Finally, SWIG can extract declarations directly from C source files.

4.7 A Simple SWIG Example

To use SWIG, the user specifies an interface using ANSI C declarations such as the following:

    // file : example.i
    %module example
    %{
    #include "example.h"
    %}
    int fact(int n);
    double Dt;
    #define PI 3.14159265359

To build the module, the user runs SWIG and compiles the wrapper code into a shared library as follows:1

    % swig -tcl example.i
    Making wrappers for Tcl
    % gcc -c -fpic example_wrap.c example.c
    % gcc -shared example_wrap.o example.o -o example.so

1 The compilation process varies according to the compiler and operating system being used.

To use the new mod
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6vq3kwb