Much statistical software, and scientific software more generally, is written in C/C++ or Fortran. Yet over the last decade there has been a move toward high-level languages such as R, S-Plus and Matlab for scientific work, alongside Perl, Python, Java and other interpreted languages. The reason is human productivity: it is easier to express ideas in these higher-level languages, and the resulting code is easier to understand and maintain. Most of these languages provide a mechanism to access existing C code. Traditionally, however, one has to write by hand the wrapper code that bridges the divide between the two languages. Automating this process is very desirable. It would relieve the programmer of a tedious and error-prone task, and it would let users explore an existing library of software quickly, without a large time commitment that would not be warranted if the need were marginal.
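To make the manual step concrete, here is a minimal sketch of the kind of wrapper one writes by hand for R's .C() interface, which passes every argument as a pointer and expects results to be written back through those pointers. The library routine vec_mean and the wrapper name R_vec_mean are hypothetical, invented purely for illustration.

```c
#include <stddef.h>

/* A routine from an existing C library (hypothetical example). */
double vec_mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return n > 0 ? sum / (double) n : 0.0;
}

/*
 * Hand-written wrapper in the style expected by R's .C() interface:
 * every argument arrives as a pointer, the function returns void, and
 * the answer is stored in an output argument supplied by the caller.
 */
void R_vec_mean(double *x, int *n, double *result)
{
    /* Guard against a negative length before converting to size_t. */
    *result = vec_mean(x, *n > 0 ? (size_t) *n : 0);
}
```

Once compiled into a shared library and loaded with dyn.load(), the wrapper could be called from R as .C("R_vec_mean", as.double(x), as.integer(length(x)), result = double(1))$result. Every routine in a library needs a wrapper of this shape, and that repetition is what an interface generator is meant to remove.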

There are several possible approaches for creating such wrappers. We have tried modifying a retargetable C compiler via the Slcc package, and we are working on a way to analyze C/C++ code using output from gcc/g++, the GNU compiler suite. These approaches yield the richest information, which is useful not just for generating bindings but also for analyzing code.

A more specific solution is SWIG (Simplified Wrapper Interface Generator), a tool that generates interfaces to C/C++ code for high-level languages. The approach is not as rich as those mentioned above and has some shortcomings, but it is very powerful. Its typemap system is especially flexible: it is a template/pattern-matching system that exists outside of the native implementation. As a result, both users and the developers of SWIG language modules can customize the bindings for individual pieces of code. It is this customizability that makes SWIG attractive.

The shortcomings are not very severe, and some would argue that they are features. Which view one takes depends on the target task.

As I mentioned, SWIG is not the final goal. In addition to the specific goal of generating bindings, we may also want to analyze the code and understand not just how to interact with it, but how it is written and what its important characteristics are. Automation and code analysis are two different goals, but of course the latter facilitates the former. To that end, I have been working on a more ambitious scheme than using SWIG alone: we can have gcc dump its information about an entire piece of code and then explore that. From this, we can get information about the variables and routines that are defined and the different data structures involved, and from these we can generate bindings to R or another language rather readily. Alternatively, we can use this information to generate input to SWIG and generate bindings using that approach. The motivation for developing RSWIG is to use the GccTranslationUnit approach to create the input from which RSWIG generates bindings. This would leverage the flexibility of SWIG while drawing its inferences about the entire code from a more complete analysis. Such input to SWIG would overcome the shortcomings mentioned above, while still allowing flexible customization by the user via the typemap facility provided by SWIG.
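As a rough illustration of generating bindings from recovered information, the following C sketch takes a small description of a routine and writes out a .C()-style wrapper for it. The RoutineInfo structure, the emit_wrapper function and the argN naming are assumptions made for this example only; they are not the format produced by gcc, GccTranslationUnit or RSWIG. The sketch handles only scalar parameters and a scalar return value.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of one routine, as code analysis might recover it. */
typedef struct {
    const char *name;            /* routine name, e.g. "hypot" */
    const char *return_type;     /* scalar C type, e.g. "double" */
    int nparams;                 /* number of parameters, at most 4 */
    const char *param_types[4];  /* scalar C type of each parameter */
} RoutineInfo;

/* Append text to buf without writing past its size; extra text is dropped. */
static void append(char *buf, size_t size, const char *text)
{
    size_t used = strlen(buf);
    if (used + 1 < size)
        strncat(buf, text, size - used - 1);
}

/* Write the source of a .C()-style wrapper for routine r into buf. */
void emit_wrapper(const RoutineInfo *r, char *buf, size_t size)
{
    char piece[128];

    if (size == 0)
        return;
    buf[0] = '\0';

    /* Signature: each parameter becomes a pointer, plus an output slot. */
    snprintf(piece, sizeof piece, "void R_%s(", r->name);
    append(buf, size, piece);
    for (int i = 0; i < r->nparams; i++) {
        snprintf(piece, sizeof piece, "%s *arg%d, ", r->param_types[i], i);
        append(buf, size, piece);
    }
    snprintf(piece, sizeof piece, "%s *result)\n{\n    *result = %s(",
             r->return_type, r->name);
    append(buf, size, piece);

    /* Body: dereference each argument in the call to the real routine. */
    for (int i = 0; i < r->nparams; i++) {
        snprintf(piece, sizeof piece, "%s*arg%d", i > 0 ? ", " : "", i);
        append(buf, size, piece);
    }
    append(buf, size, ");\n}\n");
}
```

Given a description of hypot with two double parameters and a double return value, emit_wrapper writes a function named R_hypot that dereferences arg0 and arg1, calls hypot, and stores the value in *result. A real generator would also have to deal with pointers, structures and memory ownership, which is where typemap-style customization becomes important.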

We are currently working on the Perl-based version of GccTranslationUnit, which will process the output of gcc -fdump-translation-unit.

One thing that should be reasonably clear is that the SWIG bindings will not remove the need to understand the C/C++ library to which one is interfacing. The automation cannot teach the programmer how to use the facilities that the bindings make more readily available. Rather, the bindings merely make the individual routines and data structures accessible to the R programmer. This makes it considerably easier to experiment with and explore the software. The focus is on learning about the functionality rather than on the tedium of mapping the details to R code.

If anyone is doing something similar or wants to join in, please contact me.

The Software

I am making the software available "as is" in its very early stages. It is likely that it will be rewritten. I have been using the current code to explore and understand SWIG, and to validate and experiment with different approaches to generating the code. The interface is not likely to change extensively. The typemaps such as scoercein and scoerceout are not likely to change. The naming of the classes may change, but this is unlikely to affect the end-user.

The hope is that interested people will try this and point out any errors or oversights, and thereby simplify the testing and debugging process. This is immensely helpful.

  • The source
    This is a modified version of swig-1.3.25 that contains the R module and the relevant changes in the Makefile to build with this code. The instructions to install it are the same as for regular SWIG. See the SWIG website. Also, read these basic instructions to get started.
    This code is extremely experimental and is known to be incomplete. It works for basic C files. Any feedback is helpful.
  • Getting Started
    Some notes on installing and using the R module in this version of SWIG.
  • Examples
    The directory contains examples of using typemaps and specifying rules along with test cases for the module.
  • TODO list
    A relatively extensive list of things that I am thinking about and need to do with respect to RSWIG.
  • FAQ
    Some things that may help solve some anomalies.
  • Some notes on the interface generation
    These are just some ramblings aloud. They are helpful for understanding what the interface is trying to do, but they are not set in stone, so things might change.

Acknowledgements

Thanks to the team who developed SWIG and especially Dave Beazley.

Duncan Temple Lang <duncan@wald.ucdavis.edu>
Last modified: Mon Aug 22 09:51:43 PDT 2005