Much statistical software, and scientific software more generally, is
written in C/C++ or Fortran. Yet, over the last decade, there has
been a move toward high-level languages for scientific work: R, S-Plus
and Matlab, as well as Perl, Python, Java and other interpreted
languages. The reason is human productivity. It is easier to express
ideas in these higher-level languages, and it is easier to maintain
and understand such code. Most of these languages provide a mechanism
to access existing C code. Traditionally, however, one has to write
by hand the wrapper code that bridges the divide between the
two languages. Automating this process is very desirable. It would
relieve programmers of tedious and error-prone tasks, and it would
let users rapidly explore an existing library of software
without a large time commitment, which would not be warranted if
the need were only marginal.
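To make "wrapper code" concrete, here is a minimal sketch in C of what a hand-written wrapper for a single routine involves. The Value type is a hypothetical stand-in for an interpreter's generic object type (R's is SEXP), and scale() and wrap_scale() are invented for illustration; a real R wrapper would use R's own C API rather than this toy type.

```c
/* Hypothetical stand-in for an interpreter's generic object type
   (R's is SEXP); real bindings would use the interpreter's C API. */
typedef enum { VAL_INT, VAL_DOUBLE } ValueKind;

typedef struct {
    ValueKind kind;
    union {
        int i;
        double d;
    } u;
} Value;

/* The existing C routine that we want to call from the high-level language. */
double scale(double x, int factor)
{
    return x * factor;
}

/* Hand-written wrapper: check and unpack the arguments, call the C
   routine, then box the result. Returns 0 on success, -1 if the
   number or types of the arguments are wrong. */
int wrap_scale(const Value *args, int nargs, Value *result)
{
    if (nargs != 2 || args[0].kind != VAL_DOUBLE || args[1].kind != VAL_INT)
        return -1;
    result->kind = VAL_DOUBLE;
    result->u.d = scale(args[0].u.d, args[1].u.i);
    return 0;
}
```

Calling wrap_scale() with a double 1.5 and an integer 4 yields a double Value holding 6. Every further routine needs another function of this shape, which is exactly the repetitive work a generator can take over.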
There are several possible approaches for creating such wrappers.
We have tried modifying a retargetable C compiler via the
Slcc package.
We are also working on a way to analyze C/C++ code using the
output from gcc/g++, the GNU compiler suite.
These approaches yield the richest information, which is useful
not just for generating bindings but also for analyzing code.
A more specific solution is SWIG
(Simplified Wrapper and Interface Generator), a tool that
generates interfaces to C/C++ code for high-level languages.
Its approach is not as rich as those mentioned above and has some
shortcomings, but it is very powerful. Its typemap system is
especially flexible: it is a template/pattern-matching system that
exists outside of the native implementation. As a result, both users
and the developers of SWIG language modules can customize the
bindings for individual pieces of code. It is this customizability
that makes SWIG attractive.
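One way to picture typemaps living "outside" the generator is the toy model below: conversion rules are data keyed by C type, and user-supplied rules are consulted before the module's defaults. This is a sketch of the idea only, not how SWIG is implemented. TypemapRule, find_typemap() and emit_conversion() are hypothetical names; the two %s placeholders loosely play the roles of SWIG's $1 (the C variable) and $input (the incoming object), and R's asInteger(), asReal() and INTEGER() appear only as text in the generated code.

```c
#include <stdio.h>
#include <string.h>

/* Toy typemap rule: a C type and a code template. The first %s is the
   C variable, the second the interpreter-side input object. */
typedef struct {
    const char *ctype;
    const char *tmpl;
} TypemapRule;

/* Default rules that would ship with a (hypothetical) language module. */
static const TypemapRule module_rules[] = {
    { "int",    "%s = asInteger(%s);" },
    { "double", "%s = asReal(%s);" },
};

/* Look up `ctype`, consulting user rules before module rules, so a user
   can override a default without touching the generator itself.
   Returns NULL when no rule matches. */
const char *find_typemap(const char *ctype,
                         const TypemapRule *user, size_t nuser)
{
    for (size_t i = 0; i < nuser; i++)
        if (strcmp(user[i].ctype, ctype) == 0)
            return user[i].tmpl;
    for (size_t i = 0; i < sizeof module_rules / sizeof module_rules[0]; i++)
        if (strcmp(module_rules[i].ctype, ctype) == 0)
            return module_rules[i].tmpl;
    return NULL;
}

/* Expand the matching template into `buf`; returns 0 on success and
   -1 if no rule applies to `ctype`. */
int emit_conversion(char *buf, size_t size, const char *ctype,
                    const char *cvar, const char *input,
                    const TypemapRule *user, size_t nuser)
{
    const char *tmpl = find_typemap(ctype, user, nuser);
    if (tmpl == NULL)
        return -1;
    snprintf(buf, size, tmpl, cvar, input);
    return 0;
}
```

With no user rules, an int argument expands to "arg1 = asInteger(s_x);"; supplying a user rule for "int" replaces that expansion while the generator code stays unchanged.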
The shortcomings are not very severe, and some would
argue that they are features; which view one takes depends
on the target task. SWIG:
- Reimplements the C++ parser and so may not be entirely consistent
with any actual compiler.
- "Requires" the user to create a separate input file. This has
advantages, but we would prefer to avoid it.
- Does not give information about the content of the source code
for purposes other than creating interfaces. (The -dump_tree
option and the XML output give a lot of information, but only
about the top-level items one needs for generating bindings, not
about the content of the source.)
- Requires the user to provide information about memory
management, the lengths of arrays passed as pointers, the active
fields in unions, etc. Many of these can be "guessed" or inferred
by code analysis.
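The hypothetical C functions below, a sketch rather than code from any particular library, illustrate the kind of information a declaration alone does not carry: that n is the length of x, which union member the tag selects, and that the caller must free() a returned buffer. With SWIG such facts are typically supplied by the user (for example, through a multi-argument typemap), whereas reading the function bodies could in principle recover them.

```c
#include <stddef.h>
#include <stdlib.h>

/* Only the body shows that x[0..n-1] is read, i.e. that n is the
   length of the array passed as the pointer x. */
double mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return n > 0 ? sum / (double) n : 0.0;
}

/* The type does not say which member of `value` is active; by the
   convention encoded below, tag == 0 means `i`, anything else `d`. */
typedef struct {
    int tag;
    union {
        int i;
        double d;
    } value;
} Number;

double number_as_double(const Number *num)
{
    return num->tag == 0 ? (double) num->value.i : num->value.d;
}

/* The return type does not say who owns the memory: here the caller
   must release it with free(). Returns NULL if allocation fails. */
double *make_sequence(size_t n)
{
    double *out = malloc(n * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (double) i;
    return out;
}
```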
As noted above, SWIG is not the final goal. Beyond the specific
goal of generating bindings, we may also want to analyze the code
and understand not just how to interact with it, but how it is
written and what its important characteristics are. Automation and
code analysis are two different goals, but of course the latter
facilitates the former. To that end, I have been working on a more
ambitious scheme than using SWIG: we can have gcc dump the
information it gathers about an entire piece of code and then
explore that. This tells us about the variables and routines that
are defined and the different data structures involved, from which
we can readily generate bindings to R or another language.
Alternatively, we can use this information to generate input to
SWIG and generate bindings using that approach.
The motivation for developing RSWIG is to use the GccTranslationUnit
approach to create the input to RSWIG from which bindings are
generated. This would leverage the flexibility of SWIG while
obtaining information about the entire code from a more complete
analysis. Such input to SWIG would overcome the shortcomings listed
above, while still allowing the user flexible customization via
SWIG's typemap facility.
We are currently working on the Perl-based version
of GccTranslationUnit that will process the
output of gcc's -fdump-translation-unit option.
One thing that should be reasonably clear is that the SWIG bindings
will not remove the need to understand the C/C++ library to which one
is interfacing. The automation cannot tell the programmer how to
use the facilities that the bindings make more readily available.
Rather, the bindings merely make the individual routines and data
structures accessible to the R programmer. This makes it considerably
easier to experiment with and explore the software, so the focus can
be on learning about the functionality rather than on the tedium of
mapping the details to R code.
If anyone is doing something similar or wants to
join in, please contact me.
The Software
I am making the software available "as is" in its very early stages.
The code is likely to be rewritten. I have been using it to explore
and understand SWIG, and to validate and experiment with different
approaches to generating the code. The interface, however, is not
likely to change extensively.
The typemaps such as scoercein and scoerceout are not likely
to change. The naming of the classes may change, but this is
unlikely to affect the end-user.
The hope is that interested people will try this and
point out any errors or oversights, thereby
simplifying the testing and debugging process.
Such feedback is immensely helpful.
- The source: a modified version of swig-1.3.25 that contains
the R module and the relevant changes to the Makefile
to build it. The installation instructions are the same
as for regular SWIG; see the SWIG website.
Also, read the basic instructions
to get started.
This code is extremely experimental and is known to be
incomplete. It works for basic C files.
Any feedback is helpful.
- Getting Started: some notes on installing and using the
R module in this version of SWIG.
- Examples: a directory containing examples of using typemaps and
specifying rules, along with test cases for the module.
- TODO list: a relatively extensive list of things that I am thinking about
and need to do with respect to RSWIG.
- FAQ: some answers that may help resolve some anomalies.
- Some notes on the interface generation: just some ramblings aloud.
They help in understanding what the interface is trying to do, but
are not set in stone, so things might change.
Acknowledgements
Thanks to the team that
developed SWIG, especially Dave Beazley.
Duncan Temple Lang <duncan@wald.ucdavis.edu>
Last modified: Mon Aug 22 09:51:43 PDT 2005