Embedding R in Postgres - The REmbeddedPostgres package
The REmbeddedPostgres package allows one to embed the R interpreter in the Postgres relational database and use R functions as if they were built-in SQL functions. This is done by defining several new Procedural Languages within Postgres whose functions are implemented in R. This means that one can implement
- aggregate and
functions in Postgres with regular R functions.
There are several simple motivating ideas for this package.
- It can be much faster to perform the computations in the server rather than transferring the data from a query to the client and doing the same computations there.
- Statistical functionality is available to SQL users without them having to learn a new language.
- We can use existing software (i.e. R functions) typically with no changes.
There is no security model in the current version of R. Nor is it thread-safe. While this is likely to change in the coming months, please be aware of the potential dangers in using this package in publically accessible Postgres servers.
The package is available as a GNU zipped tar file. It installs in the same way as a regular R package, i.e.
R INSTALL -c -l whichever/directory REmbeddedPostgres_0.2.tar.gz
You will need to have the source of the Postgres distribution since this package needs access to the public but internal aspects of the database engine, not the client libraries.
- Scenarios for Using R within a Relational Database Management System Server
- A perspective on the motivation of embedding R within a database server wih some detailed examples of the different function types.
- An overview of embedding R in Postgres.
- Annotated examples
- A more general discussion of embedding languages in Postgres.
- R function help pages
- HTML, PDF
- FAQ for the package
Identifies and explains some of the issues that can go wrong when compiling, installing and using the package.