Rcompression package for in-memory compression
Last release: 0.5-0 (17 Sep 2008)
This package is a basic interface to the zlib and bzip2 facilities
for compressing and uncompressing data that are in memory rather
than in files. This is useful when the data we have to work with
is never in a file on our local file system but rather
given to us as part of a transaction with a remote server.
For example, we might receive a gzipped-text file from
retrieving a URI via the RCurl
package. Or we might receive a compressed micro-array file from a Web
service
via the SSOAP package.
Rather than having to collect that data, write it to disk
and then read it back into R, we can uncompress it directly
in memory. This avoids unnecessary I/O and also improves
"security", as our scripts do not need to access the file system.
(This is currently not that important, as R is not secure in any way,
but as we use R more extensively in embedded settings,
e.g. in databases, Web servers, spreadsheets and other languages like
Perl and Python, it does become an issue.)
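As an illustration of the idea only (this is Python's standard `gzip` module, not the Rcompression API), the sketch below inflates a gzipped text body that is assumed to have arrived from a network transfer, without ever creating a temporary file. The transfer is simulated with `gzip.compress`, and the helper name `gunzip_in_memory` is hypothetical.

```python
import gzip


def gunzip_in_memory(payload: bytes) -> str:
    """Hypothetical helper: inflate gzip-compressed bytes and decode them as UTF-8.

    The data goes from a bytes object to a bytes object; nothing touches disk.
    """
    return gzip.decompress(payload).decode("utf-8")


# Simulate a gzipped text body, e.g. one returned by an HTTP request.
body = gzip.compress(b"hello from a remote server\n")

text = gunzip_in_memory(body)
print(text, end="")  # prints: hello from a remote server
```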
The current interface is quite basic. It provides access to
- the standard compress/decompress routines from zlib (zlib format,
  with an Adler-32 checksum)
- gunzip for uncompressing a gzip'ed (GNU zip) data vector
- bunzip2 for inflating a bzip2'ed data vector
- tar archives (gzip-compressed, or uncompressed if read into R
  ahead of time)
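For readers who want to see the underlying formats in action, here is a minimal sketch of the same four kinds of operation on in-memory buffers, written with Python's standard `zlib`, `bz2` and `tarfile` modules rather than the Rcompression functions. The helpers `make_tar` and `read_tar` are hypothetical and exist only to build and unpack a test archive without a file.

```python
import bz2
import io
import tarfile
import zlib

original = b"in-memory data " * 100

# zlib format (RFC 1950): a deflate stream wrapped with a small header
# and an Adler-32 checksum of the uncompressed data.
z = zlib.compress(original)
assert zlib.decompress(z) == original

# bzip2 format, also entirely in memory.
b = bz2.compress(original)
assert bz2.decompress(b) == original


def make_tar(members: dict, gz: bool = True) -> bytes:
    """Hypothetical helper: build a tar archive (optionally gzip-compressed) in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz" if gz else "w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar(archive: bytes) -> dict:
    """Hypothetical helper: map member names to contents for an in-memory tar."""
    contents = {}
    # "r:*" lets tarfile detect whether the archive is gzip-compressed or plain.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


archive = make_tar({"a.txt": b"first", "b.txt": b"second"})
print(sorted(read_tar(archive)))  # prints ['a.txt', 'b.txt']
```

The `"r:*"` mode mirrors the idea in the list above that one reader can handle both gzip-compressed and plain tar data once it is in memory.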
At present, one must have the entire data vector in memory before the
call, and the tools operate on it directly. It would be entirely
feasible to generalize this so that the tools ask for more data as the
decompression libraries need it, and to do the same with the output.
In this way, the package could work with R's existing connections
mechanism at the R level. Unfortunately, the C-level connections API is
not public and is not amenable to extensions implemented in R packages,
i.e. outside the R source code.
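The incremental scheme described above is what zlib's streaming interface supports. As an illustration only (Python's `zlib.decompressobj`, not an R connection or the package's C code), this sketch feeds gzip data to the decompressor in small pieces, so the whole compressed vector never has to be present at once. The helpers `gunzip_stream` and `chunked` are hypothetical; `chunked` stands in for a network read.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def gunzip_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical helper: incrementally inflate gzip data that arrives in pieces.

    Only the current chunk and the decompressor's internal state are held,
    and output is yielded as soon as zlib produces it.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = d.decompress(chunk)
        if out:
            yield out
    tail = d.flush()
    if tail:
        yield tail
    if not d.eof:
        raise ValueError("truncated gzip stream")


def chunked(data: bytes, size: int) -> Iterator[bytes]:
    """Simulate a source that hands over `size` bytes at a time."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


payload = gzip.compress(b"0123456789" * 1000)
result = b"".join(gunzip_stream(chunked(payload, 64)))
print(len(result))  # prints 10000
```

A generator is used here because it naturally expresses "ask for more input, emit output when available", which is the pull model the paragraph above would need from R connections.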
Installation
You will need to have libz (a.k.a. zlib) and libbz2
(a.k.a. bzip2) installed.
The configuration script attempts to find these but is currently
not very flexible or aggressive about finding them.
I will add more facilities as people start to use this.
So please send me mail rather than just hacking the code yourself.
(Although sending your changes is even better!)
You can find the libraries at
Both are trivial to install on almost all machines.
Duncan Temple Lang
<duncan@wald.ucdavis.edu>
Last modified: Mon Feb 11 09:48:57 NZDT 2008