The Rstem package provides R language
bindings to the word stemming facilities of Martin Porter's Snowball. The package
offers a simple way to compute the stem of each word in a character
vector via the function
wordStem(). Stemming is an
important step in the analysis of text such as documents and email.
The package provides stemming for different languages.
The package also provides a function to query the supported languages.
Note that certain languages require UTF support.
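As a brief illustration of wordStem(), the following R sketch stems a few English words. It assumes that Rstem is already installed and that wordStem() accepts a language argument naming the stemmer to use. The word list is made up for the example.

```r
# Minimal sketch: assumes the Rstem package is installed.
library(Rstem)

# Example input (hypothetical data): a character vector of words.
words <- c("connection", "connected", "connecting", "stemming")

# wordStem() is applied to the whole vector and returns one stem per word.
# The `language` argument is assumed to select the Snowball stemmer.
stems <- wordStem(words, language = "english")

# Show each word next to its stem.
print(data.frame(word = words, stem = stems, stringsAsFactors = FALSE))
```

Because the call is vectorized, a whole tokenized document can be stemmed in one call rather than word by word.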
The package is extensible in two ways.
- The caller can specify C routines to use to create and destroy
the stemming context used by the Snowball code. This can be
used to provide support for other languages,
or to implement different approaches to managing contexts.
- Code for additional languages can be retrieved from the
Snowball Web site and incorporated into this source package.
A script in this package provides the mechanism to perform
the downloads and update the C code to be aware of the new languages.
This is a convenient way to add new languages or simply
update the code for the existing languages.
The package is currently written for R. However, it should be relatively
straightforward to adapt it to work with S-Plus (5.0 or greater).
Duncan Temple Lang
Last modified: Thu Aug 18 17:02:00 PDT 2011