Let's think about a reasonably simple example
of a document which describes the ideas of the Central Limit Theorem.
We would have a description of the idea,
perhaps a statement of the theorem in mathematical form,
and then an example in which we sample from a population
and see how the distribution of the mean changes as the sample size
increases.
To make things slightly more interesting for the student, we
might add a section that allows them to specify the
sample size and population distribution
and the statistic of interest, e.g. the median rather than the mean.
We want to engage the reader so we give them these interactive
controls and hope that they will explore different aspects of the
theory.
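The computation behind such an interactive example can be sketched in a few lines of base R. This is only an illustration: the function name simulateStatistic and its arguments are our own inventions, not part of any package discussed here.

```r
# Simulate the sampling distribution of a statistic for one sample size.
# `rpop` is any function that draws n values from the population.
# (simulateStatistic is a hypothetical helper, for illustration only.)
simulateStatistic <- function(n, rpop = rexp, statistic = mean, reps = 1000) {
  replicate(reps, statistic(rpop(n)))
}

set.seed(1)
sampleSizes <- c(2, 10, 50)
sims <- lapply(sampleSizes, simulateStatistic)
names(sims) <- sampleSizes

# The spread of the sample mean shrinks as the sample size grows.
spread <- sapply(sims, sd)
print(round(spread, 3))
```

The three inputs we want the reader to control map directly onto the arguments: for instance, simulateStatistic(25, rpop = rnorm, statistic = median) looks at the median of normal samples of size 25.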
We can create the first part - the non-interactive aspect -
as regular HTML. We can create the plots of the
population density and the sample mean distributions
for different sample sizes as JPEG images
from within R and then have them displayed
in HTML using <img src="filename.jpg">
elements in the HTML document.
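As a rough sketch of this static step, the following base R code writes one plot to a JPEG file and builds the matching HTML element. The file name population.jpg and the use of a temporary directory are assumptions made only for this illustration.

```r
# Write the population density to a JPEG file, if this R build supports JPEG.
plotFile <- file.path(tempdir(), "population.jpg")
if (capabilities("jpeg")) {
  jpeg(plotFile, width = 300, height = 300)
  curve(dnorm(x, 0, 1), -3, 3, col = "red", main = "Population density")
  dev.off()
}

# The HTML that would display the image in the document.
imgTag <- sprintf('<img src="%s">', basename(plotFile))
cat(imgTag, "\n")
```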
The interactive component requires a little more
work but we can think about it in two different
ways.
We want a choice menu for the selection of the statistic, and
similarly for the choice of population distribution from a fixed
set. (We may want to allow a free form R expression, but we'll return
to that later.)
We might have a text field to
specify the sample size, or more interestingly a spin box which would
constrain the content.
And we want to display the density of the population and
also the sampled values as R plots.
We could put these in two plots in the same R graphics device
using
par(mfrow = c(1, 2)), for example.
Alternatively, we could put them in separate,
independent graphics devices.
Whichever approach we take, the device(s) must be inside
the HTML window as part of the display, not in a stand-alone window.
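For the single-device variant, a minimal base R sketch might look like the following. It draws on an off-screen pdf(NULL) device purely so that it runs anywhere; in the setting described here the plots would go to the embedded device instead.

```r
set.seed(2)
n <- 30
sampleMeans <- replicate(500, mean(rnorm(n)))

pdf(NULL)                        # off-screen device, used only for this sketch
oldPar <- par(mfrow = c(1, 2))   # one row, two columns on the same device
curve(dnorm(x, 0, 1), -3, 3, col = "red", main = "Population")
hist(sampleMeans, main = sprintf("Sample means, n = %d", n))
par(oldPar)                      # restore the previous layout
invisible(dev.off())
```

With separate devices we would instead switch between them (for example with dev.set()) before each plotting call.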
If we don't get too ambitious with the controls (for example, by
insisting on a spin box), we can use a simple HTML form to provide the
choice menus via the SELECT and OPTION elements. We can use a TEXTAREA
element to allow the specification of the sample size, and a button
to perform the updates, although we may want the display to be updated
whenever any of the inputs change.
The graphics devices are a little trickier because clearly there is no
HTML tag for an R graphics device. So we have to add our own
mechanism for embedding an R graphics device within an HTML window.
How do we do this? There are two approaches. One is to introduce or
make up our own HTML tag such as <Rdevice> to identify an
embedded R graphics device.
Unfortunately, if we present this document to another
HTML viewer, e.g. a Web browser, it will not understand
this tag and so not process it correctly.
A more general approach is to use the generic <OBJECT> tag
that HTML provides. This says that the element creates an
embedded object within the document and that the details of how to
handle it are left to the application, based on the attributes and
content of the HTML element. We specify the type of embedded object
via the type attribute, which gives the MIME type of the target
application. Of course, we don't have an official MIME type for
an R graphics device, so we make one up and try to ensure that it
doesn't conflict with an actual, official MIME type. So, following
the guidelines for this, we use a MIME type value with a prefix of
"app/x", e.g. "app/x-R-device".
We can specify the dimensions for the device in either approach using
attributes. And we can specify code to create the initial contents or
do any other R calculations within a child of either the Rdevice or
OBJECT tag. So either approach works just fine. We might prefer one
over the other based on whether we think the document will be viewed
only in R (use Rdevice) or more generally in other browsers (use
OBJECT). The OBJECT approach will work for either and so may be
preferable, but it is a marginal decision.
Suppose we want to use the
<Rdevice> approach.
In our document, we might have
<Rdevice name="population" width="300" height="300">
curve(dnorm(x, 0, 1), -3, 3, col = "red")
</Rdevice>
This says to create an R graphics device embedded within
the document at this location
and that it should be 300 x 300 pixels in size.
We also want to be able to refer to it as the R variable
population. And finally,
we want the initial display on the device to be
a normal density curve drawn by the R expression in the body of the tag.
It is the parser that will see this HTML text and we need to help it
to understand what we want to have happen. To this end, we have some
facilities in R that make this relatively easy. But they sit on
lower-level facilities provided by wxWidgets.
Let's look at these lower-level facilities first and then see how
we have made it slightly simpler in R.
When we create a wxHtmlWindow object, we can ask it for
the associated HTML parser that is specific to that window.
The method
wxHtmlWindow_GetParser() does this for us:
html = wxHtmlWindow(parent, wxID_ANY)
parser = html$GetParser()
Now, we want to tell the parser that whenever it sees
the tag <Rdevice>,
it should call a function we give it to create the
embedded object. The function will find all the specifications
from the HTML element and its child nodes and
create the graphics device and insert it into the target
HTML window.
The parser doesn't need to know anything about the tag or
what the function does, but will simply hand control over to our function
and expect things to be done for it. The function should return
TRUE or
FALSE to indicate success or failure.
We need to write the function and then tell the HTML parser to use it
for each <Rdevice> node it sees. Let's assume we have written
the function and called it RdeviceHTMLHandler(). Then,
to connect it to the Rdevice nodes for the parser, we have to create
and register a wxHtmlTagHandler. This is a C++ class that wxWidgets
provides. The idea is that we create an instance of such a class with
the name of the node that it can handle and then register it with the
parser. When the parser encounters a node, it finds the relevant
handler and calls its HandleTag() method. Now, we need it to call our
R function, so our handler needs to be slightly different from a
standard C++ handler. Its HandleTag() method in C++ needs to invoke
our R function. To do this, we have a new C++ class named
RwxHtmlWinTagHandler that inherits from wxHtmlTagHandler and provides a
different implementation of the HandleTag() method. It just calls the
R function that we specified when we created the instance of this
RwxHtmlWinTagHandler class.
So we create an instance of this new type of handler
and then add it to the parser.
We do these two steps with the R code
handler = RwxHtmlWinTagHandler("Rdevice", RdeviceHTMLHandler)
parser$AddTagHandler(handler)
Now, when the parser encounters an <Rdevice> node in the
HTML document, it will call our RdeviceHTMLHandler() function.
So what should this handler function look like? Firstly, it will be
called with three arguments: the handler object itself that we
created, the object representing the HTML tag that we are to process,
and lastly the parser object.
So our function should be defined as
RdeviceHTMLHandler =
function(h, tag, parser)
{
}
We typically don't make much use of the handler.
It is the tag and the parser we work on.
Now, what should this function do?
It should create a new graphics device that is embedded
within the HTML document.
We use the RwxDevice package for this,
so we need to ensure that it is loaded
via a call to library(RwxDevice).
Next, we create the device via a call to RwxCanvas().
That function needs the parent widget for the new device canvas
and the parent should be the HTML window associated with the parser.
We don't have easy access to that, but we can access it via the
parser with the call
parser$GetWindow().
So we can create our new canvas for the device with
canvas = RwxCanvas(parser$GetWindow())
We'll come back to providing information about the size of the canvas.
After we have created the canvas on which R might draw plots, we need
to tell R that it can be used as a regular graphics device.
We call the function
asWxDevice()
to do this,
passing it the newly created canvas object.
And the last step is to put the canvas into the appropriate
place in the HTML document. The canvas has the HTML window
as its parent, but it doesn't know where to locate itself.
That is the job of the layout of the document. So, we call
insertEmbeddedComponent(), giving it the canvas and the parser.
It then arranges to put the widget into the right place.
At this point, we have the basic graphics device.
The code for the handler is
RdeviceHTMLHandler =
function(h, tag, parser)
{
    library(RwxDevice)
    canvas = RwxCanvas(parser$GetWindow())
    asWxDevice(canvas)
    insertEmbeddedComponent(canvas, parser)
    TRUE
}
Note that we return
TRUE to indicate that we successfully
processed the tag. If we wanted, we could instead return the canvas object
and the internal handler code would take care of calling
insertEmbeddedComponent()
(or, more precisely, of doing the equivalent work internally).
The pieces that we have omitted are the width and height attributes
and the name attribute, and we also want to process the R
code within the <Rdevice> node.
Let's start with dimensions.
We need to ask the HTML tag object (tag)
whether it has a width or a height attribute.
The tag object is an instance of the wxHtmlTag class in wxWidgets
and has several methods for accessing its information. See the
wxWidgets documentation for that class.
We can use
tag$HasParam("width")
to see if it has an attribute/parameter named "width".
If this returns
TRUE, we get the value with
tag$GetParam("width")
and then coerce it to an integer.
Since the values are expected to be numbers,
we can also use
getParamNumber(),
providing a default value and a function that coerces the string
value, if it is present, to the target type, here an integer.
getParamNumber(tag, "width", -1, as.integer)
Note that when we use -1 for a size dimension, wxWidgets
will understand that as a default value and determine the
correct value rather than interpreting that value literally
as a dimension!
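To make the logic concrete, here is a guess at what a getParamNumber()-style helper does, written in plain R against a stand-in for the tag object. Both mockTag and paramNumberSketch are hypothetical names invented for this sketch; the actual implementation in RwxWidgets may differ.

```r
# A stand-in for a wxHtmlTag: a list of closures over a named character vector.
# (Hypothetical, for illustration only.)
mockTag <- function(params) {
  list(HasParam = function(name) name %in% names(params),
       GetParam = function(name) params[[name]])
}

# Assumed logic: return the default if the attribute is missing or not a number.
paramNumberSketch <- function(tag, name, default = -1, convert = as.integer) {
  if (!tag$HasParam(name))
    return(default)
  value <- suppressWarnings(convert(tag$GetParam(name)))
  if (is.na(value)) default else value
}

tag <- mockTag(c(width = "300", name = "population"))
# width comes from the tag, height falls back to the default
sz <- c(paramNumberSketch(tag, "width"), paramNumberSketch(tag, "height"))
print(sz)
```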
So we can change our function slightly to use any
width and height attributes as
sz = c(getParamNumber(tag, "width", -1), getParamNumber(tag, "height", -1))
canvas = RwxCanvas(parser$GetWindow(), size = sz)
We can deal with a name attribute also by checking if it
was provided in the HTML tag and if so, accessing its value.
if(tag$HasParam("name"))
    name = tag$GetParam("name")
Now the question is what we do with it. The intent is that we assign
the value of the local
canvas variable to a globally
accessible variable identified by the value of the "name" attribute,
e.g.
population in our example.
We do this in our function as
if(tag$HasParam("name"))
    assign(tag$GetParam("name"), canvas, globalenv())
Precisely where we want the assignment to be done, i.e. in which
environment or symbol table, is something we will talk about much
later. Our code above puts it into the global environment of our work
session. If we have two
HTML windows each of which has a device with the same name,
e.g. displaying the same document, the second assignment will
overwrite the first. So we need
to allow each window to have its own private space for these variables
and arrange for the code to look for them appropriately.
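One possible shape for such a private space is an R environment per window. The sketch below is only an illustration of the idea, with character strings standing in for the canvas objects; newWindowScope is a hypothetical name.

```r
# One private environment per viewer window (a hypothetical design).
newWindowScope <- function() new.env(parent = globalenv())

windowA <- newWindowScope()
windowB <- newWindowScope()

# Stand-ins for the two canvases; real code would store RwxCanvas objects.
assign("population", "canvas in window A", envir = windowA)
assign("population", "canvas in window B", envir = windowB)

# Code from each document is evaluated in its own window's environment,
# so the same name resolves to a different device in each.
fromA <- eval(quote(population), windowA)
fromB <- eval(quote(population), windowB)
```

Because each environment has the global environment as its parent, code evaluated there still sees ordinary R functions and any variables in the work session.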
The last bit of work we have to do is to collect up the text within
the <Rdevice> node and treat it as an R command. The code might
be to produce an R plot, or might do some behind the scenes work such
as registering an event handler on the device, etc. From the point of
view of our tag handler function, we don't care what the code does; we
just want to evaluate it as a regular R command. But it is not being
typed at the R prompt or source()'d in from a file. So we need a new
mechanism.
Firstly, we can get the text in the tag using
txt = tag$GetContent(parser)
This is a convenience function provided by
RwxWidgets
which does several low-level operations.
The result is that, in our example,
txt contains the curve() call from the body of the
<Rdevice> node as a single character string.
We want to evaluate this as if it were typed at
the R prompt.
To do this, we must first parse it to verify that it is
a legal command and to turn it into something R can evaluate.
Then we can evaluate it.
We do this with the code
expr = parse(text = txt)
eval(expr, globalenv())
Note that we have to tell R "where" to evaluate the expression and we
use globalenv() for convenience. This controls which
variables the expression can refer to. For example, if it needed to see
the canvas variable, it would not be able to, as
that is local to the particular call of our handler function.
There are ways to tell eval() to evaluate
the expression where it could see the variables in our function
call, but then we would have to agree with the users about the names
for identifying the different variables. In this case, if the user
wants to refer to the RwxCanvas object, she should use a "name"
attribute and we should assign the object to that name before
evaluating the code in the body of the tag.
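The parse-then-evaluate step can be tried out in plain R without any widgets. The helper below, evalTagCode, is a hypothetical name for this sketch; it takes the environment as an argument so we can see how the choice of environment determines which variables the code can reach.

```r
# Parse a string of R code and evaluate it in a chosen environment.
# (evalTagCode is a hypothetical helper, for illustration only.)
evalTagCode <- function(txt, env = globalenv()) {
  if (!nzchar(trimws(txt)))
    return(invisible(NULL))          # nothing to evaluate
  expr <- tryCatch(parse(text = txt),
                   error = function(e)
                     stop("cannot parse code in tag: ", conditionMessage(e)))
  eval(expr, env)                    # evaluates every expression, returns the last value
}

scope <- new.env()
assign("population", "the canvas", envir = scope)   # assigned before the body is evaluated
total <- evalTagCode("x <- 1:10\nsum(x)", scope)
found <- evalTagCode("population", scope)
```

Parsing first means a syntax error in the document is reported before any of the code runs.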
Let's put all this together.
RdeviceHTMLHandler =
function(h, tag, parser)
{
    library(RwxDevice)

    # Use the width and height attributes if given; -1 means "use the default".
    sz = c(getParamNumber(tag, "width", -1), getParamNumber(tag, "height", -1))
    canvas = RwxCanvas(parser$GetWindow(), size = sz)
    asWxDevice(canvas)
    insertEmbeddedComponent(canvas, parser)

    # Make the canvas available under the name given in the tag.
    if(tag$HasParam("name"))
        assign(tag$GetParam("name"), canvas, globalenv())

    # Evaluate any R code in the body of the tag.
    txt = tag$GetContent(parser)
    if(nchar(txt)) {
        expr = parse(text = txt)
        eval(expr, globalenv())
    }

    TRUE
}
Then, we register this with the parser as
html = wxHtmlWindow(parent, wxID_ANY)
parser = html$GetParser()
handler = RwxHtmlWinTagHandler("Rdevice", RdeviceHTMLHandler)
parser$AddTagHandler(handler)
There is a slightly simplified way to do the last part of this.
The function createHtmlViewer() in RwxWidgets
arranges to create an HTML widget and load a document.
It also takes a list of tag handlers.
We can use this as
createHtmlViewer("myDoc.html", win,
                 tagHandlers = htmlTagHandlers(list(Rdevice = RdeviceHTMLHandler)))
It doesn't save us much effort, but is somewhat convenient.
If we were to be well-behaved HTML citizens
and create proper HTML, we would use the OBJECT
tag to identify our R graphics device.
Our HTML node would look like
<OBJECT type="app/x-R-device" width="300" height="300">
<PARAM name="init" value="curve(dnorm(x, 0, 1), -3, 3, col = 'red')"/>
</OBJECT>
The same information is present but the type of the object is now no
longer in the tag name but in the type attribute. And the
initialization code is explicitly in a child node named PARAM with
name and value attributes. This is all very general and so HTML can
support arbitrary embedded OBJECTs, but it is not necessarily very
convenient.
This generality
means that an HTML parser may have to deal with numerous object
types. So we provide a simple tag handler for
the generic OBJECT tag that finds the value of
the type attribute
and uses it to look up the relevant handler function
for the (OBJECT, type) pairing. This mechanism
is available in our
createHtmlViewer()
function
via the
tagHandlers argument and
the
htmlTagHandlers()
function.
If we have a handler function, say named foo, to handle this
(OBJECT, "app/x-R-device") pairing,
we can register it to be called with
createHtmlViewer("myDoc.html", win,
                 tagHandlers = htmlTagHandlers(objectHandlers = c(defaultObjectHandlers(),
                                                                  "app/x-R-device" = foo)))
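The dispatch on the type attribute can be pictured with a small stand-alone sketch. This is not the code inside htmlTagHandlers(); mockTag and makeObjectDispatcher are hypothetical names, and the tag object is a plain R stand-in.

```r
# Stand-in for a wxHtmlTag (hypothetical, for illustration only).
mockTag <- function(params) {
  list(HasParam = function(name) name %in% names(params),
       GetParam = function(name) params[[name]])
}

# Build a single OBJECT handler that looks up the real handler by MIME type.
makeObjectDispatcher <- function(objectHandlers) {
  function(h, tag, parser) {
    if (!tag$HasParam("type"))
      return(FALSE)
    fun <- objectHandlers[[tag$GetParam("type")]]
    if (is.null(fun))
      return(FALSE)              # no handler registered for this type
    fun(h, tag, parser)
  }
}

foo <- function(h, tag, parser) TRUE   # placeholder for the device handler
dispatch <- makeObjectDispatcher(list("app/x-R-device" = foo))

handled <- dispatch(NULL, mockTag(c(type = "app/x-R-device")), NULL)
ignored <- dispatch(NULL, mockTag(c(type = "video/mpeg")), NULL)
```

Returning FALSE for an unknown type follows the convention described earlier, where the handler's return value signals success or failure.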
Now that we can arrange to be invoked,
how do we actually perform the processing of the node?
Again, we are given the tag and the parser
and we have to create the graphics device, etc.
That code is essentially unchanged.
The only potentially difficult aspect is how we
process the <PARAM> sub-node
so that we can evaluate the initialization code.
When we had this in the <Rdevice>
tag, we specified the format as free-flowing text.
Now we are dealing with a structured HTML node
as the content or inner part of the
tag we are handling.
So we need to deal with it more carefully;
we can't just suck it up as raw text.
There are two ways to go about this. The first approach is to walk
the children (just one in this case) and process the sub-nodes
recursively. We'll assume that the variable
tag is the
top-level <OBJECT> node. We ask it for its children with the
function
wxHtmlTag_GetChildren().
kids = tag$GetChildren(TRUE)
This returns a list with an element for each direct child
under this tag.
Then, we can process each of those.
In our example, we will have
a single node corresponding to
the <PARAM> node.
Then, we can access its attributes using
if(kids[[1]]$GetParam("name") == "init")
    eval(parse(text = kids[[1]]$GetParam("value")))
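With more than one child we would loop rather than index the first element. The following sketch shows that loop against plain R stand-ins for the tag objects; mockTag and paramValues are hypothetical names, and the real wxHtmlTag objects are only assumed to offer the same three methods.

```r
# Stand-in for a wxHtmlTag with child nodes (hypothetical, for illustration only).
mockTag <- function(params, children = list()) {
  list(HasParam = function(name) name %in% names(params),
       GetParam = function(name) params[[name]],
       GetChildren = function(...) children)
}

# Collect the value of every child node whose name attribute matches.
paramValues <- function(tag, name) {
  values <- character()
  for (kid in tag$GetChildren(TRUE)) {
    if (kid$HasParam("name") && kid$GetParam("name") == name)
      values <- c(values, kid$GetParam("value"))
  }
  values
}

object <- mockTag(c(type = "app/x-R-device"),
                  list(mockTag(c(name = "init", value = "1 + 2")),
                       mockTag(c(name = "other", value = "ignored"))))

env <- new.env()
for (code in paramValues(object, "init"))
  result <- eval(parse(text = code), env)
```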
The approach of recursively processing the sub-nodes is perfectly
natural. It is a little cumbersome if we already have general
top-level tag handlers for nodes that might appear as sub-nodes. For
example, suppose we register tag handlers for "button" and within our
Rdevice node we also allow a "button" node. If we are recursively
processing the nodes by hand, then we have to replicate the tag
handler code or arrange to call our original handler's function. A
different approach would be to tell the parser to continue to process
all of the sub-nodes under this node that we are currently handling
and to stop when it has finished just those nodes. The parser would
then do this using the registered handlers and we would get control
back at the end of that sub-parsing step. If we arrange for those
general handlers to store their information somewhere that we can
access, then we have essentially picked up all the information from
the sub-nodes without having to manually navigate the nodes. The way
we cause this sub-parsing to happen is by calling the tag handler's
ParseInner() method on the specified tag.
function(handler, tag, parser)
{
    handler$ParseInner(tag)
}
The code in
htmlFormTagHandlers()
provides an example for using this approach,
in particular the
handlers for the tags
<FORM> and <SELECT>.
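The pattern of letting general handlers record what they see, and then reading it back after the sub-parse, can be mimicked in plain R. Everything in this sketch is a stand-in: mockHandler imitates only the behaviour of ParseInner() described above, the tags are simple lists, and none of the names come from RwxWidgets.

```r
store <- new.env()        # shared place where sub-node handlers leave their results
store$params <- list()

# General handler for PARAM nodes: record the name/value pair and nothing else.
paramHandler <- function(h, tag, parser) {
  store$params[[tag$name]] <- tag$value
  TRUE
}

# Stand-in for the parser machinery: ParseInner() runs the registered
# handler on each child and then returns control to the caller.
mockHandler <- function(handlers) {
  self <- list()
  self$ParseInner <- function(tag) {
    for (kid in tag$children)
      handlers[[kid$tagName]](self, kid, NULL)
    invisible(NULL)
  }
  self
}

# Handler for the enclosing node: sub-parse first, then use what was stored.
objectHandler <- function(h, tag, parser) {
  store$params <- list()              # reset before the sub-parse
  h$ParseInner(tag)
  init <- store$params[["init"]]
  if (!is.null(init))
    store$result <- eval(parse(text = init), globalenv())
  TRUE
}

doc <- list(tagName = "OBJECT",
            children = list(list(tagName = "PARAM", name = "init", value = "2 * 21")))
ok <- objectHandler(mockHandler(list(PARAM = paramHandler)), doc, NULL)
```

The enclosing handler never walks the children itself; it relies on the PARAM handler having filled in the shared store by the time ParseInner() returns.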