Table of Contents

Using the wxHTML widget
An example
The OBJECT approach
Connecting the pieces

Using the wxHTML widget

One of the important components of the wxWidgets toolkit from our perspective is the wxHtmlWindow widget and its support for displaying HTML in a customizable way. We make use of this in the StatDocs project and for creating dynamic, interactive documents in which the reader can control the calculations and explore "what if" alternatives. The wxHtmlWindow and associated wxWidgets classes are quite flexible. We can specify handlers for the parser to use for existing HTML tags and also for new tags that are not part of the HTML specification, but that we make up and include in our documents. We can process these tags and insert arbitrary content into the document to be displayed in their place. This makes the content dynamic, i.e. we compute it at run-time. This content can even include arbitrary widgets/GUI controls, and it is this that allows the content to be interactive. This is like what we did with SNetscape, where R was embedded inside the Web browser and processed OBJECT tags. Here, however, everything is done entirely within R, which simplifies the installation and works on various platforms.

In this article, we will outline the basic steps and pieces in the RwxWidgets package for working with the wxHtmlWindow and its documents. Some of these facilities come directly from wxWidgets, and others are higher-level or value-added functionality that we have provided within R.

One can create a GUI using an HTML document with tags that will be converted into GUI controls. Such documents are a nice way to mix the controls with text in a form that is reasonably easy to edit. One can change the text and the controls and re-display the document. The alternative is to create a regular GUI and insert the elements into containers and specify how they are to be resized. This is more complex, at least initially, for many people. However, it is more flexible as the resizing of the top-level window does propagate to the GUI's elements. So the two approaches are different and should be used appropriately.

An example

Let's think about a reasonably simple example of a document which describes the ideas of the Central Limit Theorem. We would have a description of the idea, perhaps a statement of the theorem in mathematical form and then an example of where we sample from a population and see how the distribution of the mean changes as the sample size increases. To make things slightly more interesting for the student, we might add a section that allows them to specify the sample size and population distribution and the statistic of interest, e.g. the median rather than the mean. We want to engage the reader so we give them these interactive controls and hope that they will explore different aspects of the theory.

We can create the first part - the non-interactive aspect - as regular HTML. We can create the plots of the population density and the sample mean distributions for different sample sizes as JPEG images from within R and then have them displayed in HTML using <img src="filename.jpg"> elements in the HTML document. The interactive component requires a little more work but we can think about it in two different ways.
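As a minimal sketch of how those static images might be produced, the following base R code simulates the sampling distribution of the mean for several sample sizes and writes one JPEG per sample size. The helper name sampleMeans(), the exponential population, the file names and the image dimensions are our assumptions for illustration; none of them comes from RwxWidgets.

```r
# Simulate the sampling distribution of a statistic (hypothetical helper).
sampleMeans = function(n, reps = 1000, rpop = rexp, stat = mean)
    replicate(reps, stat(rpop(n)))

set.seed(1)
sizes = c(2, 10, 50)
sims = lapply(sizes, sampleMeans)
names(sims) = sizes

# Write one JPEG per sample size, if this build of R supports JPEG output.
files = file.path(tempdir(), sprintf("means-n%d.jpg", sizes))
if (capabilities("jpeg")) {
    for (i in seq_along(sizes)) {
        jpeg(files[i], width = 300, height = 300)
        hist(sims[[i]], main = paste("n =", sizes[i]), xlab = "sample mean")
        dev.off()
    }
}
```

Each resulting file could then be referenced from the HTML with an <img> element, as described above.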

We want a choice menu for the selection of the statistic, and similarly for the choice of population distribution from a fixed set. (We may want to allow a free form R expression, but we'll return to that later.) We might have a text field to specify the sample size, or more interestingly a spin box which would constrain the content. And we want to display the density of the population and also the sampled values as R plots. We could put these in two plots in the same R graphics device using par(mfrow = c(1, 2)), for example. Alternatively, we could put them in separate, independent graphics devices. Whichever approach we take, the device(s) must be inside the HTML window as part of the display and not in a stand-alone window.
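As a small illustration of the single-device layout, the sketch below draws a population density and a histogram of simulated sample means side by side. It uses a null PDF device so that it runs anywhere; in the document the same drawing commands would target the embedded device. The normal population, the sample size and the number of replications are arbitrary choices made for the example.

```r
set.seed(42)
n = 10
means = replicate(500, mean(rnorm(n)))

pdf(NULL)                          # off-screen device for this sketch
old = par(mfrow = c(1, 2))         # one row, two panels
curve(dnorm(x, 0, 1), -3, 3, col = "red", main = "Population density")
hist(means, main = paste("Sample means, n =", n), xlab = "mean")
panels = par("mfrow")              # record the layout that was in effect
par(old)
invisible(dev.off())
```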

If we don't get too ambitious with the controls (e.g. using a spin box), we can use a simple HTML form to provide the choice menus via the SELECT & OPTION elements. And we can use a TEXTAREA element to allow the specification of the sample size. And we have a button to perform the updates, although we may want the display to be updated whenever any of the inputs change.

The graphics devices are a little trickier because clearly there is no HTML tag for an R graphics device. So we have to add our own mechanism for embedding an R graphics device within an HTML window. How do we do this? There are two approaches. One is to introduce or make up our own HTML tag such as <Rdevice> to identify an embedded R graphics device. Unfortunately, if we present this document to another HTML viewer, e.g. a Web browser, it will not understand this tag and so not process it correctly.

A more general approach is to use the generic <OBJECT> tag that HTML provides. This says that the element creates an embedded object within the document and that the details of how to handle it are left to the application, based on the attributes and content of the HTML element. We specify the type of embedded object via the type attribute, which gives the MIME type of the target application. Of course, we don't have an official MIME type for an R graphics device, so we make one up and try to ensure that it doesn't conflict with an actual, official MIME type. So, following the guidelines for this, we use a MIME type value with a prefix of "app/x", e.g. "app/x-R-device".

We can specify the dimensions for the device in either approach using attributes. And we can specify code to create the initial contents or do any other R calculations within a child of either the Rdevice or OBJECT tag. So either approach works just fine. We might prefer one over the other based on whether we think the document will be viewed only in R (use Rdevice) or more generally in other browsers (use OBJECT). The OBJECT approach will work for either and so may be preferable, but it is a marginal decision.

Suppose we want to use the <Rdevice> approach. In our document, we might have



<Rdevice name="population" width="300" height="300">
 curve(dnorm(x, 0, 1), -3, 3, col = "red")
</Rdevice>

This says to create an R graphics device embedded within the document at this location within the document and that it should be 300 x 300 pixels in dimension. We also want to be able to refer to it as the R variable population. And finally, we want the initial display on the device to be a normal density created with the R expression.

It is the parser that will see this HTML text and we need to help it to understand what we want to have happen. To this end, we have some facilities in R that make this relatively easy. But they sit on lower-level facilities provided by wxWidgets. Let's look at these lower-level facilities first and then see how we have made it slightly simpler in R.

When we create a wxHtmlWindow object, we can ask it for the associated HTML parser that is specific to that window. The method wxHtmlWindow_GetParser() does this for us:
  html = wxHtmlWindow(parent, wxID_ANY)
  parser = html$GetParser()

Now, we want to tell the parser that whenever it sees the tag <Rdevice>, it should call a function we give it to create the embedded object. The function will find all the specifications from the HTML element and its child nodes and create the graphics device and insert it into the target HTML window. The parser doesn't need to know anything about the tag or what the function does, but will simply hand control over to our function and expect things to be done for it. The function should return TRUE or FALSE to indicate success or failure.

We need to write the function and then tell the HTML parser to use it for each <Rdevice> node it sees. Let's assume we have written the function and called it RdeviceHTMLHandler(). Then, to connect it to the Rdevice nodes for the parser, we have to create and register a wxHtmlTagHandler. This is a C++ class that wxWidgets provides. The idea is that we create an instance of such a class with the name of the node that it can handle and then register it with the parser. When the parser encounters a node, it finds the relevant handler and calls its HandleTag() method. Now, we need it to call our R function, so our handler needs to be slightly different from a standard C++ handler. Its HandleTag() method in C++ needs to invoke our R function. To do this, we have a new C++ class named RwxHtmlWinTagHandler that inherits from wxHtmlTagHandler and provides a different implementation of the HandleTag() method. It just calls the R function that we specified when we create an instance of this RwxHtmlWinTagHandler class. So we create an instance of this new type of handler and then add it to the parser. We do these two steps with the R code
handler = RwxHtmlWinTagHandler("Rdevice", RdeviceHTMLHandler)
parser$AddTagHandler(handler)

Now, when the parser encounters an <Rdevice> node in the HTML document, it will call our RdeviceHTMLHandler().

So what should this handler function look like? Firstly, it will be called with three arguments: the handler object itself that we created, the object representing the HTML tag that we are to process, and lastly the parser object. So our function should be defined as
RdeviceHTMLHandler =
function(h, tag, parser)
{

}

We typically don't make much use of the handler. It is the tag and the parser we work on.

Now, what should this function do? It should create a new graphics device that is embedded within the HTML document. We use the RwxDevice package for this, so we need to ensure that it is loaded via a call to library(RwxDevice).

Next, we create the device via a call to RwxCanvas(). That function needs the parent widget for the new device canvas and the parent should be the HTML window associated with the parser. We don't have easy access to that, but we can access it via the parser with the call parser$GetWindow(). So we can create our new canvas for the device with
 canvas = RwxCanvas(parser$GetWindow())

We'll come back to providing information about the size of the canvas.

After we have created the canvas on which R might draw plots, we need to tell R that it can be used as a regular graphics device. We call the function asWxDevice() to do this, passing it the newly created canvas object.

And the last step is to put the canvas into the appropriate place in the HTML document. The canvas has the HTML window as its parent, but it doesn't know where to locate itself. That is the job of the layout of the document. So, we call insertEmbeddedComponent(), giving it the canvas and the parser. It then arranges to put the widget into the right place.

At this point, we have the basic graphics device. The code for the handler is
RdeviceHTMLHandler =
function(h, tag, parser)
{
 library(RwxDevice)
 canvas = RwxCanvas(parser$GetWindow())
 asWxDevice(canvas)
 insertEmbeddedComponent(canvas, parser)

 TRUE
}

Note that we return TRUE to indicate that we successfully processed the tag. If we wanted, we could return the canvas object and the internal handler code would take care of calling insertEmbeddedComponent() (or doing it internally, actually).

The only pieces that we have omitted are the width and height attributes and the name; we also want to process the R code within the <Rdevice> node. Let's start with the dimensions.

We need to ask the HTML tag object (tag) whether it has a width or a height attribute. The tag object is an instance of the wxHtmlTag class in wxWidgets and has several methods for accessing its information. See the documentation. We can use tag$HasParam("width") to see if it has an attribute/parameter named "width". If this returns TRUE, we get the value with tag$GetParam("width") and then coerce it to an integer. Since the values are expected to be numbers, we can also use getParamNumber(), providing a default value and a function to coerce the string value, if it is present, to the target type, an integer.
getParamNumber(tag, "width", -1, as.integer)

Note that when we use -1 for a size dimension, wxWidgets will understand that as a default value and determine the correct value rather than interpreting that value literally as a dimension!
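To make the behaviour of such a helper concrete, here is a minimal sketch of how a function like getParamNumber() could be written on top of HasParam() and GetParam(). This is not the RwxWidgets source: the name paramNumber() and the mockTag() stand-in (a plain list of closures that mimics the two wxHtmlTag methods so the code runs without wxWidgets) are assumptions for illustration.

```r
# Hypothetical stand-in for a wxHtmlTag: a list of closures over the attributes.
mockTag = function(...) {
    attrs = list(...)
    list(HasParam = function(name) name %in% names(attrs),
         GetParam = function(name) attrs[[name]])
}

# Sketch of a getParamNumber()-like helper (an assumption, not package code).
paramNumber = function(tag, name, default = -1, convert = as.integer) {
    if (!tag$HasParam(name))
        return(default)
    val = suppressWarnings(convert(tag$GetParam(name)))
    if (is.na(val)) default else val   # fall back if the text is not a number
}

tag = mockTag(width = "300", name = "population")
# "width" is present and converted; "height" is absent, so the default is used.
sz = c(paramNumber(tag, "width"), paramNumber(tag, "height"))
```

Falling back to the default when the attribute cannot be converted is a design choice of this sketch; a stricter version could signal an error instead.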

So we can change our function slightly to use any width and height attributes as
 sz = c(getParamNumber(tag, "width", -1), getParamNumber(tag, "height", -1))
 canvas = RwxCanvas(parser$GetWindow(), size = sz)

We can deal with a name attribute also by checking if it was provided in the HTML tag and if so, accessing its value.
 if(tag$HasParam("name")) 
    name = tag$GetParam("name")

Now the question is what we do with it. The intent is that we assign the value of the local canvas variable to a globally accessible variable identified by the value of the "name" attribute, e.g. population in our example. We do this in our function as
 if(tag$HasParam("name")) 
    assign(tag$GetParam("name"), canvas, globalenv())

Precisely where we want the assignment to be done, i.e. in which environment or symbol table, is something we will talk about much later. Our code above puts it into the global environment of our work session. If we have two HTML windows each of which has a device with the same name, e.g. because they display the same document, we will have problems. So we need to allow each window to have its own private space for these variables and arrange for the code to look for them appropriately.
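One way to avoid such clashes, sketched here under our own assumptions, is to give each window its own environment and assign named objects there rather than in globalenv(). The names newWindowEnv() and registerDevice() are hypothetical, and character strings stand in for the canvas objects so that the example runs in plain R.

```r
# Each HTML window gets a private environment whose parent is the global
# environment, so code evaluated in it can still see ordinary R functions.
newWindowEnv = function() new.env(parent = globalenv())

# Hypothetical helper: store an object under the tag's "name" attribute.
registerDevice = function(name, canvas, env) {
    assign(name, canvas, envir = env)
    invisible(canvas)
}

win1 = newWindowEnv()
win2 = newWindowEnv()
registerDevice("population", "canvas in window 1", win1)
registerDevice("population", "canvas in window 2", win2)

# The same name resolves to a different object in each window.
first  = get("population", envir = win1)
second = get("population", envir = win2)
```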

The last bit of work we have to do is to collect up the text within the <Rdevice> node and treat it as an R command. The code might be to produce an R plot, or might do some behind the scenes work such as registering an event handler on the device, etc. From the point of view of our tag handler function, we don't care what the code does; we just want to evaluate it as a regular R command. But it is not being typed at the R prompt or source()'d in from a file. So we need a new mechanism.

Firstly, we can get the text in the tag using
 txt = tag$GetContent(parser)

This is a convenience function provided by RwxWidgets which does several low-level operations. The result is that, in our example, txt contains a single string holding the curve() call from the body of the <Rdevice> node.

We want to evaluate this as if it were typed at the R prompt. To do this, we must first parse it to verify that it is a legal command and to turn it into something R can evaluate. Then we can evaluate it. We do this with the code
 
  expr = parse(text = txt)
  eval(expr, globalenv())

Note that we have to tell R "where" to evaluate the expression and we use globalenv() for convenience. This controls to what variables this expression can refer. For example, if it needed to see the canvas variable, it would not be able to see it as that is local to the particular call of our handler function. But there are ways to tell eval() where it should evaluate the expression so that it could see the variables in our function call. But then we would have to agree with the users about the names for identifying the different variables. In this case, if the user wants to refer to the RwxCanvas object, she should use a "name" attribute and we should assign the object to that name before evaluating the code in the body of the tag.
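The parse-then-evaluate step, and the effect of the environment argument, can be illustrated in plain R. In this sketch the helper evalTagCode() and the use of tryCatch() to report malformed code are our additions, and a simple number stands in for the canvas.

```r
# Hypothetical helper: parse text taken from a tag and evaluate it in `env`.
# Returns FALSE if the text is not syntactically valid R or fails to run.
evalTagCode = function(txt, env = globalenv()) {
    tryCatch({
        expr = parse(text = txt)   # may contain several expressions
        for (e in expr)
            eval(e, env)
        TRUE
    }, error = function(err) {
        warning("could not evaluate tag code: ", conditionMessage(err))
        FALSE
    })
}

env = new.env()
assign("population", 42, envir = env)   # stands in for the named canvas

ok  = evalTagCode("y = population + 1", env)         # sees 'population' in env
bad = suppressWarnings(evalTagCode("curve(", env))   # incomplete expression
```

Returning FALSE on failure matches the convention that a tag handler reports success or failure with a logical value.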

Let's put all this together.
RdeviceHTMLHandler =
function(h, tag, parser)
{
 library(RwxDevice)

 sz = c(getParamNumber(tag, "width", -1), getParamNumber(tag, "height", -1))
 canvas = RwxCanvas(parser$GetWindow(), size = sz)

 asWxDevice(canvas)
 insertEmbeddedComponent(canvas, parser)

 if(tag$HasParam("name")) 
    assign(tag$GetParam("name"), canvas, globalenv())

 txt = tag$GetContent(parser)
 if(nchar(txt)) {
    expr = parse(text = txt)
    eval(expr, globalenv())
 }

 TRUE
}

Then, we register this with the parser as
  html = wxHtmlWindow(parent, wxID_ANY)
  parser = html$GetParser()

  handler = RwxHtmlWinTagHandler("Rdevice", RdeviceHTMLHandler)
  parser$AddTagHandler(handler)

There is a slightly simplified way to do the last part of this. The function createHtmlViewer() in RwxWidgets arranges to create an HTML widget and load a document. It also takes a list of tag handlers. We can use this as
createHtmlViewer("myDoc.html", win, 
                  tagHandlers = htmlTagHandlers(list(Rdevice = RdeviceHTMLHandler)))

It doesn't save us much effort, but is somewhat convenient.

The OBJECT approach

If we were to be well-behaved HTML citizens and create proper HTML, we would use the OBJECT tag to identify our R graphics device. Our HTML node would look like



<OBJECT type="app/x-R-device" width="300" height="300">
 <PARAM name="init" value="curve(dnorm(x, 0, 1), -3, 3, col = 'red')"/>
</OBJECT>

The same information is present but the type of the object is now no longer in the tag name but in the type attribute. And the initialization code is explicitly in a child node named PARAM with name and value attributes. This is all very general and so HTML can support arbitrary embedded OBJECTs, but it is not necessarily very convenient.

This generality means that an HTML parser may have to deal with numerous object types. So we can provide a simple tag parser for the generic OBJECT tag, and then find the value of the type attribute. Then we can use that to find the relevant tag handler function for the tag (OBJECT, type) pairing. We provide that in our createHtmlViewer() function via the tagHandlers argument and the htmlTagHandlers() function. If we have a handler function, say named foo, to handle this (OBJECT, "app/x-R-device") pairing, we can register it to be called with
createHtmlViewer("myDoc.html", win, 
                 tagHandlers = htmlTagHandlers(objectHandlers = c(defaultObjectHandlers(), 
                                                                  "app/x-R-device" = foo)))
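The lookup by MIME type described above might be sketched as follows. This is not the implementation inside htmlTagHandlers(): the function objectDispatcher(), the mockTag() stand-in for a wxHtmlTag and the toy handler are all assumptions, written so that the dispatch logic can be run without wxWidgets.

```r
# Hypothetical stand-in for a wxHtmlTag (a list of closures over attributes).
mockTag = function(...) {
    attrs = list(...)
    list(HasParam = function(name) name %in% names(attrs),
         GetParam = function(name) attrs[[name]])
}

# Build one handler for OBJECT that forwards to a handler chosen by 'type'.
objectDispatcher = function(handlers) {
    function(h, tag, parser) {
        if (!tag$HasParam("type"))
            return(FALSE)
        f = handlers[[tag$GetParam("type")]]
        if (is.null(f))
            return(FALSE)          # no handler registered for this type
        f(h, tag, parser)
    }
}

seen = character()
foo = function(h, tag, parser) {   # toy handler that records the call
    seen <<- c(seen, tag$GetParam("type"))
    TRUE
}

objectHandler = objectDispatcher(list("app/x-R-device" = foo))
known   = objectHandler(NULL, mockTag(type = "app/x-R-device"), NULL)
unknown = objectHandler(NULL, mockTag(type = "image/png"), NULL)
```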

Now that we can arrange to be invoked, how do we actually perform the processing of the node? Again, we are given the tag and the parser and we have to create the graphics device, etc. That code is essentially unchanged. The only potentially difficult aspect is how we process the <PARAM> sub-node so that we can evaluate the initialization code. When we had this in the <Rdevice> tag, we specified the format as free-flowing text. Now we are dealing with a structured HTML node as the content or inner part of the tag we are handling. So we need to deal with it more carefully; we can't just suck it up as raw text.

There are two ways to go about this. The first approach is to walk the children (just one in this case) and process the sub-nodes recursively. We'll assume that the variable tag is the top-level <OBJECT> node. We ask it for its children with the function wxHtmlTag_GetChildren().
 kids = tag$GetChildren(TRUE)

This returns a list with an element for each direct child under this tag. Then, we can process each of those. In our example, we will have a single node corresponding to the <PARAM> node. Then, we can access its attributes using
 if(kids[[1]]$GetParam("name") == "init")
     eval(parse(text = kids[[1]]$GetParam("value")))
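When an OBJECT has several PARAM children, the same idea extends to a loop over kids. The sketch below collects the name/value pairs into a named list and then evaluates the initialization code if it is present. Here mockTag() is a hypothetical stand-in for the tag objects returned by GetChildren(), and collectParams() is our own illustrative name.

```r
# Hypothetical stand-in for a wxHtmlTag (a list of closures over attributes).
mockTag = function(...) {
    attrs = list(...)
    list(HasParam = function(name) name %in% names(attrs),
         GetParam = function(name) attrs[[name]])
}

# Gather the name/value attributes of the PARAM children into a named list.
collectParams = function(kids) {
    ok = vapply(kids, function(k) k$HasParam("name") && k$HasParam("value"),
                logical(1))
    kids = kids[ok]                # skip children lacking either attribute
    vals = lapply(kids, function(k) k$GetParam("value"))
    names(vals) = vapply(kids, function(k) k$GetParam("name"), character(1))
    vals
}

kids = list(mockTag(name = "init", value = "x = 1 + 2"),
            mockTag(name = "title", value = "Population"))
params = collectParams(kids)

env = new.env()
if (!is.null(params[["init"]]))
    eval(parse(text = params[["init"]]), env)
```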

The approach of recursively processing the sub-nodes is perfectly natural. It is a little cumbersome if we already have general top-level tag handlers for nodes that might appear as sub-nodes. For example, suppose we register tag handlers for "button" and within our Rdevice node we also allow a "button" node. If we are recursively processing the nodes by hand, then we have to replicate the tag handler code or arrange to call our original handler's function. A different approach would be to tell the parser to continue to process all of the sub-nodes under this node that we are currently handling and to stop when it has finished just those nodes. The parser would then do this using the registered handlers and we would get control back at the end of that sub-parsing step. If we arrange for those general handlers to store their information somewhere that we can access, then we have essentially picked up all the information from the sub-nodes without having to manually navigate the nodes. The way we cause this sub-parsing to happen is by calling the tag handler's ParseInner() method on the specified tag.
function(handler, tag, parser)
{
   handler$ParseInner(tag)
}

The code in htmlFormTagHandlers() provides an example for using this approach, in particular the handlers for the tags <FORM> and <SELECT>.

Connecting the pieces

So now we have seen how to add new tag handlers for the HTML parser, and we have a way to create the R graphics device using <Rdevice> or <OBJECT> nodes in the HTML. Our example was to provide two different choice menus, one for the choice of distribution and one for the statistic of interest, and a text control for specifying the sample size. We can do this entirely with form elements since the controls are quite simple and available in the HTML form specification. The HTML is a form containing these controls and a "Simulate" button.

When the user clicks on the "Simulate" button, the form is "submitted" with the relevant values. If there is an R expression provided as an attribute for the <form> node, then that expression is evaluated with three variables made available to it: form, params and values. The first of these gives the object representing the form and its components, default values, etc. The second is a character vector and may not be set at this point. The last gives us a list of the name-value pairs for the selected settings. So, we can add an "onsubmit" attribute which contains the R code to do the simulation and update the sample histogram.
 n = as.integer(values[["sampleSize"]])
 f = populations[[ values[["distribution"]] ]]
 stat = statistics[[ values[["statistic"]] ]]
 x = replicate(1000, stat(f(n)))
 sampleDist$hist(x)
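For this code to run, the lists populations and statistics must exist in the environment where the expression is evaluated. The following self-contained sketch shows one way they could be defined. The particular entries, the hand-built values list, the wrapper function simulate() and the use of an ordinary hist() call on a null PDF device in place of sampleDist$hist() are all assumptions for illustration.

```r
# Assumed lookup tables mapping menu labels to R functions.
populations = list(Normal = rnorm, Exponential = rexp, Uniform = runif)
statistics = list(mean = mean, median = median)

# Stand-in for the name-value pairs the form would supply on submission.
values = list(sampleSize = "25", distribution = "Exponential",
              statistic = "median")

# Hypothetical wrapper around the onsubmit code shown above.
simulate = function(values, reps = 1000) {
    n = as.integer(values[["sampleSize"]])
    f = populations[[ values[["distribution"]] ]]
    stat = statistics[[ values[["statistic"]] ]]
    replicate(reps, stat(f(n)))
}

set.seed(123)
x = simulate(values)

pdf(NULL)          # draw without opening a window
hist(x, main = "Sampling distribution of the median")
invisible(dev.off())
```

Keeping the form values as strings and converting them inside the code mirrors the as.integer() call in the original expression, since form fields arrive as text.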