An XML package for the S language

Last Release: 3.98-1 (Tue Jun 18 19:04:03 PDT 2013)

Note: Version 2.4-0 introduced a new approach to garbage collecting internal (C-level) nodes and documents returned from, e.g., xmlParse(), getNodeSet(), xpathApply(), and newXMLNode(). It endeavors to avoid freeing a document while an R variable refers to one of its nodes, and to garbage collect a document once all of its nodes are unreferenced. This has been tested and appears to work, but there may be cases that we have not encountered. If you encounter problems, please send me email.
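As a rough illustration of the behavior described in the note, the following sketch keeps a reference to a single node, drops the variable holding the document, forces a garbage collection, and then uses the node again. The inline XML string and the variable names are made up for this example, and it assumes a release at or after 2.4-0.

```r
library(XML)

# Parse a small in-memory document (the content is invented for this sketch).
doc <- xmlParse("<a><b>text</b></a>", asText = TRUE)

# Keep an R reference to one internal node of the document.
node <- getNodeSet(doc, "//b")[[1]]

# Drop the only R variable that refers to the document itself and
# ask R to collect garbage.
rm(doc)
invisible(gc())

# The node should still be usable, since the document is not meant to be
# freed while one of its nodes is referenced from R.
node_text   <- xmlValue(node)
parent_name <- xmlName(xmlParent(node))
```

If the document had been freed when `doc` was removed, the last two calls would be reading released memory, which is the situation the new approach is intended to prevent.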

This package provides facilities for the S language to parse and generate XML documents.

It is an interface to the libxml2 library. It can be combined with the RCurl package to parse documents whose retrieval requires more involved HTTP requests.
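A minimal sketch of typical use: parse XML held in a character string, then pull out values with XPath. The catalog markup is invented for this example. The commented-out line shows one way the text might instead be fetched with RCurl before parsing; the URL there is a placeholder, not a real resource.

```r
library(XML)

# XML content as a string; in practice this could come from a file or URL.
txt <- "<catalog>
  <book id='1'><title>A</title></book>
  <book id='2'><title>B</title></book>
</catalog>"

# With RCurl, the text could be fetched first and then parsed the same way
# (placeholder URL, shown only as an assumption about a typical workflow):
# txt <- RCurl::getURL("http://example.org/catalog.xml", followlocation = TRUE)

# asText = TRUE tells the parser that 'txt' is the content, not a file name.
doc <- xmlParse(txt, asText = TRUE)

# Extract the text of every <title> and the id attribute of every <book>.
titles <- xpathSApply(doc, "//book/title", xmlValue)
ids    <- xpathSApply(doc, "//book", xmlGetAttr, "id")
```

Fetching the text separately keeps the HTTP details (redirects, authentication, cookies) in RCurl and leaves the XML package to deal only with parsing.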

Download

The source for the S package can be downloaded as XML_3.98-1.tar.gz.

There is also a Windows version available from the Omegahat repository. Use

install.packages("XML", repos = "http://www.omegahat.org/R")

Documentation

  • Best practices for using the XML package
    PDF version.
  • A short overview: HTML, PDF
  • A brief introduction to parsing XML in R: HTML, PDF
  • A reasonably detailed overview of the package and what we might use XML for.
  • A manual in and a quick guide to the package (PDF).
  • A short overview of the package.
  • Brief and incomplete Notes on generating XML within S
  • FAQ for the package.
  • Changes to the package (by release).
  • Examples of Reading Generic XML files

  • XML form of plist (property list) files (e.g. property lists on OS X, old iTunes databases)
    keyValueDB.R
    library(XML)
    source(url("http://www.omegahat.org/RSXML/keyValueDB.R"))
    o = readKeyValueDB("http://www.omegahat.org/RSXML/plist.xml")      
    
  • XML "solr" files that are similar to JSON and name-value pairs with nodes of the form
    
          A string
          103
          1000012310303
          true
          2011-02-10T11:29:03Z
    
    solrDocs.R
    library(XML)
    source(url("http://www.omegahat.org/RSXML/solrDocs.R"))
    o = readSolrDoc("http://www.omegahat.org/RSXML/solr.xml")      
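
For readers who want to see the idea without sourcing solrDocs.R, here is a hand-written sketch of converting typed name-value nodes into an R list. It is not the code in solrDocs.R. The element names (str, int, long, bool, date), the name attributes, and the enclosing doc element are assumptions made for illustration; only the five values come from the listing above.

```r
library(XML)

# Hypothetical markup: element and attribute names are assumed for this sketch.
txt <- '<doc>
  <str name="title">A string</str>
  <int name="count">103</int>
  <long name="id">1000012310303</long>
  <bool name="ok">true</bool>
  <date name="when">2011-02-10T11:29:03Z</date>
</doc>'

doc <- xmlParse(txt, asText = TRUE)

# Convert one node to an R value based on its element name.
convert <- function(node) {
  val <- xmlValue(node)
  switch(xmlName(node),
         int  = as.integer(val),
         long = as.numeric(val),   # too large for an R integer
         bool = identical(val, "true"),
         date = as.POSIXct(val, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"),
         val)                      # default: keep the string as-is
}

# Select only the element children, convert each, and label by 'name'.
nodes  <- getNodeSet(doc, "/doc/*")
values <- lapply(nodes, convert)
names(values) <- sapply(nodes, xmlGetAttr, "name")
```

The result is an ordinary named list, so `values$count` or `values[["when"]]` can be used directly in later computations.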
    

  • Duncan Temple Lang <duncan@wald.ucdavis.edu>
    Last modified: Sun Dec 25 09:52:10 PST 2011