12lever.tex

% !Rnw root = learnR.Rnw

\input{preamble}


\section{Manipulation of Language Constructs}

Language structures can be manipulated, just like any other object.
Below, we will show how formulae, expressions, and argument lists
for functions, can be pasted together.

\subsection{Manipulation of Formulae}

Formulae are a key idea in R, though their implementation is
incomplete. They are widely available for specifying graphs,
models and tables. Details will be given below.

\subsection*{Model, graphics and table formulae}
We demonstrate the construction of model or graphics formulae from
text strings. The following plots the column \txtt{mpg}, from the data
frame \txtt{mtcars} (\textit{MASS}), against \txtt{disp}:
\begin{Schunk}
\begin{Sinput}
plot(mpg ~ disp, data=mtcars)
\end{Sinput}
\end{Schunk}
The following gives the same result:
\begin{Schunk}
\begin{Sinput}
yvar <- "mpg"
xvar <- "disp"
form <- as.formula(paste(yvar, "~", xvar))
plot(form, data=mtcars)
\end{Sinput}
\end{Schunk}
With this second approach, \txtt{yvar} and \txtt{xvar}
can be arguments to a function, and \txtt{xvar} and
\txtt{yvar} can be any pair of columns.  A suitable
functionis:
\begin{Schunk}
\begin{Sinput}
plot.mtcars <- function(xvar="disp", yvar="mpg"){
    form <- as.formula(paste(yvar, "~", xvar))
    plot(form, data=mtcars)
}
\end{Sinput}
\end{Schunk}

\begin{marginfigure}
The data frame \txtt{mtcars} has 11 columns from which
the two axes for a scatterplot might be chosen:

\begin{Schunk}
\begin{Sinput}
names(mtcars)
\end{Sinput}
\begin{Soutput}
 [1] "mpg"  "cyl" 
 [3] "disp" "hp"  
 [5] "drat" "wt"  
 [7] "qsec" "vs"  
 [9] "am"   "gear"
[11] "carb"
\end{Soutput}
\end{Schunk}

\end{marginfigure}

The following calls the function with \txtt{xvar="hp"} and
\txtt{yvar="mpg"}:
\begin{Schunk}
\begin{Sinput}
plot.mtcars(xvar="hp", yvar="mpg", data=mtcars)
\end{Sinput}
\end{Schunk}

\subsection{Extraction of names from a formula}

Use the function \txtt{all.vars()} to extract the variable names from
a formula, thus:
\begin{Schunk}
\begin{Sinput}
all.vars(mpg ~ disp)
\end{Sinput}
\begin{Soutput}
[1] "mpg"  "disp"
\end{Soutput}
\end{Schunk}

As well as using a formula to specify the graph, the following gives
more informative $x$- and $y$-labels:
\begin{fullwidth}
\begin{Schunk}
\begin{Sinput}
plot.mtcars <- function(form = mpg ~ disp){
   yvar <- all.vars(form)[1]
   xvar <- all.vars(form)[2]
   ## Include information that allows a meaningful label
   mtcars.info <-
     c(mpg= "Miles/(US) gallon",        cyl= "Number of cylinders",
       disp= "Displacement (cu.in.)",   hp= "Gross horsepower",
       drat= "Rear axle ratio",         wt= "Weight (lb/1000)",
       qsec= "1/4 mile time",           vs= "V/S",
       gear= "Number of forward gears",
       carb= "Number of carburettors",
       am= "Transmission (0 = automatic, 1 = manual)")
   xlab <- mtcars.info[xvar]
   ylab <- mtcars.info[yvar]
   plot(form, xlab=xlab, ylab=ylab)
}
\end{Sinput}
\end{Schunk}
\end{fullwidth}

\section{Function Arguments and Environments}

\subsection{Extraction of arguments to functions}

A simple use of \txtt{substitute()} is to extract a text string
representation of a function argument:
\begin{Schunk}
\begin{Sinput}
plot.mtcars` <-
  function(x = disp, y = mpg){
    xvar <- deparse(substitute(x))
    yvar <- deparse(substitute(y))
    form <- formula(paste(yvar, "~", xvar))
    plot(form, xlab=xvar, ylab=yvar, data=mtcars)
  }
\end{Sinput}
\end{Schunk}

\subsection{Use of a list to pass parameter values}
The following are equivalent:


Use of \txtt{do.call()} allows the parameter list to be set up in
advance of the call. The following shows the use of \txtt{do.call()}
to achieve the same effect as \verb!mean(possum$totlngth)!:
\begin{Schunk}
\begin{Sinput}
do.call("mean", list(x=possum$totlngth))
\end{Sinput}
\end{Schunk}
This makes more sense in a function, thus:
\begin{Schunk}
\begin{Sinput}
`average` <-
  function(x=possum$chest, FUN=function(x)mean(x)){
    fun <- deparse(substitute(FUN))
    do.call(fun, list(x=x))
  }
\end{Sinput}
\end{Schunk}

This allows, e.g., the following:
\begin{Schunk}
\begin{Sinput}
average()
average(FUN=median)
\end{Sinput}
\end{Schunk}

Note also \txtt{call()}, which sets up an unevaluated expression.
The expression can be evaluated at some later time, using \txtt{eval()}.
Here is an example:
\begin{Schunk}
\begin{Sinput}
mean.call <- call("mean", x=rnorm(5))
eval(mean.call)
\end{Sinput}
\begin{Soutput}
[1] 0.06572
\end{Soutput}
\begin{Sinput}
eval(mean.call)
\end{Sinput}
\begin{Soutput}
[1] 0.06572
\end{Soutput}
\end{Schunk}
Notice that the argument \txtt{x} was evaluated when \txtt{call()}
was evoked. The result is therefore unchanged upon repeating the
call \txtt{eval(mean.call)}. This can be verified by printing out the
expression:
\begin{fullwidth}

\begin{Schunk}
\begin{Sinput}
mean.call
\end{Sinput}
\begin{Soutput}
mean(x = c(-0.654951646429221, -0.679896362331012, 0.979408998110931, 
1.01194857431004, -0.327894174927872))
\end{Soutput}
\end{Schunk}

\end{fullwidth}

\subsection{Function environments}
Every call to a function creates a frame that contains the local
variables created in the function. This combines with the environment
in which the function was defined to create a new environment.
The global environment, \txtt{.Globalenv}, is the workspace.  This
is frame 0. The frame number increases by 1 with each new function
call.\sidenote[][-2cm]{Additionally, frames may be referred to by
name. Use
\begin{list}{}{
\setlength{\itemsep}{1pt}
\setlength{\parsep}{1pt}}
\item[] \txtt{sys.nframe()} to get the number of the
current evaluation frame

\item[] \txtt{sys.frame(sys.nframe())} to identify the frame by
name

\item[] \txtt{sys.parent()} to get the number of the parent frame.
\end{list}
}

\begin{Schunk}
\begin{Soutput}
[1] "test"
\end{Soutput}
\end{Schunk}

\begin{marginfigure}
Now change the function name to \margtt{newtest()}:
\begin{Schunk}
\begin{Sinput}
newtest <- test
newtest()
\end{Sinput}
\begin{Soutput}
[1] "newtest"
\end{Soutput}
\end{Schunk}
\end{marginfigure}
Here is code that determines, from within a function,
the function name:
\begin{Schunk}
\begin{Sinput}
test <- function(){
 fname <- as(sys.call(sys.parent())[[1]],
             "character")
  fname
}
test()
\end{Sinput}
\begin{Soutput}
[1] "test"
\end{Soutput}
\end{Schunk}

When a number of graphs are required, all for the one dociment, a
sequential naming system, e.g., \txtt{fig1()}, \txtt{fig2()}, \ldots,
may be convenient, with matching names \textbf{fig1.pdf},
\textbf{fig2.pdf}, \ldots for the respective graphics files.  The
following function \txtt{gf()} generates the file name automatically,
for passing to the graphics device that is opened.
\begin{Schunk}
\begin{Sinput}
gf <-
    function(width=2.25, height=2.25, pointsize=8){
        funtxt <- sys.call(1)
        fnam <- paste0(funtxt, ".pdf")
        print(paste0("Output is to the file '",
                     fnam, "'"))
        pdf(file=fnam, width=width, height=height,
            pointsize=pointsize)
    }
\end{Sinput}
\end{Schunk}

Now create a function that calls \txtt{gf()}:
\begin{Schunk}
\begin{Sinput}
fig1 <- function(){
    gf()             # Call with default parameters
    curve(sin, -pi, 2*pi)
    dev.off()
}
fig1()
\end{Sinput}
\end{Schunk}
\noindent
Output goes to the file \textbf{fig1.pdf}.  For a function
\txtt{fig2()} that calls \txtt{gf()}, output goes to the file
\textbf{fig2.pdf}, and so on.

\subsection*{Scoping of object names}
Local objects are those that are created within the body of the
function.  Objects that are not local and not passed as parameters are
first searched for in the frame of the function, then in the parent
frame, and so on. If they are not found in any of the frames, then
they are sought in the search list.

\section{Creation of R Packages}
\marginnote[10pt]{The RStudio documentation includes a large amount
of information on package preparation, testing, and submission to
CRAN or other repositories.  Click on\\
\underline{Help} | \underline{RStudio Docs}\\
\noindent and look under\\
\underline{PACKAGE DEVELOPMENT}.}
Much of the functionality of R, for many important tasks, comes from
the packages that are built on top of base R. Users who make extenive
use of R may soon find a need to document and organize both their
own functions and associated data. Packages are the preferred vehicle
for making functions and/or data available to others, or for use by
posterity.

Organisation of data and functions into a package may have the
following benefits:
\begin{itemize}
\item Where the package relates to a project, it should be straightforward
to return to the project at some later time, and/or to pass the project
across to someone else.
\item Attaching the packages give immediate access to functions, data
  and associated documentation.
\item Where a package is submitted to CRAN (Comprehensive R Archive
  Network) and used by others, this extends opportunities for testing
  and/or getting contributions from other workers. Checks that are
  required by CRAN ensure that the package (code and documentation)
  meets certain formal standards.  CRAN checks include checks for
  consistency between code and documentation, e.g., in names of
  arguments.  Code must conform to CRAN standards.
\end{itemize}

\subsection*{Namespaces}
Packages can have their own namespaces, with private functions and
classes that are not ordinarily visible from the command line, or from
other packages.  For example, the function \txtt{intervals.lme()}
that is part of the \textit{lme} package must be called via the generic
function \txtt{intervals()}.

\section{S4 Classes and Methods}\label{sec:s4}
There are two implementations of classes and methods -- those of
version 3 of the S language (S3), and those of version 4 of the S
language (S4).  The \textit{methods} package supplies the
infrastructure for the S4 implementation. This extends the abilities
available under S3, builds in checks that are not available with S3,
and are is conducive to good software engineering practice.  The
Bioconductor bundle of packages makes extensive use of S4 style
classes and methods. See \txtt{help(Methods)} (note the upper case M)
for a brief overview of S4 classes and methods.

Where available, extractor functions should be used to extract slot
contents. If this is not possible, use the function
\txtt{slotNames()} to obtain the names of the slots, and either the
function \txtt{slot()} or the operator \verb!@! to extract or
replace a slot.  For example:
\begin{Schunk}
\begin{Sinput}
library(DAAG)
library(lme4)
\end{Sinput}
\begin{Sinput}
hp.lmList <- lmList(o2 ~ wattsPerKg | id,
                    data=humanpower1)
slotNames(hp.lmList)
\end{Sinput}
\begin{Soutput}
[1] ".Data"     "call"      "pool"      "groups"   
[5] "origOrder"
\end{Soutput}
\end{Schunk}

The following are alternative ways to display the contents of the
\txtt{"call"} slot:
\begin{fullwidth}
\begin{Schunk}
\begin{Sinput}
hp.lmList@call
\end{Sinput}
\begin{Soutput}
lmList(formula = o2 ~ wattsPerKg | id, data = humanpower1)
\end{Soutput}
\begin{Sinput}
slot(hp.lmList, "call")
\end{Sinput}
\begin{Soutput}
lmList(formula = o2 ~ wattsPerKg | id, data = humanpower1)
\end{Soutput}
\end{Schunk}
\end{fullwidth}

Where available, use an extractor function to extract some relevant
part of the output, thus:
\begin{Schunk}
\begin{Sinput}
coef(hp.lmList)
\end{Sinput}
\begin{Soutput}
  (Intercept) wattsPerKg
1      -1.155      15.35
2       1.916      13.65
3     -12.008      18.81
4       8.029      11.83
5      11.553      10.36
\end{Soutput}
\end{Schunk}

For moderately simple examples of the definition and use of S4 classes
and methods, see \txtt{help(setClass)} and \txtt{help(setMethod)}.

How is it possible to identify, for a particular S4 class, the
function that implements a method.  To identify the function
in the \txtt{sp} package that implements the \txtt{spplot} method
for \txtt{SpatialGridDataFrame} objects, type:
\begin{Schunk}
\begin{Sinput}
library(sp)
selectMethod("spplot",
             signature="SpatialGridDataFrame")
\end{Sinput}
\begin{Soutput}
Method Definition:

function (obj, ...) 
spplot.grid(as(obj, "SpatialPixelsDataFrame"), ...)
<bytecode: 0x1166d7098>
<environment: namespace:sp>

Signatures:
        obj                   
target  "SpatialGridDataFrame"
defined "SpatialGridDataFrame"
\end{Soutput}
\end{Schunk}
This makes it clear that the \txtt{spplot} method for
\txtt{SpatialGridDataFrame} objects calls the function
\txtt{spplot.grid()}. To display the function \txtt{spplot.grid()},
type:
\begin{Schunk}
\begin{Sinput}
getFromNamespace("spplot.grid", ns="sp")
\end{Sinput}
\end{Schunk}
\noindent
Alternatively, use the less targeted
\txtt{getAnywhere("spplot.grid")}.

Use \txtt{showMethods()} to show all the methods for one or more
classes of object.  For example:
\begin{Schunk}
\begin{Sinput}
showMethods(classes='SpatialGridDataFrame')
\end{Sinput}
\end{Schunk}

\section{Summary}
\begin{itemize}
\item[] Language structures (formulae and expressions) can be manipulated,
just like any other object.

\item[] R uses formulae to specify models, graphs and (\txtt{xtabs()}
only) tables.

\item[] The expression syntax allows the plotting of juxtaposed text
strings, which may include mathematical text.

\item[] All evaluations have an environment that determines what
  objects will be visible. This can be especially important for the
writing and testing of functions.

\item[] Packages are the preferred vehicle for making substantial
  collections of functions and/or data available to others, or for use
  by posterity.  They facilitate re-use of code and enforce checks for
  common inconsistencies. They make it straighforward to enforce high
  standards of documentation.

\item[] Many of R's more recent packages use S4 classes and methods.
Extractor functions are available that will extract the most 
commonly required types of information.

\end{itemize}