% !Rnw root = learnR.Rnw
\input{preamble}
\section{Manipulation of Language Constructs}
Language structures can be manipulated, just like any other object.
Below, we will show how formulae, expressions, and argument lists
for functions, can be pasted together.
\subsection{Manipulation of Formulae}
Formulae are a key idea in R, though their implementation is
incomplete. They are widely available for specifying graphs,
models and tables. Details will be given below.
\subsection*{Model, graphics and table formulae}
We demonstrate the construction of model or graphics formulae from
text strings. The following plots the column \txtt{mpg}, from the data
frame \txtt{mtcars} (\textit{datasets}), against \txtt{disp}:
\begin{Schunk}
\begin{Sinput}
plot(mpg ~ disp, data=mtcars)
\end{Sinput}
\end{Schunk}
The following gives the same result:
\begin{Schunk}
\begin{Sinput}
yvar <- "mpg"
xvar <- "disp"
form <- as.formula(paste(yvar, "~", xvar))
plot(form, data=mtcars)
\end{Sinput}
\end{Schunk}
With this second approach, \txtt{yvar} and \txtt{xvar}
can be arguments to a function, and \txtt{xvar} and
\txtt{yvar} can be any pair of columns. A suitable
function is:
\begin{Schunk}
\begin{Sinput}
plot.mtcars <- function(xvar="disp", yvar="mpg"){
form <- as.formula(paste(yvar, "~", xvar))
plot(form, data=mtcars)
}
\end{Sinput}
\end{Schunk}
\begin{marginfigure}
The data frame \txtt{mtcars} has 11 columns from which
the two axes for a scatterplot might be chosen:
\begin{Schunk}
\begin{Sinput}
names(mtcars)
\end{Sinput}
\begin{Soutput}
[1] "mpg" "cyl"
[3] "disp" "hp"
[5] "drat" "wt"
[7] "qsec" "vs"
[9] "am" "gear"
[11] "carb"
\end{Soutput}
\end{Schunk}
\end{marginfigure}
The following calls the function with \txtt{xvar="hp"} and
\txtt{yvar="mpg"}:
\begin{Schunk}
\begin{Sinput}
plot.mtcars(xvar="hp", yvar="mpg")
\end{Sinput}
\end{Schunk}
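As an alternative to pasting text strings together, the function
\txtt{reformulate()} (\textit{stats}) builds a formula directly from
character vectors. The following is a minimal sketch only; the function
name \txtt{plotCols()} and its \txtt{data} argument are illustrative,
and are not part of the code given above:
\begin{Schunk}
\begin{Sinput}
plotCols <- function(xvar="disp", yvar="mpg", data=mtcars){
  ## reformulate() assembles yvar ~ xvar without paste()
  form <- reformulate(xvar, response=yvar)
  plot(form, data=data)
  invisible(form)    # Return the formula, for checking
}
form <- plotCols(xvar="hp", yvar="mpg")
form
\end{Sinput}
\end{Schunk}
Passing the data frame as an argument means that the same function
can be used with any data frame that has the named columns.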
\subsection{Extraction of names from a formula}
Use the function \txtt{all.vars()} to extract the variable names from
a formula, thus:
\begin{Schunk}
\begin{Sinput}
all.vars(mpg ~ disp)
\end{Sinput}
\begin{Soutput}
[1] "mpg" "disp"
\end{Soutput}
\end{Schunk}
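The function \txtt{all.vars()} returns variable names only, ignoring
any functions that are applied to them. The following sketch, using an
illustrative formula, shows this, and shows also how the two sides of
a formula can be extracted as language objects:
\begin{Schunk}
\begin{Sinput}
form <- log(mpg) ~ disp + hp
all.vars(form)       # Names only: "mpg" "disp" "hp"
form[[2]]            # Left-hand side, as a call: log(mpg)
deparse(form[[3]])   # Right-hand side, as text: "disp + hp"
\end{Sinput}
\end{Schunk}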
As well as using a formula to specify the graph, the following gives
more informative $x$- and $y$-labels:
\begin{fullwidth}
\begin{Schunk}
\begin{Sinput}
plot.mtcars <- function(form = mpg ~ disp){
yvar <- all.vars(form)[1]
xvar <- all.vars(form)[2]
## Include information that allows a meaningful label
mtcars.info <-
c(mpg= "Miles/(US) gallon", cyl= "Number of cylinders",
disp= "Displacement (cu.in.)", hp= "Gross horsepower",
drat= "Rear axle ratio", wt= "Weight (lb/1000)",
qsec= "1/4 mile time", vs= "V/S",
gear= "Number of forward gears",
carb= "Number of carburettors",
am= "Transmission (0 = automatic, 1 = manual)")
xlab <- mtcars.info[xvar]
ylab <- mtcars.info[yvar]
plot(form, xlab=xlab, ylab=ylab, data=mtcars)
}
\end{Sinput}
\end{Schunk}
\end{fullwidth}
\section{Function Arguments and Environments}
\subsection{Extraction of arguments to functions}
A simple use of \txtt{substitute()} is to extract a text string
representation of a function argument:
\begin{Schunk}
\begin{Sinput}
`plot.mtcars` <-
function(x = disp, y = mpg){
xvar <- deparse(substitute(x))
yvar <- deparse(substitute(y))
form <- formula(paste(yvar, "~", xvar))
plot(form, xlab=xvar, ylab=yvar, data=mtcars)
}
\end{Sinput}
\end{Schunk}
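The mechanism can be seen in isolation in the following sketch, where
the function name \txtt{argText()} is hypothetical. The argument is
never evaluated, so that the names used need not refer to objects
that exist:
\begin{Schunk}
\begin{Sinput}
argText <- function(x) deparse(substitute(x))
argText(disp)        # "disp"
argText(log(wt))     # "log(wt)"
\end{Sinput}
\end{Schunk}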
\subsection{Use of a list to pass parameter values}
Use of \txtt{do.call()} allows the parameter list to be set up in
advance of the call. The following shows the use of \txtt{do.call()}
to achieve the same effect as \verb!mean(possum$totlngth)! (the data
frame \txtt{possum} is from the \textit{DAAG} package):
\begin{Schunk}
\begin{Sinput}
do.call("mean", list(x=possum$totlngth))
\end{Sinput}
\end{Schunk}
This makes more sense in a function, thus:
\begin{Schunk}
\begin{Sinput}
`average` <-
  function(x=possum$chest, FUN=mean){
    ## FUN must be supplied as a function name, e.g., mean or median
    fun <- deparse(substitute(FUN))
    do.call(fun, list(x=x))
  }
\end{Sinput}
\end{Schunk}
This allows, e.g., the following:
\begin{Schunk}
\begin{Sinput}
average()
average(FUN=median)
\end{Sinput}
\end{Schunk}
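The benefit of setting up the parameter list in advance is clearest
when there are several arguments, and the same list is used with more
than one function. A minimal sketch, using made-up data values:
\begin{Schunk}
\begin{Sinput}
## The same argument list serves for both calls
args <- list(x=c(2, 4, NA, 9), na.rm=TRUE)
do.call("mean", args)     # As mean(c(2, 4, NA, 9), na.rm=TRUE)
do.call("median", args)   # As median(c(2, 4, NA, 9), na.rm=TRUE)
\end{Sinput}
\end{Schunk}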
Note also \txtt{call()}, which sets up an unevaluated expression.
The expression can be evaluated at some later time, using \txtt{eval()}.
Here is an example:
\begin{Schunk}
\begin{Sinput}
mean.call <- call("mean", x=rnorm(5))
eval(mean.call)
\end{Sinput}
\begin{Soutput}
[1] 0.06572
\end{Soutput}
\begin{Sinput}
eval(mean.call)
\end{Sinput}
\begin{Soutput}
[1] 0.06572
\end{Soutput}
\end{Schunk}
Notice that the argument \txtt{x} was evaluated when \txtt{call()}
was invoked. The result is therefore unchanged upon repeating the
call \txtt{eval(mean.call)}. This can be verified by printing out the
expression:
\begin{fullwidth}
\begin{Schunk}
\begin{Sinput}
mean.call
\end{Sinput}
\begin{Soutput}
mean(x = c(-0.654951646429221, -0.679896362331012, 0.979408998110931,
1.01194857431004, -0.327894174927872))
\end{Soutput}
\end{Schunk}
\end{fullwidth}
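To defer evaluation of the argument as well, the argument can be
wrapped in \txtt{quote()}. The following is a minimal sketch; the name
\txtt{lazy.call} is illustrative:
\begin{Schunk}
\begin{Sinput}
lazy.call <- call("mean", x=quote(rnorm(5)))
lazy.call               # The argument is stored unevaluated
r1 <- eval(lazy.call)
r2 <- eval(lazy.call)
r1 == r2                # A fresh sample is drawn each time
\end{Sinput}
\end{Schunk}
Here the call stores the expression \txtt{rnorm(5)} rather than five
numbers, so that the two evaluations will, in general, differ.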
\subsection{Function environments}
Every call to a function creates a frame that contains the local
variables created in the function. This combines with the environment
in which the function was defined to create a new environment.
The global environment, \txtt{.GlobalEnv}, is the workspace. This
is frame 0. The frame number increases by 1 with each new function
call.\sidenote[][-2cm]{Additionally, frames may be referred to by
name. Use
\begin{list}{}{
\setlength{\itemsep}{1pt}
\setlength{\parsep}{1pt}}
\item[] \txtt{sys.nframe()} to get the number of the
current evaluation frame
\item[] \txtt{sys.frame(sys.nframe())} to identify the frame by
name
\item[] \txtt{sys.parent()} to get the number of the parent frame.
\end{list}
}
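The following sketch reports frame numbers from inside a nested call.
The function names \txtt{inner()} and \txtt{outer()} are hypothetical.
The absolute numbers depend on how the code is run, but the frame
number for \txtt{inner()} is always one more than that of the function
that called it:
\begin{Schunk}
\begin{Sinput}
inner <- function()
  c(frame=sys.nframe(), parent=sys.parent())
outer <- function() inner()
res <- outer()
res
res[["frame"]] - res[["parent"]]   # Always 1
\end{Sinput}
\end{Schunk}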
\begin{marginfigure}
Now change the function name to \margtt{newtest()}:
\begin{Schunk}
\begin{Sinput}
newtest <- test
newtest()
\end{Sinput}
\begin{Soutput}
[1] "newtest"
\end{Soutput}
\end{Schunk}
\end{marginfigure}
Here is code that determines, from within a function,
the function name:
\begin{Schunk}
\begin{Sinput}
test <- function(){
fname <- as(sys.call(sys.parent())[[1]],
"character")
fname
}
test()
\end{Sinput}
\begin{Soutput}
[1] "test"
\end{Soutput}
\end{Schunk}
When a number of graphs are required, all for the one document, a
sequential naming system, e.g., \txtt{fig1()}, \txtt{fig2()}, \ldots,
may be convenient, with matching names \textbf{fig1.pdf},
\textbf{fig2.pdf}, \ldots for the respective graphics files. The
following function \txtt{gf()} generates the file name automatically,
for passing to the graphics device that is opened.
\begin{Schunk}
\begin{Sinput}
gf <-
function(width=2.25, height=2.25, pointsize=8){
## Take the name only, so that arguments in the call are ignored
funtxt <- as.character(sys.call(1)[[1]])
fnam <- paste0(funtxt, ".pdf")
print(paste0("Output is to the file '",
fnam, "'"))
pdf(file=fnam, width=width, height=height,
pointsize=pointsize)
}
\end{Sinput}
\end{Schunk}
Now create a function that calls \txtt{gf()}:
\begin{Schunk}
\begin{Sinput}
fig1 <- function(){
gf() # Call with default parameters
curve(sin, -pi, 2*pi)
dev.off()
}
fig1()
\end{Sinput}
\end{Schunk}
\noindent
Output goes to the file \textbf{fig1.pdf}. For a function
\txtt{fig2()} that calls \txtt{gf()}, output goes to the file
\textbf{fig2.pdf}, and so on.
\subsection*{Scoping of object names}
Local objects are those that are created within the body of the
function. Objects that are not local and not passed as parameters are
first searched for in the environment in which the function was
defined (its enclosing environment), then in the enclosure of that
environment, and so on up to the global environment. If they are not
found there, then they are sought in the search list.
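The following is a minimal sketch of the lookup, with hypothetical
function names. Note that the search proceeds through the environment
in which a function was defined, not through the environment from
which it was called:
\begin{Schunk}
\begin{Sinput}
x <- "global"
showx <- function() x      # x is not local to showx()
caller <- function(){
  x <- "local to caller"   # Not seen by showx()
  showx()
}
caller()                   # "global"
\end{Sinput}
\end{Schunk}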
\section{Creation of R Packages}
\marginnote[10pt]{The RStudio documentation includes a large amount
of information on package preparation, testing, and submission to
CRAN or other repositories. Click on\\
\underline{Help} | \underline{RStudio Docs}\\
\noindent and look under\\
\underline{PACKAGE DEVELOPMENT}.}
Much of the functionality of R, for many important tasks, comes from
the packages that are built on top of base R. Users who make extensive
use of R may soon find a need to document and organize both their
own functions and associated data. Packages are the preferred vehicle
for making functions and/or data available to others, or for use by
posterity.
Organisation of data and functions into a package may have the
following benefits:
\begin{itemize}
\item Where the package relates to a project, it should be straightforward
to return to the project at some later time, and/or to pass the project
across to someone else.
\item Attaching the package gives immediate access to functions, data
and associated documentation.
\item Where a package is submitted to CRAN (Comprehensive R Archive
Network) and used by others, this extends opportunities for testing
and/or getting contributions from other workers. Checks that are
required by CRAN ensure that the package (code and documentation)
meets certain formal standards. CRAN checks include checks for
consistency between code and documentation, e.g., in names of
arguments. Code must conform to CRAN standards.
\end{itemize}
\subsection*{Namespaces}
Packages can have their own namespaces, with private functions and
classes that are not ordinarily visible from the command line, or from
other packages. For example, the function \txtt{intervals.lme()}
that is part of the \textit{nlme} package must be called via the generic
function \txtt{intervals()}.
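The same distinction can be checked using a package that is part of
every R installation. The sketch below uses \txtt{t.test.default()}
from \textit{stats} as the example of a function that is not exported;
the choice of function is for illustration only:
\begin{Schunk}
\begin{Sinput}
## The generic is exported; the method that it calls is not
"t.test" %in% getNamespaceExports("stats")           # TRUE
"t.test.default" %in% getNamespaceExports("stats")   # FALSE
## The private function can still be retrieved for inspection
hidden <- getFromNamespace("t.test.default", ns="stats")
is.function(hidden)
\end{Sinput}
\end{Schunk}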
\section{S4 Classes and Methods}\label{sec:s4}
There are two implementations of classes and methods -- those of
version 3 of the S language (S3), and those of version 4 of the S
language (S4). The \textit{methods} package supplies the
infrastructure for the S4 implementation. This extends the abilities
available under S3, builds in checks that are not available with S3,
and is conducive to good software engineering practice. The
Bioconductor bundle of packages makes extensive use of S4 style
classes and methods. See \txtt{help(Methods)} (note the upper case M)
for a brief overview of S4 classes and methods.
Where available, extractor functions should be used to extract slot
contents. If this is not possible, use the function
\txtt{slotNames()} to obtain the names of the slots, and either the
function \txtt{slot()} or the operator \verb!@! to extract or
replace a slot. For example:
\begin{Schunk}
\begin{Sinput}
library(DAAG)
library(lme4)
\end{Sinput}
\begin{Sinput}
hp.lmList <- lmList(o2 ~ wattsPerKg | id,
data=humanpower1)
slotNames(hp.lmList)
\end{Sinput}
\begin{Soutput}
[1] ".Data" "call" "pool" "groups"
[5] "origOrder"
\end{Soutput}
\end{Schunk}
The following are alternative ways to display the contents of the
\txtt{"call"} slot:
\begin{fullwidth}
\begin{Schunk}
\begin{Sinput}
hp.lmList@call
\end{Sinput}
\begin{Soutput}
lmList(formula = o2 ~ wattsPerKg | id, data = humanpower1)
\end{Soutput}
\begin{Sinput}
slot(hp.lmList, "call")
\end{Sinput}
\begin{Soutput}
lmList(formula = o2 ~ wattsPerKg | id, data = humanpower1)
\end{Soutput}
\end{Schunk}
\end{fullwidth}
Where available, use an extractor function to extract some relevant
part of the output, thus:
\begin{Schunk}
\begin{Sinput}
coef(hp.lmList)
\end{Sinput}
\begin{Soutput}
(Intercept) wattsPerKg
1 -1.155 15.35
2 1.916 13.65
3 -12.008 18.81
4 8.029 11.83
5 11.553 10.36
\end{Soutput}
\end{Schunk}
For moderately simple examples of the definition and use of S4 classes
and methods, see \txtt{help(setClass)} and \txtt{help(setMethod)}.
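The following is a minimal sketch of an S4 class with one generic
function and one method. The class \txtt{Person}, the generic
\txtt{describe()} and the data values are all made up for
illustration:
\begin{Schunk}
\begin{Sinput}
library(methods)
setClass("Person", slots=c(name="character", age="numeric"))
setGeneric("describe",
           function(obj, ...) standardGeneric("describe"))
setMethod("describe", "Person",
          function(obj, ...) paste(obj@name, "is", obj@age))
p <- new("Person", name="Ada", age=36)
describe(p)             # "Ada is 36"
## Slot types are checked when the object is created
bad <- tryCatch(new("Person", name="Ada", age="unknown"),
                error=function(e) "rejected")
bad                     # "rejected"
\end{Sinput}
\end{Schunk}
The final lines illustrate the type of check that S4 builds in: an
attempt to place a character string in a numeric slot is an error.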
How is it possible to identify, for a particular S4 class, the
function that implements a method? To identify the function
in the \txtt{sp} package that implements the \txtt{spplot} method
for \txtt{SpatialGridDataFrame} objects, type:
\begin{Schunk}
\begin{Sinput}
library(sp)
selectMethod("spplot",
signature="SpatialGridDataFrame")
\end{Sinput}
\begin{Soutput}
Method Definition:
function (obj, ...)
spplot.grid(as(obj, "SpatialPixelsDataFrame"), ...)
<bytecode: 0x1166d7098>
<environment: namespace:sp>
Signatures:
obj
target "SpatialGridDataFrame"
defined "SpatialGridDataFrame"
\end{Soutput}
\end{Schunk}
This makes it clear that the \txtt{spplot} method for
\txtt{SpatialGridDataFrame} objects calls the function
\txtt{spplot.grid()}. To display the function \txtt{spplot.grid()},
type:
\begin{Schunk}
\begin{Sinput}
getFromNamespace("spplot.grid", ns="sp")
\end{Sinput}
\end{Schunk}
\noindent
Alternatively, use the less targeted
\txtt{getAnywhere("spplot.grid")}.
Use \txtt{showMethods()} to show all the methods for one or more
classes of object. For example:
\begin{Schunk}
\begin{Sinput}
showMethods(classes='SpatialGridDataFrame')
\end{Sinput}
\end{Schunk}
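The same tools can be tried without installing \textit{sp}. The
following sketch defines a small class, \txtt{Temp}, that is
hypothetical and used only for illustration, gives it a \txtt{show}
method, and then locates that method:
\begin{Schunk}
\begin{Sinput}
library(methods)
setClass("Temp", slots=c(celsius="numeric"))
setMethod("show", "Temp",
          function(object) cat(object@celsius, "degrees C\n"))
existsMethod("show", "Temp")       # Is a method defined directly?
m <- selectMethod("show", "Temp")  # The method that will be used
is(m, "MethodDefinition")
showMethods(classes="Temp")
\end{Sinput}
\end{Schunk}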
\section{Summary}
\begin{itemize}
\item[] Language structures (formulae and expressions) can be manipulated,
just like any other object.
\item[] R uses formulae to specify models, graphs and (\txtt{xtabs()}
only) tables.
\item[] The expression syntax allows the plotting of juxtaposed text
strings, which may include mathematical text.
\item[] All evaluations have an environment that determines what
objects will be visible. This can be especially important for the
writing and testing of functions.
\item[] Packages are the preferred vehicle for making substantial
collections of functions and/or data available to others, or for use
by posterity. They facilitate re-use of code and enforce checks for
common inconsistencies. They make it straightforward to enforce high
standards of documentation.
\item[] Many of R's more recent packages use S4 classes and methods.
Extractor functions are available that will extract the most
commonly required types of information.
\end{itemize}