---
title: Environments
layout: default
---
# Environments
<!-- Comment from reviewer: There's lots of signposting but I cannot say that I sense an underlying logical progression. Maybe there isn't one for this material. But I think it would benefit from an afternoon of contemplation - - can we interest the reader in the details of introducing capabilities of basic environment construction for a) very basic workspace management at the heart of interactive R, b) for the sake of reference semantics for large data objects (this was done in Bioconductor exprSet long
ago) and then c) towards more complicated things like active bindings, with various other things along the way perhaps. -->
## Introduction
Understanding environments is important to understanding scoping. This chapter will teach you:
* what environments are and how to inspect and manipulate them
* the four types of environments associated with a function
* how to work in an environment outside of a function with `local()`
* the four ways of binding names to values in an environment
While the [functions](#functions) chapter focuses on the essence of how scoping works, this chapter focuses on the details and shows you how you can implement the behaviour yourself. It also introduces ideas that will be useful for [metaprogramming](#metaprogramming).
This chapter uses many functions found in the `pryr` package to pry open R and look inside at the messy details. You can install `pryr` by running `devtools::install_github("hadley/pryr")`.
## Environment basics
### What is an environment?
The job of an environment is to associate, or __bind__, a set of names to a set of values. Environments are the data structures that power scoping. An __environment__ is very similar to a list, with three important exceptions:
* Environments have reference semantics, so R's usual copy-on-modify rules do not apply. Whenever you modify an environment, you modify every copy.
In the following code chunk, we create a new environment, create a "copy" and then modify the original environment. Notice that the copy also changes. If `e` were a list (or any other R data structure), `f` would be an independent copy and the change to `e` would not show up in `f`. Here `e` and `f` are identical: both names refer to the same environment.
```{r}
e <- new.env()
f <- e
e$a <- 10
f$a
```
* Environments have parents. If an object is not found in an environment, then R will look at its parent (and so on). There is only one exception: the __empty__ environment does not have a parent.
We use the metaphor of a family to refer to other environments. So the grandparent of an environment is the parent's parent, and ancestors include all parent environments all the way up to the empty environment. It's rare to talk about the children of an environment because there are no back links: given an environment we have no way to find its children.
* Every object in an environment must have a name, and those names must be unique.
Technically, an environment is made up of a __frame__, a collection of named objects (like a list), and a reference to a parent environment.
As well as powering scoping, environments can also be useful data structures because they have reference semantics and can work like a hash table. Unlike almost every other type of object in R, modification of environments takes place without a copy. This is not something that you should do without forethought since it violates users' expectations about how R code works. However, it can sometimes be critical for high-performance code. That said, since the addition of [reference classes](#rc), you're generally better off using reference classes instead of raw environments. Environments can be used to simulate the hashmaps common in other languages because name lookup is implemented with a hash table internally, and hence is O(1). See the CRAN package `hash` for more development of this idea.
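As a minimal sketch of the hashmap idea, the following tallies strings by using an environment as a mutable lookup table. The helper name `count_words()` and the variable names are assumptions made for this illustration; `new.env()` and `[[` are described in the next section.

```{r}
# Hypothetical helper: tally strings using an environment as a hash map.
count_words <- function(words) {
  counts <- new.env(parent = emptyenv())
  for (w in words) {
    # `[[` looks only inside `counts` and returns NULL for a missing name
    current <- counts[[w]]
    counts[[w]] <- if (is.null(current)) 1 else current + 1
  }
  counts
}

word_counts <- count_words(c("a", "b", "a", "c", "a"))
word_counts[["a"]]
sort(ls(word_counts))
```

Because the environment is modified in place, the loop never copies the table, however many keys it holds.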
### Manipulating and inspecting environments
You can create environments with `new.env()`, see their contents with `ls()`, and inspect their parent with `parent.env()`.
```{r, eval = FALSE}
e <- new.env()
# the default parent provided by new.env() is the environment from which it is called
parent.env(e)
#> <environment: R_GlobalEnv>
identical(e, globalenv())
#> [1] FALSE
ls(e)
#> character(0)
```
You can modify environments in the same way you modify lists:
```{r}
ls(e)
e$a <- 1
ls(e)
e$a
```
By default `ls` only shows names that don't begin with `.`. Use `all.names = TRUE` (or `all` for short) to show all bindings in an environment:
```{r}
e$.a <- 2
ls(e)
ls(e, all = TRUE)
```
Another useful technique to view an environment is to coerce it to a list:
```{r}
as.list(e)
str(as.list(e))
str(as.list(e, all.names = TRUE))
```
You can extract elements of an environment using `$`, `[[`, or `get()`. While `$` and `[[` will only look within an environment, `get()`, using the regular scoping rules, will also look in the parent if needed. `$` and `[[` will return `NULL` if the name is not found, while `get()` throws an error.
```{r}
e$b <- 2
e$b
e[["b"]]
get("b", e)
```
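The difference between the accessors shows up when a name lives only in a parent environment. A small sketch (the names `outer_env`, `inner_env` and `shared` are made up for illustration):

```{r}
outer_env <- new.env(parent = emptyenv())
outer_env$shared <- "from parent"
inner_env <- new.env(parent = outer_env)

inner_env$shared                  # `$` does not consult the parent
inner_env[["shared"]]             # neither does `[[`
get("shared", envir = inner_env)  # get() walks up to outer_env

# A name that exists nowhere: `$` gives NULL, get() signals an error,
# which we capture here so the chunk keeps running
inner_env$missing_name
tryCatch(
  get("missing_name", envir = inner_env),
  error = function(e) conditionMessage(e)
)
```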
Deleting objects from environments works a little differently from lists. With a list you can remove an entry by setting it to `NULL`. That doesn't work in environments. Instead you need to use `rm()`.
```{r}
e <- new.env()
e$a <- 1
e$a <- NULL
ls(e)
rm("a", envir = e)
ls(e)
```
Generally, when you create your own environment, you want to manually set the parent environment to the empty environment. This ensures you don't accidentally inherit objects from somewhere else:
```{r, error = TRUE}
x <- 1
e1 <- new.env()
get("x", e1)
e2 <- new.env(parent = emptyenv())
get("x", e2)
```
You can determine if a binding exists in an environment with the `exists()` function. Like `get()`, the default is to follow regular scoping rules and look in parent environments. If you don't want this behavior, use `inherits = FALSE`:
```{r}
exists("x", e1)
exists("x", e1, inherits = FALSE)
```
### Special environments
There are a few special environments that you can access directly:
* `globalenv()`: the user's workspace
* `baseenv()`: the environment of the base package
* `emptyenv()`: the ultimate ancestor of all environments, the only environment without a parent.
The most common environment is the global environment (`globalenv()`). It corresponds to the top-level workspace. The parent of the global environment is the package you attached most recently (the order of the remaining ancestors depends on the order in which packages were attached). The penultimate parent will be the base environment, which is the environment of "base R" functionality. Its parent is the empty environment.
`search()` lists all environments between and including the global and base environments. This is called the search path because any object in these environments can be found from the top-level interactive workspace. It contains an environment for each attached package and for each object (environment, list or Rdata file) that you've `attach()`ed. It also contains a special environment called `Autoloads` which is used to save memory by only loading package objects (like big datasets) when needed. You can access any environment on the search path using `as.environment()`.
```{r}
search()
as.environment("package:stats")
```
### Where
We can apply our new knowledge of environments to create a helpful function called `where` that tells us the environment where a variable is located:
```{r}
library(pryr)
where("where")
where("mean")
where("t.test")
x <- 5
where("x")
```
`where()` obeys the regular rules of variable scoping, but instead of returning the value associated with a name, it returns the environment in which it was defined.
The definition of `where()` is fairly straightforward. It has two arguments: the name to look for (as a string), and the environment in which to start the search. (We'll learn later why `parent.frame()` is a good default.)
```{r}
where
```
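If pryr is not installed, the chunk above cannot print anything, so here is a sketch of roughly what such a function looks like. It is written for this chapter under the hypothetical name `where_sketch()`; the actual pryr source may differ, for example in how it validates its inputs.

```{r}
where_sketch <- function(name, env = parent.frame()) {
  stopifnot(is.character(name), length(name) == 1)
  if (identical(env, emptyenv())) {
    # base case: we ran out of ancestors
    stop("Can't find ", name, call. = FALSE)
  }
  if (exists(name, envir = env, inherits = FALSE)) {
    # success: the binding lives in this environment
    env
  } else {
    # recursive case: try the parent
    where_sketch(name, parent.env(env))
  }
}

where_sketch("sd")
```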
It's natural to work with environments recursively, so we'll see this style of function structure frequently. There are three main components:
* the base case (what happens when we've recursed up to the empty environment)
* a Boolean that determines if we've found what we wanted
* the recursive statement that re-calls the function using the parent of the current environment.
If we remove all the details of where, and just keep the structure, we get a function that looks like this:
```{r}
f <- function(..., env = parent.frame()) {
  if (identical(env, emptyenv())) {
    # base case
  }
  if (success) {
    # return value
  } else {
    # inspect parent
    f(..., env = parent.env(env))
  }
}
```
Note that to check if the environment is the same as the empty environment, we need to use `identical()`. Unlike the element-wise `==`, this performs a whole object comparison.
It is also possible to write this function with a loop instead of with recursion. This might run slightly faster (because we eliminate some function calls), but I find it harder to understand what's going on. I include it because you might find it easier to see what's happening if you're less familiar with recursive functions.
```{r}
is.emptyenv <- function(x) identical(x, emptyenv())
f2 <- function(..., env = parent.frame()) {
  while (!is.emptyenv(env)) {
    if (success) {
      # return value
      return()
    }
    # inspect parent
    env <- parent.env(env)
  }
  # base case
}
```
### Exercises
* Using `parent.env()` and a loop (or a recursive function), verify that the ancestors of `globalenv()` include `baseenv()` and `emptyenv()`. Use the same basic idea to implement your own version of `search()`.
* Write your own version of `get()` using a function written in the style of `where()`.
* Write a function called `fget()` that finds only function objects. It should have two arguments, `name` and `env`, and should obey the regular scoping rules for functions: if there's an object with a matching name that's not a function, look in the parent. (This function should be an equivalent of `match.fun()` extended to take a second argument). For an added challenge, also add an `inherits` argument which controls whether the function recurses up to the parents or only looks in one environment.
* Write your own version of `exists(inherits = FALSE)` (Hint: use `ls()`). Write a recursive version that behaves like `inherits = TRUE`.
## Function environments
Most of the time, you do not create environments directly. They are created as a consequence of working with functions. This section discusses the four types of environment associated with a function.
There are multiple environments associated with each function, and it's easy to get them confused.
* the environment where the function is created
* the environment where the function resides
* the environment created when a function is run
* the environment where a function is called
The following sections will explain why each of these environments is important, how to access them, and how you might use them.
### The environment where the function is created
When a function is created, it gains a reference to the environment where it was made. This is the parent, or enclosing, environment of the function used by lexical scoping. You can access this environment with the `environment()` function:
```{r, eval = FALSE}
y <- 1
f <- function(x) x + y
environment(f)
#> <environment: R_GlobalEnv>
environment(plot)
#> <environment: namespace:graphics>
environment(t.test)
#> <environment: namespace:stats>
```
To make an equivalent function that is safer (it throws an error if the input isn't a function), more consistent (it can take a function name as an argument not just a function), and more informative (it has a better name), we'll create `funenv()`:
```{r}
funenv <- function(f) {
  f <- match.fun(f)
  environment(f)
}
funenv("plot")
funenv("t.test")
```
Unsurprisingly, the enclosing environment is particularly important for closures:
```{r, eval = FALSE}
plus <- function(x) {
  function(y) x + y
}
plus_one <- plus(1)
plus_one(10)
#> [1] 11
plus_two <- plus(2)
plus_two(10)
#> [1] 12
environment(plus_one)
#> <environment: 0x106f1e788>
parent.env(environment(plus_one))
#> <environment: R_GlobalEnv>
environment(plus_two)
#> <environment: 0x106e39c98>
parent.env(environment(plus_two))
#> <environment: R_GlobalEnv>
environment(plus)
#> <environment: R_GlobalEnv>
str(as.list(environment(plus_one)))
#> List of 1
#> $ x: num 1
str(as.list(environment(plus_two)))
#> List of 1
#> $ x: num 2
```
It's also possible to modify the environment of a function, using the assignment form of `environment()`. This is rarely useful, but we can use it to illustrate how fundamental scoping is to R. One complaint that people sometimes make about R is that the function `f` defined above really should generate an error because there is no variable `y` defined inside of `f`. We could try to fix that by manually modifying the environment of `f` so it can't find `y` in the global environment:
```{r, error = TRUE}
f <- function(x) x + y
environment(f) <- emptyenv()
f(1)
```
But when we run it, we don't get the error we expect. Because R uses its scoping rules consistently for everything (including looking up functions), we get an error that `f` can't find the `+` function. (See the discussion in [scoping](#dynamic-lookup) for alternatives that actually work.)
### The environment where the function resides
The environment of a function and the environment where it resides might be different. In the example above, we changed the environment of `f` to be the `emptyenv()` but it still resides in the `globalenv()`:
```{r, eval = FALSE}
f <- function(x) x + y
funenv("f")
#> <environment: R_GlobalEnv>
where("f")
#> <environment: R_GlobalEnv>
environment(f) <- emptyenv()
funenv("f")
#> <environment: R_EmptyEnv>
where("f")
#> <environment: R_GlobalEnv>
```
The environment where the function lives determines how we find the function. The environment of the function determines how we find values inside the function. This important distinction is what enables package [namespaces](#namespaces) to work.
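A minimal sketch of the distinction, using two made-up environments: `shelf` is where the function resides, and `lookup` is its enclosing environment. All names here are assumptions for illustration only.

```{r}
shelf <- new.env()
lookup <- new.env()
lookup$greeting <- "hello from lookup"

say <- function() greeting
environment(say) <- lookup   # where `say` finds `greeting`
shelf$say <- say             # where we find `say`
rm(say)

shelf$say()
identical(environment(shelf$say), lookup)
```

The function is reached through `shelf`, but the value of `greeting` comes from `lookup`, which `shelf` knows nothing about.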
For example, take `t.test()`:
```{r}
funenv("t.test")
where("t.test")
```
We find `t.test()` in the `package:stats` environment, but its parent (where it looks up values) is the `namespace:stats` environment. The _package_ environment contains only functions and objects that should be visible to the user, but the _namespace_ environment contains both internal and external functions. There are over 400 objects that are defined in the `stats` package but which are not available to the user:
```{r}
length(ls(funenv("t.test")))
length(ls(where("t.test")))
```
This mechanism makes it possible for packages to have internal objects that can be accessed by their own functions, but not by external functions.
### The environment created when a function is run
Recall how function scoping works. What will the following function return the first time we run it? What about the second?
```{r, eval = FALSE}
f <- function(x) {
  if (!exists("a", inherits = FALSE)) {
    message("Defining a")
    a <- 1
  } else {
    a <- a + 1
  }
  a
}
```
In fact, it will return the same value each and every time it is called. This is because each time a function is called, a new environment is created to host execution.
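To check this for yourself, here is a renamed copy of the function (without the message, and with the hypothetical names `fresh_each_time()` and `a_local` so that nothing defined elsewhere is overwritten), called twice:

```{r}
fresh_each_time <- function() {
  # exists() looks only in the execution environment of this call
  if (!exists("a_local", inherits = FALSE)) {
    a_local <- 1
  } else {
    a_local <- a_local + 1
  }
  a_local
}

fresh_each_time()
fresh_each_time()
```

Both calls take the first branch, because `a_local` from the previous call lived in an environment that has since been discarded.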
Calling `environment()` with no arguments returns the environment in which the call was made, so we can use it to confirm that functions have new hosting environments at every invocation. We can also use `parent.env()` to see that these newly created environments all have the same parent environment, which is the environment where the function was created:
```{r, eval = FALSE}
f <- function(x) {
  list(
    e = environment(),
    p = parent.env(environment())
  )
}
str(f())
#> List of 2
#> $ e:<environment: 0x10528b5f0>
#> $ p:<environment: R_GlobalEnv>
str(f())
#> List of 2
#> $ e:<environment: 0x106aa7a70>
#> $ p:<environment: R_GlobalEnv>
funenv("f")
#> <environment: R_GlobalEnv>
```
### The environment where the function is called
Look at the following code. What do you expect `g()` to return when the code is run?
```{r, eval = FALSE}
f <- function() {
  x <- 10
  function() {
    x
  }
}
g <- f()
x <- 20
g()
```
The top-level `x` is a red herring: using the regular scoping rules, `g()` looks first where it is defined and finds the value of `x` is 10. However, it is still meaningful to ask what value `x` is associated with in the environment where `g()` is called. `x` is 10 in the environment where `g()` is defined, but it is 20 in the environment where `g()` is called.
We can access this environment using the confusingly named `parent.frame()`. This function returns the __environment__ where the function is called. We can also use this function to look up the value of names in that environment:
```{r}
f2 <- function() {
  x <- 10
  function() {
    def <- get("x", environment())
    cll <- get("x", parent.frame())
    list(defined = def, called = cll)
  }
}
g2 <- f2()
x <- 20
str(g2())
```
In more complicated scenarios, there's not just one parent call, but a sequence of calls which lead all the way back to the initiating function, called from the top-level. We can get a list of all calling environments using `sys.frames()`:
```{r, eval = FALSE}
x <- 0
y <- 10
f <- function(x) {
  x <- 1
  g(x)
}
g <- function(x) {
  x <- 2
  h(x)
}
h <- function(x) {
  x <- 3
  i(x)
}
i <- function(x) {
  x <- 4
  sys.frames()
}
es <- f()
sapply(es, function(e) get("x", e, inherits = TRUE))
#> [1] 1 2 3 4
sapply(es, function(e) get("y", e, inherits = TRUE))
#> [1] 10 10 10 10
```
There are two separate strands of parents when a function is called: calling environments and enclosing environments. Each calling environment will also have a stack of enclosing environments. Note that a called function has both a stack of calling environments and a stack of enclosing environments. However, an environment (or a function object) has only a stack of enclosing environments.
Looking up variables in the calling environment rather than in the defining environment is called __dynamic scoping__. Few languages implement dynamic scoping (Emacs Lisp is a [notable exception](http://www.gnu.org/software/emacs/emacs-paper.html#SEC15)). This is because dynamic scoping makes it much harder to reason about how a function operates: not only do you need to know how it was defined, you also need to know in what context it was called. Dynamic scoping is primarily useful for developing functions that aid interactive data analysis. It is one of the topics discussed in [metaprogramming](#metaprogramming).
### Exercises
* Write an enhanced version of `str()` that provides more information about functions. Show where the function was found and what environment it was defined in. Can you list objects that the function, but not the user, will be able to access?
## Explicit scoping with `local`
Sometimes it's useful to be able to create a new scope without embedding inside a function. The `local` function allows you to do exactly that. For example, to make an operation easier to understand, you can make temporary variables:
```{r}
df <- local({
  x <- 1:10
  y <- runif(10)
  data.frame(x = x, y = y)
})
```
This is equivalent to:
```{r}
df <- (function() {
  x <- 1:10
  y <- runif(10)
  data.frame(x = x, y = y)
})()
```
(If you're familiar with JavaScript you've probably seen this pattern before. It's the immediately invoked function expression (IIFE). It's used extensively by most JavaScript libraries to avoid polluting the global namespace.)
`local` has relatively limited uses (typically because most of the time scoping is best accomplished using R's regular function based rules) but it can be useful in conjunction with `<<-`. For example, you can use `local()` if you want to make a "private" variable only accessible from two functions:
```{r}
a <- 10
my_get <- NULL
my_set <- NULL
local({
  a <- 1
  my_get <<- function() {
    a
  }
  my_set <<- function(value) {
    a <<- value
  }
})
my_get()
my_set(20)
a
my_get()
```
However, it can be easier to see what's going on if you avoid the implicit environment and create and access it explicitly:
```{r}
my_env <- new.env(parent = emptyenv())
my_env$a <- 1
my_get <- function() {
  my_env$a
}
my_set <- function(value) {
  my_env$a <- value
}
```
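A short usage sketch of the same pattern, with the hypothetical names `state_env`, `state_get()` and `state_set()` chosen so that the objects above are left untouched:

```{r}
state_env <- new.env(parent = emptyenv())
state_env$value <- 1

state_get <- function() state_env$value
state_set <- function(value) {
  # environments are modified in place, so no `<<-` is needed here
  state_env$value <- value
  invisible(value)
}

state_get()
state_set(20)
state_get()
```

Giving `state_env` the empty environment as its parent is a design choice: it guarantees the store only ever holds what was put there deliberately.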
These techniques are useful if you want to store state in your package.
## Assignment: binding names to values {#binding}
Assignment is the act of binding (or rebinding) a name to a value in an environment. It is the counterpart to scoping, the set of rules that determines how to find the value associated with a name. Compared to most languages, R has extremely flexible tools for binding names to values. In fact, you can not only bind values to names, but you can also bind expressions (promises) or even functions, so that every time you access the value associated with a name, you get something different!
The remainder of this section will discuss the four main ways of binding names to values in R:
* With the regular behaviour, `name <- value`, the name is immediately associated with the value in the current environment. `assign("name", value)` works similarly, but allows assignment in any environment.
* The double arrow, `name <<- value`, assigns in a similar way to variable lookup, so that `i <<- i + 1` modifies the binding of the original `i`, which is not necessarily in the current environment.
* Lazy assignment, `delayedAssign("name", expression)`, binds an expression that isn't evaluated until you look up the name.
* Active assignment, `makeActiveBinding("name", function, environment)` binds the name to a function, so it is "active" and can return a different value each time the name is found.
### Regular binding
You have probably used regular assignment in R thousands of times. Regular assignment immediately creates a binding between a name and a value in the current environment.
There are two types of names: syntactic and non-syntactic. Generally, syntactic names consist of letters, digits, `.` and `_`, and must start with a letter or `.` not followed by a number (so `.a` and `._` are syntactic but `.1` is not). There are also a number of reserved words (e.g. `TRUE`, `NULL`, `if`, `function`, see `make.names()`). A syntactic name can be used on the left hand side of `<-`:
```{r}
a <- 1
._ <- 2
a_b <- 3
```
However, a name can actually be any sequence of characters; if it's non-syntactic you just need to do a little more work:
```{r, eval = FALSE}
`a + b` <- 3
`:)` <- "smile"
` ` <- "spaces"
ls()
# [1] " " ":)" "a + b"
```
`<-` creates a binding in the current environment. There are three techniques to create a binding in another environment:
* treating an environment like a list
```{r}
e <- new.env()
e$a <- 1
```
* use `assign()`, which has three important arguments: the name, the value, and the environment in which to create the binding
```{r}
e <- new.env()
assign("a", 1, envir = e)
```
* evaluate `<-` inside the environment. (More on this in [evaluation](#non-standard-evaluation).)
```{r}
e <- new.env()
eval(quote(a <- 1), e)
# alternatively, you can use the helper function evalq
# evalq(x, e) is exactly equivalent to eval(quote(x), e)
evalq(a <- 1, e)
```
I generally prefer to use the first form because it is so compact. However, you'll see all three forms in R code in the wild.
#### Constants
There's one extension to regular binding: constants. What are constants? They're variables whose values cannot be changed; they can only be bound once, and never re-bound. We can simulate constants in R using `lockBinding()`, or the infix `%<c-%` found in pryr:
```{r, error = TRUE}
x <- 10
lockBinding(as.name("x"), globalenv())
x <- 15
rm(x)
x %<c-% 20
x <- 30
rm(x)
```
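If you would rather not depend on pryr, a constant can be sketched with base R alone. The helper name `bind_constant()` and the environment `const_env` are assumptions made for this example; the helper assigns a value and then locks the binding.

```{r}
bind_constant <- function(name, value, env = parent.frame()) {
  assign(name, value, envir = env)
  lockBinding(name, env)
  invisible(value)
}

const_env <- new.env()
bind_constant("answer", 42, const_env)
bindingIsLocked("answer", const_env)

# Re-binding a locked name signals an error; capture the message
# instead of stopping the chunk
tryCatch(
  assign("answer", 0, envir = const_env),
  error = function(e) conditionMessage(e)
)
const_env$answer
```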
`lockBinding()` is used to prevent you from modifying objects inside packages:
```{r, error = TRUE}
assign("mean", function(x) sum(x) / length(x), envir = baseenv())
```
### `<<-`
Another way to modify the binding between name and value is `<<-`. The regular assignment arrow, `<-`, always creates a variable in the current environment. The special assignment arrow, `<<-`, never creates a variable in the current environment, but instead modifies an existing variable found by walking up the parent environments.
```{r}
x <- 0
f <- function() {
  g <- function() {
    x <<- 2
  }
  x <- 1
  g()
  x
}
f()
x
h <- function() {
  x <- 1
  x <<- 2
  x
}
h()
x
```
If `<<-` doesn't find an existing variable, it will create one in the global environment. This is usually undesirable, because global variables introduce non-obvious dependencies between functions.
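`name <<- value` is nearly equivalent to `assign("name", value, inherits = TRUE)`. The difference is that `<<-` starts looking in the parent of the current environment, while `assign()` starts in the current environment itself.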
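The following sketch (all names are illustrative) compares the two forms when a local variable shares the name being assigned. The comments describe what each line is expected to do, on the understanding that `assign()` begins its search in the environment it is called from:

```{r}
compare_super_assign <- function() {
  total <- 0
  via_arrow <- function() {
    total <- 100
    total <<- total + 1   # skips the local `total`, rebinds the enclosing one
    total                 # the local value is untouched
  }
  via_assign <- function() {
    total <- 100
    assign("total", total + 1, inherits = TRUE)  # finds the local `total` first
    total
  }
  local_after_arrow <- via_arrow()
  outer_after_arrow <- total
  total <- 0              # reset before the second experiment
  local_after_assign <- via_assign()
  outer_after_assign <- total
  c(
    local_after_arrow = local_after_arrow,
    outer_after_arrow = outer_after_arrow,
    local_after_assign = local_after_assign,
    outer_after_assign = outer_after_assign
  )
}

compare_super_assign()
```

With `<<-` the enclosing `total` becomes 101 while the local one stays at 100; with `assign()` the local one becomes 101 and the enclosing one stays at 0.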
To give you more idea how this works, we could implement `<<-` ourselves. I'm going to call it `rebind`, and emphasise that it's normally used to modify an existing binding. We'll implement it with our recursive recipe for working with environments. For the base case, we'll throw an error (where `<<-` would assign in the global environment), which emphasises the rebinding nature of this function. Otherwise we check to see if the name is found in the current environment: if it is, we do the assignment there; if not, we recurse.
```{r}
rebind <- function(name, value, env = parent.frame()) {
  if (identical(env, emptyenv())) {
    stop("Can't find ", name, call. = FALSE)
  }
  if (exists(name, envir = env, inherits = FALSE)) {
    assign(name, value, envir = env)
  } else {
    rebind(name, value, parent.env(env))
  }
}
rebind("a", 10)
a <- 5
rebind("a", 10)
a
f <- function() {
  g <- function() {
    rebind("x", 2)
  }
  x <- 1
  g()
  x
}
f()
```
We'll come back to this idea in depth, and see where it is useful in [functional programming](#functional-programming).
### Delayed bindings
Another special type of assignment is a delayed binding: rather than assigning the result of an expression immediately, it creates and stores a promise to evaluate the expression when needed (much like the default lazy evaluation of arguments in R functions). We can create delayed bindings with the special assignment operator `%<d-%`, provided by the pryr package.
```{r, cache = TRUE}
library(pryr)
a %<d-% 1
system.time(a)
b %<d-% {Sys.sleep(1); 1}
system.time(b)
```
Note that we need to be careful with more complicated expressions because user-created infix functions have very high precedence. They're higher in precedence than every other infix operator apart from `^`, `:`, `$`, `@`, and `::`. For example, `x %<d-% a + b` is interpreted as `(x %<d-% a) + b`, so we need to use parentheses ourselves:
```{r}
x %<d-% (a + b)
a <- 5
b <- 5
x
```
`%<d-%` is a wrapper around the base `delayedAssign()` function, which you may need to use directly if you need more control. `delayedAssign()` has four parameters:
* `x`: a variable name given as a quoted string
* `value`: an unquoted expression to be assigned to x
* `eval.env`: the environment in which to evaluate the expression
* `assign.env`: the environment in which to create the binding
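A minimal sketch of calling `delayedAssign()` directly. The environment `lazy_env` and the flag `was_evaluated` are names invented for this example; the flag records the moment the promise is actually forced.

```{r}
lazy_env <- new.env()
was_evaluated <- FALSE

delayedAssign(
  "lazy_value",
  {
    was_evaluated <- TRUE   # runs in eval.env, only when the value is needed
    42
  },
  eval.env = environment(),
  assign.env = lazy_env
)

was_evaluated          # nothing has been evaluated yet
lazy_env$lazy_value    # first access forces the promise
was_evaluated          # the expression has now run
```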
Writing `%<d-%` is straightforward, bearing in mind that `delayedAssign()` uses non-standard evaluation to capture the representation of the second argument, so we need to use `substitute()` to construct the call manually. Once you've read [metaprogramming](#metaprogramming), you might want to read the source code and think about how it works.
One application of `delayedAssign()` is `autoload()`, a function that powers `library()`. `autoload()` makes R behave as if the code and data in a package are loaded in memory, but it doesn't actually do any work until you call one of the functions or access a dataset. This is the way that datasets in most packages work: you can refer to (e.g.) `diamonds` after `library(ggplot2)` and it just works, but it isn't loaded into memory unless you actually use it.
### Active bindings
You can create __active__ bindings where the value is recomputed every time you access the name:
```{r}
x %<a-% runif(1)
x
x
```
`%<a-%` is a wrapper for the base function `makeActiveBinding()`. You may want to use this function directly if you want more control. It has three arguments:
* `sym`: a variable name, represented as a name object or a string
* `fun`: a single argument function. Getting the value of `sym` calls `fun` with zero arguments, and setting the value of `sym` calls `fun` with one argument, the value.
* `env`: the environment in which to create the binding.
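A minimal sketch of `makeActiveBinding()` used directly. The names `active_env`, `reads` and `read_count` are assumptions for this example: the binding counts how often it has been read, and refuses to be set.

```{r}
active_env <- new.env()
read_count <- 0

makeActiveBinding(
  "reads",
  function(value) {
    # called with no argument on read, with one argument on assignment
    if (!missing(value)) {
      stop("`reads` is read-only", call. = FALSE)
    }
    read_count <<- read_count + 1
    read_count
  },
  active_env
)

active_env$reads
active_env$reads
bindingIsActive("reads", active_env)
```

Each access runs the function again, so the two reads return different values even though nothing was assigned in between.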
### Exercises
* In `rebind()` it's unlikely that we want to assign in an ancestor of the global environment (i.e. a loaded package), so modify the function to avoid recursing past the global environment.
* Create a version of `assign()` that will only bind new names, never re-bind old names. Some programming languages only do this, and are known as [single assignment](http://en.wikipedia.org/wiki/Assignment_(computer_science)#Single_assignment) languages.
* Implement `str` for environments, listing all bindings in the environment, and briefly describing their contents (you might want to use `str` recursively). Use `bindingIsActive()` to determine if a binding is active. Indicate if bindings are locked (see `bindingIsLocked()`). Show the amount of memory the environment occupies using `object.size()`.
* Write an assignment function that can do active, delayed and locked bindings. What might you call it? What arguments should it take? Can you guess which sort of assignment it should do based on the expression?