---
title: Exceptions and debugging
layout: default
---

# Debugging, condition handling and defensive programming

What happens when something goes wrong? This chapter will teach you how to fix unanticipated problems (debugging), show you how functions can communicate expected problems to their users and how you can take action based on that communication (condition handling), and teach you how to avoid some problems before they occur (defensive programming).

Debugging is the art and science of fixing unexpected problems in your code. In this section you'll learn tools and techniques that help you get to the root cause of an error when you encounter it. You'll learn both general strategies for debugging and RStudio- and R-specific tools like `traceback()` and `browser()`.

Not all problems are unexpected. When writing a function, you can often anticipate potential problems (like a file not existing, or the wrong type of input). Communicating these problems back to the user is the job of __conditions__, which include errors, warnings and messages:

* Fatal errors are raised by `stop()` and force all execution to stop. 
  Errors are used when there is no way for a function to continue.

* Warnings are generated by `warning()` and are used to display potential
  problems, or when some elements of a vectorised input are invalid, 
  for example `log(-1:2)` and `sqrt(-1:2)`.

* Messages are generated by `message()` and are used to give informative output
  in a way that can easily be suppressed by the user 
  (with `suppressMessages()`). I often use messages when filling in 
  important missing arguments that have a non-trivial impact on the function. 

Conditions are usually printed in bold or coloured red (depending on your R interface). You can tell them apart because errors always start with "Error" and warnings with "Warning message". By default, warnings are deferred until the top-level call returns and are then displayed together; if there are more than 10, R instead tells you to call `warnings()` to see them. Function authors can also communicate with their users with `print()` or `cat()`, but I don't recommend it because it's hard to capture and selectively ignore this sort of output (and it's not a condition, so you can't use any of the useful condition handling tools).
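As a minimal sketch of how one function might use all three condition types, consider the hypothetical `safe_sqrt()` below (the function name and its `digits` argument are invented for illustration). It raises an error for input it cannot handle, a warning when some elements are invalid, and a message when it fills in a missing argument:

```{r}
# Hypothetical helper, for illustration only
safe_sqrt <- function(x, digits) {
  # Error: there is no sensible way to continue
  if (!is.numeric(x)) {
    stop("`x` must be numeric, not ", class(x)[1])
  }
  # Message: tell the user which default was filled in
  if (missing(digits)) {
    message("Using digits = 2")
    digits <- 2
  }
  # Warning: some elements of a vectorised input are invalid
  negative <- !is.na(x) & x < 0
  if (any(negative)) {
    warning(sum(negative), " negative value(s) replaced by NA")
    x[negative] <- NA
  }
  round(sqrt(x), digits)
}

# The message can be silenced without hiding the warning
suppressMessages(safe_sqrt(c(4, -1)))
```

Because each kind of output is a condition, the caller can choose what to see: `suppressMessages()` hides the note about `digits`, while `suppressWarnings()` would hide the warning about negative values.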

Condition handling tools, like `try()`, `tryCatch()` and `withCallingHandlers()`, allow you to take specific actions when a condition occurs. For example, you could continue fitting models even if fitting one dataset fails with an error because the model doesn't converge. R offers an exceptionally powerful condition handling system based on ideas from Common Lisp, but it's currently not very well documented or often used. This chapter will introduce you to the most important basics, but if you want to learn more, I recommend the following two primary sources:

* [A prototype of a condition system for R](http://homepage.stat.uiowa.edu/~luke/R/exceptions/simpcond.html) by Robert Gentleman and Luke Tierney. This describes an early version of R's condition system. The implementation has changed somewhat since this was written, but it provides a good overview of how the pieces fit together, and some motivation for the design.

* [Beyond Exception Handling: Conditions and Restarts](http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html) by Peter Seibel. This describes exception handling in Lisp, but the ideas are very similar in R, and it provides useful motivation and examples. I have provided an R translation of the chapter at [beyond-exception-handling.html](beyond-exception-handling.html).

Finally, you can avoid many errors in the first place by programming "defensively". You'll spend more time upfront writing your code, but you'll save time in the long run by reducing errors and providing more informative error messages. The basic principle is to "fail fast", raising an error as soon as you know there's something wrong, rather than trying to silently struggle through. In R, this has three particular applications: checking inputs are correct, avoiding non-standard evaluation and avoiding functions that can return different types of output.

## Debugging techniques

> Finding your bug is a process of confirming the many things
> that you believe are true --- until you find one which is not
> true. \
> --- Norm Matloff

Debugging code is challenging. R provides some useful tools which we'll discuss in the next section, but if you have a good technique, you can still productively debug a problem with just `print()`. There are four key components to the debugging process:

1. __Realise that you have a bug__

    If you're reading this chapter, you've probably already completed this step. 
    But this is a surprisingly important step: you can't fix a bug until you're 
    aware of it. This is one reason why automated test suites are so important
    when producing high-quality code. Automated testing is unfortunately 
    outside the scope of this book, but you can read some notes about it at
    http://adv-r.had.co.nz/Testing.html.

2. __Make it repeatable__

    Once you've determined you have a bug, you need to be able to recreate it 
    on command. This can be the most frustrating part of debugging, but if you 
    can't consistently recreate the bug, then it's extremely difficult to 
    isolate why it's occurring, and it's impossible to confirm that you've 
    fixed it. 
    
    Generally, you will start with a big block of code that you know causes the 
    error and then slowly whittle it down to get to the smallest possible 
    snippet that still causes the error. If it takes a long time to generate 
    the bug, it's also worthwhile figuring out how to make it faster. It may be 
    worthwhile to use a caching strategy to save incremental results (but be 
    careful that you don't create new bugs by doing that).
    
    As you work on creating a minimal example, you'll also discover similar 
    inputs that don't cause the bug. Make a note of those: they will be 
    helpful when diagnosing the cause of the bug. 

    If you're using automated testing, this is a good time to create an
    automated test case. If your existing test coverage is low, take the 
    opportunity to add some nearby tests to reduce your chances of creating
    a new bug.
    
3. __Figure out where it is.__

    If you're lucky, one of the tools in the following section will allow you 
    to quickly navigate to the line of code that's causing the bug. Usually, 
    however, you'll probably have to think a bit more about the problem. Two 
    generally useful techniques are binary search and the scientific process.
   
    To do a binary search, you repeatedly remove half of the code. The bug
    will either appear or not; but either way you've reduced the amount of
    code to look through by half. This allows you to quickly narrow down the
    problem even if you have a lot of code.

    If binary search doesn't work, adopt the scientific process. Generate 
    hypotheses, design experiments to test them and then record your results. 
    This does seem like more work, but a systematic approach will end up 
    saving you time in the long run because each step you take will move you 
    towards a solution. You can generate initial hypotheses by comparing the
    inputs that cause the bug with those that don't.

4. __Fix it and test it.__

    Once you've found the bug, you need to figure out how to fix it, and then 
    check that it actually worked.  Again, it's very useful to have 
    automated tests so that you can ensure that you've actually fixed the bug, 
    and you haven't created any new bugs in the process.

In my experience, it doesn't matter so much exactly what your process is, just that you have one. I often end up wasting too much time trying to rely on my intuition when I would have been better off taking a systematic approach.

## Debugging tools

As well as a broad strategy to follow when debugging code, you also need some concrete tools. In this section you'll learn tools provided both by R and the RStudio IDE. RStudio's integrated debugging support makes life easier, but it mostly exposes existing R tools in a user-friendly way. I'll show you both the RStudio way and the regular R way so that you can work with whatever environment you have. You may also want to refer to the official [RStudio debugging documentation](http://www.rstudio.com/ide/docs/debugging/overview), which will always reflect the functionality in the latest version of RStudio.

There are three key debugging tools:

* Determining the sequence of calls that lead to the error with the RStudio
  error inspector or `traceback()`.

* Entering an interactive session where an error occurred with RStudio's "Rerun 
  with debug" or `recover()`.

* Entering an interactive session in arbitrary code with RStudio's breakpoints 
  or `browser()`.

I'll explain each tool in more detail below.

Note that you shouldn't need to use these tools when writing new functions. If you find yourself using them frequently with new code, you may want to reconsider your approach: it's much easier to start simple and test interactively as you go, rather than writing something big and complicated and then trying to figure out exactly where the problem is. 

### Determining the sequence of calls

The most important tool to start with is the traceback, the sequence of calls that lead up to an error. Here's a simple example: you can see that `f()` calls `g()` calls `h()` calls `i()`, which adds together a number and a string, creating an error:

```{r, eval = FALSE}
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d
f(10)
```

When we run this code in RStudio we see:

![Initial traceback display](traceback-hidden.png)

If you click "Show traceback" you see:

![Traceback display after clicking "show traceback"](traceback-shown.png)

If you're not using RStudio, you can use the `traceback()` function to get the equivalent information:

```{r, eval = FALSE}
traceback()
# 4: i(c) at error.R#3
# 3: h(b) at error.R#2
# 2: g(a) at error.R#1
# 1: f(10)
```

You read the call stack from bottom to top: the initial call is `f()`, which eventually calls `i()` which triggers the error. If you're calling code that you `source()`d into R, the traceback will also display the location of the function, in the form `filename.r#linenumber`. These are clickable in RStudio, and will take you to the corresponding line of code in the editor.

Sometimes this is enough information to let you track down the error and fix it. However, it's usually not enough: it shows you where the error occurred, but not why. The next useful tool is the interactive debugger, which allows you to pause execution of a function and interactively explore its state.

### Browsing on error

The easiest way to enter the interactive debugger is through RStudio's "Rerun with debug" tool. This reruns the command that created the error, pausing execution where the error occurred. You're now in an interactive state just like the regular R console, but you're inside the function, and can interact with any object defined there. You'll see the objects in the current environment in the Environment pane and the traceback in a new traceback pane, and you can run arbitrary R code in the console to figure out what went wrong.

As well as any regular R function, there are a few special commands you can use in debug mode. You can access them either with the RStudio toolbar (![](debug-toolbar.png)) or with the keyboard:

* Next, `n`: executes the next step in the function. Be careful if you have a 
  variable named `n`; to print it you'll need to do `print(n)`.

* Continue, `c`: leaves interactive debugging and continues regular execution
  of the function. This is useful if you've fixed the bad state and want to 
  check that the function proceeds correctly.

* Stop, `Q`: stops debugging, terminates the function and returns to the global
  workspace.

There are two other useful commands:

* Enter: repeats the previous command. I find this too easy to activate while
  debugging, so I turn it off using `options(browserNLdisabled = TRUE)`.

* `where`: prints stack trace of active calls (the interactive equivalent of
  `traceback`)

To enter this style of debugging outside of RStudio, you can use the `error` option. This specifies a function to run when an error occurs. The function most similar to RStudio's debug is `browser()`: this will start an interactive console in the environment where the error occurred. Use `options(error = browser)` to turn it on, re-run the previous command, then use `options(error = NULL)` to return to the default error behaviour. You could automate this with the `browseOnce()` function as defined below:

```{r, eval = FALSE}
browseOnce <- function() {
  old <- getOption("error")
  function() {
    options(error = old)
    browser()
  }
}
options(error = browseOnce())

f <- function() stop("!")
# Enters browser
f()
# Runs normally
f()
```

There are two other useful functions that you can use with the `error` option:

* `recover` is a step up from `browser`, as it allows you to enter the
  environment of any of the calls in the call stack. This is useful because
  often the cause of the error is a number of calls back.

* `dump.frames` is an equivalent to `recover` for non-interactive code. It 
  creates a `last.dump.rda` file in the current working directory that you 
  can load into a later interactive R session using `debugger()`, which 
  recreates the error as if you had called `recover`. This allows interactive 
  debugging of batch code.

    ```{r, eval = FALSE}
    # In batch R process
    dump_and_quit <- function() {
      # Save debugging info to file last.dump.rda
      dump.frames(to.file = TRUE)
      # Quit R with error status
      q(status = 1)
    }
    options(error = dump_and_quit)

    # Then in an interactive R session:
    load("last.dump.rda")
    debugger()
    ```

Finally, to reset error behaviour to the default, use `options(error = NULL)`. Then errors will print a message and abort function execution.

### Browsing arbitrary code

As well as entering an interactive console on error, you can enter it at an arbitrary location in your code by using either an RStudio breakpoint or `browser()`. You can set a breakpoint in RStudio by clicking to the left of the line number, or pressing `Shift + F9`. Equivalently, add `browser()` where you want execution to pause. Breakpoints behave similarly to `browser()` but they are easier to set (one click instead of nine key presses), and you don't run the risk of accidentally including a `browser()` statement in your source code. There are a few places where breakpoints are not equivalent to `browser()`: read [breakpoint troubleshooting](http://www.rstudio.com/ide/docs/debugging/breakpoint-troubleshooting) for more details. One downside of breakpoints is that you can't set them conditionally, whereas you can always put `browser()` inside an `if` statement.
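A conditional `browser()` call might look like the sketch below. The function `process_values()` and the "suspicious input" rule are hypothetical, chosen only to show the pattern. The `interactive()` guard is an extra precaution (an assumption on my part, not something the debugger requires) so that batch runs never pause:

```{r}
# Hypothetical function, for illustration only
process_values <- function(values) {
  total <- 0
  for (value in values) {
    # Pause only when the input looks suspicious, and only when a person
    # is there to interact with the debugger
    if (interactive() && !is.finite(value)) browser()
    total <- total + value
  }
  total
}

# Runs straight through: no value triggers the condition
process_values(c(1, 2, 3))
```

Calling `process_values(c(1, NA, 3))` in an interactive session would drop you into the debugger on the second iteration, with `value` and `total` available for inspection.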

As well as adding `browser()` yourself, there are two functions that will add it to code:

* `debug()` inserts a browser statement in the first line of the specified
  function. `undebug()` will remove it, or you can use `debugonce()` to 
  browse only on the next run.

* `utils::setBreakpoint()` works similarly, but instead of taking a function 
  name, it takes a file name and line number and finds the appropriate function
  for you.

These two functions are both special cases of `trace()`, which inserts arbitrary code at any position in an existing function. `trace()` is occasionally useful when you're debugging code that you don't have the source for.  To remove tracing from a function, use  `untrace()`. Also note that you can only perform one trace per function.
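The short sketch below shows how the debugging flag set by `debug()` can be inspected and cleared without running the function. The function `area()` is hypothetical; `isdebugged()` is the base R function that reports whether the flag is set:

```{r}
# Hypothetical function to debug
area <- function(r) pi * r^2

debug(area)
isdebugged(area)   # the flag is set: the next call would enter the browser

undebug(area)
isdebugged(area)   # the flag is cleared: calls run normally again

area(1)
```

Checking `isdebugged()` is a quick way to find out why a function unexpectedly opens the debugger: a forgotten `debug()` call is a common cause.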

### The call stack: `traceback()`, `where` and `recover()`

Unfortunately the call stacks printed by `traceback()`, `browser()` + `where` and `recover()` are not consistent. Using the simple nested set of calls below, the call stacks look like the following table. Note that numbering is different between `traceback()` and `where`, and that `recover()` displays calls in the opposite order and omits the call to `stop()`.

`traceback()`     `where`                 `recover()`
----------------  ----------------------- ------------
4: stop("Error")  where 1: stop("Error")   1: f()  
3: h(x)           where 2: h(x)            2: g(x)
2: g(x)           where 3: g(x)            3: h(x)
1: f()            where 4: f() 

RStudio displays calls in the same order as `traceback()` but omits the numbers.

```{r, eval = FALSE, echo = FALSE}
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) stop("Error")
f(); traceback()
options(error = browser); f()
options(error = recover); f()
options(error = NULL)
```

### Other types of failure

There are other ways for a function to fail apart from throwing an error or returning an incorrect result. 

* A function may generate an unexpected warning. The easiest way to track down
  warnings is to convert them into errors with `options(warn = 2)`. Then you can
  use the regular debugging tools. When you do this you'll see some extra calls 
  in the call stack, like to `doWithOneRestart()`, `withOneRestart()`, 
  `withRestarts()` and `.signalSimpleWarning()`. Ignore these: they are 
  internal functions used to turn warnings into errors.

* A function may generate an unexpected message. There's no built-in tool to 
  help solve this problem, as there is for warnings, but it's easy to create 
  one (you'll learn how this function works in the next section):

    ```{r, error = TRUE}
    message2error <- function(code) {
      withCallingHandlers(code, message = function(e) stop(e))  
    }

    f <- function() g()
    g <- function() message("Hi!")
    g()
    message2error(g())
    traceback()
    ```
  
    As with warnings, you'll need to ignore some of the calls on the traceback 
    (i.e. the first two and the last 7).

* A function might never return. This is particularly hard to debug 
  automatically, but sometimes terminating the function and looking at the 
  call stack is informative. Otherwise, use the basic debugging strategies 
  described above.

* The worst scenario is that your code might crash R completely, leaving you 
  no way to interactively debug your code. This typically indicates a bug with 
  underlying C code, and the tools are much harder to use.  Sometimes an 
  interactive debugger, like `gdb`, can be useful, but describing how to use 
  one is beyond the scope of this book.  If it's in base R code, posting a 
  reproducible example to R-help is a good idea. If it's in a package, contact
  the maintainer. If it's your own C or C++ code, you'll  need to use 
  numerous `print()` statements to narrow down the location of the bug, and 
  then you'll need to use many more print statements to figure out which 
  data structure doesn't have the properties that you expect.
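To make the first case in the list above concrete, here is a minimal sketch of turning a warning into an error with `options(warn = 2)`. It wraps the call in `tryCatch()` (covered in the next section) only so that the converted error can be captured and shown; when debugging interactively you would let the error happen and then use `traceback()` or the debugger as usual:

```{r}
# Promote warnings to errors, remembering the previous setting
old <- options(warn = 2)

result <- tryCatch(
  as.numeric("abc"),   # normally a warning: NAs introduced by coercion
  error = function(e) conditionMessage(e)
)

# Restore the previous warning behaviour
options(old)

result
```

Saving the return value of `options()` and passing it back afterwards restores whatever setting was in place before, which avoids leaving the session in warnings-as-errors mode by accident.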

## Error handling

Unexpected errors require interactive debugging to figure out what went wrong. Some errors, however, are expected, and you want to handle them automatically. In R, expected errors crop up most frequently when you're fitting many models to different datasets or bootstrap replicates. Sometimes the model might fail to fit and throw an error, but you don't want to stop everything; instead you want to fit as many models as possible and then perform diagnostics after the fact. In R, there are two tools for handling exceptions programmatically: `try()` (simple) and `tryCatch()` (complex). 

### Basic error handling with `try()`

`try()` allows execution to continue even after an error has occurred. For example, normally if you run a function that throws an error, it terminates immediately and doesn't return a value:

```{r, error = TRUE}
f1 <- function(x) {
  log(x)
  10
}
f1("x")
```

However, if you wrap the statement that creates the error in `try()`, the error message will  be printed but execution will continue:

```{r}
f2 <- function(x) {
  try(log(x))
  10
}
f2("x")
```

You can suppress the message with `try(..., silent = TRUE)`. To pass larger blocks of code to `try()`, wrap them in `{}`:

```{r}
try({
  a <- 1
  b <- "x"
  a + b
})
a
b
```

You can also capture the output of the `try()` function. If successful, it will be the last result evaluated in the block (just like a function); if unsuccessful it will be an (invisible) object of class "try-error":

```{r}
success <- try(1 + 2)
failure <- try("a" + "b")
str(success)
str(failure)
```

`try()` is particularly useful when you're applying a function to multiple elements in a list:

```{r, error = TRUE}
elements <- list(1:10, c(-1, 10), c(T, F), letters)
results <- lapply(elements, log)
results <- lapply(elements, function(x) try(log(x)))
```

There isn't a built-in function for testing for this class, so we'll define one. Then you can easily find the locations of errors with `sapply()` (as discussed in the Functions chapter), and extract the successes or look at the inputs that lead to failures.

```{r}
is.error <- function(x) inherits(x, "try-error")
succeeded <- !sapply(results, is.error)

# look at successful results
str(results[succeeded])

# look at inputs that failed
str(elements[!succeeded])
```

Another useful `try()` idiom is setting a default value if an expression fails. Simply assign the default value outside the try block, and then run the risky code:

```{r, eval = FALSE}
default <- NULL
try(default <- read.csv("possibly-bad-input.csv"), silent = TRUE)
```

The function operators chapter discusses the `failwith()` function operator, which makes this pattern particularly easy to use.
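As a rough sketch of the idea, a function operator for the default-value pattern might look like the following. The name `with_default()` is hypothetical and this is not the `failwith()` implementation; it uses `tryCatch()`, which is described in the next section:

```{r}
# Hypothetical function operator: wraps `f` so that any error is replaced
# by `default` instead of stopping execution
with_default <- function(default, f) {
  function(...) {
    tryCatch(f(...), error = function(e) default)
  }
}

safe_log <- with_default(NA_real_, log)
safe_log(10)
safe_log("a")
```

The wrapped function can then be passed to `lapply()` directly, so the error handling does not have to be repeated at every call site.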

### Advanced error handling with `tryCatch()`

`tryCatch()` is more powerful than `try()` because, as well as dealing with errors, it also allows you to take specific actions for messages, warnings and interrupts. You've seen messages (made by `message()`) and warnings (made by `warning()`) before, but interrupts are new. They can't be generated directly by the programmer, but are raised when the user attempts to terminate execution by pressing Ctrl + Break, Escape, or Ctrl + C (depending on the platform). `tryCatch()` also provides the `finally` hook to run code regardless of whether or not an error occurred.

`tryCatch()` has three arguments:

* `expr`: the code to run.

* `...`: a set of named functions. If a condition is raised, `tryCatch` will 
  call the first handler whose name matches one of the classes of the condition. 
  The only useful names for built-in conditions are `error`, `warning`,
  `message`, `interrupt` and `condition`. Handler functions are passed a single 
  object, representing the condition that was raised.

* `finally`: code to run regardless of whether `expr` succeeds or fails. This
  is useful for clean up, as described below. All handlers have been turned
  off by the time the `finally` code is run, so errors will propagate as
  usual. (Note that this is functionally equivalent to using `on.exit()` 
  but it can wrap smaller chunks of code than an entire function).

The following examples illustrate the basic properties of `tryCatch`:

```{r, error = TRUE}
# If multiple handlers match, the first is used
tryCatch(stop("error"), 
  error = function(c) "a",
  error = function(c) "b"
)

# If multiple signals are nested, the most internal is used first.
tryCatch(
  tryCatch(stop("error"), error = function(c) "a"),
  error = function(c) "b"
)

# Uncaught conditions propagate outwards. 
tryCatch(
  tryCatch(stop("error")),
  error = function(c) "b"
)

# No matter what happens, finally is run:
tryCatch(stop("error"), 
  finally = print("Done."))
tryCatch(a <- 1, 
  finally = print("Done."))
  
# Any errors that occur in the finally block are handled normally
a <- 1
tryCatch(a <- 2, finally = stop("Error!"))
```
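As a sketch of using `finally` for clean up, the example below writes to a temporary file and removes it whether or not the code in between fails. The file and the deliberate `stop()` call are illustrative assumptions:

```{r}
tmp <- tempfile()

n <- tryCatch({
  writeLines(c("a", "b"), tmp)
  stop("Something went wrong after the file was created")
}, error = function(e) {
  NA   # value returned when the block fails
}, finally = {
  unlink(tmp)   # always runs, so the temporary file never lingers
})

n
file.exists(tmp)
```

The error handler decides what value comes back, while `finally` is responsible only for tidying up, so the two concerns stay separate.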

Catching interrupts can be useful if you want to take special action when the user tries to abort running code.

```{r, eval = FALSE}
# Don't let the user interrupt the code
i <- 1
while(i < 3) {
  tryCatch({
    Sys.sleep(0.5)
    message("Try to escape")
  }, interrupt = function(x) {
    message("Try again!")
    i <<- i + 1
  })  
}
```

A handler function can do anything, but typically it will either return a value, or pass the condition along. For example, we can write a simple version of `try` using `tryCatch`: 

```{r}
try2 <- function(code, silent = FALSE) {
  tryCatch(code, error = function(c) {
    msg <- conditionMessage(c)
    if (!silent) message("Error: ", msg)
    invisible(structure(msg, class = "try-error"))
  })
}

try2(1)
try2(stop("Hi"))
try2(stop("Hi"), silent = TRUE)
```

The real version of `try` is considerably more complicated to make the error message look more like what you'd see if `tryCatch()` wasn't used.

### Advanced error handling

One of the downsides of most functions in R is that they just call `stop()` with a string. That means if you want to figure out if a particular error occurred, you have to look at the text of the error message. This is error prone, not only because the text of the error might change over time, but also because many error messages are translated, so the message might be completely different to what you expect.

However, R has a little known and little used feature that alleviates this problem. Conditions are S3 classes, so you can define your own for specific types of errors. Each function, `stop()`, `warning()` and `message()` can be given either a list of strings, or a custom S3 condition object. Custom condition objects are not used very often, but are very useful because they make it possible for the user to respond to different errors in different ways. For example, "expected" errors (like a model failing to converge for some input datasets) can be silently ignored, while unexpected errors (like no disk space available) can be propagated to the user. 

R doesn't come with a built-in constructor function for conditions, but we can easily add one. Conditions must contain `message` and `call` components, and can contain any other components that are useful. When creating a new condition, it should always inherit from `condition` and one of `error`, `warning` and `message`.

```{r}
condition <- function(subclass, message, call = sys.call(-1), ...) {
  structure(
    class = c(subclass, "condition"),
    list(message = message, call = call),
    ...
  )
}
is.condition <- function(x) inherits(x, "condition")
```

You can signal an arbitrary condition with `signalCondition()`, but nothing will happen unless you've instantiated a custom signal handler. Instead, use `stop()`, `warning()` or `message()` as appropriate to trigger the usual handling. (Note that R won't complain if the class of your condition doesn't match the function, but you should avoid this in real code).

```{r, error = TRUE}

c <- condition(c("my_error", "error"), message = "This is an error")
signalCondition(c)
stop(c)
warning(c)
message(c)
```
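To see `signalCondition()` do something, a handler for the condition's class has to be in place. The sketch below sets one up with `withCallingHandlers()` (described later in this section); the `progress` class and the `seen` variable are hypothetical names used only for illustration:

```{r}
# A custom condition that is neither an error, a warning nor a message
progress <- structure(
  class = c("progress", "condition"),
  list(message = "Halfway there", call = NULL)
)

seen <- character()
result <- withCallingHandlers(
  {
    signalCondition(progress)   # handled below, then execution continues
    "done"
  },
  progress = function(cnd) {
    seen <<- c(seen, conditionMessage(cnd))
  }
)

seen
result
```

Without the handler, the `signalCondition()` call would return silently and nothing would be recorded.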

Note that when using `tryCatch()` with multiple handlers and custom classes, the first handler to match any class in the hierarchy is called, not necessarily the best match. For this reason, you need to make sure to put the most specific handlers first:

```{r}
tryCatch(stop(c), 
  error = function(c) "error",
  my_error = function(c) "my_error"
)
tryCatch(stop(c), 
  my_error = function(c) "my_error",
  error = function(c) "error"
)
```
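Custom classes make it possible to ignore "expected" errors while letting unexpected ones propagate. The sketch below assumes a hypothetical `convergence_error` class and a toy `fit()` function standing in for a real model fit:

```{r}
# Hypothetical constructor for an "expected" kind of error
convergence_error <- function(msg) {
  structure(
    class = c("convergence_error", "error", "condition"),
    list(message = msg, call = sys.call(-1))
  )
}

# Toy stand-in for a model fit
fit <- function(x) {
  if (x < 0) stop(convergence_error("Model did not converge"))
  if (!is.finite(x)) stop("Unexpected input")
  sqrt(x)
}

# Only convergence failures are swallowed; any other error still propagates
fit_or_null <- function(x) {
  tryCatch(fit(x), convergence_error = function(e) NULL)
}

results <- lapply(c(4, -1, 9), fit_or_null)
str(results)
```

Calling `fit_or_null(Inf)` would still fail with "Unexpected input", because that error does not carry the `convergence_error` class and so no handler matches it.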

There is one other way to capture conditions: `withCallingHandlers()`. There are two main differences between `tryCatch()` and `withCallingHandlers()`:

* The default behaviour of `tryCatch()` handlers is to handle the error and 
  return a value, whereas the return value of `withCallingHandlers()` handlers 
  is ignored by default:

    ```{r, error = TRUE}
    f <- function() stop("!")
    tryCatch(f(), error = function(e) 1)
    withCallingHandlers(f(), error = function(e) 1)
    ```

* The handlers in `withCallingHandlers()` are called in the context of the 
  call that generated the condition; the handlers in `tryCatch()` are called 
  in the context of `tryCatch()`:
  
    ```{r, eval = FALSE}
    f <- function() g()
    g <- function() h()
    h <- function() stop("!")

    tryCatch(f(), error = function(e) print(sys.calls()))
    # [[1]] tryCatch(f(), error = function(e) print(sys.calls()))
    # [[2]] tryCatchList(expr, classes, parentenv, handlers)
    # [[3]] tryCatchOne(expr, names, parentenv, handlers[[1L]])
    # [[4]] value[[3L]](cond)
    
    withCallingHandlers(f(), error = function(e) print(sys.calls()))
    # [[1]] withCallingHandlers(f(), error = function(e) print(sys.calls()))
    # [[2]] f()
    # [[3]] g()
    # [[4]] h()
    # [[5]] stop("!")
    # [[6]] .handleSimpleError(function (e) print(sys.calls()), "!", quote(h()))
    # [[7]] h(simpleError(msg, call))
    ```
    
    This also affects the order in which `on.exit()` is called.

These subtle differences are rarely useful, except when you're trying to capture exactly what went wrong and pass it on to another function. For most purposes, you should never need to use `withCallingHandlers()`.
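One situation where the calling-handler behaviour helps is recording conditions without abandoning the computation. The sketch below is an illustration, not a recommended general-purpose tool: the hypothetical `collect_warnings()` stores each warning message and then invokes the `muffleWarning` restart that `warning()` provides, so the code carries on and still returns its value:

```{r}
# Hypothetical helper: run `expr`, returning both its value and the
# messages of any warnings it raised along the way
collect_warnings <- function(expr) {
  warnings <- character()
  value <- withCallingHandlers(expr, warning = function(w) {
    warnings <<- c(warnings, conditionMessage(w))
    invokeRestart("muffleWarning")   # suppress the warning and resume
  })
  list(value = value, warnings = warnings)
}

res <- collect_warnings(as.numeric(c("1", "b")))
str(res)
```

A `tryCatch()` warning handler could not do this: it would exit the expression as soon as the first warning was raised, so the result of `as.numeric()` would be lost.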

### Exercises

* Compare the following two implementations of `message2error()`. What is the 
  main advantage of `withCallingHandlers()` in this scenario? (Hint: look 
  carefully at the traceback.)

    ```{r}
    message2error <- function(code) {
      withCallingHandlers(code, message = function(e) stop(e))  
    }
    message2error <- function(code) {
      tryCatch(code, message = function(e) stop(e))  
    }
    ```

## Defensive programming

Defensive programming is the art of making code fail in a well-defined manner even when something unexpected occurs. A general principle for errors is to "fail fast": as soon as you figure out that something is wrong, or that your inputs are not as expected, you should raise an error. This is more work for you as the function author, but will make it easier for the user to debug because they get errors early on, not after unexpected input has passed through several functions and caused a problem.

This principle has three main applications in R:

* Be strict about what you accept. If your function is not vectorised in its 
  inputs, but uses functions that are, make sure to check that the inputs are 
  scalars. You can use `stopifnot()`, the 
  [assertthat](https://github.com/hadley/assertthat) package or simple `if`
  statements and `stop()`.

* Avoid functions that use special evaluation (e.g. `subset`, `with`, 
  `transform`). These functions make assumptions to reduce typing, but when
  those assumptions are not met, they will often fail with uninformative error
  messages.
  
* Avoid functions that return different types of output depending on their 
  input. The two biggest offenders are `[` and `sapply`. Whenever you subset
  a data frame in a function, you should always use `drop = FALSE`,
  otherwise you will accidentally convert 1-column data frames into vectors.
  Similarly, never use `sapply()` inside a function: always use the stricter
  `vapply()` which will throw an error if the inputs are incorrect types and
  return the correct type of output even for zero-length inputs.
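The type instability described in the last item of the list above is easy to see with a small example. The data frame here is made up for illustration:

```{r}
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Selecting one column silently drops down to a vector...
class(df[, "x"])
# ...unless you ask to keep the data frame
class(df[, "x", drop = FALSE])

# With zero-length input, sapply() returns a list rather than a vector
empty <- list()
str(sapply(empty, length))
# vapply() returns the type you declared, whatever the input length
str(vapply(empty, length, integer(1)))
```

Code that later assumes a data frame or an integer vector will work in both `drop = FALSE` and `vapply()` cases, but can fail in surprising ways with the defaults.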

There is a tension between interactive analysis and programming. When you are doing an analysis, you want R to do what you mean, and if it guesses wrong, then you'll discover it right away and can fix it. When you're programming, you want robust functions with no magic that give you errors as quickly as possible. It's useful to keep this tension in mind when writing functions. If you're making a function to facilitate interactive data analysis, it's free to guess what the analyst wants or recover from minor misspecifications; but if you're making a function to program with, it should be quite strict with its inputs.
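A strict function for programming might check its inputs as in the sketch below. The function `scale_by()` is hypothetical; the point is that `stopifnot()` rejects a non-scalar `factor` immediately instead of letting R recycle it silently:

```{r}
# Hypothetical helper: rescale a numeric vector by a single number
scale_by <- function(x, factor) {
  stopifnot(is.numeric(x), is.numeric(factor), length(factor) == 1)
  x * factor
}

scale_by(1:3, 2)

# Fails fast: without the check, c(1, 2) would be recycled with a warning
try(scale_by(1:3, c(1, 2)))
```

The error message produced by `stopifnot()` names the condition that failed, which is usually enough for the caller to see what went wrong. For friendlier wording, an explicit `if` statement with `stop()` gives full control over the message.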

### Exercises

* The goal of the `col_means()` function defined below is to compute the means
  of all numeric columns in a data frame. 
  
    ```{r}
    col_means <- function(df) {
      numeric <- sapply(df, is.numeric)
      numeric_cols <- df[, numeric]
      
      data.frame(lapply(numeric_cols, mean))
    }
    ```
    
    However, the function, as written, is not robust to unusual inputs. Look at 
    the following results, decide which ones are incorrect, and modify `col_means`
    to be more robust. (Hint: there are two function calls in `col_means` that 
    are particularly prone to problems.)

    ```{r, eval = FALSE}
    col_means(mtcars)
    col_means(mtcars[, 0])
    col_means(mtcars[0, ])
    col_means(mtcars[, "mpg", drop = F])
    col_means(1:10)
    col_means(as.matrix(mtcars))
    col_means(as.list(mtcars))
    
    mtcars2 <- mtcars
    mtcars2[-1] <- lapply(mtcars2[-1], as.character)
    col_means(mtcars2)
    ```

* The following function "lags" a vector, returning a version of `x` that is `n` 
  values behind the original. Improve the function so that (1) it returns a
  useful error message if `n` is not a vector, (2) it has reasonable behaviour
  when `n` is 0 or longer than `x`.

    ```{r}
    lag <- function(x, n = 1L) {
      xlen <- length(x)      
      c(rep(NA, n), x[seq_len(xlen - n)])
    }
    ```