---
title: Exceptions and debugging
layout: default
---
# Exceptions and debugging
There are five ways that a function can fail. It can:
* abort with an error
* generate an unexpected warning or other message
* return an incorrect result
* never return
* crash R
As a programmer, you need to understand both how these are generated, so you can use errors and warnings in your own code, and how to debug them, finding out why a function failed.
R functions have three main ways to communicate with the user:
* By sending a `message()`, which is usually printed in bold font.
Messages can be suppressed with `suppressMessages()`.
* By generating a `warning()`, which is prefixed by "Warning message".
Multiple warnings are aggregated together by default. You can see them all
with `warnings()`, or force them to be displayed as they occur with `options(warn = 1)`.
Warnings can be suppressed with `suppressWarnings()`.
* By raising a fatal error with `stop()`. Errors force all execution to stop,
and are displayed like messages with an additional "Error" prefix.
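As a rough sketch of how the three channels behave, the hypothetical `report_total()` below (the name and its checks are illustrative assumptions, not part of any package) sends a message, may issue a warning, and stops on non-numeric input:

```{r}
# Hypothetical helper that uses all three ways of communicating
report_total <- function(x) {
  # Fatal: execution stops here
  if (!is.numeric(x)) stop("`x` must be numeric")
  # Informational
  message("Summing ", length(x), " values")
  # Suspicious, but not fatal
  if (any(x < 0, na.rm = TRUE)) warning("`x` contains negative values")
  sum(x)
}

# Messages and warnings can be silenced; the result is unaffected
total <- suppressMessages(suppressWarnings(report_total(c(-1, 2, 3))))
total

# Errors cannot be silenced this way, but try() lets execution continue
failed <- try(report_total("a"), silent = TRUE)
class(failed)
```

`try()` is covered in more detail in the section on basic error handling.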
This chapter describes techniques to use when things go wrong:
* Debugging: tools and techniques to figure out what went wrong.
* Exceptions: the objects that underlie error handling in R.
* Defensive programming: writing programs that are less likely to fail,
and when they do fail, produce useful error messages.
## Debugging tools
We'll start by discussing debugging tools, and then discuss strategies for deploying them.
This section discusses how to debug from the command line. These are tools of last resort: if you find you're using them very frequently when writing functions, you may want to reconsider your approach. It's much easier to start simple and test as you go than to write something big and complicated and then figure out exactly where the problem is.
Unlike the rest of the book, this chapter describes a specific interface for working with R: RStudio. This is because debugging benefits so much from editor support. It's possible to debug code in R in a completely reproducible way, where every debugging action has a corresponding line in the code. Using RStudio is less reproducible, but generally easier.
Debugging is the art of determining what code raised the error or warning, why it was raised, and what you need to do to fix it. This section will introduce you to the three most important tools for debugging:
* Finding out the sequence of calls that lead to the error with `traceback()`
* Entering an interactive session in an arbitrary code location with
`browser()` or breakpoints.
* Entering an interactive session in the middle of a sequence of calls with
`recover()`.
### Traceback
The most important function to start with is `traceback()`, which displays the __call stack__, the sequence of function calls leading up to the error. Here's an example:
```{r, eval = FALSE}
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d
f(10)
# Error in "a" + d : non-numeric argument to binary operator
traceback()
# 4: i(c)
# 3: h(b)
# 2: g(a)
# 1: f(10)
```
You read the call stack from bottom to top: `f()` calls `g()` calls `h()` calls `i()`. If you're calling your own code that you've `source()`d into R, the traceback will also include filenames and line numbers in the form `filename.r#linenumber`, to make it easier to jump to where the error occurred.
If you're using RStudio, you don't need to call `traceback()` explicitly. Whenever you encounter an error, one of the options will be "Show Traceback". This will display a traceback as described above, and any source references will be clickable, so you can jump to exactly where the error occurred.
This is very helpful to determine exactly where in a stack of calls an error occurred. However, it's not so helpful if you have a recursive function, or other situations where the same function is called in multiple places:
```{r, eval = FALSE}
j <- function(i = 5) {
  if (i == 1) "a" + 1
  j(i - 1)
}
j()
# Error in "a" + 1 : non-numeric argument to binary operator
traceback()
# 5: j(i - 1) at #3
# 4: j(i - 1) at #3
# 3: j(i - 1) at #3
# 2: j(i - 1) at #3
# 1: j()
```
### Browser
Traceback can help you figure out where the error occurred, but to understand why the error occurred and to fix it, it's often easier to explore interactively. `browser()` allows you to do this by pausing execution and returning you to an interactive state. Here you can run any regular R command, as well as some extra single-letter commands:
* `c`: leave interactive debugging and continue execution.
* `n`: execute the next step. Be careful if you have a variable named `n`: to
print it you'll need to be explicit and call `print(n)`.
* `\n`: the default behaviour is the same as `c`, but this is somewhat
dangerous as it makes it very easy to accidentally continue during
debugging. I recommend `options(browserNLdisabled = TRUE)` so that a new
line is simply ignored.
* `Q`: stop debugging, terminate the function and return to the global
workspace.
* `where`: print the stack trace of active calls (the interactive equivalent of
`traceback()`).
Don't forget that you can combine `if` statements with `browser()` to only debug when a certain situation occurs.
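A minimal sketch of a conditional `browser()` call. The function `rescale01()` and the condition it checks are made up for illustration, and the `interactive()` guard is an extra assumption added so that the chunk never pauses when it is run from a script or while knitting.

```{r}
# Hypothetical function: rescale x to the range [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # Only pause in the one situation we want to investigate: a zero-width
  # range, which would lead to division by zero below
  if (interactive() && rng[1] == rng[2]) browser()
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(0, 5, 10))
```

In an interactive session, calling `rescale01(c(2, 2, 2))` would drop you into the browser with `x` and `rng` available for inspection, while ordinary inputs run straight through.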
In RStudio, there's another alternative to using `browser()`: breakpoints. You can set a breakpoint in R code by clicking to the left of the line number in an R script, or pressing `Shift + F9`. Breakpoints are effectively equivalent to `browser()` but they are easier to set (one click instead of nine key presses), and you don't run the risk of accidentally including a `browser()` statement in your source code. There are a few places where breakpoints are not equivalent to `browser()`: read [breakpoint troubleshooting](http://www.rstudio.com/ide/docs/debugging/breakpoint-troubleshooting) for more details.
### Browsing arbitrary R code
As well as adding `browser()` yourself, there are two functions that will add it to code for you:
* `debug()` inserts a browser statement in the first line of the specified
function. `undebug()` will remove it, or you can use `debugonce()` to insert a
browser call for the next run, and have it automatically removed afterwards.
* `utils::setBreakpoint()` does the same thing, but instead inserts `browser()`
in the function corresponding to the specified file name and line number.
These two functions are both special cases of `trace()`, which allows you to insert arbitrary code in any position in an existing function. The complement of `trace()` is `untrace()`. You can only perform one trace per function: subsequent traces will replace prior ones.
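The following sketch shows the `debug()` and `undebug()` cycle on a made-up `standardise()` function. The function is never called here, so no browser is opened; `isdebugged()` simply reports whether the debugging flag is currently set.

```{r}
# Hypothetical function used only for illustration
standardise <- function(x) (x - mean(x)) / sd(x)

# Flag the function: every subsequent call would start in the browser
debug(standardise)
flagged <- isdebugged(standardise)

# Remove the flag again
undebug(standardise)
cleared <- !isdebugged(standardise)

c(flagged, cleared)
```

`debugonce(standardise)` would set the same flag, but clear it automatically after the next call.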
Locating warnings is a little trickier. The easiest way is to turn them into errors with `options(warn = 2)` and then use the standard functions described above. Return to the default behaviour with `options(warn = 0)`.
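A small sketch of this technique, using two hypothetical functions where the warning originates one call down. Instead of resetting with `options(warn = 0)`, it saves and restores whatever setting was in place, which is a design choice of this example. The error is captured with `tryCatch()` (described later in this chapter) so that the chunk runs unattended.

```{r}
# Hypothetical functions: the warning comes from as.integer() inside inner_fun()
outer_fun <- function(x) inner_fun(x)
inner_fun <- function(x) as.integer(x)

# Promote warnings to errors, remembering the previous setting
old <- options(warn = 2)
msg <- tryCatch(outer_fun("a"), error = function(e) conditionMessage(e))
# Restore the previous warning behaviour
options(old)

# The warning has become an error, so msg holds an error message
msg
```

Interactively you would skip the `tryCatch()` and call `traceback()` after the error to see which call produced the warning.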
### Browsing on error
It's also possible to start `browser` automatically when an error occurs, by setting `options(error = browser)`. This will start the interactive debugger in the environment in which the error occurred. Other functions that you can supply to `error` are:
* `recover`: a step up from `browser`, as it allows you to drill down into any
of the calls in the call stack. This is useful because often the cause of
the error is a number of calls back - you're just seeing the consequences.
This is the result of "fail-slow" code.
* `dump.frames`: an equivalent to `recover` for non-interactive code. It will
save an `rda` file containing the nested environments where the error
occurred. This allows you to later use `debugger()` to re-create the error as
if you had called `recover` from where the error occurred.
```{r, eval = FALSE}
# Saves debugging info to the file last.dump.rda, then quits
options(error = quote({dump.frames(to.file = TRUE); q()}))
# Then in an interactive R session:
print(load("last.dump.rda"))
# debugger() expects the dump object itself, not its name
debugger(last.dump)
```
* `NULL`: the default. Prints an error message and stops function execution.
Use this to reset back to the regular behaviour.
Warnings are harder to track down because they don't provide any information about where they occurred. Currently, the best way to debug them is to turn them into errors using `options(warn = 2)`: then you can apply any of the techniques described above.
### The call stack: `traceback()`, `where` and `recover()`
Unfortunately the call stacks printed by `traceback()`, `browser()` + `where` and `recover()` are not consistent. Using the simple nested set of calls below, the call stacks look like the following table. Note that the numbering is different between `traceback()` and `where`, and that `recover()` displays calls in the opposite order and omits the call to `stop()`.
`traceback()`      `where`                  `recover()`
-----------------  -----------------------  ------------
4: stop("Error")   where 1: stop("Error")   1: f()
3: h(x)            where 2: h(x)            2: #1: g(x)
2: g(x)            where 3: g(x)            3: #1: h(x)
1: f()             where 4: f()
```{r, eval = FALSE}
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) stop("Error")
f(); traceback()
options(error = browser); f()
options(error = recover); f()
options(error = NULL)
```
## Debugging techniques
> Finding your bug is a process of confirming the many things
> that you believe are true --- until you find one which is not
> true.
> --- Norm Matloff
Binary search
http://en.wikibooks.org/wiki/Computer_Programming_Principles/Maintaining/Debugging
1. Recognize that a bug exists
2. Isolate the source of the bug
3. Identify the cause of the bug
4. Determine a fix for the bug
5. Apply the fix and test it
http://cm.bell-labs.com/cm/cs/tpop/debugging.html
* Make the bug reproducible.
* Divide and conquer.
* Study the numerology of failures.
* Draw a picture.
* Keep records.
http://www.et.byu.edu/~rhelps/eet340/html/debugging_principles.htm: Strict debugging sequence
- Testing (design a test program)
- Stabilization (bug repeatability)
- Localization (hypothesize, analyze)
- Correction (errors in implementation and design)
## Exceptions
The fine details of exceptions are not particularly well documented in R. If you want to learn more about the internals, I recommend the following two primary sources:
* [A prototype of a condition system for R](http://homepage.stat.uiowa.edu/~luke/R/exceptions/simpcond.html) by Robert Gentleman and Luke Tierney. This describes an early version of R's condition system. The implementation has changed somewhat since this was written, but it provides a good overview of how the pieces fit together, and some motivation for the design.
* [Beyond Exception Handling: Conditions and Restarts](http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html) by Peter Seibel. This describes exception handling in LISP, but the ideas are basically
the same in R, and it provides some more complicated use cases.
### Creation
You create errors in R with `stop()`.
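As a minimal sketch, the hypothetical `safe_sqrt()` below (the name and the check are assumptions made for illustration) raises an error with a message built from several pieces. The error is captured with `tryCatch()`, described later in this chapter, so that the chunk can run unattended.

```{r}
# Hypothetical function: the name and the check are illustrative
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    # stop() pastes its arguments together to build the error message;
    # call. = FALSE leaves the function call out of the printed error
    stop("`x` must be numeric, not ", class(x)[1], call. = FALSE)
  }
  sqrt(x)
}

safe_sqrt(16)

# Capture the message instead of aborting
msg <- tryCatch(safe_sqrt("a"), error = function(e) conditionMessage(e))
msg
```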
### Basic error handling
Error handling is performed with the `try()` (simple) and `tryCatch()` (complex) functions. `try()` allows execution to continue even after an exception has occurred. For example, normally if you run a function that throws an error, it terminates immediately and doesn't return a value:
```{r}
f1 <- function(x) {
  log(x)
  10
}
f1("x")
```
However, if you wrap the statement that creates the error in `try()`, the error message will still be printed but execution will continue:
```{r}
f2 <- function(x) {
  try(log(x))
  10
}
f2("x")
```
Note that you can pass larger blocks of code to `try()` by wrapping them in `{}`:
```{r}
try({
  a <- 1
  b <- "x"
  a + b
})
a
b
```
You can also capture the output of the `try()` function. If successful, it will be the last result evaluated in the block (just like a function); if unsuccessful it will be an (invisible) object of class "try-error":
```{r}
success <- try(1 + 2)
failure <- try("a" + "b")
str(success)
str(failure)
```
You can use the second argument to `try()`, `silent`, to suppress the printing of the error message.
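A short sketch of `silent = TRUE`: nothing is printed, but the failure is still recorded in the returned object, and the underlying condition is stored in its `condition` attribute.

```{r}
# silent = TRUE suppresses the printed error message
res <- try(log("a"), silent = TRUE)
class(res)

# The original condition object is kept as an attribute of the result
cond <- attr(res, "condition")
conditionMessage(cond)
```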
`try()` is particularly useful when you're applying a function to multiple elements in a list:
```{r}
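elements <- list(1:10, c(-1, 10), c(TRUE, FALSE), letters)
# Without try(), the error from log(letters) aborts the whole lapply() call
results <- lapply(elements, log)
# With try(), failures are stored as "try-error" objects alongside the successes
results <- lapply(elements, function(x) try(log(x)))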
```
There isn't a built-in function for testing for this class, so we'll define one. Then you can easily find the locations of errors with `sapply()` (learn more about it in the functionals chapter), and extract the successes or look at the inputs that led to failures.
```{r}
is.error <- function(x) inherits(x, "try-error")
succeeded <- !sapply(results, is.error)
# look at successful results
str(results[succeeded])
# look at inputs that failed
str(elements[!succeeded])
```
Another useful `try()` idiom is setting a default value if an expression fails. Simply assign the default value outside the try block, and then run the risky code:
```{r, eval = FALSE}
default <- NULL
try(default <- read.csv("possibly-bad-input.csv"), silent = TRUE)
```
### Advanced error handling
`tryCatch` gives more control than `try`, but to understand how it works, we first need to learn a little about conditions, the S3 objects that represent errors, warnings and messages.
```{r}
is.condition <- function(x) inherits(x, "condition")
```
There are three convenience functions for creating errors, warnings and messages. All take two arguments: the `message` to display, and an optional `call` indicating where the condition was created.
```{r}
e <- simpleError("My error", quote(f(x = 71)))
w <- simpleWarning("My warning")
m <- simpleMessage("My message")
```
There is one class of conditions that can't be generated directly: interrupts, which occur when the user presses Ctrl + Break, Escape, or Ctrl + C (depending on the platform) to terminate execution.
The components of a condition can be extracted with `conditionMessage` and `conditionCall`:
```{r}
conditionMessage(e)
conditionCall(e)
```
Conditions can be signalled using `signalCondition`. By default, no one is listening, so this doesn't do anything.
```{r}
signalCondition(e)
signalCondition(w)
signalCondition(m)
```
To listen to signals, we have two tools: `tryCatch()` and `withCallingHandlers()`. `tryCatch()` sets up exiting handlers: it catches the condition, but the rest of the code after the point where the condition was signalled is not run. `withCallingHandlers()` sets up calling handlers: it catches the condition, and then resumes execution of the code. We will focus first on `tryCatch()`.
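To make the difference concrete, here is a small sketch that handles the same message both ways. The variable names are made up for illustration. The calling handler uses the standard `muffleMessage` restart so that the message is not also printed.

```{r}
# Exiting handler: the message is caught and the rest of the block is skipped,
# so tryCatch() returns the handler's value
exiting <- tryCatch({
  message("note")
  "finished"
}, message = function(c) "handler value")

# Calling handler: the handler runs, then execution resumes after message(),
# so the block runs to completion and returns its own value
seen <- character()
calling <- withCallingHandlers({
  message("note")
  "finished"
}, message = function(c) {
  seen <<- c(seen, conditionMessage(c))
  invokeRestart("muffleMessage")
})

exiting
calling
seen
```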
The `tryCatch()` call has three arguments:
* `expr`: the code to run.
* `...`: a set of named arguments setting up condition handlers. If a condition
is signalled, `tryCatch` will call the first handler whose name matches one of the
classes of the condition. The only useful names for built-in conditions are
`interrupt`, `error`, `warning` and `message`.
* `finally`: code to run regardless of whether `expr` succeeds or fails. This
is useful for clean up, as described below. All handlers have been turned
off by the time the `finally` code is run, so errors will propagate as
usual.
The following examples illustrate the basic properties of `tryCatch`:
```{r}
# Handlers are passed a single argument
tryCatch(stop("error"),
  error = function(...) list(...)
)
# This argument is the signalled condition, so we'll call
# it c for short.

# If multiple handlers match, the first is used
tryCatch(stop("error"),
  error = function(c) "a",
  error = function(c) "b"
)

# If multiple handlers are nested, the innermost is used first.
tryCatch(
  tryCatch(stop("error"), error = function(c) "a"),
  error = function(c) "b"
)

# Uncaught signals propagate outwards.
tryCatch(
  tryCatch(stop("error")),
  error = function(c) "b"
)

# The first handler that matches a class of the condition is used,
# not the "best" match:
a <- structure(list(message = "my error", call = quote(a)),
  class = c("a", "error", "condition"))
tryCatch(stop(a),
  error = function(c) "error",
  a = function(c) "a"
)
tryCatch(stop(a),
  a = function(c) "a",
  error = function(c) "error"
)

# No matter what happens, finally is run:
tryCatch(stop("error"),
  finally = print("Done."))
tryCatch(a <- 1,
  finally = print("Done."))

# Any errors that occur in the finally block are handled normally
a <- 1
tryCatch(a <- 2,
  finally = stop("Error!"))
```
What can handler functions do?
* Return a value.
* Pass the condition along, by re-signalling the error with `stop(c)`, or
`signalCondition(c)` for non-error conditions.
* Kill the function completely and return to the top-level with
`invokeRestart("abort")`
* Invoke another restart defined by `withRestarts()`.
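The first two options can be sketched with a hypothetical `parse_number()` wrapper (the name and behaviour are assumptions for illustration): the warning handler returns a value, while the error handler adds context to the condition and passes it along with `stop(c)`.

```{r}
# Hypothetical wrapper around as.numeric()
parse_number <- function(x) {
  tryCatch(
    as.numeric(x),
    # Return a value: treat a coercion warning as "not a number"
    warning = function(c) NA_real_,
    # Pass the condition along: add context, then re-signal it
    error = function(c) {
      c$message <- paste0("parse_number() failed: ", conditionMessage(c))
      stop(c)
    }
  )
}

parse_number("3.5")
parse_number("abc")

# A list with a non-scalar element cannot be coerced, so as.numeric() errors;
# capture the re-signalled message so the chunk can run unattended
msg <- tryCatch(parse_number(list(1, 2:3)), error = function(c) conditionMessage(c))
msg
```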
We can write a simple version of `try` using `tryCatch`. The real version of `try` is considerably more complicated to preserve the usual error behaviour.
```{r}
try <- function(code, silent = FALSE) {
  tryCatch(code, error = function(c) {
    if (!silent) message("Error: ", conditionMessage(c))
    invisible(structure(conditionMessage(c), class = "try-error"))
  })
}
try(1)
try(stop("Hi"))
try(stop("Hi"), silent = TRUE)
rm(try)

# A calling handler returns control to where the condition was signalled,
# so the error still propagates after the handler has run
withCallingHandlers({
  a <- 1
  stop("Error")
  a <- 2
}, error = function(c) {})
```
### Using `tryCatch`
With the basics in place, we'll next develop some useful tools based on the ideas we just learned about.
The `finally` argument to `tryCatch` is particularly useful for clean up, because it is always called, regardless of whether the code executed successfully or not. This is useful when you have:
* modified `options`, `par` or locale
* opened connections, or created temporary files and directories
* opened graphics devices
* changed the working directory
* modified environment variables
The following function changes the working directory, executes some code, and always resets the working directory back to what it was before, even if the code raises an error.
```{r}
in_dir <- function(path, code) {
  cur_dir <- getwd()
  tryCatch({
    setwd(path)
    force(code)
  }, finally = setwd(cur_dir))
}
getwd()
in_dir(R.home(), dir())
getwd()
in_dir(R.home(), stop("Error!"))
getwd()
```
Another more casual way of cleaning up is the `on.exit()` function, which registers code to be run when the enclosing function terminates. It's not as fine grained as `tryCatch`, but it's a bit less typing.
```{r}
in_dir <- function(path, code) {
  cur_dir <- getwd()
  # Register the clean up before changing anything
  on.exit(setwd(cur_dir))
  setwd(path)
  force(code)
}
```
If you're using multiple `on.exit` calls, make sure to set `add = TRUE`, otherwise they will replace the previous call. **Caution**: Unfortunately the default in `on.exit()` is `add = FALSE`, so that every time you run it, it overwrites existing exit expressions. Because of the way `on.exit()` is implemented, it's not possible to create a variant with `add = TRUE`, so you must be careful when using it.
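The difference is easy to see in a small sketch. The two hypothetical functions below are identical except for `add = TRUE`, and a log kept in a separate environment (an assumption of this example) records which exit expressions actually ran.

```{r}
# A small log kept outside the functions so we can see which handlers ran
exit_log <- new.env()
exit_log$steps <- character()
note <- function(step) exit_log$steps <- c(exit_log$steps, step)

# Hypothetical functions: identical except for add = TRUE
overwrites <- function() {
  on.exit(note("first"))
  # Replaces the first exit expression
  on.exit(note("second"))
  invisible(NULL)
}
accumulates <- function() {
  on.exit(note("first"))
  # Runs in addition to, and after, the first exit expression
  on.exit(note("second"), add = TRUE)
  invisible(NULL)
}

overwrites()
after_overwrite <- exit_log$steps

exit_log$steps <- character()
accumulates()
after_add <- exit_log$steps

after_overwrite
after_add
```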
### Exercises
1. Write a function that opens a graphics device, runs the supplied code, and closes the graphics device (always, regardless of whether or not the plotting code worked).
## Defensive programming
Defensive programming is the art of making code fail in a well-defined manner even when something unexpected occurs. There are two components of this art related to exceptions: raising exceptions as soon as you notice something has gone wrong, and responding to errors as cleanly as possible.
A general principle for errors is to "fail fast": as soon as you figure out that something is wrong, and your inputs are not as expected, you should raise an error. This is more work for you as the function author, but will make it easier for the user to debug because they get errors early on, not after unexpected input has passed through several functions and caused a problem.
There is a tension between interactive analysis and programming. When you are doing an analysis, you want R to do what you mean, and if it guesses wrong, then you'll discover it right away and can fix it. If you're creating a function, then you want to make it as robust as possible so that any problems become apparent right away (see fail fast above).
* Be explicit:
  * Check the types of inputs
  * Be explicit about missings:
  * Avoid functions that have non-standard evaluation rules (e.g.
    `subset`, `with`, `transform`). These functions save you time when working
    interactively, but when they fail inside a function they usually don't
    return a useful error message.
* Avoid functions that can return different types of objects:
  * Make sure you use preserving subsetting.
  * Don't use `sapply()`: use `vapply()`, or `lapply()` plus the appropriate
    transformation
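The last two points can be illustrated with a short sketch. The inputs are made up; the point is only that `sapply()` and simplifying subsetting change their return type depending on the input, while `vapply()` and `drop = FALSE` do not.

```{r}
# sapply() changes its return type with the input:
# an integer vector here ...
class(sapply(list(1:3, 4:6), length))
# ... but a list here, because there was nothing to simplify
class(sapply(list(), length))

# vapply() always returns the type you asked for
vapply(list(), length, integer(1))

# Simplifying subsetting turns a one-column selection into a vector;
# preserving subsetting keeps it a data frame
df <- data.frame(a = 1:3)
class(df[, "a"])
class(df[, "a", drop = FALSE])
```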
### Creating
There are a number of options for letting the user know when something has gone wrong:
* don't use `cat()` or `print()`, except for print methods, or for optional
debugging information.
* use `message()` to inform the user about something expected - I often do
this when filling in important missing arguments that have a non-trivial
computation or impact. Two examples are `reshape2::melt`, which
informs the user what melt and id variables were used if not specified, and
`plyr::join`, which informs which variables were used to join the two
tables. You can suppress messages with `suppressMessages()`.
* use `warning()` for unexpected problems that aren't show stoppers.
`options(warn = 2)` will turn warnings into errors. Warnings are often
more appropriate for vectorised functions when a single value in the vector
is incorrect, e.g. `log(-1:2)` and `sqrt(-1:2)`. You can suppress warnings
with `suppressWarnings()`.
* use `stop()` when the problem is so big you can't continue.
* `stopifnot()` is a quick and dirty way of checking that pre-conditions for
your function are met. The problem with `stopifnot()` is that if they aren't
met, it will display the test code as an error, not a more informative
message. Checking pre-conditions with `stopifnot()` is better than nothing,
but it's better still to check the condition yourself and return an
informative message with `stop()`.
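To compare the two styles of pre-condition check, here is a sketch with two hypothetical functions that enforce the same rule. Both names and the wording of the custom message are assumptions made for illustration; the error messages are captured with `tryCatch()` so they can be compared side by side.

```{r}
# Hypothetical checks: both reject anything that is not a single number
check_quick <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1)
  x
}
check_clear <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("`x` must be a single number, not a ", class(x)[1],
      " vector of length ", length(x), call. = FALSE)
  }
  x
}

# Capture the two error messages
quick_msg <- tryCatch(check_quick("a"), error = conditionMessage)
clear_msg <- tryCatch(check_clear("a"), error = conditionMessage)

# stopifnot() reports the failing test code ...
quick_msg
# ... while the explicit check says what was expected and what was received
clear_msg
```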
### An example
The following function is naively written and might cause problems:
```{r}
col_means <- function(df) {
  numeric <- sapply(df, is.numeric)
  numeric_cols <- df[, numeric]
  data.frame(lapply(numeric_cols, mean))
}
```
The ability to come up with a set of potential pathological inputs is a good skill to master. Common cases that I try and check are:
* dimensions of length 0
* dimensions of length 1 (in case dropping occurs)
* incorrect input types
The following code exercises some of those cases for `col_means`:
```{r}
col_means(mtcars)
col_means(mtcars[, 0])
col_means(mtcars[0, ])
col_means(mtcars[, "mpg", drop = FALSE])
col_means(1:10)
col_means(as.matrix(mtcars))
col_means(as.list(mtcars))
mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2)
```
A better version of `col_means` might be:
```{r}
col_means <- function(df) {
  numeric <- vapply(df, is.numeric, logical(1))
  numeric_cols <- df[, numeric, drop = FALSE]
  data.frame(lapply(numeric_cols, mean))
}
```
We use `vapply()` instead of `sapply()`, and remember to use `drop = FALSE`. It still doesn't check that the input is correct, or coerce it to the correct format.
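One possible way to add the missing input check is sketched below. The name `col_means_checked()` and the wording of the error message are assumptions of this example; it fails fast on anything that is not a data frame rather than trying to coerce it.

```{r}
# Hypothetical variant of col_means() that validates its input first
col_means_checked <- function(df) {
  if (!is.data.frame(df)) {
    stop("`df` must be a data frame, not ", class(df)[1], call. = FALSE)
  }
  numeric <- vapply(df, is.numeric, logical(1))
  numeric_cols <- df[, numeric, drop = FALSE]
  data.frame(lapply(numeric_cols, mean))
}

# A single column still comes back as a one-column data frame
col_means_checked(mtcars[, "mpg", drop = FALSE])

# Non data frame input now fails immediately with a clear message
bad <- try(col_means_checked(1:10), silent = TRUE)
cat(bad)
```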
## Example ideas
```{r}
insert <- function(x, value, pos) {
  before <- x[1:pos]
  after <- x[pos:length(x)]
  c(before, value, after)
}
insert(1:5, 10, 3)
insert(1:5, 10, 1)

insert <- function(x, value, pos) {
  before <- x[1:(pos - 1)]
  after <- x[(pos - 1):length(x)]
  c(before, value, after)
}
insert(1:5, 10, 3)
insert(1:5, 10, 1)
insert(1:5, 10, 5)

larger <- function(x, y) {
  y.is.bigger <- y > x
  x[y.is.bigger] <- y[y.is.bigger]
  y
}

factorial1 <- function(x) {
  x * factorial1(x - 1)
}

greetings <- function(name) {
  first <- substr(name, 1, 1)
  capital <- toupper(first)
  name <- gsub(first, capital, names)
  paste("Hello", name, "!")
}
```