class: center, middle, inverse, title-slide

.title[
# Debugging and defensive programming
]
.author[
### MACSS 30500
University of Chicago
]

---
class: inverse, middle

# R Base Data Structures

---

## Bugs

> An error, flaw, failure or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways.

When we talk about a bug, we usually mean a syntax error or a typo, but bugs also come from misunderstanding R's rules, data types, and structures, or how functions are meant to be used.

Computers are powerful tools, but you need to follow their rules.

Debugging has two goals:

* Prevent bugs from occurring in the first place
* Fix bugs once they occur

**Debugging (like programming) requires patience!**

---
class: inverse, middle

# Defensive programming

---

## Defensive programming

Two elements of defensive programming...

**Style guide for writing code:**

* Notation and naming guide (file names, object names, etc.)
* Syntax (spacing, curly braces, line length, indentation, assignment, calling functions)
* Comments (# and space)
* Auto-formatting in RStudio

**Failing fast:**

* Condition handling

---

## Writing code

Programming | Language
------------|----------
Scripts | Essays
Sections | Paragraphs
Line breaks | Sentences
Parentheses | Punctuation
Functions | Verbs
Variables | Nouns

---

### A text with no syntax

"alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought alice 'without pictures or conversation?' so she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a white rabbit with pink eyes ran close by her.there was nothing so very remarkable in that; nor did alice think it so very much out of the way to hear the rabbit say to itself, 'oh dear!
oh dear! i shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge." ...

---

### A text with syntax

Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversation?' So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so VERY remarkable in that; nor did Alice think it so VERY much out of the way to hear the Rabbit say to itself, 'Oh dear! Oh dear! I shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually TOOK A WATCH OUT OF ITS WAISTCOAT-POCKET, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.
In another moment down went Alice after it, never once considering how in the world she was to get out again...

---

## Functions to transform

``` r
alice %>%
  str_remove_all("\\.") %>%
  str_replace_all("[A-Z]", function(x) tolower(x))
```

---

## Object names

```r
# Optimal (short, snake case)
day_one
day_1

# Not optimal
first_day_of_the_month
DayOne
dayone
djm1
```

See the tidyverse style guide: https://style.tidyverse.org/

---

## Overwriting objects

Avoid reusing names that already belong to objects or functions in R, and avoid assigning them misleading values:

```r
# Not good
T <- FALSE                   # T is the built-in shorthand for TRUE
c <- 10                      # reuses the name of the base function c()
x <- seq(from = 1, to = 10)
mean <- function(x) sum(x)   # masks base::mean() with a different computation
mean(x)                      # now returns the sum (55), not the mean (5.5)
```

---

## Line length

R does not necessarily require you to split your code across multiple lines in order to run, but it is good practice to do so:

```r
# Optimal
scdbv <- scdbv %>%
  mutate(chief = factor(chief,
                        levels = c("Jay", "Rutledge", "Ellsworth",
                                   "Marshall", "Taney", "Chase",
                                   "Waite", "Fuller", "White",
                                   "Taft", "Hughes", "Stone",
                                   "Vinson", "Warren", "Burger",
                                   "Rehnquist", "Roberts")))

# Not optimal
scdbv <- mutate(scdbv, chief = factor(chief, levels = c("Jay", "Rutledge", "Ellsworth", "Marshall", "Taney", "Chase", "Waite", "Fuller", "White", "Taft", "Hughes", "Stone", "Vinson", "Warren", "Burger", "Rehnquist", "Roberts")))
```

---

## Indentation

Indentation makes the code more readable. For example, it helps identify which values belong to which function:

```r
# in a mutate() function
scdbv <- scdbv %>%
  mutate(majority = majority - 1,
         chief = factor(chief,
                        levels = c("Jay", "Rutledge", "Ellsworth",
                                   "Marshall", "Taney", "Chase",
                                   "Waite", "Fuller", "White",
                                   "Taft", "Hughes", "Stone",
                                   "Vinson", "Warren", "Burger",
                                   "Rehnquist", "Roberts")))
```

---

## Calling functions

If you are using functions that you have not written (e.g., from packages), you do not have the ability to rename them.
Sometimes functions have the **same name across different packages**:

```r
library(purrr)
map()
```

--

```r
library(purrr)
library(maps)
map()
```

--

`map()` is defined in both the `purrr` and `maps` packages. By default, R will call the function from the package most recently loaded.

---

## `::` notation

To fix this problem, we can detach and re-attach a package, but more frequently we use the `::` notation:

```r
library(purrr)
library(maps)
purrr::map()    # use map() from the purrr package
maps::map()     # use map() from the maps package
```

--

We can also skip loading a given package and call only the specific function we need from it with `::`:

```r
library(purrr)
map()           # use map() from the purrr package
maps::map()     # use map() from the maps package
```

---

## Auto-formatting in RStudio

RStudio helps out with these issues:

* `Code > Reformat Code` (Shift + Cmd/Ctrl + A)
* `Code > Reindent Lines` (Cmd/Ctrl + I)
* For more thorough restyling, see [`styler`](http://styler.r-lib.org/)

<!-- * [This code example](/notes/style-guide/#exercise-style-this-code) -->

---

## Auto-formatting in RStudio

Try it out!

* option 1: `Code > Reformat Code` (Shift + Cmd/Ctrl + A)
* option 2: `Code > Reindent Lines` (Cmd/Ctrl + I)
* option 3: `install.packages("styler")` (you should then find it under Addins at the top right)

``` r
y<-10
if (y < 20) {
x <- "Too low"
} else {
x <-"Too high"}
```

---
class: inverse, middle

# Condition handling

---

## Condition handling

**Coding style** is one way to practice defensive programming and prevent bugs.

Another way is **condition handling**: writing our code so that it tells us when something is problematic.

Three types of conditions:

* (Fatal) Errors
* Warnings
* Messages

---

## Errors

**Code is written incorrectly or asks R to do something that is not possible.**

For example, this `addition()` function takes two arguments and adds them together. Notice the condition checks if either `x` or `y` is not a number.
If that's TRUE, the `stop()` function triggers an error and notifies the user:

``` r
addition <- function(x, y) {
  # is.numeric() is the base R type check
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("One of your inputs is not a number")
  } else {
    return(x + y)
  }
}
addition(3, "2")
```

```
## Error in addition(3, "2"): One of your inputs is not a number
```

---

## Errors

A function can test for more than one error; check each one separately with its own `if` statement. The function stops as soon as it encounters the first one.

How do you decide which errors to check for?

1. The more conditional tests you build into the function, the more robust it is against incorrect use, BUT the longer it takes to write
1. Think about who is going to use the function and how frequently
1. Documentation on how to use the function can reduce the number of tests you need

---

## Warnings

**Code runs but you might want to take a look, as it might be problematic.**

For example, this code defines a function that takes a probability `x` (between 0 and 1) and converts it to log-odds, the natural logarithm of `x / (1 - x)`. R will execute this code, but when the function is called with a value outside the probability range, it gives a warning that the result is `NaN` ("Not a Number", impossible to calculate):

``` r
logit <- function(x) {
  return(log(x / (1 - x)))
}
logit(-1)
```

```
## Warning in log(x/(1 - x)): NaNs produced
```

```
## [1] NaN
```

---

## Warnings

To fix the warning, we can add a condition that triggers an error rather than a warning.
For example, if `x` is not between 0 and 1, then stop the code:

``` r
logit <- function(x) {
  if (x < 0 | x > 1) {
    stop('x not between 0 and 1')
  } else {
    return(log(x / (1 - x)))
  }
}
logit(-1)
```

```
## Error in logit(-1): x not between 0 and 1
```

---

<!--
## Warnings

Same code of the previous slide, written more compactly:

``` r
logit <- function(x) {
  if (x < 0 | x > 1) stop('x not between 0 and 1')
  log(x / (1 - x))
}
logit(-1)
```

```
## Error in logit(-1): x not between 0 and 1
```

Notice here we can write `if` and the condition on the same line without the `{}` and still preserve code legibility of this single `if` statement; we can also remove `return`
-->

## Warnings

If we do not want to stop the code from running, we can also handle the problem in other ways, without triggering an error. For example: (1) check whether `x` is outside the range and, if so, replace it with a missing value; (2) trigger a warning if `x` is a missing value (whose log is also a missing value), then return the result either way.

``` r
logit <- function(x) {
  # if_else() comes from dplyr
  x <- if_else(x < 0 | x > 1, NA_real_, x)
  if (is.na(x)) {
    warning('x not between 0 and 1')
  }
  # return outside the if block, so valid inputs also get a result
  return(log(x / (1 - x)))
}
logit(-1)
```

```
## Warning in logit(-1): x not between 0 and 1
```

```
## [1] NA
```

---

## Messages

**Messages do not indicate that something is wrong, but provide useful information to the user.**

For example, here we are plotting with `geom_point()` and `geom_smooth()`, which automatically chooses the smoothing method for the line (here `gam`, selected based on sample size):

``` r
ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  geom_smooth()
```

```
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
```

<img src="index_files/figure-html/message_ggplot-1.png" width="35%" />

---

## Messages

To write a message in your code, use the `message()` function.
``` r
demo_message <- function() message("This is a message")
demo_message()
```

```
## This is a message
```

You can also suppress a message, if you want, with `suppressMessages()`:

``` r
suppressMessages(demo_message()) # no output
```

--

Do not use `print()` to create a message: `suppressMessages()` has no effect on printed output.

---
class: inverse, middle

# Debugging

---

## Debugging techniques

1. Realize that you have a bug
1. Make it repeatable
1. Figure out where it is
1. Fix it

---

## The call stack

**Often, when we run a piece of code, the actual cause of the problem is not in the line we run.**

For example, here we have several functions that call other functions: `f` is a function that takes an input `a` and passes it into another function `g`; `g` takes an input `b` and passes it into function `h`, etc.

The problem is in function `i`, but let's say in our code we call function `f`:

``` r
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d
f(10)
```

```
## Error in "a" + d: non-numeric argument to binary operator
```

---

## The call stack

We cannot fix function `f`, because the problem does not occur there. We need to fix function `i`, the last call in the sequence, where the error is actually raised.

Use `traceback()`, which is often shown automatically in RStudio, and read it from bottom to top. The line at the top is where the error occurred:

``` r
traceback()
```

```
# 4: i(c) at exceptions-example.R#3
# 3: h(b) at exceptions-example.R#2
# 2: g(a) at exceptions-example.R#1
# 1: f(10)
```

---
class: center, inverse, middle

## YOUR TURN!!
`usethis::use_course("CFSS-MACSS/debugging")`

---

## Before our next class:

* Install `rvest`
* [GET AN API KEY](https://www.omdbapi.com/apikey.aspx)
* [(optional) Register on geonames](https://www.geonames.org/export/web-services.html)

---

## Acknowledgments

The content of these slides is derived in part from Sabrina Nardin and Benjamin Soltoff’s “Computing for the Social Sciences” course materials, licensed under the CC BY NC 4.0 Creative Commons License. Any errors or oversights are mine alone.
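
---

## Appendix: the three conditions in one function

The error, warning, and message slides can be combined into one short sketch. `safe_logit()` is a hypothetical helper written for illustration, not part of the original course materials, and replacing out-of-range values with `NA` is an assumption about how one might want the function to behave. It uses base R only.

```r
# Hypothetical helper (an illustration, not from the course materials):
# converts probabilities to log-odds and signals all three condition types.
safe_logit <- function(x) {
  # Error: the input cannot be used at all, so fail fast
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }

  # Warning: the code can continue, but the result deserves a look
  out_of_range <- !is.na(x) & (x < 0 | x > 1)
  if (any(out_of_range)) {
    warning("values outside [0, 1] replaced with NA")
    x[out_of_range] <- NA_real_
  }

  # Message: nothing is wrong, this is just useful information
  message("Converting ", length(x), " value(s) to log-odds")
  log(x / (1 - x))
}

safe_logit(c(0.5, 0.8))            # message only
safe_logit(c(0.5, -1))             # warning and message
suppressMessages(safe_logit(0.5))  # the message is silenced
```

One way to read the design: `stop()` is reserved for input the function cannot work with, `warning()` for input it can repair but the user should know about, and `message()` for routine information that the caller may silence.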