15 - Control Structures

An introduction to controlling the flow of execution of a series of R expressions
module 4
week 4
R
programming

This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

Pre-lecture materials

Read ahead

Before class, you can prepare by reading the following materials:

  1. https://rafalab.github.io/dsbook/programming-basics
  2. https://r4ds.had.co.nz/iteration

Acknowledgements

Material for this lecture was borrowed and adopted from

Learning objectives

At the end of this lesson you will:

  • Be able to use commonly used control structures including if, while, repeat, and for
  • Be able to skip an iteration of a loop using next
  • Be able to exit a loop immediately using break

Control Structures

Control structures in R allow you to control the flow of execution of a series of R expressions.

Basically, control structures allow you to put some “logic” into your R code, rather than just always executing the same R code every time.

Control structures allow you to respond to inputs or to features of the data and execute different R expressions accordingly.

Commonly used control structures are

  • if and else: testing a condition and acting on it

  • for: execute a loop a fixed number of times

  • while: execute a loop while a condition is true

  • repeat: execute an infinite loop (must break out of it to stop)

  • break: break the execution of a loop

  • next: skip an iteration of a loop

Pro-tip

Most control structures are not used in interactive sessions, but rather when writing functions or longer expressions.

However, these constructs do not have to be used in functions and it’s a good idea to become familiar with them before we delve into functions.

if-else

The if-else combination is probably the most commonly used control structure in R (or perhaps any language). This structure allows you to test a condition and act on it depending on whether it’s true or false.

For starters, you can just use the if statement.

if(<condition>) {
        ## do something
} 
## Continue with rest of code

The above code does nothing if the condition is false. If you have an action you want to execute when the condition is false, then you need an else clause.

if(<condition>) {
        ## do something
} else {
        ## do something else
}

You can have a series of tests by following the initial if with any number of else ifs.

if(<condition1>) {
        ## do something
} else if(<condition2>)  {
        ## do something different
} else {
        ## do something different
}
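To make the template concrete, here is a minimal sketch with a made-up variable, temp_c, and made-up cutoffs (neither comes from the lecture). Only the first branch whose condition is true gets executed.

```r
## Hypothetical temperature reading in degrees Celsius
temp_c <- 12

if (temp_c < 0) {
    label <- "freezing"
} else if (temp_c < 20) {
    ## Only reached when temp_c is not below 0
    label <- "cool"
} else {
    label <- "warm"
}

label
```

Try changing temp_c to -5 or 30 and re-running the chunk to see the other branches.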

Here is an example of a valid if/else structure.

Let’s use the runif(n, min = 0, max = 1) function, which draws n random values from a uniform distribution between min and max (by default, between 0 and 1).

x <- runif(n = 1, min = 0, max = 10)
x
[1] 3.521267

Then, we can write an if-else statement that tests whether x is greater than 3 or not.

x > 3
[1] TRUE

If x is greater than 3, then the first branch (the if block) is executed. If x is not greater than 3, then the second branch (the else block) is executed.

if (x > 3) {
    y <- 10
} else {
    y <- 0
}

Finally, we can auto print y to see what the value is.

y
[1] 10

This expression can also be written a different (but equivalent!) way in R.

y <- if (x > 3) {
    10
} else {
    0
}

y
[1] 10

Note

Neither way of writing this expression is more correct than the other.

Which one you use will depend on your preference and perhaps those of the team you may be working with.

Of course, the else clause is not necessary. You could have a series of if clauses that always get executed if their respective conditions are true.

if(<condition1>) {

}

if(<condition2>) {

}
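As a small sketch of this pattern (the variable names value and notes are made up for illustration), each if below is tested on its own, so both blocks can run for the same input.

```r
value <- 12
notes <- character(0)

## First test: is the value even?
if (value %% 2 == 0) {
    notes <- c(notes, "even")
}

## Second test: checked regardless of the outcome of the first
if (value > 10) {
    notes <- c(notes, "greater than 10")
}

notes
```

With an if / else if chain, only the first matching branch would have run.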
Question

Let’s use the palmerpenguins dataset and write an if-else statement that

  1. Randomly samples a value from a standard normal distribution (Hint: check out the rnorm(n, mean = 0, sd = 1) function in base R).
  2. If the value is larger than 0, use dplyr functions to keep only the Chinstrap penguins.
  3. Otherwise, keep only the Gentoo penguins.
  4. Re-run the code 10 times and look at the output.

# try it yourself

library(tidyverse)
library(palmerpenguins)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

for Loops

For loops are pretty much the only looping construct that you will need in R. While you may occasionally find a need for other types of loops, in my experience doing data analysis, I’ve found very few situations where a for loop was not sufficient.

In R, for loops take an iterator variable and assign it successive values from a sequence or vector.

For loops are most commonly used for iterating over the elements of an object (list, vector, etc.).

for (i in 1:10) {
    print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

This loop takes the i variable and in each iteration of the loop gives it values 1, 2, 3, …, 10, then executes the code within the curly braces, and then the loop exits.
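Loops are often used to store results rather than print them. A minimal sketch, assuming we want the squares of 1 through 5 (the name squares is made up): create the output vector first, then fill in one element per iteration.

```r
## Create the output vector before the loop starts
squares <- numeric(5)

for (i in 1:5) {
    ## Store the result of iteration i in position i
    squares[i] <- i^2
}

squares
```

Creating the full-length vector up front avoids growing it one element at a time inside the loop.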

Several of the loops that follow print the elements of x one at a time, each written in a slightly different style. The first one indexes x by position.

## define the vector to iterate over
x <- c("a", "b", "c", "d")

## create for loop
for (i in 1:4) {
    ## Print out each element of 'x'
    print(x[i])
}
[1] "a"
[1] "b"
[1] "c"
[1] "d"

We can also print just the iteration value (i) itself.

## define the vector to iterate over
x <- c("a", "b", "c", "d")

## create for loop
for (i in 1:4) {
    ## Print out just 'i'
    print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4

seq_along()

The seq_along() function is commonly used in conjunction with for loops in order to generate an integer sequence based on the length of an object (in this case, the object x). For a data frame, the length is the number of columns, so the sequence matches ncol().

x
[1] "a" "b" "c" "d"
seq_along(x)
[1] 1 2 3 4

The seq_along() function takes in a vector and then returns a sequence of integers that is the same length as the input vector. It doesn’t matter what class the vector is.

Let’s put seq_along() and for loops together.

## Generate a sequence based on length of 'x'
for (i in seq_along(x)) {
    print(x[i])
}
[1] "a"
[1] "b"
[1] "c"
[1] "d"
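One reason to prefer seq_along() over 1:length() shows up with empty input. The sketch below uses a made-up empty vector called empty and counts how many times each loop body runs.

```r
empty <- character(0)

## 1:length(empty) is 1:0, which counts down through 1 and 0
n_bad <- 0
for (i in 1:length(empty)) {
    n_bad <- n_bad + 1
}

## seq_along(empty) has length zero, so the body never runs
n_good <- 0
for (i in seq_along(empty)) {
    n_good <- n_good + 1
}

c(n_bad, n_good)
```

The first loop runs twice even though there is nothing to iterate over, while the second loop is skipped entirely.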

It is not necessary to use an index-type variable (e.g. i); you can loop over the elements of x directly.

for (babyshark in x) {
    print(babyshark)
}
[1] "a"
[1] "b"
[1] "c"
[1] "d"
for (candyisgreat in x) {
    print(candyisgreat)
}
[1] "a"
[1] "b"
[1] "c"
[1] "d"
for (RememberToVote in x) {
    print(RememberToVote)
}
[1] "a"
[1] "b"
[1] "c"
[1] "d"

The loop variable can be any syntactically valid R name, but it cannot be a number or a name containing symbols.

for (1999 in x) {
    print(1999)
}
Error: <text>:1:6: unexpected numeric constant
1: for (1999
         ^

For one-line loops, the curly braces are not strictly necessary.

for (i in 1:4) print(x[i])
[1] "a"
[1] "b"
[1] "c"
[1] "d"

However, I like to use curly braces even for one-line loops, because that way if you decide to expand the loop to multiple lines, you won’t be burned because you forgot to add curly braces (and you will be burned by this).

Question

Let’s use the palmerpenguins dataset. Here are the tasks:

  1. Start a for loop
  2. Iterate over the columns of penguins
  3. For each column, extract the values of that column (Hint: check out the pull() function in dplyr).
  4. Using an if-else statement, test whether the values in the column are numeric (Hint: remember the is.numeric() function to test if a value is numeric).
  5. If they are numeric, compute the column mean. Otherwise, report NA.

# try it yourself

Nested for loops

for loops can be nested inside of each other.

x <- matrix(1:6, nrow = 2, ncol = 3)
x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))) {
        print(x[i, j])
    }
}
[1] 1
[1] 3
[1] 5
[1] 2
[1] 4
[1] 6

Note

The j index goes across the columns within each row i. That’s why we see the values 1, 3, 5 before 2, 4, 6.

Nested loops are commonly needed for multidimensional or hierarchical data structures (e.g. matrices, lists). Be careful with nesting though.

Nesting beyond 2 to 3 levels often makes it difficult to read/understand the code.

If you find yourself in need of a large number of nested loops, you may want to break up the loops by using functions (discussed later).
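As a rough sketch of that idea, the inner loop can be moved into a small helper so that only one loop remains visible at the top level. The helper row_total() and the matrix m are made up for this illustration, and writing functions is covered properly later.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

## Hypothetical helper: the inner loop, wrapped in a function
row_total <- function(row_values) {
    total <- 0
    for (value in row_values) {
        total <- total + value
    }
    total
}

## The outer loop now reads as a single level of iteration
totals <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
    totals[i] <- row_total(m[i, ])
}

totals
```

In practice base R already provides rowSums() for this particular task; the point here is only the structure.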

while Loops

while loops begin by testing a condition.

If it is true, then they execute the loop body.

Once the loop body is executed, the condition is tested again, and so forth, until the condition is false, after which the loop exits.

count <- 0
while (count < 10) {
    print(count)
    count <- count + 1
}
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9

while loops can potentially result in infinite loops if not written properly. Use with care!
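One way to guard against an accidental infinite loop is to count iterations and stop at a cap. This is only a sketch: the cap max_iter and the halving step are assumptions made for the example.

```r
value <- 100
n_iter <- 0
max_iter <- 1000   # assumed safety cap

## Stop when value is small enough OR the cap is reached
while (value > 1 && n_iter < max_iter) {
    value <- value / 2
    n_iter <- n_iter + 1
}

c(value, n_iter)
```

Here the loop ends after 7 halvings, long before the cap matters, but the cap guarantees that it ends.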

Sometimes there will be more than one condition in the test.

z <- 5
set.seed(1)

while (z >= 3 && z <= 10) {
    coin <- rbinom(1, 1, 0.5)

    if (coin == 1) { ## random walk
        z <- z + 1
    } else {
        z <- z - 1
    }
}
print(z)
[1] 2

Pro-tip

What’s the difference between using one & or two && ?

If you use only one &, these are vectorized operations, meaning they can return a vector, like this:

-2:2
[1] -2 -1  0  1  2
((-2:2) >= 0) & ((-2:2) <= 0)
[1] FALSE FALSE  TRUE FALSE FALSE

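If you use two && (as above), then the conditions are evaluated left to right, and evaluation stops as soon as the result is known (this is called short-circuiting). For example, in the above code, if z were less than 3, the second test would not have been evaluated. The && operator is meant for single TRUE/FALSE values, which is what if and while conditions need.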

(2 >= 0) && (-2 <= 0)
[1] TRUE
(-2 >= 0) && (-2 <= 0)
[1] FALSE
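Skipping the second test is useful when that test only makes sense if the first one passes. A minimal sketch, using a made-up variable maybe_value that might be NULL:

```r
maybe_value <- NULL

## The left-hand test is FALSE, so the comparison on the right
## (which would not give a single TRUE/FALSE for NULL) is never evaluated
is_positive <- !is.null(maybe_value) && maybe_value > 0

is_positive
```

Writing if (maybe_value > 0) directly would fail here, because NULL > 0 has length zero.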

repeat Loops

repeat initiates an infinite loop right from the start. These are not commonly used in statistical or data analysis applications, but they do have their uses.

IMPORTANT (READ THIS AND DON’T FORGET… I’M SERIOUS… YOU WANT TO REMEMBER THIS.. FOR REALZ PLZ REMEMBER THIS)

The only way to exit a repeat loop is to call break (or, when the loop is inside a function, to return from that function).

One possible paradigm might be in an iterative algorithm where you may be searching for a solution and you do not want to stop until you are close enough to the solution.

In this kind of situation, you often don’t know in advance how many iterations it’s going to take to get “close enough” to the solution.

x0 <- 1
tol <- 1e-8

repeat {
    x1 <- computeEstimate()

    if (abs(x1 - x0) < tol) { ## Close enough?
        break
    } else {
        x0 <- x1
    }
}

Note

The above code will not run if the computeEstimate() function is not defined (I just made it up for the purposes of this demonstration).

Pro-tip

The loop above is a bit dangerous because there is no guarantee it will stop.

You could get in a situation where the values of x0 and x1 oscillate back and forth and never converge.

It is better to set a hard limit on the number of iterations by using a for loop and then report whether convergence was achieved or not.
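Here is one way that advice could look in code. Everything specific is an assumption made for the sake of a runnable sketch: update_estimate() is a made-up stand-in for the estimation step (one Newton step toward the square root of 2), and max_iter is an arbitrary cap.

```r
## Hypothetical update step (not from the lecture):
## one Newton step for the square root of 2
update_estimate <- function(x) {
    (x + 2 / x) / 2
}

x0 <- 1
tol <- 1e-8
max_iter <- 100        # assumed hard limit on iterations
converged <- FALSE

for (iter in seq_len(max_iter)) {
    x1 <- update_estimate(x0)

    if (abs(x1 - x0) < tol) { ## Close enough?
        converged <- TRUE
        break
    }
    x0 <- x1
}

## Report whether convergence was achieved
if (converged) {
    message("Converged after ", iter, " iterations")
} else {
    message("Did not converge within ", max_iter, " iterations")
}
```

If the estimates oscillated instead of converging, this loop would still stop after max_iter iterations and say so.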

next, break

next is used to skip an iteration of a loop.

for (i in 1:100) {
    if (i <= 20) {
        ## Skip the first 20 iterations
        next
    }
    ## Do something here
}
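A small runnable sketch of next, using a made-up running total called odd_total: even numbers are skipped, so only the odd numbers from 1 to 10 are added.

```r
odd_total <- 0

for (i in 1:10) {
    if (i %% 2 == 0) {
        ## Skip even numbers and move on to the next value of i
        next
    }
    odd_total <- odd_total + i
}

odd_total
```

The result is 1 + 3 + 5 + 7 + 9 = 25.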

break is used to exit a loop immediately, regardless of what iteration the loop may be on.

for (i in 1:100) {
    print(i)

    if (i > 20) {
        ## Stop the loop once i exceeds 20 (i.e. after printing 21)
        break
    }
}
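A common use of break is to stop searching once you have found what you are looking for. In this sketch the vector values and the threshold of 10 are made up for illustration.

```r
values <- c(3, 8, 15, 4, 23)
first_big <- NA

for (i in seq_along(values)) {
    if (values[i] > 10) {
        ## Record the position and stop looking
        first_big <- i
        break
    }
}

first_big
```

The loop stops at position 3 (the value 15) and never looks at the remaining elements.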

Summary

  • Control structures like if, while, and for allow you to control the flow of an R program
  • Infinite loops should generally be avoided, even if (you believe) they are theoretically correct.
  • Control structures mentioned here are primarily useful for writing programs; for command-line interactive work, the “apply” functions are more useful.
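As a rough illustration of the last point, here is the same small task written with a for loop and with sapply(). The list words is made up for this example.

```r
words <- list(a = c("x", "y"), b = character(0), c = c("p", "q", "r"))

## for loop version: create the output, then fill it in
n_loop <- integer(length(words))
for (i in seq_along(words)) {
    n_loop[i] <- length(words[[i]])
}

## "apply" version: one line, and the names are kept
n_apply <- sapply(words, length)

n_loop
n_apply
```

Both versions give the counts 2, 0 and 3; the sapply() result also carries the names a, b and c.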

Post-lecture materials

Final Questions

Here are some post-lecture questions to help you think about the material discussed.

Questions
  1. Write for loops to compute the mean of every column in mtcars.

  2. Imagine you have a directory full of CSV files that you want to read in. You have their paths in a vector, files <- dir("data/", pattern = "\\.csv$", full.names = TRUE), and now want to read each one with read_csv(). Write the for loop that will load them into a single data frame.

  3. What happens if you use for (nm in names(x)) and x has no names? What if only some of the elements are named? What if the names are not unique?

R session information

options(width = 120)
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       macOS Ventura 13.5
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2023-08-17
 pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package        * version date (UTC) lib source
 cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
 colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
 digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
 dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
 evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
 fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
 fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
 generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
 glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
 gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
 hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
 htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
 htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
 jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
 knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
 lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
 lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
 magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
 palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
 pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
 purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
 R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
 rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
 rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
 scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
 sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
 stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
 tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
 timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
 tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
 utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
 vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
 withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
 xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
 yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────