Use the new R pipe built into R 4.1

The R language has a new, built-in pipe operator as of R version 4.1:  |> 

%>% is the pipe that most R users know. Originally from the magrittr package, it’s now used in many other packages as well. (If you’re wondering where the magrittr name came from, it’s a reference to Belgian artist René Magritte and one of his paintings, The Treachery of Images, that says in French: “This is not a pipe.”)

Here’s a somewhat trivial example using the %>% pipe with the mtcars data set and a couple of dplyr functions. This code filters the data for rows with more than 25 mpg and arranges the results by descending miles per gallon:

library(dplyr)
mtcars %>%
  filter(mpg > 25) %>%
  arrange(desc(mpg))

Not everyone likes the pipe syntax. But especially when using tidyverse functions, there are advantages in code readability, in not having to repeat the data frame name, and in not having to create intermediate objects. Here are some non-pipe ways of writing the same dplyr code:

mtcars <- filter(mtcars, mpg > 25)
mtcars <- arrange(mtcars, desc(mpg))

# OR

arrange(filter(mtcars, mpg > 25), desc(mpg))

Run R 4.1 in Docker

If you’re not yet ready to install R 4.1 on your system, one easy way to try out the new pipe is by running R 4.1 inside a Docker container. I provide full general instructions in “How to run R 4.0 in Docker” — the only new part is using a Docker image with R 4.1. Basically, you need to download and install Docker if you don’t already have it, launch Docker, and then run the code below in a terminal window (not the R console). 

docker run -e PASSWORD=your_password_here --rm -p 8787:8787 -v /path/to/local/directory:/home/rstudio/morewithr rocker/tidyverse:4.1.0

The -v /path/to/local/directory:/home/rstudio/morewithr part of the code creates a volume connecting a directory inside the Docker container to files in a local directory. That’s optional but can be quite handy.

The new pipe in R 4.1

Why does R need a new, built-in pipe when magrittr already supplies one? It cuts down on external dependencies, so developers don’t have to rely on an external package for such a key operation. Also, the built-in pipe may be faster. 
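As a rough illustration of the no-dependencies point, here is a minimal sketch that uses the new pipe with nothing but base R functions. The variable name top_mpg is just a placeholder for this example, and the code assumes R 4.1 or later.

```r
# No library() calls needed: |> is part of the base R syntax in R 4.1+
top_mpg <- mtcars |>
  subset(mpg > 25, select = c(mpg, cyl)) |>  # keep efficient cars, two columns
  head(3)                                    # first three matching rows

print(top_mpg)
```

Each step receives the result of the previous step as its first argument, which is the same way the magrittr pipe behaves in the dplyr example earlier.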

The new base R and magrittr pipes work mostly the same, but there’s an important difference when handling functions that don’t have pipe-friendly syntax. By pipe friendly, I mean a function’s first argument is likely to be a value that will be passed through from piped code. For example, the str_detect() function in the stringr package uses the string to be searched as its first argument and the pattern to search for as the second argument. That works well with pipes. For example:

library(stringr)
# add a column with each car's model name
mtcars$model <- rownames(mtcars)
# filter for all cars whose model name starts with "F"
mtcars %>%
  filter(str_detect(model, "^F"))

By contrast, grepl() in base R has the opposite syntax. Its first argument is the pattern and the second argument is the string to search. That causes problems for a pipe.
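To make the argument order concrete, here is a small sketch using only base R. The models vector is made up for this illustration and is not part of the article's data.

```r
# Hypothetical vector of model names, used only for illustration
models <- c("Fiat 128", "Honda Civic", "Ford Pantera L")

# grepl() takes the pattern first and the text to search second,
# so the data you would normally pipe in sits in the second slot
starts_with_f <- grepl("^F", models)

print(starts_with_f)
```

Because a pipe hands its left-hand value to the first argument by default, a function laid out like this needs some kind of workaround.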

The magrittr pipe has a solution for non-pipe-friendly syntax, which is to use the dot character (.) to represent the value being piped in:

mtcars %>%
  filter(grepl("^F", .[["model"]]))

Now let’s see how the base R pipe works. It runs the stringr code just fine:

mtcars |>
  dplyr::filter(stringr::str_detect(model, "^F"))

However, it doesn’t use a dot to represent what’s being piped, so this code will not work:

mtcars |>
  filter(grepl("^F", .[["model"]]))

At least for now, in R 4.1, the base R pipe has no special character to represent the value being piped.

In this example it hardly matters, since you don’t need a pipe to do something this simple. But what about more complex calculations where there isn’t an existing function with pipe-friendly syntax? Can you still use the new pipe?

It’s often not the most efficient option, but you could create your own function using the original function and just switch arguments around or otherwise re-do code so that the first argument becomes pipe friendly. For example, my new mygrepl function has a data frame as its first argument, which is often the way pipes start out:

mygrepl <- function(mydf, mycolumn, mypattern) {
  mydf[grepl(mypattern, mydf[[mycolumn]]), ]
}

mtcars |>
  mygrepl("model", "^F")
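If you would rather not define a named helper, two other approaches can work with the base pipe. The sketch below is one possible way to do it, not the only one; the names cars, f_cars and f_flags are placeholders for this example, and it works on a copy of mtcars so the built-in data set is left alone.

```r
# Work on a copy so the built-in mtcars data set is left untouched
cars <- mtcars
cars$model <- rownames(cars)

# Option 1: pipe into an anonymous function call.
# The parentheses around the function and the trailing () are needed,
# because the right-hand side of |> must be a function call.
f_cars <- cars |>
  (function(df) df[grepl("^F", df[["model"]]), ])()

# Option 2: name the argument that normally comes first (pattern),
# so the piped value falls into the next open slot (x).
f_flags <- cars$model |>
  grepl(pattern = "^F")

print(f_cars$model)
print(sum(f_flags))
```

Option 2 only helps when you are piping the vector itself rather than a whole data frame, so which approach fits depends on what is flowing through the pipe.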

R 4.1 function shorthand

And speaking of functions, R 4.1 has another interesting new feature: You can now use the backslash character as shorthand for “function.” I think this was done mostly for so-called anonymous functions, i.e., functions you create within code that don’t have their own names. But it works for all functions. Instead of creating a new function with function(), you can now use \(). For example:

mygrepl2 <- \(mydf, mycolumn, mypattern) {
  mydf[grepl(mypattern, mydf[[mycolumn]]), ]
}

mtcars |>
  mygrepl2("model", "^F")
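The shorthand is probably most useful for short, throwaway functions passed to another function. Here is a minimal sketch, assuming R 4.1 or later; the variable names are placeholders for this example.

```r
# Anonymous function written with the backslash shorthand
squares <- sapply(1:3, \(x) x^2)

# The same call written with the traditional function keyword
squares_long <- sapply(1:3, function(x) x^2)

print(squares)
print(identical(squares, squares_long))
```

Both forms create the same kind of function object; the backslash version simply saves some typing.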

R pipes and functions without arguments

One last point about the new built-in pipe: If you’re piping into a function with no arguments, parentheses are optional with the magrittr pipe but required with the base R pipe. These both work for %>% :

# Works:
mtcars %>%
  tail()

# Works:
mtcars %>%
  tail

But only the first version works with |> :

# Works:
mtcars |>
  tail()

# Doesn't work:
mtcars |>
  tail
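One way to see why the parentheses matter is to look at what R does with a base pipe expression when it parses the code. The sketch below, assuming R 4.1 or later, uses quote() to capture the parsed expression without running it; piped_call is just a placeholder name.

```r
# Capture the parsed form of a base pipe expression without evaluating it
piped_call <- quote(mtcars |> tail())

# The parser has already rewritten the pipe into an ordinary function call
print(piped_call)
print(identical(piped_call, quote(tail(mtcars))))
```

The base pipe is rewritten at parse time by inserting the left-hand side as the first argument of the call on the right, so the right-hand side has to be written as a call, parentheses included.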

You can see the new pipe in action, plus running R 4.1 in a Docker container, in the video at the top of this article.

For more R tips and tutorials, head to my Do More With R page.

Copyright © 2021 IDG Communications, Inc.
