Hi. I’m Sharon Machlis at IDG Communications, here with Episode 46 of Do More With R: Check out three new features in R 4.0 by running R and RStudio in a Docker container.
Docker is a platform for creating “containers”: self-contained, isolated environments on your computer. Think of them as a mini system on your system. Each one includes its own operating-system layer, plus anything you want to add to it: application software, scripts, data, and so on. Containers are useful for a lot of things. I’ll be focusing on just one of them today: testing new versions of software without messing up your existing local setup.
Running R 4.0 and the latest preview release of RStudio in a Docker container is pretty easy. If you don’t want to follow along with the Docker part of this tutorial, skip ahead in the video until you see a version of RStudio open in a Chrome browser tab. Then I’ll be showing a few new R features.
If you’d like to follow along, install Docker Desktop on your system if you don’t already have it. Head to docker.com and download the right version for your computer (Windows, Mac, or Linux). Then launch it so you see the Docker icon, like I have here.
Thanks to Adelmo Filho (a data scientist in Brazil) and the rocker R Docker project. I modified their Docker images to make the one I’m using here. You can think of a Docker image as a template for creating a container with specific software already included. Here’s the syntax to run a Docker image on your own system and create a container from it. In a command-line terminal window, you’d run something like this:
Every Docker command starts with the word docker. The word run means I want to run an image and create a container from that image. The --rm flag means remove the container when it’s finished. You don’t have to include it, but if you run a lot of containers and don’t delete them, they’ll start taking up a lot of disk space. The -p 8787:8787 flag maps a port on your computer to a port inside the container. You only need it for images that serve something over a port, as RStudio Server (and Shiny Server) do. Here it specifies port 8787, which is RStudio Server’s usual default.
The -v flag creates a “volume.” Remember when I said Docker containers are self-contained and isolated? I mean isolated. By default, the container can’t access anything outside it, and the rest of your system can’t access anything inside the container. But if you set up a volume, you can link a local folder with a folder inside the container, so anything saved in one shows up in the other.
The syntax is -v, a space, the full path to your local folder, a colon, and then the path to a directory inside the container. With RStudio, you usually use /home/rstudio/ followed by the name of a new folder you’d like to create. Finally comes the image you want to run. My image, like many Docker images, is stored on Docker Hub, a service Docker set up for sharing images. As with GitHub, you specify the Docker Hub user name, a slash, and the name of the repository. In this case you also add a colon and a tag, which helps when there are different versions of the same image.
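For reference, here is a minimal sketch in R that assembles the docker run command described above, one flag at a time. The local folder ~/r4-test, the container folder name, and the image name hubuser/repo:tag are hypothetical placeholders, not the image from this video, so swap in your own. By default it only prints the command; the commented system2() line assumes Docker is installed and running.

```r
# Sketch: build the `docker run` command piece by piece.
# ASSUMPTION: all three values below are hypothetical placeholders.
local_dir     <- path.expand("~/r4-test")   # hypothetical local folder
container_dir <- "/home/rstudio/r4-test"    # new folder to create inside the container
image         <- "hubuser/repo:tag"         # hypothetical Docker Hub user/repo:tag

docker_args <- c(
  "run",                                                 # create and start a container from an image
  "--rm",                                                # remove the container when it stops
  "-p", "8787:8787",                                     # local port 8787 -> container port 8787
  "-v", shQuote(paste0(local_dir, ":", container_dir)),  # link local folder and container folder
  image                                                  # the image to run
)

# Print the full command so it can be pasted into a terminal
cmd <- paste(c("docker", docker_args), collapse = " ")
cat(cmd, "\n")

# Or launch it straight from R (assumes Docker is installed and running):
# system2("docker", docker_args)
```

Building the arguments as a vector keeps each flag next to a comment explaining it, and the same vector works for both printing and system2().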
OK, so here’s the code you’d use to run my image for R 4.0 and the latest preview release of RStudio:
This doesn’t take long for me to run, because I already have the image on my computer. The first time you run it, Docker needs to download the image from Docker Hub, so it will probably take a while. (After that, unless you delete your local copy of the image, it will be much faster.)
Now if I open localhost:8787 in a browser, I should see RStudio. There it is! The default user name and password are both rstudio, which of course would be terrible if I were running this in the cloud. But it’s fine on my local machine, since I don’t normally use a password on my regular RStudio desktop anyway.
Notice that I’ve got R version 4.0 here. And I’ve got RStudio version 1.3.947. If I go to my local desktop RStudio, I’ve got R version 3.6.1 and RStudio version 1.2.5033.
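If you’d rather check versions from code than from the menus, here’s a short sketch. getRversion() and R.version.string are base R. The RStudio check assumes the rstudioapi package is installed, and it only returns a value when the code runs inside RStudio.

```r
# Which R is this session running?
R.version.string
getRversion() >= "4.0.0"   # TRUE on R 4.0 or later

# RStudio's version is only available inside RStudio,
# and this assumes the rstudioapi package is installed.
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  rstudioapi::versionInfo()$version
}
```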
So now let’s look at three new features of R 4.0.
Here I’m making a simple data frame with info about four cities. Notice anything? City and State are character columns, even though I didn’t specify stringsAsFactors = FALSE. Yes, at long last, the data.frame() default is stringsAsFactors = FALSE. If I go back to my older version of R and run the same code, you see that City and State are factors.
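Here’s a minimal way to see the new default for yourself. The city and state values are made up for illustration; they are not the data from the video. The last lines show that you can still get the old behavior in R 4.0 by asking for it explicitly.

```r
# Illustrative data (made up for this sketch)
cities <- data.frame(
  City  = c("Boston", "Austin", "Denver", "Seattle"),
  State = c("MA", "TX", "CO", "WA")
)

# In R >= 4.0 both columns are "character"; in R 3.x they were "factor"
sapply(cities, class)

# The old behavior is still available if you request it
old_style <- data.frame(City = c("Boston", "Austin"), stringsAsFactors = TRUE)
class(old_style$City)   # "factor"
```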
Next, let’s look at a new built-in function in R 4.0: palette.pals(). It returns the names of some built-in color palettes. Another new function, palette.colors(), gives the colors in one of those palettes: you get the name of each color and its hex code. If you then run the scales package’s show_col() function on the results, you get a nice color display. I made a little function combining the two that could be useful for looking at some of the built-in palettes.
None of this works in the earlier version of R.
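Here’s a sketch of the palette functions, assuming R 4.0 or later (they live in grDevices, which is attached by default). The helper show_builtin_palette() is a hypothetical name: one possible way to combine palette.colors() with scales::show_col(), not necessarily the function from the video. It needs the scales package installed.

```r
# Names of the built-in palettes
palette.pals()

# Colors (hex codes) for one palette; "Okabe-Ito" is the default
palette.colors(palette = "Okabe-Ito")

# Hypothetical helper: draw a built-in palette's colors with scales::show_col()
show_builtin_palette <- function(pal = "Okabe-Ito") {
  if (!requireNamespace("scales", quietly = TRUE)) {
    stop("Please install the scales package: install.packages('scales')")
  }
  scales::show_col(palette.colors(palette = pal))
}

# Usage (draws in the current graphics device):
# show_builtin_palette("Tableau 10")
```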
Finally, let’s look at new syntax, called raw strings, that makes it easier to include characters that usually need to be escaped in strings.
The syntax is r, a quotation mark, an opening parenthesis, your string, a closing parenthesis, and a closing quote: r"(your string)". Look at the first string, where I have unescaped double quotes inside a pair of double quotes. RStudio hasn’t caught up with this new syntax yet, so the editor flags it as an error. But if I run the code, it works fine.
I can see how this will help with the problem of piling up backslashes when you need a literal backslash. Inside a raw string, I can print a literal backslash followed by n. Without the r"( )" syntax, that \n is read as a line break. Before this, in base R you needed to escape that backslash with a second backslash, as in "\\n".
In older versions of R, the raw-string syntax isn’t recognized, so this code throws an error.
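Here are a few raw-string examples, assuming R 4.0 or later. Besides r"(...)", R’s documentation for quotes (?Quotes) also allows [] or {} as delimiters, an uppercase R prefix, and dashes between the quote and the delimiter for strings that themselves contain a )" sequence. The example strings are just illustrations.

```r
# Raw strings keep everything between the delimiters as-is
quote_str <- r"(She said "hello" and left)"
cat(quote_str, "\n")           # She said "hello" and left

# A literal backslash followed by n: two characters, no line break
raw_bn <- r"(\n)"
nchar(raw_bn)                  # 2
identical(raw_bn, "\\n")       # TRUE: the pre-4.0 way needed a doubled backslash

# Handy for Windows paths and regular expressions
win_path <- r"(C:\Users\me\Documents)"
cat(win_path, "\n")
grepl(r"(\d{3})", "Route 128")   # TRUE: same pattern as "\\d{3}"

# Square brackets work as delimiters too
r"[a "(tricky)" string]"
```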
There’s a lot more that’s new in R 4.0. Check out all the details at the R project website.
That’s it for this episode, thanks for watching! For more R tips, head to the Do More With R page at bit.ly/domorewithR, all lowercase except for the R. You can also find the Do More With R playlist on the YouTube IDG Tech Talk channel, where you can subscribe so you never miss an episode. Hope to see you next time. Stay healthy and safe, everyone!