Hi. I’m Sharon Machlis at IDG Communications, here with Episode 55 of Do More With R: Easy R error handling when iterating.
Few things are more annoying when running code over a lot of objects than having the code choke partway through. Something in one of those objects caused a problem, and you need to track down the offender.
Purrr’s possibly() function is an easy way to do that. Let’s take a look.
I run into this a lot when I’m trying to import CSV or Excel files that I think all have the same structure. But actually, some files’ value columns import as numbers while others come in as character strings, maybe because of a stray character in one of the cells.
In this demo, four files’ value columns will import as characters, but one will come in as numbers.
First I’ll load my libraries; then I’ll use base R’s list.files() function to get the names of all the files in my data directory.
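As a rough sketch of the file-listing step: the snippet below builds a throwaway folder so it can run anywhere. The folder name, file names, and file contents are placeholders I made up, not the files from the demo.

```r
# Hypothetical stand-in for the data directory: five tiny CSV files in a temp folder
data_dir <- file.path(tempdir(), "list_files_demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:5) {
  writeLines(
    c("MonthStarting,Value", "1/1/2020,\"$1,500\""),
    file.path(data_dir, paste0("file", i, ".csv"))
  )
}

# full.names = TRUE returns complete paths, ready to hand to an import function
my_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
print(basename(my_files))
```

Using full.names = TRUE here is a design choice: without it, list.files() returns bare file names, and the import step would need the directory pasted back on.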
Next, I’ll import the first file and look at its structure.
You can see that both the Value column and the MonthStarting column are coming in as character strings, instead of Value as a number and MonthStarting as a date.
Here, I wrote a little function to take care of this. It imports the file and then uses dplyr’s transmute() function to create a new Month column from MonthStarting, but as Date objects. And a new Total column turns the character strings in Value into numbers.
I like readr’s parse_number() function for turning character strings with commas, dollar signs and percent signs into numbers. But it has to have character strings as input. If a value is already a number, parse_number() will throw an error.
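Here is a small illustration of that behavior, with made-up values. The tryCatch() wrapper is my addition so the snippet keeps running after the failure; it is not part of the demo.

```r
library(readr)

# Character input: dollar signs, commas and percent signs are stripped away
parsed <- parse_number(c("$1,250", "45%", "3,000"))
print(parsed)

# Numeric input: parse_number() expects character strings, so this call fails.
# tryCatch() turns the failure into a message instead of stopping the script.
outcome <- tryCatch(
  parse_number(1250),
  error = function(e) "parse_number() rejected numeric input"
)
print(outcome)
```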
Let me try my new function on processing the first file. Seems to work fine.
Next, I’ll test purrr’s map_df() function on the first two files in my data directory. Also works fine. But if I try running my function on all the files, it’ll choke.
Ideally, I’d like to run through all the files, marking the ones with problems as errors but still processing all of them. That’s what possibly() will let me do.
This first line of code creates a brand new function, which I’m calling “safer_process_file()”. I’m doing this by taking my initial function, process_file(), and wrapping it in the possibly() function. The first argument for possibly() is my original function. The second argument, otherwise, tells possibly() what to return if there’s an error.
Now look what happens if I run my new, safer function on all my files. Notice that I’m using the map() function and NOT purrr’s map_df() function to apply safer_process_file() to all my files. That’s because map_df() needs every result to be a data frame so it can combine them, while map() just returns a list. And if there’s an error, that file’s result won’t be a data frame; it’ll just be the character string that I told otherwise to generate.
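The whole pattern might look something like the sketch below. Several things here are assumptions on my part: the demo files are generated on the fly, the date format is guessed as month/day/year, base R's read.csv() stands in for whatever import function the original uses, and the "Error in file" string is just one possible value for otherwise.

```r
library(purrr)
library(dplyr)
library(readr)

# Hypothetical demo data: five small CSVs; file4 has a purely numeric Value column
data_dir <- file.path(tempdir(), "possibly_demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:5) {
  value <- if (i == 4) "1500" else "\"$1,500\""
  writeLines(
    c("MonthStarting,Value", paste0("1/1/2020,", value)),
    file.path(data_dir, paste0("file", i, ".csv"))
  )
}
my_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Assumed shape of process_file(): import, then build the Month and Total columns
process_file <- function(myfile) {
  read.csv(myfile, stringsAsFactors = FALSE) %>%
    transmute(
      Month = as.Date(MonthStarting, format = "%m/%d/%Y"),
      Total = parse_number(Value) # fails when Value was imported as numeric
    )
}

# possibly() returns a new function that yields `otherwise` instead of an error
safer_process_file <- possibly(process_file, otherwise = "Error in file")

# map() returns a list, so data frames and error strings can sit side by side
all_results <- map(my_files, safer_process_file)

# Flag every element that is not a data frame
failed <- !map_lgl(all_results, is.data.frame)
print(basename(my_files)[failed])
```

Checking is.data.frame() on each element, rather than eyeballing the printed list, is one way to scale this up to a folder with hundreds of files.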
You can see here that the 4th item, from my 4th file, is the one with the error. That’s easy to see with only 5 items, but wouldn’t be quite as easy if I had a thousand files to import and 3 had errors.
If I name the list with my original file names and then take a look, it’s a bit easier to see.
I can even capture the results of str() in a text file if I want to parse it further.
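A minimal sketch of naming the list and saving the str() output, using a small stand-in list rather than real import results. The file names, the output file name, and the error string are all hypothetical.

```r
# Stand-in for the list returned by map(): data frames plus one error string
all_results <- list(
  data.frame(Total = 1500),
  data.frame(Total = 2300),
  "Error in file"
)
my_files <- c("file1.csv", "file2.csv", "file3.csv") # hypothetical file names

# Name each list element after the file it came from
names(all_results) <- my_files

# str() prints instead of returning a value, so capture.output() collects the lines
str_lines <- capture.output(str(all_results, max.level = 1))
out_path <- file.path(tempdir(), "results_structure.txt")
writeLines(str_lines, out_path)

# Lines containing the error string point at the problem files
problem_lines <- grep("Error in file", readLines(out_path), value = TRUE, fixed = TRUE)
print(problem_lines)
```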
Now that I know file4 is the problem, I can import just that one and see what the issue is. Ah, Value is coming in as numeric. I revise my process_file() function, calling the new version process_file2(), to account for the possibility that Value isn’t a character string. See the new transmute() Total column definition? I added an ifelse statement to check whether the value is a character.
Now if I use purrr’s map_df with my new process_file2() function, not wrapped in possibly(), it should work and give me a single data frame.
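One way the revised function could look is sketched below, again with generated demo files, a guessed date format, and read.csv() as a stand-in importer. Note one deliberate swap: this sketch uses a plain if/else rather than ifelse(), because is.character() on a whole column gives a single TRUE or FALSE, and ifelse() with a length-one test returns only one value.

```r
library(purrr)
library(dplyr)
library(readr)

# Hypothetical demo data: file4's Value column is numeric, the rest are strings
data_dir <- file.path(tempdir(), "process_file2_demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:5) {
  value <- if (i == 4) "1500" else "\"$1,500\""
  writeLines(
    c("MonthStarting,Value", paste0("1/1/2020,", value)),
    file.path(data_dir, paste0("file", i, ".csv"))
  )
}
my_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Assumed shape of process_file2(): parse Value only when it arrived as character
process_file2 <- function(myfile) {
  read.csv(myfile, stringsAsFactors = FALSE) %>%
    transmute(
      Month = as.Date(MonthStarting, format = "%m/%d/%Y"),
      Total = if (is.character(Value)) parse_number(Value) else as.numeric(Value)
    )
}

# Every file now returns a data frame, so map_df() can row-bind them into one
all_data <- map_df(my_files, process_file2)
str(all_data)
```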
Just the data and format I wanted!
That’s it for this episode, thanks for watching! For more R tips, head to the Do More With R page at bit-dot-l-y slash do more with R, all lowercase except for the R.
You can also find the Do More With R playlist on YouTube’s IDG Tech Talk channel — where you can subscribe so you never miss an episode. Hope to see you next time. Stay healthy and safe, everyone!