How to create drilldown graphs with R and highcharter

Hi. I’m Sharon Machlis at IDG Communications, here with Episode 51 of Do More With R: Create an interactive drilldown graph with R and the highcharter package.

Drilldowns can be a good way to present a lot of data in a digestible format. Here’s the example we’ll build today: Median home values by U.S. state; click a state to drill down to county-level data.

There are 3 main steps to making a highcharter drilldown graph:

1. Wrangle your data into the necessary format;
2. Create a basic top-level graph; and
3. Add the drilldown.

Let’s get started. If you want to follow along, download the state- and county-level data sets for the Zillow Home Value Index from Zillow at zillow.com/research/. I’m using the ZHVI Single-Family Homes series.

First, I load the packages I’ll be using: rio, dplyr, purrr, highcharter, scales, and stringr. Note that highcharter is an R wrapper for the Highcharts JavaScript library, and that library is free only for personal, non-commercial use (including testing it locally) or for use by non-profits, universities, or public schools. For anything else, you need to buy a license. (If you’re wondering, I talked my boss into buying me a developer’s license, so I have one.)

Next, I import the state and county CSV files into R. Let’s take a look at the structure of the states file. These are time series with a LOT of months as columns. Next, I want to see what the last column is, since I want to graph the most recent data. Now I know that I want June 30, 2020 as my Median Value column. I’d like to compare that value to the start of the century, so I’ll also include January 31, 2000 as a “Price in 2000” column.
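A minimal sketch of this step might look like the following. It does not read the real Zillow files: all_states here is a tiny made-up stand-in for the imported data, built on the assumption that the file has a RegionName column plus one column per month-end date. The object names all_states, latest_month, and states_subset are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the imported states file (values are made up).
# With the real file, this would be something like rio::import("<your CSV path>").
all_states <- data.frame(
  RegionName   = c("Hawaii", "Ohio", "Texas"),
  `2000-01-31` = c(250000, 110000, 120000),
  `2020-05-31` = c(640000, 150000, 210000),
  `2020-06-30` = c(642000, 151000, 211000),
  check.names  = FALSE  # keep the date-style column names as they are
)

# Look at the structure: one row per state, one column per month
str(all_states)

# Find the name of the last column, i.e. the most recent month
latest_month <- names(all_states)[ncol(all_states)]

# Keep only the state name, the latest value, and the January 2000 value
states_subset <- all_states %>%
  select(
    State       = RegionName,
    MedianValue = `2020-06-30`,
    PriceIn2000 = `2000-01-31`
  )
```

Renaming the columns inside select() keeps the awkward date-style names out of the rest of the code.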

To create my “latest_states” data set, I’ll arrange the data by descending median value. I’ll then add three new columns. MedianValueFormatted and PriceIn2000Formatted are just to get dollar signs and commas in the values to display in tooltips. You can also do that with JavaScript code, but I find it easier to do in R with the scales package. Finally, in that last line of code, I’m using slice() to get the 10 highest and 10 lowest state values. (Having 20 states instead of 50 plus DC makes the graph a bit less unwieldy.)
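Here is a rough sketch of that wrangling, using made-up values for 50 states plus DC so that slice() has enough rows to work with. It shows only the two formatted columns named above, and it uses scales::dollar() as one way to get dollar signs and commas; the exact scales function used in the episode is an assumption. states_subset is a hypothetical name.

```r
library(dplyr)
library(scales)

# Hypothetical data: 51 rows with made-up, evenly spaced values
states_subset <- data.frame(
  State       = c(state.name, "District of Columbia"),
  MedianValue = seq(100000, 600000, by = 10000),
  PriceIn2000 = seq(80000, 330000, by = 5000)
)

latest_states <- states_subset %>%
  arrange(desc(MedianValue)) %>%               # highest values first
  mutate(
    # Display-only strings for the tooltip, e.g. 600000 becomes "$600,000"
    MedianValueFormatted = dollar(MedianValue, accuracy = 1),
    PriceIn2000Formatted = dollar(PriceIn2000, accuracy = 1)
  ) %>%
  slice(c(1:10, (n() - 9):n()))                # top 10 and bottom 10 rows
```

Writing the slice() indexes with n() instead of hard-coded row numbers means the code still picks the bottom 10 if the number of rows changes.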

I do pretty much the same thing to create the latest_counties data frame. Here I filter for rows that include any state in my latest_states data. I also remove the word “County” from all the county names to save space on my graph.
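A small sketch of the county step, again with made-up data. It assumes the county table has a State column whose values match those in latest_states and a County column with names like “Maui County”; the real Zillow column names may differ. counties_subset is a hypothetical name.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-ins for the real data frames
latest_states <- data.frame(State = c("Hawaii", "Ohio"))

counties_subset <- data.frame(
  State       = c("Hawaii", "Hawaii", "Ohio", "Texas"),
  County      = c("Maui County", "Kauai County", "Franklin County", "Travis County"),
  MedianValue = c(700000, 650000, 200000, 350000)
)

latest_counties <- counties_subset %>%
  # Keep only counties in states that made it into latest_states
  filter(State %in% latest_states$State) %>%
  # Drop the trailing " County" so the labels take up less space
  mutate(County = str_remove(County, " County$"))
```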

So far, this is the usual R data wrangling. Next, though, I’m going to create a special drilldown version of the county data specifically for highcharter. Let me break down what’s happening here. First, I’m taking the latest_counties data frame and using dplyr’s group_nest() function to create a list column for each state’s row with just that state’s data. The data column has a data frame for each state’s data by county. Notice it’s got 6 columns from the original data – everything but the State, which is in the other column.

But highcharter needs a little extra data here. The first two lines of code under mutate add an ID column – how the drilldown connects to the data one level up – and a type column, for the type of highcharter graph I want. That first map() line of code adds two columns to each data frame in the list column: “name” for the county name and “y” for the county value.

Finally, the last line in mutate() uses the highcharter function list_parse() to turn each data frame in the data column into a list with a different format.
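The steps described above can be sketched like this, on a three-row made-up county table. The name county_drilldown is hypothetical, and the toy data has fewer columns than the real thing.

```r
library(dplyr)
library(purrr)
library(highcharter)

# Hypothetical county data (values are made up)
latest_counties <- data.frame(
  State       = c("Hawaii", "Hawaii", "Ohio"),
  County      = c("Maui", "Kauai", "Franklin"),
  MedianValue = c(700000, 650000, 200000)
)

county_drilldown <- latest_counties %>%
  group_nest(State) %>%          # one row per state, plus a list column "data"
  mutate(
    id   = State,                # links each drilldown series to its parent point
    type = "column",             # chart type for the drilled-down series
    # Add the name and y columns to every nested data frame
    data = map(data, mutate, name = County, y = MedianValue),
    # Turn each nested data frame into a list of rows
    data = map(data, list_parse)
  )

# Peek at the first county entry for the first state
str(county_drilldown$data[[1]][[1]])
```

After the last step, each element of the data column is a list with one entry per county rather than a data frame.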

My next step is an optional one: Formatting the tooltip. If I use the default, I’ll just see the name of the place and the Y value. But highcharter lets you add more with a tooltip_table() function. This is one of the easiest ways I’ve seen to customize tooltips for an R HTML widget. I create one vector with my category text and another vector with my values, and then use the tooltip_table() function.
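As a sketch, the tooltip setup could look like the code below. The label text and the object names x, y, and tt are assumptions for illustration. The {point.ColumnName} placeholders use Highcharts format-string syntax and assume columns with those names are in the data passed to the chart.

```r
library(highcharter)

# Left-hand labels shown in the tooltip (hypothetical wording)
x <- c("Median Home Value", "Median Price in 2000")

# Matching values, pulled from each point's data when the tooltip is shown
y <- c("{point.MedianValueFormatted}", "{point.PriceIn2000Formatted}")

# Build an HTML table string with one row per label/value pair
tt <- tooltip_table(x, y)

cat(tt)  # view the generated HTML
```

The result is a plain character string of HTML, which is why it can be handed to the chart’s tooltip as a format.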

Finally, we’re ready to make the graph! hchart() is a highcharter function to make a basic graph. It’s taking as arguments the data frame, the chart type, and then a ggplot-like aesthetic function. Its options include x column, y column, and drilldown column. Here you see the default tooltip, and there’s no drilldown functionality yet.
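A minimal sketch of the top-level graph, using a two-row made-up version of latest_states. The object name mygraph is hypothetical.

```r
library(highcharter)

# Hypothetical state-level data (values are made up)
latest_states <- data.frame(
  State       = c("Hawaii", "Ohio"),
  MedianValue = c(640000, 150000)
)

mygraph <- hchart(
  latest_states,                       # the data frame
  "column",                            # the chart type
  hcaes(
    x = State,                         # categories on the x axis
    y = MedianValue,                   # column heights
    drilldown = State                  # key that will later match a drilldown id
  )
)

mygraph  # prints the chart in the viewer when run interactively
```

At this point the drilldown key is stored with each point, but clicking a column does nothing because no drilldown series have been supplied yet.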

hc_drilldown() adds that drilldown. The series argument needs list_parse() again. hc_tooltip() uses that custom tooltip format I created before. The useHTML = TRUE argument just makes for nicer tooltip format. After that, there’s code to remove the x and y axis labels, and add a title and subtitle. And there we are: A drilldown graph with R and highcharter.
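Putting the pieces together, a complete sketch on made-up data might look like the following. The title and subtitle text, the allowPointDrilldown setting, and the names county_drilldown, tt, and mygraph are assumptions for illustration, not necessarily what the episode’s code uses.

```r
library(dplyr)
library(purrr)
library(highcharter)

# Hypothetical state and county data (values are made up)
latest_states <- data.frame(
  State = c("Hawaii", "Ohio"),
  MedianValue = c(640000, 150000),
  MedianValueFormatted = c("$640,000", "$150,000")
)
latest_counties <- data.frame(
  State = c("Hawaii", "Hawaii", "Ohio"),
  County = c("Maui", "Kauai", "Franklin"),
  MedianValue = c(700000, 650000, 200000),
  MedianValueFormatted = c("$700,000", "$650,000", "$200,000")
)

# Drilldown series: one per state, keyed by id
county_drilldown <- latest_counties %>%
  group_nest(State) %>%
  mutate(
    id   = State,
    type = "column",
    data = map(data, mutate, name = County, y = MedianValue),
    data = map(data, list_parse)
  )

# Custom tooltip with a single row
tt <- tooltip_table("Median Home Value", "{point.MedianValueFormatted}")

mygraph <- hchart(
  latest_states, "column",
  hcaes(x = State, y = MedianValue, drilldown = State)
) %>%
  hc_drilldown(
    allowPointDrilldown = TRUE,
    series = list_parse(county_drilldown)   # data frame to list of series
  ) %>%
  hc_tooltip(pointFormat = tt, useHTML = TRUE) %>%
  hc_xAxis(title = list(text = "")) %>%     # blank out the axis titles
  hc_yAxis(title = list(text = "")) %>%
  hc_title(text = "Median Home Values (made-up data)") %>%
  hc_subtitle(text = "Click a column to see county-level values")

mygraph
```

The drilldown works because each state point’s drilldown value matches the id of one series in county_drilldown.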

You can find out a lot more about the highcharter package at its website, jkunst.com/highcharter.

If you want all the code from this episode, check out the related InfoWorld article at the URL on screen.

That’s it for this episode, thanks for watching! For more R tips, head to the Do More With R page at bit.ly/domorewithR, all lowercase except for the R.

You can also find the Do More With R playlist on the YouTube IDG Tech Talk channel — where you can subscribe so you never miss an episode. Hope to see you next time. Stay healthy and safe, everyone!
