How to create ggplot labels in R

Labeling all or some of your data with text can help tell a story, even when your graph is using other cues like color and size. ggplot2 has two built-in geoms for this, geom_text() and geom_label(), and the ggrepel package adds versions of both that keep labels from overlapping.

For this demo, I’ll start with a scatter plot comparing the percentage of adults with at least a four-year college degree vs. known Covid-19 cases per capita in Massachusetts counties. (The theory: A college education might mean you’re more likely to have a job that lets you work safely from home. Of course there are plenty of exceptions, and many other factors affect infection rates.)

If you want to follow along, you can get the code to re-create my sample data on page 2 of this article.

Creating a scatter plot with ggplot

To start, the code below loads three packages and sets options(scipen = 999) so I don’t get scientific notation in my graphs:

library(ggplot2)
library(ggrepel)
library(dplyr)
options(scipen = 999)
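
To see what the scipen option changes, here is a small sketch using base R's format(). The variable names are made up for illustration; the same option affects how R prints large numbers elsewhere, including default axis labels.

```r
big_number <- 100000000

# With the default penalty of 0, R picks the shorter representation,
# which for this value is scientific notation
options(scipen = 0)
default_fmt <- format(big_number)   # "1e+08"

# A large positive penalty makes R strongly prefer fixed notation
options(scipen = 999)
fixed_fmt <- format(big_number)     # "100000000"
```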

Here are the first six rows of the ma_data data frame:

head(ma_data)
                Place AdultPop Bachelors PctBachelors CovidPer100K Positivity    Region
1          Barnstable   165336     70795    0.4281887          7.0     0.0188 Southeast
2           Berkshire    92946     31034    0.3338928          9.0     0.0095      West
3             Bristol   390230    109080    0.2795275         30.8     0.0457 Southeast
4 Dukes and Nantucket    20756      9769    0.4706591         25.3     0.0294 Southeast
5               Essex   538981    212106    0.3935315         29.5     0.0406 Northeast
6            Franklin    53210     19786    0.3718474          4.7     0.0052      West

The next group of code creates a ggplot scatter plot with that data, including sizing points by total county population and coloring them by region. geom_smooth() adds a linear regression line, and I also tweak a couple of ggplot design defaults. The graph is stored in a variable called ma_graph.
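
A minimal sketch of a plot along those lines follows. It is not necessarily the exact code behind the original graph: the data frame is rebuilt from only the six rows printed by head(ma_data) above, and the line color, line type, and theme are my assumptions about the "design defaults" being tweaked.

```r
library(ggplot2)

# Assumption: only the six rows shown in the head() output, not the full data set
ma_data <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  Bachelors = c(70795, 31034, 109080, 9769, 212106, 19786),
  PctBachelors = c(0.4281887, 0.3338928, 0.2795275,
                   0.4706591, 0.3935315, 0.3718474),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Positivity = c(0.0188, 0.0095, 0.0457, 0.0294, 0.0406, 0.0052),
  Region = c("Southeast", "West", "Southeast",
             "Southeast", "Northeast", "West")
)

ma_graph <- ggplot(ma_data, aes(x = PctBachelors, y = CovidPer100K)) +
  # Size and color are mapped only in the point layer so the
  # regression line below is fit once, across all counties
  geom_point(aes(size = AdultPop, color = Region)) +
  # Linear regression line without the confidence band (styling is illustrative)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "grey40", linetype = "dotted") +
  # Show the x axis as percentages instead of proportions
  scale_x_continuous(labels = scales::percent) +
  theme_minimal() +
  # Hide the size legend; the color legend for Region stays
  guides(size = "none")

ma_graph
```

Storing the plot in ma_graph means label layers can be added later with + without repeating the base code.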
