How to count by group in R

Counting by multiple groups — sometimes called crosstab reports — can be a useful way to look at data ranging from public opinion surveys to medical tests. For example, how did people vote by gender and age group? How many software developers who use both R and Python are men vs. women?

There are a lot of ways to do this kind of counting by categories in R. Here, I’d like to share some of my favorites.
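As a baseline before the favorites, here is a minimal sketch of a two-way count in base R with table(). The toy data frame and its values are made up for illustration and are not taken from the survey.

```r
# Hypothetical toy data, not the survey data used in this article
toy <- data.frame(
  Gender   = c("Man", "Woman", "Man", "Woman", "Man"),
  Hobbyist = c("Yes", "Yes", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# table() with two vectors returns a count for every combination of values
counts <- table(toy$Gender, toy$Hobbyist)
print(counts)

# Individual cells can be looked up by row and column label
counts["Man", "Yes"]
```

Combinations that never occur, such as Woman and No here, still appear in the result with a count of zero.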

For the demos in this article, I’ll use a subset of the Stack Overflow Developer Survey, which asks developers about dozens of topics ranging from salaries to technologies used. I’ll whittle it down to columns for languages used, gender, and whether they code as a hobby. I also added my own LanguageGroup column for whether a developer reported using R, Python, both, or neither.
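One way a column like LanguageGroup could be derived is sketched below. It is an illustration rather than the exact code behind the data set: the sample strings are made up, and the labels "Both" and "R" are assumptions (only "Python" and "Neither" appear in the output shown in this article). Splitting on the semicolon matters because a plain search for the letter R would also match languages such as Ruby.

```r
# Hypothetical sample values in the semicolon-separated survey format
langs <- c(
  "HTML/CSS;Java;JavaScript;Python",
  "C++;HTML/CSS;Python",
  "HTML/CSS",
  "Python;R;SQL",
  "R;SQL"
)

# Split each response into a vector of individual language names
lang_list <- strsplit(langs, ";", fixed = TRUE)

# Exact matching on whole names avoids false hits like "Ruby" for "R"
uses_r      <- vapply(lang_list, function(x) "R" %in% x, logical(1))
uses_python <- vapply(lang_list, function(x) "Python" %in% x, logical(1))

# Assumed labels: "Both", "R", "Python", "Neither"
language_group <- ifelse(uses_r & uses_python, "Both",
                  ifelse(uses_r, "R",
                  ifelse(uses_python, "Python", "Neither")))

print(language_group)
```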

If you’d like to follow along, the last page of this article has instructions on how to download and wrangle the data to get the same data set I’m using.

The data has one row for each survey response, and all four columns are character vectors.

str(mydata)
'data.frame':	83379 obs. of  4 variables:
 $ Gender            : chr  "Man" "Man" "Man" "Man" ...
 $ LanguageWorkedWith: chr  "HTML/CSS;Java;JavaScript;Python" "C++;HTML/CSS;Python" "HTML/CSS" "C;C++;C#;Python;SQL" ...
 $ Hobbyist          : chr  "Yes" "No" "Yes" "No" ...
 $ LanguageGroup     : chr  "Python" "Python" "Neither" "Python" ...

I filtered the raw data to make the crosstabs more manageable, including removing rows with missing values and keeping only the two largest gender categories, Man and Woman.
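That kind of filtering can be sketched in base R as follows. The data frame raw and its values, including the "Other" label, are hypothetical stand-ins; the actual wrangling steps for the survey data may differ.

```r
# Hypothetical stand-in for the raw survey data
raw <- data.frame(
  Gender   = c("Man", "Woman", NA, "Other", "Man"),
  Hobbyist = c("Yes", "No", "Yes", "Yes", NA),
  stringsAsFactors = FALSE
)

# Drop any row that has a missing value in any column
no_missing <- raw[complete.cases(raw), ]

# Keep only the two gender categories used in this article
filtered <- no_missing[no_missing$Gender %in% c("Man", "Woman"), ]

print(filtered)
```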
