Learning Objectives

For more details and additional tutorials on using ggplot2, see the official documentation.

Load required packages.

library(dplyr)
library(ggplot2)

Load or reload NHTS data.

The read_csv() function has a handy na = c(...) input that allows us to quickly deal with the many, many definitions of missing that NHTS has.

nhts_per <- readr::read_csv("data/nhts_per.csv", 
                            na = c("", "NA", -1, -8, -5, -4)) %>%
  select(HOUSEID, HHSIZE, HHVEHCNT, TIMETOWK, YEARMILE, USEPUBTR)

ggplot2 Basics

We will make two plots using the ggplot2 package. ggplot2 is a plotting package that makes it simple to create complex plots from data in a dataframe. It uses default settings, which help create publication quality plots with a minimal amount of settings and tweaking.

ggplot2 graphics are built step by step by adding elements.

To build a ggplot we need to:

  1. Bind the plot to a specific data frame using the data argument.
ggplot(data = nhts_per)
  1. Define aesthetics (aes), which maps variables in the data to axes on the plot or to plotting size, shape color, etc.

Think of aes() like the data_frame() function we used before.

ggplot(data = nhts_per, aes(x = TIMETOWK, y = YEARMILE))
  1. Add geoms, which define the graphical representation of the data in the plot (points, lines, bars). To add a geom to the plot use + operator. Anything you put in the ggplot() function can be seen by any geom layers that you add.
ggplot(data = nhts_per, aes(x = TIMETOWK, y = YEARMILE)) +
  geom_point()
## Warning: Removed 136 rows containing missing values (geom_point).

  1. (optional) Add better labeling.
ggplot(data = nhts_per, aes(x = TIMETOWK, y = YEARMILE)) +
  geom_point() +
  labs(title = 'Yearly Mileage vs Commute Time',
       x = 'Time to Work (mins)',
       y = 'Yearly Mileage')
## Warning: Removed 136 rows containing missing values (geom_point).

  1. (optional) Add a theme if you want.
ggplot(data = nhts_per, aes(x = TIMETOWK, y = YEARMILE)) +
  geom_point() +
  labs(title = 'Yearly Mileage vs Commute Time',
       x = 'Time to Work (mins)',
       y = 'Yearly Mileage') +
  theme_bw()
## Warning: Removed 136 rows containing missing values (geom_point).

Sometimes adding color and transparency adds meaning to the plot. In this case, since we limited the number of rows of the NHTS data to 200, there is not much overlap. But with a lot of points, alpha can be very helpful.

ggplot(data = nhts_per, aes(x = TIMETOWK, y = YEARMILE)) +
  geom_point(alpha = 0.3, color = "blue") +
  labs(title = 'Yearly Mileage vs Commute Time',
       x = 'Time to Work (mins)',
       y = 'Yearly Mileage') +
  theme_bw()
## Warning: Removed 136 rows containing missing values (geom_point).

Barplot

Let’s try a barplot.

ggplot(data = nhts_per, aes(x = HHVEHCNT)) +
  geom_bar() 

But this looks a bit funny right? What if, for example, NHTS coded “8” to actually be any household with 8 or more vehicles. We can instead treat this variable as categories, which in R are called factors.

ggplot(data = nhts_per, aes(x = as.factor(HHVEHCNT))) +
  geom_bar() +
  scale_fill_brewer()

Then maybe we want to get even fancier and see how household size plays into this distribution.

ggplot(data = nhts_per, aes(x = as.factor(HHVEHCNT), 
                            fill = as.factor(HHSIZE))) +
  geom_bar()

And maybe we don’t like those colors, so we’ll change them.

ggplot(data = nhts_per, aes(x = as.factor(HHVEHCNT), 
                            fill = as.factor(HHSIZE))) +
  geom_bar() +
  scale_fill_brewer()

Faceting

ggplot2 has a special technique called faceting that allows to split one plot into multiple plots based on some factor. Let’s take the same graph and add another dimension. Let’s look at household size and number of vehicles BY transit useage.

ggplot(data = nhts_per, aes(x = as.factor(HHVEHCNT), 
                            fill = as.factor(HHSIZE))) +
  geom_bar() +
  scale_fill_brewer() +
  facet_grid(. ~ USEPUBTR)

Challenge

Update the labeling so that people can understand the plot.

ggplot(data = nhts_per, aes(x = as.factor(HHVEHCNT), 
                            fill = as.factor(HHSIZE))) +
  geom_bar() +
  scale_fill_brewer() +
  facet_grid(. ~ USEPUBTR) +
  labs(title = 'Household Size by Number of Vehicles by Transit Use',
       x = 'Household Size',
       y = 'Count') +
  guides(fill = guide_legend(title = "Num of Vehs"))

Enjoy plotting with ggplot2! Use Google search, other online tutorials, and R’s help functionality to teach yourself more.