base_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()
base_plot
Tuesday September 30, 2025 at 8:37 AM
Yep. Leland Wilkinson, the guy who invented the idea of the grammar of graphics and literally wrote the book The Grammar of Graphics, was actually a semi-cofounder of Tableau. As a result, Tableau uses the same grammar of graphics idea: you map data to aesthetics and show it with geoms (though they don’t call them geoms there).
So why R and not Tableau then?
Two main reasons: (1) R is free and you’ll be able to use it after you graduate, and (2) using code to make graphics makes those graphics reproducible and easier to change. If you want to recreate a Tableau visualization, you have to document everything you click on. If you want to recreate someone else’s Tableau visualization, you have to reverse engineer it by clicking around a bunch. With code-based graphics, once you have a good foundation in R, you can copy, paste, and adapt code to make your plots, make modified versions of your plots, or borrow others’ code.
Also, Tableau doesn’t have the huge community of developers that R and ggplot have. There’s no {gghalves} or {ggridges} or any of the other gg-related packages that R has. If you go to the Packages panel in RStudio, click on “Install packages”, and start typing “gg”, you’ll see a ton of ggplot-related packages. I have no idea what most of these do (you can see {ggbeeswarm} there!), but they all do something with ggplot. One of those packages is called {ggbrain}, and it lets you plot brain images using ggplot. {ggarchery} sounds neat too: it lets you make nicer arrows for annotating things.

Tableau doesn’t have this kind of broader extension community.
After this class, you’ll be nice and comfortable with the grammar of graphics and should be able to quickly pick up Tableau or other plotting libraries that use the idea of aesthetics and geoms (like Python’s plotnine or seaborn.objects, or JavaScript’s Observable Plot).
Knowing ggplot will make you a better Tableau user.
ggsave() instead of just rendering a Quarto document?
Over the course of the semester, you’ve been rendering PDFs and Word files, and your plots have appeared in those documents, mixed in with the text and code and output. That’s all fine and great.
Sometimes, though, you need a file that is only the plot, not all the other text. You’ll use these standalone images in Session 10, Mini Project 2, and the final project. In real life, you’ll often need to post a PNG version of a graph on a website or on social media. To create a standalone image, you use ggsave().
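Here’s a minimal sketch of what that looks like, assuming {ggplot2} is loaded. The file name, dimensions, and resolution are arbitrary choices for illustration:

```r
library(ggplot2)

# Build a plot and store it in an object
my_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# Save only the plot (no surrounding text or code) as a PNG file.
# width and height are in inches unless you change the units argument
ggsave("my_plot.png", plot = my_plot, width = 6, height = 4, dpi = 300)
```

ggsave() works out the file type from the extension, so changing the file name to "my_plot.pdf" would save a PDF instead.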
theme() but when I used ggsave(), none of them actually saved. Why?
This is a really common occurrence, so don’t worry! And it’s easy to fix!
In the code I gave you in exercise 5, you stored the results of ggplot() as an object named base_plot, like this (this isn’t the same data as exercise 5, but shows the same general principle):
That base_plot object contains the basic underlying plot that you wanted to adjust. You then used it with {ggThemeAssist} to make modifications, something like this:
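The exact code {ggThemeAssist} generates depends on what you click, so here’s a hypothetical stand-in. The particular theme() settings are made up for illustration:

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# Add theme() changes on top of base_plot. This displays the modified
# plot, but the changes are not stored in any object
base_plot +
  theme(
    plot.background = element_rect(fill = "lightyellow"),
    axis.title = element_text(face = "bold", color = "purple"),
    legend.position = "bottom"
  )
```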

That’s great and nice and ugly and it displays in your document just fine. If you then use ggsave() like this:
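Continuing that hypothetical example, the save step would look something like this. The file name and size are placeholders:

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# The theme() changes are displayed but never assigned to anything
base_plot +
  theme(plot.background = element_rect(fill = "lightyellow"))

# This saves base_plot, which still has none of the theme() changes
ggsave("base_plot.png", plot = base_plot, width = 6, height = 4)
```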
…you’ll see that it doesn’t actually save any of the theme() changes. That’s because it’s saving the base_plot object, which is just the underlying plot from before you added the theme changes.
If you want to keep the theme changes you make, you need to store them in an object, either overwriting the original base_plot object, or creating a new object:
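Here’s a sketch of both options, using a hypothetical theme() tweak as a stand-in:

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# Option 1: store the modified plot in a new object
fancy_plot <- base_plot +
  theme(plot.background = element_rect(fill = "lightyellow"))

# Option 2: overwrite base_plot itself with the modified version
# base_plot <- base_plot +
#   theme(plot.background = element_rect(fill = "lightyellow"))

# Saving the object that holds the changes keeps them in the file
ggsave("fancy_plot.png", plot = fancy_plot, width = 6, height = 4)
```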
theme() and then wiped them out with theme_minimal()?
Yes, the order matters! The order doesn’t matter within theme(), but you can accidentally undo all your theme adjustments if you’re not careful.
For instance, here I make a bunch of changes to the theme, like making the title bold, bigger, and red, and moving the caption to be left-aligned:
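Here’s a sketch of those changes. The title and caption text and the exact size multiplier are placeholders:

```r
library(ggplot2)

themed_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Engine size and highway fuel efficiency",
    caption = "Source: the mpg dataset from {ggplot2}"
  ) +
  theme(
    # Bold, bigger (1.6 times the base size), and red
    plot.title = element_text(face = "bold", size = rel(1.6), color = "red"),
    # hjust = 0 pushes the caption to the left edge
    plot.caption = element_text(hjust = 0)
  )

themed_plot
```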

Then later I decide that I want to use one of the built-in themes like theme_bw() to quickly get rid of the gray background:

Oh no! The red title and left-aligned caption are gone! What happened?
The built-in themes like theme_gray() (the default), theme_bw(), theme_minimal(), and so on are really collections of a bunch of different theme presets. You can actually see what all the settings are if you run the function without the parentheses. Notice how theme_bw() is really just theme_grey() with some extra settings, like making panel.background white:
theme_bw
## function (base_size = 11, base_family = "", base_line_size = base_size/22,
## base_rect_size = base_size/22)
## {
## theme_grey(base_size = base_size, base_family = base_family,
## base_line_size = base_line_size, base_rect_size = base_rect_size) %+replace%
## theme(panel.background = element_rect(fill = "white",
## colour = NA), panel.border = element_rect(fill = NA,
## colour = "grey20"), panel.grid = element_line(colour = "grey92"),
## panel.grid.minor = element_line(linewidth = rel(0.5)),
## strip.background = element_rect(fill = "grey85",
## colour = "grey20"), complete = TRUE)
## }
## <bytecode: 0x12542b190>
## <environment: namespace:ggplot2>
If you do this to your plot:
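For example, something like this, where the title, caption, and specific settings are stand-ins:

```r
library(ggplot2)

wrong_order_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(title = "Engine size and highway fuel efficiency", caption = "Source: mpg") +
  theme(
    plot.title = element_text(face = "bold", size = rel(1.6), color = "red"),
    plot.caption = element_text(hjust = 0)
  ) +
  # theme_bw() is a complete theme, so adding it here replaces
  # every setting from the theme() layer above
  theme_bw()

wrong_order_plot
```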
…it will modify the title and caption as expected, but then theme_bw() will overwrite all those changes with its own presets.
If you want to use a built-in theme like theme_bw() and make other modifications, the order matters. Use the built-in theme first, then make specific changes with theme():
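Here’s a hypothetical plot with the layers in the safe order. The settings are placeholders:

```r
library(ggplot2)

right_order_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(title = "Engine size and highway fuel efficiency", caption = "Source: mpg") +
  # Complete theme first...
  theme_bw() +
  # ...then the specific tweaks, which modify theme_bw() instead of being erased
  theme(
    plot.title = element_text(face = "bold", size = rel(1.6), color = "red"),
    plot.caption = element_text(hjust = 0)
  )

right_order_plot
```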
No!
Recall from the lesson for session 5 that you can store all the theme settings as a separate object. If, for instance, I want to use the combination of theme_bw() with a red title and left-aligned caption like the question above, I can store those settings as an object first:
And then I can add my_cool_theme to any other plot:
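Here’s a sketch of both steps: storing the theme, then adding it to an arbitrary plot. The settings mirror the description above (theme_bw(), a red bold title, a left-aligned caption), and the plot itself is a placeholder:

```r
library(ggplot2)

# Store a complete theme plus extra tweaks in a single object
my_cool_theme <- theme_bw() +
  theme(
    plot.title = element_text(face = "bold", size = rel(1.6), color = "red"),
    plot.caption = element_text(hjust = 0)
  )

# Add the stored theme to any plot like a regular layer
another_plot <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  labs(title = "City vs. highway MPG", caption = "Source: mpg") +
  my_cool_theme

another_plot
```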

That’s a totally normal and common approach to all this.
theme_set()
Another thing you can do is use theme_set(), which will change the default setting for all plots in the session. You can include it up near the top of your document after you load your packages.
See this plot? There’s no extra theme layer at the end because it was set with theme_set():
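Here’s a minimal sketch, assuming you want theme_bw() as the default:

```r
library(ggplot2)

# Near the top of the document, after loading packages
theme_set(theme_bw())

# No theme layer here, but the plot uses theme_bw() automatically
auto_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

auto_plot
```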

Any plots you make will now use theme_bw() automatically and you don’t have to add it yourself.
You can even pass it a theme object like my_cool_theme:

Now any plots you make will use those my_cool_theme settings automatically:
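Here’s a sketch of that. It re-creates a hypothetical my_cool_theme so the block runs on its own:

```r
library(ggplot2)

my_cool_theme <- theme_bw() +
  theme(
    plot.title = element_text(face = "bold", size = rel(1.6), color = "red"),
    plot.caption = element_text(hjust = 0)
  )

# Make my_cool_theme the default for every plot after this point
theme_set(my_cool_theme)

# No theme layer, but the title is red and bold and the caption is left-aligned
cool_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(title = "Engine size and highway fuel efficiency", caption = "Source: mpg")

cool_plot
```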
I do this all the time. See this blog post, for example, where I create a nicer theme that I creatively call theme_nice() and then use theme_set() to use it automatically for all the plots:
# Download Mulish from https://fonts.google.com/specimen/Mulish
theme_nice <- function() {
  theme_minimal(base_family = "Mulish") +
    theme(
      panel.grid.minor = element_blank(),
      plot.background = element_rect(fill = "white", color = NA),
      plot.title = element_text(face = "bold"),
      axis.title = element_text(face = "bold"),
      strip.text = element_text(face = "bold"),
      strip.background = element_rect(fill = "grey80", color = NA),
      legend.title = element_text(face = "bold")
    )
}
theme_set(theme_nice())
rel(1.4) instead of actual numbers when sizing things?
A lot of you wondered about this! When changing the size of text elements like plot titles, axis labels, and so on, you use the size argument in element_text(), like this:
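For example, with hypothetical title and caption text:

```r
library(ggplot2)

sized_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(title = "Engine size and highway fuel efficiency", caption = "Source: mpg") +
  theme(
    # Absolute sizes: 18 points, no matter what base_size is
    plot.title = element_text(size = 18),
    plot.caption = element_text(size = 18)
  )

sized_plot
```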

Here both the title and caption are sized at 18 points (just like choosing 18 point font in Word or Google Docs). It’s totally valid to set font sizes with actual numbers.
However, I rarely do this. Instead, I use rel(...) to size things so that it’s easier to resize plots more dynamically.
All of the built-in theme_*() functions like theme_minimal() and friends have a base_size argument that defaults to 11, meaning that the base font size throughout the plot is 11 points. You can change that to other values, like really big text:

or really small text:
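Both variations look something like this. theme_minimal() and the sizes 20 and 6 are arbitrary picks:

```r
library(ggplot2)

# Hypothetical plot with a title and caption so the size changes are visible
base_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(title = "Engine size and highway fuel efficiency", caption = "Source: mpg")

# Really big text: everything is sized relative to 20 points
big_text_plot <- base_plot + theme_minimal(base_size = 20)

# Really small text: everything is sized relative to 6 points
small_text_plot <- base_plot + theme_minimal(base_size = 6)

big_text_plot
small_text_plot
```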

Notice how all the text elements shrink and grow in relation to the base_size. That’s because the text elements use relative sizing instead of absolute sizing. If you run theme_grey by itself in the console, you’ll see all of the exact default theme settings:
theme_grey
#> lots of output
#> ...
#> plot.title = element_text(size = rel(1.2), hjust = 0,
#> vjust = 1, margin = margin(b = half_line)), plot.title.position = "panel",
#> plot.subtitle = element_text(hjust = 0, vjust = 1, margin = margin(b = half_line)),
#> plot.caption = element_text(size = rel(0.8), hjust = 1,
#> vjust = 1, margin = margin(t = half_line)), plot.caption.position = "panel",
#> ...
Notice that plot.title is sized to rel(1.2) and plot.caption is sized to rel(0.8). That means that the title font size will be 1.2 * base_size and the caption font size will be 0.8 * base_size. With a base size of 11 points, the title will be 13.2 points and the caption will be 8.8 points. If the base size is scaled up to 20, the title will be 1.2 * 20, or 24 points; if the base size is scaled down to 6, the title will be 1.2 * 6, or 7.2 points.
If you use exact numbers for elements and then change base_size, the elements with absolute values will not scale up or down. Like here, if we want tiny text (base_size = 6), the title and caption will still be massive at 18 points:

So instead of working with absolute numbers, I almost always use relative values. If you search my code on GitHub, you’ll see a ton of examples where I’ve done this.
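To see the difference side by side, here’s a sketch with tiny text (base_size = 6). The first version uses absolute sizes and the second uses relative ones. The multipliers rel(1.6) and rel(0.8) are arbitrary choices:

```r
library(ggplot2)

# Hypothetical base plot with a title and caption
base_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(title = "Engine size and highway fuel efficiency", caption = "Source: mpg")

# Absolute sizes ignore base_size: the title and caption stay at 18 points
absolute_plot <- base_plot +
  theme_minimal(base_size = 6) +
  theme(
    plot.title = element_text(size = 18),
    plot.caption = element_text(size = 18)
  )

# Relative sizes follow base_size: 1.6 * 6 = 9.6 points for the title
# and 0.8 * 6 = 4.8 points for the caption
relative_plot <- base_plot +
  theme_minimal(base_size = 6) +
  theme(
    plot.title = element_text(size = rel(1.6)),
    plot.caption = element_text(size = rel(0.8))
  )

absolute_plot
relative_plot
```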
theme()?
theme() lets you adjust how parts of your plot look, but it doesn’t let you adjust the actual content. If you want to change the labels and titles, use labs():
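Here’s a sketch with made-up label text:

```r
library(ggplot2)

labeled_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    x = "Displacement (liters)",  # x-axis title
    y = "Highway MPG",            # y-axis title
    color = "Drive",              # legend title for the color aesthetic
    title = "Bigger engines get worse highway mileage",
    subtitle = "Cars in the mpg dataset",
    caption = "Source: {ggplot2}"
  )

labeled_plot
```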

If you want to adjust the facet titles, you have to make a new column in the data. Facet titles come from the actual data itself. Like here, it says “4”, “f”, and “r”, which are not very helpful:
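Here’s a sketch of a faceted plot like that, assuming a simple scatterplot of displ and hwy:

```r
library(ggplot2)

# Each panel's title comes straight from the values in the drv column
facet_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(vars(drv))

facet_plot
```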

It’s using those labels because that’s what’s in the data:
mpg |>
  select(model, displ, year, drv)
## # A tibble: 234 × 4
## model displ year drv
## <chr> <dbl> <int> <chr>
## 1 a4 1.8 1999 f
## 2 a4 1.8 1999 f
## 3 a4 2 2008 f
## 4 a4 2 2008 f
## 5 a4 2.8 1999 f
## 6 a4 2.8 1999 f
## 7 a4 3.1 2008 f
## 8 a4 quattro 1.8 1999 4
## 9 a4 quattro 1.8 1999 4
## 10 a4 quattro 2 2008 4
## # ℹ 224 more rows
To adjust those, make a new column with better values:
mpg_nice <- mpg |>
  mutate(drv_nice = case_match(drv,
    "4" ~ "Four-wheel drive",
    "f" ~ "Front-wheel drive",
    "r" ~ "Rear-wheel drive"
  ))
mpg_nice |>
  select(model, displ, year, drv, drv_nice)
## # A tibble: 234 × 5
## model displ year drv drv_nice
## <chr> <dbl> <int> <chr> <chr>
## 1 a4 1.8 1999 f Front-wheel drive
## 2 a4 1.8 1999 f Front-wheel drive
## 3 a4 2 2008 f Front-wheel drive
## 4 a4 2 2008 f Front-wheel drive
## 5 a4 2.8 1999 f Front-wheel drive
## 6 a4 2.8 1999 f Front-wheel drive
## 7 a4 3.1 2008 f Front-wheel drive
## 8 a4 quattro 1.8 1999 4 Four-wheel drive
## 9 a4 quattro 1.8 1999 4 Four-wheel drive
## 10 a4 quattro 2 2008 4 Four-wheel drive
## # ℹ 224 more rows
Now you can use drv_nice instead of drv by plotting mpg_nice instead of mpg:
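Here’s a sketch of that. It re-creates mpg_nice so the block runs on its own (case_match() needs {dplyr} 1.1.0 or newer):

```r
library(tidyverse)

mpg_nice <- mpg |>
  mutate(drv_nice = case_match(drv,
    "4" ~ "Four-wheel drive",
    "f" ~ "Front-wheel drive",
    "r" ~ "Rear-wheel drive"
  ))

# Facet on the new column so the panel titles use the nicer labels
nice_facet_plot <- ggplot(mpg_nice, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(vars(drv_nice))

nice_facet_plot
```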
This is a weird little quirk about fonts on a Mac. See here for full details.
The short version of how to fix it is to tell R and Quarto to use the Cairo PDF rendering program when creating a PDF. Cairo supports custom fonts, while R’s default PDF rendering program does not.
Add this to the metadata section of your Quarto file to get it working:
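If you’d rather set the device from R, one alternative (assuming your document uses the knitr engine) is to set the dev chunk option in a setup chunk near the top of the file:

```r
# Tell knitr to create every plot after this chunk with the Cairo PDF device.
# This only makes sense when you're rendering to PDF
knitr::opts_chunk$set(dev = "cairo_pdf")
```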
If you’re trying to save a PDF with ggsave(), you can specify the Cairo engine with the device argument:
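Here’s a sketch with a placeholder file name and size:

```r
library(ggplot2)

my_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# cairo_pdf() comes from {grDevices}, which R loads automatically
ggsave("my_plot.pdf", plot = my_plot, width = 6, height = 4, device = cairo_pdf)
```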
Absolutely! We don’t have time in this class to cover tables, but there’s a whole world of packages for making beautiful tables with R. Four of the more common ones are {tinytable}, {gt}, {kableExtra}, and {flextable}:
| Package | HTML output | PDF output | Word output | Examples | Notes |
|---|---|---|---|---|---|
| {tinytable} | Great | Great | Okay | Examples | Simple, straightforward, and lightweight. It has fantastic support for HTML and it has the absolute best support for PDF, both with Typst and LaTeX. |
| {gt} | Great | Okay | Okay | Examples | Has the goal of becoming the “grammar of tables” (hence “gt”). It is supported by developers at Posit and gets updated and improved regularly. |
| {flextable} | Great | Okay | Great | Examples | Works really well for HTML output and has the best support for Word output. |
| {kableExtra} | (Once) Great | (Once) Great | Okay | Examples | Worked really well for HTML output and had great support for PDF output, but development has stalled for the past few years and it seems to be abandoned, which is sad. |
Here’s a quick illustration of these four packages. All four are incredibly powerful and let you do all sorts of really neat formatting things ({gt} even makes interactive HTML tables!), so make sure you check out the documentation and examples. I personally use {tinytable}, {gt}, and {flextable} for my tables, depending on which output I’m working with. When rendering to HTML, I use {tinytable} or {gt}; when rendering to PDF, I use {tinytable}; when rendering to Word, I use {flextable}.
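The table examples below use a cars_summary data frame that isn’t created in this section. Based on the tables’ columns, it looks like an ungrouped summary of highway MPG by year and drive type. Here’s a sketch that should produce that shape, assuming {tidyverse} is installed (the column names are taken from the code below):

```r
library(tidyverse)

# Summarize highway MPG for each combination of year and drive type
cars_summary <- mpg |>
  group_by(year, drv) |>
  summarize(
    n = n(),
    avg_mpg = mean(hwy),
    median_mpg = median(hwy),
    min_mpg = min(hwy),
    max_mpg = max(hwy),
    .groups = "drop"  # return an ungrouped data frame
  )

cars_summary
```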
library(tinytable)
cars_summary |>
  select(
    Drive = drv, N = n, Average = avg_mpg, Median = median_mpg,
    Minimum = min_mpg, Maximum = max_mpg
  ) |>
  tt() |>
  group_tt(
    i = list("1999" = 1, "2008" = 4),
    j = list("Highway MPG" = 3:6)
  ) |>
  format_tt(j = 3, digits = 4) |>
  style_tt(i = c(1, 5), bold = TRUE, line = "b", line_width = 0.1, line_color = "#dddddd") |>
  style_tt(j = 2:6, align = "c")
| | | Highway MPG | | | |
|---|---|---|---|---|---|
| Drive | N | Average | Median | Minimum | Maximum |
| 1999 | | | | | |
| 4 | 49 | 18.84 | 17 | 15 | 26 |
| f | 57 | 27.91 | 26 | 21 | 44 |
| r | 11 | 20.64 | 21 | 16 | 26 |
| 2008 | | | | | |
| 4 | 54 | 19.48 | 19 | 12 | 28 |
| f | 49 | 28.45 | 29 | 17 | 37 |
| r | 14 | 21.29 | 21 | 15 | 26 |
library(gt)
cars_summary |>
  gt() |>
  cols_label(
    drv = "Drive",
    n = "N",
    avg_mpg = "Average",
    median_mpg = "Median",
    min_mpg = "Minimum",
    max_mpg = "Maximum"
  ) |>
  tab_spanner(
    label = "Highway MPG",
    columns = c(avg_mpg, median_mpg, min_mpg, max_mpg)
  ) |>
  fmt_number(
    columns = avg_mpg,
    decimals = 2
  ) |>
  tab_options(
    row_group.as_column = TRUE
  )
| | | | Highway MPG | | | |
|---|---|---|---|---|---|---|
| year | Drive | N | Average | Median | Minimum | Maximum |
| 1999 | 4 | 49 | 18.84 | 17 | 15 | 26 |
| 1999 | f | 57 | 27.91 | 26 | 21 | 44 |
| 1999 | r | 11 | 20.64 | 21 | 16 | 26 |
| 2008 | 4 | 54 | 19.48 | 19 | 12 | 28 |
| 2008 | f | 49 | 28.45 | 29 | 17 | 37 |
| 2008 | r | 14 | 21.29 | 21 | 15 | 26 |
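The next table has no code attached in this section. Given the order of the packages in the list above, it’s probably the {kableExtra} version. Here’s a sketch of how a table with that layout could be built with {kableExtra}. It assumes the same cars_summary shape as the other examples and re-creates it so the block runs on its own:

```r
library(tidyverse)
library(kableExtra)

cars_summary <- mpg |>
  group_by(year, drv) |>
  summarize(
    n = n(), avg_mpg = mean(hwy), median_mpg = median(hwy),
    min_mpg = min(hwy), max_mpg = max(hwy),
    .groups = "drop"
  )

kable_table <- cars_summary |>
  select(-year) |>
  kbl(
    format = "html",
    col.names = c("Drive", "N", "Average", "Median", "Minimum", "Maximum"),
    digits = 2
  ) |>
  # Spanner over the four MPG columns; the first two columns get a blank header
  add_header_above(c(" " = 2, "Highway MPG" = 4)) |>
  # Group rows 1-3 under 1999 and rows 4-6 under 2008
  pack_rows("1999", 1, 3) |>
  pack_rows("2008", 4, 6) |>
  kable_styling()

kable_table
```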
| | | Highway MPG | | | |
|---|---|---|---|---|---|
| Drive | N | Average | Median | Minimum | Maximum |
| 1999 | | | | | |
| 4 | 49 | 18.84 | 17 | 15 | 26 |
| f | 57 | 27.91 | 26 | 21 | 44 |
| r | 11 | 20.64 | 21 | 16 | 26 |
| 2008 | | | | | |
| 4 | 54 | 19.48 | 19 | 12 | 28 |
| f | 49 | 28.45 | 29 | 17 | 37 |
| r | 14 | 21.29 | 21 | 15 | 26 |
library(flextable)
cars_summary |>
  rename(
    "Year" = year,
    "Drive" = drv,
    "N" = n,
    "Average" = avg_mpg,
    "Median" = median_mpg,
    "Minimum" = min_mpg,
    "Maximum" = max_mpg
  ) |>
  mutate(Year = as.character(Year)) |>
  flextable() |>
  colformat_double(j = "Average", digits = 2) |>
  add_header_row(values = c(" ", "Highway MPG"), colwidths = c(3, 4)) |>
  align(i = 1, part = "header", align = "center") |>
  merge_v(j = ~ Year) |>
  valign(j = 1, valign = "top")
| | | | Highway MPG | | | |
|---|---|---|---|---|---|---|
| Year | Drive | N | Average | Median | Minimum | Maximum |
| 1999 | 4 | 49 | 18.84 | 17 | 15 | 26 |
| | f | 57 | 27.91 | 26 | 21 | 44 |
| | r | 11 | 20.64 | 21 | 16 | 26 |
| 2008 | 4 | 54 | 19.48 | 19 | 12 | 28 |
| | f | 49 | 28.45 | 29 | 17 | 37 |
| | r | 14 | 21.29 | 21 | 15 | 26 |
You can also create more specialized tables for specific situations, like side-by-side regression results tables with {modelsummary} (which can use {tinytable}, {gt}, {kableExtra}, or {flextable} behind the scenes):
library(modelsummary)
model1 <- lm(hwy ~ displ, data = mpg)
model2 <- lm(hwy ~ displ + drv, data = mpg)
modelsummary(
  list(model1, model2),
  stars = TRUE,
  # Rename the coefficients
  coef_rename = c(
    "(Intercept)" = "Intercept",
    "displ" = "Displacement",
    "drvf" = "Drive (front)",
    "drvr" = "Drive (rear)"
  ),
  # Get rid of some of the extra goodness-of-fit statistics
  gof_omit = "IC|RMSE|F|Log",
  # Use {tinytable}
  output = "tinytable"
)
| | (1) | (2) |
|---|---|---|
| Intercept | 35.698*** | 30.825*** |
| | (0.720) | (0.924) |
| Displacement | -3.531*** | -2.914*** |
| | (0.195) | (0.218) |
| Drive (front) | | 4.791*** |
| | | (0.530) |
| Drive (rear) | | 5.258*** |
| | | (0.734) |
| Num.Obs. | 234 | 234 |
| R2 | 0.587 | 0.736 |
| R2 Adj. | 0.585 | 0.732 |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
As you’ve read, double encoding aesthetics can be helpful for accessibility and printing reasons—for instance, if points have colors and shapes, they’re still readable by people who are colorblind or if the image is printed in black and white:
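Here’s a minimal sketch of double encoding, with drv mapped to both color and shape:

```r
library(ggplot2)

# drv is mapped to both color and shape, so the groups stay
# distinguishable even without color
double_encoded <- ggplot(mpg, aes(x = displ, y = hwy, color = drv, shape = drv)) +
  geom_point(size = 2)

double_encoded
```

Because both aesthetics use the same variable with the same title, ggplot merges them into a single legend.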
Sometimes the double encoding can be excessive though, and you can safely remove legends. For example, in exercises 3 and 4, you made bar charts showing counts of different things (words spoken in The Lord of the Rings; pandemic-era construction projects in New York City), and lots of you colored the bars, which is great!

Car drive here is double encoded: it’s on the x-axis and it’s the fill. That’s great, but having the legend here is actually a little excessive. Both the x-axis and the legend tell us which drive type (four-, front-, or rear-wheel drive) each bar represents, so we can safely remove the legend and get a little more space in the plot area:
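One way to do that is guides(fill = "none"). Here’s a sketch, with a basic bar chart of drive types standing in for the exercise plots:

```r
library(ggplot2)

no_legend_plot <- ggplot(mpg, aes(x = drv, fill = drv)) +
  geom_bar() +
  # The x-axis already labels each drive type, so drop the fill legend
  guides(fill = "none")

no_legend_plot
```

theme(legend.position = "none") also works, but it hides every legend in the plot rather than just the fill one.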
Yes! Later in the semester we’ll cover annotations, but in the meantime, you can check out a couple of packages that let you directly label geoms that have been mapped to aesthetics.
The {geomtextpath} package lets you add labels directly to paths and lines with functions like geom_textline(), geom_labelline(), and geom_labelsmooth().
Like, here’s the relationship between penguin bill lengths and penguin weights across three different species:
# {geomtextpath} is on CRAN, so you can install it with install.packages("geomtextpath")
library(tidyverse)      # For ggplot() and drop_na()
library(geomtextpath)
library(palmerpenguins) # Penguin data
# Get rid of the rows that are missing sex
penguins <- penguins |> drop_na(sex)
ggplot(
  penguins,
  aes(x = bill_length_mm, y = body_mass_g, color = species)
) +
  geom_point(alpha = 0.5) + # Make the points a little bit transparent
  geom_labelsmooth(
    aes(label = species),
    # This spreads the letters out a bit
    text_smoothing = 80
  ) +
  # Turn off the legend bc we don't need it now
  guides(color = "none")
And the average continent-level life expectancy across time:
library(gapminder)
gapminder_lifeexp <- gapminder |>
  group_by(continent, year) |>
  summarize(avg_lifeexp = mean(lifeExp))
ggplot(
  gapminder_lifeexp,
  aes(x = year, y = avg_lifeexp, color = continent)
) +
  geom_textline(
    aes(label = continent, hjust = continent),
    linewidth = 1, size = 4
  ) +
  guides(color = "none")
A new package named {ggdirectlabel} lets you add legends directly to your plot area:
# This also isn't on CRAN, so you need to install it by running this:
# remotes::install_github("MattCowgill/ggdirectlabel")
library(ggdirectlabel)
ggplot(
  penguins,
  aes(x = bill_length_mm, y = body_mass_g, color = species)
) +
  geom_point(alpha = 0.5) +
  geom_smooth() +
  geom_richlegend(
    aes(label = species), # Use the species as the fake legend labels
    legend.position = "topleft", # Put it in the top left
    hjust = 0 # Make the text left-aligned (horizontal adjustment, or hjust)
  ) +
  guides(color = "none")
Sometimes when you have to create a grayscale plot (like for a document that will only be printed in black and white), it’s helpful to fill areas with patterns (stripes, dots, squares, etc.) instead of colors.
You can do this with ggplot plots with the {ggpattern} package:
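Here’s a minimal sketch of a grayscale bar chart that uses patterns instead of colors, assuming {ggpattern} is installed. The pattern choices and the density and spacing values are arbitrary:

```r
library(ggplot2)
library(ggpattern)

pattern_plot <- ggplot(mpg, aes(x = drv, pattern = drv)) +
  geom_bar_pattern(
    fill = "white", color = "black",  # plain white bars with black outlines
    pattern_fill = "black",           # draw the patterns in black
    pattern_density = 0.1,
    pattern_spacing = 0.03
  ) +
  # One pattern per drive type instead of one color per drive type
  scale_pattern_manual(values = c("stripe", "crosshatch", "circle"))

pattern_plot
```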

There are so many examples of things you can do at {ggpattern}’s documentation site (use the “Articles” link in the top navigation bar there). You can use images as patterns:

I use it here to fill things that show the differences between categories. Like “Often” is blue, “Sometimes” is yellow, and “Rarely” is red, so these densities show the differences between those categories (often minus sometimes is blue and yellow, etc.):

You can even animate the patterns. Play around with it—it’s neat!