Data Visualization

Yebelay Berehan

Center for Evaluation and Development (C4ED)



2026-06-11

Roadmap

  1. Foundations: what data visualization is and what to ask before you plot
  2. The grammar of graphics: the building blocks of ggplot2
  3. Hands-on: building a plot layer by layer with the palmerpenguins data
  4. Customization: axes, titles, legends, themes, and backgrounds
  5. Color: palettes for discrete and continuous data, colorblind-friendly choices
  6. Annotation: reference lines, text, and labels
  7. Coordinates, scales, and smoothing
  8. Facets and multi-panel figures
  9. A gallery of geoms: bar, histogram, boxplot, violin, line
  10. Interactive graphics and saving your plots
  11. From beginner to advanced: principles, chart choice, and a guided learning path

1. Foundations

What is data visualization?

  • Data visualization is the presentation of data in a pictorial or graphical format; a data visualization tool is the software that generates this presentation.
  • Effective data visualization gives users intuitive means to:
    • interactively explore and analyze data,
    • identify interesting patterns,
    • infer correlations and causalities, and
    • support sense-making activities.

Good visual presentation enhances the message: the same data can inform or mislead depending on how it is shown.

Ask before you plot

The effectiveness of a visualization depends on a few key questions:

  • What would you like to communicate?
  • Who is your audience? Researchers? Journalists? The general public? Grant reviewers?
  • How is your message best represented?
    • Is it through a box plot or a scatter plot?
    • Should you use blue or red?
    • What scale should you use?
    • Should you add or remove information?

These choices drive the key principles, methods, and concepts needed to visualize data for publications, reports, or presentations.

The R visualization toolbox

  • ggplot2: grammar of graphics
  • cowplot: composing ggplots
  • ggforce: visual data investigations
  • ggrepel: nice text labeling
  • ggridges: ridge plots
  • ggsci: nice color palettes
  • ggtext: advanced text rendering
  • ggthemes: additional themes
  • grid: creating graphical objects
  • gridExtra: additional grid functions
  • patchwork: multi-panel plots
  • prismatic: manipulating colors
  • rcartocolor: great color palettes
  • scico: perceptually uniform palettes
  • showtext: custom fonts
  • echarts4r: interactive visualizations
  • ggiraph: interactive visualizations
  • highcharter: interactive visualizations
  • plotly: interactive visualizations
Code
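# install the CRAN packages used in this training
# (grid ships with base R, so it is not installed from CRAN)
install.packages(
  c("ggplot2", "dplyr", "tibble", "tidyr", "forcats", "purrr", "prismatic", "corrr",
    "palmerpenguins", "cowplot", "ggforce", "ggrepel", "ggridges", "ggsci", "ggtext",
    "ggthemes", "gridExtra", "patchwork", "rcartocolor", "scico", "showtext",
    "ggsignif", "colorBlindness", "Hmisc",
    "shiny", "plotly", "highcharter", "echarts4r", "ggiraph"))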

2. The grammar of graphics

Why ggplot2?

  • ggplot2 is a system for declaratively creating graphics, based on the Grammar of Graphics: a grammar used to describe and create a wide range of statistical graphics.
  • You provide the data, tell ggplot2 how to map variables to aesthetics, and which graphical primitives to use; it takes care of the details.
  • Graphs are composed of layers, so it is easy to add elements to existing graphs.
  • Plots are easy to manage, reproduce, and save.
  • Less work is needed to make beautiful, eye-catching, publication-quality graphics.

The building blocks of a ggplot

  1. Data: the raw data that you want to plot.
  2. Geometries geom_*(): the geometric shapes that represent the data.
  3. Aesthetics aes(): properties of the geometric and statistical objects, such as position, color, size, shape, and transparency.
  4. Scales scale_*(): maps between the data and the aesthetic dimensions, such as data range to plot width or factor values to colors.
  5. Statistical transformations stat_*(): statistical summaries of the data, such as quantiles, fitted curves, and sums.
  6. Coordinate system coord_*(): the transformation used for mapping data coordinates into the plane of the data rectangle.
  7. Facets facet_*(): the arrangement of the data into a grid of plots.
  8. Visual themes theme(): the overall visual defaults of a plot, such as background, grids, axes, default typeface, sizes, and colors.
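
The eight building blocks can be seen together in a single call. This is a minimal sketch, assuming the ggplot2 and palmerpenguins packages are installed; the object name grammar_demo and the specific palette, limits, and theme are illustrative choices, not recommendations from the list above.

```r
library(ggplot2)
library(palmerpenguins)

grammar_demo <- ggplot(
  data = penguins,                                # 1. data
  mapping = aes(x = bill_length_mm,               # 3. aesthetics
                y = bill_depth_mm,
                colour = species)
) +
  geom_point(alpha = 0.7, na.rm = TRUE) +         # 2. geometry
  stat_smooth(method = "lm", formula = y ~ x,     # 5. statistical transformation
              se = FALSE, na.rm = TRUE) +
  scale_colour_brewer(palette = "Dark2") +        # 4. scale
  coord_cartesian(xlim = c(30, 60)) +             # 6. coordinate system
  facet_wrap(~ island) +                          # 7. facets
  theme_minimal()                                 # 8. theme

grammar_demo
```

Each line can be removed or swapped independently, which is the practical benefit of the layered design.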

Components of the layered grammar

  • Layer
    • Data
    • Mapping
    • Statistical transformation (stat)
    • Geometric object (geom)
    • Position adjustment (position)
  • Scale
  • Coordinate system (coord)
  • Faceting (facet)

“Source: BloggoType”

Data and aesthetics

Code
# data and aesthetics
ggplot(data, mapping = aes(x, y, ...))
  • Data must be a data.frame (or tibble) and gets pulled into the ggplot() object.
  • x, y: variables
  • colour: colors the lines of geometries
  • fill: fill color of geometries
  • group: groups based on the data
  • shape: shape of point, an integer value 0 to 25, or NA
  • linetype: type of line, an integer value 0 to 6, or a string
  • size: sizes of elements, a non-negative numeric value
  • alpha: changes the transparency, a numeric value 0 to 1
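
An aesthetic can either be mapped to a variable (inside aes()) or set to a constant (outside aes()). The sketch below illustrates the difference, assuming ggplot2 and palmerpenguins are installed; the object names mapped and fixed are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

# Mapping: colour depends on a variable, so it goes inside aes()
mapped <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm)) +
  geom_point(aes(colour = species), na.rm = TRUE)

# Setting: one fixed colour for all points, so it goes outside aes()
fixed <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm)) +
  geom_point(colour = "steelblue", na.rm = TRUE)

# layer_data() returns the values ggplot2 computed for a layer
length(unique(layer_data(mapped)$colour))  # one colour per species
unique(layer_data(fixed)$colour)           # the single constant colour
```

Only the mapped version produces a legend, because only there does the colour carry information about the data.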

“shape values”

“line type values”

Geometries: geom_*() functions

The general syntax is ggplot(data, aes(mappings)) + geom_function().

Geom | Description | Input
geom_histogram | Histogram | Continuous x
geom_bar | Bar plot with frequencies | Discrete x
geom_point | Points / scatter plot | Discrete or continuous x and y
geom_boxplot | Box plot | Discrete x and continuous y
geom_smooth | Fitted line based on the data | Continuous x and y
geom_line | Line plot | Discrete or continuous x and y
geom_abline | Reference line | intercept and slope values
geom_hline / geom_vline | Horizontal / vertical reference line | yintercept or xintercept

Position adjustments

  • geom_bar(position = "<position>")
  • When aesthetics are mapped, how are the elements positioned?
  • Bar: dodge, fill, stack (default)
  • Point: jitter
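
The three bar positions can be compared by inspecting the bar heights ggplot2 computes. This is a sketch under the assumption that ggplot2 and palmerpenguins are installed; the object names are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

bars <- ggplot(penguins, aes(x = island, fill = species))

stacked <- bars + geom_bar(position = "stack")  # default: counts stacked
dodged  <- bars + geom_bar(position = "dodge")  # bars side by side
filled  <- bars + geom_bar(position = "fill")   # stacked and rescaled to proportions

# The tallest bar differs by position adjustment
max(layer_data(stacked)$ymax)  # largest island total
max(layer_data(dodged)$ymax)   # largest single species-by-island count
max(layer_data(filled)$ymax)   # proportions top out at 1
```

"stack" emphasizes totals, "dodge" emphasizes group-by-group comparison, and "fill" emphasizes composition.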

Statistics and coordinates

  • Statistics (stat_*()) are computed on the data.
    • stat_*() functions perform computations such as means, counts, linear models, and other statistical summaries of data.
  • Coordinates (coord_*()) establish the rules used to print the data:
    • coord_cartesian() for the Cartesian plane,
    • coord_polar() for circular plots,
    • coord_map() for different map projections.
  • Themes (theme()) control the overall appearance of the plot; ggplot2 ships with several predefined themes (covered in Section 4).
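
Values computed by a stat can be used directly in a mapping through after_stat(). The sketch below assumes ggplot2 (version 3.3.0 or later) and palmerpenguins are installed; share_plot is an illustrative name.

```r
library(ggplot2)
library(palmerpenguins)

# geom_bar() uses stat_count() behind the scenes; after_stat() exposes the
# computed `count` so it can be rescaled to a share of all penguins
share_plot <- ggplot(penguins, aes(x = species)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  labs(y = "Share of penguins")

share_plot
```

The data frame itself is never modified: the counting and the rescaling both happen inside the plot.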

Building a plot, step by step

  1. Create a simple plot object: plot.object <- ggplot()
  2. Add geometric layers: plot.object <- plot.object + geom_*()
  3. Add appearance layers: plot.object <- plot.object + coord_*() + theme()
  4. Repeat steps 2 and 3 until satisfied, then print: plot.object or print(plot.object)

3. Hands-on with penguins

The palmerpenguins data

  • We will practice with the palmerpenguins data set: size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.
  • This data set is often used to replace the iris data set, which has some problems for teaching data science, including its ties to eugenics.
Code
library(palmerpenguins)
data(penguins)
  • species, island, and sex are factor variables.
  • The bill measurements depicted in the image are numeric variables.
  • Flipper length and body mass are integer variables.

  • ggplot2 requires the data as an object of class data.frame or tibble (common in the tidyverse). More complex plots also require the long data frame format.
Code
library(tibble)
class(penguins) # all set!
[1] "tbl_df"     "tbl"        "data.frame"
Code
peng <- as_tibble(penguins) # acceptable
class(peng)
[1] "tbl_df"     "tbl"        "data.frame"
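
As an illustration of the long format mentioned above, the following sketch reshapes the two bill measurements into one column so that a single variable can drive the facets. It assumes the tidyr, ggplot2, and palmerpenguins packages are installed; the column names measurement and mm are chosen here for illustration.

```r
library(ggplot2)
library(tidyr)
library(palmerpenguins)

# One row per penguin and measurement, instead of one column per measurement
peng_long <- pivot_longer(
  penguins,
  cols = c(bill_length_mm, bill_depth_mm),
  names_to = "measurement",
  values_to = "mm"
)

# The long format lets one variable define the panels
ggplot(peng_long, aes(x = species, y = mm)) +
  geom_boxplot(na.rm = TRUE) +
  facet_wrap(~ measurement, scales = "free_y")
```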

Scientific questions

How can we graphically address these questions with ggplot2?

  • Is there a relationship between the length and the depth of bills?
  • Do the sizes of the bill and the flipper vary together?
  • How are these measures distributed among the three penguin species?

Building the plot layer by layer

Code
library(ggplot2)
ggplot(data = penguins)

Code
ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm))

Code
ggplot(data = penguins,
       aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

Code
ggplot(data = penguins,
       aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() + facet_wrap(~species) +
  coord_trans(x = "log10", y = "log10")

Code
ggplot(data = penguins,               # Data
       aes(x = bill_length_mm,        # Your X-value
           y = bill_depth_mm,         # Your Y-value
           col = species)) +          # Aesthetics
  geom_point(size = 5, alpha = 0.8) + # Point
  geom_smooth(method = "lm")          # Linear regression

A reusable base plot

To keep the customization examples short, we store two base plots and add layers to them throughout the rest of the training:

Code
# color mapped to a categorical variable (species)
p <- ggplot(penguins,
            aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)")

# color mapped to a continuous variable (body mass)
p2 <- ggplot(penguins,
             aes(x = bill_length_mm, y = bill_depth_mm, color = body_mass_g)) +
  geom_point() +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)")

Every slide that follows builds on p (discrete color) or p2 (continuous color), so you can focus on the one new function being demonstrated.

4. Customization: axes, titles, legends, themes

Anatomy of a themed plot

Customizing plots means adjusting elements to improve readability, presentation, and informativeness. Title and axis components can change in size, color, and face:

Axis titles

labs() modifies plot labels, including the x-axis, y-axis, and plot title:

Code
p + labs(x = "Bill length in millimeters", y = "Bill depth in millimeters")

xlab() and ylab() set the x-axis and y-axis labels individually:

Code
p + xlab("Bill length (mm)") + ylab("Bill depth (mm)")

The name argument of scale functions also sets axis labels:

Code
p + scale_x_continuous(name = "New X Axis Label") +
  scale_y_continuous(name = "New Y Axis Label")

expression() allows mathematical expressions, Greek letters, superscripts, and subscripts in labels:

Code
p + labs(x = expression(paste("X Axis Label with ", mu^2, " and ", sigma)))

Styling axis titles with theme()

element_text() inside theme() sets text properties such as size, color, and font face:

Code
p + theme(axis.title = element_text(size = 15, face = "italic"))

The face argument can be bold, italic, or bold.italic:

Code
p + theme(axis.title = element_text(color = "sienna", size = 15, face = "bold"),
          axis.title.y = element_text(face = "bold.italic"))

vjust controls the vertical alignment, typically ranging between 0 and 1, but it can extend beyond this range:

Code
p + theme(axis.title.x = element_text(vjust = 0, size = 15),
          axis.title.y = element_text(vjust = 2, size = 15))

Use margin() with parameters t (top) and r (right) to add distance between the axis and its title. For the y-axis, change the right margin, not the bottom margin:

Code
p + theme(axis.title.x = element_text(margin = margin(t = 10), size = 15),
          axis.title.y = element_text(margin = margin(r = 10), size = 15))

Axis text and ticks

axis.text (and axis.text.x / axis.text.y) modify the appearance of the axis numbers:

Code
p + theme(axis.text = element_text(color = "dodgerblue", size = 12),
          axis.text.x = element_text(face = "italic"))

angle, hjust, and vjust rotate and position any text element (hjust: 0 = left, 1 = right; vjust: 0 = bottom, 1 = top):

Code
p + theme(axis.text.x = element_text(angle = 50, vjust = 1, hjust = 1, size = 12))

element_blank() removes axis text and ticks entirely:

Code
p + theme(axis.ticks.y = element_blank(), axis.text.y = element_blank())

Remove axis titles by setting them to NULL or empty quotes in labs():

Code
p + labs(x = NULL, y = "")

Tip: NULL removes the element, while empty quotes "" keep the space for the axis title but print nothing.

Axis limits

  • xlim() and ylim() limit the axis range:
Code
p + ylim(c(0, 20))
  • Alternatively, use scale_y_continuous(limits = c(0, 20)) or coord_cartesian(ylim = c(0, 20)). The former, like ylim(), turns data points outside the range into missing values before any statistics are computed, while the latter zooms in without removing data points.
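
The difference between limiting a scale and zooming the coordinate system can be checked by counting the y values that remain in each plot. A minimal sketch, assuming ggplot2 and palmerpenguins are installed; count_y() is a hypothetical helper written for this example.

```r
library(ggplot2)
library(palmerpenguins)

base <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm)) +
  geom_point(na.rm = TRUE)

clipped <- base + scale_y_continuous(limits = c(0, 20))  # out-of-range values become NA
zoomed  <- base + coord_cartesian(ylim = c(0, 20))       # data untouched, view cropped

# Count the y values that survive in each version
count_y <- function(plot) sum(!is.na(layer_data(plot)$y))
count_y(clipped)
count_y(zoomed)
```

The distinction matters most when a layer computes a statistic (a smoother, a boxplot): with scale limits the statistic is computed on the reduced data, with coord_cartesian() on all of it.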

Plot titles

Three functions work together to customize titles:

  1. ggtitle(): sets the text for the main title, for example ggtitle("Main Title").
  2. labs(): sets title, subtitle, caption, and tag in one call.
  3. theme(): styles each element via plot.title, plot.subtitle, plot.caption, and plot.tag, using element_text(face, size, family, hjust, vjust, margin, lineheight).
Code
theme(plot.title    = element_text(face = "bold", size = 14, hjust = 0.5),
      plot.subtitle = element_text(size = 12, hjust = 0.5),
      plot.caption  = element_text(size = 10, hjust = 0),
      plot.tag      = element_text(size = 8,  hjust = 1))

Plot titles in practice

Code
p + labs(title = "Relationship between bill length and depth",
         subtitle = "for different penguin species",
         caption = "scatter plot", tag = "Fig. 1")

Code
p + labs(title = "Relationship between bill length and depth") +
  theme(plot.title = element_text(face = "bold",
                                  margin = margin(10, 0, 10, 0), size = 14))

Code
library(showtext)
# font_add_google() downloads the fonts, so it needs an internet connection
font_add_google("Playfair Display", "Playfair")
font_add_google("Bangers", "Bangers")
showtext_auto()
p + labs(title = "Relationship between bill length and depth",
         subtitle = "for different penguin species") +
  theme(plot.title = element_text(family = "Bangers", hjust = 0.5, size = 25),
        plot.subtitle = element_text(family = "Playfair", hjust = 0.5, size = 15))

Code
# "\n" starts a new line; keep the string on one source line to avoid stray whitespace
p + ggtitle("Relationship between bill length and depth across different\nspecies using scatter plot") +
  theme(plot.title = element_text(lineheight = 0.8, size = 16))

Legends: the default

ggplot2 adds a legend by default when a variable is mapped to an aesthetic. The default legend title is the name of the variable mapped to that aesthetic (here species, mapped to color):

Code
p

Removing the legend

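Three ways to turn the legend off. theme(legend.position = "none") hides every legend, while the guides() and scale_*() versions hide only the legend for color; with a single legend, as here, the result is the same: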

Code
p + theme(legend.position = "none")

Code
p + guides(color = "none")

Code
p + scale_color_discrete(guide = "none")

Legend titles

theme(legend.title = element_blank()) removes the legend title; setting the name to NULL via scale_color_discrete(name = NULL) or labs(color = NULL) achieves the same:

Code
p + theme(legend.title = element_blank())

Three equivalent ways to change the legend title: labs(color = "new title"), scale_color_discrete(name = "new title"), or guides(color = guide_legend("new title")):

Code
p + labs(color = "species\nindicated\nby colors:") +
  theme(legend.title = element_text(family = "Playfair",
                                    color = "blue", size = 14, face = "bold"))

theme(legend.title = element_text(family, color, size, face)) styles the title:

Code
p + theme(legend.title = element_text(family = "Playfair",
                                      color = "chocolate", size = 14, face = "bold"))

Legend position and direction

Code
p + theme(legend.position = "top")

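A coordinate pair (in relative units from 0 to 1) places the legend inside the panel; a transparent background keeps it unobtrusive. From ggplot2 3.5.0 on, the preferred spelling is legend.position = "inside" together with legend.position.inside = c(.15, .15); the numeric form still works but is deprecated: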

Code
p + theme(legend.position = c(.15, .15),
          legend.background = element_rect(fill = "transparent"))

guides(color = guide_legend(direction = "horizontal")) switches the key layout from vertical to horizontal.

Legend keys and labels

Reorder the underlying factor to reorder the legend:

Code
library(dplyr)  # provides %>% and mutate()
penguins1 <- penguins %>%
  mutate(species = factor(species, levels = c("Chinstrap", "Gentoo", "Adelie")))
ggplot(penguins1, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() + labs(x = "Bill length (mm)", y = "Bill depth (mm)")

scale_color_discrete(name, labels) renames the keys:

Code
p + scale_color_discrete(name = "species:",
                         labels = c("Adelie type", "Chinstrap type", "Gentoo type"))

legend.key changes the background boxes; override.aes changes the symbol size:

Code
p + theme(legend.key = element_rect(fill = NA),
          legend.title = element_text(color = "chocolate", size = 14, face = 2)) +
  scale_color_discrete("species:") +
  guides(color = guide_legend(override.aes = list(size = 6)))

For continuous variables the default guide is guide_colorbar(); guide_legend() and guide_bins() show discrete keys instead:

Code
p2 + guides(color = guide_bins())

Complete themes

Elements of a theme, by Isabella Benabaye:

The default theme is theme_gray() (alias theme_grey()); its main arguments set the base font size (base_size, a number) and the font family (base_family, a string such as “serif”, “sans”, or “mono”):

Code
p + theme_grey() + labs(title = "Default: Grey")

  • plot + theme_gray()
  • plot + theme_bw()
  • plot + theme_linedraw()
  • plot + theme_light()
  • plot + theme_dark()
  • plot + theme_minimal()
  • plot + theme_classic()
  • plot + theme_void()

The ggthemes package offers additional predefined themes.

theme() has many arguments to modify individual components of a plot, including:

  • all line, rectangle, text, and title elements,
  • the aspect ratio of the panel,
  • axis title, text, ticks, and lines,
  • legend background, margin, text, title, position, and more,
  • panel border and grid lines.

Backgrounds

The panel background is the area where the data is plotted. panel.background adjusts its fill and outline:

Code
p2 + theme(panel.background = element_rect(fill = "#64D2AA", color = "#64D2AA",
                                           linewidth = 2))

The panel border is an overlay on top of panel.background that outlines the panel:

Code
p2 + theme(panel.border = element_rect(fill = "#64D2AA99", color = "#64D2AA",
                                       linewidth = 2))

The plot background is the entire area of the plot, including the panel and the surrounding space:

Code
p2 + theme(plot.background = element_rect(fill = "gray60", color = "gray30",
                                          linewidth = 2))

Grid lines

panel.grid.major and panel.grid.minor style the two sets of grid lines (panel.grid styles both at once):

Code
p + theme(panel.grid.major = element_line(color = "gray10", linewidth = .5),
          panel.grid.minor = element_line(color = "gray70", linewidth = .25))

panel.grid.major.x, panel.grid.major.y, and their minor counterparts style each axis separately:

Code
p2 + theme(panel.grid.major = element_line(linewidth = .5, linetype = "dashed"),
           panel.grid.minor = element_line(linewidth = .25, linetype = "dotted"),
           panel.grid.major.x = element_line(color = "red1"),
           panel.grid.major.y = element_line(color = "blue1"),
           panel.grid.minor.x = element_line(color = "red4"),
           panel.grid.minor.y = element_line(color = "blue4"))

element_blank() removes grid lines selectively (panel.grid.minor) or entirely (panel.grid):

Code
p + theme(panel.grid = element_blank())

Specify the spacing of grid lines through axis breaks with scale_*_continuous():

Code
p + scale_y_continuous(breaks = seq(0, 30, 5), minor_breaks = seq(0, 60, 2.5))

5. Color

color and fill

  • color sets the outline color (for lines and points, the color itself), and fill sets the fill color of plot elements:
    • geom_point(color = "steelblue", size = 2)
    • For point shapes 21 to 25, both apply:
Code
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(shape = 21, size = 2, stroke = 1, color = "#3cc08f", fill = "#c08f3c") +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)")
  • scale_color_* and scale_fill_* functions modify colors mapped to variables; they differ for categorical (qualitative) and continuous (quantitative) variables.

Palettes for categorical variables

scale_color_manual() and scale_fill_manual() specify colors by hand:

Code
p + scale_color_manual(values = c("dodgerblue4", "darkorchid3", "goldenrod1"))
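
When colors are specified by hand, a named vector ties each color to a factor level, so the assignment does not depend on the order of the levels. A minimal sketch, assuming ggplot2 and palmerpenguins are installed; the colors in species_cols are hypothetical choices.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical colour choices; the names must match the factor levels exactly
species_cols <- c(Adelie = "darkorange", Chinstrap = "purple", Gentoo = "cyan4")

named <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm, colour = species)) +
  geom_point(na.rm = TRUE) +
  scale_colour_manual(values = species_cols)

named
```

Reusing the same named vector across figures keeps each species in the same color throughout a report.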

scale_color_brewer() and scale_fill_brewer() use predefined ColorBrewer palettes (run RColorBrewer::display.brewer.all() to see them all):

Code
p + scale_color_brewer(palette = "Dark2")

The {ggthemes} package gives access to the well-known Tableau palette:

Code
library(ggthemes)
p + scale_color_tableau()

The {ggsci} package provides scientific-journal-themed palettes (such as Science via scale_color_aaas() or Nature via scale_color_npg()):

Code
library(ggsci)
p + scale_color_npg()

Palettes for continuous variables

scale_color_gradient() and scale_fill_gradient() apply a sequential gradient:

Code
p2 + scale_color_gradient(low = "darkkhaki", high = "darkgreen")

scale_color_viridis_c() uses the Viridis palettes, which are perceptually uniform and suitable for colorblind viewers:

Code
p2 + scale_color_viridis_c(option = "inferno")

scale_color_carto_c() (from rcartocolor) uses CARTO palettes:

Code
library(rcartocolor)
p2 + scale_color_carto_c(palette = "BurgYl")

The {scico} package provides access to the perceptually uniform palettes by Fabio Crameri:

Code
library(scico)
p2 + scale_color_scico(palette = "berlin")

Colorblind-friendly palettes

Have you ever considered how your figure might appear under various forms of colorblindness? The colorBlindness package simulates it:

Code
library(colorBlindness)
cvdPlot(p)

A safe default: viridis

Viridis works for both discrete and continuous mappings:

Code
p + scale_colour_viridis_d() + labs(title = "Viridis palette for groups")

Code
p2 + scale_colour_viridis_c() + labs(title = "Viridis palette for continuous values")
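
For a small number of groups, another widely used colorblind-friendly option is the Okabe-Ito palette, which base R provides through palette.colors() (R 4.0.0 or later). The sketch below is one way to plug it into a manual scale, assuming ggplot2 and palmerpenguins are installed; dropping the leading black is a choice made for this example.

```r
library(ggplot2)
library(palmerpenguins)

# palette.colors() returns a named vector; unname() prevents scale_colour_manual()
# from trying to match those names against the species levels
okabe_ito <- unname(palette.colors(n = 4, palette = "Okabe-Ito"))[-1]  # drop black

ggplot(penguins, aes(bill_length_mm, bill_depth_mm, colour = species)) +
  geom_point(na.rm = TRUE) +
  scale_colour_manual(values = okabe_ito)
```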

6. Annotating plots

Reference lines

geom_hline() adds horizontal lines at specified y-axis values (yintercept):

Code
p + geom_hline(yintercept = c(12, 23))

geom_vline() adds vertical lines at specified x-axis values (xintercept), with color, linewidth, and linetype for styling:

Code
p + geom_vline(aes(xintercept = 45), linewidth = 1.5, color = "firebrick",
               linetype = "dashed")

geom_abline() adds a line with a specified intercept and slope, for example a regression fit:

Code
# regress the plotted y variable on the plotted x variable so the line matches the axes of p
reg <- lm(bill_depth_mm ~ bill_length_mm, data = penguins)
p + geom_abline(intercept = coefficients(reg)[1], slope = coefficients(reg)[2],
                color = "darkorange2", linewidth = 1.5)

Segments and highlights

geom_linerange() adds line segments that do not span the entire plot range, using x with ymin/ymax or y with xmin/xmax:

Code
p + geom_linerange(aes(x = 45, ymin = 15, ymax = 22), color = "steelblue", linewidth = 1) +
  geom_linerange(aes(y = 16, xmin = 30, xmax = 45), color = "red", linewidth = 1)

annotate(geom = "segment") adds a line segment with arbitrary start and end points (x, xend, y, yend):

Code
p + annotate(geom = "segment", x = 10, xend = 75, y = 20, yend = 5,
             color = "purple", linewidth = 2)

geom_encircle() from ggalt automatically encloses points in a polygon, useful for highlighting clusters:

Code
library(ggalt)
p + geom_encircle(data = subset(penguins, species == "Adelie"),
                  colour = "blue", spread = 0.002) +
  geom_encircle(data = subset(penguins, species == "Chinstrap"),
                colour = "purple", spread = 0.002) +
  geom_encircle(data = subset(penguins, species == "Gentoo"),
                colour = "red", spread = 0.002) + ylim(10, 23)

Text labels

geom_label() adds labels with a rectangle around the text; hjust and vjust control the justification:

Code
p + geom_label(aes(label = species), hjust = .5, vjust = -.5) +
  theme(legend.position = "none")

geom_text() works like geom_label(), but without the rectangle:

Code
p + geom_text(aes(label = sex), hjust = .5, vjust = -.5) +
  theme(legend.position = "none")

geom_text_repel() and geom_label_repel() from ggrepel repel overlapping labels:

Code
library(ggrepel)
p + geom_label_repel(aes(label = species), fontface = "bold") +
  theme(legend.position = "none")

Annotations

annotate() adds a single text or label annotation at specified coordinates:

Code
p + annotate(geom = "text", x = 45, y = 25, fontface = "bold",
             label = "This is a useful annotation")

annotation_custom() adds custom annotations using grid graphical objects; their position is given relative to each panel rather than in data units:

Code
library(grid)
my_grob <- grobTree(textGrob("This is species type!", x = .1, y = .9,
                             hjust = 0, gp = gpar(col = "black", fontsize = 15,
                                                  fontface = "bold")))
p + annotation_custom(my_grob) + facet_wrap(~species, scales = "free_x") +
  scale_y_continuous(limits = c(NA, 20)) + theme(legend.position = "none")

ggtext renders text as markdown or HTML:

Code
library(ggtext)
lab_md <- "This plot shows **Bill length** in *mm* versus **Bill depth** in *mm* across species type"
p + geom_richtext(aes(x = 45, y = 22.5, label = lab_md), stat = "unique",
                  inherit.aes = FALSE)  # draw the label once, not once per species

geom_textbox() provides dynamic wrapping for longer annotations:

Code
lab_long <- "**Association**<br><i style='font-size:8pt;color:black;'>This graph is a scatter plot showing the association between bill length and bill depth for each species type, so we can see that there is a clear association.</i>"
p + geom_textbox(aes(x = 45, y = 20, label = lab_long),
                 width = unit(25, "lines"), stat = "unique",
                 inherit.aes = FALSE)  # draw the box once, not once per species

7. Coordinates, scales, and smoothing

Coordinate systems

coord_flip() swaps the x and y axes, turning vertical plots into horizontal ones and vice versa; it is particularly useful for bar charts and boxplots:

Code
p + coord_flip()

coord_fixed() fixes the aspect ratio between the units of the x and y axes:

Code
p + scale_x_continuous(breaks = seq(0, 60, by = 15)) +
  coord_fixed(ratio = 2/3) +
  theme(plot.background = element_rect(fill = "grey80"))

coord_polar() converts the plot to polar coordinates, often used for circular bar charts:

Code
library(dplyr)  # provides %>%, group_by(), and summarize()
penguins %>% group_by(species) %>%
  summarize(median_flipper = median(flipper_length_mm, na.rm = TRUE)) %>%
  ggplot(aes(x = species, y = median_flipper)) +
  geom_col(aes(fill = species), color = NA) +
  labs(x = "", y = "Median flipper length (mm)") + coord_polar() +
  guides(fill = "none")

coord_polar(theta = "y") turns a single stacked bar into a pie chart:

Code
peng_sum <- penguins %>% mutate(n_all = n()) %>% group_by(species) %>%
  summarize(Total = n() / unique(n_all))
ggplot(peng_sum, aes(x = "", y = Total)) +
  geom_col(aes(fill = species), width = 1, color = NA) +
  coord_polar(theta = "y") +
  scale_fill_brewer(palette = "Set1", name = "Species:")

Scale transformations

scale_x_continuous() / scale_y_continuous() customize breaks, labels, and limits:

Code
p + scale_x_continuous(breaks = seq(0, 60, by = 10))

scale_x_reverse() / scale_y_reverse() flip the direction of an axis:

Code
p + scale_y_reverse()

scale_y_log10() and scale_y_sqrt() transform the axis, useful for data with a wide range:

Code
p + scale_y_sqrt()
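
A scale transformation changes the data before any statistics are computed, whereas a coordinate transformation such as the coord_trans() call used in Section 3 is applied afterwards. The sketch below inspects the values stored in the plot to show the first half of that statement; it assumes ggplot2 and palmerpenguins are installed, and logged is an illustrative name.

```r
library(ggplot2)
library(palmerpenguins)

logged <- ggplot(penguins, aes(bill_length_mm, body_mass_g)) +
  geom_point(na.rm = TRUE) +
  scale_y_log10()

# Internally the y values are stored on the log10 scale;
# only the axis labels are converted back to grams
range(layer_data(logged)$y, na.rm = TRUE)
log10(range(penguins$body_mass_g, na.rm = TRUE))
```

This is why a smoother or summary added to a plot with a transformed scale is fitted to the transformed values.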

Smoothing

stat_smooth() adds a LOESS fit (method = "loess") for fewer than 1,000 points, or a GAM (method = "gam") otherwise:

Code
p + geom_point(color = "gray40", alpha = .5) + stat_smooth()

It is just as easy to add a standard linear fit:

Code
p + geom_point(color = "gray40", alpha = .5) +
  stat_smooth(method = "lm", se = FALSE, color = "firebrick", linewidth = 1.3)
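
Because the base plot maps color to species, a smoother that inherits this mapping is fitted once per species, while a smoother with a constant color drops the mapping for that layer and is fitted once to all penguins. The sketch below counts the fitted groups in each case; it assumes ggplot2 and palmerpenguins are installed, and n_fits() is a hypothetical helper written for this example.

```r
library(ggplot2)
library(palmerpenguins)

base <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm, color = species)) +
  geom_point(na.rm = TRUE)

# Inherits color = species: one regression line per species
per_species <- base +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, na.rm = TRUE)

# Constant color overrides the mapping in this layer: one overall line
overall <- base +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "black", na.rm = TRUE)

# Number of separate fits in the smoother layer (layer 2)
n_fits <- function(plot) length(unique(layer_data(plot, i = 2)$group))
n_fits(per_species)
n_fits(overall)
```

Which version is appropriate depends on the question: the overall line and the per-species lines can tell quite different stories for the same data.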

You can specify the model yourself, for example a polynomial regression:

Code
p + geom_point(color = "gray40", alpha = .3) +
  geom_smooth(method = "lm",
              formula = y ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5),
              color = "black", fill = "firebrick")

8. Facets and multi-panel figures

facet_grid() vs facet_wrap()

  • facet_grid() lays the panels out in a grid defined by one variable for the rows, one for the columns, or both.
  • facet_wrap() places the facets next to each other and wraps them according to the provided number of columns and/or rows.
Type | Formula | Description
Grid | facet_grid(. ~ x) | Facet horizontally across x values
Grid | facet_grid(y ~ .) | Facet vertically across y values
Grid | facet_grid(y ~ x) | Facet 2-dimensionally
Wrap | facet_wrap(~ x) | Facet across x values
Wrap | facet_wrap(~ x + y) | Facet across x and y values

Facets in practice

Code
p + facet_grid(~ species, scales = "free")

Code
p + facet_grid(year ~ species, scales = "free")

ncol and nrow control the number of columns and rows in facet_wrap():

Code
p + theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  facet_wrap(~ species + sex, ncol = 3)

scales = "free" lets every panel use its own axis ranges ("free_x" / "free_y" control one axis):

Code
p + theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  facet_wrap(~ species, ncol = 3, scales = "free")

Styling facet labels

Use theme() to customize the appearance of the facet labels:

Code
p + facet_wrap(~ species, ncol = 3, scales = "free_x") +
  theme(strip.text = element_text(face = "bold", color = "white", hjust = 0, size = 20),
        strip.background = element_rect(fill = "chartreuse4", linetype = "dotted"))

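element_textbox_highlight() can highlight specific labels. It is not exported by ggtext: it is a custom theme element built on top of ggtext::element_textbox(), so it has to be defined in your session before this code will run: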

Code
library(ggtext)
library(purrr)  # for %||%
p + facet_wrap(~ species, ncol = 3, scales = "free_x") +
  theme(strip.background = element_blank(),
        strip.text = element_textbox_highlight(
          family = "Playfair", size = 12, face = "bold",
          fill = "white", box.color = "chartreuse4", color = "chartreuse4",
          halign = .5, linetype = 1, r = unit(5, "pt"), width = unit(1, "npc"),
          padding = margin(5, 0, 3, 0), margin = margin(0, 1, 3, 1),
          hi.labels = c("Adelie"), hi.fill = "chartreuse4",
          hi.box.col = "black", hi.col = "white"))

Combining different plots

  • patchwork: combine multiple plots with simple syntax such as p1 + p2, p1 / p2, or (g + p2) / p1, and define complex layouts with a design matrix via plot_layout(design = layout).
  • cowplot: another package for combining plots, for example plot_grid(plot_grid(g, p1), p2, ncol = 1).
  • {gridExtra}: arrange plots with grid.arrange(g, p1, p2, layout_matrix = rbind(c(1, 2), c(3, 3))).
Code
library(patchwork)
g1 <- p
g2 <- ggplot(penguins, aes(x = species, fill = island)) + geom_bar()
g1 + g2
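
A slightly larger layout shows how the patchwork operators combine. This is a sketch assuming the patchwork, ggplot2, and palmerpenguins packages are installed; the plot objects and the height ratio are illustrative.

```r
library(ggplot2)
library(patchwork)
library(palmerpenguins)

scatter <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm, colour = species)) +
  geom_point(na.rm = TRUE)
bars <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar()
boxes <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot(na.rm = TRUE)

# `|` places plots side by side, `/` stacks them
combined <- (scatter | bars) / boxes +
  plot_layout(heights = c(2, 1)) +     # top row twice as tall as the bottom row
  plot_annotation(tag_levels = "A")    # tag the panels A, B, C

combined
```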

9. A gallery of geoms

Different plots for different contexts

Four geoms at a glance

Code
p1 <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point()
p2g <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_density2d()
p3 <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar()
p4 <- ggplot(penguins, aes(x = species, y = bill_depth_mm, fill = species)) +
  geom_boxplot()

library(patchwork)
p1 + p2g + p3 + p4

Bar plots: geom_bar()

How many penguins of each species are in this data set?

Code
ggplot(penguins, aes(x = species, fill = species)) +
  geom_bar() + labs(title = "Number of Penguins by Species",
                    x = "Species", y = "Count", fill = "Species") + theme_minimal()

Number of penguin species on each island:

Code
ggplot(penguins) + geom_bar(aes(x = island, fill = species)) +
  labs(title = "Population of penguin species on each island",
       y = "count of species") +
  theme(text = element_text(size = 14))

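Mean body mass by species and sex (stat = "summary" with fun = "mean" averages the values in each group before drawing; stat = "identity" would instead draw one bar per penguin on top of each other, so only the heaviest would be visible):

Code
ggplot(penguins, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  labs(title = "Mean Body Mass by Species and Sex",
       x = "Species", y = "Body Mass (g)", fill = "Sex") +
  theme_minimal()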

Histograms: geom_histogram()

A histogram shows the distribution of a numeric variable by counting the observations that fall into each bin. Only one aesthetic is required: the x variable. geom_histogram() defaults to 30 bins, so set bins or binwidth to suit your data.

Code
ggplot(penguins, aes(x = bill_length_mm)) + geom_histogram() +
  ggtitle("Histogram of penguin bill length")
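
The bin width has a large effect on how a histogram reads, so it is usually worth setting explicitly. A minimal sketch with 1 mm bins, assuming ggplot2 and palmerpenguins are installed; the width and the colors are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

hist_1mm <- ggplot(penguins, aes(x = bill_length_mm)) +
  geom_histogram(binwidth = 1, boundary = 0, na.rm = TRUE,
                 fill = "grey70", colour = "white")

bins <- layer_data(hist_1mm)
unique(round(bins$xmax - bins$xmin, 6))  # every bin is 1 mm wide
sum(bins$count)                          # each non-missing penguin is counted once

hist_1mm
```

Setting boundary = 0 aligns the bin edges with whole millimeters, which makes the x-axis easier to read.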

Boxplots: geom_boxplot()

Body mass distribution of penguins by species:

Code
ggplot(penguins, aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot() +
  labs(title = "Body Mass Distribution of Penguins by Species",
       x = "Species", y = "Body Mass (g)", fill = "Species") +
  theme_minimal()

Boxplot with annotations via geom_signif() from ggsignif:

Code
library(ggsignif)
ggplot(penguins, aes(x = species, y = bill_length_mm, fill = species)) +
  geom_boxplot() +
  # specify the comparison we are interested in
  geom_signif(comparisons = list(c("Adelie", "Gentoo")), map_signif_level = TRUE)
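
Several comparisons can be annotated at once. The sketch below assumes ggsignif and palmerpenguins are installed; the choice of a t-test and the 0.1 step are illustrative, and p_sig is a name used only here:

```r
library(ggplot2)
library(ggsignif)
library(palmerpenguins)

p_sig <- ggplot(penguins, aes(x = species, y = bill_length_mm, fill = species)) +
  geom_boxplot() +
  geom_signif(
    comparisons = list(c("Adelie", "Chinstrap"),
                       c("Chinstrap", "Gentoo"),
                       c("Adelie", "Gentoo")),
    test = "t.test",          # name the test instead of relying on the default
    map_signif_level = TRUE,  # show stars rather than raw p-values
    step_increase = 0.1       # stack the brackets so they do not overlap
  )
p_sig
```

Each bracket reports its own pairwise test, so with many comparisons consider adjusting for multiple testing separately before interpreting the stars.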

Violin plots: geom_violin()

A violin plot shows the distribution of a numeric variable for one or several groups. It is similar to a boxplot but draws the full density shape, so features such as multiple peaks stay visible.

Code
violin <- ggplot(penguins, aes(x = species, y = bill_length_mm)) +
  geom_violin(trim = FALSE, fill = "grey70", alpha = .5) +
  labs(title = "Violin plot")
violin

Code
violin + geom_jitter(shape = 16, position = position_jitter(0.2), alpha = .3) +
  geom_boxplot(width = .05)

Line charts: geom_line()

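geom_line() connects observations in order of x, which makes it the usual choice for values over time. The group aesthetic tells it which points belong to the same line: use group = 1 to force a single line (needed when x is discrete), or group = variable_name to draw one line per level of a variable.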

Code
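library(dplyr)  # for %>%, group_by(), summarize(), and filter()
data(AirPassengers)
# AirPassengers is a ts object; convert to plain numeric columns for ggplot2
airpassengers <- data.frame(AirPassengers = as.numeric(AirPassengers),
                            year = as.numeric(trunc(time(AirPassengers))),
                            month = month.abb[cycle(AirPassengers)])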
airpassengers %>% group_by(year) %>%
  summarize(sum = sum(AirPassengers, na.rm = TRUE)) %>%
  ggplot() + geom_line(aes(x = year, y = sum, group = 1))
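
The group aesthetic matters most when x is discrete. The sketch below uses a small made-up data frame (the survey rounds and scores are hypothetical, as are the names rounds and p_line) to show the case where group = 1 is needed:

```r
library(ggplot2)

# Hypothetical toy data: three survey rounds stored as a factor (discrete x)
rounds <- data.frame(
  round = factor(c("Baseline", "Midline", "Endline"),
                 levels = c("Baseline", "Midline", "Endline")),
  score = c(52, 61, 68)
)

# With a discrete x, ggplot2 treats each level as its own group, so
# geom_line() has nothing to connect; group = 1 joins all points into one line
p_line <- ggplot(rounds, aes(x = round, y = score, group = 1)) +
  geom_line() +
  geom_point()
p_line
```

Without group = 1, ggplot2 draws no line here and reports that each group consists of only one observation.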

Code
ggplot(airpassengers) +
  geom_line(aes(x = year, y = AirPassengers, group = month))

With twelve lines this close together, geom_label() labels would overlap, so we use geom_label_repel() from ggrepel, which pushes the labels apart and connects each one to its line:

Code
library(ggrepel)
ggplot(airpassengers) +
  geom_line(aes(x = year, y = AirPassengers, group = month)) +
  geom_label_repel(data = airpassengers %>% filter(year == max(year)),
                   aes(x = year, y = AirPassengers, label = month))

Summarize y values: stat_summary()

Code
ggplot(mtcars, aes(cyl, mpg)) + geom_point() +
  stat_summary(fun = "median", geom = "point",
               colour = "red", size = 6) + labs(title = "Medians")

Code
# mean_cl_boot (bootstrapped 95% CI of the mean) requires the Hmisc package
ggplot(mtcars, aes(cyl, mpg)) + geom_point() +
  stat_summary(fun.data = "mean_cl_boot", colour = "red", size = 1.6) +
  labs(title = "Means and CIs")
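
fun.data expects a function that returns y, ymin, and ymax for each group. A minimal sketch using mean_se(), which ships with ggplot2 and so needs no extra package; it shows the mean plus or minus one standard error, not a confidence interval, and p_se is a name used only here:

```r
library(ggplot2)

# factor(cyl) gives one summary per cylinder class
p_se <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_point(alpha = 0.4) +
  stat_summary(fun.data = mean_se, geom = "pointrange", colour = "red") +
  labs(x = "Cylinders", y = "Miles per gallon", title = "Means and standard errors")
p_se
```

Saying in the title or caption which interval is drawn helps, because standard errors, standard deviations, and confidence intervals look alike on the page.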

10. Interactive graphics

Interactive plots

Interactive plots enhance the user experience with dynamic, visually appealing graphics. Several libraries work with ggplot2 or on their own:

plotly creates interactive, web-based graphics and web apps; ggplotly(p) converts an existing ggplot2 object p into an interactive plot:

Code
library(plotly)
ggplotly(p)
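
A self-contained version of the conversion, assuming plotly and palmerpenguins are installed; p_static and p_interactive are names chosen for this sketch:

```r
library(ggplot2)
library(plotly)
library(palmerpenguins)

# Static scatter plot to convert
p_static <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm,
                                 colour = species)) +
  geom_point()

# tooltip limits the hover text to the listed aesthetics
p_interactive <- ggplotly(p_static, tooltip = c("x", "y", "colour"))
p_interactive
```

The result is an htmlwidget, so it renders in HTML output such as Quarto slides and can be written to a standalone file with htmlwidgets::saveWidget().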

ggiraph makes ggplot2 graphs dynamic with tooltips, animations, and JavaScript actions; girafe(ggobj = ...) renders the interactive plot:

Code
library(ggiraph)
p3i <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point_interactive(aes(color = species, tooltip = species, data_id = species)) +
  scale_color_brewer(palette = "Dark2", guide = "none") +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)")
girafe(ggobj = p3i)

highcharter brings the Highcharts charting library to R:

Code
library(highcharter)
hchart(penguins, "scatter", hcaes(x = bill_length_mm, y = bill_depth_mm,
                                  group = species))

echarts4r provides an interface to the ECharts charting library:

Code
library(echarts4r)
penguins %>% e_charts(bill_length_mm) %>%
  e_scatter(body_mass_g, symbol_size = 7) %>%
  e_visual_map(body_mass_g) %>%
  e_x_axis(name = "Bill length (mm)") %>% e_y_axis(name = "Body mass (g)") %>%
  e_legend(FALSE)

Saving your plots

Saving with ggsave()

  • ggsave() saves the last displayed plot (or a named plot object) to disk; the file extension determines the format (.png, .pdf, .svg, .tiff).
  • Set width, height, and dpi explicitly so figures are reproducible and meet journal or donor requirements:
Code
final_plot <- p + theme_minimal()
# width and height are in inches by default; dpi applies to raster formats
ggsave("penguins_scatter.png", plot = final_plot,
       width = 8, height = 5, dpi = 300)
ggsave("penguins_scatter.pdf", plot = final_plot,
       width = 8, height = 5)

11. From beginner to advanced

Principles of effective visualization

What separates an adequate chart from an excellent one is rarely the code; it is the design decisions behind it:

  • Show the data honestly. Use sensible axis ranges, never truncate or distort to exaggerate an effect, and show uncertainty where it matters.
  • Match the chart to the question. Choose the geometry from the question you are answering, not from habit.
  • Reduce clutter. Remove grid lines, borders, and legends that do not carry information; every element should earn its place.
  • Use color with intent. Color should encode meaning, not decorate. Limit categorical palettes to 5 to 7 colors and use colorblind-safe palettes such as viridis by default.
  • Label directly where possible. Direct labels beat legends; titles should state the finding, not just the variables.
  • Design for the audience. A figure for a donor report needs larger text, fewer panels, and a clearer takeaway than a figure for exploratory analysis.
  • Always cite the data source under the figure, and label tables and figures with descriptions rather than variable names.
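
Several of these principles can be applied in a few lines. The sketch below is one possible reading of them, assuming the palmerpenguins package is installed; the title wording, text size, and the name p_final are illustrative choices:

```r
library(ggplot2)
library(palmerpenguins)

p_final <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g,
                                colour = species)) +
  geom_point(alpha = 0.7) +
  scale_colour_viridis_d(end = 0.9) +        # colorblind-safe discrete palette
  labs(
    # the title states the finding, not just the variables
    title = "Gentoo penguins are the heaviest and have the longest flippers",
    x = "Flipper length (mm)", y = "Body mass (g)", colour = "Species",
    caption = "Source: palmerpenguins R package"   # cite the data source
  ) +
  theme_minimal(base_size = 14) +            # larger text for reports
  theme(panel.grid.minor = element_blank(),  # drop low-information grid lines
        legend.position = "top")
p_final
```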

Choosing the right chart

Start from the question, then pick the geometry:

| Question type | Typical charts | ggplot2 geoms |
|---|---|---|
| Distribution | histogram, density, boxplot, violin, ridgeline | geom_histogram, geom_density, geom_boxplot, geom_violin, geom_density_ridges (ggridges) |
| Comparison | bar, dot plot, grouped boxplot | geom_col, geom_point, geom_boxplot |
| Relationship | scatter, bubble, heatmap, smoothed line | geom_point, geom_tile, geom_smooth |
| Composition | stacked bar, proportional bar, treemap | geom_col with position = "stack" or "fill", geom_treemap (treemapify) |
| Evolution | line, area, slope chart | geom_line, geom_area |

Two interactive chart choosers: from Data to Viz (a decision tree from your data type to the right chart, with R code) and the R Graph Gallery (hundreds of charts with copy-paste code).

A learning path

  • Beginner. Goal: be comfortable building and customizing standard charts.
  • Intermediate. Goal: control every element of a figure and develop design judgment.
  • Advanced. Goal: design publication-grade and donor-grade figures, and master the grammar itself.

Reference library

| Resource | Author | Focus | Level |
|---|---|---|---|
| R for Data Science | Wickham et al. | Workflow, first plots | Beginner |
| Data Visualization: A Practical Introduction | Kieran Healy | Principles plus code | Beginner to intermediate |
| R Graphics Cookbook | Winston Chang | Task-based recipes | Beginner to intermediate |
| ggplot2: Elegant Graphics for Data Analysis | Hadley Wickham | Grammar theory | Intermediate to advanced |
| Fundamentals of Data Visualization | Claus Wilke | Design principles | All levels |

All five books are free to read online. Also useful: The Ultimate Guide to Get Started With ggplot2 by Albert Rapp, the theme elements reference by Isabella Benabaye, the original Grammar of Graphics chapter, and package docs for {ggthemes}, {ggsci}, {scico}, and {colorBlindness}.

How to keep improving

  • Rebuild charts you admire. Take a figure from a journal or news outlet and reproduce it in ggplot2; you will learn more from one reproduction than from ten tutorials.
  • Iterate in public. Share drafts with colleagues, ask what they read from the chart in five seconds, and revise until the takeaway is immediate.
  • Build a personal theme and palette so every figure you produce is consistent and branded by default.
  • Critique before you decorate. First ask whether the chart answers the question, then polish.
  • Practice on a schedule. One #TidyTuesday figure per week compounds quickly into an expert portfolio.
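
A personal theme can start as a small function. The sketch below is a starting point under stated assumptions: theme_report and report_palette are hypothetical names, the colors are a subset of the Okabe-Ito colorblind-safe palette, and the specific settings are illustrative and not a house style:

```r
library(ggplot2)

# Hypothetical palette: four Okabe-Ito colors (blue, orange, green, purple)
report_palette <- c("#0072B2", "#E69F00", "#009E73", "#CC79A7")

# Hypothetical theme: theme_minimal() plus a few consistent overrides
theme_report <- function(base_size = 13) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),
      plot.title.position = "plot",      # align the title with the figure edge
      panel.grid.minor = element_blank(),
      legend.position = "top"
    )
}

# Make the theme the session default so every later plot picks it up
theme_set(theme_report())

p_demo <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  scale_colour_manual(values = report_palette) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
p_demo
```

Keeping the function and palette in one script that every project sources means a later design change is made in a single place.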

Thank you. Questions and discussion are welcome. Part 2 of this training (graph_part2.qmd) continues with the advanced chart gallery: ridgeline, lollipop, dumbbell, diverging bars, heatmaps, slope charts, uncertainty, storytelling, and a reusable C4ED theme.