ggplot2 Summary Plots: Mean & Standard Error

Learn how to plot means and standard errors using ggplot2 and tidyverse.

Objective

In this guide you will learn how to:

  • Read in data from an Excel file.

  • Compute summary statistics (mean and standard error).

  • Create ggplot2 plots that display the mean and standard error using stat_summary().

Data for the Exercise

We use a sample M&M dataset. For more sample data files, check out the Data Files page.

Load Libraries

Make sure these packages are installed; install any that are missing with install.packages("packageName") before loading them.

library(tidyverse)  # Loads ggplot2, dplyr, and other tidyverse packages
library(readxl)     # For reading Excel files
library(skimr)      # For summary statistics (optional)

Read in the Data

We read the M&M dataset from an Excel file.

# Read the Excel file into a tibble; the path is relative to the working directory.
mm_df <- read_excel("data/mms.xlsx")
head(mm_df)
# A tibble: 6 × 4
  center        color  diameter  mass
  <chr>         <chr>     <dbl> <dbl>
1 peanut butter blue       16.2  2.18
2 peanut butter brown      16.5  2.01
3 peanut butter orange     15.5  1.78
4 peanut butter brown      16.3  1.98
5 peanut butter yellow     15.6  1.62
6 peanut butter brown      17.4  2.59

Summary Statistics

You can quickly inspect your data with base R functions such as summary() or with skimr::skim(). To calculate the mean diameter and mass for each combination of center and color, use:

mm_summary <- mm_df %>% 
  group_by(center, color) %>% 
  summarize(
    mean_diameter = mean(diameter, na.rm = TRUE),
    mean_mass = mean(mass, na.rm = TRUE)
  )
mm_summary
# A tibble: 18 × 4
# Groups:   center [3]
   center        color  mean_diameter mean_mass
   <chr>         <chr>          <dbl>     <dbl>
 1 peanut        blue            14.8     2.58 
 2 peanut        brown           14.7     2.57 
 3 peanut        green           15.0     2.68 
 4 peanut        orange          14.6     2.57 
 5 peanut        red             15.0     2.63 
 6 peanut        yellow          14.5     2.57 
 7 peanut butter blue            15.9     1.85 
 8 peanut butter brown           15.7     1.80 
 9 peanut butter green           16.0     1.92 
10 peanut butter orange          15.7     1.73 
11 peanut butter red             15.8     1.74 
12 peanut butter yellow          15.7     1.74 
13 plain         blue            13.2     0.860
14 plain         brown           13.3     0.871
15 plain         green           13.3     0.870
16 plain         orange          13.3     0.865
17 plain         red             13.3     0.854
18 plain         yellow          13.4     0.865
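The summary above reports means only. A minimal sketch of adding the standard error (standard deviation divided by the square root of the group size) is shown here. To stay self-contained it uses only the six rows printed by head() earlier; mm_head, n, and se_diameter are names chosen for this illustration. With the full dataset, the same summarize() call could be applied to mm_df.

```r
library(dplyr)
library(tibble)

# First six rows of the M&M data, copied from the head() output above
mm_head <- tribble(
  ~center,         ~color,   ~diameter, ~mass,
  "peanut butter", "blue",   16.2,      2.18,
  "peanut butter", "brown",  16.5,      2.01,
  "peanut butter", "orange", 15.5,      1.78,
  "peanut butter", "brown",  16.3,      1.98,
  "peanut butter", "yellow", 15.6,      1.62,
  "peanut butter", "brown",  17.4,      2.59
)

mm_se <- mm_head %>%
  group_by(center, color) %>%
  summarize(
    n = sum(!is.na(diameter)),                           # non-missing values per group
    mean_diameter = mean(diameter, na.rm = TRUE),
    se_diameter = sd(diameter, na.rm = TRUE) / sqrt(n),  # standard error of the mean
    .groups = "drop"                                     # return an ungrouped tibble
  )
mm_se
```

Groups with a single observation (blue, orange, and yellow in this six-row subset) get NA for se_diameter, because a standard deviation needs at least two values.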

Plotting Mean and Standard Error with ggplot2

We use stat_summary() to display the mean as a point and the standard error as error bars.
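To see what stat_summary() receives from mean_se(), the function can be called directly on a numeric vector. The sketch below uses the three brown peanut butter diameters from the head() output above; diam and se_band are illustrative names. mean_se() returns the mean as y and the bounds of the error bar as ymin and ymax, and its mult argument scales the width of the band.

```r
library(ggplot2)

# Diameters of the three brown peanut butter candies shown in the head() output
diam <- c(16.5, 16.3, 17.4)

# One-row data frame: y = mean, ymin = mean - SE, ymax = mean + SE
se_band <- mean_se(diam)
se_band

# mult widens the band, here to two standard errors on each side of the mean
wide_band <- mean_se(diam, mult = 2)
wide_band
```

Inside a plot, the same multiplier can be passed along with fun.args = list(mult = 2) in the stat_summary() call.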

1. Basic Mean and SE Plot

This plot shows the mean diameter for each candy color, with error bars extending one standard error above and below the mean.

ggplot(mm_df, aes(x = color, y = diameter, color = color)) +
  stat_summary(fun = mean, na.rm = TRUE, geom = "point", size = 3) +
  stat_summary(fun.data = mean_se, na.rm = TRUE, geom = "errorbar", width = 0.2) +
  labs(
    x = "Candy Color",
    y = "Diameter (units)",
    title = "Mean Diameter with Standard Error"
  ) +
  theme_minimal()
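A possible variation on the plot above draws the point and its error range in a single layer by using geom = "pointrange" with mean_se. The sketch below runs on simulated data, not the M&M file, so that it works without mms.xlsx; sim_df and its values are made up for illustration.

```r
library(ggplot2)

# Simulated stand-in data (NOT the real M&M measurements)
set.seed(42)
sim_df <- data.frame(
  color = rep(c("blue", "brown", "red"), each = 10),
  diameter = rnorm(30, mean = 14, sd = 0.5)
)

# One layer: mean_se supplies y (the mean) plus ymin and ymax (mean -/+ 1 SE)
p <- ggplot(sim_df, aes(x = color, y = diameter, color = color)) +
  stat_summary(fun.data = mean_se, geom = "pointrange") +
  labs(x = "Candy Color", y = "Diameter (units)") +
  theme_minimal()
p
```

The two-layer version gives separate control over point size and error bar width, while the single-layer version is shorter and cannot end up with mismatched settings between layers.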

2. Adding Grouping by Center

Here, we add a shape mapping to distinguish the three candy centers (plain, peanut, and peanut butter).

ggplot(mm_df, aes(x = color, y = diameter, color = color, shape = center)) +
  stat_summary(fun = mean, na.rm = TRUE, geom = "point", size = 3) +
  stat_summary(fun.data = mean_se, na.rm = TRUE, geom = "errorbar", width = 0.3) +
  labs(
    x = "Candy Color",
    y = "Diameter (units)",
    title = "Mean Diameter with SE Grouped by Center"
  ) +
  theme_minimal()

3. Dodging for Better Separation

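When grouping by center, the points and error bars for different centers are drawn at the same x position and may overlap. Use position_dodge() to separate them, and give both layers the same dodge width so that each error bar stays aligned with its point.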

ggplot(mm_df, aes(x = color, y = diameter, color = color, shape = center)) +
  stat_summary(
    fun = mean, na.rm = TRUE, geom = "point", size = 3,
    position = position_dodge(width = 0.3)
  ) +
  stat_summary(
    fun.data = mean_se, na.rm = TRUE, geom = "errorbar", width = 0.3,
    position = position_dodge(width = 0.3)
  ) +
  labs(
    x = "Candy Color",
    y = "Diameter (units)",
    title = "Mean Diameter with SE (Dodged by Center)"
  ) +
  theme_minimal()
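An alternative to stat_summary() is to compute the means and standard errors first and then plot the summary table with geom_point() and geom_errorbar(). The sketch below assumes simulated data in place of the M&M file; sim_df, sim_summary, mean_d, and se_d are hypothetical names used only here.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in data (NOT the real M&M measurements)
set.seed(1)
sim_df <- expand.grid(
  color = c("blue", "red"),
  center = c("plain", "peanut"),
  rep = 1:10,
  stringsAsFactors = FALSE
)
sim_df$diameter <- rnorm(nrow(sim_df), mean = 14, sd = 0.5)

# Mean and standard error for each color/center combination
sim_summary <- sim_df %>%
  group_by(color, center) %>%
  summarize(
    mean_d = mean(diameter),
    se_d = sd(diameter) / sqrt(n()),
    .groups = "drop"
  )

# Share one dodge object so points and error bars shift by the same amount
dodge <- position_dodge(width = 0.3)

p <- ggplot(sim_summary, aes(x = color, y = mean_d, color = color, shape = center)) +
  geom_point(size = 3, position = dodge) +
  geom_errorbar(
    aes(ymin = mean_d - se_d, ymax = mean_d + se_d),
    width = 0.3, position = dodge
  ) +
  labs(x = "Candy Color", y = "Diameter (units)") +
  theme_minimal()
p
```

Precomputing the summary keeps the plotted numbers available for tables or further checks, at the cost of a few extra lines compared with stat_summary().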

Summary

In this guide, you learned how to:

  • Load and inspect data.

  • Compute summary statistics.

  • Create ggplot2 plots displaying the mean and standard error.

  • Enhance plots by grouping and dodging to improve clarity.