ggplot2 Summary Plots: Mean & Standard Error
Objective
In this guide you will learn how to:
- Read in data from an Excel file.
- Compute summary statistics (mean and standard error).
- Create ggplot2 plots that display the mean and standard error using stat_summary().
Data for the Exercise
We use a sample M&M dataset. For more sample data files, check out the Data Files page.
Load Libraries
Make sure you have installed these packages. If not, run install.packages("packageName") once for each package (for example, in the console) before loading them.
library(tidyverse) # Loads ggplot2, dplyr, and other tidyverse packages
library(readxl)    # For reading Excel files
library(skimr)     # For summary statistics (optional)
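If you are not sure which of these packages are already installed, a small check can install only the missing ones. This is a minimal sketch; `missing_packages()` is a hypothetical helper written for this guide, built on base R's `requireNamespace()` and `install.packages()`.

```r
# Packages used in this guide
pkgs <- c("tidyverse", "readxl", "skimr")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
# (i.e. packages that are not installed yet)
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_packages(pkgs)

# Install only what is missing; nothing happens if all packages are present
if (length(to_install) > 0) {
  install.packages(to_install)
}
```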
Read in the Data
We read the M&M dataset from an Excel file.
mm_df <- read_excel("data/mms.xlsx")
head(mm_df)
# A tibble: 6 × 4
center color diameter mass
<chr> <chr> <dbl> <dbl>
1 peanut butter blue 16.2 2.18
2 peanut butter brown 16.5 2.01
3 peanut butter orange 15.5 1.78
4 peanut butter brown 16.3 1.98
5 peanut butter yellow 15.6 1.62
6 peanut butter brown 17.4 2.59
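Because the standard error of a mean depends on how many observations go into it, it can help to check group sizes before plotting. The sketch below uses `dplyr::count()` on a small hypothetical data frame, `toy_df`, whose values are made up for illustration; with the real data you would pass `mm_df` instead.

```r
library(dplyr)

# Hypothetical toy data standing in for mm_df (values are made up for illustration)
toy_df <- data.frame(
  center   = rep(c("plain", "peanut"), each = 4),
  color    = rep(c("blue", "red"), times = 4),
  diameter = c(13.1, 13.4, 13.3, 13.2, 14.9, 15.1, 14.6, 14.8)
)

# Number of observations (column `n`) in each center/color combination
group_sizes <- toy_df %>%
  count(center, color)
group_sizes
```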
Summary Statistics
You can quickly inspect your data using base R (for example, summary()) or skimr::skim(). For example, to calculate the mean diameter and mass by center and color, you can use:
mm_summary <- mm_df %>%
  group_by(center, color) %>%
  summarize(
    mean_diameter = mean(diameter, na.rm = TRUE),
    mean_mass = mean(mass, na.rm = TRUE)
  )
mm_summary
# A tibble: 18 × 4
# Groups: center [3]
center color mean_diameter mean_mass
<chr> <chr> <dbl> <dbl>
1 peanut blue 14.8 2.58
2 peanut brown 14.7 2.57
3 peanut green 15.0 2.68
4 peanut orange 14.6 2.57
5 peanut red 15.0 2.63
6 peanut yellow 14.5 2.57
7 peanut butter blue 15.9 1.85
8 peanut butter brown 15.7 1.80
9 peanut butter green 16.0 1.92
10 peanut butter orange 15.7 1.73
11 peanut butter red 15.8 1.74
12 peanut butter yellow 15.7 1.74
13 plain blue 13.2 0.860
14 plain brown 13.3 0.871
15 plain green 13.3 0.870
16 plain orange 13.3 0.865
17 plain red 13.3 0.854
18 plain yellow 13.4 0.865
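The summary above computes only means. The standard error of a mean is the sample standard deviation divided by the square root of the number of non-missing observations, SE = s / sqrt(n). A minimal sketch of computing it alongside the mean with dplyr, again on a hypothetical toy data frame rather than the real `mm_df`:

```r
library(dplyr)

# Hypothetical toy data standing in for mm_df (values are made up for illustration)
toy_df <- data.frame(
  center   = rep(c("plain", "peanut"), each = 4),
  color    = rep(c("blue", "red"), times = 4),
  diameter = c(13.1, 13.4, 13.3, 13.2, 14.9, 15.1, 14.6, 14.8)
)

toy_summary <- toy_df %>%
  group_by(center, color) %>%
  summarize(
    n_obs         = sum(!is.na(diameter)),                    # non-missing observations
    mean_diameter = mean(diameter, na.rm = TRUE),
    se_diameter   = sd(diameter, na.rm = TRUE) / sqrt(n_obs), # SE = s / sqrt(n)
    .groups = "drop"                                          # return an ungrouped tibble
  )
toy_summary
```

Setting `.groups = "drop"` returns an ungrouped result, so unlike the output above it is not printed with a `# Groups:` line.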
Plotting Mean and Standard Error with ggplot2
We use stat_summary() to display the mean as a point and the standard error as error bars.
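The error bars below come from ggplot2's `mean_se()`, which stat_summary() calls once per group. To see what it returns, you can call it directly on a numeric vector: a one-row data frame with the mean (`y`) and the mean minus and plus one standard error (`ymin`, `ymax`). The vector `x` here is an arbitrary example.

```r
library(ggplot2)

# An arbitrary numeric vector to summarise
x <- c(2, 4, 6, 8)

# mean_se() returns a one-row data frame: y = mean, ymin/ymax = mean -/+ one SE
res <- mean_se(x)
res

# The same standard error computed by hand: sample SD divided by sqrt(n)
se_by_hand <- sd(x) / sqrt(length(x))
se_by_hand
```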
1. Basic Mean and SE Plot
This plot shows the mean diameter for each candy color, with error bars extending one standard error above and below the mean.
ggplot(mm_df, aes(x = color, y = diameter, color = color)) +
  stat_summary(fun = mean, na.rm = TRUE, geom = "point", size = 3) +
  stat_summary(fun.data = mean_se, na.rm = TRUE, geom = "errorbar", width = 0.2) +
  labs(
    x = "Candy Color",
    y = "Diameter (units)",
    title = "Mean Diameter with Standard Error"
  ) +
  theme_minimal()
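An alternative to stat_summary() is to compute the means and standard errors first and plot the summary table with `geom_point()` and `geom_errorbar()`. This can be convenient when you also want the numbers in a table. A minimal sketch, assuming the same hypothetical toy data as earlier rather than the real dataset:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for mm_df (values are made up for illustration)
toy_df <- data.frame(
  center   = rep(c("plain", "peanut"), each = 4),
  color    = rep(c("blue", "red"), times = 4),
  diameter = c(13.1, 13.4, 13.3, 13.2, 14.9, 15.1, 14.6, 14.8)
)

# Summarise first: one row per color with its mean and standard error
toy_summary <- toy_df %>%
  group_by(color) %>%
  summarize(
    mean_diameter = mean(diameter, na.rm = TRUE),
    se_diameter   = sd(diameter, na.rm = TRUE) / sqrt(sum(!is.na(diameter))),
    .groups = "drop"
  )

# Then plot the precomputed values; the error bars span mean +/- 1 SE
p <- ggplot(toy_summary, aes(x = color, y = mean_diameter, color = color)) +
  geom_point(size = 3) +
  geom_errorbar(
    aes(ymin = mean_diameter - se_diameter, ymax = mean_diameter + se_diameter),
    width = 0.2
  ) +
  labs(
    x = "Candy Color",
    y = "Diameter (units)",
    title = "Mean Diameter with Standard Error (precomputed)"
  ) +
  theme_minimal()
p
```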
2. Adding Grouping by Center
Here, we add a shape mapping to distinguish the three candy centers (plain, peanut, and peanut butter).
ggplot(mm_df, aes(x = color, y = diameter, color = color, shape = center)) +
  stat_summary(fun = mean, na.rm = TRUE, geom = "point", size = 3) +
  stat_summary(fun.data = mean_se, na.rm = TRUE, geom = "errorbar", width = 0.3) +
  labs(
    x = "Candy Color",
    y = "Diameter (units)",
    title = "Mean Diameter with SE Grouped by Center"
  ) +
  theme_minimal()
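The point and error-bar layers can also be collapsed into a single stat_summary() call with `geom = "pointrange"`, which draws the mean as a point and the mean plus or minus one standard error as a vertical line through it. A sketch on the hypothetical toy data; as in the plot above, the ranges for different centers still overlap at each color until they are dodged.

```r
library(ggplot2)

# Hypothetical toy data standing in for mm_df (values are made up for illustration)
toy_df <- data.frame(
  center   = rep(c("plain", "peanut"), each = 4),
  color    = rep(c("blue", "red"), times = 4),
  diameter = c(13.1, 13.4, 13.3, 13.2, 14.9, 15.1, 14.6, 14.8)
)

# One layer: geom = "pointrange" uses y (mean) plus ymin/ymax (mean -/+ 1 SE)
p <- ggplot(toy_df, aes(x = color, y = diameter, color = color, shape = center)) +
  stat_summary(fun.data = mean_se, na.rm = TRUE, geom = "pointrange") +
  labs(
    x = "Candy Color",
    y = "Diameter (units)",
    title = "Mean Diameter with SE as Point Ranges"
  ) +
  theme_minimal()
p
```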
3. Dodging for Better Separation
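When grouping by center, the points and error bars for different centers are drawn at the same x position and overlap. Use position_dodge() to shift them apart, giving both layers the same dodge width so each error bar stays aligned with its point.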
ggplot(mm_df, aes(x = color, y = diameter, color = color, shape = center)) +
  stat_summary(
    fun = mean, na.rm = TRUE, geom = "point", size = 3,
    position = position_dodge(width = 0.3)
  ) +
  stat_summary(
    fun.data = mean_se, na.rm = TRUE, geom = "errorbar", width = 0.3,
    position = position_dodge(width = 0.3)
  ) +
  labs(
    x = "Candy Color",
    y = "Diameter (units)",
    title = "Mean Diameter with SE (Dodged by Center)"
  ) +
  theme_minimal()
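Two optional refinements, sketched on the hypothetical toy data: storing the dodge in one object (`pd`, a name chosen here) guarantees both layers are shifted by the same amount, and `mean_se()` accepts a `mult` argument, passed through `fun.args`, that scales the bar length. Using `mult = 1.96` gives a rough normal-approximation 95% interval; this is an assumption that holds poorly for small groups, and the plot title should say which interval is shown.

```r
library(ggplot2)

# Hypothetical toy data standing in for mm_df (values are made up for illustration)
toy_df <- data.frame(
  center   = rep(c("plain", "peanut"), each = 4),
  color    = rep(c("blue", "red"), times = 4),
  diameter = c(13.1, 13.4, 13.3, 13.2, 14.9, 15.1, 14.6, 14.8)
)

# Define the dodge once so both layers are shifted by exactly the same amount
pd <- position_dodge(width = 0.3)

p <- ggplot(toy_df, aes(x = color, y = diameter, color = color, shape = center)) +
  stat_summary(fun = mean, na.rm = TRUE, geom = "point", size = 3, position = pd) +
  stat_summary(
    fun.data = mean_se, fun.args = list(mult = 1.96), # bars span mean -/+ 1.96 SE
    na.rm = TRUE, geom = "errorbar", width = 0.3, position = pd
  ) +
  labs(
    x = "Candy Color",
    y = "Diameter (units)",
    title = "Mean Diameter with ±1.96 SE (Dodged by Center)"
  ) +
  theme_minimal()
p
```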
Summary
In this guide, you learned how to:
Load and inspect data.
Compute summary statistics.
Create ggplot2 plots displaying the mean and standard error.
Enhance plots by grouping and dodging to improve clarity.