In class activity 8: Study Design and Power Analysis
Introduction
This activity demonstrates key concepts in experimental design using a made up fish example:
Formulating research questions
Understanding study designs
Recognizing proper replication vs. pseudoreplication
Conducting power analysis (before and after data collection)
Planning sampling strategies
We’ll study fish mass above and below waterfalls across multiple streams.
Part 1: Load Required Packages
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.1 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.3 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pwr)set.seed(42)
Part 2: Research Question
Main Question: Do waterfalls affect fish mass in stream ecosystems?
Hypothesis: Fish below waterfalls are larger due to better feeding opportunities from nutrients and prey from the lake.
Part 3: Study Design - Natural Experiment
We’ll sample fish above and below waterfalls in 6 different streams. This represents a “natural experiment” since we can’t manipulate waterfall presence.
# Create fish data from streams# ADJUST THESE VARIABLES TO EXPLORE DIFFERENT SCENARIOS:n_streams <-5# Number of streams to samplen_fish_per_site <-6# Number of fish per site (above/below)waterfall_effect <-15# Effect size: how much larger fish are below (in grams)# Stream characteristics (baseline means will be generated)stream_df <-tibble(stream_id =1:n_streams,stream_mean =rnorm(n_streams, mean =85, sd =5) # Random baseline for each stream)f_df <-tibble(stream_id =rep(1:n_streams, each = n_fish_per_site *2),position =rep(c("above", "below"), each = n_fish_per_site, times = n_streams),fish_id =1:(n_streams * n_fish_per_site *2)) %>%left_join(stream_df, by ="stream_id") %>%mutate(# Above waterfall = stream baseline, below = baseline + effectexpected_mass =ifelse(position =="above", stream_mean, stream_mean + waterfall_effect),mass_g =rnorm(n(), mean = expected_mass, sd =12) ) %>%select(stream_id, position, fish_id, mass_g)head(f_df, 12)
# Visualize the databasic_plot <- f_df %>%ggplot(aes(x = position, y = mass_g, fill = position)) +geom_boxplot() +geom_jitter(width =0.2, alpha =0.6) +labs(x ="Position Relative to Waterfall",y ="Fish Mass (g)") +theme_minimal()basic_plot
# Show data by individual streamsstream_plot <- f_df %>%ggplot(aes(x = position, y = mass_g, group = stream_id, color =as.factor(stream_id))) +stat_summary(fun = mean, geom ="point") +stat_summary(fun = mean, geom ="line") +stat_summary(fun.data = mean_se, geom ="errorbar", width =0.1) +facet_wrap(~stream_id) +labs(x ="Position", y ="Fish Mass (g)", color ="Stream") +theme_minimal()stream_plot
Part 4: Replication Issues
What is the experimental unit here?
ImportantActivity 1: Identify the Experimental Unit
Question: In our fish study, what is the true experimental unit?
Individual fish
Stream locations (above/below pairs)
Individual streams
All fish combined
Pseudoreplication Example
# WRONG analysis: treating all fish as independent# This ignores that fish within streams may be similarpseudo_test <-t.test(mass_g ~ position, data = f_df)pseudo_test
Welch Two Sample t-test
data: mass_g by position
t = -4.2726, df = 56.515, p-value = 7.488e-05
alternative hypothesis: true difference in means between group above and group below is not equal to 0
95 percent confidence interval:
-24.308784 -8.792322
sample estimates:
mean in group above mean in group below
85.88446 102.43501
# CORRECT analysis: average by stream first, then comparestream_means <- f_df %>%group_by(stream_id, position) %>%summarize(mean_mass =mean(mass_g), .groups ="drop")# Reshape for paired analysis stream_wide <- stream_means %>%pivot_wider(names_from = position, values_from = mean_mass)proper_test <-t.test(stream_wide$below, stream_wide$above, paired =TRUE)proper_test
Paired t-test
data: stream_wide$below and stream_wide$above
t = 3.1164, df = 4, p-value = 0.03565
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
1.805299 31.295806
sample estimates:
mean difference
16.55055
The proper analysis uses streams as replicates (n = 6), not individual fish (n = 36 per group).
Part 5: Power Analysis - Planning Phase
Let’s calculate how many streams we need to detect a meaningful difference. The effect size here is called Cohen’s D measures the difference between two group means in terms of standard deviations. It helps to understand the magnitude of a difference beyond statistical significance by expressing the distance between means as a standardized value. For example, a Cohen’s d of 0.2 is considered a small effect, 0.5 a moderate effect, and 0.8 a large effect
# Expected values based on pilot dataabove_mean <-85below_mean <-110pooled_sd <-20# Calculate effect sizeeffect_val <-abs(below_mean - above_mean) / pooled_sdeffect_val
Paired t test power calculation
n = 7.171657
d = 1.25
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number of *pairs*
We need at least 7 streams for 80% power.
Power Curve
# Show how power changes with sample sizepower_curve_df <-tibble(streams =3:15) %>%rowwise() %>%mutate(power =pwr.t.test(n = streams, d = effect_val, sig.level =0.05, type ="paired")$power) %>%ungroup()curve_plot <-ggplot(power_curve_df, aes(x = streams, y = power)) +geom_line(linewidth =1.2, color ="blue") +geom_hline(yintercept =0.8, linetype ="dashed", color ="red") +geom_vline(xintercept =ceiling(power_result$n), linetype ="dashed", color ="red") +labs(x ="Number of Streams", y ="Statistical Power") +theme_minimal()curve_plot
# What power did we actually have?actual_power <-pwr.t.test(n =6,d = observed_effect,sig.level =0.05,type ="paired")actual_power
Paired t test power calculation
n = 6
d = 1.393684
sig.level = 0.05
power = 0.7778318
alternative = two.sided
NOTE: n is number of *pairs*
With 6 streams, we had 100% power to detect the effect we observed.
Part 7: Alternative Analysis Approaches
Unpaired t-test (ignoring pairing)
# What if we ignored the stream pairing?unpaired_test <-t.test(mean_mass ~ position, data = stream_means)unpaired_test
Welch Two Sample t-test
data: mean_mass by position
t = -2.7496, df = 7.0247, p-value = 0.02842
alternative hypothesis: true difference in means between group above and group below is not equal to 0
95 percent confidence interval:
-30.773874 -2.327232
sample estimates:
mean in group above mean in group below
85.88446 102.43501