library(tidyverse)
library(pwr)
set.seed(42)
08_Class_Activity
In class activity 8: Study Design and Power Analysis
Introduction
This activity demonstrates key concepts in experimental design using a made up fish example:
- Formulating research questions
- Understanding study designs
- Recognizing proper replication vs. pseudoreplication
- Conducting power analysis (before and after data collection)
- Planning sampling strategies
We’ll study fish mass above and below waterfalls across multiple streams.
Part 1: Load Required Packages
Part 2: Research Question
Main Question: Do waterfalls affect fish mass in stream ecosystems?
Hypothesis: Fish below waterfalls are larger due to better feeding opportunities from nutrients and prey from the lake.
Part 3: Study Design - Natural Experiment
We’ll sample fish above and below waterfalls in 6 different streams. This represents a “natural experiment” since we can’t manipulate waterfall presence.
# Create fish data from streams
# ADJUST THESE VARIABLES TO EXPLORE DIFFERENT SCENARIOS:
<- 5 # Number of streams to sample
n_streams <- 6 # Number of fish per site (above/below)
n_fish_per_site <- 15 # Effect size: how much larger fish are below (in grams)
waterfall_effect
# Stream characteristics (baseline means will be generated)
<- tibble(
stream_df stream_id = 1:n_streams,
stream_mean = rnorm(n_streams, mean = 85, sd = 5) # Random baseline for each stream
)
<- tibble(
f_df stream_id = rep(1:n_streams, each = n_fish_per_site * 2),
position = rep(c("above", "below"), each = n_fish_per_site, times = n_streams),
fish_id = 1:(n_streams * n_fish_per_site * 2)
%>%
) left_join(stream_df, by = "stream_id") %>%
mutate(
# Above waterfall = stream baseline, below = baseline + effect
expected_mass = ifelse(position == "above", stream_mean, stream_mean + waterfall_effect),
mass_g = rnorm(n(), mean = expected_mass, sd = 12)
%>%
) select(stream_id, position, fish_id, mass_g)
head(f_df, 12)
# A tibble: 12 × 4
stream_id position fish_id mass_g
<int> <chr> <int> <dbl>
1 1 above 1 90.6
2 1 above 2 110.
3 1 above 3 90.7
4 1 above 4 116.
5 1 above 5 91.1
6 1 above 6 108.
7 1 below 7 134.
8 1 below 8 90.2
9 1 below 9 104.
10 1 below 10 105.
11 1 below 11 114.
12 1 below 12 103.
# Visualize the data
<- f_df %>%
basic_plot ggplot(aes(x = position, y = mass_g, fill = position)) +
geom_boxplot() +
geom_jitter(width = 0.2, alpha = 0.6) +
labs(x = "Position Relative to Waterfall",
y = "Fish Mass (g)") +
theme_minimal()
basic_plot
# Show data by individual streams
<- f_df %>%
stream_plot ggplot(aes(x = position, y = mass_g, group = stream_id, color = as.factor(stream_id))) +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1) +
facet_wrap(~stream_id) +
labs(x = "Position", y = "Fish Mass (g)", color = "Stream") +
theme_minimal()
stream_plot
Part 4: Replication Issues
What is the experimental unit here?
Question: In our fish study, what is the true experimental unit?
- Individual fish
- Stream locations (above/below pairs)
- Individual streams
- All fish combined
Pseudoreplication Example
# WRONG analysis: treating all fish as independent
# This ignores that fish within streams may be similar
<- t.test(mass_g ~ position, data = f_df)
pseudo_test pseudo_test
Welch Two Sample t-test
data: mass_g by position
t = -4.2726, df = 56.515, p-value = 7.488e-05
alternative hypothesis: true difference in means between group above and group below is not equal to 0
95 percent confidence interval:
-24.308784 -8.792322
sample estimates:
mean in group above mean in group below
85.88446 102.43501
# CORRECT analysis: average by stream first, then compare
<- f_df %>%
stream_means group_by(stream_id, position) %>%
summarize(mean_mass = mean(mass_g), .groups = "drop")
# Reshape for paired analysis
<- stream_means %>%
stream_wide pivot_wider(names_from = position,
values_from = mean_mass)
<- t.test(stream_wide$below, stream_wide$above, paired = TRUE)
proper_test proper_test
Paired t-test
data: stream_wide$below and stream_wide$above
t = 3.1164, df = 4, p-value = 0.03565
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
1.805299 31.295806
sample estimates:
mean difference
16.55055
The proper analysis uses streams as replicates (n = 6), not individual fish (n = 36 per group).
Part 5: Power Analysis - Planning Phase
Let’s calculate how many streams we need to detect a meaningful difference. The effect size here is called Cohen’s D measures the difference between two group means in terms of standard deviations. It helps to understand the magnitude of a difference beyond statistical significance by expressing the distance between means as a standardized value. For example, a Cohen’s d of 0.2 is considered a small effect, 0.5 a moderate effect, and 0.8 a large effect
# Expected values based on pilot data
<- 85
above_mean <- 110
below_mean <- 20
pooled_sd
# Calculate effect size
<- abs(below_mean - above_mean) / pooled_sd
effect_val effect_val
[1] 1.25
# Calculate required sample size for 80% power
<- pwr.t.test(
power_result d = effect_val,
sig.level = 0.05,
power = 0.8,
type = "paired"
)
power_result
Paired t test power calculation
n = 7.171657
d = 1.25
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number of *pairs*
We need at least 7
streams for 80% power.
Power Curve
# Show how power changes with sample size
<- tibble(streams = 3:15) %>%
power_curve_df rowwise() %>%
mutate(power = pwr.t.test(n = streams,
d = effect_val,
sig.level = 0.05,
type = "paired")$power) %>%
ungroup()
<- ggplot(power_curve_df, aes(x = streams, y = power)) +
curve_plot geom_line(linewidth = 1.2, color = "blue") +
geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
geom_vline(xintercept = ceiling(power_result$n), linetype = "dashed", color = "red") +
labs(x = "Number of Streams", y = "Statistical Power") +
theme_minimal()
curve_plot
Change the values below to see how power changes:
# Experiment with different scenarios
<- 85
above_mean <- 100 # Try changing this
below_mean <- 25 # Try changing this
pooled_sd
<- abs(below_mean - above_mean) / pooled_sd
effect_val
<- pwr.t.test(
power_result d = effect_val,
sig.level = 0.05,
power = 0.8,
type = "paired"
)
power_result
Questions:
- What happens to required sample size if the effect is smaller (below_mean = 95)?
- What if variation is higher (pooled_sd = 30)?
- What if we want 90% power instead of 80%?
Part 6: Post-Hoc Power Analysis
Now let’s analyze the power we actually had with our 6 streams.
# Calculate observed effect size from our data
<- mean(stream_wide$above)
observed_above <- mean(stream_wide$below)
observed_below <- observed_below - observed_above
observed_diff <- sd(stream_wide$below - stream_wide$above)
observed_sd
<- observed_diff / observed_sd
observed_effect observed_effect
[1] 1.393684
# What power did we actually have?
<- pwr.t.test(
actual_power n = 6,
d = observed_effect,
sig.level = 0.05,
type = "paired"
)
actual_power
Paired t test power calculation
n = 6
d = 1.393684
sig.level = 0.05
power = 0.7778318
alternative = two.sided
NOTE: n is number of *pairs*
With 6 streams, we had 100
% power to detect the effect we observed.
Part 7: Alternative Analysis Approaches
Unpaired t-test (ignoring pairing)
# What if we ignored the stream pairing?
<- t.test(mean_mass ~ position, data = stream_means)
unpaired_test unpaired_test
Welch Two Sample t-test
data: mean_mass by position
t = -2.7496, df = 7.0247, p-value = 0.02842
alternative hypothesis: true difference in means between group above and group below is not equal to 0
95 percent confidence interval:
-30.773874 -2.327232
sample estimates:
mean in group above mean in group below
85.88446 102.43501
Summary Statistics
# Calculate summary by position
<- f_df %>%
summary_df group_by(position) %>%
summarize(
n_fish = n(),
mean_mass = mean(mass_g),
sd_mass = sd(mass_g),
se_mass = sd_mass / sqrt(n_fish)
)
summary_df
# A tibble: 2 × 5
position n_fish mean_mass sd_mass se_mass
<chr> <int> <dbl> <dbl> <dbl>
1 above 30 85.9 16.2 2.95
2 below 30 102. 13.7 2.51
# Summary by stream (proper replication level)
<- stream_means %>%
stream_summary group_by(position) %>%
summarize(
n_streams = n(),
mean_mass = mean(mean_mass),
sd_mass = sd(mean_mass),
se_mass = sd_mass / sqrt(n_streams)
)
stream_summary
# A tibble: 2 × 5
position n_streams mean_mass sd_mass se_mass
<chr> <int> <dbl> <dbl> <dbl>
1 above 5 85.9 NA NA
2 below 5 102. NA NA
Part 8: Design Your Own Study
Scenario: You want to study if fish size differs between fast and slow water areas.
Design Questions:
What is your experimental unit? ___________________________________________________
How many replicates do you need? Use these values:
- Fast water mean: 75g
- Slow water mean: 85g
- Standard deviation: 15g
- Desired power: 80%
# Calculate your required sample size
<- 75
fast_mean <- 85
slow_mean <- 15
sd_val
<- abs(slow_mean - fast_mean) / sd_val
effect_size
<- pwr.t.test(
sample_result d = effect_size,
sig.level = 0.05,
power = 0.8,
type = "two.sample" # or "paired" if appropriate
)
sample_result
What could cause pseudoreplication in this design? ___________________________________________________
How would you avoid it? ___________________________________________________
Summary
Key Points:
- Experimental unit: The thing that receives the treatment independently
- Replication: Must be at the level of the experimental unit
- Power analysis: Plan sample size before collecting data
- Paired vs unpaired: Paired tests are more powerful when appropriate
- Effect size: Larger effects need fewer samples to detect
Common Mistakes:
- Treating subsamples as independent replicates
- Analyzing data at the wrong level
- Collecting data before planning analysis
- Ignoring natural pairing in the design