library(skimr)
library(patchwork)
library(janitor)
library(readxl)
library(car)
library(broom)
library(emmeans)
library(tidyverse)
theme_set(theme_light())08_homework_factorial_anova
Assignment Overview
This homework assignment analyzes crayfish growth data from Sargent and Lodge (2014) to examine differences in growth rates between native and invasive populations of rusty crayfish (Orconectes rusticus) across different lake environments using two-way ANOVA.
Learning Objectives
By completing this assignment, you will be able to:
- Understand two-way ANOVA concepts and applications
- Perform exploratory data analysis for factorial designs
- Conduct two-way ANOVA analysis
- Test statistical assumptions for ANOVA
- Interpret main effects and interactions
- Create publication-quality figures
- Write scientific methods and results sections
Data Description
The dataset contains growth measurements from a common garden experiment where young-of-year (YOY) rusty crayfish from native (Ohio) and invasive (Wisconsin) populations were grown in enclosures in three northern Wisconsin lakes during summer 2011.
Key variables:
- - range: Population origin (Native vs Invasive)
- - lake: Lake location (Big, High, Papoose)
- - growth_per_day: Daily growth rate (mm/day) - response variable
- - initial_length: Starting length (mm)
- - final_length: Ending length (mm)
- - days: Duration of experiment
Part 1: Data Loading and Preparation
1.1 Load and Clean the Data
Part 2: Statistical Analysis Setup
2.1 Analysis Type and Model
Type of Analysis: Two-way factorial ANOVA
Model Equation: Growth Rate = μ + Range + Lake + (Range × Lake) + ε
Where:
- - Growth Rate = daily growth rate (mm/day)
- - Range = population origin (Native vs Invasive)
- - Lake = lake environment (Big vs High vs Papoose)
- - Range × Lake = interaction between range and lake
- - ε = random error
Hypotheses:
- Main Effect
- - Range:
- - H₀: μ_Native = μ_Invasive (no difference in growth between ranges)
- - H₁: μ_Native ≠ μ_Invasive (difference exists between ranges)
 
- - Lake:
- - H₀: μ_Big = μ_High = μ_Papoose (no difference among lakes)
- - H₁: At least one lake mean differs from others
 
 
- - Range:
- Interaction Effect:
- - H₀: No interaction between Range and Lake 
- - H₁: Interaction exists between Range and Lake 
 
Variables:
- - Response: growth_per_day (continuous, mm/day)
- - Factor 1: range (categorical, 2 levels: Native, Invasive)
- - Factor 2: lake (categorical, 3 levels: Big, High, Papoose)
Part 3: Exploratory Data Analysis
3.1 Summary Statistics
# cray_df %>% group_by(range) %>% 
# skim()3.2 Exploratory Visualizations
Part 4: Two-Way ANOVA Analysis
4.1 Fit the ANOVA Model
Car ANOVA for unbalanced
# Anova(????, type = 3)4.2 Effect Sizes
# # note the code here is provided as it can be a mess....
# # name of datframe and such may need modificaiton....
# 
# eta_squared_df <- cray_df %>%
#   group_by(range, lake) %>%
#   summarise(mean_growth = mean(growth_per_day), .groups = "drop") %>%
#   ungroup()
# 
# total_ss <- sum((cray_df$growth_per_day - mean(cray_df$growth_per_day))^2)
# 
# anova_summary <- summary(crayfish_anova_model)
# range_ss <- anova_summary[[1]]["range ", "Sum Sq"]
# lake_ss <- anova_summary[[1]]["lake", "Sum Sq"]  
# interaction_ss <- anova_summary[[1]]["range:lake", "Sum Sq"]
# 
# eta_squared_range <- range_ss / total_ss
# eta_squared_lake <- lake_ss / total_ss
# eta_squared_interaction <- interaction_ss / total_ss
# 
# cat("Eta-squared (effect sizes):\n")
# cat("Range:", round(eta_squared_range, 3), "\n")
# cat("Lake:", round(eta_squared_lake, 3), "\n")
# cat("Range × Lake:", round(eta_squared_interaction, 3), "\n")Eta-squared represents the proportion of total variance explained by the factor (range).
- Formula: η² = SS_between / SS_total
- Range: 0 to 1
- Interpretation: If η² = 0.21, then 21% of the variance in growth rate is explained by population range
Omega-squared is a less biased estimate of effect size than eta-squared.
- Formula: ω² = (SS_between - df_between × MS_within) / (SS_total + MS_within)
- Range: 0 to 1 (but can be slightly negative)
- More conservative than η² because it adjusts for bias in small samples
Why Calculate Both?
- Eta-squared (η²): Easier to calculate and interpret, but slightly overestimates effect size
- Omega-squared (ω²): More accurate, unbiased estimate of population effect size
Effect Size Interpretation Guidelines:
Effect Size     η² / ω²       Interpretation
Small           0.01          1% of variance explained
Medium          0.06          6% of variance explained
Large           0.14          14% of variance explainedExample Output Interpretation:
If your results show:
  eta_squared omega_squared
1        0.21          0.20This means:
- 21% of the variance in crayfish growth rate is explained by population range (η²)
- 20% is the unbiased estimate of variance explained (ω²)
- This represents a large effect size (much larger than 0.14)
- Population range is a strong predictor of growth rate
Bottom line: Both metrics tell you how much of the differences in crayfish growth can be attributed to whether they’re from native vs. invasive populations, with omega-squared being the more conservative (and accurate) estimate.
4.3 Post-hoc Tests
# Estimated marginal means for main effects# Pairwise comparisons with Sidak correction# Estimated marginal means for interaction effect# Compact letter display for interaction effect# Custom interaction plot using emmeans results
# emmeans_interaction_df <- as.data.frame(emmeans_interaction)
# 
# emmeans_interaction_plot <- emmeans_interaction_df %>%
#   ggplot(aes(x = lake, y = emmean, color = range, group = range)) +
#   geom_point(size = 3, position = position_dodge(width=0.2)) +
#   geom_line(size = 1, position = position_dodge(width=0.2)) +
#   geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), 
#                 width = 0.1, size = 1,
#                 , position = position_dodge(width=0.2)) +
#   labs(x = "Lake", 
#        y = "Estimated Marginal Mean Growth Rate (mm/day)",
#        color = "Range",
#        title = "Estimated Marginal Means with 95% Confidence Intervals") +
#   scale_color_manual(values = c("Native" = "coral", "Invasive" = "steelblue")) +
#   theme_classic()
# 
# emmeans_interaction_plotPart 5: Assumption Testing
5.1 Check ANOVA Assumptions
# # Create diagnostic plots
# par(mfrow = c(2, 2))
# plot(crayfish_anova_model)
# par(mfrow = c(1, 1))5.2 Formal Assumption Tests
Part 6: Publication Figure
6.1 Create Publication-Quality Figure
Submission Guidelines
What to turn in -
- a quarto markdown file and dataframe if you modified the original. All of the code should be able to run with what you turn in. (2 points) 
- a self-contained html file showing the code and output (2 points) 
- annotations in the quarto file that shows or tells what is being done in the r code chunks describing what you are trying to do - credit will be given even if it does not work as long as you detail what you are doing. As we start to move into more statistics you will be expected to interpret the results. (2 points) 
Points
- summary stats - 2 point
- assumptions and hypotheses - 3 points
- exploratory graphs - 2 point
- interpretation - 4 points
- Final figure - 1 point
- Results - 2 points