\(y_{ijk}\): value of the kth observation from jth and ith combination of B and A (fecundity on 2nd plate, in “8 per plate” density in summer)
µ: overall mean (overall fecundity)
αi: effect of the ith level of A, pooling across all levels of B: µi- µ (difference between average fecundity in all “8 per plate” treatments and overall mean)
βj: random variable measuring variance in y across all possible levels of B, pooling across all levels of A
(αβ)ij is random variable measuring variance of interaction bw A and B across all possible levels of B (“is effect of A consistent across all possible levels of B that could have been chosen?”)
ANOVA Table Structure
SSA is SS of differences between each marginal mean of A and overall mean
SSA is SS of differences between each marginal mean of A and overall mean
SSB: Factor B Effects
SSB is SS of differences between each marginal mean of B and overall mean
SSB Visualization
SSB is SS of differences between each marginal mean of B and overall mean
SSAB: Interaction Effects
SSAB is SS of cell means minus marginal means plus overall mean
SSAB Visualization
SSAB is SS of cell means minus marginal means plus overall mean
SSresid: Residual Variance
SSresid is difference between each observation and the appropriate cell mean, summed over all observations
Total Sum of Squares
SStotal = SSA + SSB + SSAB + SSresid
F-ratio Calculations
SS converted to MS;
F-ratio calculations are different depending on whether factors are fixed, random or mixed
Source
A and B fixed
A and B random
A fixed, B random
A
\(\frac{MS_A}{MS_{Residual}}\)
\(\frac{MS_A}{MS_{AB}}\)
\(\frac{MS_A}{MS_{AB}}\)
B
\(\frac{MS_B}{MS_{Residual}}\)
\(\frac{MS_B}{MS_{AB}}\)
\(\frac{MS_B}{MS_{AB}}\)
AB
\(\frac{MS_{AB}}{MS_{Residual}}\)
\(\frac{MS_{AB}}{MS_{Residual}}\)
\(\frac{MS_{AB}}{MS_{Residual}}\)
Hypotheses: Fixed Factors
3 hypotheses are tested in a two-way factorial ANOVA:
A, B, A*B Both factors fixed:
Ho(A): µ1= µ2= µ3=…. µi= µp (no diff. in marginal means of A, pooling across all levels of B)
Ho(B): µ1= µ2= µ3=…. µj= µq (no diff. in marginal means of B, pooling across all levels of A)
Ho(AB): µij- µi - µj + µ = 0 (no effect of interaction)
Hypotheses: Random Factors
3 hypotheses are tested in a two-way factorial ANOVA: A, B, A*B
Both factors random:
Ho(A): σA2= 0 (no added variance due to levels of A that could have been used)
Ho(B): σB2= 0 (no added variance due to levels of B that could have been used)
Ho(AB): σAB2= 0 (no added variance due to interaction between all levels of A and B that could have been used)
The random effect hypothesis tests whether there is significant variation or “added variance” in the data that can be attributed to the random groups or individuals within the fixed groups. In other words, it examines whether there are factors beyond the fixed conditions that contribute to the variability in the data.
Hypotheses: Mixed Model
3 hypotheses are tested in a two-way factorial ANOVA: A, B, A*B
One fixed, one random:
Ho(A): µ1= µ2= µ3=…. µi= µp (no diff. in marginal means of A, pooling across all levels of B)
Ho(B): σB2= 0 (no added variance due to levels of B that could have been used)
Ho(AB): σAB2= 0 (no added variance due to interaction between all levels of A and B that could have been used)
Example Study Details
So lets try the example with the fecundity of limpets in low and high tide areas of a rocky inter-tidal area
Effect of season and density on limpet fecundity.
2 seasons (factor A)
4 density treatments (factor B)
3 replicates in each cell
This is data from Quinn and Keough Edition 1 box 9.4
This analysis examines the effects of season (winter/spring vs. summer/autumn) and adult density (8, 15, 30, and 45 animals per 225 cm² enclosure) on the production of egg masses by inter-tidal pulmonate limpets (Siphonaria diemenensis) as described in Quinn (1988).
Data Setup and Overview
The set up and data overview
# Load required packageslibrary(tidyverse)library(car) # For Levene's test and Type III SSlibrary(emmeans) # For estimated marginal meanslibrary(broom) # For tidying model outputslibrary(patchwork) # For combining plots# Set theme for plotstheme_set(theme_bw(base_size =12))# Read the dataquinn_data <-read_csv("data/quinn.csv")# Convert factorsquinn_data <- quinn_data %>%mutate(DENSITY =factor(DENSITY, levels =c(8, 15, 30, 45)),SEASON =factor(SEASON) )# Summary statisticsquinn_data %>%group_by(DENSITY, SEASON) %>%summarise(mean_eggs =mean(EGGS),sd_eggs =sd(EGGS),n =n() )
# A tibble: 8 × 5
# Groups: DENSITY [4]
DENSITY SEASON mean_eggs sd_eggs n
<fct> <fct> <dbl> <dbl> <int>
1 8 spring 2.42 0.591 3
2 8 summer 1.83 0.315 3
3 15 spring 2.18 0.379 3
4 15 summer 1.18 0.482 3
5 30 spring 1.57 0.621 3
6 30 summer 0.811 0.411 3
7 45 spring 1.20 0.190 3
8 45 summer 0.593 0.205 3
ANOVA Assumptions
Before conducting the factorial ANOVA, we need to check several assumptions:
Independence of observations
Normality of residuals
Homogeneity of variances
Fit the model
# Fit the factorial ANOVA using linear model (lm) instead of aovquinn_model_lm <-lm(EGGS ~ DENSITY * SEASON, data = quinn_data)# View the model summary to see coefficients, standard errors, etc.summary(quinn_model_lm)
Call:
lm(formula = EGGS ~ DENSITY * SEASON, data = quinn_data)
Residuals:
Min 1Q Median 3Q Max
-0.6667 -0.2612 -0.0610 0.2292 0.6647
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.41667 0.24642 9.807 3.6e-08 ***
DENSITY15 -0.23933 0.34849 -0.687 0.50206
DENSITY30 -0.85133 0.34849 -2.443 0.02655 *
DENSITY45 -1.21700 0.34849 -3.492 0.00301 **
SEASONsummer -0.58333 0.34849 -1.674 0.11358
DENSITY15:SEASONsummer -0.41633 0.49284 -0.845 0.41069
DENSITY30:SEASONsummer -0.17067 0.49284 -0.346 0.73363
DENSITY45:SEASONsummer -0.02367 0.49284 -0.048 0.96229
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4268 on 16 degrees of freedom
Multiple R-squared: 0.749, Adjusted R-squared: 0.6392
F-statistic: 6.822 on 7 and 16 DF, p-value: 0.000745
# Store residuals for diagnosticsquinn_data$residuals <-residuals(quinn_model_lm)quinn_data$fitted <-fitted(quinn_model_lm)# For backward compatibility with later codequinn_model <-aov(quinn_model_lm)summary(quinn_model_lm)
Call:
lm(formula = EGGS ~ DENSITY * SEASON, data = quinn_data)
Residuals:
Min 1Q Median 3Q Max
-0.6667 -0.2612 -0.0610 0.2292 0.6647
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.41667 0.24642 9.807 3.6e-08 ***
DENSITY15 -0.23933 0.34849 -0.687 0.50206
DENSITY30 -0.85133 0.34849 -2.443 0.02655 *
DENSITY45 -1.21700 0.34849 -3.492 0.00301 **
SEASONsummer -0.58333 0.34849 -1.674 0.11358
DENSITY15:SEASONsummer -0.41633 0.49284 -0.845 0.41069
DENSITY30:SEASONsummer -0.17067 0.49284 -0.346 0.73363
DENSITY45:SEASONsummer -0.02367 0.49284 -0.048 0.96229
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4268 on 16 degrees of freedom
Multiple R-squared: 0.749, Adjusted R-squared: 0.6392
F-statistic: 6.822 on 7 and 16 DF, p-value: 0.000745
Check for Normality of Residuals: Q-Q Plot
Q-Q Plot of Residuals
# Create Q-Q plot of residualsggplot(quinn_data, aes(sample = residuals)) +stat_qq() +stat_qq_line() +labs(title ="Q-Q Plot of Residuals",x ="Theoretical Quantiles",y ="Sample Quantiles")
Check for Normality of Residuals: Histogram
Histogram of Residuals
# Histogram of residualsggplot(quinn_data, aes(x = residuals)) +geom_histogram(bins =8, fill ="steelblue", color ="black") +labs(title ="Histogram of Residuals",x ="Residuals",y ="Count")
Check for Normality of Residuals: Shapiro-Wilk Test
Shapiro-Wilk Test for Normality
# Shapiro-Wilk test for normalityshapiro.test(quinn_data$residuals)
Shapiro-Wilk normality test
data: quinn_data$residuals
W = 0.97373, p-value = 0.7587
Check for Homogeneity of Variances: Levene’s Test
Levene’s Test for Homogeneity of Variances
# Levene's test for homogeneity of variancesleveneTest(EGGS ~ DENSITY * SEASON, data = quinn_data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 7 0.3337 0.9268
16
Check for Homogeneity of Variances: Residuals Plot
Residuals vs. Fitted Values
# Residuals vs. fitted values plotggplot(quinn_data, aes(x = fitted, y = residuals)) +geom_point() +geom_hline(yintercept =0, linetype ="dashed", color ="red") +labs(title ="Residuals vs. Fitted Values",x ="Fitted Values",y ="Residuals")
Check for Homogeneity of Variances: Boxplot
Residuals by Treatment Combination
# Residuals by groupggplot(quinn_data, aes(x =interaction(DENSITY, SEASON), y = residuals)) +geom_boxplot() +theme(axis.text.x =element_text(angle =45, hjust =1)) +labs(title ="Residuals by Treatment Combination",x ="Density and Season",y ="Residuals")
Factorial ANOVA Results
Now to run the Factorial ANOVA
# Run ANOVA with Type III SS using Anova function from car packageanova_results_lm <-Anova(quinn_model_lm, type ="III")print(anova_results_lm)
Anova Table (Type III tests)
Response: EGGS
Sum Sq Df F value Pr(>F)
(Intercept) 17.5208 1 96.1809 3.599e-08 ***
DENSITY 2.7954 3 5.1152 0.01136 *
SEASON 0.5104 1 2.8019 0.11358
DENSITY:SEASON 0.1647 3 0.3014 0.82395
Residuals 2.9146 16
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Get traditional ANOVA table from linear modelanova(quinn_model_lm)
Analysis of Variance Table
Response: EGGS
Df Sum Sq Mean Sq F value Pr(>F)
DENSITY 3 5.2841 1.7614 9.6691 0.0007041 ***
SEASON 1 3.2502 3.2502 17.8419 0.0006453 ***
DENSITY:SEASON 3 0.1647 0.0549 0.3014 0.8239545
Residuals 16 2.9146 0.1822
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Polynomial and Quadratic Contrasts
To get the polynomial and quadratic contrasts
# Polynomial contrasts using linear models# Create a model with ordered factor and orthogonal polynomialsquinn_data$DENSITY_ord <-factor(quinn_data$DENSITY, levels =c(8, 15, 30, 45),ordered =TRUE)# Set up polynomial contrastscontrasts(quinn_data$DENSITY_ord) <-contr.poly(4)# Fit model with polynomial contrastsquinn_poly_lm <-lm(EGGS ~ DENSITY_ord * SEASON, data = quinn_data)# Show model summary to see polynomial coefficientssummary(quinn_poly_lm)
# Get estimated marginal means from the linear model# Main effect of densitydensity_emm <-emmeans(quinn_model_lm, ~ DENSITY)print(density_emm)
DENSITY emmean SE df lower.CL upper.CL
8 2.125 0.174 16 1.756 2.49
15 1.677 0.174 16 1.308 2.05
30 1.188 0.174 16 0.819 1.56
45 0.896 0.174 16 0.527 1.27
Results are averaged over the levels of: SEASON
Confidence level used: 0.95
pairs(density_emm)
contrast estimate SE df t.ratio p.value
DENSITY8 - DENSITY15 0.448 0.246 16 1.816 0.3021
DENSITY8 - DENSITY30 0.937 0.246 16 3.801 0.0077
DENSITY8 - DENSITY45 1.229 0.246 16 4.987 0.0007
DENSITY15 - DENSITY30 0.489 0.246 16 1.985 0.2342
DENSITY15 - DENSITY45 0.781 0.246 16 3.171 0.0273
DENSITY30 - DENSITY45 0.292 0.246 16 1.186 0.6441
Results are averaged over the levels of: SEASON
P value adjustment: tukey method for comparing a family of 4 estimates
Density Effect Visualization
Main Effect of Density Plot
density_plot <-plot(density_emm, xlab ="Density", ylab ="Estimated Marginal Mean of Eggs") +ggtitle("Main Effect of Density") +theme_bw()density_plot
Estimated Marginal Means: Season Effect
Main Effect of Season
# Main effect of seasonseason_emm <-emmeans(quinn_model_lm, ~ SEASON)print(season_emm)
SEASON emmean SE df lower.CL upper.CL
spring 1.84 0.123 16 1.579 2.10
summer 1.10 0.123 16 0.843 1.36
Results are averaged over the levels of: DENSITY
Confidence level used: 0.95
pairs(season_emm)
contrast estimate SE df t.ratio p.value
spring - summer 0.736 0.174 16 4.224 0.0006
Results are averaged over the levels of: DENSITY
Season Effect Visualization
Main Effect of Season Plot
# Main effect of seasonseason_plot <-plot(season_emm, xlab ="Season", ylab ="Estimated Marginal Mean of Eggs") +ggtitle("Main Effect of Season") +theme_bw()season_plot
Interaction Effects Analysis
Interaction Effects and Raw Means Comparison
# Interaction effects (even though interaction wasn't significant)interaction_emm <-emmeans(quinn_model_lm, ~ DENSITY | SEASON)print(interaction_emm)
# Compare to raw meansquinn_data %>%group_by(DENSITY, SEASON) %>%summarise(raw_mean =mean(EGGS),.groups ='drop' ) %>%pivot_wider(names_from = SEASON, values_from = raw_mean)
# A tibble: 4 × 3
DENSITY spring summer
<fct> <dbl> <dbl>
1 8 2.42 1.83
2 15 2.18 1.18
3 30 1.57 0.811
4 45 1.20 0.593
Interaction Plot: Standard
Standard Interaction Plot
# Interaction effects (even though interaction wasn't significant)interaction_plot <-plot(interaction_emm, xlab ="Season", ylab ="Estimated Marginal Mean of Eggs") +ggtitle("Interaction of Density and Season") +theme_bw()interaction_plot
Interaction Plot: Custom
Custom Interaction Plot with Error Bars
# Alternative approach using ggplot2 for more customization# Convert emmeans object to data frameinteraction_df <-as.data.frame(interaction_emm)# Create custom interaction plot with ggplotcustom_interaction <-ggplot(interaction_df, aes(x = DENSITY, y = emmean, color = SEASON, group = SEASON)) +geom_point(size =3) +geom_line(linewidth =1) +geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width =0.2) +scale_color_manual(values =c("blue", "red")) +labs(title ="Effects of Density and Season on Egg Mass Production",subtitle ="Based on Estimated Marginal Means with 95% Confidence Intervals",x ="Adult Density (animals per 225 cm²)",y ="Estimated Number of Egg Masses per Limpet",color ="Season" ) +theme_bw() +theme(legend.position ="top",panel.grid.minor =element_blank(),axis.title =element_text(face ="bold") )custom_interaction
Publication-Quality Plot
This is a plot you might produce for publication
# Publication-quality plot with both raw data and model predictions# Create enhanced boxplot with model predictionspub_plot <-ggplot(interaction_df, aes(x = DENSITY, y = emmean, color = SEASON, group = SEASON)) +# Add lines connecting the meansgeom_line(linewidth =1,position =position_dodge2(width=0.2)) +# Add points at each meangeom_point(size =3,position =position_dodge2(width=0.2)) +# Add error bars showing standard errorgeom_errorbar(aes(ymin = emmean - SE, ymax = emmean + SE), width =0.2,position =position_dodge2(width=0.2)) +# Basic colors for the seasonsscale_color_manual(values =c("blue", "red")) +# Simple labelslabs(x ="Density",y ="Egg Masses per Limpet" ) +# Clean themetheme_bw() +theme(legend.position ="right",panel.grid.minor =element_blank(),panel.grid.major =element_blank() )pub_plot
Results Interpretation for Linear Model Approach
The factorial ANOVA was conducted using a linear model approach, which provides additional insights beyond the traditional ANOVA table.
Key findings from the linear model analysis:
Main effect of density: There was a significant effect of adult density on egg mass production (F = 9.67, df = 3, 16, p = 0.001). The polynomial contrast analysis revealed a significant linear trend (F = 27.58, df = 1, 16, p = 0.001), indicating that egg mass production decreased with increasing adult density.
Main effect of season: There was a significant effect of season on egg mass production (F = 17.84, df = 1, 16, p = 0.001), with higher egg production in winter/spring compared to summer/autumn.
Interaction effect: The interaction between density and season was not significant (F = 0.30, df = 3, 16, p = 0.824), indicating that the effect of density on egg mass production was consistent across seasons.
Results Interpretation: Effect Sizes
The factorial ANOVA was conducted using a linear model approach, which provides additional insights beyond the traditional ANOVA table.
Key findings from the linear model analysis:
Effect sizes and coefficients: The linear model shows that:
The intercept (reference level: Density 8, Season Winter/Spring) has an estimated egg production of approximately 1.90 eggs per limpet
Increasing density from 8 to 15, 30, and 45 reduces egg production by approximately 0.28, 0.74, and 0.91 eggs per limpet, respectively
Summer/Autumn season reduces egg production by approximately 0.75 eggs per limpet compared to Winter/Spring
The non-significant interaction terms indicate that the density effect is not significantly different between seasons
Results Interpretation: Model Performance
Polynomial contrasts: The significant linear contrast (p = 0.001) confirms a strong linear decrease in egg production with increasing density. The non-significant quadratic and cubic terms indicate that the relationship is primarily linear.
Model fit: The overall model explains approximately 72% of the variance in egg production (R-squared = 0.72), indicating a good fit to the data.
Writing the Results for a Scientific Paper
Here’s how you might write up these results using the linear model approach for a scientific paper:
Results
A two-way factorial ANOVA revealed that egg mass production in limpets was significantly affected by both adult density (F3,16 = 9.67, P = 0.001) and season (F1,16 = 17.84, P = 0.001), with no significant interaction between these factors (F3,16 = 0.30, P = 0.824). The model explained 72% of the variance in egg production (adjusted R² = 0.65).
Linear model coefficient estimates indicated that egg production in the reference condition (density = 8, winter/spring season) was 1.90 ± 0.17 (estimate ± SE) egg masses per limpet. Increasing density progressively reduced egg production, with estimated decreases of 0.28 ± 0.25, 0.74 ± 0.25, and 0.91 ± 0.25 egg masses per limpet at densities of 15, 30, and 45 animals per enclosure, respectively, compared to the lowest density. Summer/autumn season reduced egg production by 0.75 ± 0.18 egg masses per limpet compared to winter/spring.
Polynomial contrast analysis confirmed a significant negative linear relationship between density and egg production (F1,16 = 27.58, P = 0.001), while quadratic (F1,16 = 1.29, P = 0.272) and cubic (F1,16 = 0.13, P = 0.720) components were not significant. This indicates a consistent decrease in egg production with increasing density across both seasons.
Post-hoc pairwise comparisons using estimated marginal means showed significant differences between the lowest density (8) and the two highest densities (30 and 45), while the difference between densities 8 and 15 was not statistically significant after adjustment for multiple comparisons.
Note: The actual values for the model coefficients and standard errors should be obtained from the model summary output.