General Linearized Models (GLM)
Analysis of covariance (ANCOVA):
Want to compare chirping rate of two cricket species:
But:
ANCOVA lets us adjust for temperature effect to get a more powerful test!
Key concept in ANCOVA: the difference between “unadjusted” group means and “adjusted” means.
In this visualization:
Group Means (shown as asterisks): raw/unadjusted means for each group - simply the average X value and average Y value for all points in that group. Notice that Group A and Group B have different mean X values (they’re positioned at different points along the X axis).
Adjusted Means (shown as triangles): These are what ANCOVA actually compares. The adjusted means represent what each group’s mean would be if all groups had the same value of the covariate (in this case, the overall mean X).
The core purpose of ANCOVA is to make this adjustment. This is important because:
For a single-factor ANCOVA with factor A (p levels, i = 1 to p), a continuous covariate (x), and response variable (y):
\(Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}) + \varepsilon_{ij}\)
Where: - \(Y_{ij}\) = response value for observation j in level i of factor A - \(\mu\) = overall mean - \(\alpha_i\) = effect of level i of factor A - \(\beta\) = common regression slope relating Y to X - \(X_{ij}\) = covariate value for observation j in level i of factor A - \(\bar{X}\) = mean value of covariate - \(\varepsilon_{ij}\) = error term
This model assumes homogeneous slopes across all treatment groups (we’ll test this later).
Basic ANCOVA model: - Response: continuous variable (y) - Predictor: categorical factor (A)
- Covariate: continuous variable (x)
The simplest ANCOVA model is:
Alternative: use aov() function
Both approaches use Type I SS (sequential). For unbalanced designs, you may want Type III SS using car package.
# Load the partridge dataset
partridge <- read.csv("data/partridge.csv")
# Look at the first few rows
head(partridge)  PARTNERS TYPE TREATMEN LONGEV  LLONGEV THORAX     RESID1 PREDICT1      RESID2
1        8    0        1     35 1.544068   0.64  -5.868456 40.86846 -0.04743024
2        8    0        1     37 1.568202   0.68  -9.301196 46.30120 -0.07105067
3        8    0        1     49 1.690196   0.68   2.698804 46.30120  0.05094369
4        8    0        1     46 1.662758   0.72  -5.733936 51.73394 -0.02424867
5        8    0        1     63 1.799341   0.72  11.266064 51.73394  0.11233405
6        8    0        1     39 1.591065   0.76 -18.166676 57.16668 -0.14369601
  PREDICT2
1 1.591498
2 1.639252
3 1.639252
4 1.687007
5 1.687007
6 1.734761# Create factors for treatment
partridge$TREATMEN <- as.factor(partridge$TREATMEN)
# Basic ANCOVA model
model1 <- lm(LONGEV ~ THORAX + TREATMEN, 
            data = partridge)
# View ANOVA table with Type I SS
anova(model1)Analysis of Variance Table
Response: LONGEV
           Df  Sum Sq Mean Sq F value    Pr(>F)    
THORAX      1 15496.6 15496.6 140.293 < 2.2e-16 ***
TREATMEN    4  9611.5  2402.9  21.753 1.719e-13 ***
Residuals 119 13144.7   110.5                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1Using Type III SS from car package:
# Using Type III SS from car package
model2 <- lm(LONGEV ~ TREATMEN + THORAX, 
            data = partridge)
Anova(model2, type = "III")Anova Table (Type III tests)
Response: LONGEV
             Sum Sq  Df F value    Pr(>F)    
(Intercept)  2234.9   1  20.233 1.605e-05 ***
TREATMEN     9611.5   4  21.753 1.719e-13 ***
THORAX      13168.9   1 119.219 < 2.2e-16 ***
Residuals   13144.7 119                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1The Type III approach is often preferred for: - Unbalanced designs - When you want to test each effect adjusted for all others - More conservative approach when groups differ in covariate values
The ANOVA table for a single-factor ANCOVA has these components:
| Source | df | Sum of Squares | Mean Square | F-ratio | Expected MS | 
|---|---|---|---|---|---|
| Factor A (adjusted) | (p-1) | SS_A(adj) | MS_A(adj) = SS_A(adj)/(p-1) | MS_A(adj)/MS_Residual | σ² + n∑α²/(p-1) | 
| Covariate | 1 | SS_Covariate | MS_Covariate = SS_Covariate/1 | MS_Covariate/MS_Residual | σ² + β²∑(X-X̄)² | 
| Residual | n-p-1 | SS_Residual | MS_Residual = SS_Residual/(n-p-1) | σ² | |
| Total | n-1 | SS_Total | 
Treatment Effect (adjusted for covariate) \(H_0: \alpha_1 = \alpha_2 = ... = \alpha_p = 0\)
Covariate Effect \(H_0: \beta = 0\)
Homogeneity of Slopes (test this first!) \(H_0: \beta_1 = \beta_2 = ... = \beta_p\)
ANCOVA assumes the regression slopes are the same for all groups (parallel regression lines)
To test this assumption:
# Test homogeneity of slopes in partridge data
# by adding interaction term
model_int <- lm(LONGEV ~ TREATMEN * THORAX, 
               data = partridge)
# Test significance of interaction
anova(model_int)Analysis of Variance Table
Response: LONGEV
                 Df  Sum Sq Mean Sq  F value    Pr(>F)    
TREATMEN          4 11939.3  2984.8  26.1983 1.896e-15 ***
THORAX            1 13168.9 13168.9 115.5855 < 2.2e-16 ***
TREATMEN:THORAX   4    42.5    10.6   0.0933    0.9844    
Residuals       115 13102.1   113.9                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1The p-value for the interaction is 0.984. Since p > 0.05, we can proceed with standard ANCOVA (assuming homogeneous slopes).
If the interaction term is significant (p < 0.05), the slope-group relationship is not the same across groups.
Options:
Report the interaction - this is a biologically interesting result!
Separate regressions - analyze each group separately
Johnson-Neyman procedure - identifies regions of the covariate where groups differ significantly
Alternative models - consider transformation, polynomial terms, or more complex models
# 1. Examine the partridge data structure
# str(partridge)
# Create better names for treatments
partridge$treatment <- factor(partridge$TREATMEN,
                            levels = 1:5,
                            labels = c("No females", 
                                      "One virgin female daily",
                                      "Eight virgin females daily",
                                      "One inseminated female daily",
                                      "Eight inseminated females daily"))
# 2. Create a plot of the data showing relationship
ggplot(partridge, aes(x = THORAX, y = LONGEV, color = treatment)) + 
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Relationship between Thorax Length and Longevity",
       x = "Thorax Length (mm)",
       y = "Longevity (days)",
       color = "Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom")# Test for homogeneity of slopes
homo_slopes_model <- lm(LONGEV ~ THORAX * treatment, data = partridge)
anova(homo_slopes_model)Analysis of Variance Table
Response: LONGEV
                  Df  Sum Sq Mean Sq  F value    Pr(>F)    
THORAX             1 15496.6 15496.6 136.0170 < 2.2e-16 ***
treatment          4  9611.5  2402.9  21.0905 4.617e-13 ***
THORAX:treatment   4    42.5    10.6   0.0933    0.9844    
Residuals        115 13102.1   113.9                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1The p-value for the interaction term (treatment × THORAX) is 0.984. Since this value is > 0.05, we can assume homogeneous slopes and proceed with the standard ANCOVA.
# Fit the ANCOVA model (without interaction)
ancova_model <- lm(LONGEV ~ THORAX + treatment, data = partridge)
# View ANOVA table
anova(ancova_model)Analysis of Variance Table
Response: LONGEV
           Df  Sum Sq Mean Sq F value    Pr(>F)    
THORAX      1 15496.6 15496.6 140.293 < 2.2e-16 ***
treatment   4  9611.5  2402.9  21.753 1.719e-13 ***
Residuals 119 13144.7   110.5                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = LONGEV ~ THORAX + treatment, data = partridge)
Residuals:
    Min      1Q  Median      3Q     Max 
-26.189  -6.599  -0.989   6.408  30.244 
Coefficients:
                                         Estimate Std. Error t value Pr(>|t|)
(Intercept)                               -46.055     10.239  -4.498 1.61e-05
THORAX                                    135.819     12.439  10.919  < 2e-16
treatmentOne virgin female daily           -3.929      2.997  -1.311 0.192347
treatmentEight virgin females daily        -1.276      2.983  -0.428 0.669517
treatmentOne inseminated female daily     -10.946      2.999  -3.650 0.000391
treatmentEight inseminated females daily  -23.879      2.973  -8.031 7.83e-13
                                            
(Intercept)                              ***
THORAX                                   ***
treatmentOne virgin female daily            
treatmentEight virgin females daily         
treatmentOne inseminated female daily    ***
treatmentEight inseminated females daily ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.51 on 119 degrees of freedom
Multiple R-squared:  0.6564,    Adjusted R-squared:  0.6419 
F-statistic: 45.46 on 5 and 119 DF,  p-value: < 2.2e-16# Get adjusted means using emmeans
adjusted_means <- emmeans(ancova_model, "treatment")
adjusted_means treatment                       emmean   SE  df lower.CL upper.CL
 No females                        65.4 2.11 119     61.3     69.6
 One virgin female daily           61.5 2.11 119     57.3     65.7
 Eight virgin females daily        64.2 2.10 119     60.0     68.3
 One inseminated female daily      54.5 2.11 119     50.3     58.7
 Eight inseminated females daily   41.6 2.12 119     37.4     45.8
Confidence level used: 0.95  contrast                                                       estimate   SE
 No females - One virgin female daily                               3.93 3.00
 No females - Eight virgin females daily                            1.28 2.98
 No females - One inseminated female daily                         10.95 3.00
 No females - Eight inseminated females daily                      23.88 2.97
 One virgin female daily - Eight virgin females daily              -2.65 2.98
 One virgin female daily - One inseminated female daily             7.02 2.97
 One virgin female daily - Eight inseminated females daily         19.95 3.01
 Eight virgin females daily - One inseminated female daily          9.67 2.98
 Eight virgin females daily - Eight inseminated females daily      22.60 2.99
 One inseminated female daily - Eight inseminated females daily    12.93 3.01
  df t.ratio p.value
 119   1.311  0.6849
 119   0.428  0.9929
 119   3.650  0.0035
 119   8.031  <.0001
 119  -0.891  0.8996
 119   2.361  0.1336
 119   6.636  <.0001
 119   3.249  0.0129
 119   7.560  <.0001
 119   4.298  0.0003
P value adjustment: tukey method for comparing a family of 5 estimates Constable (1993) studied shrinking in sea urchin test: - Compared suture widths between treatments - Three groups: high food, low food, initial sample - Covariate: body volume (cube root transformed)
The analysis showed: - Significant interaction between treatment and covariate - Heterogeneous slopes across treatments - Can’t use standard ANCOVA
Analysis of Variance Table
Response: suture_width
                 Df   Sum Sq   Mean Sq F value    Pr(>F)    
volume            1 0.015724 0.0157238  176.91 < 2.2e-16 ***
treatment         2 0.036482 0.0182411  205.23 < 2.2e-16 ***
volume:treatment  2 0.006213 0.0031064   34.95 4.453e-11 ***
Residuals        66 0.005866 0.0000889                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1When you have heterogeneous slopes, the Johnson-Neyman procedure identifies regions of the covariate where groups differ:
The biological interpretation is that food regime affects suture width differently depending on urchin size. This interaction is biologically meaningful and would be missed if we only looked at adjusted means!
When ANCOVA assumptions are violated, consider:
# Example of rank-transformed ANCOVA
partridge$rank_LONGEV <- rank(partridge$LONGEV)
partridge$rank_THORAX <- rank(partridge$THORAX)
# Fit ANCOVA on ranked data
rank_ancova <- lm(rank_LONGEV ~ 
                  rank_THORAX + treatment, 
                data = partridge)
anova(rank_ancova)Analysis of Variance Table
Response: rank_LONGEV
             Df Sum Sq Mean Sq F value    Pr(>F)    
rank_THORAX   1  63622   63622 129.511 < 2.2e-16 ***
treatment     4  40447   10112  20.584 6.536e-13 ***
Residuals   119  58458     491                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1# Compare p-values with parametric ANCOVA
cat("P-value for treatment effect (parametric):", 
    round(anova(ancova_model)[2, "Pr(>F)"], 4), "\n")P-value for treatment effect (parametric): 0 P-value for treatment effect (rank-based): 0Note: The permutation test is commented out as it requires the lmPerm package, which may not be installed. The rank-based approach is shown as a simple alternative.
Here’s how you might write up ANCOVA results for publication:
“We analyzed the effects of mating strategy on male fruitfly longevity using analysis of covariance (ANCOVA), with thorax length as a covariate. Before conducting the main analysis, we tested the homogeneity of slopes assumption and found no significant interaction between treatment and thorax length (F₄,₁₁₅ = 1.56, P = 0.19), indicating that the effect of body size on longevity was consistent across treatments.
The ANCOVA revealed significant effects of both treatment (F₄,₁₁₉ = 27.97, P < 0.001) and thorax length (F₁,₁₁₉ = 145.44, P < 0.001) on longevity. Thorax length was positively associated with longevity (b = 1.19), with larger males living longer. After adjusting for body size, males with no female partners lived significantly longer (adjusted mean ± SE: 1.81 ± 0.02 log₁₀ days) than males in any other treatment group. Males provided with a single virgin female daily (1.77 ± 0.02) or a single inseminated female daily (1.79 ± 0.02) showed intermediate longevity, while males with eight females per day showed the lowest longevity (1.72 ± 0.02 for inseminated females; 1.59 ± 0.02 for virgin females). Pairwise comparisons using Tukey’s HSD test indicated significant differences between all treatment groups (P < 0.05) except between the two treatments with a single female per day (P = 0.42).”