Covered
Multiple Linear Regression model
What if more than one predictor (X) variable?
| Dependent variable | Continuous independent variable | Categorical independent variable |
|---|---|---|
| Continuous | Regression | ANOVA |
| Categorical | Logistic regression | |
Abundance of ants can be modeled as a function of more than one predictor (e.g., elevation and latitude)
Instead of a line, the relationship is modeled with a (hyper)plane
Used in a similar way to simple linear regression:
Crawley 2012: “Multiple regression models provide some of the most profound challenges faced by the analyst”:
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
yi: value of Y for the ith observation, when X1 = xi1, X2 = xi2,…, Xp = xip
β0: population intercept, the mean value of Y when X1= 0, X2 = 0,…, Xp = 0
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
β1: partial regression slope, change in Y per unit change in X1 holding other X-vars constant
β2: partial regression slope, change in Y per unit change in X2 holding other X-vars constant
βp: partial regression slope, change in Y per unit change in Xp holding other X-vars constant
What does “partial” mean? It is the relationship between a predictor variable and the response variable while holding all other predictor variables constant. It tells you the isolated effect of one variable, controlling for the influence of the others.
model: ant_spp ~ elevation + latitude
A 1-m increase in elevation: ant species richness decreases by 0.012 species, holding latitude constant
Equivalently, a 100-m increase in elevation loses about 1.2 species
A 1-degree increase in latitude: ant species richness decreases by about 2 species, holding elevation constant
Moving north = fewer ant species, even when comparing sites at the same elevation
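The summary output below can be reproduced with a call like the following (a minimal sketch, assuming the data are in the `ant_df` data frame used throughout):

```r
# Fit the multiple regression of ant species richness on elevation and latitude
ant_mod <- lm(ant_spp ~ elevation + latitude, data = ant_df)
summary(ant_mod)
```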
Call:
lm(formula = ant_spp ~ elevation + latitude, data = ant_df)
Residuals:
Min 1Q Median 3Q Max
-6.1180 -2.3759 0.3218 1.9070 5.8369
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.49651 26.50701 3.716 0.00147 **
elevation -0.01226 0.00411 -2.983 0.00765 **
latitude -2.00981 0.61956 -3.244 0.00427 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.022 on 19 degrees of freedom
Multiple R-squared: 0.5543, Adjusted R-squared: 0.5074
F-statistic: 11.82 on 2 and 19 DF, p-value: 0.000463
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
The regression equation can be used for prediction by substituting new values for the predictor (X) variables
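For example, a sketch using `predict()` on the model fitted above; the new elevation and latitude values here are hypothetical:

```r
# Predict ant species richness at new (hypothetical) sites
new_sites <- data.frame(elevation = c(100, 500), latitude = c(42, 44))
predict(ant_mod, newdata = new_sites)
```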
Variance: SS_Total is partitioned into SS_Regression and SS_Residual
SS_Regression is the variance in Y explained by the model
SS_Residual is the variance not explained by the model
| Source of variation | SS | df | MS | Interpretation |
|---|---|---|---|---|
| Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) | Difference between each predicted value and the mean |
| Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) | Difference between each observation and its predicted value |
| Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | | Difference between each observation and the mean |
| Source of variation | SS | df | MS |
|---|---|---|---|
| Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) |
| Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) |
| Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | |
\[F_{1,\,n-p-1} = \frac{MS_{\text{Extra}}}{\text{Full } MS_{\text{Residual}}}\]
Can also use a t-test (R provides this value)
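In R, the same single-term F-tests can be obtained with `drop1()` on the full model (a sketch, using the model fitted earlier):

```r
# Extra-SS F-test for each predictor when it is dropped from the full model
drop1(ant_mod, test = "F")
```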
Explained variance (r2) is calculated the same way as for simple regression:
\[r^2 = \frac{SS_{Regression}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}} \]
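A small sketch of computing r² from the sums of squares of the fitted model:

```r
# r^2 = 1 - SS_Residual / SS_Total
ss <- anova(ant_mod)[["Sum Sq"]]   # SS for each term, residual SS last
1 - ss[length(ss)] / sum(ss)
```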
Ideally have at least 10× as many observations as predictors (a 10:1 ratio) to avoid “overfitting”
Added Variable (AV) plots (also called partial regression plots)
These show the relationship between Y and X₁ after removing the linear effects of all other predictors
Create AV plots for all predictors - avPlots(model)
Or for just one predictor - avPlot(model, variable = "proportion_black")
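A sketch with the `car` package, applied here to the ant model rather than the `proportion_black` example:

```r
library(car)

avPlots(ant_mod)                          # AV plots for all predictors
avPlot(ant_mod, variable = "elevation")   # AV plot for one predictor
```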
A regression of Y vs. each X separately does not consider the effect of the other predictors:
we want to know the shape of the relationship while holding the other predictors constant
elevation latitude ant_spp
elevation 1.0000000 0.1787454 -0.5545244
latitude 0.1787454 1.0000000 -0.5879407
ant_spp -0.5545244 -0.5879407 1.0000000
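The correlation matrix above can be produced with `cor()` (a sketch):

```r
# Pairwise Pearson correlations among the predictors and the response
cor(ant_df[, c("elevation", "latitude", "ant_spp")])
```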
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i \quad \text{vs.} \quad y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \epsilon_i\]
Interaction terms lead to “Curvature” of the regression (hyper)plane
Adding interactions:
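A minimal sketch of adding an interaction in R; fitting an elevation × latitude interaction is illustrative, not part of the worked example:

```r
# '*' expands to both main effects plus their interaction term
int_mod <- lm(ant_spp ~ elevation * latitude, data = ant_df)
summary(int_mod)
```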
Multiple linear regression accommodates both continuous and categorical predictors (gender, vegetation type, etc.). Categorical variables are entered as “dummy” variables; the number of dummy variables = number of categories − 1
| Fertility | fert1 | fert2 |
|---|---|---|
| Low | 0 | 0 |
| Med | 1 | 0 |
| High | 0 | 1 |
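A sketch of how R constructs these dummy variables for a three-level factor (the variable name is illustrative):

```r
# A 3-level factor is coded as (3 - 1) = 2 dummy variables
fertility <- factor(c("Low", "Med", "High", "Low"), levels = c("Low", "Med", "High"))
model.matrix(~ fertility)   # intercept plus fertilityMed and fertilityHigh columns
```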
\[\text{Adjusted } r^2 = 1 - \frac{SS_{\text{Residual}}/(n - (p + 1))}{SS_{\text{Total}}/(n - 1)}\]

\[\text{Akaike Information Criterion (AIC)} = n[\ln(SS_{\text{Residual}})] + 2(p + 1) - n\ln(n)\]
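Both quantities can be extracted from a fitted model in R; note that R's `AIC()` includes additional constant terms, so its absolute values differ from the textbook formula above, although model rankings are unchanged (a sketch):

```r
summary(ant_mod)$adj.r.squared   # adjusted r^2
AIC(ant_mod)                     # Akaike Information Criterion
```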
Can fit all possible models
Automated forward (and backward) stepwise procedures: start with no terms (all terms) and add (remove) the term with the largest (smallest) contribution to model fit at each step
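A sketch of automated stepwise selection by AIC using `step()`:

```r
# Start from the full model and consider adding/removing terms in both directions
step(ant_mod, direction = "both")
```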
========== FULL MODEL (Both Variables) ==========
Call:
lm(formula = ant_spp ~ elevation + latitude, data = ant_df)
Residuals:
Min 1Q Median 3Q Max
-6.1180 -2.3759 0.3218 1.9070 5.8369
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.49651 26.50701 3.716 0.00147 **
elevation -0.01226 0.00411 -2.983 0.00765 **
latitude -2.00981 0.61956 -3.244 0.00427 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.022 on 19 degrees of freedom
Multiple R-squared: 0.5543, Adjusted R-squared: 0.5074
F-statistic: 11.82 on 2 and 19 DF, p-value: 0.000463
========== ELEVATION ONLY MODEL ==========
Call:
lm(formula = ant_spp ~ elevation, data = ant_df)
Residuals:
Min 1Q Median 3Q Max
-4.9010 -2.6947 -0.9502 2.9657 7.6363
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.589088 1.385615 9.086 1.55e-08 ***
elevation -0.014641 0.004913 -2.980 0.0074 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.671 on 20 degrees of freedom
Multiple R-squared: 0.3075, Adjusted R-squared: 0.2729
F-statistic: 8.881 on 1 and 20 DF, p-value: 0.0074
========== LATITUDE ONLY MODEL ==========
Call:
lm(formula = ant_spp ~ latitude, data = ant_df)
Residuals:
Min 1Q Median 3Q Max
-6.2223 -2.1188 0.0599 2.1267 6.4990
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 109.8532 30.9803 3.546 0.00203 **
latitude -2.3401 0.7199 -3.251 0.00401 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.569 on 20 degrees of freedom
Multiple R-squared: 0.3457, Adjusted R-squared: 0.313
F-statistic: 10.57 on 1 and 20 DF, p-value: 0.004006
========== MODEL COMPARISON TABLE ==========
Model Adjusted_R2 AIC R2_vs_Full AIC_vs_Full
1 Both Variables 0.5074180 115.8646 0.0000000 0.000000
2 Elevation Only 0.2728722 123.5608 -0.2345459 7.696164
3 Latitude Only 0.3129580 122.3132 -0.1944600 6.448613
========== KEY FINDINGS ==========
Full Model AIC: 115.86 (Adjusted R²: 0.5074)
Elevation Only AIC: 123.56 (Adjusted R²: 0.2729)
Latitude Only AIC: 122.31 (Adjusted R²: 0.3130)
--- Differences from Full Model ---
Removing latitude: AIC increases by 7.70, Adj R² decreases by 0.2345
Removing elevation: AIC increases by 6.45, Adj R² decreases by 0.1945
========== CONCLUSION ==========
✓ Full model has LOWEST AIC (best fit)
✓ Full model has HIGHEST Adjusted R² (explains most variance)
✗ Removing either variable worsens model performance
Both elevation and latitude are important predictors of ant species diversity!
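A sketch of reproducing the model comparison above (AIC and adjusted R² for the three candidate models):

```r
# ant_mod is the full model fitted earlier; fit the two single-predictor models
elev_mod <- lm(ant_spp ~ elevation, data = ant_df)
lat_mod  <- lm(ant_spp ~ latitude,  data = ant_df)

AIC(ant_mod, elev_mod, lat_mod)   # AIC for each candidate model
sapply(list(both = ant_mod, elevation = elev_mod, latitude = lat_mod),
       function(m) summary(m)$adj.r.squared)   # adjusted r^2 for each
```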
Usually we want to know the relative importance of predictors in explaining Y
Using F-tests (or t-tests) on partial regression slopes:
Using coefficient of partial determination:
\[r_{X_j}^2 = \frac{SS_{\text{Extra}}}{\text{Reduced }SS_{\text{Residual}}}\]
SS_Extra: the reduction in SS_Residual (equivalently, the increase in SS_Regression) when the predictor is added to the reduced model
Standardize all variables (mean = 0, sd = 1)
Scales are then identical, and a larger (absolute) partial regression slope (PRS) indicates a more important variable
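A sketch of refitting the model on standardized variables so the partial regression slopes are directly comparable:

```r
# scale() centers each variable to mean 0 and rescales to sd 1
std_mod <- lm(scale(ant_spp) ~ scale(elevation) + scale(latitude), data = ant_df)
coef(std_mod)   # standardized partial regression slopes
```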
pEta_sqr (Partial Eta Squared):
Elevation: 0.3189 (31.9%) - elevation explains 31.9% of the variance remaining after controlling for latitude
Latitude: 0.3564 (35.6%) - latitude explains 35.6% of the variance remaining after controlling for elevation
Both are important predictors! Latitude is slightly stronger
dR_sqr (change in R² when the predictor is dropped from the full model):
Elevation: 0.2087 (20.9%)
Latitude: 0.2468 (24.7%)
If you removed latitude from the model, R² would drop by 0.247 (24.7 percentage points)
== Method 2: Type II ANOVA (car package) ==
Anova Table (Type II tests)
Response: ant_spp
Sum Sq Df F value Pr(>F)
elevation 81.224 1 8.8955 0.007652 **
latitude 96.085 1 10.5231 0.004272 **
Residuals 173.487 19
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
== Coefficients Table ==
Coefficients SSR df pEta_sqr dR_sqr
elevation 81.22 1 0.3189 0.2087
latitude 96.09 1 0.3564 0.2468
Residuals 173.49 19 0.5000 0.4457
== Summary Statistics ==
Sum of squared errors (SSE): 173.5
Sum of squared total (SST): 389.3
Model R²: 0.5543
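A sketch of reproducing the Type II sums of squares and the pEta_sqr and dR_sqr columns above:

```r
library(car)

a2 <- Anova(ant_mod, type = "II")             # Type II ANOVA for the full model
ss_effect <- a2[["Sum Sq"]][1:2]              # SS for elevation and latitude
ss_resid  <- a2[["Sum Sq"]][3]                # residual SS
ss_total  <- sum(anova(ant_mod)[["Sum Sq"]])  # total SS

data.frame(term     = rownames(a2)[1:2],
           pEta_sqr = round(ss_effect / (ss_effect + ss_resid), 4),
           dR_sqr   = round(ss_effect / ss_total, 4))
```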