Lecture 10 - Multiple Regression
Lecture 09: Review
Covered
- Regression, t-test, ANOVA
- Regression Assumptions
- Partitioning the total sum of squares into regression and residual components
Lecture 10: Overview
Multiple Linear Regression model
- Regression parameters
- Analysis of variance
- Null hypotheses
- Explained variance
- Assumptions and diagnostics
- Collinearity
- Interactions
- Dummy variables
- Model selection
- Importance of predictors
Lecture 10: Analyses
What if there is more than one predictor (X) variable?
- Predictors may be all continuous
- Or a mix of categorical and continuous
- Either way, we can use multiple linear regression
Dependent variable | Independent variable: Continuous | Independent variable: Categorical |
---|---|---|
Continuous | Regression | ANOVA |
Categorical | Logistic regression | Logistic regression |
Lecture 10: Analyses
Abundance of ants can be modeled as function of
- latitude
- longitude
- both
Instead of line, modeled with (hyper)plane
Lecture 10: Analyses
Used in similar way to simple linear regression:
- Describe nature of relationship between Y and X’s
- Determine explained / unexplained variation in Y
- Predict new Ys from X
- Find the “best” model
Lecture 10: Analyses
Crawley 2012: “Multiple regression models provide some of the most profound challenges faced by the analyst”:
- Overfitting
- Parameter proliferation
- Multicollinearity
- Model selection
Lecture 10: Analyses
Multiple Regression:
- Set of i= 1 to n observations
- fixed X-values for p predictor variables (X1, X2…Xp)
- random Y-values:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
yi: value of Y for the ith observation, when X1 = xi1, X2 = xi2,…, Xp = xip
β0: population intercept, the mean value of Y when X1= 0, X2 = 0,…, Xp = 0
Lecture 10: Multiple linear regression model
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
β1: partial regression slope, change in Y per unit change in X1 holding other X-vars constant
β2: partial regression slope, change in Y per unit change in X2 holding other X-vars constant
βp: partial regression slope, change in Y per unit change in Xp holding other X-vars constant
What does "partial" mean? It is the relationship between a predictor variable and the response variable while holding all other predictor variables constant. It tells you the isolated effect of one variable, controlling for the influence of the others.
Lecture 10: Partial regression slopes
model: ant_spp ~ elevation + latitude
- The partial slope for elevation tells you how ant species richness changes with elevation when latitude is held constant
1-m increase in elevation - ant spp decrease by 0.012 spp, holding latitude constant
Or 100-meter increase in elevation, lose 1.2 spp
- The partial slope for latitude tells you how ant species richness changes with latitude when elevation is held constant
1-degree increase in latitude, ant spp decrease by 2 species, holding elevation constant
Moving north = fewer ant spp, even when comparing sites at same elevation
model <- lm(ant_spp ~ elevation + latitude, data = ant_df)
summary(model)
Call:
lm(formula = ant_spp ~ elevation + latitude, data = ant_df)
Residuals:
Min 1Q Median 3Q Max
-6.1180 -2.3759 0.3218 1.9070 5.8369
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.49651 26.50701 3.716 0.00147 **
elevation -0.01226 0.00411 -2.983 0.00765 **
latitude -2.00981 0.61956 -3.244 0.00427 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.022 on 19 degrees of freedom
Multiple R-squared: 0.5543, Adjusted R-squared: 0.5074
F-statistic: 11.82 on 2 and 19 DF, p-value: 0.000463
Lecture 10: Regression parameters
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
- εi: unexplained error - difference between yi and value predicted by model (ŷi)
- ant_spp = β0 + β1(elevation) + β2(latitude) + εi
Lecture 10: Regression parameters
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
- Estimate the multiple regression parameters (intercept, partial slopes) using ordinary least squares (OLS) to fit the regression (hyper)plane
- OLS minimizes ∑(yi − ŷi)², the sum of squared (vertical) distances between each observed yi and its predicted ŷi
- ε is estimated by the residuals: ei = yi − ŷi
- The calculation solves a set of simultaneous normal equations using matrix algebra (see the sketch below)
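A minimal sketch of this calculation, assuming the ant_df data frame used elsewhere in this lecture: build the design matrix, solve the normal equations (X'X)b = X'y, and compare with lm().
X <- cbind(1, ant_df$elevation, ant_df$latitude)   # design matrix: intercept column plus predictors
y <- ant_df$ant_spp
b_hat <- solve(t(X) %*% X, t(X) %*% y)             # OLS estimates: intercept and partial slopes
e_hat <- y - X %*% b_hat                           # residuals e_i = y_i - y_hat_i
b_hat
coef(lm(ant_spp ~ elevation + latitude, data = ant_df))   # same estimates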
Lecture 10: Regression parameters
Regression equation can be used for prediction by substituting new values for the predictor (X) variables
- Confidence intervals calculated for parameters
- Confidence and prediction intervals depend on number of observations and number of predictors
- More observations decrease interval width
- More predictors increase interval width
- Prediction should be restricted to within range of X variables
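For example, using the model fitted earlier (the new elevation and latitude values below are made up purely for illustration), predict() gives both interval types:
new_sites <- data.frame(elevation = c(100, 500), latitude = c(42.5, 44.0))   # hypothetical sites within the observed X ranges
predict(model, newdata = new_sites, interval = "confidence")   # CI for the mean response
predict(model, newdata = new_sites, interval = "prediction")   # wider interval for a single new observation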
Lecture 10: Analyses of variance or sums of squares
Variance - SStotal partitioned into SSregression and SSresidual
SSregression is variance in Y explained by model
SSresidual is variance not explained by model
Source of variation | SS | df | MS | Interpretation |
---|---|---|---|---|
Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) | Difference between each predicted value and the mean |
Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) | Difference between each observation and its predicted value |
Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | | Difference between each observation and the mean |
Lecture 10: Analyses
SS converted to mean squares (MS = SS/df), which are not additive
- MSresidual: estimate population variance
- MSregression: estimate population variance + variation due to strength of X-Y relationships
- MS is an estimate of variance that doesn’t increase with sample size the way Sum of Squares (SS) does
Source of variation | SS | df | MS |
---|---|---|---|
Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) |
Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) |
Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | |
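A quick check of this partition in R (using the model fitted above): anova() reports the sequential SS for each term, and the identity SStotal = SSregression + SSresidual can be verified by hand.
anova(model)                                              # sequential (Type I) SS for each predictor
SS_total      <- sum((ant_df$ant_spp - mean(ant_df$ant_spp))^2)
SS_residual   <- sum(residuals(model)^2)
SS_regression <- SS_total - SS_residual
c(SS_regression = SS_regression, SS_residual = SS_residual, SS_total = SS_total)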
Lecture 10: Hypotheses
Two Null Hypotheses usually tested in MLR:
- “Basic” Ho: all partial regression slopes equal 0; β1 = β2 = … = βp = 0
- If the “basic” Ho is true, MSregression and MSresidual both estimate the error variance, and their ratio (the F-ratio) ≈ 1
- If the “basic” Ho is false (at least one β ≠ 0), MSregression estimates the error variance plus variation due to the partial regression slopes, so the F-ratio will be > 1; the F-ratio is compared to the F-distribution to obtain a p-value
Lecture 10: Hypotheses
Also: is any specific β = 0 (i.e., does that predictor play an explanatory role)?
- E.g., does latitude have an effect on ant species richness?
- These hypotheses are tested through model comparison
- Model 1 - model with 2 predictors X1, X2:
- yi= β0 +β1xi1+β2xi2+ εi
- Model 2 - To test Ho that β2 = 0 compare fit of model 1 to model 2 with 1 predictor:
- yi= β0 +β1xi1+ εi
Lecture 10: Hypotheses
- If SSregression of model 1 = model 2, cannot reject Ho: β2 = 0
- adding X2 did not explain any additional variance
- If SSregression of model 1 > model 2, there is evidence to reject Ho: β2 = 0
- adding X2 explains more variance
- SS for β2 is SSextra = Full SSregression - Reduced SSregression
- This quantifies how much additional variance X2 explains
- Use a partial F-test to test Ho: β2 = 0 (sketch below):
\[F_{1,\,n-p-1} = \frac{MS_{Extra}}{\text{Full } MS_{Residual}} \] Can also use a t-test (R provides this value)
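A minimal sketch of this model comparison in R, testing whether latitude (the X2 here) adds explained variance beyond elevation; anova() on the nested pair performs the partial F-test.
reduced <- lm(ant_spp ~ elevation, data = ant_df)              # model without latitude
full    <- lm(ant_spp ~ elevation + latitude, data = ant_df)   # model with latitude
anova(reduced, full)   # partial F-test; equivalent to the t-test on the latitude slope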
Lecture 10: Explained variance
Explained variance (r2) is calculated the same way as for simple regression:
\[r^2 = \frac{SS_{Regression}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}} \]
- r2 values cannot be used to directly compare models
- r2 values will always increase as predictors are added
- r2 values computed after different transformations (e.g., of Y) are not comparable
Lecture 10: Assumptions and diagnostics
- Assume fixed Xs; unrealistic in most biological settings
- No major (influential) outliers
- Check leverage and influence (Cook’s Di)
# Plot Cook's distance
plot(model, which = 4)
Lecture 10: Assumptions and diagnostics
- Normality, equal variance, independence
- Residual QQ-plots, residuals vs. predicted values plot
- Distribution/variance often corrected by transforming Y
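For example, the standard plot() method for lm objects gives these diagnostics directly:
plot(model, which = 1)   # residuals vs fitted values: check equal variance and linearity
plot(model, which = 2)   # normal QQ-plot of residuals: check normality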
Lecture 10: Assumptions and diagnostics
- More observations than predictor variables
Ideally at least 10× as many observations as predictors (a 10:1 ratio) to avoid “overfitting”
- 2 predictors → at least 20 observations!!!
- No collinearity
- Need uncorrelated predictor variables (assessed with a scatterplot matrix and VIFs)
- Each X has linear relationship with Y after accounting for other predictors
Added-variable (AV) plots (also called partial regression plots)
These show the relationship between Y and each Xj after removing the linear effects of all other predictors
Create AV plots for all predictors (avPlots() is from the car package):
avPlots(model)
Or for just one predictor:
avPlot(model, variable = "latitude")
Lecture 10: Analyses
Regression of Y vs. each X separately does not consider the effect of the other predictors:
we want to know the shape of the relationship while holding the other predictors constant
Lecture 10: Collinearity
- Potential predictor variables are often correlated (e.g., morphometrics, nutrients, climatic parameters)
- Multicollinearity (strong correlation between predictors) causes problems for parameter estimates
- Severe collinearity causes unstable parameter estimates: a small change in a single data value can result in large changes in the β estimates
- It also inflates the standard errors of the partial slopes, causing loss of power
elevation latitude ant_spp
elevation 1.0000000 0.1787454 -0.5545244
latitude 0.1787454 1.0000000 -0.5879407
ant_spp -0.5545244 -0.5879407 1.0000000
Lecture 10: Collinearity
Collinearity can be detected by:
- Variance inflation factors (VIF):
- \(VIF_j = \frac{1}{1 - R_j^2}\), where \(R_j^2\) is the r² from regressing Xj on all the other predictors
- VIF > 10 indicates problematic collinearity
- Best/simplest solution:
- exclude variables that are highly correlated with other variables
- they are probably measuring similar things and are redundant
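A minimal check in R for the ant example (vif() is from the car package, assumed installed):
library(car)
cor(ant_df[, c("elevation", "latitude")])   # pairwise correlations among predictors
vif(model)                                  # variance inflation factors; values > 10 are a red flag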
Lecture 10: Interactions
Predictors can be modeled as:
- additive (effect of temp, plus precip, plus fertility) or
- multiplicative (interactive)
- Interaction: effect of Xi depends on levels of Xj
- The partial slope of Y vs. X1 is different for different levels of X2 (and vice versa); measured by β3
\[y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \epsilon_i \quad \text{vs.} \quad y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i1}X_{i2} + \epsilon_i\]
This leads to “Curvature” of the regression (hyper)plane
Lecture 10: Analyses
Interaction terms lead to “Curvature” of the regression (hyper)plane
Lecture 10: Analyses
Adding interactions:
- many more model terms (“parameter proliferation”):
- with p predictors, the model with all possible interaction terms has 2^p parameters: 6 predictors = 64 terms; 7 predictors = 128
- interpretation more complex
- When to include interactions? When they make biological sense!!!
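A sketch of fitting and testing an interaction with the ant example (in R, the * operator expands to both main effects plus their interaction):
m_add <- lm(ant_spp ~ elevation + latitude, data = ant_df)   # additive model
m_int <- lm(ant_spp ~ elevation * latitude, data = ant_df)   # adds elevation:latitude (beta_3)
anova(m_add, m_int)   # partial F-test: does the interaction explain additional variance?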
Lecture 10: Dummy variables
Multiple linear regression accommodates both continuous and categorical variables (gender, vegetation type, etc.). Categorical variables are entered as “dummy” variables; the number of dummy variables = number of categories − 1
- Sex M/F:
- Need 1 dummy var with two values (0, 1)
- Fertility L/M/H:
- Need 2 dummy var, each with two values (0, 1): fert1 (0 if L or H, 1 if M), fert2 (1 if H, 0 if L or M)
Fertility | fert1 | fert2 |
---|---|---|
Low | 0 | 0 |
Med | 1 | 0 |
High | 0 | 1 |
Lecture 10: Analyses
Coefficients interpreted relative to reference condition
- R codes dummy variables automatically
- picks “reference” level alphabetically
- Categorical variables with more than 2 levels add extra dummy (predictor) variables to the model
Fertility | fert1 | fert2 |
---|---|---|
Low | 0 | 0 |
Med | 1 | 0 |
High | 0 | 1 |
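A small sketch of how R builds these dummy variables (the fertility vector below is illustrative, not part of ant_df; the reference level is set to Low to match the table above):
fertility <- factor(c("Low", "Med", "High", "Low"), levels = c("Low", "Med", "High"))
model.matrix(~ fertility)   # columns fertilityMed and fertilityHigh reproduce fert1 and fert2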
Lecture 10: Comparing models
- When we have multiple predictors (and interactions!)
- how do we choose the “best” model?
- Which predictors should be included?
- Occam’s razor: “best” model balances complexity with fit to data
- To choose:
- compare “nested” models
- Overfitting
- getting a high r2 just by adding more (useless) predictors
- so r2 is not a good way of choosing between nested models
Lecture 10: Comparing models
Need to account for increase in fit with added predictors:
- Adjusted r2
- Akaike’s information criterion (AIC)
- Both “penalize” models for extra predictors
- Higher adjusted r2 and lower AIC are better when comparing models
- p = number of predictors
- n = number of observations
\[\text{Adjusted } r^2 = 1 - \frac{SS_{\text{Residual}}/(n - (p + 1))}{SS_{\text{Total}}/(n - 1)}\] \[\text{Akaike Information Criterion (AIC)} = n[\ln(SS_{\text{Residual}})] + 2(p + 1) - n\ln(n)\]
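In R these are available directly from the fitted model; note that AIC() uses the log-likelihood form, which differs from the SS formula above by a constant that is the same for every model fitted to the same data, so model comparisons are unaffected:
summary(model)$adj.r.squared   # penalizes extra predictors; higher is better
AIC(model)                     # lower is better when comparing models on the same data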
Lecture 10: Comparing models
- But how to compare models?
Can fit all possible models
- compare AICs or adjusted r2
- tedious with lots of predictors
Automated forward (and backward) stepwise procedures: start with no terms (all terms) and add (remove) the term with the largest (smallest) partial F statistic
- We will use a manual form of backward selection (sketched below)
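A minimal sketch of how the comparisons on the following slides could be produced (not necessarily the exact script used):
full <- lm(ant_spp ~ elevation + latitude, data = ant_df)
elev <- lm(ant_spp ~ elevation, data = ant_df)
lat  <- lm(ant_spp ~ latitude,  data = ant_df)
data.frame(model  = c("Both", "Elevation only", "Latitude only"),
           adj_r2 = c(summary(full)$adj.r.squared, summary(elev)$adj.r.squared, summary(lat)$adj.r.squared),
           AIC    = c(AIC(full), AIC(elev), AIC(lat)))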
Lecture 10: Analyses
========== FULL MODEL (Both Variables) ==========
Call:
lm(formula = ant_spp ~ elevation + latitude, data = ant_df)
Residuals:
Min 1Q Median 3Q Max
-6.1180 -2.3759 0.3218 1.9070 5.8369
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.49651 26.50701 3.716 0.00147 **
elevation -0.01226 0.00411 -2.983 0.00765 **
latitude -2.00981 0.61956 -3.244 0.00427 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.022 on 19 degrees of freedom
Multiple R-squared: 0.5543, Adjusted R-squared: 0.5074
F-statistic: 11.82 on 2 and 19 DF, p-value: 0.000463
Lecture 10: Analyses
========== ELEVATION ONLY MODEL ==========
Call:
lm(formula = ant_spp ~ elevation, data = ant_df)
Residuals:
Min 1Q Median 3Q Max
-4.9010 -2.6947 -0.9502 2.9657 7.6363
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.589088 1.385615 9.086 1.55e-08 ***
elevation -0.014641 0.004913 -2.980 0.0074 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.671 on 20 degrees of freedom
Multiple R-squared: 0.3075, Adjusted R-squared: 0.2729
F-statistic: 8.881 on 1 and 20 DF, p-value: 0.0074
Lecture 10: Analyses
========== LATITUDE ONLY MODEL ==========
Call:
lm(formula = ant_spp ~ latitude, data = ant_df)
Residuals:
Min 1Q Median 3Q Max
-6.2223 -2.1188 0.0599 2.1267 6.4990
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 109.8532 30.9803 3.546 0.00203 **
latitude -2.3401 0.7199 -3.251 0.00401 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.569 on 20 degrees of freedom
Multiple R-squared: 0.3457, Adjusted R-squared: 0.313
F-statistic: 10.57 on 1 and 20 DF, p-value: 0.004006
Lecture 10: Analyses
========== MODEL COMPARISON TABLE ==========
Model Adjusted_R2 AIC R2_vs_Full AIC_vs_Full
1 Both Variables 0.5074180 115.8646 0.0000000 0.000000
2 Elevation Only 0.2728722 123.5608 -0.2345459 7.696164
3 Latitude Only 0.3129580 122.3132 -0.1944600 6.448613
========== KEY FINDINGS ==========
Full Model AIC: 115.86 (Adjusted R²: 0.5074)
Elevation Only AIC: 123.56 (Adjusted R²: 0.2729)
Latitude Only AIC: 122.31 (Adjusted R²: 0.3130)
--- Differences from Full Model ---
Removing latitude: AIC increases by 7.70, Adj R² decreases by -0.2345
Removing elevation: AIC increases by 6.45, Adj R² decreases by -0.1945
========== CONCLUSION ==========
✓ Full model has LOWEST AIC (best fit)
✓ Full model has HIGHEST Adjusted R² (explains most variance)
✗ Removing either variable worsens model performance
Both elevation and latitude are important predictors of ant species diversity!
Lecture 10: Predictors
Usually want to know relative importance of predictors to explaining Y
- Three general approaches:
- Using F-tests (or t-tests) on partial regression slopes
- Using coefficient of partial determination
- Using standardized partial regression slopes
Lecture 10: Predictors
Using F-tests (or t-tests) on partial regression slopes:
- Conduct F tests of Ho that each partial regression slope = 0
- If cannot reject Ho, discard predictor
- Can get additional clues from relative size of F-values
- Does not tell us the absolute importance of a predictor (we usually cannot directly compare slope parameters)
Lecture 10: Predictors
Using coefficient of partial determination:
- the reduction in variation of Y due to addition of predictor (Xj)
\[r_{X_j}^2 = \frac{SS_{\text{Extra}}}{\text{Reduced }SS_{\text{Residual}}}\]
SSextra
- the increase in SSregression when Xj is added to the model
- Reduced SSresidual is the unexplained SS from model without Xj
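For example, the partial r² for latitude can be computed from the residual SS of the full and reduced models (this matches the pEta_sqr values shown later in this lecture):
full    <- lm(ant_spp ~ elevation + latitude, data = ant_df)
reduced <- lm(ant_spp ~ elevation, data = ant_df)               # model without latitude
SS_extra <- sum(residuals(reduced)^2) - sum(residuals(full)^2)  # increase in SSregression from adding latitude
SS_extra / sum(residuals(reduced)^2)                            # partial r^2 for latitude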
Lecture 10: Predictors
Using standardized partial regression slopes:
- partial regression slopes of different predictor variables cannot be directly compared
- Why? They are measured in different units (units of Y per unit of each X)
Standardize all variables (mean = 0, sd = 1)
Scales are then identical, and a larger standardized partial regression slope indicates a more important variable
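A minimal sketch with the ant data: scale() standardizes each column, after which the slopes are directly comparable.
ant_std <- as.data.frame(scale(ant_df[, c("ant_spp", "elevation", "latitude")]))
coef(lm(ant_spp ~ elevation + latitude, data = ant_std))   # standardized partial slopes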
Lecture 10: Predictors
- Using partial r2 values
pEta_sqr (partial eta squared):
Elevation: 0.3189 (31.9%) - elevation explains 31.9% of the variance not already explained by latitude
Latitude: 0.3564 (35.6%) - latitude explains 35.6% of the variance not already explained by elevation
Both are important predictors! Latitude is slightly stronger
- dR_sqr (delta R squared / semi-partial R²):
Elevation: 0.2087 (20.9%)
- If elevation were removed from the model, R² would drop by 0.209 (20.9 percentage points)
Latitude: 0.2468 (24.7%)
- If latitude were removed from the model, R² would drop by 0.247 (24.7 percentage points)
- Shows the unique contribution of each predictor to the total variance explained
== Method 2: Type II ANOVA (car package) ==
Anova Table (Type II tests)
Response: ant_spp
Sum Sq Df F value Pr(>F)
elevation 81.224 1 8.8955 0.007652 **
latitude 96.085 1 10.5231 0.004272 **
Residuals 173.487 19
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
== Coefficients Table ==
Coefficients SSR df pEta_sqr dR_sqr
elevation 81.22 1 0.3189 0.2087
latitude 96.09 1 0.3564 0.2468
Residuals 173.49 19 0.5000 0.4457
== Summary Statistics ==
Sum of squared errors (SSE): 173.5
Sum of squared total (SST): 389.3
Model R²: 0.5543
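As a sketch of how the table above could be reproduced (Anova() is from the car package; the "Sum Sq" column name and row order match the output shown above):
library(car)
aov2 <- Anova(model, type = "II")          # Type II sums of squares
SS   <- aov2[["Sum Sq"]]
SSE  <- SS[length(SS)]                     # residual SS (last row)
SST  <- sum((ant_df$ant_spp - mean(ant_df$ant_spp))^2)
peta <- SS[-length(SS)] / (SS[-length(SS)] + SSE)   # partial eta-squared per predictor
dr2  <- SS[-length(SS)] / SST                        # semi-partial (delta) R-squared
round(data.frame(pEta_sqr = peta, dR_sqr = dr2, row.names = c("elevation", "latitude")), 4)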