Lecture 10 - Multiple Regression
Lecture 09: Review
Covered
- Regression, t-test, and ANOVA
- Regression assumptions
- Partitioning the total sum of squares into regression and residual components
Lecture 10: Overview
Multiple Linear Regression model
- Regression parameters
- Analysis of variance
- Null hypotheses
- Explained variance
- Assumptions and diagnostics
- Collinearity
- Interactions
- Dummy variables
- Model selection
- Importance of predictors
Lecture 10: Analyses
What if more than one predictor (X) variable?
- If predictors are continuous, or a mix of categorical and continuous
- Can use multiple linear regression
| Dependent variable | Independent variable: Continuous | Independent variable: Categorical |
|---|---|---|
| Continuous | Regression | ANOVA |
| Categorical | Logistic regression | Logistic regression |
Lecture 10: Analyses
Abundance of ants can be modeled as function of
- latitude
- longitude
- both
Instead of a line, the relationship is modeled with a (hyper)plane
Lecture 10: Analyses
Used in similar way to simple linear regression:
- Describe nature of relationship between Y and X’s
- Determine explained / unexplained variation in Y
- Predict new Ys from X
- Find the “best” model
Lecture 10: Analyses
Crawley 2012: “Multiple regression models provide some of the most profound challenges faced by the analyst”:
- Overfitting
- Parameter proliferation
- Multicollinearity
- Model selection
Lecture 10: Analyses
Multiple Regression:
- Set of i= 1 to n observations
- fixed X-values for p predictor variables (X1, X2…Xp)
- random Y-values:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
- yi: value of Y for the ith observation, where X1 = xi1, X2 = xi2, …, Xp = xip
- β0: population intercept, the mean value of Y when X1 = 0, X2 = 0, …, Xp = 0
Lecture 10: Multiple linear regression model
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
- β1: partial regression slope, change in Y per unit change in X1 holding other X-vars constant 
- β2: partial regression slope, change in Y per unit change in X2 holding other X-vars constant 
- βp: partial regression slope, change in Y per unit change in Xp holding other X-vars constant 
What does “partial” mean? It is the relationship between a predictor variable and the response variable while holding all other predictor variables constant; it tells you the isolated effect of one variable, controlling for the influence of the others.
Lecture 10: Partial regression slopes
model: ant_spp ~ elevation + latitude
- The partial slope for elevation tells you how ant species richness changes with elevation when latitude is held constant
- A 1-m increase in elevation corresponds to a decrease of 0.012 ant species, holding latitude constant
- Equivalently, a 100-m increase in elevation means a loss of about 1.2 species
 
- The partial slope for latitude tells you how ant species richness changes with latitude when elevation is held constant
- 1-degree increase in latitude, ant spp decrease by 2 species, holding elevation constant 
- Moving north = fewer ant spp, even when comparing sites at same elevation 
 
model <- lm(ant_spp ~ elevation + latitude, data = ant_df)
summary(model)
Call:
lm(formula = ant_spp ~ elevation + latitude, data = ant_df)
Residuals:
    Min      1Q  Median      3Q     Max 
-6.1180 -2.3759  0.3218  1.9070  5.8369 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 98.49651   26.50701   3.716  0.00147 **
elevation   -0.01226    0.00411  -2.983  0.00765 **
latitude    -2.00981    0.61956  -3.244  0.00427 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.022 on 19 degrees of freedom
Multiple R-squared:  0.5543,    Adjusted R-squared:  0.5074 
F-statistic: 11.82 on 2 and 19 DF,  p-value: 0.000463
Lecture 10: Regression parameters
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
- εi: unexplained error - difference between yi and value predicted by model (ŷi)
- ant_spp = β0 + β1(lat) + β2(elevation) + εi
Lecture 10: Regression parameters
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
- Estimate the multiple regression parameters (intercept, partial slopes) using ordinary least squares (OLS)
- OLS minimizes \(\sum_i (y_i - \hat{y}_i)^2\), the sum of squared (vertical) distances between observed yi and predicted ŷi
- ε is estimated by the residuals: εi = yi - ŷi
- The calculation solves a set of simultaneous normal equations with matrix algebra (a sketch follows below)
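A minimal sketch of the matrix-algebra solution to the normal equations, using a small simulated stand-in for `ant_df` (hypothetical values, not the lecture data), compared against `lm()`:

```r
# Hypothetical stand-in for the lecture's ant_df (simulated values)
set.seed(1)
ant_df <- data.frame(elevation = runif(22, 0, 800),
                     latitude  = runif(22, 41, 45))
ant_df$ant_spp <- 100 - 0.012 * ant_df$elevation - 2 * ant_df$latitude + rnorm(22, sd = 3)

# Design matrix X (column of 1s for the intercept, then the predictors) and response y
X <- model.matrix(~ elevation + latitude, data = ant_df)
y <- ant_df$ant_spp

# Solve the normal equations: b = (X'X)^-1 X'y
b <- solve(t(X) %*% X) %*% t(X) %*% y

# Same estimates as lm()
cbind(normal_equations = b[, 1],
      lm = coef(lm(ant_spp ~ elevation + latitude, data = ant_df)))
```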
Lecture 10: Regression parameters
Regression equation can be used for prediction by subbing new values for predictor (X) variables
- Confidence intervals calculated for parameters
- Confidence and prediction intervals depend on number of observations and number of predictors
- More observations decrease interval width
- More predictors increase interval width
 
- Prediction should be restricted to within range of X variables
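A short sketch of prediction with intervals in R, assuming the fitted `model` and `ant_df` from the earlier slides; the new site values below are made up for illustration and should lie within the observed range of the predictors:

```r
# Hypothetical new sites (values chosen to lie within the range of the data)
new_sites <- data.frame(elevation = c(100, 500), latitude = c(42, 44))

# Confidence interval for the mean response at the new X values
predict(model, newdata = new_sites, interval = "confidence")

# Prediction interval for a single new observation (wider than the confidence interval)
predict(model, newdata = new_sites, interval = "prediction")
```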
Lecture 10: Analyses of variance or sums of squares
Variance - SStotal partitioned into SSregression and SSresidual
- SSregression is variance in Y explained by model 
- SSresidual is variance not explained by model 
| Source of variation | SS | df | MS | Interpretation | 
|---|---|---|---|---|
| Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) | Difference between predicted values and the mean | 
| Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) | Difference between each observation and its predicted value | 
| Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | | Difference between each observation and the mean | 
Lecture 10: Analyses
SS are converted to MS (= SS/df); unlike SS, MS are not additive
- MSresidual: estimates the population error variance
- MSregression: estimates the population error variance plus variation due to the strength of the X-Y relationships
- MS is an estimate of variance that does not increase with sample size the way the sum of squares (SS) does
| Source of variation | SS | df | MS | 
|---|---|---|---|
| Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) | 
| Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) | 
| Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | | 
Lecture 10: Hypotheses
Two Null Hypotheses usually tested in MLR:
- “Basic” Ho: all partial regression slopes equal 0; β1 = β2 = … = βp = 0
- If the “basic” Ho is true, MSregression and MSresidual both estimate the error variance, and their ratio (F-ratio) ≈ 1
- If the “basic” Ho is false (at least one β ≠ 0), MSregression estimates the error variance plus variation due to the partial regression slopes, so the F-ratio will be > 1; the F-ratio is compared to the F-distribution to obtain a p-value
Lecture 10: Hypotheses
Also: is any specific β = 0 (explanatory role)?
- E.g., does LAT have effect on Ant species?
- These H’s tested through model comparison
- Model 1 - model with 2 predictors X1, X2:
- yi= β0 +β1xi1+β2xi2+ εi
 
- Model 2 - To test Ho that β2 = 0 compare fit of model 1 to model 2 with 1 predictor:
- yi= β0 +β1xi1+ εi
 
Lecture 10: Hypotheses
- If SSregression of Model 1 = Model 2, cannot reject Ho: β2 = 0
- adding X2 did not explain any additional variance
 
- If SSregression of Model 1 > Model 2, adding X2 explains more variance
- evidence to reject Ho: β2 = 0
 
- The SS for β2 is SSextra = Full SSregression - Reduced SSregression
- This quantifies how much additional variance X2 explains
 
- Use a partial F-test to test Ho: β2 = 0 (a sketch follows below):
\[F_{1,\,n-p-1} = \frac{MS_{\text{Extra}}}{\text{Full } MS_{\text{Residual}}} \]
- Can also use a t-test (R provides this value)
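A sketch of this model comparison in R, assuming the ant models from earlier; `anova()` on two nested `lm()` fits reports SSextra and the partial F statistic:

```r
# Full model (elevation + latitude) and reduced model without latitude
full    <- lm(ant_spp ~ elevation + latitude, data = ant_df)
reduced <- lm(ant_spp ~ elevation, data = ant_df)

# Partial F-test of Ho: beta for latitude = 0
# (the "Sum of Sq" column is SSextra; F = MSextra / full-model MSresidual)
anova(reduced, full)

# The equivalent t-test appears in summary(full) for the latitude coefficient
summary(full)$coefficients["latitude", ]
```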
Lecture 10: Explained variance
Explained variance (r2) is calculated the same way as for simple regression:
\[r^2 = \frac{SS_{Regression}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}} \]
- r2 values cannot be used to directly compare models with different numbers of predictors
- r2 always increases as predictors are added
- r2 values computed on differently transformed Y variables are not comparable
Lecture 10: Assumptions and diagnostics
- Assume fixed Xs; unrealistic in most biological settings
- No major (influential) outliers
- Check leverage and influence (Cook's Di)
# Plot Cook's distance
plot(model, which = 4)
Lecture 10: Assumptions and diagnostics
- Normality, equal variance, independence
- Residual QQ-plots, residuals vs. predicted values plot
- Distribution/variance often corrected by transforming Y
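A minimal sketch of these diagnostics, assuming the fitted `model` from earlier:

```r
# Residuals vs. fitted values: check equal variance and linearity
plot(model, which = 1)

# Normal QQ-plot of residuals: check normality
plot(model, which = 2)

# A common remedy for non-normality or unequal variance is to transform Y, e.g.
# model_log <- lm(log(ant_spp + 1) ~ elevation + latitude, data = ant_df)
```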
Lecture 10: Assumptions and diagnostics
- More observations than predictor variables
- Ideally at least 10 observations per predictor (10:1) to avoid “overfitting”; e.g., 2 predictors = at least 20 observations!
 
 
- No collinearity
- Predictor variables should be uncorrelated (assessed using a scatterplot matrix and VIFs)
 
- Each X has a linear relationship with Y after accounting for the other predictors
- Added-variable (AV) plots (also called partial regression plots) show the relationship between Y and X₁ after removing the linear effects of all other predictors
- Create AV plots for all predictors: avPlots(model)
- Or for just one predictor: avPlot(model, variable = "proportion_black")
 
 
Lecture 10: Analyses
Regression of Y vs. each X separately does not consider the effects of the other predictors:
we want to know the shape of the relationship while holding the other predictors constant
Lecture 10: Collinearity
- Potential predictor variables are often correlated (e.g., morphometrics, nutrients, climatic parameters)
- Multicollinearity (strong correlation between predictors) causes problems for parameter estimates
- Severe collinearity causes unstable parameter estimates: a small change in a single value can result in large changes in the βp estimates
- Inflates the standard errors of the partial slopes, causing a loss of power
           elevation   latitude    ant_spp
elevation  1.0000000  0.1787454 -0.5545244
latitude   0.1787454  1.0000000 -0.5879407
ant_spp   -0.5545244 -0.5879407  1.0000000
Lecture 10: Collinearity
Collinearity can be detected by:
- Variance inflation Factors (VIF):
- VIF for Xj = 1 / (1 - Rj²), where Rj² is the r² from regressing Xj on all the other predictors
- VIF > 10 indicates serious collinearity
 
- Best/simplest solution:
- exclude variables that are highly correlated with other variables
- they are probably measuring similar things and are redundant
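A sketch of these collinearity checks, assuming `model` and `ant_df` from earlier; `vif()` is from the car package:

```r
# Scatterplot matrix and correlation of the predictors
pairs(ant_df[, c("ant_spp", "elevation", "latitude")])
cor(ant_df[, c("elevation", "latitude")])

# Variance inflation factors; VIF > 10 signals serious collinearity
library(car)
vif(model)
```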
 
Lecture 10: Interactions
Predictors can be modeled as:
- additive (effect of temp, plus precip, plus fertility) or
- multiplicative (interactive)
- Interaction: effect of Xi depends on levels of Xj
- The partial slope of Y vs. X1 is different for different levels of X2 (and vice versa); measured by β3
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i \quad \text{vs.} \quad y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \epsilon_i\]
This leads to “Curvature” of the regression (hyper)plane
Lecture 10: Analyses
Interaction terms lead to “Curvature” of the regression (hyper)plane
Lecture 10: Analyses
Adding interactions:
- many more parameters to estimate (“parameter proliferation”):
- with all possible interactions, up to \(2^n\) terms are required; 6 predictors = 64 terms, 7 predictors = 128
- interpretation more complex
- When to include interactions? When they make biological sense!!!
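A sketch of fitting an interaction in R with the ant example from earlier: `*` expands to the main effects plus their interaction, while `:` gives the interaction term alone.

```r
# Additive model: effects of elevation and latitude only
add_mod <- lm(ant_spp ~ elevation + latitude, data = ant_df)

# Interaction model: main effects plus the elevation-by-latitude term (beta3)
int_mod <- lm(ant_spp ~ elevation * latitude, data = ant_df)
# Equivalent: lm(ant_spp ~ elevation + latitude + elevation:latitude, data = ant_df)

# Compare the nested models with a partial F-test
anova(add_mod, int_mod)
```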
Lecture 10: Dummy variables
Multiple linear regression accommodates both continuous and categorical predictors (gender, vegetation type, etc.). Categorical variables are coded as “dummy” variables; the number of dummy variables = number of categories - 1
- Sex M/F:
- Need 1 dummy var with two values (0, 1)
 
- Fertility L/M/H:
- Need 2 dummy variables, each with two values (0, 1): fert1 (1 if Med, 0 otherwise), fert2 (1 if High, 0 otherwise)
 
| Fertility | fert1 | fert2 | 
|---|---|---|
| Low | 0 | 0 | 
| Med | 1 | 0 | 
| High | 0 | 1 | 
Lecture 10: Analyses
Coefficients interpreted relative to reference condition
- R codes dummy variables automatically
- picks “reference” level alphabetically
- Categorical variables with more than 2 levels add extra predictor variables to the model
| Fertility | fert1 | fert2 | 
|---|---|---|
| Low | 0 | 0 | 
| Med | 1 | 0 | 
| High | 0 | 1 | 
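A sketch of how R codes a categorical predictor, using a hypothetical `fertility` factor added to the ant data (not part of the lecture data set):

```r
# Hypothetical three-level factor; setting the levels makes "Low" the reference
ant_df$fertility <- factor(rep(c("Low", "Med", "High"), length.out = nrow(ant_df)),
                           levels = c("Low", "Med", "High"))

# R builds the 0/1 dummy variables automatically (two dummies for three levels)
head(model.matrix(~ fertility, data = ant_df))

# Coefficients for fertilityMed and fertilityHigh are differences from the "Low" reference
summary(lm(ant_spp ~ elevation + fertility, data = ant_df))

# Change the reference level if desired
ant_df$fertility <- relevel(ant_df$fertility, ref = "Med")
```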
Lecture 10: Comparing models
- When we have multiple predictors (and interactions!),
- how do we choose the “best” model?
- Which predictors to include?
- Occam’s razor: “best” model balances complexity with fit to data
 
- To choose:
- compare “nested” models
 
- Overfitting
- getting a high r2 just by adding more (possibly useless) predictors
- so r2 is not a good way of choosing between nested models
 
Lecture 10: Comparing models
Need to account for increase in fit with added predictors:
- Adjusted r2
- Akaike’s information criterion (AIC)
- Both “penalize” models for extra predictors
- Higher adjusted r2 and lower AIC are better when comparing models
- p = number of predictors
- n = number of observations
 
\[\text{Adjusted } r^2 = 1 - \frac{SS_{\text{Residual}}/(n - (p + 1))}{SS_{\text{Total}}/(n - 1)}\] \[\text{Akaike Information Criterion (AIC)} = n[\ln(SS_{\text{Residual}})] + 2(p + 1) - n\ln(n)\]
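These quantities can be computed directly from the sums of squares of a fitted model (a sketch assuming `model` from earlier). Note that the AIC formula above differs from R's `AIC()` by a constant that is identical for all models fit to the same data, so model comparisons agree:

```r
n      <- nobs(model)                     # number of observations
p      <- length(coef(model)) - 1         # number of predictors
y      <- model.response(model.frame(model))
SS_res <- sum(resid(model)^2)
SS_tot <- sum((y - mean(y))^2)

# Adjusted r^2 (matches summary(model)$adj.r.squared)
adj_r2 <- 1 - (SS_res / (n - (p + 1))) / (SS_tot / (n - 1))

# AIC from the slide formula (differs from AIC(model) only by an additive constant)
aic_ls <- n * log(SS_res) + 2 * (p + 1) - n * log(n)

c(adjusted_r2 = adj_r2, AIC_formula = aic_ls)
```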
Lecture 10: Comparing models
- But how to compare models?
- Can fit all possible models and compare AICs or adjusted r2 values,
- but this is tedious with many predictors
 
- Automated forward (or backward) stepwise procedures: start with no terms (or all terms) and add (or remove) the term with the largest (smallest) partial F statistic
 
 
- We will use a manual form of backward selection (a sketch follows below)
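A sketch of this manual comparison (matching the outputs on the following slides), assuming `ant_df` from earlier:

```r
full    <- lm(ant_spp ~ elevation + latitude, data = ant_df)
no_lat  <- lm(ant_spp ~ elevation, data = ant_df)
no_elev <- lm(ant_spp ~ latitude,  data = ant_df)

# Lower AIC and higher adjusted r^2 indicate the better-supported model
data.frame(model  = c("elevation + latitude", "elevation only", "latitude only"),
           AIC    = c(AIC(full), AIC(no_lat), AIC(no_elev)),
           adj_r2 = c(summary(full)$adj.r.squared,
                      summary(no_lat)$adj.r.squared,
                      summary(no_elev)$adj.r.squared))

# drop1() automates one round of backward selection with partial F-tests
drop1(full, test = "F")
```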
Lecture 10: Analyses
========== FULL MODEL (Both Variables) ==========
Call:
lm(formula = ant_spp ~ elevation + latitude, data = ant_df)
Residuals:
    Min      1Q  Median      3Q     Max 
-6.1180 -2.3759  0.3218  1.9070  5.8369 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 98.49651   26.50701   3.716  0.00147 **
elevation   -0.01226    0.00411  -2.983  0.00765 **
latitude    -2.00981    0.61956  -3.244  0.00427 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.022 on 19 degrees of freedom
Multiple R-squared:  0.5543,    Adjusted R-squared:  0.5074 
F-statistic: 11.82 on 2 and 19 DF,  p-value: 0.000463
Lecture 10: Analyses
========== ELEVATION ONLY MODEL ==========
Call:
lm(formula = ant_spp ~ elevation, data = ant_df)
Residuals:
    Min      1Q  Median      3Q     Max 
-4.9010 -2.6947 -0.9502  2.9657  7.6363 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12.589088   1.385615   9.086 1.55e-08 ***
elevation   -0.014641   0.004913  -2.980   0.0074 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.671 on 20 degrees of freedom
Multiple R-squared:  0.3075,    Adjusted R-squared:  0.2729 
F-statistic: 8.881 on 1 and 20 DF,  p-value: 0.0074
Lecture 10: Analyses
========== LATITUDE ONLY MODEL ==========
Call:
lm(formula = ant_spp ~ latitude, data = ant_df)
Residuals:
    Min      1Q  Median      3Q     Max 
-6.2223 -2.1188  0.0599  2.1267  6.4990 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 109.8532    30.9803   3.546  0.00203 **
latitude     -2.3401     0.7199  -3.251  0.00401 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.569 on 20 degrees of freedom
Multiple R-squared:  0.3457,    Adjusted R-squared:  0.313 
F-statistic: 10.57 on 1 and 20 DF,  p-value: 0.004006
Lecture 10: Analyses
========== MODEL COMPARISON TABLE ==========
           Model Adjusted_R2      AIC R2_vs_Full AIC_vs_Full
1 Both Variables   0.5074180 115.8646  0.0000000    0.000000
2 Elevation Only   0.2728722 123.5608 -0.2345459    7.696164
3  Latitude Only   0.3129580 122.3132 -0.1944600    6.448613
========== KEY FINDINGS ==========
Full Model AIC: 115.86 (Adjusted R²: 0.5074)
Elevation Only AIC: 123.56 (Adjusted R²: 0.2729)
Latitude Only AIC: 122.31 (Adjusted R²: 0.3130)
--- Differences from Full Model ---
Removing latitude: AIC increases by 7.70, Adj R² decreases by 0.2345
Removing elevation: AIC increases by 6.45, Adj R² decreases by 0.1945
========== CONCLUSION ==========
✓ Full model has LOWEST AIC (best fit)
✓ Full model has HIGHEST Adjusted R² (explains most variance)
✗ Removing either variable worsens model performance
Both elevation and latitude are important predictors of ant species diversity!
Lecture 10: Predictors
Usually want to know relative importance of predictors to explaining Y
- Three general approaches:
- Using F-tests (or t-tests) on partial regression slopes
- Using coefficient of partial determination
- Using standardized partial regression slopes
Lecture 10: Predictors
Using F-tests (or t-tests) on partial regression slopes:
- Conduct F tests of Ho that each partial regression slope = 0
- If cannot reject Ho, discard predictor
- Can get additional clues from relative size of F-values
- Does not tell us the absolute importance of a predictor (slope parameters usually cannot be compared directly because predictors are on different scales)
Lecture 10: Predictors
Using coefficient of partial determination:
- the proportional reduction in unexplained variation in Y due to the addition of predictor Xj
\[r_{X_j}^2 = \frac{SS_{\text{Extra}}}{\text{Reduced }SS_{\text{Residual}}}\]
- SSextra: the increase in SSregression when Xj is added to the model
- Reduced SSresidual: the unexplained SS from the model without Xj
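A sketch of this calculation for latitude, assuming the ant models from earlier:

```r
full    <- lm(ant_spp ~ elevation + latitude, data = ant_df)
reduced <- lm(ant_spp ~ elevation, data = ant_df)   # model without latitude

SS_res_full    <- sum(resid(full)^2)            # Full SSresidual
SS_res_reduced <- sum(resid(reduced)^2)         # Reduced SSresidual
SS_extra       <- SS_res_reduced - SS_res_full  # increase in SSregression when latitude is added

# Coefficient of partial determination (partial r^2) for latitude
SS_extra / SS_res_reduced
```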
Lecture 10: Predictors
Using standardized partial regression slopes:
- partial regression slopes of different predictors cannot be directly compared
- Why? Because the predictors are measured on different scales and units
- Standardize all variables (mean = 0, sd = 1) 
- The scales are then identical, and a larger standardized partial regression slope indicates a more important variable (a sketch follows below) 
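A sketch of standardized partial regression slopes, assuming `ant_df` from earlier: scale the response and predictors to mean 0 and sd 1, then refit.

```r
# Standardize response and predictors (mean = 0, sd = 1)
ant_std <- as.data.frame(scale(ant_df[, c("ant_spp", "elevation", "latitude")]))

# Partial slopes are now on a common scale; a larger |slope| = a more important predictor
coef(lm(ant_spp ~ elevation + latitude, data = ant_std))
```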
 
Lecture 10: Predictors
- Using partial r2 values
- pEta_sqr (Partial Eta Squared):
- Elevation: 0.3189 (31.9%); elevation explains 31.9% of the variance after controlling for latitude 
- Latitude: 0.3564 (35.6%); latitude explains 35.6% of the variance after controlling for elevation 
- Both are important predictors! Latitude is slightly stronger 
 
 
- dR_sqr (Delta R Squared / Semi-partial R²):
- Elevation: 0.2087 (20.9%) 
- If you removed elevation from the model, R² would drop by 20.9%
 
- Latitude: 0.2468 (24.7%) 
- If you removed latitude from the model, R² would drop by 24.7% 
 
- Shows the unique contribution of each predictor to the total variance explained
== Method 2: Type II ANOVA (car package) ==
Anova Table (Type II tests)
Response: ant_spp
           Sum Sq Df F value   Pr(>F)   
elevation  81.224  1  8.8955 0.007652 **
latitude   96.085  1 10.5231 0.004272 **
Residuals 173.487 19                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
== Coefficients Table ==
 Coefficients    SSR df pEta_sqr dR_sqr
    elevation  81.22  1   0.3189 0.2087
     latitude  96.09  1   0.3564 0.2468
    Residuals 173.49 19   0.5000 0.4457
== Summary Statistics ==
Sum of squared errors (SSE): 173.5
Sum of squared total (SST): 389.3
Model R²: 0.5543