Lecture 10 - Multiple Regression
Lecture 09: Review
Covered
- Regression, t-test, and ANOVA
- Regression Assumptions
- Model II Regression
Lecture 10: Overview
Multiple Linear Regression model
- Regression parameters
- Analysis of variance
- Null hypotheses
- Explained variance
- Assumptions and diagnostics
- Collinearity
- Interactions
- Dummy variables
- Model selection
- Importance of predictors
Lecture 10: Analyses
What if there is more than one predictor (X) variable?
- Predictors may all be continuous
- or a mix of categorical and continuous
- Multiple linear regression can handle both cases
Dependent variable | Independent: Continuous | Independent: Categorical |
---|---|---|
Continuous | Regression | ANOVA |
Categorical | Logistic regression | Tabular |
Lecture 10: Analyses
Abundance of C3 grasses can be modeled as function of
- latitude
- longitude
- both
Instead of a line, the relationship is modeled with a (hyper)plane
Lecture 10: Analyses
Used in similar way to simple linear regression:
- Describe nature of relationship between Y and X’s
- Determine explained / unexplained variation in Y
- Predict new Ys from X
- Find the “best” model
Lecture 10: Analyses
Crawley 2012: “Multiple regression models provide some of the most profound challenges faced by the analyst”:
- Overfitting
- Parameter proliferation
- Multicollinearity
- Model selection
Lecture 10: Analyses
Multiple Regression:
- Set of i= 1 to n observations
- fixed X-values for p predictor variables (X1, X2…Xp)
- random Y-values:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
yi: value of Y for the ith observation, when X1 = xi1, X2 = xi2,…, Xp = xip
β0: population intercept, the mean value of Y when X1 = 0, X2 = 0,…, Xp = 0
Lecture 10: Multiple linear regression model
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
β1: partial population slope, change in Y per unit change in X1 holding other X-vars constant
β2: partial population slope, change in Y per unit change in X2 holding other X-vars constant
βp: partial population slope, change in Y per unit change in Xp holding other X-vars constant
Lecture 10: Regression parameters
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
εi: unexplained error, the difference between yi and the value predicted by the model (ŷi)
NPP = β0 + β1(lat) + β2 (long) + β3 (soil fertility) + εi
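As a sketch of how such a model could be fit in R (the data frame `npp_dat` and its columns `npp`, `lat`, `long`, `soilfert` are hypothetical names, not the lecture's data):

```r
# Fit a multiple regression of NPP on latitude, longitude and soil fertility
fit <- lm(npp ~ lat + long + soilfert, data = npp_dat)
summary(fit)   # intercept, partial slopes, their SEs and t-tests
```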
Lecture 10: Regression parameters
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
- Estimate the multiple regression parameters (intercept, partial slopes) using OLS to fit the regression (hyper)plane
- OLS minimizes ∑(yi-ŷi)2, the SS of vertical distances between observed yi and predicted ŷi for each xij
- ε estimated as residuals: εi = yi-ŷi
- Calculation solves set of simultaneous normal equations with matrix algebra
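A minimal sketch of that matrix solution, b = (X'X)⁻¹X'y, reusing the hypothetical `npp_dat` from the earlier sketch:

```r
# Build the design matrix (column of 1s for the intercept plus the predictors)
X <- model.matrix(~ lat + long + soilfert, data = npp_dat)
y <- npp_dat$npp
b <- solve(t(X) %*% X, t(X) %*% y)   # OLS estimates: intercept and partial slopes
e <- y - X %*% b                     # residuals, the estimates of epsilon_i
```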
Lecture 10: Regression parameters
Regression equation can be used for prediction by substituting new values for the predictor (X) variables
Confidence intervals calculated for parameters
Confidence and prediction intervals depend on number of observations and number of predictors
- More observations decrease interval width
- More predictors increase interval width
Prediction should be restricted to within range of X variables
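A sketch of both interval types in R, using the hypothetical `fit` from above (the new predictor values are made up and should lie within the observed ranges):

```r
new_site <- data.frame(lat = 45, long = -100, soilfert = 2.5)       # hypothetical values
predict(fit, newdata = new_site, interval = "confidence")  # CI for the mean response
predict(fit, newdata = new_site, interval = "prediction")  # wider interval for a new observation
```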
Lecture 10: Analyses of variance
Variance - SStotal partitioned into SSregression and SSresidual
SSregression is variance in Y explained by model
SSresidual is variance not explained by model
Source of variation | SS | df | MS | Interpretation |
---|---|---|---|---|
Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) | Difference between each predicted value and the mean |
Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) | Difference between each observation and its predicted value |
Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | | Difference between each observation and the mean |
Lecture 10: Analyses
SS converted to non-additive MS (SS/df)
- MSresidual: estimate population variance
- MSregression: estimate population variance + variation due to strength of X-Y relationships
- MS do not depend on sample size
Source of variation | SS | df | MS |
---|---|---|---|
Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) |
Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) |
Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | |
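These quantities can be recovered from a fitted model; a sketch assuming the hypothetical `fit` above with p = 3 predictors:

```r
n <- nrow(npp_dat); p <- 3
SS_total      <- sum((npp_dat$npp - mean(npp_dat$npp))^2)
SS_residual   <- sum(residuals(fit)^2)
SS_regression <- SS_total - SS_residual
MS_regression <- SS_regression / p
MS_residual   <- SS_residual / (n - p - 1)
```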
Lecture 10: Hypotheses
Two Hos usually tested in MLR:
- “Basic” Ho: all partial regression slopes equal 0; β1 = β2 = … = βp = 0
- If the “basic” Ho is true, MSregression and MSresidual both estimate the population variance and their ratio (F-ratio) = 1
- If the “basic” Ho is false (at least one β ≠ 0), MSregression estimates the population variance plus variation due to the partial regression slopes, so the F-ratio will be > 1
- The F-ratio is compared to the F-distribution to obtain a p-value (see the sketch below)
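In R, `summary()` reports this overall F-ratio and its p-value; a brief sketch using the hypothetical fit and the MS quantities from the sketch above:

```r
summary(fit)$fstatistic        # overall F-ratio with its numerator and denominator df
MS_regression / MS_residual    # the same ratio computed from the SS table above
```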
Lecture 10: Hypotheses
Also: is any specific β = 0 (explanatory role)?
- E.g., does LAT have effect on NPP?
- These Hos are tested through model comparison
- Model with 3 predictors X1, X2, X3 (model 1):
- yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi
- To test Ho that β1 = 0, compare the fit of model 1 to model 2:
- yi = β0 + β2xi2 + β3xi3 + εi
Lecture 10: Hypotheses
- If SSregression of model 1 = model 2, cannot reject Ho: β1 = 0
- If SSregression of model 1 > model 2, there is evidence to reject Ho: β1 = 0
- SS for β1 is SSextraβ1 = Full SSregression - Reduced SSregression
- Use partial F-test to test Ho β1 = 0 :
\[F_{1,\,n-p-1} = \frac{MS_{\text{Extra}}}{\text{Full } MS_{\text{Residual}}}\]
Can also use a t-test on the coefficient (R provides this value)
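A sketch of this model comparison in R with `anova()`, which carries out the partial F-test (here testing whether the hypothetical `lat` predictor can be dropped from the NPP model):

```r
full    <- lm(npp ~ lat + long + soilfert, data = npp_dat)
reduced <- lm(npp ~       long + soilfert, data = npp_dat)
anova(reduced, full)   # partial F-test of Ho: beta_lat = 0
summary(full)          # the t-test on the lat coefficient tests the same Ho
```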
Lecture 10: Explained variance
Explained variance (r2) is calculated the same way as for simple regression:
\[r^2 = \frac{SS_{Regression}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}} \]
- r2 values cannot be used to compare models directly
- r2 values will always increase as predictors are added
- r2 values computed on differently transformed Y are not comparable
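A brief sketch of reading r2 off the hypothetical fit, or computing it from the SS defined in the earlier sketch:

```r
summary(fit)$r.squared       # r2 reported by R
1 - SS_residual / SS_total   # same value from the SS partition above
```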
Lecture 10: Assumptions and diagnostics
- Assume fixed Xs; unrealistic in most biological settings
- No major (influential) outliers
- Check leverage and influence (Cook’s Di)
Lecture 10: Assumptions and diagnostics
- Normality, equal variance, independence
- Residual QQ-plots, residuals vs. predicted values plot
- Distribution/variance often corrected by transforming Y
Lecture 10: Assumptions and diagnostics
More observations than predictor variables
- Ideally at least 10x as many observations as predictors, to avoid “overfitting”
- Uncorrelated predictor variables (assessed using a scatterplot matrix and VIFs)
- Linear relationship between Y and each X, holding the others constant (non-linearity assessed with added-variable (AV) plots); see the diagnostics sketch below
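A sketch of these diagnostic checks in R for the hypothetical `fit` (the `car` package, assumed installed, supplies `vif()` and the added-variable plots):

```r
plot(fit)              # residuals vs. fitted, QQ-plot, scale-location, leverage
cooks.distance(fit)    # Cook's D for each observation
library(car)
vif(fit)               # variance inflation factors for each predictor
avPlots(fit)           # added-variable (partial regression) plots
```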
Lecture 10: Analyses
Regression of Y vs. each X separately does not account for the effects of the other predictors:
we want the shape of the relationship while holding the other predictors constant
Lecture 10: Collinearity
- Potential predictor variables are often correlated (e.g., morphometrics, nutrients, climatic parameters)
- Multicollinearity (strong correlation between predictors) causes problems for parameter estimates
- Severe collinearity causes unstable parameter estimates: a small change in a single value can result in large changes in the βp estimates
- Inflates partial slope error estimates, loss of power
Lecture 10: Collinearity
Collinearity can be detected by:
Variance Inflation Factors (VIFs):
- VIF for Xj = 1 / (1 − Rj2), where Rj2 comes from regressing Xj on the other predictors
- VIF > 10 indicates problematic collinearity
Best/simplest solution:
- exclude variables that are highly correlated with other predictors
- they are probably measuring a similar thing and are redundant (see the sketch below)
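A sketch of computing a VIF by hand for the hypothetical `lat` predictor, alongside the packaged version:

```r
# Regress lat on the other predictors; that model's R^2 drives the VIF
r2_lat <- summary(lm(lat ~ long + soilfert, data = npp_dat))$r.squared
1 / (1 - r2_lat)   # manual VIF for lat
car::vif(fit)      # same values from the car package (assumed installed)
```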
Lecture 10: Interactions
Predictors can be modeled as:
- additive (effect of temp, plus precip, plus fertility) or
- multiplicative (interactive)
- Interaction: effect of Xi depends on levels of Xj
- The partial slope of Y vs. X1 is different for different levels of X2 (and vice versa); measured by β3
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i \quad \text{vs.} \quad y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \epsilon_i\]
“Curvature” of the regression (hyper)plane
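A sketch of additive vs. interactive models in R with the hypothetical NPP data:

```r
additive    <- lm(npp ~ lat + long, data = npp_dat)  # effects simply add
interactive <- lm(npp ~ lat * long, data = npp_dat)  # expands to lat + long + lat:long
summary(interactive)   # the lat:long coefficient is the estimate of beta_3
```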
Lecture 10: Analyses
Adding interactions:
- many more predictors (“parameter proliferation”):
- up to 2^n possible terms: 6 predictors = 64 terms; 7 predictors = 128 terms
- interpretation more complex
- When to include interactions? When they make biological sense
Lecture 10: Dummy variables
Multiple linear regression accommodates both continuous and categorical predictors (sex, vegetation type, etc.)
Categorical variables are coded as “dummy variables”; number of dummy variables = number of categories − 1
Sex M/F:
- Need 1 dummy var with two values (0, 1)
Fertility L/M/H:
- Need 2 dummy vars, each with two values (0, 1): fert1 = 1 if Med, 0 otherwise; fert2 = 1 if High, 0 otherwise
Fertility | fert1 | fert2 |
---|---|---|
Low | 0 | 0 |
Med | 1 | 0 |
High | 0 | 1 |
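In R, declaring the variable a factor produces this coding automatically; a sketch assuming a hypothetical `fert` column in `npp_dat`:

```r
npp_dat$fert <- factor(npp_dat$fert, levels = c("Low", "Med", "High"))  # Low = reference
head(model.matrix(~ fert, data = npp_dat))   # shows the two dummy columns
fit_fert <- lm(npp ~ lat + fert, data = npp_dat)
summary(fit_fert)   # fertMed and fertHigh coefficients are contrasts with Low
```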
Lecture 10: Analyses
Coefficients interpreted relative to reference condition
- R codes dummy variables automatically
- picks “reference” level alphabetically
- Categorical variables with more than 2 levels add extra predictor (dummy) variables to the model
Fertility | fert1 | fert2 |
---|---|---|
Low | 0 | 0 |
Med | 1 | 0 |
High | 0 | 1 |
Lecture 10: Comparing models
When we have multiple predictors (and interactions!):
- how to choose “best” model?
- Which predictors to include?
- Occam’s razor: “best” model balances complexity with fit to data
To choose:
- compare “nested” models
Overfitting:
- getting a high r2 just by adding more (useless) predictors
- so r2 is not a good way of choosing between nested models
Lecture 10: Comparing models
Need to account for increase in fit with added predictors:
- Adjusted r2
- Akaike’s information criterion (AIC)
- Both “penalize” models for extra predictors
- Higher adjusted r2 and lower AIC are better when comparing models
\[\text{Adjusted } r^2 = 1 - \frac{SS_{\text{Residual}}/(n - (p + 1))}{SS_{\text{Total}}/(n - 1)}\]
\[\text{Akaike Information Criterion (AIC)} = n\ln(SS_{\text{Residual}}) + 2(p + 1) - n\ln(n)\]
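A sketch of extracting both criteria for two candidate versions of the hypothetical NPP model (note that R’s `AIC()` uses the log-likelihood form, which differs from the SS formula above by a constant but ranks models the same way):

```r
m1 <- lm(npp ~ lat + long + soilfert, data = npp_dat)
m2 <- lm(npp ~ lat + long, data = npp_dat)
summary(m1)$adj.r.squared; summary(m2)$adj.r.squared   # higher is better
AIC(m1); AIC(m2)                                       # lower is better
```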
Lecture 10: Comparing models
But how to compare models?
Can fit all possible models
- compare AICs or adjusted r2
- tedious with lots of predictors
Automated forward (or backward) stepwise procedures: start with no terms (or all terms) and add (or remove) the term with the largest (or smallest) partial F-statistic
We will use a manual form of backward selection (sketched below)
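A sketch of that manual backward selection in R: start from the full hypothetical model, use `drop1()` to see which term contributes least, remove it, and refit until every remaining term is justified; `step()` automates the same idea using AIC:

```r
full <- lm(npp ~ lat + long + soilfert, data = npp_dat)
drop1(full, test = "F")                 # partial F-test for dropping each term
reduced <- update(full, . ~ . - long)   # e.g. remove the weakest term (hypothetically long)
drop1(reduced, test = "F")              # repeat until all remaining terms are retained
step(full, direction = "backward")      # automated, AIC-based backward selection
```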
Lecture 10: Analyses
Lecture 10: Predictors
Usually want to know relative importance of predictors to explaining Y
- Three general approaches:
- Using F-tests (or t-tests) on partial regression slopes
- Using coefficient of partial determination
- Using standardized partial regression slopes
Lecture 10: Predictors
Using F-tests (or t-tests) on partial regression slopes:
- Conduct F tests of Ho that each partial regression slope = 0
- If cannot reject Ho, discard predictor
- Can get additional clues from relative size of F-values
- Does not tell us the absolute importance of a predictor (slope parameters usually cannot be compared directly)
Lecture 10: Predictors
Using coefficient of partial determination:
- the reduction in variation of Y due to addition of predictor (Xj)
\[r_{X_j}^2 = \frac{SS_{\text{Extra}}}{\text{Reduced }SS_{\text{Residual}}}\]
SSExtra is the increase in SSRegression when Xj is added to the model
Reduced SSResidual is the unexplained SS from the model without Xj
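A sketch of computing the partial r2 for the hypothetical `lat` predictor:

```r
full   <- lm(npp ~ lat + long + soilfert, data = npp_dat)   # model with lat
no_lat <- lm(npp ~       long + soilfert, data = npp_dat)   # model without lat
SS_extra <- sum(residuals(no_lat)^2) - sum(residuals(full)^2)  # increase in SS_regression from adding lat
SS_extra / sum(residuals(no_lat)^2)                            # partial r^2 for lat
```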
Lecture 10: Predictors
Using standardized partial regression slopes:
- partial regression slopes of the predictor variables cannot be compared directly
- Why? The predictors are measured on different scales/units
- Standardize all variables (mean = 0, sd = 1)
- Scales are then identical, and a larger standardized partial slope means a more important variable
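A sketch of obtaining standardized partial regression slopes by standardizing every variable first (column names are the hypothetical ones used throughout):

```r
z_dat <- as.data.frame(scale(npp_dat[, c("npp", "lat", "long", "soilfert")]))  # mean 0, sd 1
fit_std <- lm(npp ~ lat + long + soilfert, data = z_dat)
coef(fit_std)   # standardized partial slopes; larger absolute value = more important predictor
```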
Lecture 10: Predictors
Using partial r2 values:
Lecture 10: Reporting results
Results are easiest to report in tabular format