This means we are happy to adopt the original reduced model.
Diagnostics Summary
The main tools we will use to validate regression assumptions are:
+- Plots involving standardized residuals and/or fitted values.
+- Determine leverage points.
+- Determine which (if any) of the data points are outliers.
+- Assess the effect of each predictor variable, having adjusted for the effect of the other predictor variables, using added variable plots.
+- Assess the extent of collinearity among the predictor variables using variance inflation factors.
+- Examine whether the assumption of normality of error and constant error variance is reasonable.
Leverage Points and Residuals
Recall from simple linear regression that leverage points are points with extreme values of the predictor variable.
In multiple regression, the fitted values are,
+\mathbf{\hat{Y}} = \mathbf{X} \hat{\beta} = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y} = \mathbf{H} \mathbf{Y},
where $\mathbf{H} = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T$ (also called the hat matrix).
Let $h_{ij}$ be the $(i, j)$-th entry of $\mathbf{H}$. Then,
+\hat{Y}_i = h_{ii} Y_i + \sum_{j \neq i} h_{ij} Y_j
A common rule of thumb classifies the $i$-th point as a leverage point if,
+h_{ii} > 2 \times \text{average}(h_{ii}) = 2 \times \frac{p+1}{n}
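The hat matrix and the leverage rule of thumb can be checked numerically. The following is a minimal sketch in base R, assuming the built-in `mtcars` data and an arbitrary choice of predictors (neither comes from these notes); `hatvalues()` is the base R function that returns the $h_{ii}$ directly.

```r
# Illustrative model on the built-in mtcars data (an assumption, not from the notes).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

X <- model.matrix(fit)                   # n x (p + 1) design matrix, intercept included
H <- X %*% solve(t(X) %*% X) %*% t(X)    # hat matrix H = X (X^T X)^{-1} X^T
h <- setNames(diag(H), rownames(X))      # leverages h_ii

n <- nrow(X)
p <- ncol(X) - 1                         # number of predictors
cutoff <- 2 * (p + 1) / n                # twice the average leverage

y_hat <- as.vector(H %*% mtcars$mpg)     # fitted values as H %*% Y

leverage_points <- names(h)[h > cutoff]  # observations flagged by the rule of thumb
print(leverage_points)
```

The leverages sum to $p + 1$, which is why their average is $(p + 1)/n$.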
The residuals are defined as,
+\mathbf{\hat{e}} = \mathbf{Y} - \mathbf{\hat{Y}} = (\mathbf{I} - \mathbf{H}) \mathbf{Y}
We can show that,
+\text{Var}(\mathbf{\hat{e}} | \mathbf{X}) = \sigma^2(\mathbf{I} - \mathbf{H}),
and that the standardized residuals are,
+r_i = \frac{\hat{e}_i}{s \sqrt{1 - h_{ii}}},
where
+s = \sqrt{\frac{\sum_{i=1}^n \hat{e}_i^2}{n - p - 1}}
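The variance result follows because $\mathbf{I} - \mathbf{H}$ is symmetric and idempotent, so $\text{Var}((\mathbf{I} - \mathbf{H}) \mathbf{Y} | \mathbf{X}) = (\mathbf{I} - \mathbf{H}) \sigma^2 \mathbf{I} (\mathbf{I} - \mathbf{H})^T = \sigma^2 (\mathbf{I} - \mathbf{H})$. As a rough sketch, the standardized residuals can be computed by hand in base R and plotted against the fitted values. The `mtcars` data and the choice of predictors are assumptions made for illustration only.

```r
# Illustrative model on the built-in mtcars data (an assumption, not from the notes).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

e_hat <- resid(fit)                      # raw residuals
h     <- hatvalues(fit)                  # leverages h_ii
n     <- length(e_hat)
p     <- length(coef(fit)) - 1           # number of predictors

s <- sqrt(sum(e_hat^2) / (n - p - 1))    # estimate of sigma
r <- e_hat / (s * sqrt(1 - h))           # standardized residuals r_i

# Residual plot: look for any systematic pattern around the zero line.
plot(fitted(fit), r, xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)
```

The hand-computed values should agree with base R's `rstandard(fit)`.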
Any pattern in a residual plot indicates that an incorrect model has been fit, but the pattern in general does not provide direct information on how the model is misspecified.
Added Variable Plots
Assume we originally have the model,
+\mathbf{Y} = \mathbf{X} \beta + \mathbf{e}.
With an additional predictor, we now consider,
+\mathbf{Y} = \mathbf{X} \beta + \mathbf{Z} \alpha + \mathbf{e},
where
+\mathbf{Z} = \begin{bmatrix} z_1 \newline z_2 \newline \vdots \newline z_n \end{bmatrix}.
The procedure is as follows:
+- Regress $\mathbf{Y}$ on $\mathbf{X}$,
+\mathbf{Y} = \mathbf{X} \beta + \mathbf{e},
to get the residuals $\mathbf{e}_{\mathbf{Y}.\mathbf{X}}$.
+- Regress $\mathbf{Z}$ on $\mathbf{X}$,
+\mathbf{Z} = \mathbf{X} \delta + \mathbf{e}^*,
to get the residuals $\mathbf{e}_{\mathbf{Z}.\mathbf{X}}$.
+- Plot $\mathbf{e}_{\mathbf{Y}.\mathbf{X}}$ (on the $y$-axis)
against $\mathbf{e}_{\mathbf{Z}.\mathbf{X}}$ (on the $x$-axis).
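The three steps above can be sketched in base R as follows. The `mtcars` data and the roles assigned to its columns (`mpg` as $\mathbf{Y}$, `wt` and `hp` as $\mathbf{X}$, `qsec` as $\mathbf{Z}$) are assumptions made for illustration. A known property of the added variable plot is that the least squares slope through it equals the estimate of $\alpha$ in the full model, which the last lines compare.

```r
# Assumed roles (illustration only): Y = mpg, X = (wt, hp), Z = qsec.
e_yx <- resid(lm(mpg  ~ wt + hp, data = mtcars))   # step 1: residuals of Y on X
e_zx <- resid(lm(qsec ~ wt + hp, data = mtcars))   # step 2: residuals of Z on X

# Step 3: added variable plot with its least squares line.
plot(e_zx, e_yx,
     xlab = "Residuals of Z on X", ylab = "Residuals of Y on X")
av_fit <- lm(e_yx ~ e_zx)
abline(av_fit)

# Compare the slope with the coefficient of Z in the full model.
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)
slope_av   <- unname(coef(av_fit)["e_zx"])
alpha_full <- unname(coef(full)["qsec"])
print(c(slope_av = slope_av, alpha_full = alpha_full))
```

A roughly linear trend in the plot suggests that $\mathbf{Z}$ enters the model linearly after adjusting for $\mathbf{X}$.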
Transforming Only $Y$ Using an Inverse Response Plot
Assume the true model is actually,
+Y = g(\beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + e),
the inverse model is thus,
+g^{-1}(Y) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + e
To use an inverse response plot, an important assumption is that the predictors are pairwise linearly related.
Example: Defective rates.
We want to develop a model for the number of defectives based on $x_1$: temperature, $x_2$: density, and $x_3$: production rate.
To check whether the predictors are pairwise linearly related, we can use a scatterplot matrix.
pairs(~Defective + Temperature + Density + Rate)
To get the inverse response plot, we can use the following code.
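The sketch below builds an inverse response plot by hand in base R. It assumes simulated data in place of the defects data, a hypothetical helper `power_tr`, and an arbitrary grid of candidate powers; none of these come from the notes. The idea is to plot the fitted values from the untransformed model against $Y$ and pick the power $\lambda$ for which the fitted values are closest to a linear function of the transformed $Y$.

```r
set.seed(1)                               # simulated data, for illustration only
n  <- 200
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
# Assumed true model: Y = g(linear predictor + e) with g(u) = u^2,
# so that g^{-1}(Y) = sqrt(Y), i.e. lambda = 0.5.
y  <- (3 + x1 + x2 + rnorm(n, sd = 0.05))^2

fit   <- lm(y ~ x1 + x2)                  # model fitted to the untransformed response
y_hat <- fitted(fit)

# Hypothetical helper: scaled power transformation, with log(Y) at lambda = 0.
power_tr <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# For each candidate lambda, regress the fitted values on the transformed Y
# and record the residual sum of squares.
lambdas <- c(-1, -0.5, 0, 0.5, 1, 2)
rss <- sapply(lambdas, function(l) sum(resid(lm(y_hat ~ power_tr(y, l)))^2))
best_lambda <- lambdas[which.min(rss)]

# Inverse response plot: fitted values against Y, with the best-fitting curve.
plot(y, y_hat, xlab = "Y", ylab = "Fitted values")
best_fit <- lm(y_hat ~ power_tr(y, best_lambda))
ord <- order(y)
lines(y[ord], fitted(best_fit)[ord], lwd = 2)
print(data.frame(lambda = lambdas, RSS = rss))
```

With real data, the same steps would be applied to the fitted model for the response of interest, and the `car` package provides `inverseResponsePlot()` for this purpose.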
Collinearity of Predictors
When highly correlated predictor variables are included, they effectively carry very similar information about the response variable.
Thus, it is difficult for least squares to distinguish their separate effects on the response variable.
As a result, some of the coefficients in the regression model may have the opposite sign to what is expected.
Consider the multiple regression model,
+Y = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + e.
Let $R_j^2$ be the coefficient of determination $R^2$ obtained when regressing $x_j$ on the other predictors.
Then it can be shown that,
+\text{Var}(\hat{\beta}_j) = \frac{1}{1 - R_j^2} \frac{\sigma^2}{(n - 1) S_{x_j}^2}
Here $S_{x_j}^2$ is the sample variance of $x_j$, and the factor $\frac{1}{1 - R_j^2}$ is called the variance inflation factor (VIF).
A rough guide is to treat a VIF greater than 5 as large.
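The variance inflation factors can be computed directly from their definition. The following is a minimal sketch in base R, assuming the built-in `mtcars` data and an illustrative set of predictors (not from these notes). It also compares the variance formula above, with $\sigma^2$ replaced by its estimate $s^2$, against the estimated coefficient variances reported by `vcov()`.

```r
# Illustrative model on the built-in mtcars data (an assumption, not from the notes).
predictors <- c("wt", "hp", "disp")
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other predictors.
vif_manual <- sapply(predictors, function(xj) {
  others <- setdiff(predictors, xj)
  r2 <- summary(lm(reformulate(others, response = xj), data = mtcars))$r.squared
  1 / (1 - r2)
})
print(vif_manual)
print(names(vif_manual)[vif_manual > 5])  # predictors flagged by the cut-off of 5

# Check the variance formula with sigma^2 replaced by its estimate s^2.
n   <- nrow(mtcars)
s2  <- summary(fit)$sigma^2
s2x <- sapply(mtcars[predictors], var)    # sample variances of the predictors
var_formula <- vif_manual * s2 / ((n - 1) * s2x)
var_lm      <- diag(vcov(fit))[predictors]
print(cbind(var_formula, var_lm))
```

The `car` package offers `vif()` as a ready-made alternative for fitted `lm` objects.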
What do you do when collinearity exists?
+- Do nothing but be careful of interpretation.
+- Remove highly correlated variables (keep only one of them).