Introduction to Statistical Modeling with SAS/STAT Software

Residual Analysis

The model errors $\boldsymbol{\epsilon} = \mathbf{Y} - \mathbf{X}\boldsymbol{\beta}$ are unobservable. Yet important features of the statistical model are connected to them, such as the distribution of the data, the correlation among observations, and the constancy of variance. It is customary to diagnose and investigate features of the model errors through the fitted residuals $\hat{\boldsymbol{\epsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{H}\mathbf{Y} = \mathbf{M}\mathbf{Y}$. These residuals are projections of the data onto the orthogonal complement of the column space of $\mathbf{X}$ and are also referred to as the "raw" residuals, to contrast them with other forms of residuals that are transformations of $\hat{\boldsymbol{\epsilon}}$. For the classical linear model, the statistical properties of $\hat{\boldsymbol{\epsilon}}$ are affected by the features of that projection and can be summarized as follows:

$$
\begin{aligned}
\mathrm{E}[\hat{\boldsymbol{\epsilon}}] &= \mathbf{0} \\
\mathrm{Var}[\hat{\boldsymbol{\epsilon}}] &= \sigma^2 \mathbf{M} \\
\mathrm{rank}(\mathbf{M}) &= n - \mathrm{rank}(\mathbf{X})
\end{aligned}
$$

Furthermore, if $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$, then $\hat{\boldsymbol{\epsilon}} \sim N(\mathbf{0}, \sigma^2 \mathbf{M})$.

Because $\mathbf{M} = \mathbf{I} - \mathbf{H}$, and the "hat" matrix $\mathbf{H}$ satisfies $\mathbf{H} = \partial\hat{\mathbf{Y}}/\partial\mathbf{Y}$, the hat matrix is also the leverage matrix of the model. If $h_{ii}$ denotes the $i$th diagonal element of $\mathbf{H}$ (the leverage of observation $i$), then the leverages in a model with intercept are bounded: $1/n \le h_{ii} \le 1$. Consequently, the variance of a raw residual is less than that of an observation: $\mathrm{Var}[\hat{\epsilon}_i] = \sigma^2(1 - h_{ii}) < \sigma^2$. In applications where the variability of the data is estimated from fitted residuals, the estimate is invariably biased low. An example is the computation of an empirical semivariogram based on fitted (detrended) residuals.
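These projection properties are easy to verify numerically. The following is a minimal sketch in NumPy (not SAS code); the simulated design matrix and sample sizes are illustrative assumptions.

```python
import numpy as np

# Sketch: properties of the hat matrix H and the residual projection
# M = I - H for a small linear model with intercept (simulated design).
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # "hat"/leverage matrix
M = np.eye(n) - H                      # residual projection matrix
h = np.diag(H)                         # leverages h_ii

# In a model with intercept, 1/n <= h_ii <= 1.
assert np.all(h >= 1 / n - 1e-12) and np.all(h <= 1 + 1e-12)

# rank(M) = n - rank(X), so residuals live in an (n - p)-dimensional space.
assert np.linalg.matrix_rank(M) == n - np.linalg.matrix_rank(X)

# Var[eps_hat_i] = sigma^2 (1 - h_ii) < sigma^2: the diagonal of M is < 1.
assert np.all(np.diag(M) < 1)
```

Because $\mathbf{M}$ is idempotent with off-diagonal entries that are generally nonzero, the same computation also exhibits the heteroscedasticity and correlation of the raw residuals discussed next.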

More important, the diagonal entries of $\mathbf{H}$ are not necessarily identical, so the residuals are heteroscedastic. The "hat" matrix is also not a diagonal matrix, so the residuals are correlated. In summary, the only property that the fitted residuals $\hat{\boldsymbol{\epsilon}}$ share with the model errors is a zero mean. It is thus commonplace to use transformations of the fitted residuals for diagnostic purposes.

Raw and Studentized Residuals

A standardized residual is a raw residual that is divided by its standard deviation:

$$\hat{\epsilon}_i^{*} = \frac{Y_i - \hat{Y}_i}{\sqrt{\mathrm{Var}[Y_i - \hat{Y}_i]}} = \frac{\hat{\epsilon}_i}{\sqrt{\sigma^2 (1 - h_{ii})}}$$

Because $\sigma^2$ is unknown, residual standardization is usually not practical. A studentized residual is a raw residual that is divided by its estimated standard deviation. If the estimate of the standard deviation is based on the same data that were used in fitting the model, the residual is also called an internally studentized residual:

$$\hat{\epsilon}_{is} = \frac{Y_i - \hat{Y}_i}{\sqrt{\widehat{\mathrm{Var}}[Y_i - \hat{Y}_i]}} = \frac{\hat{\epsilon}_i}{\sqrt{\hat{\sigma}^2 (1 - h_{ii})}}$$
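As a numeric sketch (NumPy rather than SAS, with simulated data as an assumption), internally studentized residuals follow directly from the raw residuals, the leverages, and the usual mean-squared-error estimate $\hat{\sigma}^2$:

```python
import numpy as np

# Sketch: internally studentized residuals for an OLS fit (simulated data).
rng = np.random.default_rng(1)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
resid = Y - H @ Y                                            # raw residuals
sigma2_hat = resid @ resid / (n - np.linalg.matrix_rank(X))  # MSE estimate
r_internal = resid / np.sqrt(sigma2_hat * (1 - h))           # studentized
```

Note that the raw residuals sum to zero in a model with intercept, while the studentized versions put each residual on a comparable scale despite unequal leverages.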

If the estimate of the residual’s variance does not involve the $i$th observation, it is called an externally studentized residual. Suppose that $\hat{\sigma}^2_{-i}$ denotes the estimate of the residual variance obtained without the $i$th observation; then the externally studentized residual is

$$\hat{\epsilon}_{ir} = \frac{\hat{\epsilon}_i}{\sqrt{\hat{\sigma}^2_{-i} (1 - h_{ii})}}$$
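A brute-force sketch of the external version refits the model without each case to obtain $\hat{\sigma}^2_{-i}$ (again NumPy, not SAS; the helper name and simulated data are illustrative assumptions):

```python
import numpy as np

def external_student(X, Y):
    """Externally studentized residuals by refitting without each case."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    resid = Y - H @ Y
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_i = np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0]
        r_i = Y[keep] - X[keep] @ b_i
        s2_minus_i = r_i @ r_i / (n - 1 - p)   # variance estimate without case i
        out[i] = resid[i] / np.sqrt(s2_minus_i * (1 - h[i]))
    return out

rng = np.random.default_rng(3)
n, p = 14, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)
t = external_student(X, Y)
```

In the classical linear model the loop is unnecessary in practice, since $\hat{\sigma}^2_{-i}$ has a closed form in terms of $\hat{\sigma}^2$, $\hat{\epsilon}_i$, and $h_{ii}$; the refit version is shown only to make the definition concrete.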

Scaled Residuals

A scaled residual is simply a raw residual divided by a scalar quantity that is not an estimate of the variance of the residual. For example, residuals divided by the standard deviation of the response variable are scaled and referred to as Pearson or Pearson-type residuals:

$$\hat{\epsilon}_{ic} = \frac{Y_i - \hat{Y}_i}{\sqrt{\widehat{\mathrm{Var}}[Y_i]}}$$

In generalized linear models, where the variance of an observation is a function of the mean $\mu$ and possibly of an extra scale parameter, $\mathrm{Var}[Y] = a(\mu)\phi$, the Pearson residual is

$$\hat{\epsilon}_{iP} = \frac{Y_i - \hat{\mu}_i}{\sqrt{a(\hat{\mu}_i)}}$$

The residual is so named because the sum of the squared Pearson residuals equals the Pearson $X^2$ statistic:

$$X^2 = \sum_{i=1}^{n} \hat{\epsilon}_{iP}^{2}$$
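For instance, in a Poisson model the variance function is $a(\mu) = \mu$ with $\phi = 1$, and the identity above reduces to the familiar $\sum (y_i - \mu_i)^2/\mu_i$. A small sketch (NumPy, not SAS; the data and fitted means are assumed for illustration):

```python
import numpy as np

# Sketch: Pearson residuals for a Poisson model, where a(mu) = mu, phi = 1.
y  = np.array([3, 0, 2, 5, 1, 4], dtype=float)
mu = np.array([2.5, 1.0, 2.0, 4.0, 1.5, 3.0])   # assumed fitted means

pearson = (y - mu) / np.sqrt(mu)
X2 = np.sum(pearson**2)   # Pearson chi-square: sum (y - mu)^2 / mu
assert np.isclose(X2, np.sum((y - mu)**2 / mu))
```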

When the scale parameter $\phi$ participates in the scaling, the residual is also referred to as a Pearson-type residual:

$$\hat{\epsilon}_{iP} = \frac{Y_i - \hat{\mu}_i}{\sqrt{a(\hat{\mu}_i)\,\phi}}$$

Other Residuals

You might encounter other residuals in SAS/STAT software. A "leave-one-out" residual is the difference between the observed value and the predicted value obtained from fitting a model in which the observation in question did not participate. If $\hat{Y}_i$ is the predicted value of the $i$th observation and $\hat{Y}_{i,-i}$ is the predicted value when $Y_i$ is removed from the analysis, then the "leave-one-out" residual is

$$\hat{\epsilon}_{i,-i} = Y_i - \hat{Y}_{i,-i}$$

Since the sum of the squared "leave-one-out" residuals is the PRESS statistic (prediction sum of squares; Allen 1974), $\hat{\epsilon}_{i,-i}$ is also called the PRESS residual. The concept of the PRESS residual can be generalized to deletion residuals based on the removal of sets of observations. In the classical linear model, the PRESS residual for case deletion has a particularly simple form:

$$\hat{\epsilon}_{i,-i} = Y_i - \hat{Y}_{i,-i} = \frac{\hat{\epsilon}_i}{1 - h_{ii}}$$

That is, the PRESS residual is simply a scaled form of the raw residual, where the scaling factor is a function of the leverage of the observation.
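This closed form can be checked against an explicit leave-one-out refit. A minimal sketch (NumPy rather than SAS, with simulated data as an assumption):

```python
import numpy as np

# Sketch: verify the closed-form PRESS residual e_i / (1 - h_ii)
# against an explicit refit without each observation.
rng = np.random.default_rng(2)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = Y - H @ Y
press_closed = e / (1 - h)                     # scaled raw residual

press_refit = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b = np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0]
    press_refit[i] = Y[i] - X[i] @ b           # Y_i minus prediction without case i

assert np.allclose(press_closed, press_refit)
```

The agreement shows why PRESS can be computed from a single fit: no actual refitting is required.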

When data are correlated, $\mathrm{Var}[\mathbf{Y}] = \mathbf{V}$, you can scale the vector of residuals rather than scale each residual separately. This takes the covariances among the observations into account. This form of scaling is accomplished by forming the Cholesky root $\mathbf{C}'\mathbf{C} = \mathbf{V}$, where $\mathbf{C}'$ is a lower-triangular matrix. Then $\mathbf{C}'^{-1}\mathbf{Y}$ is a vector of uncorrelated variables with unit variance. The Cholesky residuals in the model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$ are

$$\hat{\boldsymbol{\epsilon}}_{C} = \mathbf{C}'^{-1}\left(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}\right)$$
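A sketch of this whitening step (NumPy, not SAS; the AR(1)-style covariance matrix is an illustrative assumption). Note that `np.linalg.cholesky` returns the lower-triangular factor, which plays the role of $\mathbf{C}'$ here:

```python
import numpy as np

# Sketch: Cholesky (whitened) residuals for correlated data.
rng = np.random.default_rng(4)
n, rho = 8, 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # Var[Y] = V

Cp = np.linalg.cholesky(V)              # lower-triangular C' with C' C = V
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
Y = X @ np.array([1.0, 0.3]) + Cp @ rng.normal(size=n)  # errors with cov V

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
chol_resid = np.linalg.solve(Cp, Y - X @ beta_hat)      # C'^{-1}(Y - X beta_hat)
```

Solving the triangular system instead of forming the explicit inverse is the standard numerically stable choice.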

In generalized linear models, the fit of a model can be measured by the scaled deviance statistic $D^{*}$. It measures the difference between the log likelihood under the model and the maximum log likelihood that is achievable. In models with a scale parameter $\phi$, the deviance is $D = \phi\, D^{*} = \sum_{i=1}^{n} d_i$. The deviance residuals are the signed square roots of the contributions to the deviance statistic:

$$\hat{\epsilon}_{id} = \mathrm{sign}\{y_i - \hat{\mu}_i\}\sqrt{d_i}$$
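As one concrete case, the Poisson deviance contributions are $d_i = 2\,[\,y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)\,]$, with the convention $y \log(y/\mu) = 0$ at $y = 0$. A minimal sketch (NumPy, not SAS; data and fitted means assumed for illustration):

```python
import numpy as np

# Sketch: deviance residuals for a Poisson model (phi = 1), using
# d_i = 2 * [ y_i*log(y_i/mu_i) - (y_i - mu_i) ],  y*log(y/mu) := 0 at y = 0.
y  = np.array([3, 0, 2, 5, 1], dtype=float)
mu = np.array([2.5, 0.8, 2.2, 4.0, 1.4])   # assumed fitted means

with np.errstate(divide="ignore", invalid="ignore"):
    ylog = np.where(y > 0, y * np.log(y / mu), 0.0)
d = 2.0 * (ylog - (y - mu))                # nonnegative contributions
dev_resid = np.sign(y - mu) * np.sqrt(d)   # signed square roots

# The squared deviance residuals reassemble the deviance D = sum d_i.
assert np.isclose(np.sum(dev_resid**2), np.sum(d))
```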
Last updated: December 09, 2022