The GENMOD Procedure

Assessment of Models Based on Aggregates of Residuals

Lin, Wei, and Ying (2002) present graphical and numerical methods for model assessment based on the cumulative sums of residuals over certain coordinates (such as covariates or linear predictors) or on related aggregates of residuals. The distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by simulation. Each observed residual pattern can then be compared, both graphically and numerically, with a number of realizations from the null distribution. Such comparisons enable you to assess objectively whether the observed residual pattern reflects anything beyond random fluctuation. These procedures are useful in determining appropriate functional forms of covariates and an appropriate link function. You use the ASSESS (or ASSESSMENT) statement to perform this kind of model checking with cumulative sums of residuals, moving sums of residuals, or loess smoothed residuals. See Example 51.8 and Example 51.9 for examples of model assessment.

Let the model for the mean be

$$ g(\mu_i) = \mathbf{x}_i'\boldsymbol{\beta} $$

where $\mu_i$ is the mean of the response $y_i$ and $\mathbf{x}_i$ is the vector of covariates for the $i$th observation. Denote the raw residual resulting from fitting the model as

$$ e_i = y_i - \hat{\mu}_i $$

and let $x_{ij}$ be the value of the $j$th covariate in the model for observation $i$. Then, to check the functional form of the $j$th covariate, consider the cumulative sum of residuals with respect to $x_{ij}$,

$$ W_j(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I(x_{ij} \le x)\, e_i $$

where $I(\cdot)$ is the indicator function. For any $x$, $W_j(x)$ is the sum of the residuals with values of $x_{ij}$ less than or equal to $x$, scaled by $1/\sqrt{n}$.
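A minimal Python sketch of the observed process $W_j(x)$, evaluated at each distinct observed value of the covariate, may make the definition concrete. The function name `cumulative_residual_process` and the toy data are illustrative assumptions, not part of PROC GENMOD, and the residuals are assumed to come from an already fitted model.

```python
import math


def cumulative_residual_process(xj, e):
    """Evaluate W_j(x) = n^{-1/2} * sum_i I(x_ij <= x) * e_i at each distinct x_ij.

    xj : values of the jth covariate, one per observation
    e  : raw residuals y_i - mu_hat_i from the fitted model
    Returns (grid, W) where grid is sorted and W[k] = W_j(grid[k]).
    """
    if len(xj) != len(e):
        raise ValueError("xj and e must have the same length")
    scale = 1.0 / math.sqrt(len(e))
    grid = sorted(set(xj))
    # Quadratic-time version for clarity; sorting once and accumulating is O(n log n).
    W = [scale * sum(ei for xi, ei in zip(xj, e) if xi <= x) for x in grid]
    return grid, W


grid, W = cumulative_residual_process([1, 2, 2, 3], [0.5, -1.0, 0.25, 0.25])
print(grid, W)  # [1, 2, 3] [0.25, -0.125, 0.0]
```

Because the toy residuals sum to zero, as they do for a canonical-link model with an intercept, the process returns to 0 at the largest covariate value.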

Denote the score, or gradient vector, by

$$ U(\boldsymbol{\beta}) = \sum_{i=1}^{n} h(\mathbf{x}_i'\boldsymbol{\beta})\, \mathbf{x}_i \left(y_i - \nu(\mathbf{x}_i'\boldsymbol{\beta})\right) $$

where $\nu(r) = g^{-1}(r)$ is the inverse link function, and

$$ h(r) = \frac{1}{g'(\nu(r))\, V(\nu(r))} $$
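The weight $h(r)$ depends only on the link and the variance function. The following Python sketch (hypothetical helper names `h` and `score`, stdlib only) shows that $h \equiv 1$ for a canonical link such as the logit link with binomial variance, whereas the log link with gamma variance $V(\mu) = \mu^2$ gives $h(r) = e^{-r}$. The score is computed as written above, without any dispersion scaling.

```python
import math


def h(r, g_prime, inv_link, variance):
    """h(r) = 1 / (g'(nu(r)) * V(nu(r))), where nu = g^{-1} is the inverse link."""
    mu = inv_link(r)
    return 1.0 / (g_prime(mu) * variance(mu))


def score(beta, X, y, g_prime, inv_link, variance):
    """U(beta) = sum_i h(x_i' beta) * x_i * (y_i - nu(x_i' beta))."""
    U = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = sum(a * b for a, b in zip(xi, beta))  # linear predictor x_i' beta
        w = h(r, g_prime, inv_link, variance) * (yi - inv_link(r))
        for k, xik in enumerate(xi):
            U[k] += w * xik
    return U


# Logit link with binomial variance (canonical, so h == 1).
logit = dict(
    g_prime=lambda mu: 1.0 / (mu * (1.0 - mu)),
    inv_link=lambda r: 1.0 / (1.0 + math.exp(-r)),
    variance=lambda mu: mu * (1.0 - mu),
)
# Log link with gamma variance (non-canonical, so h(r) = exp(-r)).
gamma_log = dict(
    g_prime=lambda mu: 1.0 / mu,
    inv_link=math.exp,
    variance=lambda mu: mu * mu,
)

print(h(0.7, **logit))                      # approximately 1.0
print(h(0.7, **gamma_log), math.exp(-0.7))  # both approximately 0.4966
```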

Let $\mathbf{J}(\boldsymbol{\beta})$ be the Fisher information matrix

$$ \mathbf{J}(\boldsymbol{\beta}) = -\frac{\partial U(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}'} $$
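For a canonical link, $g'(\mu)V(\mu) = 1$, so $h \equiv 1$ and differentiating the score gives $\mathbf{J}(\boldsymbol{\beta}) = \sum_i \nu'(\mathbf{x}_i'\boldsymbol{\beta})\,\mathbf{x}_i\mathbf{x}_i'$. The Python sketch below assumes a Poisson log-linear model ($\nu(r) = \nu'(r) = e^r$) and uses hypothetical function names; it computes this matrix in closed form and checks it against a central finite-difference approximation of $-\partial U/\partial\boldsymbol{\beta}'$.

```python
import math


def poisson_score(beta, X, y):
    """U(beta) = sum_i x_i (y_i - exp(x_i' beta)) for the Poisson log-linear model (h == 1)."""
    U = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        mu = math.exp(sum(a * b for a, b in zip(xi, beta)))
        for k in range(len(beta)):
            U[k] += xi[k] * (yi - mu)
    return U


def poisson_information(beta, X):
    """J(beta) = sum_i exp(x_i' beta) x_i x_i' (closed form for the canonical log link)."""
    p = len(beta)
    J = [[0.0] * p for _ in range(p)]
    for xi in X:
        mu = math.exp(sum(a * b for a, b in zip(xi, beta)))
        for r in range(p):
            for c in range(p):
                J[r][c] += mu * xi[r] * xi[c]
    return J


def numeric_information(beta, X, y, step=1e-6):
    """Central-difference approximation of J(beta) = -dU/dbeta'."""
    p = len(beta)
    J = [[0.0] * p for _ in range(p)]
    for c in range(p):
        up = list(beta)
        up[c] += step
        dn = list(beta)
        dn[c] -= step
        Uu, Ud = poisson_score(up, X, y), poisson_score(dn, X, y)
        for r in range(p):
            J[r][c] = -(Uu[r] - Ud[r]) / (2.0 * step)
    return J


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 0.0, 2.0, 4.0]
beta = [0.1, 0.3]
print(poisson_information(beta, X))
print(numeric_information(beta, X, y))  # agrees to several decimal places
```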

Define

$$ \hat{W}_j(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left[ I(x_{ij} \le x) + \boldsymbol{\eta}'(x;\hat{\boldsymbol{\beta}})\, \mathbf{J}^{-1}(\hat{\boldsymbol{\beta}})\, \mathbf{x}_i\, h(\mathbf{x}_i'\hat{\boldsymbol{\beta}}) \right] e_i Z_i $$

where

$$ \boldsymbol{\eta}(x;\boldsymbol{\beta}) = -\sum_{i=1}^{n} I(x_{ij} \le x)\, \frac{\partial \nu(\mathbf{x}_i'\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} $$

and $Z_i$, $i = 1, \ldots, n$, are independent $N(0, 1)$ random variables. Then, under the null hypothesis that the model for the mean is correct, the conditional distribution of $\hat{W}_j(x)$ given the data $(y_i, \mathbf{x}_i)$, $i = 1, \ldots, n$, is asymptotically the same as the unconditional distribution of $W_j(x)$ (Lin, Wei, and Ying 2002).

You can approximate realizations from the null hypothesis distribution of $W_j(x)$ by repeatedly generating normal samples $Z_i$, $i = 1, \ldots, n$, while holding $(y_i, \mathbf{x}_i)$, $i = 1, \ldots, n$, at their observed values, and computing $\hat{W}_j(x)$ for each sample.
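The resampling scheme can be sketched end to end in Python for a Poisson log-linear model, where $\nu(r) = e^r$, $h \equiv 1$, and $\partial\nu(\mathbf{x}_i'\boldsymbol{\beta})/\partial\boldsymbol{\beta} = \mu_i\mathbf{x}_i$. This is an illustration under those assumptions, not the PROC GENMOD implementation: the helper names are hypothetical, $\hat{\boldsymbol{\beta}}$ is obtained by plain Newton-Raphson, the first column of the design is assumed to be the intercept, and $\hat{W}_j$ is evaluated only at the distinct observed values of $x_{ij}$. Each call draws one fresh set of $Z_i$ and returns one realization.

```python
import math
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def linpred(xi, beta):
    return sum(a * b for a, b in zip(xi, beta))


def information(X, beta):
    """J(beta) = sum_i mu_i x_i x_i' for the Poisson log-linear model."""
    p = len(beta)
    J = [[0.0] * p for _ in range(p)]
    for xi in X:
        mu = math.exp(linpred(xi, beta))
        for r in range(p):
            for c in range(p):
                J[r][c] += mu * xi[r] * xi[c]
    return J


def fit_poisson(X, y, iters=50, tol=1e-10):
    """Newton-Raphson for beta_hat, started at (log(mean y), 0, ..., 0); assumes mean(y) > 0."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iters):
        U = [sum(xi[k] * (yi - math.exp(linpred(xi, beta))) for xi, yi in zip(X, y))
             for k in range(p)]
        step = solve(information(X, beta), U)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def w_hat_realization(X, y, j, beta, rng):
    """One realization of W_hat_j(x) on the grid of distinct x_ij values."""
    n, p = len(y), len(beta)
    mu = [math.exp(linpred(xi, beta)) for xi in X]
    e = [yi - mi for yi, mi in zip(y, mu)]          # raw residuals e_i
    J = information(X, beta)
    Z = [rng.gauss(0.0, 1.0) for _ in range(n)]     # independent N(0, 1) multipliers
    grid = sorted(set(xi[j] for xi in X))
    W = []
    for x in grid:
        ind = [1.0 if X[i][j] <= x else 0.0 for i in range(n)]
        # eta(x; beta) = -sum_i I(x_ij <= x) * mu_i * x_i
        eta = [-sum(ind[i] * mu[i] * X[i][k] for i in range(n)) for k in range(p)]
        a = solve(J, eta)  # J^{-1} eta; J is symmetric, so eta' J^{-1} x_i = a' x_i
        W.append(sum((ind[i] + linpred(X[i], a)) * e[i] * Z[i] for i in range(n))
                 / math.sqrt(n))
    return grid, W


X = [[1.0, float(t)] for t in range(8)]  # intercept plus one covariate
y = [1, 0, 2, 1, 3, 2, 4, 6]
beta_hat = fit_poisson(X, y)
rng = random.Random(1)
grid, W = w_hat_realization(X, y, j=1, beta=beta_hat, rng=rng)
print(beta_hat)
print(grid, W)
```

Calling `w_hat_realization` repeatedly with the same `rng` yields independent realizations. With an intercept in the model, the bracketed term vanishes at the largest $x$, so every simulated path ends at 0.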

You can assess the functional form of covariate $j$ by plotting a few realizations of $\hat{W}_j(x)$ on the same plot as the observed $W_j(x)$ and visually comparing them to see how typical the observed $W_j(x)$ is of the null distribution samples.

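You can supplement the graphical inspection method with a Kolmogorov-type supremum test. Let $s_j$ be the observed value of $S_j = \sup_x |W_j(x)|$. The p-value $\Pr[S_j \ge s_j]$ is approximated by $\Pr[\hat{S}_j \ge s_j]$, where $\hat{S}_j = \sup_x |\hat{W}_j(x)|$. The probability $\Pr[\hat{S}_j \ge s_j]$ is estimated by generating realizations of $\hat{W}_j(\cdot)$ (1,000 is the default number of realizations).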
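Given the observed path and a collection of simulated paths of $\hat{W}_j$ evaluated on the same grid of $x$ values, the Monte Carlo p-value is the fraction of simulated suprema that reach the observed one. In this minimal Python sketch (hypothetical function name), the supremum is taken over that finite grid, which approximates $\sup_x$.

```python
def sup_test_pvalue(observed, realizations):
    """Approximate Pr[S_hat_j >= s_j] by the fraction of simulated paths whose
    maximum |W_hat_j(x)| over the grid is at least the observed maximum s_j."""
    s_obs = max(abs(w) for w in observed)
    exceed = sum(1 for path in realizations if max(abs(w) for w in path) >= s_obs)
    return exceed / len(realizations)


observed = [0.1, -0.5, 0.2]
realizations = [[0.1, 0.2, 0.0], [0.6, 0.0, -0.1], [-0.5, 0.1, 0.3], [0.3, -0.4, 0.2]]
print(sup_test_pvalue(observed, realizations))  # 0.5
```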

You can check the link function instead of the $j$th covariate by using values of the linear predictor $\mathbf{x}_i'\hat{\boldsymbol{\beta}}$ in place of values of the $j$th covariate $x_{ij}$. The graphical and numerical methods described previously are then sensitive to inadequacies in the link function.

An alternative aggregate of residuals is the moving sum statistic

$$ W_j(x, b) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I(x - b \le x_{ij} \le x)\, e_i $$

If you specify the keyword WINDOW(b), then the moving sum statistic with window size $b$ is used instead of the cumulative sum of residuals, with $I(x - b \le x_{ij} \le x)$ replacing $I(x_{ij} \le x)$ in the earlier equation.
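The moving sum differs from the cumulative sum only in the indicator, which now selects residuals whose covariate value falls in the window $[x - b, x]$. A minimal Python sketch, with a hypothetical function name and toy residuals assumed to come from a fitted model:

```python
import math


def moving_sum_process(xj, e, b):
    """Evaluate W_j(x, b) = n^{-1/2} * sum_i I(x - b <= x_ij <= x) * e_i at each distinct x_ij."""
    if len(xj) != len(e):
        raise ValueError("xj and e must have the same length")
    scale = 1.0 / math.sqrt(len(e))
    grid = sorted(set(xj))
    W = [scale * sum(ei for xi, ei in zip(xj, e) if x - b <= xi <= x) for x in grid]
    return grid, W


grid, W = moving_sum_process([1, 2, 2, 3], [0.5, -1.0, 0.25, 0.25], b=1)
print(grid, W)  # [1, 2, 3] [0.25, -0.125, -0.25]
```

Unlike the cumulative sum, the moving sum does not return to 0 at the largest covariate value, because residuals that fall outside the window are dropped.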

If you specify the keyword LOESS(f), loess smoothed residuals are used in the preceding formulas, where $f$ is the fraction of the data to be used at a given point. If $f$ is not specified, $f = 1/3$ is used. For data $(Y_i, X_i)$, $i = 1, \ldots, n$, define $r$ as the nearest integer to $nf$ and $h$ as the $r$th smallest among $|X_i - x|$, $i = 1, \ldots, n$. Let

$$ K_i(x) = K\left(\frac{X_i - x}{h}\right) $$

where

$$ K(t) = \frac{70}{81}\left(1 - |t|^3\right)^3 I(-1 \le t \le 1) $$
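$K$ is the tricube kernel. Because $\int_{-1}^{1}(1 - |t|^3)^3\,dt = 81/70$, the constant $70/81$ makes $K$ integrate to 1. The Python sketch below (hypothetical helper names) evaluates $K$ and confirms the normalization with a midpoint-rule integral.

```python
def tricube(t):
    """K(t) = (70/81) * (1 - |t|^3)^3 for -1 <= t <= 1, and 0 otherwise."""
    if -1.0 <= t <= 1.0:
        return (70.0 / 81.0) * (1.0 - abs(t) ** 3) ** 3
    return 0.0


def midpoint_integral(f, a, b, m=20000):
    """Midpoint-rule approximation of the integral of f over [a, b] with m subintervals."""
    width = (b - a) / m
    return width * sum(f(a + (k + 0.5) * width) for k in range(m))


print(tricube(0.0), tricube(1.0))             # K(0) = 70/81, K(1) = 0
print(midpoint_integral(tricube, -1.0, 1.0))  # approximately 1.0
```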

Define

$$ w_i(x) = K_i(x)\left[S_2(x) - (X_i - x)\, S_1(x)\right] $$

where

$$ S_1(x) = \sum_{i=1}^{n} K_i(x)\,(X_i - x) $$

$$ S_2(x) = \sum_{i=1}^{n} K_i(x)\,(X_i - x)^2 $$

Then the loess estimate of Y at x is defined by

$$ \hat{Y}(x) = \sum_{i=1}^{n} \frac{w_i(x)}{\sum_{k=1}^{n} w_k(x)}\, Y_i $$
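These weights define a local-linear fit, so $\sum_i w_i(x)(X_i - x) = S_1 S_2 - S_2 S_1 = 0$ and the estimator reproduces any straight line exactly. The following Python sketch (hypothetical function names) implements the bandwidth rule, the tricube kernel, and the weights above, and checks that property. Rounding ties of $nf$ upward is an assumption made here, since "nearest integer" leaves ties unspecified.

```python
import math


def tricube(t):
    """Kernel K(t) = (70/81) * (1 - |t|^3)^3 * I(-1 <= t <= 1)."""
    return (70.0 / 81.0) * (1.0 - abs(t) ** 3) ** 3 if abs(t) <= 1.0 else 0.0


def loess_weights(X, x, f=1.0 / 3.0):
    """Local-linear weights w_i(x) = K_i(x) * [S2(x) - (X_i - x) * S1(x)]."""
    n = len(X)
    r = max(1, int(math.floor(n * f + 0.5)))  # nearest integer to n*f (ties round up)
    h = sorted(abs(xi - x) for xi in X)[r - 1]  # rth smallest distance |X_i - x|
    if h == 0.0:
        raise ValueError("bandwidth is zero; too many tied X values at x")
    K = [tricube((xi - x) / h) for xi in X]
    S1 = sum(k * (xi - x) for k, xi in zip(K, X))
    S2 = sum(k * (xi - x) ** 2 for k, xi in zip(K, X))
    return [k * (S2 - (xi - x) * S1) for k, xi in zip(K, X)]


def loess_estimate(X, Y, x, f=1.0 / 3.0):
    """Y_hat(x) = sum_i [w_i(x) / sum_k w_k(x)] * Y_i."""
    w = loess_weights(X, x, f)
    return sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)


X = [float(t) for t in range(20)]
Y = [2.0 + 3.0 * xi for xi in X]
print(loess_estimate(X, Y, 5.5))  # reproduces the line: about 18.5
print(loess_estimate(X, Y, 0.0))  # also exact at the boundary: about 2.0
```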

Loess smoothed residuals for checking the functional form of the $j$th covariate are defined by replacing $Y_i$ with $e_i$ and $X_i$ with $x_{ij}$. To implement the graphical and numerical assessment methods, $I(x_{ij} \le x)$ is replaced with $w_i(x) / \sum_{k=1}^{n} w_k(x)$ in the formulas for $W_j(x)$ and $\hat{W}_j(x)$.
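With that substitution, the observed statistic becomes $n^{-1/2}\sum_i [w_i(x)/\sum_k w_k(x)]\,e_i$, that is, the loess smooth of the residuals at $x$ scaled by $1/\sqrt{n}$. The Python sketch below computes only this observed statistic on the grid of distinct $x_{ij}$ values; it repeats the local-linear weight computation so that it runs on its own, and its function names and the tie-rounding rule for $r$ are assumptions.

```python
import math


def tricube(t):
    """Kernel K(t) = (70/81) * (1 - |t|^3)^3 * I(-1 <= t <= 1)."""
    return (70.0 / 81.0) * (1.0 - abs(t) ** 3) ** 3 if abs(t) <= 1.0 else 0.0


def loess_weights(X, x, f):
    """Local-linear weights w_i(x) = K_i(x) * [S2(x) - (X_i - x) * S1(x)]."""
    r = max(1, int(math.floor(len(X) * f + 0.5)))  # nearest integer to n*f (ties round up)
    h = sorted(abs(xi - x) for xi in X)[r - 1]
    if h == 0.0:
        raise ValueError("bandwidth is zero; too many tied values at x")
    K = [tricube((xi - x) / h) for xi in X]
    S1 = sum(k * (xi - x) for k, xi in zip(K, X))
    S2 = sum(k * (xi - x) ** 2 for k, xi in zip(K, X))
    return [k * (S2 - (xi - x) * S1) for k, xi in zip(K, X)]


def loess_residual_process(xj, e, f=1.0 / 3.0):
    """n^{-1/2} * sum_i [w_i(x) / sum_k w_k(x)] * e_i at each distinct x_ij."""
    n = len(e)
    grid = sorted(set(xj))
    W = []
    for x in grid:
        w = loess_weights(xj, x, f)
        W.append(sum(wi * ei for wi, ei in zip(w, e)) / (sum(w) * math.sqrt(n)))
    return grid, W


xj = [float(t) for t in range(20)]
e = [0.1 * (x - 9.5) for x in xj]  # residuals with a linear trend in x_ij
grid, W = loess_residual_process(xj, e)
print(W[0], W[-1])
```

A systematic trend such as the linear one in this toy example is carried straight through the smooth, which is the kind of pattern the graphical and numerical checks are meant to flag.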

You can perform the model checking described earlier for marginal models for dependent responses fit by generalized estimating equations (GEEs). Let $y_{ik}$ denote the $k$th measurement on the $i$th cluster, $i = 1, \ldots, K$, $k = 1, \ldots, n_i$, and let $\mathbf{x}_{ik}$ denote the corresponding vector of covariates. The marginal mean of the response, $\mu_{ik} = E(y_{ik})$, is assumed to depend on the covariate vector by