The GENMOD Procedure

Assessment of Models Based on Aggregates of Residuals

Lin, Wei, and Ying (2002) present graphical and numerical methods for model assessment based on cumulative sums of residuals over certain coordinates (such as covariates or linear predictors) or on related aggregates of residuals. The distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by simulation. Each observed residual pattern can then be compared, both graphically and numerically, with a number of realizations from the null distribution. Such comparisons enable you to assess objectively whether the observed residual pattern reflects anything beyond random fluctuation. These procedures are useful for determining appropriate functional forms of covariates and an appropriate link function. You use the ASSESS (or ASSESSMENT) statement to perform this kind of model checking with cumulative sums of residuals, moving sums of residuals, or loess-smoothed residuals. See Example 51.8 and Example 51.9 for examples of model assessment.

Let the model for the mean be

$$ g(\mu_i) = \mathbf{x}_i'\boldsymbol{\beta} $$

where $\mu_i$ is the mean of the response $y_i$ and $\mathbf{x}_i$ is the vector of covariates for the $i$th observation. Denote the raw residual resulting from fitting the model as

$$ e_i = y_i - \hat{\mu}_i $$

and let $x_{ij}$ be the value of the $j$th covariate in the model for observation $i$. Then, to check the functional form of the $j$th covariate, consider the cumulative sum of residuals with respect to $x_{ij}$,

$$ W_j(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I(x_{ij} \le x)\, e_i $$

where $I(\cdot)$ is the indicator function. For any $x$, $W_j(x)$ is the sum of the residuals $e_i$ over the observations with $x_{ij} \le x$, scaled by $1/\sqrt{n}$.
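A minimal Python sketch of the observed process, assuming the raw residuals $e_i$ have already been computed from a fitted model. Because $W_j(x)$ is a step function that changes only at observed values of $x_{ij}$, the sketch evaluates it at those values only. The function name is hypothetical.

```python
import math

def cumulative_residual_process(xj, resid):
    """Evaluate W_j(x) = n^{-1/2} * sum_i I(x_ij <= x) * e_i at each distinct x_ij.

    Returns a list of (x, W_j(x)) pairs in increasing order of x.
    """
    n = len(xj)
    pairs = sorted(zip(xj, resid), key=lambda p: p[0])
    path, running = [], 0.0
    for idx, (x, e) in enumerate(pairs):
        running += e
        # Record the value only after the last observation tied at this x.
        if idx == n - 1 or pairs[idx + 1][0] != x:
            path.append((x, running / math.sqrt(n)))
    return path
```

For a maximum likelihood fit with a canonical link and an intercept, the residuals sum to zero, so the last value of the observed path is zero.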

Denote the score, or gradient vector, by

$$ U(\boldsymbol{\beta}) = \sum_{i=1}^{n} h(\mathbf{x}_i'\boldsymbol{\beta})\, \mathbf{x}_i \left( y_i - \nu(\mathbf{x}_i'\boldsymbol{\beta}) \right) $$

where $\nu(r) = g^{-1}(r)$ is the inverse link function, $V(\cdot)$ is the variance function, and

$$ h(r) = \frac{1}{g'(\nu(r))\, V(\nu(r))} $$
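To make $h$ concrete, the following sketch evaluates $h(r)$ and the score $U(\boldsymbol{\beta})$ for a user-supplied inverse link, link derivative, and variance function. The helper names and the two Poisson link/variance triples are illustrative choices, not PROC GENMOD internals. As in the formula above, no dispersion parameter appears.

```python
import math

def h_factor(r, inv_link, link_deriv, variance):
    """h(r) = 1 / (g'(nu(r)) * V(nu(r))), where nu = g^{-1}."""
    mu = inv_link(r)
    return 1.0 / (link_deriv(mu) * variance(mu))

def score(beta, X, y, inv_link, link_deriv, variance):
    """U(beta) = sum_i h(x_i'beta) * x_i * (y_i - nu(x_i'beta))."""
    U = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = sum(b * v for b, v in zip(beta, xi))   # linear predictor x_i'beta
        w = h_factor(r, inv_link, link_deriv, variance) * (yi - inv_link(r))
        for k, v in enumerate(xi):
            U[k] += w * v
    return U

# Poisson variance V(mu) = mu, paired with two links (illustrative choices).
POISSON_LOG = (math.exp, lambda mu: 1.0 / mu, lambda mu: mu)        # canonical
POISSON_IDENTITY = (lambda r: r, lambda mu: 1.0, lambda mu: mu)    # non-canonical
```

For a canonical link such as the Poisson log link, $g'(\mu)V(\mu) = 1$, so $h \equiv 1$ and the score reduces to $\sum_i \mathbf{x}_i (y_i - \mu_i)$. The identity link with Poisson variance gives $h(r) = 1/r$ instead.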

Let $\mathbf{J}$ be the Fisher information matrix

$$ \mathbf{J}(\boldsymbol{\beta}) = -\frac{\partial U(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}'} $$
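As a check on this definition, the sketch below approximates $-\partial U / \partial \boldsymbol{\beta}'$ by central finite differences and compares the result with the closed form $\sum_i \mu_i \mathbf{x}_i \mathbf{x}_i'$. That closed form holds for the Poisson log-link model assumed here, because for this canonical link $h \equiv 1$ and $\partial \mu_i / \partial \boldsymbol{\beta} = \mu_i \mathbf{x}_i$. The step size and function names are illustrative.

```python
import math

def poisson_log_score(beta, X, y):
    """Score for a Poisson log-link model: h(r) = 1 and nu(r) = exp(r)."""
    U = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        for k, v in enumerate(xi):
            U[k] += v * (yi - mu)
    return U

def info_numeric(beta, X, y, eps=1e-6):
    """J(beta) = -dU/dbeta' by central differences; column k is -dU/dbeta_k."""
    p = len(beta)
    J = [[0.0] * p for _ in range(p)]
    for k in range(p):
        up, dn = list(beta), list(beta)
        up[k] += eps
        dn[k] -= eps
        Uu, Ud = poisson_log_score(up, X, y), poisson_log_score(dn, X, y)
        for r in range(p):
            J[r][k] = -(Uu[r] - Ud[r]) / (2 * eps)
    return J

def info_closed_form(beta, X):
    """For the canonical log link, J(beta) = sum_i mu_i x_i x_i'."""
    p = len(beta)
    J = [[0.0] * p for _ in range(p)]
    for xi in X:
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        for r in range(p):
            for c in range(p):
                J[r][c] += mu * xi[r] * xi[c]
    return J
```

For a canonical link this matrix does not depend on $y$, which is why the closed form takes only `beta` and `X`.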

Define

$$ \hat{W}_j(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left[ I(x_{ij} \le x) + \boldsymbol{\eta}'(x; \hat{\boldsymbol{\beta}})\, \mathbf{J}^{-1}(\hat{\boldsymbol{\beta}})\, \mathbf{x}_i\, h(\mathbf{x}_i'\hat{\boldsymbol{\beta}}) \right] e_i Z_i $$

where

$$ \boldsymbol{\eta}(x; \boldsymbol{\beta}) = -\sum_{i=1}^{n} I(x_{ij} \le x)\, \frac{\partial \nu(\mathbf{x}_i'\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} $$

and the $Z_i$ are independent $N(0,1)$ random variables. Then, under the null hypothesis $H_0$ that the model for the mean is correct, the conditional distribution of $\hat{W}_j(x)$ given $(y_i, \mathbf{x}_i),\ i = 1, \ldots, n$, is asymptotically (as $n \to \infty$) the same as the unconditional distribution of $W_j(x)$ (Lin, Wei, and Ying 2002).

You can approximate realizations from the null distribution of $W_j(x)$ by repeatedly generating normal samples $Z_i,\ i = 1, \ldots, n$, while holding $(y_i, \mathbf{x}_i),\ i = 1, \ldots, n$, at their observed values, and computing $\hat{W}_j(x)$ for each sample.
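The following sketch generates one realization of $\hat{W}_j(x)$. It assumes a Poisson model with log link, so that $h \equiv 1$, $\partial \nu(\mathbf{x}_i'\boldsymbol{\beta}) / \partial \boldsymbol{\beta} = \mu_i \mathbf{x}_i$, and $\mathbf{J} = \sum_i \mu_i \mathbf{x}_i \mathbf{x}_i'$; each row of `X` starts with 1 for the intercept. It is not the PROC GENMOD implementation, and `solve` and `simulate_w_hat` are hypothetical helpers. Calling it repeatedly with the same data and fresh $Z_i$ draws yields samples from the approximate null distribution.

```python
import math
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def simulate_w_hat(X, y, beta_hat, j, rng):
    """One realization of W-hat_j(x) for an assumed Poisson log-link fit.

    Returns the sorted distinct values of covariate j and W-hat_j at each.
    """
    n, p = len(X), len(beta_hat)
    mu = [math.exp(sum(b * v for b, v in zip(beta_hat, xi))) for xi in X]
    e = [yi - mi for yi, mi in zip(y, mu)]                  # raw residuals e_i
    J = [[sum(mu[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         for r in range(p)]                                 # sum_i mu_i x_i x_i'
    a = [solve(J, X[i]) for i in range(n)]                  # J^{-1} x_i h, h = 1
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]             # Z_i ~ N(0, 1)
    grid = sorted(set(xi[j] for xi in X))
    values = []
    for x in grid:
        eta = [0.0] * p                                     # -sum I(x_ij <= x) mu_i x_i
        for i in range(n):
            if X[i][j] <= x:
                for k in range(p):
                    eta[k] -= mu[i] * X[i][k]
        total = 0.0
        for i in range(n):
            ind = 1.0 if X[i][j] <= x else 0.0
            corr = sum(eta[k] * a[i][k] for k in range(p))  # eta' J^{-1} x_i h
            total += (ind + corr) * e[i] * z[i]
        values.append(total / math.sqrt(n))
    return grid, values
```

With an intercept in the model, the correction term $\boldsymbol{\eta}'\mathbf{J}^{-1}\mathbf{x}_i$ equals $-1$ at the largest grid point, so every simulated path in this sketch ends at zero, just as the observed path of a canonical-link fit does.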

You can assess the functional form of covariate $j$ by plotting a few realizations of $\hat{W}_j(x)$ on the same plot as the observed $W_j(x)$ and judging visually how typical the observed $W_j(x)$ is of the null distribution samples.

You can supplement the graphical inspection with a Kolmogorov-type supremum test. Let $s_j$ be the observed value of $S_j = \sup_x |W_j(x)|$. The p-value $\Pr[S_j \ge s_j]$ is approximated by $\Pr[\hat{S}_j \ge s_j]$, where $\hat{S}_j = \sup_x |\hat{W}_j(x)|$. In turn, $\Pr[\hat{S}_j \ge s_j]$ is estimated by generating realizations of $\hat{W}_j(\cdot)$ (1,000 realizations by default).
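Given the observed path and a set of simulated paths, the estimated p-value is the fraction of simulated suprema that reach the observed one. The sketch assumes all paths are evaluated on the same grid of observed covariate values; for the cumulative-sum processes that grid suffices, because they change only at those points. The function name is hypothetical.

```python
def sup_test(observed, realizations):
    """Kolmogorov-type supremum test from simulated paths.

    observed:     W_j(x) evaluated on a grid of x values.
    realizations: simulated W-hat_j(x) paths on the same grid.
    Returns (s_j, estimated p-value), where the p-value is the fraction of
    paths whose sup |W-hat_j(x)| is at least s_j.
    """
    s_obs = max(abs(v) for v in observed)
    hits = sum(1 for path in realizations if max(abs(v) for v in path) >= s_obs)
    return s_obs, hits / len(realizations)
```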

You can check the link function instead of the $j$th covariate by using values of the linear predictor $\mathbf{x}_i'\hat{\boldsymbol{\beta}}$ in place of values of the $j$th covariate $x_{ij}$. The graphical and numerical methods described previously are then sensitive to inadequacies in the link function.

An alternative aggregate of residuals is the moving sum statistic

$$ W_j(x, b) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I(x - b \le x_{ij} \le x)\, e_i $$

If you specify the keyword WINDOW(b), then the moving sum statistic with window size $b$ is used instead of the cumulative sum of residuals, with $I(x - b \le x_{ij} \le x)$ replacing $I(x_{ij} \le x)$ in the earlier equations.
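A direct (quadratic-time) sketch of the moving sum statistic, evaluated at the observed covariate values. The function name is hypothetical. Note that $W_j(x, b)$ can also change at the points $x_{ij} + b$, so this grid is a simplification rather than the full path.

```python
import math

def moving_sum_process(xj, resid, b):
    """W_j(x, b) = n^{-1/2} * sum_i I(x - b <= x_ij <= x) * e_i at each distinct x_ij."""
    n = len(xj)
    grid = sorted(set(xj))
    return [(x, sum(e for xi, e in zip(xj, resid) if x - b <= xi <= x) / math.sqrt(n))
            for x in grid]
```

When $b$ is at least the range of the $x_{ij}$, the window covers every observation at or below $x$, and the statistic reduces to the cumulative sum $W_j(x)$.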

If you specify the keyword LOESS(f), loess-smoothed residuals are used in the preceding formulas, where $f$ is the fraction of the data to be used at a given point. If $f$ is not specified, $f = 1/3$ is used. For data $(Y_i, X_i),\ i = 1, \ldots, n$, define $r$ as the integer nearest to $nf$ and $h$ as the $r$th smallest among $|X_i - x|,\ i = 1, \ldots, n$. Let

$$ K_i(x) = K\left( \frac{X_i - x}{h} \right) $$

where

$$ K(t) = \frac{70}{81} \left( 1 - |t|^3 \right)^3 I(-1 \le t \le 1) $$
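The bandwidth rule and the kernel can be sketched as follows. Python's built-in `round` rounds halves to even, so the sketch uses `floor(n*f + 0.5)`; how a tie at one half is resolved is an assumption here, as are the function names. The constant 70/81 normalizes the tricube kernel, since $\int_{-1}^{1} (1 - |t|^3)^3\,dt = 81/70$.

```python
import math

def loess_bandwidth(X, x, f=1/3):
    """h = r-th smallest |X_i - x|, with r the integer nearest to n*f.

    Halves are rounded up (an assumption). h can be 0 if many X_i equal x.
    """
    n = len(X)
    r = min(n, max(1, math.floor(n * f + 0.5)))
    return sorted(abs(xi - x) for xi in X)[r - 1]

def tricube(t):
    """K(t) = (70/81) * (1 - |t|^3)^3 for -1 <= t <= 1, and 0 otherwise."""
    if -1.0 <= t <= 1.0:
        return 70.0 / 81.0 * (1.0 - abs(t) ** 3) ** 3
    return 0.0

def kernel_weights(X, x, f=1/3):
    """K_i(x) = K((X_i - x) / h) for each observation."""
    h = loess_bandwidth(X, x, f)
    return [tricube((xi - x) / h) for xi in X]
```

Points at distance exactly $h$ get weight $K(\pm 1) = 0$, so only observations strictly inside the window contribute.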

Define

$$ w_i(x) = K_i(x) \left[ S_2(x) - (X_i - x)\, S_1(x) \right] $$

where

$$ S_1(x) = \sum_{i=1}^{n} K_i(x)\, (X_i - x) $$
$$ S_2(x) = \sum_{i=1}^{n} K_i(x)\, (X_i - x)^2 $$

Then the loess estimate of $Y$ at $x$ is defined by

$$ \hat{Y}(x) = \sum_{i=1}^{n} \frac{w_i(x)}{\sum_{l=1}^{n} w_l(x)}\, Y_i $$
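A sketch of the loess estimate at a single point, combining the bandwidth, kernel, and weights defined above (with the same round-half-up assumption for $r$; the function names are hypothetical). Because $\sum_i w_i(x)(X_i - x) = S_1 S_2 - S_2 S_1 = 0$, these local linear weights reproduce a straight line exactly, which gives a quick correctness check.

```python
import math

def tricube(t):
    """K(t) = (70/81) * (1 - |t|^3)^3 on [-1, 1], 0 elsewhere."""
    return 70.0 / 81.0 * (1.0 - abs(t) ** 3) ** 3 if abs(t) <= 1.0 else 0.0

def loess_at(X, Y, x, f=1/3):
    """Local linear loess estimate Y-hat(x) with tricube kernel weights."""
    n = len(X)
    r = min(n, max(1, math.floor(n * f + 0.5)))   # integer nearest to n*f
    h = sorted(abs(xi - x) for xi in X)[r - 1]    # r-th smallest distance
    K = [tricube((xi - x) / h) for xi in X]       # K_i(x)
    S1 = sum(k * (xi - x) for k, xi in zip(K, X))
    S2 = sum(k * (xi - x) ** 2 for k, xi in zip(K, X))
    w = [k * (S2 - (xi - x) * S1) for k, xi in zip(K, X)]
    total = sum(w)
    if total == 0:
        raise ValueError("degenerate window: increase f")
    return sum(wi * yi for wi, yi in zip(w, Y)) / total
```

For loess-smoothed residuals, pass the residuals $e_i$ as `Y` and the covariate values $x_{ij}$ as `X`.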

Loess-smoothed residuals for checking the functional form of the $j$th covariate are defined by replacing $Y_i$ with $e_i$ and $X_i$ with $x_{ij}$. To implement the graphical and numerical assessment methods, $I(x_{ij} \le x)$ is replaced with $w_i(x) / \sum_{l=1}^{n} w_l(x)$ in the formulas for $W_j(x)$ and $\hat{W}_j(x)$.

You can also perform the model checking described earlier for marginal models for dependent responses fit by generalized estimating equations (GEEs). Let $y_{ik}$ denote the $k$th measurement on the $i$th cluster, $i = 1, \ldots, K$, $k = 1, \ldots, n_i$, and let $\mathbf{x}_{ik}$ denote the corresponding vector of covariates. The marginal mean of the response, $\mu_{ik} = E(y_{ik})$, is assumed to depend on the covariate vector through

$$ g(\mu_{ik}) = \mathbf{x}_{ik}'\boldsymbol{\beta} $$

where g is the link function.

Define the vector of residuals for the $i$th cluster as

$$ \mathbf{e}_i = (e_{i1}, \ldots, e_{in_i})' = (y_{i1} - \hat{\mu}_{i1}, \ldots, y_{in_i} - \hat{\mu}_{in_i})' $$

To check the functional form of the $j$th covariate, you use the following extension of the $W_j(x)$ defined earlier:

$$ W_j(x) = \frac{1}{\sqrt{K}} \sum_{i=1}^{K} \sum_{k=1}^{n_i} I(x_{ikj} \le x)\, e_{ik} $$

where $x_{ikj}$ is the $j$th component of $\mathbf{x}_{ik}$.
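In code, the double sum flattens into a single sum over all measurements; only the normalizing constant changes from $\sqrt{n}$ to $\sqrt{K}$, the number of clusters. A sketch under that reading, with flat per-measurement inputs and a hypothetical function name:

```python
import math

def gee_cumulative_process(cluster_ids, xj, y, mu_hat):
    """W_j(x) for clustered data, evaluated at the distinct values of x_ikj.

    Inputs are flat lists with one entry per measurement (i, k); cluster_ids
    identifies the cluster i of each measurement and is used only to count K.
    """
    K = len(set(cluster_ids))                         # number of clusters
    resid = [yv - mv for yv, mv in zip(y, mu_hat)]    # e_ik = y_ik - mu_hat_ik
    grid = sorted(set(xj))
    return [(x, sum(e for xv, e in zip(xj, resid) if xv <= x) / math.sqrt(K))
            for x in grid]
```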

The null distribution of $W_j(x)$ can be approximated by the conditional distribution of

$$ \hat{W}_j(x) = \frac{1}{\sqrt{K}} \sum_{i=1}^{K} \left\{ \sum_{k=1}^{n_i} I(x_{ikj} \le x)\, e_{ik} + \boldsymbol{\eta}'(x, \hat{\boldsymbol{\beta}})\, \mathbf{I}_0^{-1}\, \hat{\mathbf{D}}_i'\, \hat{\mathbf{V}}_i^{-1}\, \mathbf{e}_i \right\} Z_i $$

where $\hat{\mathbf{D}}_i$ and $\hat{\mathbf{V}}_i$ are defined as in the section Generalized Estimating Equations, with the unknown parameters replaced by their estimated values,

$$ \boldsymbol{\eta}(x, \boldsymbol{\beta}) = -\sum_{i=1}^{K} \sum_{k=1}^{n_i} I(x_{ikj} \le x)\, \frac{\partial \mu_{ik}}{\partial \boldsymbol{\beta}} $$
$$ \mathbf{I}_0 = \sum_{i=1}^{K} \hat{\mathbf{D}}_i'\, \hat{\mathbf{V}}_i^{-1}\, \hat{\mathbf{D}}_i $$

and $Z_i,\ i = 1, \ldots, K$, are independent $N(0,1)$ random variables. To check the link function, you replace $x_{ikj}$ with the linear predictor $\mathbf{x}_{ik}'\hat{\boldsymbol{\beta}}$ in the preceding formulas.
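One realization of the GEE version can be sketched under strong simplifying assumptions: a Poisson log-link marginal model, an independence working correlation, and dispersion fixed at 1. Then $\hat{\mathbf{V}}_i = \mathrm{diag}(\hat{\mu}_{ik})$ and $\hat{\mathbf{D}}_i = \mathrm{diag}(\hat{\mu}_{ik})\mathbf{X}_i$, where $\mathbf{X}_i$ stacks the rows $\mathbf{x}_{ik}'$, so $\hat{\mathbf{D}}_i'\hat{\mathbf{V}}_i^{-1}\mathbf{e}_i = \mathbf{X}_i'\mathbf{e}_i$ and $\mathbf{I}_0 = \sum_i \mathbf{X}_i'\,\mathrm{diag}(\hat{\mu}_{ik})\,\mathbf{X}_i$. Other working correlations need the full matrices. The essential difference from the independent-data case is that one $Z_i$ is drawn per cluster rather than per measurement. The helper names are hypothetical, and each row of `X` starts with 1 for the intercept.

```python
import math
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def simulate_gee_w_hat(cluster_ids, X, y, beta_hat, j, rng):
    """One realization of W-hat_j(x) under the simplifying assumptions above."""
    p = len(beta_hat)
    mu = [math.exp(sum(b * v for b, v in zip(beta_hat, xr))) for xr in X]
    e = [yr - m for yr, m in zip(y, mu)]
    I0 = [[sum(m * xr[r] * xr[col] for m, xr in zip(mu, X)) for col in range(p)]
          for r in range(p)]
    clusters = sorted(set(cluster_ids))
    score = {c: [0.0] * p for c in clusters}         # X_i' e_i per cluster
    for cid, xr, er in zip(cluster_ids, X, e):
        for k in range(p):
            score[cid][k] += xr[k] * er
    a = {c: solve(I0, score[c]) for c in clusters}   # I_0^{-1} D_i' V_i^{-1} e_i
    z = {c: rng.gauss(0.0, 1.0) for c in clusters}   # one Z_i per cluster
    grid = sorted(set(xr[j] for xr in X))
    values = []
    for x in grid:
        eta = [0.0] * p                              # -sum I(x_ikj <= x) mu_ik x_ik
        local = {c: 0.0 for c in clusters}           # sum_k I(x_ikj <= x) e_ik
        for cid, xr, m, er in zip(cluster_ids, X, mu, e):
            if xr[j] <= x:
                local[cid] += er
                for k in range(p):
                    eta[k] -= m * xr[k]
        total = sum((local[c] + sum(eta[k] * a[c][k] for k in range(p))) * z[c]
                    for c in clusters)
        values.append(total / math.sqrt(len(clusters)))
    return grid, values
```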

Last updated: December 09, 2022