The GENMOD Procedure

Case Deletion Diagnostic Statistics

For ordinary generalized linear models, regression diagnostic statistics developed by Williams (1987) can be requested in an output data set or in the OBSTATS table by specifying the DIAGNOSTICS | INFLUENCE option in the MODEL statement. These diagnostics measure the influence of an individual observation on model fit, and generalize the one-step diagnostics developed by Pregibon (1981) for the logistic regression model for binary data.

Preisser and Qaqish (1996) further generalized regression diagnostics to apply to models for correlated data fit by generalized estimating equations (GEEs), where the influence of entire clusters of correlated observations, or the influence of individual observations within a cluster, is measured. These diagnostic statistics can be requested in an output data set or in the OBSTATS table if a model for correlated data is specified with a REPEATED statement.

The next two sections use the following notation:

$\hat{\boldsymbol{\beta}}$

is the maximum likelihood estimate of the regression parameters $\boldsymbol{\beta}$, or, in the case of correlated data, the solution of the GEEs.

$\hat{\boldsymbol{\beta}}_{[i]}$

is the corresponding estimate evaluated with the ith observation deleted, or, in the case of correlated data, with the ith cluster deleted.

$p$

is the dimension of the regression parameter vector $\boldsymbol{\beta}$.

$r_{pi}$

is the standardized Pearson residual $\dfrac{y_i - \mu_i}{\sqrt{v_i (1 - h_i)}}$, where $v_i$ is the variance of the ith response and $h_i$ is the leverage defined in the section H | LEVERAGE.

$v_i$

is the variance of response i, $\mathrm{Var}(Y_i) = \phi V(\mu_i)$, where $V(\mu)$ is the variance function and $\phi$ is the dispersion parameter.

$w_i$

is the prior weight of the ith observation specified with the WEIGHT statement. If there is no WEIGHT statement, $w_i = 1$ for all i.

All unknown quantities are replaced by their estimated values in the following two sections.

Diagnostics for Ordinary Generalized Linear Models

The following statistics are available for generalized linear models.

DFBETA

The DFBETA statistic for measuring the influence of the ith observation is defined as the one-step approximation to the difference in the MLE of the regression parameter vector and the MLE of the regression parameter vector without the ith observation. This one-step approximation assumes a Fisher scoring step, and is given by

$$\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{[i]} \approx \mathrm{DFBETA}_i = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \mathbf{X}_i' \mathbf{W}_i^{1/2} (1 - h_i)^{-1/2} r_{pi}$$

where $\mathbf{X}_i$ is the ith row of the design matrix $\mathbf{X}$, $\mathbf{W}_i$ is the ith diagonal element of the weight matrix $\mathbf{W}$, and $h_i$ is the leverage; $\mathbf{W}$ and $h_i$ are defined in the section H | LEVERAGE.
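For the normal distribution with the identity link, a Fisher scoring step solves the least-squares problem exactly, so DFBETA should reproduce the change in the estimate obtained by actually deleting the observation. The following sketch checks this for a two-parameter model in plain Python. The data and the helper names (`fit_line`, `xtx_inverse`, `dfbeta`) are made up for illustration, and the code only mirrors the formula above; it is not the procedure's implementation.

```python
# Illustrative only: data and function names are assumptions of this sketch.

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1

def xtx_inverse(x):
    """Inverse of X'X for the design with rows (1, x_i)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(a * a for a in x)
    det = n * sxx - sx * sx
    return [[sxx / det, -sx / det], [-sx / det, n / det]]

def dfbeta(x, y, i, phi=1.0):
    """One-step DFBETA_i for normal errors, identity link, unit prior weights."""
    b0, b1 = fit_line(x, y)
    inv = xtx_inverse(x)
    xi = (1.0, x[i])
    # With W = I/phi: (X'WX)^{-1} = phi (X'X)^{-1} and W_i^{1/2} = phi^{-1/2}.
    c = [phi * (inv[r][0] * xi[0] + inv[r][1] * xi[1]) for r in range(2)]
    h = (xi[0] * c[0] + xi[1] * c[1]) / phi                      # leverage h_i
    r_p = (y[i] - (b0 + b1 * x[i])) / (phi * (1.0 - h)) ** 0.5   # standardized Pearson residual
    scale = phi ** -0.5 * (1.0 - h) ** -0.5 * r_p
    return [c[0] * scale, c[1] * scale]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 6.0]
i = 4
full = fit_line(x, y)
reduced = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
exact = [full[0] - reduced[0], full[1] - reduced[1]]   # beta-hat minus beta-hat[i] by refitting
approx = dfbeta(x, y, i)                               # one-step formula
```

In this special case `approx` and `exact` agree up to rounding, and the result does not depend on the value passed for `phi`. For other distributions or links the one-step value is only an approximation to the refitted difference.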

DFBETAS

The standardized DFBETA statistic for assessing the influence of the ith observation on the jth regression parameter is defined as the DFBETA statistic for the jth parameter divided by its estimated standard deviation, where the standard deviation is estimated from all the data.

$$\mathrm{DFBETAS}_{ij} = \mathrm{DFBETA}_{ij} / \hat{\sigma}(\beta_j)$$

DOBS | COOKD | COOKSD

In normal linear regression, the influence of observation i can be measured by Cook’s distance (Cook and Weisberg 1982). A measure of influence of observation i for generalized linear models that is equivalent to Cook’s distance for normal linear regression is given by

$$\mathrm{DOBS}_i = p^{-1} h_i (1 - h_i)^{-1} r_{pi}^2$$

where $h_i$ is the leverage defined in the section H | LEVERAGE. This measure is the one-step approximation to $2 p^{-1} [L(\hat{\boldsymbol{\beta}}) - L(\hat{\boldsymbol{\beta}}_{[i]})]$, where $L(\boldsymbol{\beta})$ is the log likelihood evaluated at $\boldsymbol{\beta}$.
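The equivalence with Cook's distance in normal linear regression can be checked numerically. The sketch below evaluates the DOBS formula from a leverage and a raw residual and compares it with the classical expression $e_i^2 h_i / (p s^2 (1 - h_i)^2)$, taking the dispersion equal to the error variance estimate $s^2$. The function names and the numeric inputs are hypothetical.

```python
# Illustrative only: function names and inputs are assumptions of this sketch.

def dobs(h, r_p, p):
    """DOBS_i = h_i * r_pi^2 / (p * (1 - h_i))."""
    return h * r_p ** 2 / (p * (1.0 - h))

def classical_cooks_distance(e, h, p, s2):
    """Cook's distance for normal linear regression with raw residual e."""
    return e ** 2 * h / (p * s2 * (1.0 - h) ** 2)

h, e, p, s2 = 0.6, 0.5, 2, 0.2
# Standardized Pearson residual with v_i = phi = s2 (normal model, unit weights)
r_p = e / (s2 * (1.0 - h)) ** 0.5
d_glm = dobs(h, r_p, p)
d_lm = classical_cooks_distance(e, h, p, s2)
```

Both expressions give the same value because $r_{pi}^2 = e_i^2 / (s^2 (1 - h_i))$ in this setting.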

H | LEVERAGE

The Fisher scoring, or expected, weight for observation i is $w_{ei} = \dfrac{w_i}{\phi V(\mu_i) (g'(\mu_i))^2}$, where $g$ is the link function. Let $\mathbf{W}$ be the diagonal matrix with $w_{ei}$ as the ith diagonal element. The leverage $h_i$ of the ith observation is defined as the ith diagonal element of the hat matrix

$$\mathbf{H} = \mathbf{W}^{1/2} \mathbf{X} (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \mathbf{X}' \mathbf{W}^{1/2}$$
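As a rough illustration of these definitions, the following sketch computes the expected weights and the leverages for a design with an intercept and one covariate. The defaults assume a Poisson model with log link, so $V(\mu) = \mu$ and $g'(\mu) = 1/\mu$. The function name `leverages`, its parameters, and the fitted means in the usage lines are hypothetical; in practice the means come from the fitted model.

```python
# Illustrative only: names and data are assumptions of this sketch.

def leverages(x, mu, w=None, phi=1.0,
              var_fn=lambda m: m, dlink=lambda m: 1.0 / m):
    """Leverages h_i for a design with rows (1, x_i).

    Defaults correspond to Poisson with log link: V(mu) = mu, g'(mu) = 1/mu.
    """
    n = len(x)
    w = w if w is not None else [1.0] * n
    # Expected weights w_ei = w_i / (phi * V(mu_i) * g'(mu_i)^2)
    we = [wi / (phi * var_fn(m) * dlink(m) ** 2) for wi, m in zip(w, mu)]
    # Elements of the 2x2 matrix X'WX
    s0 = sum(we)
    s1 = sum(a * b for a, b in zip(we, x))
    s2 = sum(a * b * b for a, b in zip(we, x))
    det = s0 * s2 - s1 * s1
    # h_i = w_ei * x_i' (X'WX)^{-1} x_i, the ith diagonal element of H
    return [a * (s2 - 2.0 * s1 * b + s0 * b * b) / det for a, b in zip(we, x)]

x = [0.0, 1.0, 2.0, 3.0]
mu = [1.2, 2.0, 3.5, 6.1]   # hypothetical fitted means
h = leverages(x, mu)
```

Because $\mathbf{H}$ is a projection matrix of rank $p$, the leverages sum to $p$ (here 2) and each lies between 0 and 1, which gives a quick sanity check on any implementation. The value of `phi` cancels in the result.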

Diagnostics for Models Fit by Generalized Estimating Equations (GEEs)

The diagnostic statistics in this section were developed by Preisser and Qaqish (1996). See the section Generalized Estimating Equations for further information and notation for generalized estimating equations (GEEs). The following additional notation is used in this section.

Partition the design matrix $\mathbf{X}$ and response vector $\mathbf{Y}$ by cluster; that is, let $\mathbf{X} = (\mathbf{X}_1', \ldots, \mathbf{X}_K')'$ and $\mathbf{Y} = (\mathbf{Y}_1', \ldots, \mathbf{Y}_K')'$, corresponding to the $K$ clusters.

Let $n_i$ be the number of responses for cluster i, and denote by $N = \sum_{i=1}^{K} n_i$ the total number of observations. Denote by $\mathbf{A}_i$ the $n_i \times n_i$ diagonal matrix with $V(\mu_{ij})$ as the jth diagonal element. If there is a WEIGHT statement, the jth diagonal element of $\mathbf{A}_i$ is $V(\mu_{ij}) / w_{ij}$, where $w_{ij}$ is the specified weight of the jth observation in the ith cluster. Let $\mathbf{B}$ be the $N \times N$ diagonal matrix with $g'(\mu_{ij})$ as diagonal elements, $i = 1, \ldots, K$, $j = 1, \ldots, n_i$. Let $\mathbf{B}_i$ be the $n_i \times n_i$ diagonal matrix corresponding to cluster i with $g'(\mu_{ij})$ as the jth diagonal element.

Let $\mathbf{W}$ be the $N \times N$ block diagonal weight matrix whose ith block, corresponding to the ith cluster, is the $n_i \times n_i$ matrix

$$\mathbf{W}_{ei} = \mathbf{B}_i^{-1} \mathbf{A}_i^{-1/2} \mathbf{R}_i^{-1}(\hat{\boldsymbol{\alpha}}) \mathbf{A}_i^{-1/2} \mathbf{B}_i^{-1}$$

where $\mathbf{R}_i$ is the working correlation matrix for cluster i.

Let

$$\mathbf{Q}_i = \mathbf{X}_i (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \mathbf{X}_i'$$

where $\mathbf{X}_i$ is the $n_i \times p$ design matrix corresponding to cluster i.

Define the adjusted residual vector as

$$\mathbf{E} = \mathbf{B} (\mathbf{Y} - \hat{\boldsymbol{\mu}})$$

and $\mathbf{E}_i = \mathbf{B}_i (\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)$, the estimated residual for the ith cluster.

Let the subscript $[i]$ denote estimates evaluated without the ith cluster, $[it]$ estimates evaluated using all the data except the tth observation of the ith cluster, and let $i[t]$ denote matrices corresponding to the ith cluster without the tth observation.

The following statistics are available for generalized estimating equation models.

CH | CLUSTERH | CLEVERAGE

The leverage of cluster i is contained in the matrix $\mathbf{H}_i = \mathbf{Q}_i \mathbf{W}_{ei}$, and is summarized by the trace of $\mathbf{H}_i$,

$$ch_i = \mathrm{tr}(\mathbf{H}_i)$$

The leverage $h_{it}$ of the tth observation in the ith cluster is the tth diagonal element of $\mathbf{H}_i$.
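The cluster leverage can be sketched for a small example. The code below assumes a two-column design, a normal model with identity link (so the matrices $\mathbf{A}_i$ and $\mathbf{B}_i$ are identity matrices), and an exchangeable working correlation with a made-up parameter value, which gives $\mathbf{W}_{ei} = \mathbf{R}_i^{-1}$. All function names and data are hypothetical, and the correlation parameter is treated as known rather than estimated.

```python
# Illustrative only: names, data, and the correlation value are assumptions.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def cluster_leverages(x_blocks, w_blocks):
    """Return ch_i = tr(Q_i W_ei) for each cluster; assumes p = 2 columns."""
    # X'WX = sum_i X_i' W_ei X_i because W is block diagonal
    xtwx = [[0.0, 0.0], [0.0, 0.0]]
    for xi, wi in zip(x_blocks, w_blocks):
        term = matmul(matmul(transpose(xi), wi), xi)
        xtwx = [[xtwx[r][c] + term[r][c] for c in range(2)] for r in range(2)]
    inv = inv2(xtwx)
    out = []
    for xi, wi in zip(x_blocks, w_blocks):
        q = matmul(matmul(xi, inv), transpose(xi))   # Q_i
        h = matmul(q, wi)                            # H_i = Q_i W_ei
        out.append(sum(h[t][t] for t in range(len(h))))
    return out

alpha = 0.3                                          # hypothetical working correlation
w_e = inv2([[1.0, alpha], [alpha, 1.0]])             # W_ei = R_i^{-1} for clusters of size 2
x_blocks = [[[1.0, 0.0], [1.0, 1.0]],
            [[1.0, 2.0], [1.0, 4.0]],
            [[1.0, 3.0], [1.0, 5.0]]]
ch = cluster_leverages(x_blocks, [w_e] * 3)
```

Summing $\mathrm{tr}(\mathbf{Q}_i \mathbf{W}_{ei})$ over all clusters gives $\mathrm{tr}\{(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \mathbf{X}'\mathbf{W}\mathbf{X}\} = p$, so the values in `ch` add up to 2 here.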

DFBETAC

The effect of deleting cluster i on the estimated parameter vector is given by the following one-step approximation for $\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{[i]}$:

$$\mathrm{DBETAC}_i = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \mathbf{X}_i' (\mathbf{W}_{ei}^{-1} - \mathbf{Q}_i)^{-1} \mathbf{E}_i$$

DFBETACS

The cluster deletion statistic DFBETAC can be standardized using the variances of $\hat{\boldsymbol{\beta}}$ based on the complete data. The standardized one-step approximation for the change in $\hat{\beta}_j$ due to deletion of cluster i is

$$\mathrm{DBETACS}_{ij} = \frac{\mathrm{DBETAC}_{ij}}{\hat{\phi} \, [(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}]_{jj}^{1/2}}$$

DFBETA

Partition the matrices $\mathbf{W}_{ei}$ and $\mathbf{V}_i$ as

$$\mathbf{W}_{ei} = \begin{bmatrix} W_{eit} & \mathbf{W}_{eit[t]} \\ \mathbf{W}_{ei[t]t} & \mathbf{W}_{ei[t]} \end{bmatrix}$$

$$\mathbf{V}_i = \mathbf{W}_{ei}^{-1} = \begin{bmatrix} \mathbf{V}_{it} & \mathbf{V}_{it[t]} \\ \mathbf{V}_{i[t]t} & \mathbf{V}_{i[t]} \end{bmatrix}$$

and let $\mathbf{E}_{it} = \mathbf{B}_{it} (\mathbf{Y}_{it} - \hat{\mu}_{it})$ and $\mathbf{E}_{i[t]} = \mathbf{B}_{i[t]} (\mathbf{Y}_{i[t]} - \hat{\mu}_{i[t]})$.

The effect of deleting the tth observation from the ith cluster is given by the following one-step approximation to $\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{[it]}$:

$$\mathrm{DBETAO}_{it} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \tilde{\mathbf{X}}_{it}' \, \frac{\tilde{E}_{it}}{W_{eit}^{-1} - \tilde{Q}_{it}}$$

where $\tilde{\mathbf{X}}_{it} = \mathbf{X}_{it} - \mathbf{V}_{it[t]} \mathbf{V}_{i[t]}^{-1} \mathbf{X}_{i[t]}$, $\tilde{Q}_{it} = \tilde{\mathbf{X}}_{it} (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \tilde{\mathbf{X}}_{it}'$, and $\tilde{E}_{it} = \mathbf{E}_{it} - \mathbf{V}_{it[t]} \mathbf{V}_{i[t]}^{-1} \mathbf{E}_{i[t]}$. Note that $W_{eit}$, $\tilde{Q}_{it}$, and $\tilde{E}_{it}$ are scalars.

DFBETAS

The observation deletion statistic DFBETA can be standardized using the variances of $\hat{\boldsymbol{\beta}}$ based on the complete data. The standardized one-step approximation for the change in $\hat{\beta}_j$ due to deletion of observation t in cluster i is

$$\mathrm{DBETAOS}_{itj} = \frac{\mathrm{DBETAO}_{itj}}{\hat{\phi} \, [(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}]_{jj}^{1/2}}$$

DCLS | CLUSTERCOOKD | CLUSTERCOOKSD

A measure of the standardized influence of the subset m of observations on the overall fit is $(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{[m]})' (\mathbf{X}'\mathbf{W}\mathbf{X}) (\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{[m]}) / (p \hat{\phi})$. For deletion of cluster i, this is approximated by

$$\mathrm{DCLS}_i = \mathbf{E}_i' (\mathbf{W}_{ei}^{-1} - \mathbf{Q}_i)^{-1} \mathbf{Q}_i (\mathbf{W}_{ei}^{-1} - \mathbf{Q}_i)^{-1} \mathbf{E}_i / (p \hat{\phi})$$

DOBS | COOKD | COOKSD

The measure of overall fit in the section DCLS | CLUSTERCOOKD | CLUSTERCOOKSD for the deletion of the tth observation in the ith cluster is approximated by

$$\mathrm{DOBS}_{it} = \frac{\tilde{E}_{it}^2 \, \tilde{Q}_{it}}{p \hat{\phi} \, (W_{eit}^{-1} - \tilde{Q}_{it})^2}$$

where $\tilde{E}_{it}$, $\tilde{Q}_{it}$, and $W_{eit}$ are defined in the section DFBETA. In the case of the independence working correlation, this is equal to the measure for ordinary generalized linear models defined in the section DOBS | COOKD | COOKSD.
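The reduction under the independence working correlation can be traced in a few lines. The sketch below assumes no WEIGHT statement, so all prior weights equal 1. With independence, the matrices for cluster i are diagonal, the off-diagonal block linking observation t to the rest of its cluster is zero, and the tilde quantities lose their correction terms. The symbol $r_{p,it}$ is notation introduced here for the standardized Pearson residual of observation t in cluster i.

```latex
\begin{align*}
\tilde{Q}_{it} &= \mathbf{X}_{it} (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \mathbf{X}_{it}', \qquad
\tilde{E}_{it} = g'(\mu_{it})\,(y_{it} - \hat{\mu}_{it}), \qquad
W_{eit} = \frac{1}{V(\mu_{it})\, g'(\mu_{it})^{2}}, \\
h_{it} &= \tilde{Q}_{it} W_{eit}
\quad\Longrightarrow\quad
W_{eit}^{-1} - \tilde{Q}_{it} = W_{eit}^{-1}\,(1 - h_{it}), \\
\mathrm{DOBS}_{it}
 &= \frac{\tilde{E}_{it}^{2}\, \tilde{Q}_{it}}{p \hat{\phi}\, W_{eit}^{-2} (1 - h_{it})^{2}}
  = \frac{\tilde{E}_{it}^{2}\, W_{eit}\, h_{it}}{p \hat{\phi}\, (1 - h_{it})^{2}} \\
 &= \frac{(y_{it} - \hat{\mu}_{it})^{2}}{\hat{\phi}\, V(\mu_{it})\, (1 - h_{it})}
    \cdot \frac{h_{it}}{p\,(1 - h_{it})}
  = p^{-1} h_{it} (1 - h_{it})^{-1} r_{p,it}^{2}.
\end{align*}
```

The last expression has the same form as the statistic for ordinary generalized linear models. The leverage is unaffected by whether the dispersion parameter is included in the weight matrix, because a common scale factor in the weights cancels in the hat matrix.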

MCLS | CLUSTERDFIT

A studentized distance measure of the type defined in the section DCLS | CLUSTERCOOKD | CLUSTERCOOKSD of the influence of the ith cluster is given by

$$\mathrm{MCLS}_i = \mathbf{E}_i' (\mathbf{W}_{ei}^{-1} - \mathbf{Q}_i)^{-1} \mathbf{H}_i \mathbf{E}_i / (p \hat{\phi})$$

Last updated: December 09, 2022