The GLIMMIX Procedure

Pseudo-likelihood Estimation for Weighted Multilevel Models

Multilevel models provide a flexible and powerful tool for analyzing data that are observed in nested units at multiple levels. A multilevel model is a special case of the generalized linear mixed model and can be fit by the GLIMMIX procedure. In PROC GLIMMIX, the SUBJECT= option in the RANDOM statement identifies the clustering structure for the random effects. When the subjects of multiple RANDOM statements are nested, the model is a multilevel model and each RANDOM statement corresponds to one level.
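For example, the following skeleton shows the nested-subject convention for a hypothetical structure in which classes are nested within schools. The variable names school and class_id are illustrative only and are not part of the example later in this section:

proc glimmix;
   class school class_id;
   model y = x;
   random int / subject=school;             /* higher-level units (schools)          */
   random int / subject=class_id(school);   /* lower-level units (classes in schools) */
run;

Because class_id(school) is nested within school, each RANDOM statement corresponds to one level of the model.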

Using a pseudo-maximum-likelihood approach, you can extend the multilevel model framework to accommodate weights at different levels. Such an approach is very useful in analyzing survey data that arise from multistage sampling. In these sampling designs, survey weights are often constructed to account for unequal sampling probabilities, nonresponse adjustments, and poststratification.

The following survey example from a three-stage sampling design illustrates the use of multiple levels of weighting in a multilevel model. Extending this example to models that have more than three levels is straightforward. Let $i = 1, \ldots, n^{(3)}$, $j = 1, \ldots, n_i^{(2)}$, and $k = 1, \ldots, n_j^{(1)}$ denote the indices of units at level 3, level 2, and level 1, respectively. Let the superscript $(l)$ denote the $l$th level and $n^{(l)}$ denote the number of level-$l$ units in the sample. Assume that the first-stage cluster (level-3 unit) $i$ is selected with probability $\pi_i$; the second-stage cluster (level-2 unit) $j$ is selected with probability $\pi_{j|i}$, given that the first-stage cluster $i$ is already selected in the sample; and the third-stage unit (level-1 unit) $k$ is selected with probability $\pi_{k|ij}$, given that the second-stage cluster $j$ within the first-stage cluster $i$ is already selected in the sample.

If you use the inverse selection probability weights $w_{j|i} = 1/\pi_{j|i}$ and $w_{k|ij} = 1/\pi_{k|ij}$, the conditional log-likelihood contribution of the first-stage cluster $i$ is

\[
\log\left( p\left( \mathbf{y}_i \mid \boldsymbol{\gamma}_i^{(2)}, \boldsymbol{\gamma}_i^{(3)} \right) \right)
= \sum_{j=1}^{n_i^{(2)}} w_{j|i} \sum_{k=1}^{n_j^{(1)}} w_{k|ij}
  \log\left( p\left( y_{ijk} \mid \boldsymbol{\gamma}_{ij}^{(2)}, \boldsymbol{\gamma}_i^{(3)} \right) \right)
\]

where $\boldsymbol{\gamma}_{ij}^{(2)}$ is the random-effects vector for the $j$th second-stage cluster, $\boldsymbol{\gamma}_i^{(2)} = \left( \boldsymbol{\gamma}_{i1}', \boldsymbol{\gamma}_{i2}', \ldots, \boldsymbol{\gamma}_{i n_i^{(2)}}' \right)'$, and $\boldsymbol{\gamma}_i^{(3)}$ is the random-effects vector for the $i$th first-stage cluster.
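In practice, you typically compute the level-specific weights as the reciprocals of the selection probabilities before fitting the model. The following DATA step is a minimal sketch of that computation; the input data set survey and the probability variables p3, p2, and p1 are hypothetical names that stand for the first-, second-, and third-stage selection probabilities:

data weights;
   set survey;       /* one record per level-1 unit                  */
   w3 = 1 / p3;      /* first-stage (level-3) weight,  w_i           */
   w2 = 1 / p2;      /* second-stage (level-2) weight, w_{j|i}       */
   w1 = 1 / p1;      /* third-stage (level-1) weight,  w_{k|ij}      */
run;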

As with unweighted multilevel models, the adaptive quadrature method is used to compute the pseudo-likelihood of the first-stage cluster $i$:

\[
p(\mathbf{y}_i) = \int p\left( \mathbf{y}_i \mid \boldsymbol{\gamma}_i^{(2)}, \boldsymbol{\gamma}_i^{(3)} \right)
                       p\left( \boldsymbol{\gamma}_i^{(2)} \right) p\left( \boldsymbol{\gamma}_i^{(3)} \right)
                       \, d\boldsymbol{\gamma}_i^{(2)} \, d\boldsymbol{\gamma}_i^{(3)}
\]

The total log pseudo-likelihood is

\[
\log(p(\mathbf{y})) = \sum_{i=1}^{n^{(3)}} w_i \log(p(\mathbf{y}_i))
\]

where $w_i = 1/\pi_i$.

To illustrate weighting in a multilevel model, consider the following data set. In these simulated data, the variable A identifies the first-stage clusters and AB identifies the second-stage clusters within them; the response y is a Poisson-distributed count; w3 is the weight for the first-stage clusters; w2 is the weight for the second-stage clusters; and w1 is the weight for the observation-level (level-1) units.

data d;
   input A w3 AB w2 w1 y x;
   datalines;
1    6.1     1    5.3     7.1    56    -.214
1    6.1     1    5.3     3.9    41    0.732
1    6.1     2    7.3     6.3    50    0.372
1    6.1     2    7.3     3.9    36    -.892
1    6.1     3    4.6     8.4    39    0.424
1    6.1     3    4.6     6.3    35    -.200
2    8.5     1    4.8     7.4    30    0.868
2    8.5     1    4.8     6.7    25    0.110
2    8.5     2    8.4     3.5    36    0.004
2    8.5     2    8.4     4.1    28    0.755
2    8.5     3    .80     3.8    33    -.600
2    8.5     3    .80     7.4    30    -.525
3    9.5     1    8.2     6.7    32    -.454
3    9.5     1    8.2     7.1    24    0.458
3    9.5     2     11     4.8    31    0.162
3    9.5     2     11     7.9    27    1.099
3    9.5     3    3.9     3.8    15    -1.57
3    9.5     3    3.9     5.5    19    -.448
4    4.5     1    8.0     5.7    30    -.468
4    4.5     1    8.0     2.9    25    0.890
4    4.5     2    6.0     5.0    35    0.635
4    4.5     2    6.0     3.0    30    0.743
4    4.5     3    6.8     7.3    17    -.015
4    4.5     3    6.8     3.1    18    -.560
;

You can use the following statements to fit a weighted three-level model:

proc glimmix  method=quadrature empirical=classical;
   class A AB;
   model y = x / dist=poisson link=log obsweight=w1;
   random int  / subject=A weight=w3;
   random int  / subject=AB(A) weight=w2;
run;

The SUBJECT= option in the first and second RANDOM statements specifies the first-stage and second-stage clusters A and AB(A), respectively. The OBSWEIGHT= option in the MODEL statement specifies the variable for the weight at the observation level. The WEIGHT= option in the RANDOM statement specifies the variable for the weight at the level that is specified by the SUBJECT= option.
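As noted earlier, extending the model to more than three levels is straightforward: add one nested RANDOM statement, with its own WEIGHT= variable, for each additional level. The following sketch shows what a weighted four-level model might look like; the variables ABC and w4 are hypothetical and are not in the example data set:

proc glimmix method=quadrature empirical=classical;
   class A AB ABC;
   model y = x / dist=poisson link=log obsweight=w1;
   random int / subject=A          weight=w4;   /* first-stage (level-4) clusters  */
   random int / subject=AB(A)      weight=w3;   /* second-stage (level-3) clusters */
   random int / subject=ABC(A AB)  weight=w2;   /* third-stage (level-2) clusters  */
run;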

For inference about the fixed-effects and covariance parameters that are estimated by pseudo-likelihood, you can use empirical (sandwich) variance estimators. For weighted multilevel models, the only empirical estimator available in PROC GLIMMIX is EMPIRICAL=CLASSICAL. The EMPIRICAL=CLASSICAL variance estimator can be described as follows.

Let $\boldsymbol{\alpha} = (\boldsymbol{\beta}', \boldsymbol{\theta}')'$, where $\boldsymbol{\beta}$ is the vector of fixed-effects parameters and $\boldsymbol{\theta}$ is the vector of covariance parameters. For an $L$-level model, Rabe-Hesketh and Skrondal (2006) show that the gradient can be written as a weighted sum of the gradients of the top-level units:

\[
\sum_{i=1}^{n^{(L)}} w_i \, \frac{\partial \log\left( p(\mathbf{y}_i; \boldsymbol{\alpha}) \right)}{\partial \boldsymbol{\alpha}}
\equiv \sum_{i=1}^{n^{(L)}} S_i(\boldsymbol{\alpha})
\]

where $n^{(L)}$ is the number of level-$L$ units and $S_i(\boldsymbol{\alpha})$ is the weighted score vector of the top-level unit $i$. The estimator of the "meat" of the sandwich estimator can be written as

\[
J = \frac{n^{(L)}}{n^{(L)} - 1} \sum_{i=1}^{n^{(L)}} S_i(\hat{\boldsymbol{\alpha}}) \, S_i(\hat{\boldsymbol{\alpha}})'
\]

The empirical estimator of the covariance matrix of $\hat{\boldsymbol{\alpha}}$ can be constructed as

\[
H(\hat{\boldsymbol{\alpha}})^{-1} \, J \, H(\hat{\boldsymbol{\alpha}})^{-1}
\]

where $H(\boldsymbol{\alpha})$ is the second derivative matrix of the log pseudo-likelihood with respect to $\boldsymbol{\alpha}$.

The covariance parameter estimators that are obtained by the pseudo-maximum-likelihood method can be biased when the sample size is small. Pfeffermann et al. (1998) and Rabe-Hesketh and Skrondal (2006) discuss two weight-scaling methods for reducing the biases of the covariance parameter estimators in a two-level model. To derive the scaling factor $\lambda$ for a two-level model, let $n_i$ denote the number of level-1 units in level-2 unit $i$ and let $w_{j|i}$ denote the weight of the $j$th level-1 unit in level-2 unit $i$. The first method computes an "apparent" cluster size as the "effective" sample size:

\[
\sum_{j=1}^{n_i} \lambda w_{j|i} = \frac{\left( \sum_{j=1}^{n_i} w_{j|i} \right)^2}{\sum_{j=1}^{n_i} w_{j|i}^2}
\]

Therefore the scale factor is

\[
\lambda = \frac{\sum_{j=1}^{n_i} w_{j|i}}{\sum_{j=1}^{n_i} w_{j|i}^2}
\]

The second method sets the apparent cluster size equal to the actual cluster size so that the scale factor is

\[
\lambda = \frac{n_i}{\sum_{j=1}^{n_i} w_{j|i}}
\]

PROC GLIMMIX uses the weights that are provided in the data set directly and does not rescale them. To use scaled weights, you must compute them and supply them as variables in the data set.
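For example, the following sketch is one way you might apply the first scaling method to the level-1 weights in the example data set: within each AB(A) cluster, it computes $\lambda = \sum_j w_{j|i} / \sum_j w_{j|i}^2$ and stores the scaled weight in a new (hypothetical) variable w1_s:

proc sql;
   create table d_scaled as
   select *,
          w1 * ( sum(w1) / sum(w1*w1) ) as w1_s   /* lambda * w1 within each cluster */
   from d
   group by A, AB;
quit;

You can then specify OBSWEIGHT=w1_s instead of OBSWEIGHT=w1 in the MODEL statement to fit the model with the scaled level-1 weights.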
