The CALIS Procedure

Gradient, Hessian, Information Matrix, and Approximate Standard Errors

For a single-sample setting with a discrepancy function $F = F(\boldsymbol{\Sigma}(\boldsymbol{\Theta}), \boldsymbol{\mu}(\boldsymbol{\Theta}); \mathbf{S}, \bar{\mathbf{x}})$, the gradient is defined as the first partial derivatives of the discrepancy function with respect to the model parameters $\boldsymbol{\Theta}$:

$$g(\boldsymbol{\Theta}) = \frac{\partial}{\partial \boldsymbol{\Theta}} F(\boldsymbol{\Theta})$$

The Hessian is defined as the second partial derivatives of the discrepancy function with respect to the model parameters $\boldsymbol{\Theta}$:

$$H(\boldsymbol{\Theta}) = \frac{\partial^2}{\partial \boldsymbol{\Theta}\, \partial \boldsymbol{\Theta}'} F(\boldsymbol{\Theta})$$

Suppose that the mean and covariance structures fit perfectly with $\boldsymbol{\Theta} = \boldsymbol{\Theta}_o$ in the population. The expected information matrix is defined as

$$I(\boldsymbol{\Theta}_o) = \frac{1}{2}\, \mathcal{E}\left( H(\boldsymbol{\Theta}_o) \right)$$

where the expectation $\mathcal{E}(\cdot)$ is taken over the sampling space of $\mathbf{S}$ and $\bar{\mathbf{x}}$. Hence, the expected information matrix $I(\boldsymbol{\Theta}_o)$ does not contain any sample values.

The expected information matrix plays a significant role in statistical theory. Under certain regularity conditions, the inverse of the information matrix, $I^{-1}(\boldsymbol{\Theta}_o)$, is the asymptotic covariance matrix for $\sqrt{N}(\hat{\boldsymbol{\Theta}} - \boldsymbol{\Theta}_o)$, where $N$ denotes the sample size and $\hat{\boldsymbol{\Theta}}$ is an estimator.

In practice, $\boldsymbol{\Theta}_o$ is never known and can only be estimated. The information matrix is therefore evaluated at the sample estimate $\hat{\boldsymbol{\Theta}}$ and is denoted as

$$I(\hat{\boldsymbol{\Theta}})$$

This is the information matrix that PROC CALIS displays in the output.

For a sample of size $N$, PROC CALIS computes the estimated covariance matrix of $\hat{\boldsymbol{\Theta}}$ by

$$\left( (N-1)\, I(\hat{\boldsymbol{\Theta}}) \right)^{-1}$$

It then computes approximate standard errors for $\hat{\boldsymbol{\Theta}}$ as the square roots of the diagonal elements of this estimated covariance matrix. This formula is based on the expected information and is the default standard error method (INFORMATION=EXP) for the ML, MLSB, GLS, and WLS estimation methods.
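
As a concrete numerical illustration (a minimal sketch of the formula above, not PROC CALIS internals), the following Python code computes the estimated covariance matrix and the approximate standard errors from a hypothetical estimated information matrix `info` and sample size `N`:

```python
import numpy as np

def approx_standard_errors(info, N):
    """Approximate standard errors from an estimated information matrix.

    info : (t x t) information matrix evaluated at the parameter estimates
    N    : sample size
    """
    cov = np.linalg.inv((N - 1) * info)   # ((N - 1) I(theta_hat))^{-1}
    se = np.sqrt(np.diag(cov))            # square roots of the diagonal
    return cov, se
```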

In contrast, by default the FIML estimation method computes standard error estimates based on the so-called observed information matrix (INFORMATION=OBS), which is defined as

$$I_{\mathrm{obs}}(\boldsymbol{\Theta}_o) = \frac{1}{2}\, H(\boldsymbol{\Theta}_o)$$

The critical difference between $I(\boldsymbol{\Theta}_o)$ and $I_{\mathrm{obs}}(\boldsymbol{\Theta}_o)$ is that the latter does not take the expectation of $H(\boldsymbol{\Theta}_o)$ over the distribution of sample statistics. Kenward and Molenberghs (1998) show that the use of the expected information leads to biased standard errors when the missing data mechanism satisfies only the missing at random (MAR; see Rubin 1976) condition but not the missing completely at random (MCAR) condition. Under the MAR condition, the observed information matrix is the correct choice. Because FIML estimation is mostly applied when the data contain missing values, using the observed information by default is reasonable.

In practice, the observed information is computed by

$$I_{\mathrm{obs}}(\hat{\boldsymbol{\Theta}})$$

and the estimated covariance matrix of $\hat{\boldsymbol{\Theta}}$ is given by

$$\left( (N-1)\, I_{\mathrm{obs}}(\hat{\boldsymbol{\Theta}}) \right)^{-1}$$

However, PROC CALIS does not compute $I_{\mathrm{obs}}(\hat{\boldsymbol{\Theta}})$ analytically. Instead, it computes $I_{\mathrm{obs}}(\hat{\boldsymbol{\Theta}})$ by the finite-difference method, based on the analytic formulas for the first-order partial derivatives of $F$.
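
The following sketch illustrates this idea: the Hessian of $F$ is approximated by central finite differences of an analytic gradient function, and half of it gives the observed information. The gradient callable `grad_F` and the step size are hypothetical choices for illustration, not the exact differencing scheme that PROC CALIS uses.

```python
import numpy as np

def observed_information_fd(grad_F, theta_hat, h=1e-5):
    """Observed information I_obs = H/2, where the Hessian H of F is
    approximated by central finite differences of the analytic gradient."""
    t = len(theta_hat)
    H = np.zeros((t, t))
    for j in range(t):
        e = np.zeros(t)
        e[j] = h
        # column j of the Hessian: change of the gradient along parameter j
        H[:, j] = (grad_F(theta_hat + e) - grad_F(theta_hat - e)) / (2 * h)
    H = (H + H.T) / 2          # symmetrize to remove rounding asymmetry
    return H / 2               # I_obs(theta_hat) = (1/2) H(theta_hat)
```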

Finally, PROC CALIS does not compute standard errors when you use the ULS and DWLS estimation methods.

If a particular information matrix is singular, PROC CALIS offers two ways to compute a generalized inverse of the matrix and, therefore, two ways to compute approximate standard errors of implicitly constrained parameter estimates, t values, and modification indices. Depending on the G4= specification, either a Moore-Penrose inverse or a G2 inverse is computed. The computationally expensive Moore-Penrose inverse calculates an estimate of the null space by using an eigenvalue decomposition. The computationally cheaper G2 inverse is produced by sweeping the linearly independent rows and columns and zeroing out the dependent ones.
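
A minimal sketch of the Moore-Penrose route is shown below; the relative eigenvalue tolerance is an illustrative choice, not the exact singularity criterion that PROC CALIS applies, and the cheaper sweep-based G2 inverse is not shown.

```python
import numpy as np

def moore_penrose_inverse(info, tol=1e-8):
    """Moore-Penrose inverse of a symmetric information matrix via its
    eigenvalue decomposition; eigenvalues below tol times the largest
    eigenvalue are treated as zero (the estimated null space)."""
    vals, vecs = np.linalg.eigh(info)
    keep = vals > tol * vals.max()
    inv_vals = np.zeros_like(vals)
    inv_vals[keep] = 1.0 / vals[keep]
    return (vecs * inv_vals) @ vecs.T     # V diag(1/lambda or 0) V'
```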

Satorra-Bentler Sandwich Formula for Standard Errors

In addition to the scaled chi-square statistics, Satorra and Bentler (1994) propose the so-called sandwich formula for computing standard errors. For ML estimation, let $C(\hat{\boldsymbol{\Theta}})$ be the estimated covariance matrix of the parameter estimates, obtained through either the expected or observed information matrix formula. The Satorra-Bentler sandwich formula for the estimated covariance matrix is of the form

$$C_{SB}(\hat{\boldsymbol{\Theta}}) = C(\hat{\boldsymbol{\Theta}})\, \boldsymbol{\Upsilon}(\hat{\boldsymbol{\Sigma}})\, C(\hat{\boldsymbol{\Theta}})$$

where $\boldsymbol{\Upsilon}(\hat{\boldsymbol{\Sigma}})$ depends on the model Jacobian, the weight matrix under normal distribution theory, and the weight matrix under general distribution theory, all evaluated at the sample estimates or the sample data values. See Satorra and Bentler (1994) for detailed formulas.
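
Once these two ingredients are available, the adjustment itself is just two matrix products. The following sketch assumes that the unadjusted covariance matrix `C` and the middle matrix `Upsilon` have already been computed:

```python
import numpy as np

def sandwich_covariance(C, Upsilon):
    """Satorra-Bentler sandwich: C_SB = C * Upsilon * C, followed by
    approximate standard errors as square roots of the diagonal."""
    C_SB = C @ Upsilon @ C
    return C_SB, np.sqrt(np.diag(C_SB))
```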

If you specify METHOD=MLSB, PROC CALIS uses the Satorra-Bentler sandwich formula to compute standard error estimates. For all other estimation methods that can produce standard error estimates, it uses the unadjusted formula by default. To use the unadjusted formula for METHOD=MLSB, you can specify the STDERR=UNADJ option. To use the Satorra-Bentler sandwich formula for regular ML estimation, you can specify the STDERR=SBSW option.

Theoretically, if the population is truly multivariate normal, the weight matrix under normal distribution theory is correctly specified. Asymptotically, the product of the first two factors in the formula for $C_{SB}(\hat{\boldsymbol{\Theta}})$ reduces to the identity matrix, so that

$$C_{SB}(\hat{\boldsymbol{\Theta}}) = C(\hat{\boldsymbol{\Theta}})$$

That is, you can use the unadjusted covariance formula to compute standard error estimates if the multivariate normality assumption is satisfied.

If the multivariate normality assumption is not true, then the full sandwich formula must be used. Specifically, the middle term, $\boldsymbol{\Upsilon}(\hat{\boldsymbol{\Sigma}})$, in the sandwich formula requires computation of the normal-theory weight matrix and other quantities. Because the normal-theory weight matrix is a function of $\hat{\boldsymbol{\Sigma}}$, evaluation of $\boldsymbol{\Upsilon}(\hat{\boldsymbol{\Sigma}})$ depends on the choice of $\hat{\boldsymbol{\Sigma}}$. PROC CALIS uses the model-predicted covariance matrix by default (SBNTW=PRED). You can also use the sample covariance matrix by specifying the SBNTW=OBS option.

Multiple-Group Extensions

In the section Multiple-Group Discrepancy Function, the overall discrepancy function for multiple-group analysis is defined. The same notation is applied here. To begin with, the overall discrepancy function $F(\boldsymbol{\Theta})$ is expressed as a weighted sum of the individual discrepancy functions $F_i$ for the groups as follows:

$$F(\boldsymbol{\Theta}) = \sum_{i=1}^{k} t_i\, F_i(\boldsymbol{\Theta})$$

where

$$t_i = \frac{N_i - 1}{N - k}$$

is the weight for group $i$,

$$N = \sum_{i=1}^{k} N_i$$

is the total sample size, and $N_i$ is the sample size for group $i$.

The gradient $g(\boldsymbol{\Theta})$ and the Hessian $H(\boldsymbol{\Theta})$ are now defined as weighted sums of the corresponding individual group functions. That is,

$$g(\boldsymbol{\Theta}) = \sum_{i=1}^{k} t_i\, g_i(\boldsymbol{\Theta}) = \sum_{i=1}^{k} t_i \frac{\partial}{\partial \boldsymbol{\Theta}} F_i(\boldsymbol{\Theta})$$

and

$$H(\boldsymbol{\Theta}) = \sum_{i=1}^{k} t_i\, H_i(\boldsymbol{\Theta}) = \sum_{i=1}^{k} t_i \frac{\partial^2}{\partial \boldsymbol{\Theta}\, \partial \boldsymbol{\Theta}'} F_i(\boldsymbol{\Theta})$$
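
A minimal sketch of these weighted sums, assuming each group supplies its own sample size and its own gradient and Hessian callables (the names below are hypothetical):

```python
import numpy as np

def pooled_gradient_hessian(theta, group_sizes, grads, hessians):
    """Weighted multiple-group gradient and Hessian.

    group_sizes : the k group sample sizes N_i
    grads       : k callables, grads[i](theta) returns g_i(theta)
    hessians    : k callables, hessians[i](theta) returns H_i(theta)
    """
    k = len(group_sizes)
    N = sum(group_sizes)                                    # total sample size
    weights = [(N_i - 1) / (N - k) for N_i in group_sizes]  # t_i weights
    g = np.sum([t_i * g_i(theta) for t_i, g_i in zip(weights, grads)], axis=0)
    H = np.sum([t_i * H_i(theta) for t_i, H_i in zip(weights, hessians)], axis=0)
    return g, H
```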

Suppose that the mean and covariance structures fit perfectly with $\boldsymbol{\Theta} = \boldsymbol{\Theta}_o$ in the population. If each $t_i$ converges to a fixed constant $\tau_i$ ($\tau_i > 0$) with increasing total sample size, the expected information matrix can be written as

$$I(\boldsymbol{\Theta}_o) = \frac{1}{2} \sum_{i=1}^{k} \tau_i\, \mathcal{E}\left( H_i(\boldsymbol{\Theta}_o) \right)$$

To compute the expected information empirically, $\hat{\boldsymbol{\Theta}}$ replaces $\boldsymbol{\Theta}_o$ in the formula.

PROC CALIS computes the estimated covariance matrix of $\hat{\boldsymbol{\Theta}}$ by

$$\left( (N-k)\, I(\hat{\boldsymbol{\Theta}}) \right)^{-1}$$

Approximate standard errors for $\hat{\boldsymbol{\Theta}}$ are then computed as the square roots of the diagonal elements of this estimated covariance matrix.

Again, by default the ML, MLSB, GLS, and WLS estimation methods use this expected-information-based formula in the multiple-group setting. For FIML estimation, the default standard error method is based on the observed information, which is defined as

$$I_{\mathrm{obs}}(\boldsymbol{\Theta}_o) = \frac{1}{2} \sum_{i=1}^{k} \tau_i\, H_i(\boldsymbol{\Theta}_o)$$

Similar to the single-group analysis, standard errors are not computed with the ULS and DWLS estimation methods in the multiple-group setting.

Testing Rank Deficiency in the Approximate Covariance Matrix for Parameter Estimates

When computing the approximate covariance matrix and hence the standard errors for the parameter estimates, inversion of the scaled information matrix or Hessian matrix is involved. The numerical condition of the information matrix can be very poor in many practical applications, especially for the analysis of unscaled covariance data. The following four-step strategy is used for the inversion of the information matrix.

  1. The inversion (usually of a normalized matrix $\mathbf{D}^{-1} \mathbf{I} \mathbf{D}^{-1}$) is tried using a modified form of the Bunch and Kaufman (1977) algorithm, which allows the specification of a different singularity criterion for each pivot. The following three criteria for the detection of rank loss in the information matrix are used to specify thresholds:

    • ASING specifies absolute singularity.

    • MSING specifies relative singularity depending on the whole matrix norm.

    • VSING specifies relative singularity depending on the column matrix norm.

    If no rank loss is detected, the inverse of the information matrix is used for the covariance matrix of parameter estimates, and the next two steps are skipped.

  2. The linear dependencies among the parameter subsets are displayed based on the singularity criteria.

  3. If the number of parameters t is smaller than the value specified by the G4= option (the default value is 60), the Moore-Penrose inverse is computed based on the eigenvalue decomposition of the information matrix. If you do not specify the NOPRINT option, the distribution of eigenvalues is displayed, and those eigenvalues that are set to zero in the Moore-Penrose inverse are indicated. You should inspect this eigenvalue distribution carefully.

  4. If PROC CALIS did not set the right subset of eigenvalues to zero, you can specify the COVSING= option to set a larger or smaller subset of eigenvalues to zero in a further run of PROC CALIS.
