The CALIS Procedure

The LINEQS Model

The LINEQS modeling language is adapted from the EQS (equations) program by Bentler (1995). The statistical models that LINEQS or EQS analyzes are essentially the same as those analyzed by other general modeling languages such as LISMOD, RAM, and PATH. However, the terminology and approach of the LINEQS or EQS modeling language differ from those of the other languages. They are based on the theoretical model developed by Bentler and Weeks (1980). For convenience, models that are analyzed using the LINEQS modeling language are called LINEQS models. Note that these so-called LINEQS models can also be analyzed by other general modeling languages in PROC CALIS.

In the LINEQS (or the original EQS) model, relationships among variables are represented by a system of equations. For example:

$$Y_1 = a_0 + a_1 X_1 + a_2 X_2 + E_1$$
$$Y_2 = b_0 + b_1 X_1 + b_2 Y_1 + E_2$$

On the left-hand side of each equation, an outcome variable is hypothesized to be a linear function of one or more predictor variables and an error, which are all specified on the right-hand side of the equation. The parameters specified in an equation are the effects (or regression coefficients) of the predictor variables. For example, in the preceding equations, $Y_1$ and $Y_2$ are outcome variables; $E_1$ and $E_2$ are error variables; $a_1$, $a_2$, $b_1$, and $b_2$ are effect parameters (or regression coefficients); and $a_0$ and $b_0$ are intercept parameters. Variables $X_1$ and $X_2$ serve as predictors in the first equation, while variables $X_1$ and $Y_1$ serve as predictors in the second equation.

This is almost the same representation as in multiple regression models. However, the LINEQS model entails more. It supports a system of equations that can also include latent variables, measurement errors, and correlated errors.
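
As a minimal sketch, the two-equation system above might be specified in PROC CALIS as follows. The data set name work.mydata is hypothetical and is assumed to contain the observed variables Y1, Y2, X1, and X2. The MEANSTR option is included here so that the intercepts a0 and b0 are part of the fitted model.

```sas
/* Hypothetical data set work.mydata with observed variables Y1, Y2, X1, X2 */
proc calis data=work.mydata meanstr;
   lineqs
      Y1 = a0 * Intercept + a1 * X1 + a2 * X2 + E1,   /* first equation  */
      Y2 = b0 * Intercept + b1 * X1 + b2 * Y1 + E2;   /* second equation */
run;
```

Each equation ends with its own error term (E1 or E2), and Y1 appears both as an outcome in the first equation and as a predictor in the second.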

Types of Variables in the LINEQS Model

The distinction between dependent and independent variables is important in the LINEQS model.

A variable is dependent if it appears on the left-hand side of an equation in the model. A dependent variable might be observed (manifest) or latent. It might or might not appear on the right-hand side of other equations, but it cannot appear on the left-hand sides of two or more equations. Error variables cannot be dependent in the LINEQS model.

A variable in the LINEQS model is independent if it is not dependent. Independent variables can be observed (manifest) or latent. All error variables must be independent in the LINEQS model.

Dependent variables are also referred to as endogenous variables; these names are interchangeable. Similarly, independent variables are also referred to as exogenous variables.

Whereas an outcome variable in any equation must be a dependent variable, a predictor variable in an equation is not necessarily an independent variable in the entire LINEQS model. For example, $Y_1$ is a predictor variable in the second equation of the preceding example, but it is a dependent variable in the LINEQS model. In summary, the predictor-outcome nature of a variable is determined within a single equation, while the exogenous-endogenous (independent-dependent) nature of a variable is determined within the entire system of equations.

In addition to the dependent-independent variable distinction, variables in the LINEQS model are distinguished according to whether they are observed in the data. Variables that are observed in research are called observed or manifest variables. Hypothetical variables that are not observed in the LINEQS model are latent variables.

Two types of latent variables should be distinguished: error variables and non-error variables. An error variable is unique to an equation. It serves as the unsystematic source of effect for the outcome variable in an equation. If the outcome variable in the equation is latent, the corresponding error variable is also called a disturbance. In contrast, non-error or systematic latent variables are called factors. Factors are unmeasured hypothetical constructs in your model. They are systematic sources that explain or describe functional relationships in your model.

Both manifest variables and latent factors can be dependent or independent. However, error or disturbance terms must be independent (or exogenous) variables in your model.

Naming Variables in the LINEQS Model

Whether a variable in each equation is an outcome or a predictor variable is prescribed by the modeler. Whether a variable is independent or dependent can be determined by analyzing the entire system of equations in the model. Whether a variable is observed or latent can be determined if it is referenced in your data set. However, whether a latent variable serves as a factor or an error can be determined only if you provide the specific information.

To distinguish latent factors from errors and both from manifest variables, the following rules for naming variables in the LINEQS model are followed:

  • Manifest variables are referenced in the input data set. You use their names in the LINEQS model specification directly. There is no additional naming rule for the manifest variables in the LINEQS model beyond those required by the SAS System.

  • Latent factor variables must start with the letter F or f (for factor).

  • Error variables must start with the letter E or e (for error), or D or d (for disturbance). Although you might enforce the use of D- (or d-) variables for disturbances, it is not required. For flexibility, disturbance variables can also start with the letter E or e in the LINEQS model.

  • The names of latent variables, errors, and disturbances (F-, E-, and D-variables) should not coincide with the names of manifest variables.

  • You should not use Intercept as a name for any variable. This name is reserved for the intercept specification in LINEQS model equations.

See the section Naming Variables and Parameters for the general rules about naming variables and parameters.
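
The naming rules above can be sketched with a small latent-variable model. The data set work.scores and its observed variables Test1, Test2, Test3, and Age are hypothetical, as are all parameter names; only the leading letters of the latent variable names matter to the rules.

```sas
/* Hypothetical data set work.scores with observed variables Test1-Test3 and Age */
proc calis data=work.scores;
   lineqs
      Test1     = 1.    * F_Ability + E1,   /* E-variables: errors of manifest outcomes    */
      Test2     = load2 * F_Ability + E2,
      Test3     = load3 * F_Ability + E3,
      F_Ability = b1    * Age       + D1;   /* D-variable: disturbance of a latent outcome */
run;
```

Here F_Ability is recognized as a factor because its name starts with F, and D1 could equally be named with a leading E, since the D prefix for disturbances is optional.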

Matrix Representation of the LINEQS Model

As a programming language, the LINEQS model uses equations to describe relationships among variables. But as a mathematical model, the LINEQS model is more conveniently described in matrix terms. In this section, the LINEQS matrix model is described.

Suppose in a LINEQS model that there are $n_i$ independent variables and $n_d$ dependent variables. The vector of the independent variables is denoted by $\boldsymbol{\xi}$, in the order of manifest variables, latent factors, and error variables. The vector of dependent variables is denoted by $\boldsymbol{\eta}$, in the order of manifest variables and latent factors. The LINEQS model matrices are defined as follows:

$\boldsymbol{\alpha}$ ($n_d \times 1$):

intercepts of dependent variables

$\boldsymbol{\beta}$ ($n_d \times n_d$):

effects of dependent variables (in columns) on dependent variables (in rows)

$\boldsymbol{\gamma}$ ($n_d \times n_i$):

effects of independent variables (in columns) on dependent variables (in rows)

$\boldsymbol{\Phi}$ ($n_i \times n_i$):

covariance matrix of independent variables

$\boldsymbol{\nu}$ ($n_i \times 1$):

means of independent variables

The model equation of the LINEQS model is

$$\boldsymbol{\eta} = \boldsymbol{\alpha} + \boldsymbol{\beta}\boldsymbol{\eta} + \boldsymbol{\gamma}\boldsymbol{\xi}$$
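
As a worked illustration, consider the two-equation example at the beginning of this section. Its dependent variables are Y1 and Y2 (so the number of dependent variables is 2), and its independent variables are X1, X2, E1, and E2 (so the number of independent variables is 4). Under the ordering conventions stated above, the vectors and matrices would take the following form, where zeros mark effects that the equations do not include:

```latex
\[
\boldsymbol{\eta} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad
\boldsymbol{\xi} = \begin{pmatrix} X_1 \\ X_2 \\ E_1 \\ E_2 \end{pmatrix}, \quad
\boldsymbol{\alpha} = \begin{pmatrix} a_0 \\ b_0 \end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} 0 & 0 \\ b_2 & 0 \end{pmatrix}, \quad
\boldsymbol{\gamma} = \begin{pmatrix} a_1 & a_2 & 1 & 0 \\ b_1 & 0 & 0 & 1 \end{pmatrix}
\]
```

Multiplying out the model equation with these matrices reproduces the two original equations row by row.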

Assuming that $(\mathbf{I} - \boldsymbol{\beta})$ is invertible, under the model the covariance matrix of all variables $(\boldsymbol{\eta}', \boldsymbol{\xi}')'$ is structured as

$$\boldsymbol{\Sigma}_a = \begin{pmatrix} (\mathbf{I} - \boldsymbol{\beta})^{-1}\boldsymbol{\gamma}\boldsymbol{\Phi}\boldsymbol{\gamma}'(\mathbf{I} - \boldsymbol{\beta})^{-1\prime} & (\mathbf{I} - \boldsymbol{\beta})^{-1}\boldsymbol{\gamma}\boldsymbol{\Phi} \\ \boldsymbol{\Phi}\boldsymbol{\gamma}'(\mathbf{I} - \boldsymbol{\beta})^{-1\prime} & \boldsymbol{\Phi} \end{pmatrix}$$

The mean vector of all variables $(\boldsymbol{\eta}', \boldsymbol{\xi}')'$ is structured as

$$\boldsymbol{\mu}_a = \begin{pmatrix} (\mathbf{I} - \boldsymbol{\beta})^{-1}(\boldsymbol{\alpha} + \boldsymbol{\gamma}\boldsymbol{\nu}) \\ \boldsymbol{\nu} \end{pmatrix}$$

As is shown in the structured covariance and mean matrices, the means and covariances of independent variables are direct model parameters in $\boldsymbol{\nu}$ and $\boldsymbol{\Phi}$, whereas the means and covariances of dependent variables are functions of various model matrices and hence functions of model parameters.

The covariance and mean structures of all observed variables are obtained by selecting the elements in $\boldsymbol{\Sigma}_a$ and $\boldsymbol{\mu}_a$. Mathematically, define a selection matrix $\mathbf{G}$ of dimensions $n \times (n_d + n_i)$, where $n$ is the number of observed variables in the model. The selection matrix $\mathbf{G}$ contains zeros and ones as its elements. Each row of $\mathbf{G}$ has exactly one nonzero element at the position that corresponds to the location of an observed row variable in $\boldsymbol{\Sigma}_a$ or $\boldsymbol{\mu}_a$. With each row of $\mathbf{G}$ selecting a distinct observed variable, the structured covariance matrix of all observed variables is represented by

$$\boldsymbol{\Sigma} = \mathbf{G}\boldsymbol{\Sigma}_a\mathbf{G}'$$

The structured mean vector of all observed variables is represented by

$$\boldsymbol{\mu} = \mathbf{G}\boldsymbol{\mu}_a$$
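
To make the selection step concrete, consider again the two-equation example at the beginning of this section, with all variables ordered as Y1, Y2, X1, X2, E1, E2. The four observed variables are the first four entries, so the selection matrix keeps them and drops the two errors. The symbols for the means of X1 and X2 below are labels assumed for this illustration, and the inverse shown follows from the only nonzero element of the beta matrix being the effect b2 of Y1 on Y2:

```latex
\[
\mathbf{G} = \begin{pmatrix} \mathbf{I}_4 & \mathbf{0}_{4 \times 2} \end{pmatrix}, \qquad
(\mathbf{I} - \boldsymbol{\beta})^{-1} = \begin{pmatrix} 1 & 0 \\ b_2 & 1 \end{pmatrix}, \qquad
\boldsymbol{\nu} = \begin{pmatrix} \nu_1 \\ \nu_2 \\ 0 \\ 0 \end{pmatrix}
\]
\[
\boldsymbol{\mu} = \mathbf{G}\boldsymbol{\mu}_a =
\begin{pmatrix}
a_0 + a_1 \nu_1 + a_2 \nu_2 \\
b_0 + b_1 \nu_1 + b_2 (a_0 + a_1 \nu_1 + a_2 \nu_2) \\
\nu_1 \\
\nu_2
\end{pmatrix}
\]
```

The first two rows show how the means of the dependent variables are functions of the model parameters, while the last two rows are the mean parameters of the independent observed variables themselves.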

Partitions of Some LINEQS Model Matrices and Their Restrictions

There are some restrictions in some of the LINEQS model matrices. Although these restrictions do not affect the derivation of the covariance and mean structures, they are enforced in the LINEQS model specification.

Model Restrictions on the $\boldsymbol{\beta}$ Matrix

The diagonal of the $\boldsymbol{\beta}$ matrix must be zeros. This prevents the direct regression of dependent variables on themselves. Hence, in the LINEQS statement you cannot specify the same variable on both the left-hand and the right-hand sides of the same equation.

Partitions of the $\boldsymbol{\gamma}$ Matrix and the Associated Model Restrictions

The columns of the $\boldsymbol{\gamma}$ matrix refer to the variables in $\boldsymbol{\xi}$, in the order of manifest variables, latent factors, and error variables. In the LINEQS model, the following partition of the $\boldsymbol{\gamma}$ matrix is assumed:

$$\boldsymbol{\gamma} = \begin{pmatrix} \boldsymbol{\gamma}_0 & \mathbf{E} \end{pmatrix}$$

where $\boldsymbol{\gamma}_0$ is an $n_d \times (n_i - n_d)$ matrix for the effects of independent manifest variables and latent factors on the dependent variables and $\mathbf{E}$ is an $n_d \times n_d$ permutation matrix for the effects of errors on the dependent variables.

The dimension of submatrix $\mathbf{E}$ is $n_d \times n_d$ because in the LINEQS model each dependent variable signifies an equation with an error term. In addition, because $\mathbf{E}$ is a permutation matrix (which is formed by exchanging rows of an identity matrix of the same order), the partition of the $\boldsymbol{\gamma}$ matrix ensures that each dependent variable is associated with a unique error term and that the effect of each error term on its associated dependent variable is 1.

As a result of the error term restriction, in the LINEQS statement you must specify a unique error term in each equation. The coefficient associated with the error term can only be a fixed value at one, either explicitly (with 1.0 inserted immediately before the error term) or implicitly (with no coefficient specified).

Partitions of the $\boldsymbol{\nu}$ Vector and the Associated Model Restrictions

The $\boldsymbol{\nu}$ vector contains the means of independent variables, in the order of the manifest, latent factor, and error variables. In the LINEQS model, the following partition of the $\boldsymbol{\nu}$ vector is assumed:

$$\boldsymbol{\nu} = \begin{pmatrix} \boldsymbol{\nu}_0 \\ \mathbf{0} \end{pmatrix}$$

where $\boldsymbol{\nu}_0$ is an $(n_i - n_d) \times 1$ vector for the means of independent manifest variables and latent factors and $\mathbf{0}$ is a null vector of dimension $n_d$ for the means of errors or disturbances. Again, the dimension of the null vector is $n_d$ because each dependent variable is associated uniquely with an error term. This partition restricts the means of errors or disturbances to zeros.

Hence, when specifying a LINEQS model, you cannot specify the means of errors (or disturbances) as free parameters or as fixed values other than zero in the MEAN statement.

Partitions of the $\boldsymbol{\Phi}$ Matrix

The $\boldsymbol{\Phi}$ matrix is for the covariances of the independent variables, in the order of the manifest, latent factor, and error variables. The following partition of the $\boldsymbol{\Phi}$ matrix is assumed:

$$\boldsymbol{\Phi} = \begin{pmatrix} \boldsymbol{\Phi}_{11} & \boldsymbol{\Phi}_{21}' \\ \boldsymbol{\Phi}_{21} & \boldsymbol{\Phi}_{22} \end{pmatrix}$$

where $\boldsymbol{\Phi}_{11}$ is an $(n_i - n_d) \times (n_i - n_d)$ covariance matrix for the independent manifest variables and latent factors, $\boldsymbol{\Phi}_{22}$ is an $n_d \times n_d$ covariance matrix for the errors, and $\boldsymbol{\Phi}_{21}$ is an $n_d \times (n_i - n_d)$ covariance matrix for the errors with other independent variables in the LINEQS model. Because $\boldsymbol{\Phi}$ is symmetric, $\boldsymbol{\Phi}_{11}$ and $\boldsymbol{\Phi}_{22}$ are also symmetric.

There are actually no model restrictions placed on the submatrices of the partition. However, in most statistical applications, errors represent unsystematic sources of effects and therefore they are not to be correlated with other systematic sources. This implies that submatrix $\boldsymbol{\Phi}_{21}$ is a null matrix. However, $\boldsymbol{\Phi}_{21}$ being null is not enforced in the LINEQS model specification. If you ever specify a covariance between an error variable and a non-error independent variable in the COV statement, as a workaround trick or otherwise, you should provide your own theoretical justifications.
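
By contrast, covariances among the errors themselves (the off-diagonal elements of the error covariance submatrix) are ordinary parameters that you can free in the COV statement. The following is a minimal sketch of correlated errors for the two-equation example at the beginning of this section; the data set work.mydata and the parameter name cov_e12 are hypothetical.

```sas
/* Hypothetical data set work.mydata with observed variables Y1, Y2, X1, X2 */
proc calis data=work.mydata;
   lineqs
      Y1 = a1 * X1 + a2 * X2 + E1,
      Y2 = b1 * X1 + b2 * Y1 + E2;
   cov
      E1 E2 = cov_e12;   /* free covariance between the two error terms */
run;
```

No covariance between an error and X1 or X2 is specified, so those elements keep their default fixed value of zero.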

Summary of Matrices and Submatrices in the LINEQS Model

Let $n_d$ be the number of dependent variables and $n_i$ be the number of independent variables. The names, roles, and dimensions of the LINEQS model matrices and submatrices are summarized in the following table.

| Matrix | Name | Description | Dimensions |
|---|---|---|---|
| Model Matrices | | | |
| $\boldsymbol{\alpha}$ | _EQSALPHA_ | Intercepts of dependent variables | $n_d \times 1$ |
| $\boldsymbol{\beta}$ | _EQSBETA_ | Effects of dependent (column) variables on dependent (row) variables | $n_d \times n_d$ |
| $\boldsymbol{\gamma}$ | _EQSGAMMA_ | Effects of independent (column) variables on dependent (row) variables | $n_d \times n_i$ |
| $\boldsymbol{\nu}$ | _EQSNU_ | Means of independent variables | $n_i \times 1$ |
| $\boldsymbol{\Phi}$ | _EQSPHI_ | Covariance matrix of independent variables | $n_i \times n_i$ |
| Submatrices | | | |
| $\boldsymbol{\gamma}_0$ | _EQSGAMMA_SUB_ | Effects of independent variables, excluding errors, on dependent variables | $n_d \times (n_i - n_d)$ |
| $\boldsymbol{\nu}_0$ | _EQSNU_SUB_ | Means of independent variables, excluding errors | $(n_i - n_d) \times 1$ |
| $\boldsymbol{\Phi}_{11}$ | _EQSPHI11_ | Covariance matrix of independent variables, excluding errors | $(n_i - n_d) \times (n_i - n_d)$ |
| $\boldsymbol{\Phi}_{21}$ | _EQSPHI21_ | Covariances of errors with other independent variables | $n_d \times (n_i - n_d)$ |
| $\boldsymbol{\Phi}_{22}$ | _EQSPHI22_ | Covariance matrix of errors | $n_d \times n_d$ |

Specification of the LINEQS Model

Specification in Equations

In the LINEQS statement, you specify intercepts and effect parameters (or regression coefficients) along with the variable relationships in equations. In terms of model matrices, you specify the $\boldsymbol{\alpha}$ vector and the $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ matrices in the LINEQS statement without using any matrix language.

For example:

$$Y = b_0 + b_1 * X_1 + b_2 * F_2 + E_1$$

In this equation, you specify $Y$ as an outcome variable, $X_1$ and $F_2$ as predictor variables, and $E_1$ as an error variable. The parameters in the equation are the intercept $b_0$ and the path coefficients (or effects) $b_1$ and $b_2$.

This kind of model equation is specified in the LINEQS statement. For example, the previous equation translates into the following LINEQS statement specification:

lineqs Y = b0 * Intercept + b1 * X1 + b2 * F2 + E1;

If the mean structures of the model are not of interest, the intercept term can be omitted. The specification becomes:

lineqs Y =  b1 * X1 + b2 * F2 + E1;

See the LINEQS statement for the details about the syntax.

Because of the LINEQS model restrictions (see the section Partitions of Some LINEQS Model Matrices and Their Restrictions), you must also follow these rules when specifying LINEQS model equations:

  • A dependent variable can appear on the left-hand side of only one equation. In other words, you must put all predictor variables for a dependent variable in one equation. This is different from some econometric models where a dependent variable can appear on the left-hand sides of two equations to represent an equilibrium point. However, this limitation can be resolved by reparameterization in some cases. See Example 33.18.

  • A dependent variable that appears on the left-hand side of an equation cannot appear on the right-hand side of the same equation. If you measure the same characteristic at different time points and the previous measurement serves as a predictor of the next measurement, you should use different variable names for the measurements so as to comply with this rule.

  • An error term must be specified in each equation and must be unique. The same error name cannot appear in two or more equations. When an equation is truly intended to have no error term, it should be represented equivalently in the LINEQS equation by introducing an error term with zero variance (specified in the VARIANCE statement).

  • The regression coefficient (effect) that is associated with an error term must be fixed at one (1.0). This is done automatically by omitting any fixed constants or parameters that are associated with the error terms. Inserting a parameter or a fixed value other than 1 immediately before an error term is not allowed.
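
The following sketch shows how two of these rules might look in practice. It assumes a hypothetical data set work.panel with observed variables Score_T1, Score_T2, and Outcome; the latent composite F_Sum and all parameter names are likewise invented for illustration.

```sas
/* Hypothetical data set work.panel with observed variables Score_T1, Score_T2, Outcome */
proc calis data=work.panel;
   lineqs
      Score_T2 = b1 * Score_T1 + E1,                   /* repeated measure: distinct names per time point */
      F_Sum    = 1. * Score_T1 + 1. * Score_T2 + D1,   /* intended to be error-free, but D1 is required   */
      Outcome  = c1 * F_Sum + E2;
   variance
      D1 = 0.;                                         /* zero variance makes the D1 equation exact       */
run;
```

Each equation still carries its own uniquely named error or disturbance term with an implicit coefficient of 1. Whether such a model is identified and estimable depends on the data and on the rest of the specification.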

Mean, Variance, and Covariance Parameter Specification

In addition to the intercept and effect parameters that are specified in equations, the means, variances, and covariances among all independent variables are parameters in the LINEQS model. An exception is that the means of all error variables are restricted to fixed zeros in the LINEQS model. To specify the mean, variance, and covariance parameters, you use the MEAN, VARIANCE, and COV statements, respectively.

The means, variances, and covariances among dependent variables are not parameters themselves in the model. Rather, they are complex functions of the model parameters. See the section Matrix Representation of the LINEQS Model for mathematical details.
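
As a sketch of how these statements fit together, the following specification names every mean, variance, and covariance parameter of the independent variables explicitly for a two-equation model. The data set work.mydata, its observed variables Y1, Y2, X1, and X2, and all parameter names are hypothetical.

```sas
/* Hypothetical data set work.mydata with observed variables Y1, Y2, X1, X2 */
proc calis data=work.mydata meanstr;
   lineqs
      Y1 = a0 * Intercept + a1 * X1 + a2 * X2 + E1,
      Y2 = b0 * Intercept + b1 * X1 + b2 * Y1 + E2;
   mean
      X1 = mean_x1,        /* means of independent observed variables  */
      X2 = mean_x2;        /* (error means stay fixed at zero)         */
   variance
      X1 = var_x1,
      X2 = var_x2,
      E1 = var_e1,         /* error variances are parameters as well   */
      E2 = var_e2;
   cov
      X1 X2 = cov_x1x2;    /* covariance between the two predictors    */
run;
```

No means, variances, or covariances are specified for Y1 and Y2, because these are functions of the other parameters rather than parameters themselves.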

Default Parameters in the LINEQS Model

There are two types of default parameters of the LINEQS model, as implemented in PROC CALIS. One is the free parameters; the other is the fixed constants.

The following sets of parameters are free parameters by default:

  • the variances of all exogenous (independent) observed or latent variables (including error and disturbance variables)

  • the covariances among all exogenous (independent) manifest or latent variables (excluding error and disturbance variables)

  • the means of all exogenous (independent) observed variables if the mean structures are modeled

  • the intercepts of all endogenous (dependent) manifest variables if the mean structures are modeled

PROC CALIS names the default free parameters with the _Add prefix and a unique integer suffix. You can override the default free parameters by explicitly specifying them as free, constrained, or fixed parameters in the COV, LINEQS, MEAN, or VARIANCE statement.

Parameters that are not default free parameters in the LINEQS model are fixed constants by default. You can override almost all of the default fixed constants of the LINEQS model by using the COV, LINEQS, MEAN, or VARIANCE statement. You cannot override the following two sets of fixed constants:

  • fixed zero parameters for the direct effects (path coefficients) of variables on their own. You cannot have an equation in the LINEQS statement that has the same variable specified on the left-hand and the right-hand sides.

  • fixed one effects from the error or disturbance variables. You cannot set the path coefficient (effect) of the error or disturbance term to any value other than 1 in the LINEQS statement.

These two sets of fixed parameters reflect the LINEQS model restrictions so that they cannot be modified. Other than these two sets of default fixed parameters, all other default fixed parameters are zeros. You can override these default zeros by explicitly specifying them as free, constrained, or fixed parameters in the COV, LINEQS, MEAN, or VARIANCE statement.
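
The following sketch illustrates overriding default free parameters. The data set work.mydata, its observed variables Y1, Y2, X1, and X2, and the parameter names are hypothetical.

```sas
/* Hypothetical data set work.mydata with observed variables Y1, Y2, X1, X2 */
proc calis data=work.mydata;
   lineqs
      Y1 = a1 * X1 + a2 * X2 + E1,
      Y2 = b1 * X1 + b2 * Y1 + E2;
   variance
      E1 = var_e,       /* same parameter name on both lines:        */
      E2 = var_e;       /* the two error variances are set equal     */
   cov
      X1 X2 = 0.;       /* default free covariance fixed at zero     */
run;
```

The variances of X1 and X2 are not mentioned, so they remain default free parameters and would be named by PROC CALIS with the _Add prefix.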

Last updated: December 09, 2022