Introduction to Structural Equation Modeling with Latent Variables

Errors-in-Variables Regression

For ordinary unconstrained regression models, there is no reason to use PROC CALIS instead of PROC REG. But suppose that the predictor variable X is a random variable that is contaminated by errors (especially measurement errors), and you want to estimate the linear relationship between the true, error-free scores. The following model takes this kind of measurement error into account:

\[
\begin{aligned}
Y &= \alpha + \beta F_X + E_Y \\
X &= F_X + E_X
\end{aligned}
\]

The model assumes the following:

\[
\mathrm{Cov}(F_X, E_Y) = \mathrm{Cov}(F_X, E_X) = \mathrm{Cov}(E_X, E_Y) = 0
\]

There are two equations in the model. The first one is the so-called structural model, which describes the relationship between Y and the true score predictor $F_X$. This equation is your main interest. However, $F_X$ is a latent variable that has not been observed. Instead, what you have observed for this predictor is X, which is the contaminated version of $F_X$ with measurement error or other errors, denoted by $E_X$, added. This measurement process is described in the second equation, or the so-called measurement model. By analyzing the structural and measurement models (or the two linear equations) simultaneously, you want to estimate the true score effect $\beta$.

The assumption that the error terms $E_X$ and $E_Y$ and the latent variable $F_X$ are jointly uncorrelated is of critical importance in the model. This assumption must be justified on substantive grounds such as the physical properties of the measurement process. If this assumption is violated, the estimators might be severely biased and inconsistent.
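To see what the uncorrelatedness assumption buys you, it can help to write out the first- and second-order moments of the observed variables that the two equations imply. The following is a sketch, not part of the PROC CALIS documentation itself: the symbol $\mu_F$ for the mean of $F_X$ is introduced here for illustration, and the error terms are assumed to have zero means.

```latex
\begin{aligned}
\mathrm{E}(X)      &= \mu_F \\
\mathrm{E}(Y)      &= \alpha + \beta\,\mu_F \\
\mathrm{Var}(X)    &= \mathrm{Var}(F_X) + \mathrm{Var}(E_X) \\
\mathrm{Cov}(X, Y) &= \beta\,\mathrm{Var}(F_X) \\
\mathrm{Var}(Y)    &= \beta^2\,\mathrm{Var}(F_X) + \mathrm{Var}(E_Y)
\end{aligned}
```

Under these assumptions there are five observable moments but six unknowns ($\alpha$, $\beta$, $\mu_F$, $\mathrm{Var}(F_X)$, $\mathrm{Var}(E_X)$, and $\mathrm{Var}(E_Y)$), so the moments alone cannot determine $\beta$ without one additional restriction, such as a known value for $\mathrm{Var}(E_X)$.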

You can express the current errors-in-variables model by the LINEQS modeling language as shown in the following statements:

proc calis;
   lineqs
      Y  = alpha * Intercept + beta * Fx + Ey,
      X  =     0 * Intercept +    1 * Fx + Ex;
   mean
      Fx;
run;

In this specification, you need to specify only the equations involved without specifying the assumptions about the correlations among Fx, Ey, and Ex. In the LINEQS modeling language, you should always name latent factors with the 'F' or 'f' prefix (for example, Fx) and error terms with the 'E' or 'e' prefix (for example, Ey and Ex). Given this LINEQS notation, latent factors and error terms, by default, are uncorrelated in the model.

Notice that the intercept in the equation for X in the LINEQS statement is specified as a fixed zero. In addition, the MEAN statement specifies that the mean of latent factor Fx is a free parameter. These specifications override the default of free intercept parameters for observed variables (for X, in this case) and fixed zero means for latent factors (for Fx, in this case). The resulting parameterization is comparable to the conventional parameterization of errors-in-variables regression models.

Consider an example of an errors-in-variables regression model. Fuller (1987, pp. 18–19) analyzes a data set from Voss (1969) that involves corn yields (Y) and available soil nitrogen (X), for which there is a prior estimate of 57 for the measurement error variance of soil nitrogen, $\mathrm{Var}(E_X)$. The scientific question is: how does nitrogen affect corn yields? The linear prediction of corn yields by nitrogen should be based on a measure of nitrogen that is not contaminated with measurement error. Hence, the errors-in-variables model is applied. In the model, $F_X$ represents the "true" nitrogen measure, and X represents the observed measure of nitrogen, which has a true score component $F_X$ and an error component $E_X$. Given that the measurement error variance for soil nitrogen, $\mathrm{Var}(E_X)$, is 57, you can specify the errors-in-variables regression model with the following statements in PROC CALIS:

data corn(type=cov);
   input _type_ $ _name_ $ y x;
   datalines;
cov    y    87.6727    .
cov    x    104.8818   304.8545
mean   .    97.4545    70.6364
n      .    11         11
;
proc calis;
   lineqs
      Y  = alpha * Intercept + beta * Fx + Ey,
      X  =     0 * Intercept +    1 * Fx + Ex;
   variance
      Ex = 57;
   mean
      Fx;
run;

In the LINEQS statement, the intercept in the equation for X is set to a fixed zero. In the VARIANCE statement, the variance of Ex (measurement error for X) is specified as the constant value 57. In the MEAN statement, the mean of Fx is specified as a free parameter. PROC CALIS produces the estimates shown in Figure 4.

Figure 4: Errors-in-Variables Model for Corn Data

Linear Equations
y =   67.5641 (**) Intercept + 0.4232 (**) Fx + 1.0000   Ey
x =   0            Intercept + 1.0000      Fx + 1.0000   Ex

Effects in Linear Equations
Variable   Predictor   Parameter   Estimate   Standard Error   t Value   Pr > |t|
y          Intercept   alpha       67.56409   11.93888         5.6592    <.0001
y          Fx          beta         0.42316    0.16582         2.5520    0.0107
x          Intercept                0
x          Fx                       1.00000

Estimates for Variances of Exogenous Variables
Variable Type   Variable   Parameter   Estimate    Standard Error   t Value   Pr > |t|
Error           Ex                      57.00000
Latent          Fx         _Add1       247.85450   136.33508        1.8180    0.0691
Error           Ey         _Add2        43.29105    23.92488        1.8095    0.0704

Mean Parameters
Variable Type   Variable   Parameter   Estimate   Standard Error   t Value   Pr > |t|
Latent          Fx         _Parm1      70.63640   5.52136          12.7933   <.0001


In Figure 4, the estimate of beta is 0.4232 with a standard error estimate of 0.1658. The t value is 2.552. It is significant at the 0.05 alpha-level when compared to the critical value of the standard normal variate (that is, the z table). Also shown in Figure 4 are the estimated variances of Fx and Ey and their estimated standard errors. The names of these parameters have the prefix '_Add'. They are added by PROC CALIS as default parameters. By employing some conventional rules for setting default parameters, PROC CALIS makes your model specification much easier and more concise. For example, you do not need to specify each error variance parameter manually if it is not constrained in the model. However, you can specify these parameters explicitly if you desire. Note that in Figure 4, the variance of Ex is shown to be 57 without a standard error estimate because it is a fixed constant in the model. Finally, the last table shows that the mean estimate of Fx is 70.636, which is the same as the sample mean of X, as expected.
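As a cross-check on Figure 4, the point estimates can be approximated by hand. With the variance of Ex fixed at 57, the model has five free parameters and five sample moments, so (assuming the solution is admissible) the estimates should simply reproduce the sample moments. The Python sketch below solves the corresponding moment equations from the summary statistics in the corn data set and evaluates the two-sided standard normal p-value for the reported t value. The function names are hypothetical helpers written for this illustration and are not part of PROC CALIS.

```python
import math


def eiv_moment_estimates(var_y, cov_xy, var_x, mean_y, mean_x, var_ex):
    """Solve the moment equations of the errors-in-variables model
    when the measurement-error variance of X (var_ex) is treated as known."""
    var_fx = var_x - var_ex          # Var(X) = Var(Fx) + Var(Ex)
    beta = cov_xy / var_fx           # Cov(X, Y) = beta * Var(Fx)
    alpha = mean_y - beta * mean_x   # E(Y) = alpha + beta * E(Fx), with E(Fx) = E(X)
    var_ey = var_y - beta * cov_xy   # Var(Y) = beta^2 * Var(Fx) + Var(Ey)
    return {
        "alpha": alpha,
        "beta": beta,
        "var_fx": var_fx,
        "var_ey": var_ey,
        "mean_fx": mean_x,
    }


def two_sided_normal_p(z):
    """Two-sided p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Summary statistics copied from the corn data set in this section.
est = eiv_moment_estimates(
    var_y=87.6727, cov_xy=104.8818, var_x=304.8545,
    mean_y=97.4545, mean_x=70.6364, var_ex=57.0,
)
for name, value in est.items():
    print(f"{name:8s} {value:10.5f}")

# t value for beta as reported in Figure 4.
print(f"p-value  {two_sided_normal_p(2.5520):10.4f}")
```

The printed values should agree with the Estimate column of Figure 4 up to rounding. The standard errors are not reproduced here because they depend on the estimation method that PROC CALIS uses.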

What if you did not model the measurement error in the predictor X? That is, what is the estimate of beta if you use ordinary regression of Y on X, as described by the equation in the section Simple Linear Regression? You can specify such a linear regression model easily by the LINEQS modeling language. Here, you specify this linear regression model as a special case of the errors-in-variables model. That is, you constrain the variance of measurement error $E_X$ to 0 in the preceding LINEQS model specification to form the linear regression model, as shown in the following statements:

proc calis;
   lineqs
      Y  = alpha * Intercept + beta * Fx + Ey,
      X  =     0 * Intercept +    1 * Fx + Ex;
   variance
      Ex = 0;
   mean
      Fx;
run;

Fixing the variance of Ex to zero forces the equality of X and $F_X$ in the measurement model so that this "new" errors-in-variables model is in fact an ordinary regression model. PROC CALIS produces the estimation results in Figure 5.

Figure 5: Ordinary Regression Model for Corn Data: Zero Measurement Error in X

Linear Equations
y =   73.1528 (**) Intercept + 0.3440 (**) Fx + 1.0000   Ey
x =   0            Intercept + 1.0000      Fx + 1.0000   Ex

Effects in Linear Equations
Variable   Predictor   Parameter   Estimate   Standard Error   t Value   Pr > |t|
y          Intercept   alpha       73.15283   9.46542          7.7284    <.0001
y          Fx          beta         0.34404   0.13009          2.6447    0.0082
x          Intercept                0
x          Fx                       1.00000

Estimates for Variances of Exogenous Variables
Variable Type   Variable   Parameter   Estimate    Standard Error   t Value   Pr > |t|
Error           Ex                       0
Latent          Fx         _Add1       304.85450   136.33508        2.2361    0.0253
Error           Ey         _Add2        51.58928    23.07143        2.2361    0.0253

Mean Parameters
Variable Type   Variable   Parameter   Estimate   Standard Error   t Value   Pr > |t|
Latent          Fx         _Parm1      70.63640   5.52136          12.7933   <.0001


The estimate of beta is now 0.3440. Given the presence of nonzero measurement error in X, this value underestimates the effect of nitrogen on corn yields: the errors-in-variables model, which accounts for that error, gives an estimate of 0.4232.
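The gap between the two slope estimates can be traced to the share of the variance of X that is due to the true score. The Python sketch below is an illustration under the same assumption used earlier (a known measurement-error variance of 57). It computes the ordinary regression slope from the sample moments, the ratio Var(Fx) / Var(X), often called the reliability ratio, and the slope obtained by dividing the ordinary slope by that ratio. The function and variable names are hypothetical and chosen only for this example.

```python
def ols_slope(cov_xy, var_x):
    """Slope of the ordinary regression of Y on the observed X."""
    return cov_xy / var_x


def reliability_ratio(var_x, var_ex):
    """Share of Var(X) attributable to the true score: Var(Fx) / Var(X)."""
    return (var_x - var_ex) / var_x


# Sample moments and assumed error variance from the corn example.
cov_xy, var_x, var_ex = 104.8818, 304.8545, 57.0

beta_ols = ols_slope(cov_xy, var_x)
kappa = reliability_ratio(var_x, var_ex)
beta_eiv = beta_ols / kappa  # undo the shrinkage caused by measurement error

print(f"ordinary regression slope : {beta_ols:.5f}")
print(f"reliability ratio         : {kappa:.5f}")
print(f"errors-in-variables slope : {beta_eiv:.5f}")
```

In this sketch the ordinary slope equals the errors-in-variables slope multiplied by the reliability ratio, which is below 1 whenever the error variance is positive. That is why ignoring the measurement error pulls the estimate toward zero.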

Last updated: December 09, 2022