The QLIM Procedure

Heteroscedasticity and Box-Cox Transformation

Heteroscedasticity

If the variance of the regression disturbance, $\epsilon_i$, is heteroscedastic, the variance can be specified as a function of variables:

$$E(\epsilon_i^2) = \sigma_i^2 = f(\mathbf{z}_i'\boldsymbol{\gamma})$$

The following table shows various functional forms of heteroscedasticity and the corresponding options to request each model:

| No. | Model | Options |
|-----|-------|---------|
| 1 | $f(\mathbf{z}_i'\boldsymbol{\gamma}) = \sigma^2 \left(1 + \exp(\mathbf{z}_i'\boldsymbol{\gamma})\right)$ | LINK=EXP (default) |
| 2 | $f(\mathbf{z}_i'\boldsymbol{\gamma}) = \sigma^2 \exp(\mathbf{z}_i'\boldsymbol{\gamma})$ | LINK=EXP NOCONST |
| 3 | $f(\mathbf{z}_i'\boldsymbol{\gamma}) = \sigma^2 \left(1 + \sum_{l=1}^{L} \gamma_l z_{li}\right)$ | LINK=LINEAR |
| 4 | $f(\mathbf{z}_i'\boldsymbol{\gamma}) = \sigma^2 \left(1 + \left(\sum_{l=1}^{L} \gamma_l z_{li}\right)^2\right)$ | LINK=LINEAR SQUARE |
| 5 | $f(\mathbf{z}_i'\boldsymbol{\gamma}) = \sigma^2 \sum_{l=1}^{L} \gamma_l z_{li}$ | LINK=LINEAR NOCONST |
| 6 | $f(\mathbf{z}_i'\boldsymbol{\gamma}) = \sigma^2 \left(\sum_{l=1}^{L} \gamma_l z_{li}\right)^2$ | LINK=LINEAR SQUARE NOCONST |

For discrete choice models, $\sigma^2$ is normalized ($\sigma^2 = 1$) because this parameter is not identified. In models 3 and 5, the variances of some observations can be negative. Although the QLIM procedure assigns a large penalty to move the optimization away from such a region, the optimization might be unable to improve the objective function value and become stuck there. Signs of this outcome include extremely small likelihood values or missing standard errors in the estimates. In models 2 and 6, variances are guaranteed to be greater than or equal to zero, but the variances of some observations can be very close to zero, in which case standard errors might be missing. Models 1 and 4 do not have these problems: their variances are bounded below by $\sigma^2$, so they are always positive and never close to zero.
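The difference between the functional forms can be seen by evaluating the variance function for a single observation. The following Python sketch is only an illustration, not QLIM code: the function name `variance` and its arguments `link`, `square`, and `noconst` are hypothetical and merely mirror the LINK=, SQUARE, and NOCONST options. It assumes that NOCONST drops the constant 1 and that SQUARE squares the linear index.

```python
import math

def variance(z, gamma, sigma2=1.0, link="exp", square=False, noconst=False):
    """Evaluate f(z'gamma) for one observation (illustrative helper only)."""
    index = sum(g * zl for g, zl in zip(gamma, z))  # linear index z'gamma
    if link == "exp":
        core = math.exp(index)                      # square is ignored for the EXP link
    elif link == "linear":
        core = index ** 2 if square else index
    else:
        raise ValueError("link must be 'exp' or 'linear'")
    const = 0.0 if noconst else 1.0                 # assumed effect of NOCONST
    return sigma2 * (const + core)

z, gamma = [1.0, 2.0], [-0.5, -0.5]                 # z'gamma = -1.5

print(variance(z, gamma, link="exp"))                   # model 1: exceeds sigma2
print(variance(z, gamma, link="exp", noconst=True))     # model 2: positive, can approach 0
print(variance(z, gamma, link="linear"))                # model 3: negative for this index
print(variance(z, gamma, link="linear", square=True))   # model 4: at least sigma2
```

With this index, the linear form without SQUARE yields a negative value, which is the situation that the penalty described above is meant to steer the optimization away from.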

The heteroscedastic regression model is estimated using the log-likelihood function

$$\ell = -\frac{N}{2}\ln(2\pi) - \sum_{i=1}^{N} \frac{1}{2}\ln(\sigma_i^2) - \frac{1}{2}\sum_{i=1}^{N}\left(\frac{e_i}{\sigma_i}\right)^2$$

where $e_i = y_i - \mathbf{x}_i'\boldsymbol{\beta}$.
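As a minimal sketch of how this log-likelihood can be evaluated, the following Python function computes it for the default variance form, $\sigma_i^2 = \sigma^2(1 + \exp(\mathbf{z}_i'\boldsymbol{\gamma}))$. The function name `hetero_loglik` and its argument layout are assumptions made for illustration; the sketch does not describe how the QLIM procedure itself performs the computation.

```python
import math

def hetero_loglik(y, X, Z, beta, gamma, sigma2=1.0):
    """Gaussian log-likelihood with sigma_i^2 = sigma2 * (1 + exp(z_i'gamma)).

    y: responses; X: rows of regressors x_i; Z: rows of variance regressors z_i.
    """
    n = len(y)
    total = -0.5 * n * math.log(2.0 * math.pi)
    for yi, xi, zi in zip(y, X, Z):
        e_i = yi - sum(b * x for b, x in zip(beta, xi))        # e_i = y_i - x_i'beta
        index = sum(g * z for g, z in zip(gamma, zi))          # z_i'gamma
        var_i = sigma2 * (1.0 + math.exp(index))               # sigma_i^2
        total -= 0.5 * math.log(var_i) + 0.5 * e_i ** 2 / var_i
    return total

# Two observations with zero residuals and gamma = 0, so sigma_i^2 = 2 for both.
ll = hetero_loglik(
    y=[1.0, 2.0],
    X=[[1.0, 0.0], [1.0, 1.0]],
    Z=[[0.0], [1.0]],
    beta=[1.0, 1.0],
    gamma=[0.0],
)
print(ll)
```

In this example the residual term vanishes, so the value reduces to $-\ln(2\pi) - \ln 2$.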

Box-Cox Modeling

The Box-Cox transformation on x is defined as

$$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \ln(x) & \text{if } \lambda = 0 \end{cases}$$
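The definition translates directly into code. The following Python sketch uses a hypothetical helper name, `box_cox`; the check that rejects nonpositive input is an added assumption that reflects the domain of the logarithm and of noninteger powers.

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation of a positive scalar x with parameter lam."""
    if x <= 0:
        raise ValueError("the Box-Cox transformation requires x > 0")
    if lam == 0.0:
        return math.log(x)            # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam

print(box_cox(4.0, 0.5))   # (sqrt(4) - 1) / 0.5
print(box_cox(4.0, 1.0))   # a shift only: x - 1
print(box_cox(4.0, 0.0))   # natural logarithm of x
```

For $\lambda = 1$ the transformation only shifts the variable by 1, and as $\lambda$ approaches 0 the power form converges to $\ln(x)$, which is why the logarithm is used at $\lambda = 0$.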

The Box-Cox regression model with heteroscedasticity is written as

$$\begin{aligned} y_i^{(\lambda_0)} &= \beta_0 + \sum_{k=1}^{K} \beta_k x_{ki}^{(\lambda_k)} + \epsilon_i \\ &= \mu_i + \epsilon_i \end{aligned}$$

where $\epsilon_i \sim N(0, \sigma_i^2)$ and the variables to which the transformation is applied must be positive. In practice, too many transformation parameters cause numerical problems in model fitting. It is common to apply the same Box-Cox transformation to all the variables, that is, $\lambda_0 = \lambda_1 = \cdots = \lambda_K$. When a transformation parameter satisfies $|\lambda| > 1$, the magnitude of the corresponding transformed variable must remain within a numerically tolerable range.

The log-likelihood function of the Box-Cox regression model is written as

$$\ell = -\frac{N}{2}\ln(2\pi) - \sum_{i=1}^{N}\ln(\sigma_i) - \sum_{i=1}^{N}\frac{e_i^2}{2\sigma_i^2} + (\lambda_0 - 1)\sum_{i=1}^{N}\ln(y_i)$$

where $e_i = y_i^{(\lambda_0)} - \mu_i$.
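The last term of this log-likelihood is the Jacobian of the transformation from $y_i$ to $y_i^{(\lambda_0)}$. The following Python sketch evaluates the function for given parameter values. The names `box_cox` and `boxcox_loglik`, and the choice to pass the per-observation variances $\sigma_i^2$ directly as a list, are assumptions made for illustration and are not part of the QLIM procedure.

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation of a positive scalar x."""
    return math.log(x) if lam == 0.0 else (x ** lam - 1.0) / lam

def boxcox_loglik(y, X, beta0, beta, lam0, lams, sigma2):
    """Log-likelihood of the Box-Cox regression model.

    y: positive responses; X: rows of positive regressors;
    lam0: transformation parameter for y; lams: one parameter per regressor;
    sigma2: per-observation variances sigma_i^2.
    """
    n = len(y)
    total = -0.5 * n * math.log(2.0 * math.pi)
    for yi, xi, s2 in zip(y, X, sigma2):
        mu_i = beta0 + sum(b * box_cox(x, l) for b, x, l in zip(beta, xi, lams))
        e_i = box_cox(yi, lam0) - mu_i
        total -= 0.5 * math.log(s2)             # ln(sigma_i) = 0.5 * ln(sigma_i^2)
        total -= 0.5 * e_i ** 2 / s2
        total += (lam0 - 1.0) * math.log(yi)    # Jacobian term
    return total

# With lam0 = 1 the Jacobian term vanishes and the model is linear in (y - 1).
ll_linear = boxcox_loglik(
    y=[2.0, 3.0], X=[[2.0], [3.0]],
    beta0=0.0, beta=[1.0], lam0=1.0, lams=[1.0], sigma2=[1.0, 1.0],
)
print(ll_linear)
```

Setting $\lambda_0 = 1$ removes the Jacobian term, so the expression reduces to the Gaussian log-likelihood of the previous section applied to $y_i - 1$.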

When the dependent variable is discrete, censored, or truncated, the Box-Cox transformation can be applied only to explanatory variables.

Last updated: August 08, 2024