The LOGISTIC Procedure

CLASS Statement

  • CLASS variable <(options)> …<variable <(options)>> </ global-options>;

The CLASS statement names the classification variables to be used as explanatory variables in the analysis. Response variables do not need to be specified in the CLASS statement.
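The following statements are a minimal sketch of this syntax rather than an example from the original documentation. The data set Trial and its variables Treatment, Sex, and Remission are hypothetical. Options in parentheses apply only to the variable they follow, and options after the slash apply to every CLASS variable.

```sas
/* Hypothetical data: Remission is a binary response */
data Trial;
   input Treatment $ Sex $ Remission @@;
   datalines;
A F 1  A M 0  B F 1  B M 1  P F 0  P M 0
A F 0  A M 1  B F 0  B M 1  P F 1  P M 0
;

proc logistic data=Trial;
   /* Treatment: reference coding with level 'P' as the reference (variable options) */
   /* Sex: inherits the global options PARAM=EFFECT and ORDER=INTERNAL               */
   class Treatment (param=ref ref='P') Sex / param=effect order=internal;
   model Remission(event='1') = Treatment Sex;
run;
```

The response variable Remission appears only in the MODEL statement, not in the CLASS statement.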

The CLASS statement must precede the MODEL statement. Most options can be specified either as individual variable options or as global-options. You can specify options for each variable by enclosing the options in parentheses after the variable name. You can also specify global-options for the CLASS statement by placing them after a slash (/). Global-options are applied to all the variables that are specified in the CLASS statement. If you specify more than one CLASS statement, the global-options that are specified in any one CLASS statement apply to all CLASS statements. However, individual CLASS variable options override the global-options. You can specify the following values for either an option or a global-option:

CPREFIX=n

specifies that, at most, the first n characters of a CLASS variable name be used in creating names for the corresponding design variables. The default is $32 - \min(32, \max(2, f))$, where f is the formatted length of the CLASS variable.

DESCENDING
DESC

reverses the sort order of the classification variable. If you specify both the DESCENDING and ORDER= options, PROC LOGISTIC orders the categories according to the ORDER= option and then reverses that order.

LPREFIX=n

specifies that, at most, the first n characters of a CLASS variable label be used in creating labels for the corresponding design variables. The default is $256 - \min(256, \max(2, f))$, where f is the formatted length of the CLASS variable.
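As a worked illustration of the two defaults, the following DATA step evaluates 32 - min(32, max(2, f)) for CPREFIX= and 256 - min(256, max(2, f)) for LPREFIX= at a few formatted lengths. The lengths 1, 8, and 40 and the variable names CprefixDefault and LprefixDefault are arbitrary choices for this sketch.

```sas
data _null_;
   do f = 1, 8, 40;                              /* hypothetical formatted lengths     */
      CprefixDefault = 32  - min(32,  max(2, f)); /* default n for CPREFIX=            */
      LprefixDefault = 256 - min(256, max(2, f)); /* default n for LPREFIX=            */
      put f= CprefixDefault= LprefixDefault=;     /* writes the three values to the log */
   end;
run;
/* f=1  gives 30 and 254                          */
/* f=8  gives 24 and 248                          */
/* f=40 gives 0  and 216                          */
```

A longer formatted length leaves fewer characters of the variable name or label in the design variable names and labels.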

MISSING

treats missing values (., ._, .A, …, .Z for numeric variables and blanks for character variables) as valid values of the CLASS variable.
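The following sketch shows the MISSING option as a global option. The data set Visits and its variables Clinic and Readmit are hypothetical. In list input, a single period is read as a missing (blank) value for the character variable Clinic.

```sas
/* Hypothetical data: two observations have a missing Clinic value */
data Visits;
   input Clinic $ Readmit @@;
   datalines;
N 1  N 0  S 0  S 1  . 1  . 0  N 0  S 1
;

proc logistic data=Visits;
   /* MISSING keeps the blank Clinic value as its own level */
   class Clinic / missing param=ref;
   model Readmit(event='1') = Clinic;
run;
```

Without the MISSING option, observations that have a missing value for a CLASS variable are excluded from the analysis.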

ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the sort order for the levels of classification variables. This ordering determines which parameters in the model correspond to each level in the data, so this option can be useful when you use the CONTRAST statement. By default, ORDER=FORMATTED. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine-dependent. When ORDER=FORMATTED is in effect for numeric variables for which you have supplied no explicit format, the levels are ordered by their internal values.

The following table shows how PROC LOGISTIC interprets values of the ORDER= option:

Value of ORDER=   Levels Sorted By
DATA              Order of appearance in the input data set
FORMATTED         External formatted values, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) values
FREQ              Descending frequency count; levels with more observations come earlier in the order
INTERNAL          Unformatted value

For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in the "Grouping Data" section of SAS Programmer's Guide: Essentials.
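Because the sort order determines which level is first and which is last, the ORDER= option interacts with the REF= option. The following sketch, which uses a hypothetical data set Claims with variables Region and Paid, makes the most frequent level the reference level.

```sas
/* Hypothetical data: West has 4 observations, East 3, North 2 */
data Claims;
   input Region $ Paid @@;
   datalines;
West 1  West 0  West 1  West 0  East 0  East 1  East 1  North 0  North 1
;

proc logistic data=Claims;
   /* ORDER=FREQ sorts levels by descending count, so West is the first level; */
   /* REF=FIRST then selects West as the reference level                       */
   class Region / order=freq param=ref ref=first;
   model Paid(event='1') = Region;
run;
```

Adding the DESCENDING option would reverse this order, so that the least frequent level comes first.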

PARAM=keyword

specifies the parameterization method for the classification variable or variables. You can specify any of the keywords shown in the following table. The default is PARAM=EFFECT. Design matrix columns are created from CLASS variables according to the corresponding coding schemes.

Value of PARAM=             Coding
EFFECT                      Effect coding
GLM                         Less-than-full-rank reference cell coding (this keyword can be used only in a global option)
ORDINAL | THERMOMETER       Cumulative parameterization for an ordinal CLASS variable
POLYNOMIAL | POLY           Polynomial coding
REFERENCE | REF             Reference cell coding
ORTHEFFECT                  Orthogonalizes PARAM=EFFECT coding
ORTHORDINAL | ORTHOTHERM    Orthogonalizes PARAM=ORDINAL coding
ORTHPOLY                    Orthogonalizes PARAM=POLYNOMIAL coding
ORTHREF                     Orthogonalizes PARAM=REFERENCE coding

All parameterizations are full rank, except for the GLM parameterization. The REF= option in the CLASS statement determines the reference level for EFFECT and REFERENCE coding and for their orthogonal parameterizations. It also indirectly determines the reference level for a singular GLM parameterization through the order of levels.

If you specify PARAM= as a variable option for some variables, then the remaining variables use the full-rank parameterization that is specified in the global PARAM= option or, if no global PARAM= option is specified, the EFFECT parameterization. If you specify the global PARAM=GLM option, the GLM parameterization is used for all variables, including any variables that have an individual PARAM= option.

If you specify PARAM=ORTHPOLY or PARAM=POLY and the classification variable is numeric, then the ORDER= option in the CLASS statement is ignored, and the internal unformatted values are used. For more information, see the section "Other Parameterizations" in Chapter 20, "Shared Concepts and Topics."
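One way to see how a coding scheme translates a CLASS variable into design matrix columns is to write the design matrix to a data set and print it. The following sketch assumes that the OUTDESIGN= and OUTDESIGNONLY options in the PROC LOGISTIC statement are available in your release. The data set Dose and its variables Level and Response are hypothetical.

```sas
/* Hypothetical data: three levels in the order Low, Mid, High */
data Dose;
   input Level $ Response @@;
   datalines;
Low 0  Low 1  Mid 0  Mid 1  High 1  High 0
;

/* Effect coding: OUTDESIGNONLY creates the design data set without fitting the model */
proc logistic data=Dose outdesign=EffectDesign outdesignonly;
   class Level / param=effect order=data;
   model Response = Level;
run;

/* Reference cell coding of the same variable */
proc logistic data=Dose outdesign=RefDesign outdesignonly;
   class Level / param=ref order=data;
   model Response = Level;
run;

proc print data=EffectDesign; run;
proc print data=RefDesign; run;
```

With ORDER=DATA and the default REF=LAST, High is the reference level. Effect coding represents the reference level as -1 in each design column for Level, whereas reference cell coding represents it as 0 in each column.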

REF='level' | keyword

specifies the reference level for PARAM=EFFECT, PARAM=REFERENCE, and their orthogonalizations. For PARAM=GLM, the REF= option specifies a level of the classification variable to be put at the end of the list of levels. This level thus corresponds to the reference level in the usual interpretation of the linear estimates with a singular parameterization.

For an individual variable REF= option (but not for a global REF= option), you can specify the level of the variable to use as the reference level. Specify the formatted value of the variable if a format is assigned. For a global or individual variable REF= option, you can use one of the following keywords:

FIRST

designates the first ordered level as reference.

LAST

designates the last ordered level as reference.

By default, REF=LAST.
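When a format is assigned to a CLASS variable, the level in the REF= option is the formatted value. The following sketch illustrates this with a hypothetical format ARMFMT. and a hypothetical data set Study that has variables Arm and Cured.

```sas
/* Hypothetical format for a numeric treatment-arm variable */
proc format;
   value armfmt 0='Placebo' 1='Drug';
run;

data Study;
   input Arm Cured @@;
   format Arm armfmt.;
   datalines;
0 0  0 1  0 0  1 1  1 1  1 0  0 1  1 1
;

proc logistic data=Study;
   /* REF= names the formatted value 'Placebo', not the internal value 0 */
   class Arm (ref='Placebo') / param=ref;
   model Cured(event='1') = Arm;
run;
```

Because a specific level can be named only in an individual variable option, the REF='Placebo' specification is placed in parentheses after Arm rather than after the slash.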

TRUNCATE<=n>

specifies the length n of CLASS variable values to use in determining CLASS variable levels. The default is to use the full formatted length of the CLASS variable. If you specify TRUNCATE without the length n, the first 16 characters of the formatted values are used. The TRUNCATE option is available only as a global option.
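The following sketch shows the TRUNCATE= option collapsing levels that share their first characters. The data set Sites and its variables Site and Outcome are hypothetical.

```sas
/* Hypothetical data: four distinct Site values, two distinct 2-character prefixes */
data Sites;
   length Site $ 12;
   input Site $ Outcome @@;
   datalines;
NY-Uptown 1  NY-Downtown 0  NY-Uptown 0  NY-Downtown 1
LA-East 1  LA-West 0  LA-East 0  LA-West 1
;

proc logistic data=Sites;
   /* Only the first 2 characters of Site determine the CLASS levels */
   class Site / truncate=2 param=ref;
   model Outcome(event='1') = Site;
run;
```

Under TRUNCATE=2, the four Site values should be treated as two levels, one for the values that begin with LA and one for the values that begin with NY.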

Class Variable Naming Convention

Parameter names for a CLASS predictor variable are constructed by concatenating the CLASS variable name with the CLASS levels. However, for the POLYNOMIAL and orthogonal parameterizations, parameter names are formed by concatenating the CLASS variable name and keywords that reflect the parameterization. For examples and more information, see the section "Other Parameterizations" in Chapter 20, "Shared Concepts and Topics."

Class Variable Parameterization with Unbalanced Designs

PROC LOGISTIC initially parameterizes the CLASS variables by looking at the levels of the variables across the complete data set. If you have an unbalanced replication of levels across variables or BY groups, then the design matrix and the parameter interpretation might be different from what you expect. For example, suppose you have a model that has one CLASS variable A with three levels (1, 2, and 3) and another CLASS variable B with two levels (1 and 2). Suppose also that the third level of A occurs only with the first level of B, that you use the EFFECT parameterization, and that your model contains the effect A(B) and an intercept. Then the design for A within the second level of B is not a differential effect. In particular, the design looks like the following:

Design Matrix

         A(B=1)     A(B=2)
B   A    A1   A2    A1   A2
1   1     1    0     0    0
1   2     0    1     0    0
1   3    -1   -1     0    0
2   1     0    0     1    0
2   2     0    0     0    1

PROC LOGISTIC detects a linear dependency among the last two design variables and sets the parameter for A2(B=2) to zero, so that these parameters are interpreted as if they were reference- or dummy-coded. The REFERENCE or GLM parameterization might be more appropriate for such problems.
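To inspect the design for a layout like this one, you can write the design matrix to a data set without fitting the model. The following sketch assumes that the OUTDESIGN= and OUTDESIGNONLY options in the PROC LOGISTIC statement are available in your release. The data set Nested and the response values in Y are hypothetical; only the combinations of A and B follow the example above.

```sas
/* Hypothetical data: level 3 of A occurs only with level 1 of B */
data Nested;
   input B A Y @@;
   datalines;
1 1 0  1 2 1  1 3 0  2 1 1  2 2 0
;

proc logistic data=Nested outdesign=NestedDesign outdesignonly;
   class A B / param=effect;   /* effect coding for both CLASS variables     */
   model Y = A(B);             /* A nested within B; intercept is by default */
run;

/* Compare the printed design columns with the design matrix shown above */
proc print data=NestedDesign;
run;
```

Replacing PARAM=EFFECT with PARAM=REF or PARAM=GLM in the CLASS statement lets you compare the design under the parameterizations that are suggested for such problems.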

Last updated: December 09, 2022