The STEPDISC Procedure

PROC STEPDISC Statement

  • PROC STEPDISC <options>;

The PROC STEPDISC statement invokes the STEPDISC procedure. Table 1 summarizes the options available in the PROC STEPDISC statement.

Table 1: STEPDISC Procedure Options

Option      Description

Input Data Set
DATA=       Specifies the input SAS data set

Method Details
MAXMACRO=   Specifies the maximum number of macro variable lists
METHOD=     Specifies the variable selection method
SINGULAR=   Specifies the singularity criterion

Control Stepwise Selection
SLENTRY=    Specifies the significance level for entry
SLSTAY=     Specifies the significance level for staying
PR2ENTRY=   Specifies the partial R square for entry
PR2STAY=    Specifies the partial R square for staying
INCLUDE=    Forces inclusion of the first n VAR variables
MAXSTEP=    Specifies the maximum number of steps
START=      Specifies the number of variables used to begin selection
STOP=       Specifies the number of variables in the final model

Control Displayed Output
ALL         Activates all display options
BCORR       Displays between-class correlations
BCOV        Displays between-class covariances
BSSCP       Displays the between-class SSCP matrix
PCORR       Displays pooled within-class correlations
PCOV        Displays pooled within-class covariances
PSSCP       Displays the pooled within-class corrected SSCP matrix
SHORT       Suppresses the displayed output from each step
SIMPLE      Displays simple descriptive statistics
STDMEAN     Displays standardized class means
TCORR       Displays total-sample correlations
TCOV        Displays total-sample covariances
TSSCP       Displays the total-sample corrected SSCP matrix
WCORR       Displays within-class correlations for each class level
WCOV        Displays within-class covariances for each class level
WSSCP       Displays the within-class corrected SSCP matrix for each class level


ALL

activates all of the display options.

BCORR

displays between-class correlations.

BCOV

displays between-class covariances. The between-class covariance matrix equals the between-class SSCP matrix divided by n(c – 1)/c, where n is the number of observations and c is the number of classes. The between-class covariances should be interpreted in comparison with the total-sample and within-class covariances, not as formal estimates of population parameters.
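
As a worked illustration of this divisor, write B for the between-class SSCP matrix and S_B for the between-class covariance matrix. Both symbols are introduced here only for illustration, and the sample sizes are hypothetical: with n = 150 observations in c = 3 classes, the divisor works out to 100.

```latex
S_B = \frac{B}{n(c-1)/c} = \frac{c}{n(c-1)}\,B,
\qquad
\frac{n(c-1)}{c} = \frac{150 \cdot 2}{3} = 100 .
```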

BSSCP

displays the between-class SSCP matrix.

DATA=SAS-data-set

specifies the data set to be analyzed. The data set can be an ordinary SAS data set or one of several specially structured data sets created by statistical procedures available with SAS/STAT software. These specially structured data sets include TYPE=CORR, COV, CSSCP, and SSCP. If the DATA= option is omitted, the procedure uses the most recently created SAS data set.

INCLUDE=n

includes the first n variables in the VAR statement in every model. By default, INCLUDE=0.
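
The following is a minimal sketch of forcing a variable into every model. It assumes that the Sashelp.Iris sample data set is available; the choice of data set and the variable order are illustrative only.

```sas
/* Illustrative sketch: Sashelp.Iris is assumed to be available.    */
/* INCLUDE=1 keeps the first VAR variable, PetalLength, in every    */
/* model; the other three variables remain subject to selection.    */
proc stepdisc data=sashelp.iris include=1;
   class Species;
   var PetalLength PetalWidth SepalLength SepalWidth;
run;
```

Because INCLUDE= counts variables from the start of the VAR statement, the variables to be forced in must be listed first.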

MAXMACRO=n

specifies the maximum number of macro variables with independent variable lists to create. By default, MAXMACRO=100. PROC STEPDISC saves the list of selected variables in a macro variable, &_StdVar. Suppose your input variable list consists of x1-x10; then &_StdVar would be set to x1 x3 x4 x10 if, for example, the first, third, fourth, and tenth variables were selected for the model. This list can be used, for example, in a subsequent procedure’s VAR statement as follows:

var &_stdvar;
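
A fuller sketch of this pattern might look as follows. It assumes that the Sashelp.Iris sample data set is available and that PROC DISCRIM is the subsequent procedure; both choices are illustrative.

```sas
/* Illustrative sketch: select variables, then reuse the list.      */
proc stepdisc data=sashelp.iris short;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Write the selected variable list to the SAS log.                 */
%put NOTE: Selected variables: &_StdVar;

/* Use the selected variables in a subsequent procedure.            */
proc discrim data=sashelp.iris;
   class Species;
   var &_StdVar;
run;
```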

With BY processing, one macro variable is created for each BY group, and the macro variables are indexed by the BY-group number. The MAXMACRO= option can be used to either limit or increase the number of these macro variables in processing data sets with many BY groups. The macro variables are created as follows:

With no BY processing, PROC STEPDISC creates the following:

  _StdVar            selected variables
  _StdVar1           selected variables
  _StdNumBys         number of BY groups (1)
  _StdNumMacroBys    number of _StdVari macro variables actually made (1)

With BY processing, PROC STEPDISC creates the following:

  _StdVar            selected variables for BY group 1
  _StdVar1           selected variables for BY group 1
  _StdVar2           selected variables for BY group 2
  ...
  _StdVarm           selected variables for BY group m, where a number is substituted for m
  _StdNumBys         n, the number of BY groups
  _StdNumMacroBys    the number m of _StdVari macro variables actually made. This value might be less than _StdNumBys = n, and it is less than or equal to the MAXMACRO= value.
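
With BY processing, the indexed macro variables can be read in a macro loop. The sketch below assumes that Sashelp.Iris is available; the BY variable grp, the data set Work.Iris_By, and the macro name show_selected are hypothetical and exist only for this illustration.

```sas
/* Build a hypothetical two-level BY variable for illustration.     */
data work.iris_by;
   set sashelp.iris;
   grp = mod(_n_, 2) + 1;
run;

/* BY processing requires the data to be sorted by the BY variable. */
proc sort data=work.iris_by;
   by grp;
run;

proc stepdisc data=work.iris_by short;
   by grp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Print the selected variables for each BY group to the log.       */
/* The loop stops at _StdNumMacroBys, which never exceeds MAXMACRO=. */
%macro show_selected;
   %local i;
   %do i = 1 %to &_StdNumMacroBys;
      %put NOTE: BY group &i: &&_StdVar&i;
   %end;
%mend show_selected;

%show_selected
```

Looping to _StdNumMacroBys rather than _StdNumBys avoids referencing macro variables that were not created when the number of BY groups exceeds the MAXMACRO= value.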

MAXSTEP=n

specifies the maximum number of steps. By default, MAXSTEP= is two times the number of variables in the VAR statement.

METHOD=BACKWARD | BW
METHOD=FORWARD | FW
METHOD=STEPWISE | SW

specifies the method used to select the variables in the model. The BACKWARD method specifies backward elimination, FORWARD specifies forward selection, and STEPWISE specifies stepwise selection. By default, METHOD=STEPWISE.

PCORR

displays pooled within-class correlations (partial correlations based on the pooled within-class covariances).

PCOV

displays pooled within-class covariances.

PR2ENTRY=p
PR2E=p

specifies the partial R square for adding variables in the forward selection mode, where p ≤ 1.

PR2STAY=p
PR2S=p

specifies the partial R square for retaining variables in the backward elimination mode, where p ≤ 1.

PSSCP

displays the pooled within-class corrected SSCP matrix.

SHORT

suppresses the displayed output from each step.

SIMPLE

displays simple descriptive statistics for the total sample and within each class.

SINGULAR=p

specifies the singularity criterion for entering variables, where 0 < p < 1. PROC STEPDISC precludes the entry of a variable if the squared multiple correlation of the variable with the variables already in the model exceeds 1 – p. With more than one variable already in the model, PROC STEPDISC also excludes a variable if it would cause any of the variables already in the model to have a squared multiple correlation (with the entering variable and the other variables in the model) exceeding 1 – p. By default, SINGULAR=1E-8.

SLENTRY=p
SLE=p

specifies the significance level for adding variables in the forward selection mode, where 0 ≤ p ≤ 1. The default value is 0.15.

SLSTAY=p
SLS=p

specifies the significance level for retaining variables in the backward elimination mode, where 0 ≤ p ≤ 1. The default value is 0.15.
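
As a minimal sketch, the entry and staying significance levels can be set together for stepwise selection. The Sashelp.Iris data set is assumed to be available, and the thresholds 0.05 and 0.10 are arbitrary illustrative values, not recommendations.

```sas
/* Illustrative sketch: stricter-than-default entry and staying     */
/* significance levels (both default to 0.15).                      */
proc stepdisc data=sashelp.iris method=stepwise
              slentry=0.05 slstay=0.10;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```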

START=n

specifies that the first n variables in the VAR statement be used to begin the selection process. When you specify METHOD=FORWARD or METHOD=STEPWISE, the default value is 0; when you specify METHOD=BACKWARD, the default value is the number of variables in the VAR statement.

STDMEAN

displays total-sample and pooled within-class standardized class means.

STOP=n

specifies the number of variables in the final model. The STEPDISC procedure stops the selection process when a model with n variables is found. This option applies only when you specify METHOD=FORWARD or METHOD=BACKWARD. When you specify METHOD=FORWARD, the default value is the number of variables in the VAR statement; when you specify METHOD=BACKWARD, the default value is 0.
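
A minimal sketch of the STOP= option with forward selection follows. It assumes that Sashelp.Iris is available; the target of two variables is an arbitrary illustrative value.

```sas
/* Illustrative sketch: forward selection that stops once a model   */
/* with two variables is found.                                     */
proc stepdisc data=sashelp.iris method=forward stop=2;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Selection can still end with fewer than two variables if no remaining candidate meets the entry criterion.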

TCORR

displays total-sample correlations.

TCOV

displays total-sample covariances.

TSSCP

displays the total-sample corrected SSCP matrix.

WCORR

displays within-class correlations for each class level.

WCOV

displays within-class covariances for each class level.

WSSCP

displays the within-class corrected SSCP matrix for each class level.
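
Several display options can be combined in one PROC STEPDISC statement. The following sketch assumes that Sashelp.Iris is available; the particular combination of options is illustrative.

```sas
/* Illustrative sketch: request descriptive statistics and selected */
/* matrices, and suppress the output from each selection step.      */
proc stepdisc data=sashelp.iris simple wcov pcorr tcorr short;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```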

Last updated: December 09, 2022