The HPREG Procedure
The OUTPUT statement creates a data set that contains observationwise statistics, which are computed after fitting the model. The variables in the input data set are not included in the output data set to avoid data duplication for large data sets; however, variables specified in the ID statement or COPYVARS= option are included.
The output statistics are computed based on the parameter estimates for the selected model.
You can specify the following syntax elements in the OUTPUT statement:
-
OUT=SAS-data-set
DATA=SAS-data-set
specifies the name of the output data set. If the OUT= (or DATA=) option is omitted, the procedure uses the DATAn convention to name the output data set.
-
COPYVAR=variable
COPYVARS=(variables)
transfers one or more variables from the input data set to the output data set. Variables named in an ID statement are also copied from the input data set to the output data set.
-
keyword <=name>
specifies the statistics to include in the output data set and optionally names the new variables that contain them. Specify a keyword for each desired statistic (see the following list of keywords), optionally followed by an equal sign and the name of the variable that is to contain the statistic.
If you specify keyword=name, the new variable that contains the requested statistic has the specified name. If you omit the optional =name after a keyword, then a default name is used.
The following are valid values for keyword to request statistics that are available with all selection methods:
-
PREDICTED
PRED
P
requests predicted values for the response variable. The default name is Pred.
-
RESIDUAL
RESID
R
requests the residual, calculated as ACTUAL - PREDICTED (the observed response minus the predicted value). The default name is Residual.
-
ROLE
requests a numeric variable that indicates the role played by each observation in fitting the model. The default name is _ROLE_. For each observation the interpretation of this variable is shown in Table 3:
Table 3: Role Interpretation
| Value | Observation Role |
|---|---|
| 0 | Not used |
| 1 | Training |
| 2 | Validation |
| 3 | Testing |
If you do not partition the input data by using a PARTITION statement, then the role variable value is 1 for observations used in fitting the model, and 0 for observations that have at least one missing or invalid value for the response, regressors, frequency or weight variables.
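The options described above can be combined in a single OUTPUT statement. The following is a minimal sketch, not a definitive usage pattern: the input data set (Sashelp.Baseball), the model effects, and the output names work.bbOut and p_logSalary are all illustrative choices that do not come from this section.

```sas
/* Sketch only: the data set, model, and all names are illustrative. */
proc hpreg data=sashelp.baseball;
   id name;                          /* ID variables are copied to the output data set */
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI league division;
   selection method=forward;
   output out=work.bbOut             /* if OUT= is omitted, a DATAn name is used     */
          copyvars=(team logSalary)  /* additional input variables to carry over     */
          pred=p_logSalary           /* keyword=name: statistic stored under p_logSalary */
          resid                      /* no =name: default variable name Residual     */
          role;                      /* no =name: default variable name _ROLE_       */
run;

/* Inspect the first few rows of the output data set. */
proc print data=work.bbOut(obs=5);
run;
```

Because this sketch has no PARTITION statement, _ROLE_ should be 1 for observations used in the fit and 0 for observations with a missing or invalid value in the response, regressors, frequency, or weight variables.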
In addition to the preceding statistics, you can use the keywords listed in Table 4 in the OUTPUT statement to obtain further statistics. These statistics are not available if you specify METHOD=LAR or METHOD=LASSO in the SELECTION statement unless you also specify the LSCOEFFS option. See the section Diagnostic Statistics for computational formulas. All statistics available in the OUTPUT statement are conditional on the selected model and do not account for the variability introduced by model selection.
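As a hedged illustration of the LSCOEFFS requirement, the sketch below requests two diagnostic statistics with METHOD=LASSO. It rests on assumptions that this section does not confirm: that LSCOEFFS is specified as a method option in parentheses after the method name, and that H (leverage) and COOKD (Cook's D) are among the Table 4 keywords. The data set, model, and variable names are illustrative.

```sas
/* Sketch only. Assumptions: LSCOEFFS is given in parentheses as a      */
/* method option, and H and COOKD are valid Table 4 keywords.           */
proc hpreg data=sashelp.baseball;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI league division;
   selection method=lasso(lscoeffs);   /* assumed placement of LSCOEFFS */
   output out=work.bbDiag
          pred=p_logSalary
          h=leverage                   /* assumed Table 4 keyword       */
          cookd=cooksD;                /* assumed Table 4 keyword       */
run;
```

Check the SELECTION statement documentation and Table 4 for the exact option placement and keyword spellings before relying on this form.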
Last updated: December 09, 2022