The HPPLS Procedure
The PROC HPPLS statement invokes the HPPLS procedure. Table 1 summarizes the options available in the PROC HPPLS statement.
Table 1: PROC HPPLS Statement Options
| Option | Description |
|---|---|
| Basic Options | |
| DATA= | Specifies the input data set |
| NAMELEN= | Limits the length of effect names |
| Model Fitting Options | |
| CVTEST | Requests that van der Voet’s (1994) randomization-based model comparison test be performed |
| METHOD= | Specifies the general factor extraction method to be used |
| NFAC= | Specifies the number of factors to extract |
| NOCENTER | Suppresses centering of the responses and predictors before fitting |
| NOCVSTDIZE | Suppresses re-centering and rescaling of the responses and predictors when cross-validating |
| NOSCALE | Suppresses scaling of the responses and predictors before fitting |
| Output Options | |
| CENSCALE | Displays the centering and scaling information |
| DETAILS | Displays the details of the fitted model |
| NOCLPRINT | Limits or suppresses the display of class levels |
| NOPRINT | Suppresses ODS output |
| VARSS | Displays the amount of variation accounted for in each response and predictor |
The following list provides details about these options.
-
CENSCALE
lists the centering and scaling information for each response and predictor.
-
CVTEST <(cvtest-options)>
-
requests that van der Voet’s (1994) randomization-based model comparison test be performed to test models that have different numbers of extracted factors against the model that minimizes the predicted residual sum of squares. For more information, see the section Test Set Validation. You can also specify the following cvtest-options in parentheses after the CVTEST option:
-
PVAL=n
specifies the cutoff probability for declaring an insignificant difference. By default, PVAL=0.10.
-
STAT=PRESS | T2
specifies the test statistic for the model comparison. You can specify either T2, for Hotelling’s $T^2$ statistic, or PRESS, for the predicted residual sum of squares. By default, STAT=T2.
-
NSAMP=n
specifies the number of randomizations to perform. By default, NSAMP=1000.
-
SEED=n
-
specifies the seed value for the random number stream. If you do not specify a seed, or if you specify a value less than or equal to 0, the seed is generated from the time of day on the computer’s clock.
Analyses that use the same (nonzero) seed are not completely reproducible if they are executed on a different number of threads because the random number streams in separate threads are independent. You can control the number of threads on which the HPPLS procedure executes by using SAS system options or by using the PERFORMANCE statement in the HPPLS procedure.
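The CVTEST options described above can be combined as in the following minimal sketch. The data set `work.sim`, its variables, and the `role` column are hypothetical and simulated only for illustration. The PARTITION statement's ROLEVAR= form and the NTHREADS= option of the PERFORMANCE statement are assumed here and should be checked against their own syntax sections.

```sas
/* Hypothetical data: 200 observations, 6 predictors, 1 response,  */
/* and a character variable that assigns each row to a role.       */
data work.sim;
   call streaminit(12345);
   array x{6} x1-x6;
   length role $5;
   do i = 1 to 200;
      do j = 1 to 6;
         x{j} = rand('normal');
      end;
      y = 2*x1 - x2 + 0.5*x3 + rand('normal');
      if rand('uniform') < 0.7 then role = 'TRAIN';
      else role = 'TEST';
      output;
   end;
   drop i j;
run;

/* Request the randomization test with a 0.05 cutoff, the PRESS    */
/* statistic, 500 randomizations, and a fixed positive seed.       */
proc hppls data=work.sim
           cvtest(pval=0.05 stat=press nsamp=500 seed=8675309);
   model y = x1-x6;
   /* Assumed PARTITION syntax: test set defined by a role variable */
   partition rolevar=role(train='TRAIN' test='TEST');
   /* Fix the thread count so that reruns use the same streams */
   performance nthreads=1;
run;
```

Fixing both the seed and the number of threads is the combination that the reproducibility note above calls for. Changing either one can change the randomization results.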
-
DATA=SAS-data-set
names the input SAS data set to be used by PROC HPPLS. The default is the most recently created data set.
-
DETAILS
lists the details of the fitted model for each successive factor. The listed details are different for different extraction methods. For more information, see the section Displayed Output.
-
METHOD=PLS<(PLS-options)> | SIMPLS | PCR | RRR
-
specifies the general factor extraction method to be used. You can specify the following values:
-
PCR
requests principal components regression.
-
PLS<(PLS-options)>
-
requests partial least squares. You can also specify the following optional PLS-options in parentheses after METHOD=PLS:
-
ALGORITHM=NIPALS | SVD | EIG
names the specific algorithm used to compute extracted PLS factors. NIPALS requests the usual iterative NIPALS algorithm, SVD bases the extraction on the singular value decomposition of $X'Y$, and EIG bases the extraction on the eigenvalue decomposition of $Y'XX'Y$. ALGORITHM=SVD is the most accurate but least efficient approach. By default, ALGORITHM=NIPALS.
-
EPSILON=n
specifies the convergence criterion for the NIPALS algorithm. By default, EPSILON=$10^{-12}$.
-
MAXITER=n
specifies the maximum number of iterations for the NIPALS algorithm. By default, MAXITER=200.
-
RRR
requests reduced rank regression.
-
SIMPLS
requests the straightforward implementation of a statistically inspired modification of the partial least squares (SIMPLS) method of De Jong (1993).
By default, METHOD=PLS.
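As an illustration of the METHOD= values above, the following sketch fits the same model twice: once with PLS using explicit NIPALS settings, and once with SIMPLS. The data set `work.sim2` and its variables are hypothetical and simulated for the example. The EPSILON= and MAXITER= values are arbitrary illustrative choices, not recommendations.

```sas
/* Hypothetical data: 100 observations, 5 predictors, 2 responses */
data work.sim2;
   call streaminit(2022);
   array x{5} x1-x5;
   do i = 1 to 100;
      do j = 1 to 5;
         x{j} = rand('normal');
      end;
      y1 = x1 + x2 + rand('normal');
      y2 = x3 - x4 + rand('normal');
      output;
   end;
   drop i j;
run;

/* Partial least squares with the NIPALS algorithm, a looser       */
/* convergence criterion, and a higher iteration limit             */
proc hppls data=work.sim2
           method=pls(algorithm=nipals epsilon=1e-10 maxiter=500)
           nfac=3;
   model y1 y2 = x1-x5;
run;

/* Same model, extracted with the SIMPLS method instead */
proc hppls data=work.sim2 method=simpls nfac=3;
   model y1 y2 = x1-x5;
run;
```

PLS-options such as ALGORITHM= are written in parentheses directly after METHOD=PLS. The other methods (SIMPLS, PCR, RRR) take no such suboptions in the syntax shown above.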
-
NAMELEN=number
specifies the length to which long effect names are shortened. By default, NAMELEN=20. If you specify a value less than 20 for number, the default is used.
-
NFAC=n
specifies the number of factors to extract. The default is $\min(15, p, N)$, where p is the number of predictors (or the number of dependent variables when METHOD=RRR) and N is the number of runs (observations). You probably do not need to extract this many factors for most applications. Extracting too many factors can lead to an overfit model (one that matches the training data too well), sacrificing predictive ability. Thus, if you use the default, you should also either specify the PARTITION statement to select the appropriate number of factors for the final model or consider the analysis to be preliminary and examine the results to determine the appropriate number of factors for a subsequent analysis.
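A short worked example may help with the NFAC= default, which is the smallest of 15, p, and N. The values of p and N below are made up for illustration.

```latex
\begin{aligned}
p = 6,\;   N = 200 &: \quad \min(15, 6, 200)   = 6  \\
p = 40,\;  N = 12  &: \quad \min(15, 40, 12)   = 12 \\
p = 100,\; N = 500 &: \quad \min(15, 100, 500) = 15
\end{aligned}
```

In the first case the number of predictors is the binding limit, in the second the number of observations, and in the third the cap of 15.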
-
NOCENTER
suppresses centering of the responses and predictors before fitting. This option is useful if the analysis variables are already centered and scaled. For more information, see the section Centering and Scaling.
-
NOCLPRINT<=number>
suppresses the display of the "Class Level Information" table if you do not specify number. If you specify number, the values of the classification variables are displayed only for variables whose number of levels is less than number. Specifying a number helps to reduce the size of the "Class Level Information" table if some classification variables have a large number of levels.
-
NOCVSTDIZE
suppresses re-centering and rescaling of the responses and predictors before each model is fit in the cross validation. For more information, see the section Centering and Scaling.
-
NOPRINT
suppresses the normal display of results. This option is useful when you want only the output statistics saved in a data set. This option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 23, Using the Output Delivery System.
-
NOSCALE
suppresses scaling of the responses and predictors before fitting. This option is useful if the analysis variables are already centered and scaled. For more information, see the section Centering and Scaling.
-
VARSS
lists, in addition to the average response and predictor sum of squares accounted for by each successive factor, the amount of variation accounted for in each response and predictor.
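The following sketch shows one situation in which NOCENTER and NOSCALE might be combined with the display options: the analysis variables are standardized ahead of time, so the procedure is told not to center or scale them again. The data set names and variables are hypothetical, and the use of PROC STDIZE with METHOD=STD for the pre-standardization is an assumption made for the example.

```sas
/* Hypothetical raw data: 80 observations, 4 predictors, 1 response */
data work.raw;
   call streaminit(99);
   array x{4} x1-x4;
   do i = 1 to 80;
      do j = 1 to 4;
         x{j} = 10 + 3*rand('normal');
      end;
      y = 5 + x1 - 0.5*x2 + rand('normal');
      output;
   end;
   drop i j;
run;

/* Center each variable to mean 0 and scale to standard deviation 1 */
proc stdize data=work.raw out=work.std method=std;
   var y x1-x4;
run;

/* The variables are already standardized, so skip centering and   */
/* scaling; also request per-factor details and per-variable       */
/* variation summaries.                                            */
proc hppls data=work.std nocenter noscale details varss nfac=2;
   model y = x1-x4;
run;
```

Replacing `details varss` with NOPRINT would suppress the displayed results altogether, which is useful when only saved output is needed.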
Last updated: December 09, 2022