The PLS Procedure

Regression Methods

All of the predictive methods implemented in PROC PLS work in essentially the same way: they extract linear combinations of the predictors, called factors, and use those factors to predict the responses linearly. The methods differ only in how the factors are derived, as explained in the following sections.

Partial Least Squares

Partial least squares (PLS) works by extracting one factor at a time. Let $\mathbf{X} = \mathbf{X}_0$ be the centered and scaled matrix of predictors and let $\mathbf{Y} = \mathbf{Y}_0$ be the centered and scaled matrix of response values. The PLS method starts with a linear combination $\mathbf{t} = \mathbf{X}_0\mathbf{w}$ of the predictors, where $\mathbf{t}$ is called a score vector and $\mathbf{w}$ is its associated weight vector. The PLS method predicts both $\mathbf{X}_0$ and $\mathbf{Y}_0$ by regression on $\mathbf{t}$:

$$
\begin{aligned}
\hat{\mathbf{X}}_0 &= \mathbf{t}\mathbf{p}', &\quad \text{where } \mathbf{p}' &= (\mathbf{t}'\mathbf{t})^{-1}\mathbf{t}'\mathbf{X}_0 \\
\hat{\mathbf{Y}}_0 &= \mathbf{t}\mathbf{c}', &\quad \text{where } \mathbf{c}' &= (\mathbf{t}'\mathbf{t})^{-1}\mathbf{t}'\mathbf{Y}_0
\end{aligned}
$$

The vectors $\mathbf{p}$ and $\mathbf{c}$ are called the X- and Y-loadings, respectively.

The specific linear combination $\mathbf{t} = \mathbf{X}_0\mathbf{w}$ is the one that has maximum covariance $\mathbf{t}'\mathbf{u}$ with some response linear combination $\mathbf{u} = \mathbf{Y}_0\mathbf{q}$. Another characterization is that the X- and Y-weights $\mathbf{w}$ and $\mathbf{q}$ are proportional to the first left and right singular vectors of the covariance matrix $\mathbf{X}_0'\mathbf{Y}_0$ or, equivalently, the first eigenvectors of $\mathbf{X}_0'\mathbf{Y}_0\mathbf{Y}_0'\mathbf{X}_0$ and $\mathbf{Y}_0'\mathbf{X}_0\mathbf{X}_0'\mathbf{Y}_0$, respectively.
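
In optimization terms, the first pair of weight vectors can be characterized as a solution of

$$
\max_{\mathbf{w},\,\mathbf{q}} \; \mathbf{w}'\mathbf{X}_0'\mathbf{Y}_0\,\mathbf{q}
\qquad \text{subject to} \qquad \mathbf{w}'\mathbf{w} = \mathbf{q}'\mathbf{q} = 1
$$

where the unit-length constraints are one convenient normalization (the scaling of the weights that PROC PLS reports may differ) and the objective $\mathbf{w}'\mathbf{X}_0'\mathbf{Y}_0\,\mathbf{q} = \mathbf{t}'\mathbf{u}$ is the covariance quantity described above.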

This accounts for how the first PLS factor is extracted. The second factor is extracted in the same way by replacing $\mathbf{X}_0$ and $\mathbf{Y}_0$ with the X- and Y-residuals from the first factor:

$$
\begin{aligned}
\mathbf{X}_1 &= \mathbf{X}_0 - \hat{\mathbf{X}}_0 \\
\mathbf{Y}_1 &= \mathbf{Y}_0 - \hat{\mathbf{Y}}_0
\end{aligned}
$$

These residuals are also called the deflated $\mathbf{X}$ and $\mathbf{Y}$ blocks. The process of extracting a score vector and deflating the data matrices is repeated for as many extracted factors as are wanted.
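
As a concrete illustration of these formulas, the following NumPy sketch extracts a single factor from centered and scaled matrices and then deflates the blocks. It computes the weights from the first singular vectors of $\mathbf{X}_0'\mathbf{Y}_0$, as characterized above; it is not a transcription of PROC PLS's internal code, and the function name extract_pls_factor is invented for this illustration.

import numpy as np

def extract_pls_factor(X0, Y0):
    """One PLS factor from centered and scaled X0 (n x p) and Y0 (n x m)."""
    Y0 = Y0.reshape(X0.shape[0], -1)
    # X- and Y-weights: first left and right singular vectors of X0'Y0
    U, sv, Vt = np.linalg.svd(X0.T @ Y0)
    w, q = U[:, 0], Vt[0, :]
    t = X0 @ w                           # X-score vector t = X0 w
    p = (X0.T @ t) / (t @ t)             # X-loadings: p' = (t't)^(-1) t'X0
    c = (Y0.T @ t) / (t @ t)             # Y-loadings: c' = (t't)^(-1) t'Y0
    X1 = X0 - np.outer(t, p)             # deflated X block, X1 = X0 - t p'
    Y1 = Y0 - np.outer(t, c)             # deflated Y block, Y1 = Y0 - t c'
    return w, q, t, p, c, X1, Y1

Calling the function again on the deflated blocks X1 and Y1 yields the second factor, and so on for as many factors as are wanted.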

SIMPLS

Note that each extracted PLS factor is defined in terms of different X-variables $\mathbf{X}_i$. This leads to difficulties in comparing different scores, weights, and so forth. The SIMPLS method of De Jong (1993) overcomes these difficulties by computing each score $\mathbf{t}_i = \mathbf{X}\mathbf{r}_i$ in terms of the original (centered and scaled) predictors $\mathbf{X}$. The SIMPLS X-weight vectors $\mathbf{r}_i$ are similar to the eigenvectors of $\mathbf{S}\mathbf{S}' = \mathbf{X}'\mathbf{Y}\mathbf{Y}'\mathbf{X}$, but they satisfy a different orthogonality condition. The $\mathbf{r}_1$ vector is just the first eigenvector $\mathbf{e}_1$ (so that the first SIMPLS score is the same as the first PLS score), but whereas the second eigenvector maximizes

$$
\mathbf{e}_1'\mathbf{S}\mathbf{S}'\mathbf{e}_2 \quad \text{subject to} \quad \mathbf{e}_1'\mathbf{e}_2 = 0
$$

the second SIMPLS weight $\mathbf{r}_2$ maximizes

$$
\mathbf{r}_1'\mathbf{S}\mathbf{S}'\mathbf{r}_2 \quad \text{subject to} \quad \mathbf{r}_1'\mathbf{X}'\mathbf{X}\mathbf{r}_2 = \mathbf{t}_1'\mathbf{t}_2 = 0
$$

The SIMPLS scores are identical to the PLS scores for one response but slightly different for more than one response; see De Jong (1993) for details. The X- and Y-loadings are defined as in PLS, but since the scores are all defined in terms of $\mathbf{X}$, it is easy to compute the overall model coefficients $\mathbf{B}$:

$$
\begin{aligned}
\hat{\mathbf{Y}} &= \sum_i \mathbf{t}_i \mathbf{c}_i' \\
                 &= \sum_i \mathbf{X}\mathbf{r}_i \mathbf{c}_i' \\
                 &= \mathbf{X}\mathbf{B}, \quad \text{where } \mathbf{B} = \mathbf{R}\mathbf{C}'
\end{aligned}
$$
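
The following NumPy sketch illustrates this construction. The scores are computed directly from the original (centered and scaled) predictor matrix, and the cross-product matrix $\mathbf{S} = \mathbf{X}'\mathbf{Y}$ is deflated by an orthonormalized basis of the X-loadings, following the algorithm described in De Jong (1993). It is only an illustration, not PROC PLS's internal code, and the function name simpls_coefficients is invented here.

import numpy as np

def simpls_coefficients(X, Y, n_factors):
    """SIMPLS-style weights, scores, and overall coefficients B = R C'."""
    Y = Y.reshape(X.shape[0], -1)
    S = X.T @ Y                          # cross-product matrix S = X'Y
    V = np.zeros((X.shape[1], 0))        # orthonormalized X-loadings so far
    R, T, C = [], [], []
    for _ in range(n_factors):
        U, sv, Vt = np.linalg.svd(S, full_matrices=False)
        r = U[:, 0]                      # weight r_i in terms of the original X
        t = X @ r                        # score t_i = X r_i
        p = (X.T @ t) / (t @ t)          # X-loading, defined as in PLS
        c = (Y.T @ t) / (t @ t)          # Y-loading, defined as in PLS
        v = p - V @ (V.T @ p)            # orthogonalize the loading against earlier ones
        v = v / np.linalg.norm(v)
        S = S - np.outer(v, v @ S)       # deflate S so that t_i' t_j = 0
        V = np.column_stack([V, v])
        R.append(r); T.append(t); C.append(c)
    R, C = np.column_stack(R), np.column_stack(C)
    return R @ C.T, R, np.column_stack(T)   # B = R C', the weights R, and the scores

The fitted (centered and scaled) responses are then simply X @ B, in agreement with the last line of the display above.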

Principal Components Regression

Like the SIMPLS method, principal components regression (PCR) defines all the scores in terms of the original (centered and scaled) predictors $\mathbf{X}$. However, unlike both the PLS and SIMPLS methods, the PCR method chooses the X-weights/X-scores without regard to the response data. The X-scores are chosen to explain as much variation in $\mathbf{X}$ as possible; equivalently, the X-weights for the PCR method are the eigenvectors of the predictor covariance matrix $\mathbf{X}'\mathbf{X}$. Again, the X- and Y-loadings are defined as in PLS; but, as in SIMPLS, it is easy to compute overall model coefficients for the original (centered and scaled) responses $\mathbf{Y}$ in terms of the original predictors $\mathbf{X}$.
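
A corresponding NumPy sketch for PCR (again illustrative only, with an invented function name pcr_coefficients, not PROC PLS's code):

import numpy as np

def pcr_coefficients(X0, Y0, n_factors):
    """PCR weights, scores, and coefficients for centered and scaled X0, Y0."""
    Y0 = Y0.reshape(X0.shape[0], -1)
    # X-weights: leading eigenvectors of X0'X0, chosen without reference to Y0
    evals, evecs = np.linalg.eigh(X0.T @ X0)
    W = evecs[:, np.argsort(evals)[::-1][:n_factors]]
    T = X0 @ W                            # X-scores (principal component scores)
    # Y-loadings as in PLS: regress Y0 on the scores
    C = np.linalg.lstsq(T, Y0, rcond=None)[0]
    B = W @ C                             # overall coefficients: Yhat = X0 B
    return B, W, T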

Reduced Rank Regression

As discussed in the preceding sections, partial least squares depends on selecting factors $\mathbf{t} = \mathbf{X}\mathbf{w}$ of the predictors and $\mathbf{u} = \mathbf{Y}\mathbf{q}$ of the responses that have maximum covariance, whereas principal components regression effectively ignores $\mathbf{u}$ and selects $\mathbf{t}$ to have maximum variance, subject to orthogonality constraints. In contrast, reduced rank regression selects $\mathbf{u}$ to account for as much variation in the predicted responses as possible, effectively ignoring the predictors for the purposes of factor extraction. In reduced rank regression, the Y-weights $\mathbf{q}_i$ are the eigenvectors of the covariance matrix $\hat{\mathbf{Y}}_{\mathrm{LS}}'\hat{\mathbf{Y}}_{\mathrm{LS}}$ of the responses predicted by ordinary least squares regression; the X-scores are the projections of the Y-scores $\mathbf{Y}\mathbf{q}_i$ onto the X space.
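
A corresponding sketch for reduced rank regression (illustrative only; the function name rrr_factors is invented, and the least squares fit below is the ordinary least squares prediction of the responses referred to above):

import numpy as np

def rrr_factors(X0, Y0, n_factors):
    """Reduced rank regression Y-weights, Y-scores, and X-scores."""
    Y0 = Y0.reshape(X0.shape[0], -1)
    # Responses predicted by ordinary least squares regression on X0
    B_ls = np.linalg.lstsq(X0, Y0, rcond=None)[0]
    Yhat_ls = X0 @ B_ls
    # Y-weights: leading eigenvectors of Yhat_ls' Yhat_ls
    evals, evecs = np.linalg.eigh(Yhat_ls.T @ Yhat_ls)
    Q = evecs[:, np.argsort(evals)[::-1][:n_factors]]
    U = Y0 @ Q                            # Y-scores u_i = Y q_i
    T = Yhat_ls @ Q                       # X-scores: projections of the Y-scores onto the X space
    return Q, U, T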

Relationships between Methods

When you develop a predictive model, it is important to consider not only the explanatory power of the model for current responses, but also how well sampled the predictive functions are, since this affects how well the model can extrapolate to future observations. All of the techniques implemented in the PLS procedure work by extracting successive factors, or linear combinations of the predictors, that optimally address one or both of these two goals: explaining response variation and explaining predictor variation. In particular, principal components regression selects factors that explain as much predictor variation as possible, reduced rank regression selects factors that explain as much response variation as possible, and partial least squares balances the two objectives, seeking factors that explain both response and predictor variation.

To see the relationships between these methods, consider how each one extracts a single factor from the following artificial data set consisting of two predictors and one response:

data data;
   input x1 x2 y;
   datalines;
    3.37651  2.30716        0.75615
    0.74193 -0.88845        1.15285
    4.18747  2.17373        1.42392
    0.96097  0.57301        0.27433
   -1.11161 -0.75225       -0.25410
   -1.38029 -1.31343       -0.04728
    1.28153 -0.13751        1.00341
   -1.39242 -2.03615        0.45518
    0.63741  0.06183        0.40699
   -2.52533 -1.23726       -0.91080
    2.44277  3.61077       -0.82590
;
proc pls data=data nfac=1 method=rrr;
   model y = x1 x2;
run;
proc pls data=data nfac=1 method=pcr;
   model y = x1 x2;
run;
proc pls data=data nfac=1 method=pls;
   model y = x1 x2;
run;

The amount of model and response variation explained by the first factor for each method is shown in Figure 10 through Figure 12.

Figure 10: Variation Explained by First Reduced Rank Regression Factor

The PLS Procedure

       Percent Variation Accounted for by
        Reduced Rank Regression Factors

   Number of
   Extracted       Model Effects        Dependent Variables
     Factors     Current      Total      Current       Total
           1     15.0661    15.0661     100.0000    100.0000


Figure 11: Variation Explained by First Principal Components Regression Factor

The PLS Procedure

       Percent Variation Accounted for by
             Principal Components

   Number of
   Extracted       Model Effects        Dependent Variables
     Factors     Current      Total      Current       Total
           1     92.9996    92.9996       9.3787      9.3787


Figure 12: Variation Explained by First Partial Least Squares Regression Factor

The PLS Procedure

       Percent Variation Accounted for by
         Partial Least Squares Factors

   Number of
   Extracted       Model Effects        Dependent Variables
     Factors     Current      Total      Current       Total
           1     88.5357    88.5357      26.5304     26.5304


Notice that, while the first reduced rank regression factor explains all of the response variation, it accounts for only about 15% of the predictor variation. In contrast, the first principal components regression factor accounts for most of the predictor variation (93%) but only 9% of the response variation. The first partial least squares factor accounts for only slightly less predictor variation than principal components but about three times as much response variation.

Figure 13 illustrates how partial least squares balances the goals of explaining response and predictor variation in this case.

Figure 13: Depiction of First Factors for Three Different Regression Methods

The ellipse shows the general shape of the 11 observations in the predictor space, with the contours of increasing y overlaid. Also shown are the directions of the first factor for each of the three methods. Notice that, while the predictors vary most in the x1 = x2 direction, the response changes most in the orthogonal x1 = -x2 direction. This explains why the first principal component accounts for little variation in the response and why the first reduced rank regression factor accounts for little variation in the predictors. The direction of the first partial least squares factor represents a compromise between the other two directions.

Last updated: December 09, 2022