The HPPLS Procedure

Partial Least Squares

Partial least squares (PLS) works by extracting one factor at a time. Let $\mathbf{X} = \mathbf{X}_0$ be the centered and scaled matrix of predictors, and let $\mathbf{Y} = \mathbf{Y}_0$ be the centered and scaled matrix of response values. The PLS method starts with a linear combination $\mathbf{t} = \mathbf{X}_0\mathbf{w}$ of the predictors, where $\mathbf{t}$ is called a score vector and $\mathbf{w}$ is its associated weight vector. The PLS method predicts both $\mathbf{X}_0$ and $\mathbf{Y}_0$ by regression on $\mathbf{t}$:

\[
\begin{aligned}
\hat{\mathbf{X}}_0 &= \mathbf{t}\mathbf{p}', &\quad \text{where}\quad \mathbf{p}' &= (\mathbf{t}'\mathbf{t})^{-1}\mathbf{t}'\mathbf{X}_0 \\
\hat{\mathbf{Y}}_0 &= \mathbf{t}\mathbf{c}', &\quad \text{where}\quad \mathbf{c}' &= (\mathbf{t}'\mathbf{t})^{-1}\mathbf{t}'\mathbf{Y}_0
\end{aligned}
\]

The vectors $\mathbf{p}$ and $\mathbf{c}$ are called the X- and Y-loadings, respectively.

The specific linear combination $\mathbf{t} = \mathbf{X}_0\mathbf{w}$ is the one that has maximum covariance $\mathbf{t}'\mathbf{u}$ with some response linear combination $\mathbf{u} = \mathbf{Y}_0\mathbf{q}$. Another characterization is that the X-weight, $\mathbf{w}$, and the Y-weight, $\mathbf{q}$, are proportional to the first left and right singular vectors, respectively, of the covariance matrix $\mathbf{X}_0'\mathbf{Y}_0$ or, equivalently, the first eigenvectors of $\mathbf{X}_0'\mathbf{Y}_0\mathbf{Y}_0'\mathbf{X}_0$ and $\mathbf{Y}_0'\mathbf{X}_0\mathbf{X}_0'\mathbf{Y}_0$, respectively.
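The extraction of the first factor can be sketched in a few lines of NumPy. This is an illustrative sketch of the singular-vector characterization above, not the HPPLS implementation; the function names `center_and_scale` and `first_pls_factor` and the simulated data are assumptions made for the example.

```python
import numpy as np

def center_and_scale(a):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def first_pls_factor(x0, y0):
    """Extract the first PLS factor from centered and scaled X0 and Y0 (hypothetical helper)."""
    # The leading left and right singular vectors of X0'Y0 give the weights.
    u_mat, _, vt = np.linalg.svd(x0.T @ y0, full_matrices=False)
    w = u_mat[:, 0]       # X-weight, unit length
    q = vt[0, :]          # Y-weight, unit length
    t = x0 @ w            # score vector t = X0 w
    tt = t @ t            # scalar t't
    p = x0.T @ t / tt     # X-loading, p' = (t't)^{-1} t'X0
    c = y0.T @ t / tt     # Y-loading, c' = (t't)^{-1} t'Y0
    return w, q, t, p, c

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x_raw = rng.normal(size=(20, 4))
y_raw = x_raw @ rng.normal(size=(4, 2)) + 0.1 * rng.normal(size=(20, 2))
x0 = center_and_scale(x_raw)
y0 = center_and_scale(y_raw)

w, q, t, p, c = first_pls_factor(x0, y0)
x_hat = np.outer(t, p)    # predicted X0 = t p'
y_hat = np.outer(t, c)    # predicted Y0 = t c'
```

Because `w` and `q` come from the singular value decomposition, they are unit-length vectors; any nonzero rescaling of them defines the same factor up to scale.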

This accounts for how the first PLS factor is extracted. The second factor is extracted in the same way by replacing $\mathbf{X}_0$ and $\mathbf{Y}_0$ with the X- and Y-residuals from the first factor:

\[
\begin{aligned}
\mathbf{X}_1 &= \mathbf{X}_0 - \hat{\mathbf{X}}_0 \\
\mathbf{Y}_1 &= \mathbf{Y}_0 - \hat{\mathbf{Y}}_0
\end{aligned}
\]

These residuals are also called the deflated $\mathbf{X}$ and $\mathbf{Y}$ blocks. The process of extracting a score vector and deflating the data matrices is repeated for as many extracted factors as are wanted.
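The repeated extract-and-deflate cycle described above might look like the following NumPy sketch. It assumes the weight for each factor is taken from the singular value decomposition of the current cross-product matrix, which is one way to realize the description and is not necessarily how PROC HPPLS computes it; the function name `pls_factors` and the simulated data are hypothetical.

```python
import numpy as np

def pls_factors(x0, y0, n_factors):
    """Extract n_factors PLS factors by repeated score extraction and deflation."""
    x_k = np.array(x0, dtype=float)   # current (deflated) X block
    y_k = np.array(y0, dtype=float)   # current (deflated) Y block
    scores, x_loadings, y_loadings = [], [], []
    for _ in range(n_factors):
        # X-weight: leading left singular vector of the current X'Y.
        u_mat, _, _ = np.linalg.svd(x_k.T @ y_k, full_matrices=False)
        w = u_mat[:, 0]
        t = x_k @ w                   # score vector
        tt = t @ t
        p = x_k.T @ t / tt            # X-loading
        c = y_k.T @ t / tt            # Y-loading
        # Deflate: subtract the part of each block predicted by t.
        x_k = x_k - np.outer(t, p)
        y_k = y_k - np.outer(t, c)
        scores.append(t)
        x_loadings.append(p)
        y_loadings.append(c)
    return (np.column_stack(scores), np.column_stack(x_loadings),
            np.column_stack(y_loadings), x_k, y_k)

# Simulated, centered and scaled data, for illustration only.
rng = np.random.default_rng(1)
x_raw = rng.normal(size=(30, 5))
y_raw = x_raw @ rng.normal(size=(5, 2)) + 0.1 * rng.normal(size=(30, 2))
x0 = (x_raw - x_raw.mean(axis=0)) / x_raw.std(axis=0, ddof=1)
y0 = (y_raw - y_raw.mean(axis=0)) / y_raw.std(axis=0, ddof=1)

t_mat, p_mat, c_mat, x_res, y_res = pls_factors(x0, y0, n_factors=2)
```

In this sketch the columns of `t_mat` are mutually orthogonal, because each new score is built from an X block that has already had the earlier scores regressed out, and `x_res` and `y_res` are the deflated blocks left after the last factor.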

Last updated: December 09, 2022