Partial Least Squares (PLS) code and basics

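There are various ways to implement PLS, including NIPALS, SIMPLS and the bidiagonalization method of Rolf Manne. Here I provide code for the NIPALS algorithm, based on the 2001 paper by Wold et al.: S. Wold, M. Sjöström, L. Eriksson, PLS-regression: a basic tool of chemometrics, Chemometr. Intell. Lab. Syst. 58 (2001) 109-130.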

Input:

X: data matrix of size n x p

Y: response vector of size n x 1 (the code below also accepts an n x m response matrix, i.e. PLS-2)

A: the number of PLS components to extract, usually chosen by cross-validation.
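Since A is usually chosen by cross-validation, here is a minimal sketch of that step in Python (standard library only, PLS-1 only). It is not part of the original code, and several choices are assumptions: folds are assigned by sample index modulo k, X and y are mean-centered with the training-fold means, held-out samples are predicted by deflating them with the fitted weights and loadings, and PRESS (the sum of squared prediction errors) is the selection criterion. The function names `pls1_fit`, `pls1_predict_path` and `cv_press` are hypothetical.

```python
import random

# Sketch: choose the number of PLS components A by k-fold cross-validation.
# PLS-1 only. Assumptions: folds = sample index modulo k, centering with
# training-fold means, PRESS as the selection criterion.


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, A):
    """NIPALS PLS-1; returns a list of (w, p, q), one tuple per component."""
    X = [row[:] for row in X]  # residual X, deflated below
    y = list(y)                # residual y
    n, nvar = len(X), len(X[0])
    model = []
    for _ in range(A):
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(nvar)]  # X'y
        norm_w = dot(w, w) ** 0.5
        w = [v / norm_w for v in w]                  # unit-length weight
        t = [dot(row, w) for row in X]               # score t = Xw
        tt = dot(t, t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(nvar)]
        q = dot(y, t) / tt                           # regress y on t
        for i in range(n):                           # deflate X and y
            X[i] = [X[i][j] - t[i] * p[j] for j in range(nvar)]
            y[i] -= t[i] * q
        model.append((w, p, q))
    return model


def pls1_predict_path(model, x):
    """Predictions for one (centered) sample using 1, 2, ..., A components."""
    x = list(x)
    yhat, path = 0.0, []
    for w, p, q in model:
        t = dot(x, w)
        yhat += t * q
        path.append(yhat)
        x = [xj - t * pj for xj, pj in zip(x, p)]    # deflate the new sample
    return path


def cv_press(X, y, max_A, k=5):
    """PRESS for A = 1..max_A."""
    n, nvar = len(X), len(X[0])
    press = [0.0] * max_A
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        xm = [sum(X[i][j] for i in train) / len(train) for j in range(nvar)]
        ym = sum(y[i] for i in train) / len(train)
        Xc = [[X[i][j] - xm[j] for j in range(nvar)] for i in train]
        yc = [y[i] - ym for i in train]
        model = pls1_fit(Xc, yc, max_A)              # components are nested: fit once
        for i in test:
            xc = [X[i][j] - xm[j] for j in range(nvar)]
            for a, pred in enumerate(pls1_predict_path(model, xc)):
                press[a] += (y[i] - (ym + pred)) ** 2
    return press


# Hypothetical toy data: y is an exact linear function of 3 variables.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
y = [2.0 * r[0] - r[1] + 0.5 * r[2] for r in X]
press = cv_press(X, y, max_A=3, k=5)
best_A = press.index(min(press)) + 1
```

Because NIPALS extracts components one after another, the first a components are identical whether you ask for a or for max_A of them, so each fold is fitted once and every candidate A is scored from a single pass. On this noise-free toy data, A = 3 reproduces the least-squares fit and its PRESS is essentially zero; with real, noisy data the minimum usually falls at a smaller A.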

Output:

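B: regression coefficients of size p x m, where p is the number of columns in X; a p-dimensional vector when Y has a single column. The code does not mean-center X or Y. To include an intercept, either add a column of ones to X, or (the more common practice in PLS) mean-center X and Y beforehand and recover the intercept as b0 = mean(Y) - mean(X)*B.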

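T: score matrix (the PLS components) of size n x A. It can be thought of as a dimension-reduced representation of X. It is similar to the principal components in PCA but obtained differently: PCA components capture as much variance of X as possible, whereas each PLS component is chosen to have maximal covariance with Y.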

Wstar: [Wstar1, Wstar2, …, WstarA], a p x A weight matrix that computes T directly from the original input X: T = X*Wstar.

W: [W1, W2, …, WA], a p x A weight matrix whose a-th column computes the a-th score from the residual (deflated) X at iteration a. W differs from Wstar except in the first column: W1 = Wstar1.

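P: p x A loading matrix of X, such that X = TP' + E, where E is the X-residual.

Q: A x m loading matrix of Y (as returned by the code), such that Y = TQ + F, where F is the Y-residual.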

R2X: an A-dimensional vector recording the fraction of the total sum of squares of X explained by each PLS component (the explained variance, when X is mean-centered).

R2Y: an A-dimensional vector recording the fraction of the total sum of squares of Y explained by each PLS component (the explained variance, when Y is mean-centered).
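The relations in the output list can be checked numerically. Below is a sketch in Python (standard library only) that ports the PLS-1 case of the NIPALS loop, with no centering, mirroring pls_basic. For a single y the inner while-loop of pls_basic converges to a weight proportional to X'y, so the port uses that directly. The helper names, the toy data from `random.Random(1)` and the tolerance are assumptions for illustration. It builds Wstar = W(P'W)^(-1) and compares X*Wstar with T, the first columns of W and Wstar, and the per-component R2X and R2Y.

```python
import random

# Sketch: check T = X*Wstar, W1 = Wstar1 and R2X/R2Y for PLS-1 (no centering).


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (fine for a small A x A)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def pls1_nipals(X, y, A):
    n, nvar = len(X), len(X[0])
    E, f = [row[:] for row in X], list(y)      # residual X and residual y
    Wc, Tc, Pc, q = [], [], [], []             # columns of W, T, P; y-loadings
    for _ in range(A):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(nvar)]
        norm_w = sum(v * v for v in w) ** 0.5
        w = [v / norm_w for v in w]
        t = [sum(E[i][j] * w[j] for j in range(nvar)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(nvar)]
        qa = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(nvar)] for i in range(n)]
        f = [f[i] - t[i] * qa for i in range(n)]
        Wc.append(w)
        Tc.append(t)
        Pc.append(p)
        q.append(qa)
    W, T, P = transpose(Wc), transpose(Tc), transpose(Pc)   # p x A, n x A, p x A
    Wstar = matmul(W, inverse(matmul(transpose(P), W)))     # W (P'W)^(-1)
    ssx = sum(v * v for row in X for v in row)               # varX in pls_basic
    ssy = sum(v * v for v in y)                              # varY in pls_basic
    R2X = [sum(v * v for v in Tc[a]) * sum(v * v for v in Pc[a]) / ssx for a in range(A)]
    R2Y = [sum(v * v for v in Tc[a]) * q[a] ** 2 / ssy for a in range(A)]
    return Wstar, T, P, q, W, R2X, R2Y


rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(12)]
y = [r[0] - 2.0 * r[2] + 0.1 * rng.gauss(0.0, 1.0) for r in X]
Wstar, T, P, q, W, R2X, R2Y = pls1_nipals(X, y, A=2)
XWstar = matmul(X, Wstar)
# Largest entrywise gap between X*Wstar and T (should be at rounding level).
max_gap = max(abs(a - b) for ra, rb in zip(XWstar, T) for a, b in zip(ra, rb))
```

`max_gap` should sit at floating-point rounding level, the first columns of `W` and `Wstar` should agree, and `sum(R2X)` and `sum(R2Y)` should not exceed 1, because the scores are mutually orthogonal and the per-component sums of squares therefore add up.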

Code: copy everything below and save it as a function file named pls_basic.m.

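function [B,Wstar,T,P,Q,W,R2X,R2Y]=pls_basic(X,Y,A)
%+++ The NIPALS algorithm for both PLS-1 (a single y) and PLS-2 (multiple Y)
%+++ The model is assumed to be: Y=XB+E, where E is random error.
%+++ X: n x p matrix
%+++ Y: n x m matrix
%+++ A: number of latent variables
%+++ Code: Hongdong Li, lhdcsu@gmail.com, Feb, 2014
%+++ reference: Wold, S., M. Sjöström, and L. Eriksson, 2001. PLS-regression: a basic tool of chemometrics,
%               Chemometr. Intell. Lab. 58(2001)109-130.
varX=sum(sum(X.^2));   % total sum of squares of X
varY=sum(sum(Y.^2));   % total sum of squares of Y
W=zeros(size(X,2),A);  % preallocate outputs
T=zeros(size(X,1),A);
P=zeros(size(X,2),A);
Q=zeros(size(Y,2),A);
for i=1:A
    err=1;             % named err so MATLAB's built-in error() is not shadowed
    u=Y(:,1);          % start the Y-score from the first column of Y
    niter=0;
    while (err>1e-8 && niter<1000)  % iterate until u converges
        w=X'*u/(u'*u);
        w=w/norm(w);   % unit-length X-weight
        t=X*w;         % X-score
        q=Y'*t/(t'*t); % regress Y against t
        u1=Y*q/(q'*q); % updated Y-score
        err=norm(u1-u)/norm(u);
        u=u1;
        niter=niter+1;
    end
    p=X'*t/(t'*t);     % X-loading
    X=X-t*p';          % deflate X
    Y=Y-t*q';          % deflate Y
    %+++ store
    W(:,i)=w;
    T(:,i)=t;
    P(:,i)=p;
    Q(:,i)=q;
end
%+++ calculate explained variance (fraction of sum of squares) per component
R2X=diag(T'*T*P'*P)/varX;
R2Y=diag(T'*T*Q'*Q)/varY;
Wstar=W/(P'*W);        % same as W*(P'*W)^(-1), without forming the inverse
B=Wstar*Q';            % p x m regression coefficients
Q=Q';                  % return Y-loadings as an A x m matrix
%+++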
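The closed-form lines at the end of pls_basic (R2X, R2Y and Wstar) rest on two properties of the NIPALS deflation. A short derivation, writing $t_a$, $p_a$, $w_a$ and $q_a$ for the a-th columns of T, P, W and of Q before it is transposed, $E_a$ for X after deflating a components, and $e_1$ for the first unit vector:

```latex
\begin{align*}
&\text{Deflation: } E_0 = X,\quad t_a = E_{a-1} w_a,\quad
  p_a = \frac{E_{a-1}^\top t_a}{t_a^\top t_a},\quad E_a = E_{a-1} - t_a p_a^\top \\
&\text{Scores are orthogonal, so } T^\top T = \operatorname{diag}(t_1^\top t_1, \dots, t_A^\top t_A) \\
&\big[(T^\top T)(P^\top P)\big]_{aa} = \sum_{b} (T^\top T)_{ab}\,(P^\top P)_{ba}
  = (t_a^\top t_a)(p_a^\top p_a) = \lVert t_a p_a^\top \rVert_F^2 \\
&\mathrm{R2X}_a = \frac{(t_a^\top t_a)(p_a^\top p_a)}{\lVert X \rVert_F^2},\qquad
 \mathrm{R2Y}_a = \frac{(t_a^\top t_a)(q_a^\top q_a)}{\lVert Y \rVert_F^2},\qquad
 \lVert X \rVert_F^2 = \texttt{varX} \\
&p_a^\top w_a = \frac{t_a^\top E_{a-1} w_a}{t_a^\top t_a} = 1,\qquad
 p_b^\top w_a = 0 \ (b > a)\ \text{ since } E_{b-1} w_a = 0 \\
&\Rightarrow P^\top W \text{ is unit upper triangular, hence so is } (P^\top W)^{-1},
 \text{ and } w^*_1 = W (P^\top W)^{-1} e_1 = w_1
\end{align*}
```

The last line is why W1 = Wstar1 while the later columns of W and Wstar generally differ.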

 

6 thoughts on “Partial Least Squares (PLS) code and basics”

  1. EC

    Hi, since I recently upgraded to R 3.2.2, I can no longer use your CARSPLS package (it was built before R 3.0.0). Will you be updating this (great!) package in the near future so that it is usable again?
    Thanks.

  2. Dixon

    Hi, I tried to install the package given in the above link in R on Windows, but I couldn't. Here is the error I got:
    Error in read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) :
    cannot open the connection
    In addition: Warning messages:
    1: In unzip(zipname, exdir = dest) : error 1 in extracting from zip file
    2: In read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) :
    cannot open compressed file ‘carspls_1.0.001.tgz/DESCRIPTION’, probable reason ‘No such file or directory’

    Can you please tell me where I can find the correct package?

    1. L (post author)

      Hi Dixon, please use libPLS in MATLAB. The R package still needs development, and I haven't maintained it for a long time.
