This library provides a complete set of easy-to-use functions for building partial least squares (PLS) regression (PLSR) and discriminant analysis (PLS-DA) models and for evaluating their predictive performance. To help build reliable models, we also implemented a number of commonly used *outlier detection* and *variable selection* methods that can be used to *"clean"* your data by removing potential outliers and using only a subset of selected variables.

The algorithms in the current version cover:

**Data pretreatment**:

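Centering, autoscaling

Orthogonal projections to latent structures (O-PLS)

Direct orthogonal signal correction (direct OSC)

OSC, following the work of Tom Fearn

OSC, following the work of Svante Wold et al.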
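Centering subtracts each variable's mean; autoscaling additionally divides each variable by its standard deviation so that all variables enter the model on a comparable scale. The following is a minimal pure-Python sketch, not the libPLS MATLAB code: `fit_scaler` and `apply_scaler` are hypothetical names, the sample standard deviation (n - 1 denominator) is an assumption, and O-PLS and the OSC variants are not covered.

```python
from statistics import mean, stdev

def fit_scaler(X, scale=True):
    """Estimate column means (and sample standard deviations if scale=True).

    X is a list of rows (samples) of equal length (variables)."""
    cols = list(zip(*X))
    means = [mean(c) for c in cols]
    if scale:
        # Guard against constant columns: leave them unscaled instead of dividing by 0.
        stds = [s if s > 0 else 1.0 for s in (stdev(c) for c in cols)]
    else:
        stds = [1.0] * len(cols)  # centering only
    return means, stds

def apply_scaler(X, means, stds):
    """Center (and optionally scale) X with parameters estimated on training data."""
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in X]
```

Estimating the means and standard deviations on the training samples only, and reusing them for test samples, keeps test data out of the pretreatment step.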

**Data partition**:

Kennard-Stone algorithm (ks.m)
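The Kennard-Stone algorithm picks training samples that cover the predictor space uniformly: it starts from the two samples farthest apart, then repeatedly adds the sample whose distance to its nearest already-selected sample is largest. Below is a pure-Python illustration under stated assumptions (Euclidean distance on the rows of `X`, hypothetical function name `kennard_stone`); it is not the interface of ks.m.

```python
import math

def kennard_stone(X, k):
    """Return (selected, remaining) sample indices; selected has k entries in pick order."""
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of samples")

    def dist(a, b):
        return math.dist(X[a], X[b])  # Euclidean distance between rows a and b

    # Start with the two samples that are farthest apart.
    first, second = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                        key=lambda pair: dist(*pair))
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    # Distance from each remaining sample to its nearest selected sample.
    nearest = {i: min(dist(i, first), dist(i, second)) for i in remaining}
    while len(selected) < k:
        # Add the sample that is farthest from everything selected so far.
        nxt = max(remaining, key=lambda i: nearest[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            nearest[i] = min(nearest[i], dist(i, nxt))
    return selected, remaining
```

The selected indices would typically form the calibration (training) set and the remaining indices the test set.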

**Model building**:

Partial Least Squares (the NIPALS algorithm for PLS-1 and PLS-2) (pls.m or plslda.m)

Linear Discriminant Analysis (ldapinv.m)
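As a rough illustration of NIPALS for PLS-1 (a single response), the sketch below extracts components one at a time: weights w from X'y, scores t = Xw, loadings, then deflation of X and y; coefficients are recovered as b = W(P'W)^-1 q. It is a pure-Python sketch with hypothetical names (`pls1_nipals`, `pls1_predict`), not pls.m. PLS-2, PLS-DA and LDA are not covered, and the number of components is assumed to be chosen beforehand (for example by cross validation).

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pls1_nipals(X, y, n_comp):
    """Fit PLS-1 with NIPALS. X: list of rows, y: list of floats.
    Returns (b, x_mean, y_mean); b are coefficients for mean-centered data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residuals (centered)
    f = [v - y_mean for v in y]                                # y residuals (centered)
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]  # w = E'f
        norm = math.sqrt(_dot(w, w))
        if norm == 0:
            raise ValueError("y is fully explained; use fewer components")
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                                 # scores t = Ew
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = _dot(f, t) / tt                                             # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                         # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    # b = W (P'W)^-1 q. In NIPALS, P'W is upper triangular, so use back substitution.
    c = [0.0] * n_comp
    for a in reversed(range(n_comp)):
        s = sum(_dot(P[a], W[k]) * c[k] for k in range(a + 1, n_comp))
        c[a] = (Q[a] - s) / _dot(P[a], W[a])
    b = [sum(W[a][j] * c[a] for a in range(n_comp)) for j in range(m)]
    return b, x_mean, y_mean

def pls1_predict(model, X):
    """Predict y for new rows using the training means stored in the model."""
    b, x_mean, y_mean = model
    return [y_mean + sum((row[j] - x_mean[j]) * b[j] for j in range(len(b))) for row in X]
```

With as many components as linearly independent variables, PLS-1 reproduces the ordinary least-squares fit, which makes a convenient sanity check.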

**Model assessment**:

Leave-one-out cross validation (LOOCV)

K-fold cross validation

Double cross validation (DCV)

Monte Carlo cross validation (MCCV)

Repeated double cross validation (RDCV)

Validation with an independent test set
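The validation schemes differ mainly in how samples are split into training and test parts. The sketch below shows K-fold splitting (LOOCV is the special case K = n), Monte Carlo cross validation (repeated random splits), and a root-mean-square error of cross validation (RMSECV) computed from any `fit`/`predict` pair. The interleaved fold assignment, the function names and the fixed random seed are assumptions for illustration; DCV and RDCV, which add an inner loop for choosing the number of components, are not shown.

```python
import math
import random

def kfold_splits(n, k):
    """Yield (train, test) index lists; sample i goes to fold i % k, so k = n gives LOOCV."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

def mccv_splits(n, train_ratio, n_iter, seed=0):
    """Monte Carlo CV: n_iter independent random train/test partitions."""
    n_train = round(n * train_ratio)
    if not 1 <= n_train < n:
        raise ValueError("train_ratio leaves an empty training or test set")
    rng = random.Random(seed)
    for _ in range(n_iter):
        perm = rng.sample(range(n), n)  # random permutation of sample indices
        yield sorted(perm[:n_train]), sorted(perm[n_train:])

def rmsecv(X, y, fit, predict, splits):
    """Pool squared prediction errors of the left-out samples over all splits."""
    sq_err, count = 0.0, 0
    for train, test in splits:
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        for i, p in zip(test, preds):
            sq_err += (y[i] - p) ** 2
            count += 1
    return math.sqrt(sq_err / count)
```

For example, with a mean-only baseline (`fit = lambda X, y: sum(y) / len(y)` and `predict = lambda m, X: [m] * len(X)`), `rmsecv(X, y, fit, predict, kfold_splits(len(y), len(y)))` gives its LOOCV error; any regression model with the same two-function shape plugs in the same way.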

**Outlier detection**:

The Monte Carlo sampling method (mcs.m)
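The Monte Carlo idea behind outlier detection (reference 6) is to build many sub-models on random training subsets and examine the distribution of prediction errors each sample receives when it is left out; samples whose errors have an unusually large mean or standard deviation are candidate outliers. The sketch below only collects those per-sample statistics from a generic `fit`/`predict` pair. The function name, the defaults (1000 iterations, 80% of samples for training) and leaving the cut-off to the user are assumptions, not the behavior of mcs.m.

```python
import random
from statistics import mean, pstdev

def mc_prediction_errors(X, y, fit, predict, n_iter=1000, train_ratio=0.8, seed=0):
    """For each sample, return (mean, std) of its prediction errors over the
    Monte Carlo iterations in which it was left out, or None if it never was."""
    n = len(y)
    n_train = round(n * train_ratio)
    if not 1 <= n_train < n:
        raise ValueError("train_ratio leaves an empty training or test set")
    rng = random.Random(seed)
    errors = [[] for _ in range(n)]
    for _ in range(n_iter):
        perm = rng.sample(range(n), n)
        train, test = perm[:n_train], perm[n_train:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict(model, [X[i] for i in test])):
            errors[i].append(y[i] - p)  # residual of a left-out sample
    return [(mean(e), pstdev(e)) if e else None for e in errors]
```

Plotting the mean against the standard deviation for every sample makes samples that sit far from the main cluster easy to spot.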

**Variable selection**:

Variable importance in projection (VIP) (computed inside pls.m or plslda.m)

Target Projection (TP) (computed inside pls.m or plslda.m)

Uninformative Variable Elimination (UVE, also MC-UVE) (mcuvepls.m or mcuveplslda.m)

Competitive Adaptive Reweighted Sampling (CARS-PLS, CARS-PLSDA) (carspls.m or carsplslda.m)

Random Frog (coupled with PLS or PLS-DA) (randomfrog_pls.m or randomfrog_plslda.m)

Interval Random Frog (coupled with PLS) (irf.m)

Subwindow Permutation Analysis (coupled with PLS-DA) (spa.m)

Moving Window Partial Least Squares (MWPLS) (mwpls.m)

Phase Diagram algorithm (PHADIA, coupled with PLS-DA) (phadia.m)

Iteratively Retain Informative Variables (IRIV, coupled with PLS) (iriv.m)

Variable Complementary Network (VCN, coupled with PLS-DA), the first method to introduce complementary information between variables (vcn.m)
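Two of the simpler criteria above can be computed directly from model quantities. VIP weights each variable's squared normalized PLS weight by the y-variance explained per component, VIP_j = sqrt(m * sum_a SS_a (w_ja / ||w_a||)^2 / sum_a SS_a) with SS_a = q_a^2 t_a't_a, where m is the number of variables. MC-UVE scores each variable by the mean of its regression coefficients across Monte Carlo sub-models divided by their standard deviation. The sketch assumes the weights, scores and y-loadings (or the coefficient vectors) are already available, uses the sample standard deviation for MC-UVE, and uses hypothetical names; it is not the code inside pls.m or mcuvepls.m.

```python
import math
from statistics import mean, stdev

def vip_scores(W, T, q):
    """VIP per variable.

    W: list of A weight vectors (length m each), T: list of A score vectors,
    q: list of A y-loadings, one entry per PLS component."""
    m = len(W[0])
    # y-variance explained by each component: SS_a = q_a^2 * t_a't_a
    ss = [q_a ** 2 * sum(t * t for t in t_a) for q_a, t_a in zip(q, T)]
    total = sum(ss)
    vip = []
    for j in range(m):
        acc = sum(ss_a * w_a[j] ** 2 / sum(w * w for w in w_a)
                  for ss_a, w_a in zip(ss, W))
        vip.append(math.sqrt(m * acc / total))
    return vip

def mcuve_stability(B):
    """B: regression coefficient vectors from many Monte Carlo sub-models (one per row).
    Returns mean / sample std of each variable's coefficients."""
    cols = list(zip(*B))
    return [mean(c) / stdev(c) for c in cols]
```

The squared VIP scores always sum to m, so a common rule of thumb keeps variables with VIP above 1; for MC-UVE, variables with a small absolute stability are candidates for elimination.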

**How to cite?** If you use this library, please cite it as: *Li H.-D., Xu Q.-S., Liang Y.-Z. (2014) libPLS: An Integrated Library for Partial Least Squares Regression and Discriminant Analysis. PeerJ PrePrints 2:e190v1*. Source code is available at www.libpls.net.

To build a credible model for a given chemical, biological or clinical dataset, it helps to first gain some insight into the data before modeling, and then to report statistically stable results derived from a large number of sub-models built on a single dataset with the aid of Monte Carlo sampling (MCS). We proposed a new concept, Model Population Analysis (MPA): a general framework for designing new data analysis methods by statistically analyzing outputs of interest (regression coefficients, prediction errors, etc.) from a population of sub-models generated by introducing data variation in the sample direction, the variable direction, or both. New methods can be developed by exploiting such outputs of interest in novel ways. As shown in the figure on the left, the outputs of a population of sub-models can be organized into four spaces (sample space, variable space, parameter space and model space), which can serve as a guide for algorithm development.
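To make the idea concrete, here is a minimal sketch of one way to build a model population and read it in variable space: each sub-model is evaluated on a random subset of samples and a random subset of variables, and each variable is scored by the mean output (for example a prediction error) of sub-models that include it minus that of sub-models that exclude it. The `evaluate` callback, the sampling ratios and the with/without comparison are illustrative assumptions, not a specific libPLS function.

```python
import random
from statistics import mean

def model_population(n_samples, n_vars, n_models, evaluate,
                     sample_ratio=0.8, var_ratio=0.5, seed=0):
    """Draw n_models random (samples, variables) subsets and record evaluate()
    for each; evaluate(samples, variables) is a user-supplied callback."""
    rng = random.Random(seed)
    k_s = max(1, round(n_samples * sample_ratio))  # variation in sample direction
    k_v = max(1, round(n_vars * var_ratio))        # variation in variable direction
    population = []
    for _ in range(n_models):
        samples = sorted(rng.sample(range(n_samples), k_s))
        variables = sorted(rng.sample(range(n_vars), k_v))
        population.append((samples, variables, evaluate(samples, variables)))
    return population

def variable_effect(population, n_vars):
    """Mean output of sub-models WITH each variable minus mean output WITHOUT it
    (None if either group is empty)."""
    effects = []
    for j in range(n_vars):
        with_j = [out for _, v, out in population if j in v]
        without_j = [out for _, v, out in population if j not in v]
        effects.append(mean(with_j) - mean(without_j) if with_j and without_j else None)
    return effects
```

When the output is a prediction error, a negative effect suggests that the variable tends to improve the sub-models that contain it.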

The concept of MPA was originally proposed in J. Chemometr. 24 (2009) 418, and was systematically elucidated and reviewed in TrAC 38 (2012) 154-162.

1. Wold, S., M. Sjöström, and L. Eriksson, 2001. PLS-regression: a basic tool of chemometrics. Chemometr. Intell. Lab. 58 (2001) 109-130.

2. Kennard, R.W. and L.A. Stone, 1969. Computer aided design of experiments. Technometrics 11 (1969) 137-148.

3. Shao, J., 1993. Linear Model Selection by Cross-Validation. J. Am. Stat. Assoc. 88 (1993) 486-494.

4. Xu, Q.-S. and Y.-Z. Liang, 2001. Monte Carlo cross validation. Chemometr. Intell. Lab. 56 (2001) 1-11.

5. Filzmoser, P., B. Liebmann, and K. Varmuza, 2009. Repeated double cross validation. J. Chemometr. 23 (2009) 160-171.

6. Cao, D.-S., Y.-Z. Liang, Q.-S. Xu, H.-D. Li, and X. Chen. A New Strategy of Outlier Detection for QSAR/QSPR. J. Comput. Chem. 31, 592-602.

7. Centner, V., D.-L. Massart, O.E. de Noord, S. de Jong, B.M. Vandeginste, and C. Sterna, 1996. Elimination of Uninformative Variables for Multivariate Calibration. Anal. Chem. 68 (1996) 3851-3858.

8. Cai, W., Y. Li, and X. Shao, 2008. A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra. Chemometr. Intell. Lab. 90 (2008) 188-194.

9. Rajalahti, T., R. Arneberg, A.C. Kroksveen, M. Berle, K.-M. Myhr, and O.M. Kvalheim, 2009. Discriminating Variable Test and Selectivity Ratio Plot: Quantitative Tools for Interpretation and Variable (Biomarker) Selection in Complex Spectral or Chromatographic Profiles. Anal. Chem. 81 (2009) 2581-2590.

10. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, 2009. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal. Chim. Acta 648 (2009) 77-84.

11. Li, H.-D., Q.-S. Xu, and Y.-Z. Liang, 2012. Random Frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for gene selection and disease classification. Anal. Chim. Acta 740 (2012) 20-26.

12. Jiang, J.-H., R.J. Berry, H.W. Siesler, and Y. Ozaki, 2002. Wavelength Interval Selection in Multicomponent Spectral Analysis by Moving Window Partial Least-Squares Regression with Applications to Mid-Infrared and Near-Infrared Spectroscopic Data. Anal. Chem. 74 (2002) 3555-3565.

13. Li, H.-D., Y.-Z. Liang, and Q.-S. Xu, 2010. Uncover the path from PCR to PLS via elastic component regression. Chemometr. Intell. Lab. 104 (2010) 341-346.

14. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, 2009. Model population analysis for variable selection. J. Chemometr. 24 (2009) 418-423.

15. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, 2012. Model population analysis and its applications in chemical and biological modeling. TrAC 38 (2012) 154-162.

16. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, et al., 2011. Recipe for Uncovering Predictive Genes using Support Vector Machines based on Model Population Analysis. IEEE/ACM T. Comput. Bi. 8 (2011) 1633-1641.

17. Yun, Y.-H., H.-D. Li, et al., 2013. An efficient method of wavelength interval selection based on random frog for multivariate spectral calibration. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 111 (2013) 31-36.

18. Yun, Y.-H., W.-T. Wang, et al., 2014. A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration. Anal. Chim. Acta 807 (2014) 36-43.

19. Li, H.-D., Q.-S. Xu, and Y.-Z. Liang. A phase diagram for gene selection and disease classification. bioRxiv, doi: 10.1101/002360.

20. Li, H.-D., Q.-S. Xu, W. Zhang, and Y.-Z. Liang, 2012. Variable Complementary Network: a novel approach for identifying biomarkers and their mutual associations. Metabolomics 8 (2012) 1218-1226.

Author: Hongdong Li (lhdcsu@gmail.com). Advisor: Yizeng Liang (yizeng_liang@263.net). College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China. If you have any comments or questions, please let us know.

Updated: May093102, Aug083102, Jan024102
