libPLS: an Integrated Library for Partial Least Squares Regression and Discriminant Analysis

Featuring Model Population Analysis (MPA) approaches

1. Overview

This library provides a complete set of easy-to-use functions for building partial least squares regression (PLSR) and PLS discriminant analysis (PLS-DA) models and for evaluating their predictive performance. To support reliable modeling, it also implements a number of commonly used outlier detection and variable selection methods, which can be used to "clean" your data by removing potential outliers and retaining only a subset of selected variables.

The algorithms in the current version cover:

Data pretreatment:
           Centering, autoscaling
           Direct orthogonal signal correction (direct OSC)
           OSC, method of Tom Fearn
           OSC, method of Svante Wold et al.
Data partition:
           Kennard-Stone algorithm. (ks.m)
Model building:
           Partial Least Squares (the NIPALS algorithm for PLS-1 and PLS-2). (pls.m or plslda.m)
           Linear Discriminant Analysis. (ldapinv.m)
Model assessment:
           Leave-one-out cross validation (LOOCV)
           K-fold cross validation
           Double cross validation (DCV)
           Monte Carlo cross validation (MCCV)
           Repeated double cross validation (RDCV)
           Validation with an independent test set
Outlier detection:
           The Monte Carlo method. (mcs.m)
Variable selection:
           Variable Importance in Projection (VIP). (inside pls.m or plslda.m)
           Target Projection (TP). (inside pls.m or plslda.m)
           Uninformative Variable Elimination (UVE, also MC-UVE). (mcuvepls.m or mcuveplslda.m)
           Competitive Adaptive Reweighted Sampling (CARS-PLS, CARS-PLSDA). (carspls.m or carsplslda.m)
           Random Frog (coupled with PLS or PLS-DA). (randomfrog_pls.m or randomfrog_plslda.m)
           interval Random Frog (coupled with PLS). (irf.m)
           Subwindow Permutation Analysis (coupled with PLS-DA). (spa.m)
           Moving Window Partial Least Squares (MWPLS). (mwpls.m)
           Phase Diagram algorithm (PHADIA, coupled with PLS-DA). (phadia.m)
           Iteratively Retain Informative Variables (IRIV, coupled with PLS). (iriv.m)
           Variable Complementary Network (VCN, coupled with PLS-DA), the first of these methods to exploit complementary information between variables. (vcn.m)
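
The model-building entry above names the NIPALS algorithm for PLS-1. As a rough illustration of what that algorithm does, here is a minimal sketch in plain Python. It is not the code in pls.m: the function names pls1_fit and pls1_predict, the list-of-rows data layout, and the stopping tolerance are all assumptions made for this example. Only mean-centering is applied, and prediction deflates the new sample component by component instead of forming an explicit coefficient vector.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit a PLS-1 model with NIPALS (hypothetical helper, not libPLS API).

    X is a list of rows (samples), y a list of floats.
    """
    n, n_vars = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(n_vars)]
    y_mean = sum(y) / n
    # Work on mean-centered copies; E and f are deflated after each component.
    E = [[row[j] - x_mean[j] for j in range(n_vars)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: proportional to E'f, then scaled to unit length.
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(n_vars)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # assumed tolerance: nothing left in y to explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # sample scores
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(n_vars)]
        q = _dot(f, t) / tt  # y loading for this component
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(n_vars)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def pls1_predict(model, x):
    """Predict y for one sample by replaying the deflation steps."""
    r = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = _dot(r, w)
        y_hat += q * t
        r = [rj - t * pj for rj, pj in zip(r, p)]
    return y_hat


# Toy, noise-free data: y = 1 + 2*x1 + 3*x2
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 7.0]]
y_demo = [1.0 + 2.0 * a + 3.0 * b for a, b in X_demo]
demo_model = pls1_fit(X_demo, y_demo, n_components=2)
print(pls1_predict(demo_model, [6.0, 4.0]))
```

With both components retained on this two-variable toy set, PLS coincides with ordinary least squares, so the fitted values reproduce y. In practice the number of components would be chosen by one of the cross validation schemes listed above.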

How to cite: If you use this library, please cite it as: Li H.-D., Xu Q.-S., Liang Y.-Z. (2014) libPLS: An Integrated Library for Partial Least Squares Regression and Discriminant Analysis. PeerJ PrePrints 2:e190v1. Source code is available at www.libpls.net.

2. Model Population Analysis (MPA)


To build a credible model for a given chemical, biological, or clinical dataset, it can help to first gain better insight into the data itself, and then to report statistically stable results derived from a large number of sub-models built on that single dataset with the aid of Monte Carlo sampling (MCS). We proposed the concept of Model Population Analysis (MPA), a general framework for designing new data analysis methods by statistically analyzing outputs of interest (regression coefficients, prediction errors, etc.) from a population of sub-models generated by introducing variation in the sample direction, the variable direction, or both. New methods can then be developed by making full use of the parameter of interest in a novel manner. As illustrated in the accompanying figure, the outputs of a population of sub-models can be placed into four spaces: sample space, variable space, parameter space, and model space, which can serve as a guide for algorithm development.
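
The MPA idea described above can be sketched in a few lines of plain Python: draw many random sub-datasets in the sample direction, fit a sub-model to each, and then analyze one output of interest across the whole population. In this sketch the output is the regression coefficient of each variable, and the statistic is its mean divided by its standard deviation over the sub-models. Everything here is an assumption made for illustration: the function names, the one-component PLS-1 sub-model, the 80% sampling ratio, and the choice of statistic. It is not the implementation of any routine in this library.

```python
import math
import random
import statistics


def one_component_pls_coefs(X, y):
    """Regression coefficients of a one-component PLS-1 model (centered data).

    Hypothetical helper: for one component, b = q * w with w = X'y / ||X'y||.
    """
    n, n_vars = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(n_vars)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(n_vars)] for row in X]
    f = [v - y_mean for v in y]
    w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(n_vars)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:  # y is constant on this sub-dataset
        return [0.0] * n_vars
    w = [v / norm for v in w]
    t = [sum(E[i][j] * w[j] for j in range(n_vars)) for i in range(n)]
    q = sum(fi * ti for fi, ti in zip(f, t)) / sum(ti * ti for ti in t)
    return [q * v for v in w]


def mpa_coefficient_stability(X, y, n_models=200, ratio=0.8, seed=0):
    """Monte Carlo sampling of samples, then per-variable mean/std of coefficients."""
    rng = random.Random(seed)
    n, n_vars = len(X), len(X[0])
    n_sub = max(2, int(round(ratio * n)))
    population = []  # one coefficient vector per sub-model
    for _ in range(n_models):
        idx = rng.sample(range(n), n_sub)  # sampling without replacement
        population.append(
            one_component_pls_coefs([X[i] for i in idx], [y[i] for i in idx])
        )
    stability = []
    for j in range(n_vars):
        column = [b[j] for b in population]
        mean, sd = statistics.mean(column), statistics.stdev(column)
        if sd > 0.0:
            stability.append(mean / sd)
        else:
            stability.append(0.0 if mean == 0.0 else math.copysign(math.inf, mean))
    return stability


# Toy data: variable 0 drives y, variable 1 is an unrelated alternating pattern.
X_demo = [[i / 10.0, 1.0 if i % 2 == 0 else -1.0] for i in range(30)]
y_demo = [3.0 * row[0] + 1.0 for row in X_demo]
print(mpa_coefficient_stability(X_demo, y_demo))
```

A variable whose coefficient keeps the same sign and size across sub-models gets a large absolute value, while a variable whose coefficient flips around zero gets a value near zero. This ratio is similar in spirit to the stability index used by MC-UVE-type methods; other outputs, such as prediction errors, could be analyzed in the same way.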

The concept of MPA was originally proposed in J. Chemometr. 24 (2009) 418, and systematically elucidated and reviewed in TrAC 38 (2012) 154-162.

3. References

1. Wold, S., M. Sjöström, and L. Eriksson, 2001. PLS-regression: a basic tool of chemometrics. Chemometr. Intell. Lab. 58 (2001) 109-130.
2. Kennard, R.W. and L.A. Stone, 1969. Computer aided design of experiments. Technometrics 11 (1969) 137-148.
3. Shao, J., 1993. Linear Model Selection by Cross-Validation. J. Am. Stat. Assoc. 88 (1993) 486-494.
4. Xu, Q.-S. and Y.-Z. Liang, 2001. Monte Carlo cross validation. Chemometr. Intell. Lab. 56 (2001) 1-11.
5. Filzmoser, P., B. Liebmann, and K. Varmuza, 2009. Repeated double cross validation. J. Chemometr. 23 (2009) 160-171.
6. Cao, D.-S., Y.-Z. Liang, Q.-S. Xu, H.-D. Li, and X. Chen. A New Strategy of Outlier Detection for QSAR/QSPR. J. Comput. Chem. 31, 592-602.
7. Centner, V., D.-L. Massart, O.E. de Noord, S. de Jong, B.M. Vandeginste, and C. Sterna, 1996. Elimination of Uninformative Variables for Multivariate Calibration. Anal. Chem. 68 (1996) 3851-3858.
8. Cai, W., Y. Li, and X. Shao, 2008. A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra. Chemometr. Intell. Lab. 90 (2008) 188-194.
9. Rajalahti, T., R. Arneberg, A.C. Kroksveen, M. Berle, K.-M. Myhr, and O.M. Kvalheim, 2009. Discriminating Variable Test and Selectivity Ratio Plot: Quantitative Tools for Interpretation and Variable (Biomarker) Selection in Complex Spectral or Chromatographic Profiles. Anal. Chem. 81 (2009) 2581-2590.
10. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, 2009. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal. Chim. Acta 648 (2009) 77-84.
11. Li, H.-D., Q.-S. Xu, and Y.-Z. Liang, 2012. Random Frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for gene selection and disease classification. Anal. Chim. Acta 740 (2012) 20-26.
12. Jiang, J.-H., R.J. Berry, H.W. Siesler, and Y. Ozaki, 2002. Wavelength Interval Selection in Multicomponent Spectral Analysis by Moving Window Partial Least-Squares Regression with Applications to Mid-Infrared and Near-Infrared Spectroscopic Data. Anal. Chem. 74 (2002) 3555-3565.
13. Li, H.-D., Y.-Z. Liang, and Q.-S. Xu, 2010. Uncover the path from PCR to PLS via elastic component regression. Chemometr. Intell. Lab. 104 (2010) 341-346.
14. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, 2009. Model population analysis for variable selection. J. Chemometr. 24 (2009) 418-423.
15. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, 2012. Model population analysis and its applications in chemical and biological modeling. TrAC 38 (2012) 154-162.
16. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, et al., 2011. Recipe for Uncovering Predictive Genes using Support Vector Machines based on Model Population Analysis. IEEE/ACM T Comput Bi 8 (2011) 1633-1641.
17. Yun, Y.-H., H.-D. Li, et al., 2013. An efficient method of wavelength interval selection based on random frog for multivariate spectral calibration. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 111 (2013) 31-36.
18. Yun, Y.-H., W.-T. Wang, et al., 2014. A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration. Anal. Chim. Acta 807 (2014) 36-43.
19. Li, H.-D., Q.-S. Xu, and Y.-Z. Liang. A phase diagram for gene selection and disease classification. bioRxiv, doi: 10.1101/002360.
20. Li, H.-D., Q.-S. Xu, W. Zhang, and Y.-Z. Liang, 2012. Variable Complementary Network: a novel approach for identifying biomarkers and their mutual associations. Metabolomics 8 (2012) 1218-1226.

4. Contact

Author: Hongdong Li (lhdcsu@gmail.com). Advisor: Yizeng Liang (yizeng_liang@263.net). College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China. If you have any comments or questions, please let us know.

5. History

Updated May 09, 2013; Aug 08, 2013; Jan 02, 2014
