This library provides a complete set of easy-to-use functions for building partial least squares (PLS) regression (PLSR) and PLS discriminant analysis (PLS-DA) models and for evaluating their predictive performance. To help build reliable models, it also implements a number of commonly used outlier detection and variable selection methods, which can be used to "clean" your data by removing potential outliers and keeping only a subset of selected variables.
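For readers new to PLS, here is a minimal sketch of what a single-response PLS regression model (PLS1) computes, using the NIPALS algorithm. It is written in plain Python for illustration only and is not the MATLAB code of `pls.m`; the names `pls1_fit` and `pls1_predict` are hypothetical, and the data are a made-up toy example.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Mean-centred working copies; they are deflated after each component.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector w proportional to E^T f (direction of maximal covariance with y).
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # y is (numerically) fully explained; stop adding components
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # X scores
        tt = sum(v * v for v in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
        # Deflation: remove the part of X and y explained by this component.
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * loading[j]
            f[i] -= q * t[i]
        W.append(w)
        P.append(loading)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict y for one new sample x by deflating it component by component."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, loading, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        e = [ej - t * pj for ej, pj in zip(e, loading)]
        y_hat += q * t
    return y_hat


# Toy, noise-free data (assumed): y = 2*x1 - x2 + 1.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [2 * a - b + 1 for a, b in X]
model = pls1_fit(X, y, n_components=2)
print(round(pls1_predict(model, [6, 2]), 6))  # 11.0
```

Prediction deflates the new sample with the stored weights and loadings one component at a time, so the regression-coefficient vector never has to be formed explicitly. With as many components as linearly independent (centred) variables, PLS reproduces ordinary least squares, which is why the noise-free toy example recovers the exact value.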
The algorithms in the current version cover:
| Type | Method | Abbreviation | Code |
|---|---|---|---|
| Data pretreatment | Mean-centering | | pretreat.m |
| | Autoscaling | | pretreat.m |
| | Orthogonal Projection to Latent Structures | OPLS | opls.m |
| | Orthogonal Signal Correction (Tom Fearn's algorithm) | OSC | oscfearn.m |
| | Orthogonal Signal Correction (Svante Wold's algorithm) | OSC | oscwold.m |
| Sample partition | Kennard-Stone algorithm | KS | ks.m |
| Model building | Partial Least Squares | PLS | pls.m |
| | Linear Discriminant Analysis | LDA | ldapinv.m |
| | Partial Least Squares-Linear Discriminant Analysis | PLS-DA | plslda.m |
| | Elastic Component Regression | ECR | ecr.m |
| Model assessment | Leave-one-out cross validation | LOOCV | plscv.m, plsldacv.m |
| | K-fold cross validation | K-fold CV | plscv.m, plsldacv.m, ecrcv.m |
| | Double cross validation | DCV | plsdcv.m, plsldadcv.m |
| | Monte Carlo cross validation | MCCV | plsmccv.m, plsldamccv.m |
| | Using an independent test set | | |
| Outlier detection | The Monte Carlo method | | mcs.m |
| Variable selection | Variable Importance in Projection | VIP | inside pls.m or plslda.m |
| | Target Projection | TP | inside pls.m or plslda.m |
| | Uninformative Variable Elimination | UVE | mcuvepls.m, mcuveplslda.m |
| | Competitive Adaptive Reweighted Sampling | CARS | carspls.m, carsplslda.m |
| | Random Frog | | randomfrog_pls.m, randomfrog_plslda.m |
| | Interval Random Frog | iRF | irf.m |
| | Subwindow Permutation Analysis | SPA | spa.m |
| | Moving Window Partial Least Squares | MWPLS | mwpls.m |
| | The Phase Diagram algorithm | PHADIA | phadia.m |
| | Iteratively Retain Informative Variables | IRIV | iriv.m |
| | Variable Complementary Network | VCN | vcn.m |
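Two of the simpler entries in the table, autoscaling and Kennard-Stone sample partition, can be illustrated with a short plain-Python sketch. It is not the code of `pretreat.m` or `ks.m`: the function names `autoscale` and `kennard_stone`, the use of the sample standard deviation (dividing by n - 1) and the Euclidean distance are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math


def autoscale(X):
    """Mean-centre each column and divide it by its sample standard deviation.

    Assumes no column is constant (its standard deviation would be zero).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
            for j in range(p)]
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]


def kennard_stone(X, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone algorithm."""
    n = len(X)  # assumes at least two samples
    dist = [[math.dist(X[i], X[k]) for k in range(n)] for i in range(n)]
    # Start from the two samples that are farthest apart (Euclidean distance).
    first, second = max(((i, k) for i in range(n) for k in range(i + 1, n)),
                        key=lambda pair: dist[pair[0]][pair[1]])
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select and remaining:
        # Add the sample whose nearest already-selected sample is farthest away.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected[:n_select]


# Hypothetical data: 5 samples, 2 variables.
X = [[5.1, 0.2], [4.8, 0.4], [6.0, 0.1], [5.5, 0.9], [4.2, 0.3]]
Xs = autoscale(X)
calibration = kennard_stone(Xs, 3)
test = [i for i in range(len(X)) if i not in calibration]  # the rest form the test set
print(calibration, test)
```

Autoscaling before Kennard-Stone keeps variables with large numeric ranges from dominating the distance; whether to scale first is a modeling choice, not something this sketch claims libPLS does.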
To build a credible model for chemical, biological or clinical data, it can help to first gain better insight into the data itself before modeling, and then to report statistically stable results derived from a large number of sub-models, all built on a single dataset with the aid of Monte Carlo Sampling (MCS). We proposed a new concept, Model Population Analysis (MPA): a general framework for designing new data analysis methods by statistically analyzing outputs of interest (regression coefficients, prediction errors, etc.) from a large number of sub-models, generated by introducing variation in the samples, the variables, or both. New methods can then be developed by exploiting such outputs in novel ways. As described in the left figure, the outputs of a population of sub-models can be placed in four spaces (sample space, variable space, parameter space and model space), which can serve as a guide for algorithm development.
The concept of MPA was originally proposed in J. Chemometr. 24 (2009) 418, and was systematically elucidated and reviewed in TrAC 38 (2012) 154-162.
A series of MPA-based methods are available in the libPLS package, which include:
A systematic introduction to the MPA idea can be found in our presentation [PDF].
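To make the MPA idea concrete, here is a minimal plain-Python sketch under stated assumptions: it draws many random subsets of samples (Monte Carlo Sampling), fits a sub-model on each, and then summarizes one output of interest, a regression coefficient, across the whole population. The functions `model_population`, `univariate_slopes` and `stability` are hypothetical names, not libPLS code; the sub-model (one least-squares slope per variable) and the mean/standard-deviation summary, similar in spirit to the criterion used by Monte Carlo UVE, are illustrative choices.

```python
import random
import statistics


def model_population(X, y, fit, n_models=1000, ratio=0.8, seed=0):
    """Fit `fit` on n_models random subsets of the samples; return all outputs."""
    rng = random.Random(seed)
    n = len(X)
    size = max(2, int(ratio * n))  # number of samples drawn for each sub-model
    outputs = []
    for _ in range(n_models):
        idx = rng.sample(range(n), size)  # Monte Carlo sampling without replacement
        outputs.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return outputs


def univariate_slopes(Xs, ys):
    """Hypothetical sub-model: least-squares slope of y on each variable separately."""
    n = len(Xs)
    y_mean = sum(ys) / n
    slopes = []
    for col in zip(*Xs):
        x_mean = sum(col) / n
        sxx = sum((v - x_mean) ** 2 for v in col)
        sxy = sum((v - x_mean) * (w - y_mean) for v, w in zip(col, ys))
        slopes.append(sxy / sxx)
    return slopes


def stability(outputs):
    """Mean divided by standard deviation of each output across the population."""
    return [statistics.mean(c) / statistics.stdev(c) for c in zip(*outputs)]


# Toy data (assumed): variable 0 drives y, variable 1 is pure noise.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(40)]
y = [3 * a + rng.gauss(0, 0.1) for a, _ in X]
# The informative variable should show a much larger |stability| than the noise one.
print(stability(model_population(X, y, univariate_slopes, n_models=500)))
```

Swapping `univariate_slopes` for a function that returns, say, a prediction error per sub-model moves the analysis from the variable space to the model space; the sampling loop stays the same.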
1. Wold, S., M. Sjöström, and L. Eriksson, 2001. PLS-regression: a basic tool of chemometrics. Chemometr. Intell. Lab. 58 (2001) 109-130.
2. Kennard, R.W. and L.A. Stone, 1969. Computer aided design of experiments. Technometrics 11 (1969) 137-148.
3. Shao, J., 1993. Linear Model Selection by Cross-Validation. J. Am. Stat. Assoc. 88 (1993) 486-494.
4. Xu, Q.-S. and Y.-Z. Liang, 2001. Monte Carlo cross validation. Chemometr. Intell. Lab. 56 (2001) 1-11.
5. Filzmoser, P., B. Liebmann, and K. Varmuza, 2009. Repeated double cross validation. J. Chemometr. 23 (2009) 160-171.
6. Cao, D.-S., Y.-Z. Liang, Q.-S. Xu, H.-D. Li, and X. Chen. A New Strategy of Outlier Detection for QSAR/QSPR. J. Comput. Chem. 31, 592-602.
7. Centner, V., D.-L. Massart, O.E. de Noord, S. de Jong, B.M. Vandeginste, and C. Sterna, 1996. Elimination of Uninformative Variables for Multivariate Calibration. Anal. Chem. 68 (1996) 3851-3858.
8. Cai, W., Y. Li, and X. Shao, 2008. A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra. Chemometr. Intell. Lab. 90 (2008) 188-194.
9. Rajalahti, T., R. Arneberg, A.C. Kroksveen, M. Berle, K.-M. Myhr, and O.M. Kvalheim, 2009. Discriminating Variable Test and Selectivity Ratio Plot: Quantitative Tools for Interpretation and Variable (Biomarker) Selection in Complex Spectral or Chromatographic Profiles. Anal. Chem. 81 (2009) 2581-2590.
10. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, 2009. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal. Chim. Acta 648 (2009) 77-84.
11. Li, H.-D., Q.-S. Xu, and Y.-Z. Liang, 2012. Random Frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for gene selection and disease classification. Anal. Chim. Acta 740 (2012) 20-26.
12. Jiang, J.-H., R.J. Berry, H.W. Siesler, and Y. Ozaki, 2002. Wavelength Interval Selection in Multicomponent Spectral Analysis by Moving Window Partial Least-Squares Regression with Applications to Mid-Infrared and Near-Infrared Spectroscopic Data. Anal. Chem. 74 (2002) 3555-3565.
13. Li, H.-D., Y.-Z. Liang, and Q.-S. Xu, 2010. Uncover the path from PCR to PLS via elastic component regression. Chemometr. Intell. Lab. 104 (2010) 341-346.
14. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, 2009. Model population analysis for variable selection. J. Chemometr. 24 (2009) 418-423.
15. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, 2012. Model population analysis and its applications in chemical and biological modeling. TrAC 38 (2012) 154-162.
16. Li, H.-D., Y.-Z. Liang, Q.-S. Xu, et al., 2011. Recipe for Uncovering Predictive Genes using Support Vector Machines based on Model Population Analysis. IEEE/ACM T. Comput. Bi. 8 (2011) 1633-1641.
17. Yun, Y.-H., H.-D. Li, et al., 2013. An efficient method of wavelength interval selection based on random frog for multivariate spectral calibration. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 111 (2013) 31-36.
18. Yun, Y.-H., W.-T. Wang, et al., 2014. A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration. Anal. Chim. Acta 807 (2014) 36-43.
19. Li, H.-D., Q.-S. Xu, and Y.-Z. Liang. A phase diagram for gene selection and disease classification. bioRxiv, doi: 10.1101/002360.
20. Li, H.-D., Q.-S. Xu, W. Zhang, and Y.-Z. Liang, 2012. Variable Complementary Network: a novel approach for identifying biomarkers and their mutual associations. Metabolomics 8 (2012) 1218-1226.
If you use this library, please cite it as: Li H.-D., Xu Q.-S., Liang Y.-Z., libPLS: an integrated library for partial least squares regression and discriminant analysis. Chemom. Intell. Lab. Syst. 176 (2018) 34-43.
If you have any questions, please drop me a line at lhdcsu@gmail.com.