|
Partial Least Squares (PLS) is a quantitative spectral decomposition technique that is closely related to Principal Component Regression (PCR). However, in PLS, the decomposition is performed in a slightly different fashion. Instead of first decomposing the spectral matrix into a set of eigenvectors and scores, and regressing them against the concentrations as a separate step, PLS actually uses the concentration information during the decomposition process. This causes spectra containing higher constituent concentrations to be weighted more heavily than those with low concentrations. Thus, the eigenvectors and scores calculated using PLS are quite different from those of PCR. The main idea of PLS is to get as much concentration information as possible into the first few loading vectors.
In actuality, PLS is simply taking advantage of the correlation relationship that already exists between the spectral data and the constituent concentrations. Since the spectral data can be decomposed into its most common variations, so can the concentration data! In effect, this generates two sets of vectors and two sets of corresponding scores; one set for the spectral data, and the other for the constituent concentrations. Presumably, the two sets of scores are related to each other through some type of regression, and a calibration model is constructed.
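The heavier weighting of high-concentration spectra can be seen in a small numeric sketch. Before normalization, the first weighting vector from the algorithm section of this document is a concentration-weighted sum of the calibration spectra. The spectra and concentrations below are made-up values for illustration only, and no mean-centering is applied.

```python
# Two hypothetical calibration spectra (rows) at three wavelengths (columns).
spectra = [
    [1.0, 0.0, 0.0],  # sample with a low constituent concentration
    [0.0, 1.0, 0.5],  # sample with a high constituent concentration
]
concentrations = [0.1, 0.9]

# Unnormalized weighting vector: each spectrum is multiplied by its
# concentration and the results are summed wavelength by wavelength.
weights = [
    sum(c * row[j] for c, row in zip(concentrations, spectra))
    for j in range(len(spectra[0]))
]
print(weights)
```

The features of the second spectrum dominate the result because its concentration is nine times larger.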
 |
| PLS is similar to PCA/PCR. However, in PLS the constituent concentration data is included in the decomposition process. In fact, both the spectral and concentration data are decomposed simultaneously, and the scores (S and U) are "exchanged" as each new factor is added to the model. |
Now this is actually a bit of an over-simplification. Unlike PCR, PLS is a one-step process. In other words, there is no separate regression step. Instead, PLS performs the decomposition on both the spectral and concentration data simultaneously. As each new factor is calculated for the model, the scores are "swapped" before the contribution of the factor is removed from the raw data. The newly reduced data matrices are then used to calculate the next factor, and the process is repeated until the desired number of factors is calculated. Unfortunately, this makes the model equations for PLS significantly more complex than those of PCR. For those who are interested, the algorithms for calculating the PLS model eigenvectors and scores are shown in a later section.
As mentioned previously, one of the main advantages of PLS is that the resulting spectral vectors are directly related to the constituents of interest. This is entirely unlike PCR, where the vectors merely represent the most common spectral variations in the data, completely ignoring their relation to the constituents of interest until the final regression step.
 |
 |
| The vectors generated by PLS (especially PLS-1) are more directly related to the constituents of interest than those from PCA. The left column shows the spectra of the "pure" constituents used to construct the data set in the top image. The center column shows the first PLS-1 vector for each constituent calculated from the data set, while the right column shows the first two PCA vectors for the same data. |
There are actually two versions of the PLS algorithm: PLS-1 and PLS-2. The differences between these methods are subtle but have very important effects on the results. Like the PCR method, PLS-2 calibrates for all constituents simultaneously. In other words, the spectral decomposition for both PCR and PLS-2 gives one set of scores and one set of eigenvectors for calibration. Therefore, the calculated vectors are not optimized for each individual constituent. This may sacrifice some accuracy in the predictions of the constituent concentrations, especially for complex sample mixtures. In PLS-1, a separate set of scores and loading vectors is calculated for each constituent of interest. In this case, the separate sets of eigenvectors and scores are specifically tuned for each constituent and, therefore, should give more accurate predictions than PCR or PLS-2.
There is, however, a minor disadvantage in using the PLS-1 technique: the speed of calculation. Since a separate set of eigenvectors and scores must be generated for every constituent of interest, the calculations will take more time. For training sets with a large number of samples and constituents, the increased time of calculation can be significant.
PLS-1 may have the largest advantage when analyzing systems whose constituent concentrations vary widely. For example, suppose a set of calibration spectra contains one constituent in the concentration range of 50 to 70% and a second constituent in the range of 0.1 to 0.5%. In this case, PLS-1 will almost certainly predict better than the other techniques. If the concentration ranges of the constituents are approximately the same, PLS-1 may have less of an advantage over PLS-2 and will definitely take longer to calculate.
PLS Advantages
· Combines the full spectral coverage of CLS with partial composition regression of ILS.
· Single step decomposition and regression; eigenvectors are directly related to constituents of interest rather than largest common spectral variations.
· Calibrations are generally more robust provided that calibration set accurately reflects range of variability expected in unknown samples.
· Can be used for very complex mixtures since only knowledge of constituents of interest is required.
· Can sometimes be used to predict samples with constituents (contaminants) not present in the original calibration mixtures.
While all of these techniques have been successfully applied for spectral quantitative analysis, the arguments in the literature generally show that PLS has superior predictive ability. In most cases, PLS methods will give better results than PCR, and PLS-1 will be more accurate than PLS-2. However, there are many documented cases in the literature where certain calibrations have performed better by using PCR or PLS-2 instead of PLS-1. Unfortunately, there are no definite rules, and only good research practices can determine the best model for each individual system.
PLS Disadvantages
· Calculations are slower than most Classical methods, especially PLS-1.
· Models are more abstract, thus more difficult to understand and interpret.
· Generally, a large number of samples are required for accurate calibration.
· Collecting calibration samples can be difficult; must avoid collinear constituent concentrations.
Calculating PLS Eigenvectors and Scores
This section is for those who are interested in knowing the mechanics of the PLS calculation. As mentioned above, there are two variants of this algorithm known as PLS-1 and PLS-2. In fact, PLS-1 is a reduced subset of the full PLS-2. The algorithms have been combined here, with appropriate notes on where they differ. Note that a PLS-2 model of a training set with only one constituent is identical to a PLS-1 model for the same data.
The main difference between PLS and PCR is that the concentration information is included in the calculations during the spectral decomposition. This results in two sets of eigenvectors: a set of spectral "loadings" (Bx), which represent the common variations in the spectral data, and a set of spectral "weights" (W), which represent the changes in the spectra that correspond to the regression constituents. Correspondingly, there are two sets of scores: one for the spectral data (S) and another for the concentration data (U).
The following description assumes that the matrices involved have the following dimensions: A is an n by p matrix of spectral absorbances, C is an n by m matrix of constituent concentrations, S is an f by n matrix of spectral scores, U is an f by n matrix of concentration-weighted scores, Bx is an f by p matrix of spectral loading vectors, W is an f by p matrix of spectral weighting vectors, By is an f by m matrix of the constituent loading vectors, and V is a 1 by f vector of the PLS model cross products. In this case, n is the number of samples (spectra), p is the number of data points (wavelengths), m is the number of constituents, and f is the number of PLS eigenvectors. When used, a subscript on a matrix indicates a matrix row, and a prime (') indicates a transpose.
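As a compact summary in these symbols, the two deflation steps of the algorithm (steps 12 and 13) imply the following factor model once all f factors have been removed. E_A and E_C are labels introduced here for the residual matrices left in A and C after deflation; they are not part of the original notation.

```latex
A = \sum_{i=1}^{f} S_i' \, B_{x,i} + E_A ,
\qquad
C = \sum_{i=1}^{f} V_i \, S_i' \, B_{y,i} + E_C
```

Each term in the first sum is an n by p matrix and each term in the second is an n by m matrix, matching the dimensions of A and C.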
| 1. |
Set the weighting scores to a starting value: Ui = C'1 (For PLS-1 or single constituent models, use the desired constituent vector. For PLS-2, use the first constituent column vector.) |
| 2. |
Calculate the spectral weighting vector: Wi = Ui A |
| 3. |
Normalize the weighting vector to unit length: Wi = Wi / sqrt(Wi Wi') |
| 4. |
Calculate the spectral scores: Si = Wi A' (For PLS-1 or single constituent model, set Byi = 1 and skip to step 9.) |
| 5. |
Calculate the concentration loading vector: Byi = Si C |
| 6. |
Normalize the concentration loading vector to unit length: Byi = Byi / sqrt(Byi Byi') |
| 7. |
Calculate new weighting scores: Ui = Byi C' |
| 8. |
Check for convergence by comparing new Ui scores to the previous pass for this vector. If this is the first pass for the current vector, or the scores are not the same, go back to step 2. If the scores are effectively the same, continue with step 9. |
| 9. |
Calculate the PLS cross product for this vector: Vi = Si Ui' / (Si Si') |
| 10. |
Calculate the spectral loading vector: Bxi = Si A |
| 11. |
Normalize the spectral loading vector by the spectral scores: Bxi = Bxi / (Si Si') |
| 12. |
Remove contribution of the vector from the spectral data: A = A - Si' Bxi |
| 13. |
Remove contribution of the vector from the concentration data: C = C - (Si' Byi) Vi |
| 14. |
Increase vector counter, i = i + 1 and go back to step 1. Continue until all desired factors are calculated (i = f). |
| 15. |
If performing PLS-1, reset A back to the original training set values and redo all steps using a different constituent in step 1. Note that this generates a completely different set of S, U, W, Bx, By and V matrices for every constituent! |
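The steps above can be sketched in code for the PLS-1 case, where the concentration loading is set to 1 and the convergence loop (steps 5 to 8) is skipped. This is a minimal illustration under stated assumptions, not a reference implementation: every vector is read as a row, as in the dimension list; the function name pls1_calibrate and the helper dot are hypothetical; plain Python lists are used to avoid dependencies; and there is no guard against a zero-length weighting vector, which occurs once the data have been fully deflated.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pls1_calibrate(A, c, f):
    """Sketch of an f-factor PLS-1 calibration for one constituent.

    A is a list of n spectra (each a list of p absorbances) and c is the
    list of n concentrations. Returns the lists W, Bx, V and S, with one
    entry per factor.
    """
    A = [list(row) for row in A]  # copies: both are deflated below
    c = list(c)
    W, Bx, V, S = [], [], [], []
    for _ in range(f):
        columns = list(zip(*A))
        u = c  # step 1: start from the (deflated) constituent vector
        w = [dot(u, col) for col in columns]  # step 2: weighting vector
        length = sqrt(dot(w, w))  # zero if nothing is left to model
        w = [x / length for x in w]  # step 3: scale to unit length
        s = [dot(row, w) for row in A]  # step 4: spectral scores
        # PLS-1 sets the concentration loading to 1 and skips steps 5-8.
        ss = dot(s, s)
        v = dot(s, u) / ss  # step 9: cross product
        bx = [dot(s, col) / ss for col in columns]  # steps 10-11
        # Steps 12-13: remove this factor from spectra and concentrations.
        A = [[a - si * b for a, b in zip(row, bx)] for row, si in zip(A, s)]
        c = [ci - si * v for ci, si in zip(c, s)]
        W.append(w)
        Bx.append(bx)
        V.append(v)
        S.append(s)
    return W, Bx, V, S


# Made-up, noise-free data: each spectrum is c times the pure spectrum [3, 4].
spectra = [[3.0, 4.0], [6.0, 8.0], [9.0, 12.0]]
conc = [1.0, 2.0, 3.0]
W, Bx, V, S = pls1_calibrate(spectra, conc, f=1)
print(W[0], Bx[0], V[0])
```

For this rank-one example a single factor captures everything: the weighting vector is the pure spectrum scaled to unit length, and the scores are proportional to the concentrations. For several constituents, the function would be called once per concentration column, each time starting from the original spectra (step 15).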
Predicting Samples with a PLS Model
The following are the calculation steps used to predict a spectrum against a PLS model. The variable descriptions are as above, except that Au is the 1 by p vector of the spectral responses of the sample being predicted, and Cu is the 1 by m vector of the predicted constituent concentrations. Initially, Cu is set to zero and the vector counter i is set to one.
| 1. |
Calculate the unknown spectral score for a weighting vector: Si = Wi Au' |
| 2. |
Calculate the concentration contribution for the vector: Cu = Cu + (Byi Si Vi) |
| 3. |
Remove the spectral contribution of the vector: Au = Au - (Si Bxi) |
| 4. |
Increment the vector counter i = i + 1 and go back to step 1. Continue until all desired factors are calculated (i = f). |
| 5. |
If performing PLS-1, reset the data in Au back to the original unknown spectrum values and repeat from step 1 with the next set of constituent vectors.
Note that the data remaining in the Au vector after all factors have been removed is the residual spectrum. |
|
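The prediction steps can be sketched the same way for a single constituent (PLS-1, so the concentration loading is 1). The function name pls1_predict and the one-factor model values are hypothetical and chosen by hand for illustration; vectors are read as rows, so the score for each factor is a single number.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pls1_predict(spectrum, W, Bx, V):
    """Predict one constituent from one spectrum with a PLS-1 model.

    W and Bx hold one weighting and one loading vector per factor, and V
    holds one cross product per factor. Returns the predicted
    concentration and the residual spectrum.
    """
    residual = list(spectrum)  # copy, so the caller's spectrum is untouched
    concentration = 0.0
    for w, bx, v in zip(W, Bx, V):
        s = dot(residual, w)  # step 1: score of the unknown on this factor
        concentration += s * v  # step 2: add this factor's contribution
        # Step 3: remove this factor's contribution from the spectrum.
        residual = [a - s * b for a, b in zip(residual, bx)]
    return concentration, residual


# Hypothetical one-factor model and unknown spectrum (illustrative values).
W = [[0.6, 0.8]]
Bx = [[0.6, 0.8]]
V = [0.2]
predicted, residual = pls1_predict([4.5, 6.0], W, Bx, V)
print(predicted, residual)
```

With these values the predicted concentration comes out at about 1.5 and the residual spectrum is essentially zero, because the unknown is an exact multiple of the modeled spectrum. A large residual would suggest spectral features that the calibration set did not contain.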