|
There are many advantages to using spectroscopy as a detection technique for quality control of complex samples. It is fast, requires little or no sample preparation for most types of samples, and can be implemented at or near the source of the samples. However, quantitative methods are often employed simply to gauge the suitability of the material being measured. In a significant number of cases, the only result desired is whether the sample falls within a defined range of allowed variability, which determines if the material is of the desired quality. It is not always necessary to measure the quantities of the constituents in the sample to meet this goal.
Multivariate quantitative models such as Principal Component Regression (PCR) and Partial Least Squares (PLS) generally require a large number of training samples to build accurate calibrations. In turn, this requires considerable initial work collecting all the samples and measuring the concentrations of the constituents by the primary method before the data can even be used for model building. The accuracy of the calibration is limited by the accuracy of the primary method used to get the concentration values. If the primary method is inaccurate, the multivariate model will be inaccurate as well. If all that is required is to know that a sample is of a given quality, and the quantities of the constituents are not needed, a quantitative model adds a great deal of extra work simply to determine whether the sample is the same as the training set data.
In addition, the quantities of the constituents are usually not the whole story when measuring product quality. Samples can sometimes be contaminated with other compounds and impurities. Quantitative models will generally predict reasonable values for the calibrated constituents, provided the spectra of the unknown samples are fairly similar to the training set, but the reported concentrations alone will not indicate whether the samples are contaminated.
In some cases, the constituent information is simply not available for the samples to be calibrated. There may not be a primary calibration method available for the constituent(s) of interest, or the samples may simply be too complex. Another very likely possibility is that it is possible to collect all the primary constituent information, but the work involved would be prohibitively expensive. However, the spectrum of a sample is unique to the composition of its constituents. Samples of the same or similar composition and quality should have spectra that are very similar as well. Theoretically, it should be possible to tell the difference between a "good" sample and a "bad" one by comparing only their spectra.
Unfortunately, the tolerances required for determining the differences between spectra in quality control applications cannot usually be met by simple methods such as visual inspection or Spectral Subtraction. These methods require user interaction, which makes them subjective and therefore inappropriate for quality control, and they cannot be easily used by anyone other than a trained spectroscopist. What is needed instead is an unambiguous mathematical method for spectral matching.
What has been described here is the basis of discriminant analysis. Its primary purpose is to classify samples into well-defined groups or categories based on a training set of similar samples, with little or no prior knowledge of the composition of the group samples. The ultimate aim of discriminant analysis is to unambiguously determine the identity or quality of an unknown sample. A good discriminant algorithm is one that can "learn" what the spectrum of a sample looks like by "training" it with spectra of the same material. For this reason, discriminant analysis is sometimes called pattern recognition.
There are two basic applications for spectroscopic discriminant analysis: sample purity/quality and sample identification/screening. In the capacity of sample quality checking, discriminant analysis methods can replace many quantitative methods currently used. In effect, the algorithm gives an indication of whether the spectrum of the "unknown" sample matches the spectra from samples taken previously that were known to be of "good" quality. Some algorithms can even give statistical measurements of the quality of the match.
 |
| Quality control/assurance application of spectroscopic discriminant analysis. The spectrum of the sample is compared against the model to determine if it matches the training data for the model. If the training set was constructed from spectra of samples that were of known quality, the model can accurately predict if the sample is of the same quality by matching the spectrum and giving a "yes" or "no" answer. |
When discriminant analysis is used in a product identification or product screening mode, the spectrum of the "unknown" is compared against multiple models. The algorithm will give an indication of the likelihood of the spectrum matching a model and the product can then be identified as a particular material. This mode of discriminant analysis is sometimes used for grading materials as well. For this application, each model is built from a set of samples that represent a particular grade/purity/quality of the material. When the unknown spectrum is predicted against the models, the material is classified as the closest match (or no match at all).
The analyst can control how the discrimination is calculated. Any samples in the training set become representative of the allowed form of the spectrum. For example, discriminant analysis could also be used to classify samples into chemical classes by making training sets of spectra of different compounds that share similar functional groups. As long as enough samples are used to represent the range of variability found in those types of compounds, "unknowns" could be chemically classified by comparing them to all the training sets and looking for a match.
 |
| Sample identification/screening application of spectroscopic discriminant analysis. The spectrum of the sample is compared to multiple models of different materials or different levels of quality of the same material. The models can predict the likelihood that the sample matches the training spectra they were constructed from, again giving a "yes" or "no" answer. |
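
The identification mode described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from the text: each hypothetical `ClassModel` stores the mean and per-wavelength standard deviation of its training spectra, the match score is the root-mean-square deviation in units of that standard deviation, and the acceptance `limit` of 3.0 is an arbitrary choice. The spectra are synthetic.

```python
import numpy as np

class ClassModel:
    """Hypothetical per-class model: mean and spread of the training spectra."""

    def __init__(self, name, training_spectra):
        X = np.asarray(training_spectra, dtype=float)
        self.name = name
        self.mean = X.mean(axis=0)
        # Floor the spread so wavelengths with no variation do not divide by zero.
        self.std = np.maximum(X.std(axis=0, ddof=1), 1e-6)

    def distance(self, spectrum):
        # RMS deviation from the class mean, in training standard deviations.
        z = (np.asarray(spectrum, dtype=float) - self.mean) / self.std
        return float(np.sqrt(np.mean(z ** 2)))

def identify(spectrum, models, limit=3.0):
    """Return (name of closest model, or None if nothing is within limit, all distances)."""
    distances = {m.name: m.distance(spectrum) for m in models}
    best = min(distances, key=distances.get)
    return (best if distances[best] <= limit else None), distances

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)

def band(center, width=0.03):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def make_training_set(additive_level, n=25):
    main = rng.normal(1.0, 0.05, size=(n, 1)) * band(0.3)
    additive = rng.normal(additive_level, 0.03, size=(n, 1)) * band(0.6)
    return main + additive + rng.normal(0, 0.01, size=(n, x.size))

models = [
    ClassModel("grade_a", make_training_set(0.1)),
    ClassModel("grade_b", make_training_set(0.5)),
]

new_a = 1.02 * band(0.3) + 0.1 * band(0.6) + rng.normal(0, 0.01, x.size)
contaminated = new_a + 0.3 * band(0.9)  # extra band absent from both training sets

match_a, distances_a = identify(new_a, models)
match_bad, distances_bad = identify(contaminated, models)
print(match_a, distances_a)
print(match_bad, distances_bad)
```

Because each model is built from many training spectra, it carries its own range of allowed variability, so a sample that fits no model is reported as "no match" rather than being forced into the nearest class.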
A vast number of useful analytical problems can be solved by discriminant analysis. The main advantage of these methods is that they are generally easier to apply to spectroscopic analysis than quantitative methods, since they do not need any primary calibration data to build the model. They give simple "pass" or "fail" answers by comparing the samples to training sets of samples of the desired quality. They learn to recognize the spectra of materials based entirely on the spectral data itself, with no external information other than the analyst’s logical grouping of the spectra into training sets.
Many different methods have been developed for performing discriminant analysis on spectra. One class of algorithm that is already familiar to many spectroscopists is Spectral Library Searching using Euclidean Distance. In these algorithms, the spectrum of an unknown sample is compared against many different spectra in a library of known compounds. By comparing the responses at all wavelengths in the "unknown" spectrum to the corresponding responses in a series of known (or "library") spectra, a list of the closest matches can be identified by ranking the known spectra by a calculated "Hit Quality Index".
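
As a rough illustration of this kind of search, the sketch below ranks library entries by the Euclidean distance between unit-length-normalized spectra. The function names, the normalization step, and the use of the raw distance as the hit score (0 meaning identical shape) are assumptions made for the example; commercial programs scale their Hit Quality Index in various ways. The library contents are synthetic.

```python
import numpy as np

def unit_normalize(spectrum):
    """Scale a spectrum to unit vector length so overall intensity is ignored."""
    spectrum = np.asarray(spectrum, dtype=float)
    norm = np.linalg.norm(spectrum)
    if norm == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return spectrum / norm

def library_search(unknown, library):
    """Return (name, distance) pairs sorted from closest to farthest match."""
    u = unit_normalize(unknown)
    hits = [
        (name, float(np.linalg.norm(u - unit_normalize(reference))))
        for name, reference in library.items()
    ]
    return sorted(hits, key=lambda hit: hit[1])

# Synthetic library of three hypothetical compounds on a common wavelength axis.
x = np.linspace(0, 1, 200)

def band(center, width=0.03):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

library = {
    "compound_a": band(0.3) + 0.5 * band(0.7),
    "compound_b": band(0.5),
    "compound_c": band(0.2) + band(0.8),
}

# The unknown is compound_a measured at twice the intensity.
unknown = 2.0 * (band(0.3) + 0.5 * band(0.7))
hits = library_search(unknown, library)
for name, distance in hits:
    print(f"{name}: {distance:.4f}")
```

Note that the search always returns a ranked list, even when none of the library entries is the true compound; the top hit is only the closest of the candidates available.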
Many commercially available library search programs use these techniques to generate a list of the most likely matches of the unknown sample. However, there are many problems with this technique. First, search techniques simply identify samples as the materials from the closest matching spectrum in the library. If the library does not contain any spectra of the "true" compound, it will just report the best match it found regardless of whether it is really even the same class of material.
In addition, the spectral library search technique is sensitive only to the general spectral shapes and patterns, and not to very subtle variations within the sample. If the variations between the spectra of a "good" sample and a "bad" sample cannot be easily seen by visual inspection, chances are a spectral library search will not be able to distinguish them either. Typically, Spectral Library Search algorithms cannot be trained to recognize a range of variability in the data, since the spectrum of the unknown is compared to only a single representative spectrum for each different class of material.
Another problem is that the spectra must have flat baselines in order for these methods to work properly. As seen earlier in the discussions of preprocessing methods for quantitative spectroscopy, there are methods for accomplishing this with little or no user interaction. However, these methods are very sensitive to baseline instabilities, and the correction applied must be very good to have any degree of success with these methods.
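
The baseline sensitivity is easy to see numerically. In the minimal sketch below (synthetic data; the offset and slope values are arbitrary assumptions), the same absorption band is compared with itself after a flat offset or a sloping baseline has been added, using the Euclidean distance between unit-length-normalized spectra.

```python
import numpy as np

x = np.linspace(0, 1, 200)
reference = np.exp(-0.5 * ((x - 0.4) / 0.03) ** 2)  # single synthetic band

def normalized_distance(a, b):
    """Euclidean distance between two spectra after unit-length normalization."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.linalg.norm(a - b))

same_sample = reference.copy()
offset_sample = reference + 0.2      # same band on a flat baseline offset
sloped_sample = reference + 0.3 * x  # same band on a sloping baseline

d_same = normalized_distance(reference, same_sample)
d_offset = normalized_distance(reference, offset_sample)
d_sloped = normalized_distance(reference, sloped_sample)
print(d_same, d_offset, d_sloped)
```

Although the chemical information in all three spectra is identical, the two spectra with baseline artifacts sit far from the reference, which is why uncorrected baselines can push the true compound down the hit list.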
Finally, the "Hit Quality Index" does not provide any absolute measure of the probability that the sample actually is the same as the library sample. The arbitrary scale of the Hit Quality values (0 to 1) does not give a very good statistical measure of the similarity of the spectra. In short, using only a single training spectrum to represent all possible samples in the future does not give the analyst any statistical assurance that the spectra are truly the same or different. It only provides a relative measure for all the library samples. For anyone who has tried simple library search techniques for spectrally similar samples, this result is all too obvious.
There have been many other methods put forth in the literature including K-nearest neighbor, Cluster analysis, PCA Factorial Discriminant Analysis, SIMCA, BEAST, and others (see Algorithm References).
Many of these methods use Principal Component Analysis (PCA) as a spectral data compression technique. PCA decomposes a set of spectra into their most common variations (factors) and produces a small set of well-defined numbers (scores) for each sample that represent the amount of each variation present in the spectrum. Just as the PCR and PLS quantitation methods use these scores to create a calibration equation, the scores can also be used for discrimination, since they provide an accurate description of the entire set of training spectra. However, many of the methods listed above utilize only the first few significant factors for discrimination. In many cases, only the first two factors are used. Thus, only a limited portion of the total spectral information available for a class of material is actually used; the rest is simply discarded. |
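
The decomposition into factors and scores can be sketched with a singular value decomposition. This is a minimal illustration, assuming mean-centered spectra and synthetic two-component training data; the function names `pca_fit` and `pca_scores` are hypothetical and do not come from any particular package.

```python
import numpy as np

def pca_fit(spectra, n_factors):
    """Return mean spectrum, factors (rows), training scores, and explained variance."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the mean-centered data; rows of vt are the orthonormal factors.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    factors = vt[:n_factors]
    scores = (X - mean) @ factors.T
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, factors, scores, explained[:n_factors]

def pca_scores(spectrum, mean, factors):
    """Project one spectrum onto the retained factors."""
    return (np.asarray(spectrum, dtype=float) - mean) @ factors.T

# Synthetic training set: two bands whose amounts vary independently, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
band_1 = np.exp(-0.5 * ((x - 0.3) / 0.03) ** 2)
band_2 = np.exp(-0.5 * ((x - 0.7) / 0.03) ** 2)
amounts = rng.uniform(0.5, 1.5, size=(30, 2))
training = amounts @ np.vstack([band_1, band_2]) + rng.normal(0, 0.002, size=(30, 200))

mean, factors, scores, explained = pca_fit(training, n_factors=2)
new_scores = pca_scores(training[0], mean, factors)
print(scores.shape, explained)
```

Each 200-point spectrum is reduced here to two scores. Whatever variation is not captured by the retained factors is left in the residual, which is the information that methods using only the first few factors throw away.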