Reconstruct IR Spectra Based on PLS Model

chemometrics, python, scikit-learn

I am currently using the scikit-learn package in Python to set up PLS models (sklearn.cross_decomposition.PLSRegression) that predict the concentration of different substances from IR spectra. I would be interested to know whether it is also possible to go the other way round. For example, could I predict the concentration of a certain substance and then print out the spectrum that corresponds to this concentration according to the fitted PLS model? Is there a way to do this in Python?
The goal would be to obtain a "clean" spectrum for just the substance, even though the original sample might sometimes include impurities.

Best Answer

You can reconstruct the part of a spectrum explained by the PLS model.
That is often useful for the spectra the model is asked to predict. In particular, it lets you check what part of a spectrum is not explained by the model (out-of-model error/residuals) and whether that part is unusually large for a sample you want to predict. Depending on the application, scenario and data, it can also be insightful to check whether the reconstructed spectrum has higher intensity than the actually measured spectrum.

And yes, for model interpretation it can also be useful to look at what the model thinks the pure analyte spectrum looks like.

The goal would be to obtain a "clean" spectrum for just the substance

You won't get a pure component spectrum, though.

Consider a particular analyte signal/band that is overlaid by some strong interferent signal in your application and that, as a consequence, has low (or even no) correlation with the analyte concentration. This band should stay in the unexplained X variance*: that is the point of PLS regularization.
Explained spectra reconstructed from concentrations should therefore not reproduce this band either (i.e. they show just the average/center intensity there).


*Assuming you do not use too many latent variables; after all, PLS is used because it allows prediction with few latent variables.
If you go for the full PLS model, this band will eventually be modeled as well.
