I am computing a standard linear regression subject to a positivity constraint, using non-negative least squares (lsqnonneg in Matlab, actually). Is it possible to compute formal errors for a non-negative least squares problem, and if so, how? If no analytical form of the errors exists, I could just bootstrap, I suppose.
For a regular problem I would use something like:
mcov = sigma^2*inv(G'*G);
where sigma is the standard deviation of the data noise (so sigma^2 is the noise variance) and G is the system matrix. But G'*G is very close to singular for my problem, and so mcov blows up.
Best Answer
If you would be OK using R, I think you could use bbmle's mle2 function to optimize the least squares likelihood and calculate 95% confidence intervals on the nonnegative nnls coefficients. Furthermore, you can take into account that your coefficients cannot go negative by optimizing the log of your coefficients, so that on a back-transformed scale they can never become negative. Here is a numerical example illustrating this approach, in the context of deconvoluting a superposition of Gaussian-shaped chromatographic peaks with Gaussian noise on them (any comments welcome).
First, let's simulate some data:
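Something along these lines; this is a minimal sketch in which the peak positions, amplitudes, kernel width and noise level are all assumptions of mine, since the original numbers are not shown:

set.seed(1)
n <- 200                   # number of time points
x <- 1:n
gauspeak <- function(x, u, w, h = 1) h * exp(-((x - u)^2) / (2 * w^2))
u <- c(40, 80, 110, 160)   # true peak positions (assumed)
a <- c(2, 1.5, 3, 1)       # true peak amplitudes (assumed)
w <- 5                     # width of the known Gaussian blur kernel (assumed)
y_true <- rowSums(sapply(seq_along(u), function(i) gauspeak(x, u[i], w, a[i])))
sigma <- 0.1               # sd of the additive Gaussian noise (assumed)
y <- y_true + rnorm(n, 0, sigma)   # measured noisy signal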
Let's now deconvolute the measured noisy signal y with a banded matrix bM containing shifted copies of the known Gaussian-shaped blur kernel (this is our covariate/design matrix). First, let's deconvolute the signal with nonnegative least squares:
Now let's optimize the negative log-likelihood of our Gaussian loss objective, working with the log of the coefficients so that on a back-transformed scale they can never be negative:
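A hedged sketch of this with bbmle's mle2; restricting the fit to the covariates that got positive nnls coefficients, and using quadratic (Wald) confidence intervals via confint(..., method = "quad"), are my assumptions here:

library(bbmle)
sel <- which(coefs_nnls > 0)   # support selected by the nnls fit (assumption)
X <- bM[, sel, drop = FALSE]
k <- length(sel)
# negative log-likelihood as a function of the log coefficients and log sigma
negll <- function(pars) {
  logbeta  <- pars[1:k]
  logsigma <- pars[k + 1]
  mu <- as.vector(X %*% exp(logbeta))  # exp() keeps the coefficients positive
  -sum(dnorm(y, mean = mu, sd = exp(logsigma), log = TRUE))
}
pnames <- c(paste0("logbeta", seq_len(k)), "logsigma")
parnames(negll) <- pnames
start <- setNames(c(log(coefs_nnls[sel]), log(sd(y - yhat_nnls))), pnames)
fit_mle2 <- mle2(negll, start = start, vecpar = TRUE, method = "BFGS")
ci_log <- confint(fit_mle2, method = "quad")  # quadratic (Wald) 95% CIs, log scale
exp(ci_log)                                   # back-transformed: always positive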
I haven't tried to compare the performance of this method relative to either nonparametric or parametric bootstrapping, but it's surely much faster.
I was also inclined to think that I should be able to calculate Wald confidence intervals for the nonnegative nnls coefficients based on the information matrix, calculated on a log-transformed scale to enforce the nonnegativity constraints and evaluated at the nnls estimates. I think this goes like this:
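A sketch of that calculation under the Gaussian likelihood, reusing X, k, sel, coefs_nnls and yhat_nnls from the sketches above (so the same support-restriction assumption applies):

beta_hat  <- coefs_nnls[sel]                                 # nnls estimates on the support
sigma_hat <- sqrt(sum((y - yhat_nnls)^2) / (length(y) - k))  # residual sd
# for eta = log(beta), the Jacobian of the mean is J = X %*% diag(beta),
# so the Fisher information is I(eta) = t(J) %*% J / sigma^2
J <- X %*% diag(beta_hat, nrow = k)
I_eta <- t(J) %*% J / sigma_hat^2
se_log <- sqrt(diag(solve(I_eta)))           # standard errors of log(beta)
ci_wald <- cbind(lower = beta_hat * exp(-1.96 * se_log),
                 upper = beta_hat * exp( 1.96 * se_log))
ci_wald   # back-transformed Wald 95% CIs, guaranteed nonnegative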
The results of these calculations and the ones returned by mle2 are nearly identical (but this route is much faster), so I think this is right, and would correspond to what we were implicitly doing with mle2.
Just refitting the covariates that got positive coefficients in an nnls fit using a regular linear model, btw, does not work, since such a fit would not take the nonnegativity constraints into account and so could produce nonsensical confidence intervals that go negative. The paper "Exact post model selection inference for marginal screening" by Jason Lee & Jonathan Taylor also presents a method to do post-model-selection inference on nonnegative nnls (or LASSO) coefficients, using truncated Gaussian distributions. I haven't seen any openly available implementation of this method for nnls fits though - for LASSO fits there is the selectiveInference package that does something like that. If anyone happens to have an implementation, please let me know!