How to propagate uncertainty into the prediction of a neural network

error-propagation, machine-learning, neural-networks, prediction, predictive-models

I have inputs $x_1\ldots x_n$ that have known $1\sigma$ uncertainties $\epsilon_1 \ldots \epsilon_n$. I am using them to predict outputs $y_1 \ldots y_m$ with a trained neural network. How can I obtain $1\sigma$ uncertainties on my predictions?

My idea is to randomly perturb each input $x_i$ with normal noise having mean 0 and standard deviation $\epsilon_i$ a large number of times (say, 10000), and then take the median and standard deviation of each predicted output $y_j$ over those runs. Does this work?
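For concreteness, here is a minimal sketch of what I have in mind (NumPy; the trained network is stood in for by a generic callable `predict`, whose name and interface are just placeholders):

```python
import numpy as np

def propagate(predict, x, eps, n_samples=10000, seed=0):
    """Monte Carlo propagation of input uncertainty through a trained model.

    predict : callable mapping an input vector of shape (n,) to an output
              vector of shape (m,) -- placeholder for the trained network.
    x       : measured inputs, shape (n,)
    eps     : 1-sigma input uncertainties, shape (n,)
    """
    rng = np.random.default_rng(seed)
    # Perturbed inputs: each row is x plus noise drawn from N(0, diag(eps^2))
    xs = x + rng.normal(0.0, eps, size=(n_samples, len(x)))
    # Push every perturbed input through the network
    ys = np.array([predict(xi) for xi in xs])        # shape (n_samples, m)
    # Summarise each output dimension over the perturbed runs
    return np.median(ys, axis=0), np.std(ys, axis=0)

# Toy stand-in for a trained network, just to make the sketch runnable
predict = lambda x: np.array([x[0] ** 2 + np.sin(x[1]), x[0] * x[1]])
median, sigma = propagate(predict, x=np.array([1.0, 2.0]), eps=np.array([0.1, 0.05]))
```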

I fear that this only takes into account the "random" error (from the measurements) and not the "systematic" error (from the network), i.e., each prediction inherently has some error to it that is not being considered in this approach. How can I properly obtain $1\sigma$ error bars on my predictions?

Best Answer

$\newcommand{\bx}{\mathbf{x}}$ $\newcommand{\by}{\mathbf{y}}$

I personally prefer the Monte Carlo approach because of its ease of use. There are alternatives (e.g. the unscented transform), but these are biased.

Let me formalise your problem a bit. You are using a neural network to implement a conditional probability distribution over the outputs $\by$ given the inputs $\bx$, where the weights are collected in $\theta$:

$$ p_\theta(\by~\mid~\bx). $$

Let us not worry about how you obtained the weights $\theta$ (probably some kind of backprop) and just treat the trained network as a black box that has been handed to us.

As an additional property of your problem, you assume that you only have access to a "noisy version" $\tilde \bx$ of the actual input $\bx$, where $$\tilde \bx = \bx + \epsilon$$ with $\epsilon$ following some distribution, e.g. Gaussian. You can then write $$ p(\tilde \bx\mid\bx) = \mathcal{N}(\tilde \bx \mid \bx, \sigma^2_\epsilon), $$ where $\epsilon \sim \mathcal{N}(0, \sigma^2_\epsilon)$; in your case $\sigma^2_\epsilon$ is a diagonal covariance built from the squared input uncertainties $\epsilon_i^2$. What you want is then the distribution $$ p(\by\mid\tilde \bx) = \int p(\by\mid\bx)\, p(\bx\mid\tilde \bx)\, d\bx, $$ i.e. the distribution over outputs given the noisy input and a model mapping clean inputs to outputs.

Now, if you can invert $p(\tilde \bx\mid\bx)$ to obtain $p(\bx\mid\tilde \bx)$ (which you can in the case of a Gaussian random variable, among others), you can approximate the above integral with plain Monte Carlo integration through sampling:

$$ p(\by\mid\tilde \bx) \approx \frac{1}{N} \sum_{i=1}^N p(\by\mid\bx_i), \quad \bx_i \sim p(\bx\mid\tilde \bx). $$
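To make the "inversion" concrete in the Gaussian case (assuming a flat prior over $\bx$, which is implicitly what your perturbation scheme does), Bayes' rule gives

$$ p(\bx\mid\tilde \bx) \propto p(\tilde \bx\mid\bx)\, p(\bx) \propto \mathcal{N}(\tilde \bx \mid \bx, \sigma^2_\epsilon) = \mathcal{N}(\bx \mid \tilde \bx, \sigma^2_\epsilon), $$

where the last step uses the symmetry of the Gaussian density in its first two arguments. Drawing $\bx_i \sim p(\bx\mid\tilde \bx)$ therefore amounts to perturbing the measured $\tilde \bx$ with the known measurement noise, which is exactly the sampling scheme you propose.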

Note that this can also be used to calculate all other kinds of expectations of functions $f$ of $\by$:

$$ \mathbb{E}\!\left[f(\by)\mid\tilde \bx\right] \approx \frac{1}{N} \sum_{i=1}^N f(\by_i), \quad \bx_i \sim p(\bx\mid\tilde \bx), \quad \by_i \sim p(\by\mid\bx_i). $$
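As a sketch of this estimator (assuming, hypothetically, that the network exposes its predictive distribution as a Gaussian through a callable `predictive` that returns a mean and standard deviation; if your network is deterministic, the inner sampling step simply disappears):

```python
import numpy as np

def mc_expectation(predictive, x_tilde, eps, f, n_samples=10000, seed=0):
    """Monte Carlo estimate of E[f(y) | x_tilde].

    predictive : callable returning (mean, std) of p(y | x) for one input x
                 -- a hypothetical interface to the trained network.
    x_tilde    : noisy measurement of the inputs, shape (n,)
    eps        : 1-sigma input uncertainties, shape (n,)
    f          : function of the output vector whose expectation is wanted.
    """
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_samples):
        # x_i ~ p(x | x_tilde) = N(x_tilde, diag(eps^2)) under a flat prior on x
        xi = x_tilde + rng.normal(0.0, eps)
        # y_i ~ p(y | x_i), here assumed Gaussian with network-supplied moments
        mu, sigma = predictive(xi)
        yi = rng.normal(mu, sigma)
        vals.append(f(yi))
    return np.mean(np.asarray(vals), axis=0)  # (1/N) * sum_i f(y_i)
```

With $f(\by) = \by$ and $f(\by) = \by^2$ (element-wise) you recover the predictive mean and, via the variance, $1\sigma$ error bars that include both the input noise and whatever output noise the network itself models.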

If you cannot invert the noise model to obtain $p(\bx\mid\tilde \bx)$, then without further assumptions only biased approximations are available.
