> one way to sample is to apply argmax(softmax($\alpha_j$))
That is hardly "sampling", given that you deterministically pick the largest $\alpha_j$ every time. (Also, you said that $\alpha$ is the unnormalized probability, but that doesn't make sense, seeing as log-probabilities, i.e. logits, go into the softmax.) The correct way to sample would be sample(softmax($x$)), where $x$ is the vector of logits. Indeed, the goal of Gumbel-softmax is not to replace the softmax operation as you've written it, but the sampling operation:
We can replace sample($p$), where $p$ is a vector of probabilities, with argmax($\log p + g$), where $g$ is i.i.d. Gumbel(0, 1) noise. Of course, this is equivalent to argmax($x + g$), where $x$ are again the logits, since $\log p$ and $x$ differ only by a constant that the argmax ignores. To conclude, sample(softmax($x$)) and argmax($x+g$) are equivalent procedures; this is known as the Gumbel-max trick.
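A quick numerical check of this equivalence (a sketch; the logits here are arbitrary):

```python
# Empirical check of the Gumbel-max trick: sampling from softmax(x) and taking
# argmax(x + g) with g ~ Gumbel(0, 1) produce the same categorical distribution.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -1.0, 2.0])     # arbitrary logits
p = np.exp(x) / np.exp(x).sum()         # softmax probabilities

n = 200_000
# Procedure 1: sample directly from the categorical distribution softmax(x).
direct = rng.choice(len(x), size=n, p=p)

# Procedure 2: perturb the logits with Gumbel noise, then take the argmax.
g = rng.gumbel(size=(n, len(x)))
gumbel_max = np.argmax(x + g, axis=1)

freq_direct = np.bincount(direct, minlength=len(x)) / n
freq_gumbel = np.bincount(gumbel_max, minlength=len(x)) / n
print(freq_direct)   # both should be close to p
print(freq_gumbel)
```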
> Then, if the goal was to have the full distribution over possible outcomes for $z_j$, we can use softmax transformation on top of the perturbation with Gumbel noise.
In fact, you already have a distribution over all possible outcomes: softmax($x$) itself.
However, argmax($x+g$) is not differentiable w.r.t. $x$; therefore, to backpropagate, we replace its gradient with the gradient of softmax($(x+g)\tau^{-1}$). As $\tau \rightarrow 0$, this expression approaches the one-hot argmax.
Picking a reasonably small value of $\tau$ ensures a good estimate of the gradient while keeping the gradients numerically well behaved.
> and $\tau=1$ just makes the two equations identical
In fact, there is no special significance to $\tau = 1$. Rather, $\tau \rightarrow 0$ makes the gradient estimate unbiased but high in variance, whereas larger values of $\tau$ add more bias to the gradient estimate but lower the variance.
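To see the role of $\tau$ concretely, here is a small sketch (the logits and temperatures are arbitrary) that draws relaxed samples softmax$((x+g)/\tau)$ at a few temperatures:

```python
# Sketch of how the temperature tau trades off between a soft distribution and
# a hard argmax; the logits x and the tau values below are illustrative.
import numpy as np

def gumbel_softmax_sample(x, tau, rng):
    """One relaxed sample: softmax((x + g) / tau) with g ~ Gumbel(0, 1)."""
    g = rng.gumbel(size=x.shape)
    y = (x + g) / tau
    y = y - y.max()                      # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -1.0, 2.0])      # arbitrary logits

for tau in [10.0, 1.0, 0.1]:
    sample = gumbel_softmax_sample(x, tau, rng)
    print(f"tau={tau:5.1f}  sample={np.round(sample, 3)}")
# As tau -> 0 the sample approaches a one-hot vector (a hard argmax);
# as tau grows it flattens toward the uniform distribution.
```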
Gregory Gundersen wrote a blog post about this in 2018. He explicitly answers the question:
> What does a "random node" mean and what does it mean for backprop to "flow" or not flow through such a node?
The following excerpt should answer your questions:
Undifferentiable expectations
Let’s say we want to take the gradient w.r.t. $\theta$ of the following expectation, $$\mathbb{E}_{p(z)}[f_{\theta}(z)]$$ where $p$ is a density. Provided we can differentiate $f_{\theta}(z)$, we can easily compute the gradient:
$$ \begin{align} \nabla_{\theta} \mathbb{E}_{p(z)}[f_{\theta}(z)]
&= \nabla_{\theta} \Big[ \int_{z} p(z) f_{\theta}(z) dz \Big] \\
&= \int_{z} p(z) \Big[\nabla_{\theta} f_{\theta}(z) \Big] dz \\
&= \mathbb{E}_{p(z)} \Big[\nabla_{\theta} f_{\theta}(z) \Big] \end{align}
$$
In words, the gradient of the expectation is equal to the expectation of the gradient. But what happens if our density $p$ is also parameterized by $\theta$?
$$ \begin{align} \nabla_{\theta} \mathbb{E}_{p_{\theta}(z)}[f_{\theta}(z)] &= \nabla_{\theta} \Big[ \int_{z} p_{\theta}(z) f_{\theta}(z) dz \Big] \\ &= \int_{z} \nabla_{\theta} \Big[ p_{\theta}(z) f_{\theta}(z) \Big] dz \\ &= \int_{z} f_{\theta}(z) \nabla_{\theta} p_{\theta}(z)dz + \int_{z} p_{\theta}(z) \nabla_{\theta} f_{\theta}(z)dz \\ &= \underbrace{\int_{z} f_{\theta}(z) \nabla_{\theta} p_{\theta}(z)dz}_{\text{What about this?}} + \mathbb{E}_{p_{\theta}(z)} \Big[\nabla_{\theta} f_{\theta}(z)\Big] \end{align}$$
The first term of the last equation is not guaranteed to be an expectation. Monte Carlo methods require that we can sample from $p_{\theta}(z)$, but not that we can take its gradient. This is not a problem if we have an analytic solution to $\nabla_{\theta}p_{\theta}(z)$, but that is not available in general.
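To make the derivation concrete, here is a small Monte Carlo sketch with a toy choice (both made up for illustration) of $p_{\theta} = \mathcal{N}(\theta, 1)$ and $f_{\theta}(z) = \theta z$, so that $\mathbb{E}[f_{\theta}] = \theta^2$ and the true gradient is $2\theta$. The first term is estimated via the identity $\nabla_{\theta} p_{\theta} = p_{\theta} \nabla_{\theta} \log p_{\theta}$, which turns it back into an expectation:

```python
# Toy check of the gradient decomposition above, with p_theta = N(theta, 1)
# and f_theta(z) = theta * z, so E[f] = theta^2 and the true gradient is 2*theta.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.7
n = 1_000_000
z = rng.normal(theta, 1.0, size=n)       # samples from p_theta

# First term: int f grad_theta(p) dz = E[f(z) * grad_theta log p(z)],
# and for N(theta, 1) the score is grad_theta log p(z) = z - theta.
term1 = np.mean(theta * z * (z - theta))

# Second term: E[grad_theta f(z)] = E[z].
term2 = np.mean(z)

print(term1 + term2)   # should be close to 2 * theta
```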
Best Answer
After $\epsilon$ is sampled, it is completely known; we can treat it the same way as any other data (image, text, feature vector) that's input to a neural network. Just like your input data, $\epsilon$ is known and won't change after you sample it.
This means that the expression $z = \mu + \sigma \odot \epsilon$ has no random components after sampling: you know $\mu, \sigma$ because you obtained them from the encoder, and you know $\epsilon$ because you've sampled it and it is now fixed at a particular value. Therefore you can backpropagate through $\mu + \sigma \odot \epsilon$ with respect to $\mu, \sigma$, because all of its components are known and fixed.
By contrast, the expression $z \sim \mathcal{N}(\mu,\sigma^2)$ is not deterministic in $\mu, \sigma$, so you can't write a backprop expression with respect to $\mu, \sigma$ for it. Even though $\mu, \sigma$ are fixed, you can obtain any real number as an output.
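As a minimal sketch of this point (the numbers are arbitrary): once $\epsilon$ is drawn, $z = \mu + \sigma \epsilon$ is an ordinary deterministic function, and a finite-difference check recovers the gradients $\partial z / \partial \mu = 1$ and $\partial z / \partial \sigma = \epsilon$:

```python
# After eps is sampled it is a constant, so z = mu + sigma * eps is a plain
# deterministic function with gradients dz/dmu = 1 and dz/dsigma = eps.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 2.0
eps = rng.standard_normal()              # sampled once, then held fixed

def z(mu, sigma, eps):
    return mu + sigma * eps

# Finite-difference check of the gradients w.r.t. mu and sigma (eps fixed):
h = 1e-6
fd_mu = (z(mu + h, sigma, eps) - z(mu - h, sigma, eps)) / (2 * h)
fd_sigma = (z(mu, sigma + h, eps) - z(mu, sigma - h, eps)) / (2 * h)
print(fd_mu, fd_sigma)   # approx. 1.0 and approx. eps
```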