I'm trying to understand how NMF is derived, and I got the basic idea of NMF,
that is, it tries to approximate the original matrix $V$ with $WH$, where $V$ are non-negative, and $W,H$ are constrained to be non-negative.
Question 1
When I approach this problem, it's intuitively clear that I should minimize the reconstruction error,
$$Err(W,H) = \|V-WH\|^2\,,$$
subject to $W \ge 0$ and $H \ge 0$ (elementwise). If $Err(W,H)$ is the objective function, then I have a problem defining the constraints. How do I constrain $W,H$ to be non-negative?
Question 2
The paper I'm reading doesn't take the reconstruction error as the objective function. Instead, it takes the perspective that NMF reconstructs the original $V$ as $WH$ plus Poisson noise (last two paragraphs of page 3).
This confuses me badly: why Poisson? What about Gaussian, gamma, or beta?
Question 3
No matter what the objective function is, can't I always solve the problem by gradient descent or Newton's method?
And is the multiplicative update method proposed because it is faster than gradient descent?
Best Answer
Question 3 :
An important reason is that you want non-negative entries in your matrices. If you perform classical gradient descent on the objective function, you face two problems: the gradient steps can push entries of $W$ or $H$ negative, and projecting back onto the non-negative orthant after each step makes the step size hard to tune and can stall convergence.
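To make this concrete, here is a minimal sketch of a single projected gradient step on $H$ for the Frobenius-norm objective (the function name and learning rate are my own illustrative choices, not from the paper). The plain gradient step can produce negative entries; the projection $\max(0,\cdot)$ restores feasibility, but at the cost of the convergence issues mentioned above:

```python
import numpy as np

def projected_gradient_step_H(V, W, H, lr=1e-3):
    """One projected gradient step on H for 0.5 * ||V - W @ H||_F^2."""
    grad_H = W.T @ (W @ H - V)   # gradient of the objective w.r.t. H
    H_new = H - lr * grad_H      # plain step: may contain negative entries
    return np.maximum(H_new, 0.0)  # project back onto the set {H >= 0}
```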
Also, the update rules (as described in the paper) find a local optimum of the chosen objective function while remaining non-negative. I assume the local optimum is often good enough.
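As an illustration, here is a sketch of the Lee–Seung multiplicative updates for the Frobenius-norm objective (the function name and hyperparameters are my own; the paper you're reading may use the divergence variant instead). Because the factors are initialized positive and each update multiplies by a ratio of non-negative quantities, non-negativity is preserved automatically:

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative updates minimizing ||V - W @ H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps  # positive initialization
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each factor is rescaled elementwise by a non-negative ratio,
        # so entries can never become negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```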
Question 2 : using a Poisson model is convenient because the error terms are positive and the likelihood is simple. The resulting objective function is compatible with multiplicative update rules that are simple to implement.
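For reference, here is the standard link between the Poisson model and the objective (a sketch in generic notation, not necessarily the paper's). Assuming each entry $V_{ij} \sim \mathrm{Poisson}\big((WH)_{ij}\big)$, the log-likelihood is
$$\log p(V \mid W,H) = \sum_{ij}\Big(V_{ij}\log (WH)_{ij} - (WH)_{ij} - \log V_{ij}!\Big)\,,$$
and since $\log V_{ij}!$ does not depend on $W,H$, maximizing it is equivalent to minimizing the generalized KL divergence
$$D(V\,\|\,WH) = \sum_{ij}\Big(V_{ij}\log\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij}\Big)\,,$$
which is the divergence objective commonly paired with multiplicative updates in the NMF literature. A Gaussian noise model would instead lead back to the squared reconstruction error, but its support includes negative values, which fits the non-negativity of $V$ less naturally.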
Question 1 : this is essentially answered above. The multiplicative update rules ensure that non-negativity is preserved during training, so no explicit constraints need to be handed to the optimizer.