I am trying to build a regression model using a neural network. The final cost measure is the mean absolute error (MAE) on the output (one output unit, 200 input units).
Right now all my hidden units use rectifier activations, and the output unit is linear (pass-through activation). The network does not seem to learn effectively: even on the training set, I can't find a learning rate that makes the error go down monotonically.
I suspect the cost function (L1 norm) might be the culprit. Right now, when taking the gradient, I pass either 1 or -1 depending on whether the predicted value is above or below the actual output value. Is this the right way? (Since the L1 norm is not smooth at 0, could that be the reason why learning is not smooth/effective?) What is the right way to handle an L1-norm cost function?
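For concreteness, here is what the sign-based gradient rule I described looks like as a minimal NumPy sketch (the function name is mine; `np.sign` returns 0 exactly at the kink, which is one valid subgradient choice there):

```python
import numpy as np

def l1_loss_and_subgradient(y_pred, y_true):
    """Mean absolute error and a subgradient w.r.t. y_pred.

    The gradient is +1/n where y_pred > y_true, -1/n where y_pred < y_true,
    and 0 at the non-smooth point y_pred == y_true.
    """
    residual = y_pred - y_true
    loss = np.mean(np.abs(residual))
    grad = np.sign(residual) / residual.size
    return loss, grad
```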
Thanks, any help is appreciated!
Best Answer
I am pretty sure it is not the L1 cost function. Neural nets are fairly robust to losses that are non-differentiable at isolated points. To make really sure, swap in the L2 loss and see whether it shows similar problems. That said, in my experience, L1- and L2-based objectives find similar solutions most of the time.
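As a sanity check along those lines, here is a minimal sketch (plain NumPy, a toy linear model on synthetic data of my own making, not the asker's 200-input network) fitting the same problem with both objectives via (sub)gradient descent; both land on essentially the same weights:

```python
import numpy as np

# Toy regression problem: a linear model is enough to compare the objectives.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

def fit(loss, lr=0.05, steps=3000):
    """(Sub)gradient descent on MAE ('l1') or MSE ('l2')."""
    w = np.zeros(3)
    for _ in range(steps):
        r = X @ w - y                      # residuals
        if loss == "l2":
            g = X.T @ (2.0 * r) / len(y)   # smooth MSE gradient
        else:
            g = X.T @ np.sign(r) / len(y)  # L1 subgradient: sign of residual
        w -= lr * g
    return w

w_l2 = fit("l2")
w_l1 = fit("l1")
# Both recover weights close to w_true and close to each other.
```

If the L2 run trains smoothly while the L1 run does not, that points at the step size interacting with the constant-magnitude L1 subgradient rather than at the loss being "wrong".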
Here are some things that you should investigate: