Solved – Weight Decay in Neural Network Weight Update and Convergence

convergence, machine learning, neural networks

I have a neural network (that I created in Java) for a class assignment. It works when I do not use any weight decay, but when I use a value greater than or equal to 0.001, my accuracy drops greatly. The data is normalized. I am not sure whether the problem is how I am calculating the convergence condition or whether my weight update with weight decay is incorrect. I am using a sigmoid activation function. My classifier is binary (0 or 1): when classifying, if my output is > 0.5 the example is labeled 1, and if it is <= 0.5 the example is labeled 0.
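Concretely, the activation and thresholding look roughly like this (a simplified sketch with made-up method names, not my exact code):

// Sketch of the sigmoid activation and 0.5 threshold described above.
double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
}

int classify(double netInput) {
    double output = sigmoid(netInput);
    // Output > 0.5 is labeled 1, otherwise 0.
    return output > 0.5 ? 1 : 0;
}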

In my test I am using 5 hidden neurons + 1 bias, 11 input neurons + 1 bias, and 1 output neuron. When running with 0 weight decay I am getting 99% accuracy, but when I use a value of 0.001 I am getting 56% accuracy. The accuracy measure I am using is (TP + TN) / (TP + TN + FP + FN).

My weight update right now is

Weight = Weight - LearningRate * WeightChange - Weight * WeightDecay
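In code, the update looks roughly like this (a simplified sketch with made-up variable names, not my exact implementation):

// Sketch of the weight update with decay described above.
// Note: many textbook formulations scale the decay term by the learning rate too,
// i.e. weights[i] -= learningRate * (weightChanges[i] + weightDecay * weights[i]).
for (int i = 0; i < weights.length; i++) {
    weights[i] = weights[i]
            - learningRate * weightChanges[i]
            - weightDecay * weights[i];
}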

My convergence test is: if the absolute difference between the sum of the current weights and the sum of the previous weights is < 0.00001, I say that the network has converged. Is this a correct way to check for convergence?
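The check I am doing is roughly the following (a sketch with hypothetical variable names, not my exact code):

// Sketch of the convergence test described above.
double currentSum = 0.0;
for (double w : weights) {
    currentSum += w;
}
boolean converged = Math.abs(currentSum - previousWeightSum) < 0.00001;
previousWeightSum = currentSum;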

Let me know if there is any more information needed.

Best Answer

It is not surprising that weight decay will hurt the performance of your neural network at some point. Let the prediction loss of your net be $\mathcal{L}$ and the weight decay loss $\mathcal{R}$. Given a coefficient $\lambda$ that establishes a tradeoff between the two, one optimises $$ \mathcal{L} + \lambda \mathcal{R}. $$ At the optimum of this combined loss, the gradients of the two terms have to cancel: $$ \nabla \mathcal{L} = -\lambda \nabla \mathcal{R}. $$ This makes clear that we will not be at an optimum of the training loss. Moreover, the higher $\lambda$, the steeper the gradient of $\mathcal{L}$ must be at the solution, which in the case of convex loss functions implies a greater distance from the optimum of the training loss.
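As a concrete illustration (a one-dimensional toy example, not your network): take a quadratic training loss and an L2 penalty,
$$
\mathcal{L}(w) = \tfrac{1}{2}(w - w_0)^2, \qquad \mathcal{R}(w) = \tfrac{1}{2}w^2 .
$$
Setting the gradient of $\mathcal{L} + \lambda \mathcal{R}$ to zero gives
$$
(w^* - w_0) + \lambda w^* = 0 \quad\Longrightarrow\quad w^* = \frac{w_0}{1 + \lambda},
$$
so the solution is shrunk toward zero, and the larger $\lambda$ is, the farther $w^*$ lies from the training optimum $w_0$.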
