Solved – Sigmoid activation hurts training a NN on PyTorch

backpropagation, machine learning, neural networks, torch

I'm a beginner in machine learning, and after completing a course in the field I'm now trying to get my hands dirty with some code for the first time.

I'm using PyTorch to train a simple NN with one hidden layer.
This is the code for my class:

import torch


class smallLayerNet(torch.nn.Module):

    def __init__(self, D_in, H, D_out):
        super(smallLayerNet, self).__init__()
        # Two fully connected layers: input -> hidden -> output
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        sigmoid = torch.nn.Sigmoid()
        z1 = self.linear1(x)
        a1 = sigmoid(z1)      # sigmoid activation on the hidden layer
        z2 = self.linear2(a1)
        return z2             # raw (linear) output

I'm using MSE for the loss function and Stochastic Gradient Descent for the optimization.

When I run 500 iterations from some random initialization, I get a loss value of 0.27523577213287354.
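
For reference, this is roughly what my training loop looks like (the dimensions and the random data below are just placeholders, not my actual dataset):

import torch

# Placeholder sizes and random data, only to illustrate the setup.
N, D_in, H, D_out = 64, 10, 20, 1
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = smallLayerNet(D_in, H, D_out)
criterion = torch.nn.MSELoss()                            # MSE loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)  # plain SGD

for t in range(500):
    y_pred = model(x)            # forward pass
    loss = criterion(y_pred, y)  # mean squared error
    optimizer.zero_grad()        # clear accumulated gradients
    loss.backward()              # backpropagation
    optimizer.step()             # SGD update

print(loss.item())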

However, if I remove the sigmoid activation, so that the forward function looks as follows:

    def forward(self, x):
        z1 = self.linear1(x)
        z2 = self.linear2(z1)  # no activation: the whole network is linear
        return z2

after 500 iterations I get a loss value of 1.4318013788483519e-11, which is dramatically better.

When I studied ML, I learned that we want to use an activation function on the neurons, such as sigmoid, ReLU, or tanh. So what am I missing here? Am I doing something wrong, or is my assumption wrong?

Thanks!

Best Answer

If you are trying to do classification, then a sigmoid on the output is needed because you want a probability value. But if you are trying to estimate a scalar, then you would not want a sigmoid, since this would limit the output values to between 0 and 1.
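
For example (just a sketch with placeholder data, not the code from the question), the difference between the two setups looks like this:

import torch

# Toy model and inputs to contrast the two cases.
model = torch.nn.Linear(10, 1)
x = torch.randn(8, 10)

# Regression: keep the output linear and apply MSE to the raw values.
y_reg = torch.randn(8, 1)
reg_loss = torch.nn.MSELoss()(model(x), y_reg)

# Binary classification: squash the output with a sigmoid so it can be
# read as a probability in (0, 1), and use binary cross-entropy.
y_cls = torch.randint(0, 2, (8, 1)).float()
prob = torch.sigmoid(model(x))
cls_loss = torch.nn.BCELoss()(prob, y_cls)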
