Solved – Regression with neural network

Tags: multiple regression, neural networks, regression

I tried to create a neural network for regression. To test the concept, I created a dataset as follows:

x1 = random('Normal',0,5,500,1);   % 500 samples from a normal with mean 0, std 5
x2 = random('Normal',0,5,500,1);
y  = x1 + 2*x2;                    % target is an exact linear function of the inputs
X  = [x1 x2];                      % 500 x 2 input matrix

So x1 and x2 are vectors of 500 random numbers each, with mean 0 and standard deviation 5.

The neural network therefore had 2 inputs, 2 or 3 units in the hidden layer, and one output unit. The hidden units use a sigmoid activation function and the output unit a linear one:

$h(x_1, x_2) = 1 \cdot k_{20} + a_1 k_{21} + a_2 k_{22} + a_3 k_{23}$

and the hidden-layer activations $a_i$ are calculated as follows:

$a_1 = \mathrm{sigmoid}(1 \cdot k_{10} + x_1 k_{11} + x_2 k_{12})$

and similarly $a_2 = \mathrm{sigmoid}(1 \cdot l_{10} + x_1 l_{11} + x_2 l_{12})$.
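In vectorized form the forward pass would look roughly like the sketch below (assuming three hidden units, so Theta1 is 3×3 with the bias column first and Theta2 is 1×4; these shapes are my assumption, chosen to line up with the backpropagation code further down):

m  = size(X, 1);                     % number of training samples
A1 = [ones(m,1) X];                  % 500 x 3: bias column plus the two inputs
Z2 = A1 * Theta1';                   % hidden-layer pre-activations
A2 = [ones(m,1) 1./(1 + exp(-Z2))];  % sigmoid activations, plus a bias column
h  = A2 * Theta2';                   % 500 x 1 linear output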

Edit:
The backpropagation algorithm is implemented as follows:

Delta_1 = 0;
Delta_2 = 0;

for t = 1:m
    % Step 1: forward pass for training example t
    a_1 = [1; X(t,:)'];                         % input with bias unit prepended
    z_2 = Theta1*a_1;
    a_2 = sigmoid(z_2);
    a_2 = [1; a_2];                             % hidden activations with bias unit
    z_3 = Theta2*a_2;
    a_3 = z_3;                                  % linear output unit

    % Step 2: error at the output
    delta_3 = a_3 - y_new(t,:)';                % y_new holds the training targets

    % Step 3: back-propagate the error to the hidden layer
    delta_2 = Theta2'*delta_3 .* [1; sigmoidGradient(z_2)];
    delta_2 = delta_2(2:end);                   % drop the bias component

    % Step 4: accumulate the gradient contributions
    Delta_2 = Delta_2 + delta_3*a_2';
    Delta_1 = Delta_1 + delta_2*a_1';
end

Theta1_grad = Delta_1/m;                        % average the accumulated gradients
Theta2_grad = Delta_2/m;

reg_1 = lambda/m .* Theta1(:,2:end);            % regularization terms (bias column excluded)
reg_2 = lambda/m .* Theta2(:,2:end);

Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + reg_1;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + reg_2;
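These gradients then drive the parameter updates; as a rough sketch (the learning rate alpha below is just a placeholder, not a value from my actual training loop):

alpha  = 0.01;                          % placeholder learning rate
Theta1 = Theta1 - alpha*Theta1_grad;    % one gradient-descent step
Theta2 = Theta2 - alpha*Theta2_grad;
% then recompute the gradients with the code above and repeat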

As the predictions are not accurate, I wonder what could be done to improve the regression performance. I tried different numbers of hidden units and different values of the regularization parameter, but the performance was still not good enough.

I would be very thankful if anyone could describe an appropriate neural network design for regression problems: which activation functions are most appropriate, and how to diagnose training and test error in order to choose the optimal number of hidden units.

Best Answer

Unless you restrict the range of your inputs, the sigmoid may be giving you a problem: each hidden activation is squashed into $(0,1)$, so the network's output is bounded by the output-layer weights, while an unbounded target keeps growing. You won't even be able to learn the function $y=x$ if you have a sigmoid in the middle.

If you have a restricted range, then the input-to-hidden weights can scale the input values so that they hit the sigmoid at the part of the curve which looks like a line, i.e. the part around 0:

[Figure: sigmoid curve with the approximately linear region around 0 circled]

Once you have the linear behavior, the hidden-to-output weights can re-scale the values back. The training process will take care of all this for you; but to test this hypothesis you could just train (and validate) with input values in $[-1,1]$.
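As a rough sketch of that test (the variable names here are mine, not from the question's code), you could rescale each input column by its maximum absolute value before training:

colMax   = max(abs(X));                  % per-column maximum absolute value
X_scaled = bsxfun(@rdivide, X, colMax);  % each input column now lies in [-1, 1]
% train and validate on X_scaled instead of X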

Do you know what the network's function looks like? If you plot $(x_1, x_2) \rightarrow y$, does it look anything like a plane? At least in parts?

You could also try training with more data and seeing if the graph gets closer to $x_1+2x_2$.
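A rough sketch of such a plot, reusing Theta1, Theta2 and sigmoid from the question's code (the grid range is an arbitrary choice of mine):

[g1, g2] = meshgrid(-10:0.5:10, -10:0.5:10);     % grid over the input plane
Xg = [g1(:) g2(:)];
y_hat = zeros(size(Xg,1), 1);
for i = 1:size(Xg,1)
    a_2      = sigmoid(Theta1 * [1; Xg(i,:)']);  % hidden activations, as in the question
    y_hat(i) = Theta2 * [1; a_2];                % linear output
end
surf(g1, g2, reshape(y_hat, size(g1))); hold on
surf(g1, g2, g1 + 2*g2, 'FaceAlpha', 0.3);       % the true plane x1 + 2*x2
hold off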
