Solved – Neural network weights

MATLAB, neural networks

This is going to be a long question:

I have written code in MATLAB for updating the weights of an MLP with one hidden layer. Here is the code:

weights_1: weight matrix from the input layer to the hidden layer
weights_2: weight matrix from the hidden layer to the output layer

function [ weights_1,weights_2 ] = changeWeights( X,y,weights_1,weights_2,alpha )
%CHANGEWEIGHTS updates the weight matrices 
%   This function changes the weight of the weight matrix
%   for a given value of alpha using the backpropagation algorithm

m = size(X,1) ;     % number of samples in the training set

for i = 1:m
    % Performing the feed-forward step 
    X_i  = [1 X(i,1:end)]    ;   
    z2_i = X_i*weights_1'    ;     
    a2_i = sigmoid(z2_i)     ;     
    a2_i = [1 a2_i]          ;     
    z3_i = a2_i*weights_2'   ;     
    h_i  = sigmoid(z3_i)     ;     

% Calculating the delta_output_layer 
    delta_output_layer = ( y(i)' - h_i' )...
        .*sigmoidGradient(z3_i')  ; % 3-by-1 matrix

% Calculating the delta_hidden_layer 
    delta_hidden_layer =  (weights_2'*delta_output_layer)...
        .*sigmoidGradient([1;z2_i']) ; % 5-by-1 matrix 
    delta_hidden_layer = delta_hidden_layer(2:end) ; 

% Updating the weight matrices
    weights_2 = weights_2 + alpha*delta_output_layer*a2_i ; 
    weights_1 = weights_1 + alpha*delta_hidden_layer*X_i  ;
end

end

Now I wanted to test it on the fisheriris dataset that ships with MATLAB, which can be loaded with the load fisheriris command. I renamed meas to X and converted species to a 150-by-3 matrix where each row encodes the species (for example, the first row is [1 0 0]).
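
A minimal sketch of that conversion (assuming the column order setosa, versicolor, virginica, which matches the example row above):

load fisheriris                            % provides meas (150-by-4) and species (150-by-1 cellstr)
X = meas ;                                 % feature matrix
y = double([strcmp(species,'setosa') ...
            strcmp(species,'versicolor') ...
            strcmp(species,'virginica')]) ;  % 150-by-3 one-hot targets, e.g. first row is [1 0 0]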

I compute the error of the output layer using the following function:

function [ g ] = costFunction( X,y,weights_1,weights_2 )
  %COSTFUNCTION calculates the error
  %   This function calculates the error in the 
  %   output of the neural network 

  % Performing the feed-forward propagation
  m = size(X,1) ; 
  X_temp  = [ones([m 1]) X]   ;  % 150-by-5 matrix 
  z2 = X_temp*weights_1'       ; % (150-by-5)*(5-by-4) = 150-by-4
  a2 = sigmoid(z2)       ; 
  a2 = [ones([m 1]) a2]            ; % 150-by-5
  z3 = a2*weights_2'     ; % 150-by-3
  h  = sigmoid(z3)       ; % 150-by-3

  g = 0.5*sum(sum((y-h).^2)) ; 
  g = g/m ; 
end

Now, in the course the professor gave an example of a toy network with 3 iterations. I tested my code on that network and it gives the right values, but when I test it on the fisheriris data the cost keeps increasing, and I am not able to understand where it is going wrong.

Here is the toy network for which it runs fine:
[image: toy network diagram from the course]

There is only one training example for this set.

PS: Ignore the comments in the code (they are the matrix sizes used to check the validity of the matrix multiplications for a sample case).

Finally, here are the test bench execute.m, sigmoid.m, and sigmoidGradient.m, which I have shared in case you want to run and test the functions.
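
For completeness, sigmoid.m and sigmoidGradient.m are the element-wise logistic function and its derivative; minimal sketches (not necessarily identical to the shared files) look like this:

function [ g ] = sigmoid( z )
%SIGMOID element-wise logistic function
g = 1 ./ (1 + exp(-z)) ;
end

function [ g ] = sigmoidGradient( z )
%SIGMOIDGRADIENT derivative of the logistic function evaluated at z
s = sigmoid(z) ;
g = s .* (1 - s) ;
end

A bare-bones driver in the spirit of execute.m (the hidden-layer size, weight initialisation, and alpha below are assumptions, not the values from the actual test bench):

weights_1 = 0.1*randn(4,5) ;   % 4 hidden units, 4 inputs + bias (assumed sizes)
weights_2 = 0.1*randn(3,5) ;   % 3 outputs, 4 hidden units + bias
alpha     = 0.01 ;             % assumed learning rate
for epoch = 1:100
    [weights_1, weights_2] = changeWeights(X, y, weights_1, weights_2, alpha) ;
    fprintf('epoch %3d  cost = %f\n', epoch, costFunction(X, y, weights_1, weights_2)) ;
end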

Best Answer

My first questions/thoughts are:

  • I use a different training method. I like the Eric Wan diagrammatic method [1]. It allows me to compute a "gradient" for my network using an adjoint network. I can detect errors in one using the other and basic test cases. Coupled debugging.
  • Have you compared your results to the built-ins? Did you make a technical error in the implementation that you could detect that way? I don't have the resources to test this right now. (A sketch using patternnet is below, after this list.)
  • I'm sure someone has published a neural network that classifies the Fisher iris data. Have you looked in the peer-reviewed literature? The reason I ask is that I was once tasked with something in Q-learning that sounded clever, but when I reviewed the literature, it had already been shown to be impossible. That went over very well with my professor. Never be afraid to stand on the shoulders of giants.
  • Make a linear (planar) fit first, and then train the NN on the variation from the plane. This keeps your values closer to the origin, and makes your learning rates higher. It also makes the Neural Network do the heavy lifting of fitting the nonlinear part instead of wasting the training/learning on the planar part. A pseudo-inverse is computationally cheap, comparatively speaking.
  • I didn't see any scaling or centering of the inputs. If you have input values in the millions and initial weights in the ones, then you are going to spend all your training iterations increasing the scale of the inputs. Subtracting the mean and then dividing by something like the standard deviation can help (a sketch is below).

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.5262
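
A minimal sketch of the centering/scaling suggestion, assuming the X and y described in the question (repmat is used instead of implicit expansion so it also runs on older MATLAB versions):

% center each input column and scale it by its standard deviation
mu       = mean(X, 1) ;
sigma    = std(X, 0, 1) ;
X_scaled = (X - repmat(mu, size(X,1), 1)) ./ repmat(sigma, size(X,1), 1) ;

% then train on X_scaled instead of the raw X
[weights_1, weights_2] = changeWeights(X_scaled, y, weights_1, weights_2, alpha) ;

And for the comparison against the built-ins, a sketch using patternnet from the Neural Network Toolbox (the hidden-layer size is an arbitrary choice):

load fisheriris
inputs  = meas' ;                                 % patternnet expects samples as columns
targets = double([strcmp(species,'setosa') ...
                  strcmp(species,'versicolor') ...
                  strcmp(species,'virginica')])' ;
net     = patternnet(4) ;                         % one hidden layer with 4 units (arbitrary)
net     = train(net, inputs, targets) ;
outputs = net(inputs) ;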
