Solved – Neural Net Matrix Multiplication

machine learning, matrix, neural networks, python

I'm trying to figure out the matrix multiplications for the implementation of a single hidden layer neural net for MNIST digit recognition in Python.

Like the following:

x1             h1             z1
x2             h2             z2
 1             h3
                1

I'm using a hidden layer of size 200.

The number of features for the digits is 784.

The number of classes is 10.

Each label is transformed to a vector of length 10 which has a single 1 in the position of the true class and 0 elsewhere.
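For concreteness, here is a minimal NumPy sketch of that one-hot transformation (the names y and Y are my own, not from the question):

    import numpy as np

    y = np.array([3, 0, 9])   # example integer labels, one per sample
    num_classes = 10

    # One-hot encode: row i has a single 1 in column y[i] and 0 elsewhere
    Y = np.zeros((len(y), num_classes))
    Y[np.arange(len(y)), y] = 1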

Between the input and the hidden layer, I'm going to use a 200 by 785 matrix V (785 = 784 features plus one bias column).

Matrix V: the (i, j) entry represents the weight connecting the jth unit in the input layer to the ith unit in the hidden layer. The ith row of V represents the ensemble of weights feeding into the ith hidden unit.

Between the hidden and the output layers, I'm going to apply a matrix W, which is 10 by 201 (201 = 200 hidden units plus one bias column).

Matrix W: the (i, j) entry represents the weight connecting the jth unit in the hidden layer to the ith unit in the output layer. The ith row of W is the ensemble of weights feeding into the ith output unit.
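To make those shapes concrete, a minimal sketch of initializing V and W with these dimensions might look like the following (the random-normal initialization and its scale are illustrative assumptions, not something the question specifies):

    import numpy as np

    n_features = 784   # input units
    n_hidden = 200     # hidden units
    n_classes = 10     # output units

    rng = np.random.default_rng(0)

    # V is 200 by 785: row i holds the weights feeding into hidden unit i,
    # with one extra column for the bias term
    V = rng.normal(scale=0.01, size=(n_hidden, n_features + 1))

    # W is 10 by 201: row i holds the weights feeding into output unit i,
    # again with one extra bias column
    W = rng.normal(scale=0.01, size=(n_classes, n_hidden + 1))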

So I start with the input matrix, which is n by 784. Can someone explain what to do? What do I multiply it by, and how do I then multiply the result? I'm not sure exactly how these matrix multiplications fit together.

(Let's just call the activation functions f().)

I'm a bit confused by the dimensions of the matrices and not sure when, where, or how exactly to use V and W.

Best Answer

I assume you are just asking how to perform a feedforward pass.

Let's say your input matrix is X [n by 784]. Then:

  1. Add a column of ones on the left of X to make it [n by 785] for the hidden-layer biases.
  2. The hidden weighted inputs are Z = X*V' [n by 200].
  3. Apply the non-linearity elementwise to Z to get the hidden activations A = f(Z) [n by 200].
  4. Add a column of ones on the left of A to make it [n by 201] for the output-layer biases, then get the output as f(A*W') [n by 10], where this last f is probably a softmax computation that gives posterior probabilities for every class. (A NumPy sketch of these steps follows below.)
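Putting those four steps together, a minimal NumPy sketch of the feedforward pass could look like this. Since the question only calls the activation f(), I'm assuming tanh for the hidden layer and softmax for the output; swap in whatever f you actually use:

    import numpy as np

    def softmax(Z):
        # Row-wise softmax, with a max shift for numerical stability
        Z = Z - Z.max(axis=1, keepdims=True)
        expZ = np.exp(Z)
        return expZ / expZ.sum(axis=1, keepdims=True)

    def feedforward(X, V, W):
        # X: [n by 784], V: [200 by 785], W: [10 by 201]
        n = X.shape[0]
        ones = np.ones((n, 1))

        # Step 1: prepend a bias column, so X becomes [n by 785]
        Xb = np.hstack([ones, X])

        # Step 2: hidden weighted inputs, Z = X*V', [n by 200]
        Z = Xb @ V.T

        # Step 3: hidden activations (tanh assumed for f here)
        A = np.tanh(Z)

        # Step 4: prepend a bias column so A is [n by 201],
        # then output f(A*W') as [n by 10] class probabilities
        Ab = np.hstack([ones, A])
        return softmax(Ab @ W.T)

For X of shape (n, 784), feedforward(X, V, W) returns an (n, 10) array whose rows sum to 1, one posterior probability per class.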