I'm trying to figure out the matrix multiplications for implementing a single-hidden-layer neural network for MNIST digit recognition in Python.
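Like the following (a scaled-down diagram of the architecture):
[Diagram: input units x1, x2 plus a constant bias unit 1; hidden units h1, h2, h3 plus a constant bias unit 1; output units z1, z2.]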
I'm using a hidden layer of size 200.
The number of features for the digits is 784.
The number of classes is 10.
Each label is transformed to a vector of length 10 which has a single 1 in the position of the true class and 0 elsewhere.
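This label encoding is usually called one-hot encoding. A minimal NumPy sketch, assuming the labels arrive as integers 0 to 9 (the helper name `one_hot` is hypothetical):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer labels of shape (n,) into an (n, num_classes) one-hot matrix."""
    labels = np.asarray(labels, dtype=int)
    Y = np.zeros((labels.shape[0], num_classes))
    # Put a 1 in column labels[k] of row k; every other entry stays 0.
    Y[np.arange(labels.shape[0]), labels] = 1.0
    return Y

Y = one_hot([3, 0, 9])
print(Y.shape)  # (3, 10)
```

Each row of `Y` then lines up with the 10 output units.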
Between the input and hidden layers, I'm going to use a 200 by 785 matrix V (784 input features plus one bias unit).
Matrix V: the (i, j) entry is the weight connecting the jth unit in the input layer to the ith unit in the hidden layer. The ith row of V is the set of weights feeding into the ith hidden unit.
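To make the row interpretation concrete, here is a minimal NumPy sketch for a single example. It assumes the 785th column of V holds the bias weights (so a constant 1 is appended to the input) and uses tanh as a stand-in for f; the random weights and the names `x`, `x_aug`, and `h` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: the last (785th) column of V holds the bias weights.
V = rng.normal(scale=0.01, size=(200, 785))  # 200 hidden units x (784 pixels + bias)

def f(a):
    # Placeholder activation; tanh is an assumption, any elementwise f works.
    return np.tanh(a)

x = rng.random(784)          # one flattened 28x28 image
x_aug = np.append(x, 1.0)    # shape (785,): the input plus the constant bias unit
h = f(V @ x_aug)             # shape (200,): one activation per hidden unit

# Entry i of V @ x_aug is the dot product of row i of V with the input,
# i.e. the weighted sum feeding into hidden unit i.
i = 5
assert np.isclose(h[i], f(np.dot(V[i], x_aug)))
```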
Between the hidden and output layers, I'm going to apply a matrix W, which is 10 by 201 (200 hidden units plus one bias unit).
Matrix W: the (i, j) entry is the weight connecting the jth unit in the hidden layer to the ith unit in the output layer. The ith row of W is the set of weights feeding into the ith output unit.
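The output layer follows the same pattern one level up. A minimal sketch for one example, assuming the 201st column of W holds the output biases and using softmax at the output (an assumption; the question only calls the activation f). The vector `h` here is a random stand-in for the hidden activations, not a real forward pass.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumption: the last (201st) column of W holds the output biases.
W = rng.normal(scale=0.01, size=(10, 201))  # 10 outputs x (200 hidden + bias)

def softmax(a):
    # Assumed output activation; subtracting the max avoids overflow in exp.
    e = np.exp(a - a.max())
    return e / e.sum()

h = np.tanh(rng.normal(size=200))  # stand-in for the 200 hidden activations
h_aug = np.append(h, 1.0)          # shape (201,): hidden layer plus bias unit
z = softmax(W @ h_aug)             # shape (10,): row i of W scores class i
predicted_class = int(np.argmax(z))
```

The predicted digit is the index of the largest of the 10 outputs, which can be compared against the position of the 1 in the one-hot label.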
So I start with the input matrix, which is n by 784. Can someone explain what to do? What do I need to multiply it by, and then what do I multiply the result by, and how? I'm not sure exactly how to multiply these matrices.
(Let's just call the activation functions f().)
I'm a bit confused by the dimensions of the matrices and not sure when, where, or how exactly to use V and W.
Best Answer
I assume you are just asking how to perform a feedforward pass.
Let's say your input matrix is X [n by 784]
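A minimal NumPy sketch of the full batched forward pass starting from X. It assumes the biases are handled by appending a column of ones (matching the 785 and 201 column counts), that examples are stored as rows of X, and that f is tanh in the hidden layer and a logistic sigmoid at the output; these activation choices and the helper names `add_bias_column`, `sigmoid`, and `forward` are hypothetical.

```python
import numpy as np

def add_bias_column(A):
    """Append a column of ones to an (n, d) matrix, giving (n, d + 1)."""
    return np.hstack([A, np.ones((A.shape[0], 1))])

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(X, V, W, f=np.tanh, g=sigmoid):
    """Feedforward pass for a batch; the default activations are placeholders."""
    X_aug = add_bias_column(X)   # (n, 785)
    H = f(X_aug @ V.T)           # (n, 785) @ (785, 200) -> (n, 200)
    H_aug = add_bias_column(H)   # (n, 201)
    Z = g(H_aug @ W.T)           # (n, 201) @ (201, 10) -> (n, 10)
    return H, Z

rng = np.random.default_rng(0)
n = 32
X = rng.random((n, 784))
V = rng.normal(scale=0.01, size=(200, 785))
W = rng.normal(scale=0.01, size=(10, 201))
H, Z = forward(X, V, W)
print(H.shape, Z.shape)  # (32, 200) (32, 10)
```

Because each example is a row of X, the weight matrices appear transposed: `X_aug @ V.T` applies V to every augmented example at once, and row k of Z equals the single-example result for row k of X. Keeping examples in rows means Z has the same n by 10 layout as the one-hot label matrix.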