Behaviour of intralayer (lateral) connections in neural network during training

Tags: neural networks, numpy

I am interested in experimenting with some very simple neural network architectures by building them in numpy.

I have followed a course on how to build a feed-forward multilayer neural network in numpy (e.g. Andrew Ng's brilliant course on Coursera). However, I am wondering what would happen if I added some intralayer (lateral) connections within one of the layers. I'd like to continue building my intuition in numpy, so I am wondering how this might work mathematically on the forward and backward passes.

I understand and can implement the forward and backward passes for the fully connected network. And I understand that to introduce dropout the relevant column in the interlayer adjacency matrix needs to be zeroed. However, the intralayer adjacency matrix does not feature in this process as (usually) all the weights are 0.
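For concreteness, here is a minimal numpy sketch of the fully connected case I have in mind (the layer sizes, the sigmoid activation, and the omission of biases are arbitrary choices on my part; with weights stored as `(inputs, outputs)` matrices, dropping a hidden unit corresponds to zeroing a row of its outgoing weight matrix):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# toy sizes (my own choice): 3 inputs, 4 hidden units, 2 outputs
W1 = rng.standard_normal((3, 4))   # input -> hidden weights
W2 = rng.standard_normal((4, 2))   # hidden -> output weights
x = rng.standard_normal((1, 3))    # one example as a row vector

h = sigmoid(x @ W1)                # hidden activations
y = sigmoid(h @ W2)                # output activations

# "dropping" hidden unit 2: zeroing its outgoing weights (one row of W2)
# has the same effect on the output as zeroing its activation
W2_dropped = W2.copy()
W2_dropped[2, :] = 0.0
y_dropped = sigmoid(h @ W2_dropped)
```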

My intuition tells me that the lateral connections should be treated the same as the interlayer connections.

In my example below, would the forward sequence go:

  1. apply the input-hidden weights
  2. apply the hidden lateral weights
  3. apply the hidden-output weights

With the reverse sequence on the backward step? In this case, I guess this would look like creating a copy of the hidden layer, with only a connection between the 3rd and 4th node.
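Concretely, my guess for the forward pass looks something like the sketch below (the single lateral weight from hidden node 3 to hidden node 4 and the identity-plus-lateral step are my own assumptions; whether the lateral step should re-apply the activation is exactly the part I'm unsure about):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

W1 = rng.standard_normal((3, 4))     # input -> hidden weights
W2 = rng.standard_normal((4, 2))     # hidden -> output weights
W_lat = np.zeros((4, 4))             # intralayer (lateral) weights, almost all zero
W_lat[2, 3] = rng.standard_normal()  # the one lateral edge: hidden node 3 -> hidden node 4

x = rng.standard_normal((1, 3))

# 1. apply the input-hidden weights
h = sigmoid(x @ W1)

# 2. apply the lateral weights: this is the "copy of the hidden layer",
#    where each unit passes through unchanged (identity) and node 4 also
#    receives the lateral contribution from node 3
h_lat = h @ (np.eye(4) + W_lat)

# 3. apply the hidden-output weights
y = sigmoid(h_lat @ W2)
```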

Or I could be completely wrong!

[Figure: network with lateral connection]

Edited to include modified diagram as per Sycorax's suggestion:

[Figure: network with skip connection]

Best Answer

What you're calling an "intra-layer" connection can be re-written so that the network still has strictly left-to-right flow on the forward pass and uses skip connections; you just pin most of the weights to 0 in some layers. From this perspective, your proposed network is just a special case of an FFN with more than 1 hidden layer, so the most common approach would be to make the network deeper and let ordinary backprop update the weights in the usual way. But you could include binary masks to enforce a particular pattern of connections, if you wish.
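As a rough numpy sketch of the masking idea (the shapes and the particular mask pattern here are just illustrative), you keep a 0/1 mask the same shape as each weight matrix and multiply it in on the forward pass; since only `W * mask` enters the computation, the masked entries receive zero gradient and stay pinned at zero during training:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)

# full weight matrices for a 2-hidden-layer FFN
W_in_h1 = rng.standard_normal((3, 4))
W_h1_h2 = rng.standard_normal((4, 4))
W_h2_out = rng.standard_normal((4, 2))

# 0/1 masks of the same shapes; a 0 entry means "this connection does not exist"
mask_in_h1 = np.ones((3, 4))    # input   -> hidden 1: fully connected
mask_h1_h2 = np.eye(4)          # hidden1 -> hidden 2: identity pass-through ...
mask_h1_h2[2, 3] = 1.0          # ... plus the one lateral edge (node 3 -> node 4)
mask_h2_out = np.ones((4, 2))   # hidden2 -> output: fully connected

x = rng.standard_normal((1, 3))

# forward pass uses W * mask, so masked entries contribute nothing and
# get zero gradient in backprop, keeping the disallowed connections at 0
h1 = sigmoid(x @ (W_in_h1 * mask_in_h1))
h2 = sigmoid(h1 @ (W_h1_h2 * mask_h1_h2))
y = sigmoid(h2 @ (W_h2_out * mask_h2_out))
```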

The second diagram that you've included shows that there are 2 different hidden layers, plus several skip connections (from the input to hidden 2, and from hidden 1 to the output). This makes it clear that your network is a special kind of FFN with a constrained wiring pattern.
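In forward-pass terms, that wiring amounts to something like the following (the sizes and names are illustrative, not taken from your diagram):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)

W_in_h1 = rng.standard_normal((3, 4))   # input   -> hidden 1
W_h1_h2 = rng.standard_normal((4, 4))   # hidden1 -> hidden 2 (mostly pinned to 0 in your case)
W_in_h2 = rng.standard_normal((3, 4))   # input   -> hidden 2 (skip connection)
W_h1_out = rng.standard_normal((4, 2))  # hidden1 -> output   (skip connection)
W_h2_out = rng.standard_normal((4, 2))  # hidden2 -> output

x = rng.standard_normal((1, 3))

h1 = sigmoid(x @ W_in_h1)
h2 = sigmoid(h1 @ W_h1_h2 + x @ W_in_h2)     # hidden 2 sees both hidden 1 and the input
y = sigmoid(h2 @ W_h2_out + h1 @ W_h1_out)   # the output sees both hidden layers
```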
