Solved – Interpreting neural network weights

Tags: categorical data, neural networks, weights

I created a neural network model for a classification task based on 14 input variables, with one hidden layer of size 8. The output gives 3 possible classes.

The weights I got from input (left) and output (right) layers:

[Image: tables of the weights for the input and output layers]

I struggle to interpret the relevance of the weights attributed to the variables (v1, v2, …) across the neurons (n1, n2, …).

I initially guessed that variables v4 and v9 would be the most relevant for classification. However, their weights vary considerably across the neurons (roughly from 17 or 10 down to -12).

Are negative weights as important as positive weights?

In the output layer, v7 always has negative weights, which offsets the importance of v4's highest weight on n7.

I also ran repeated, independent training sessions to check the behavior of the model. After each run, I summed the total positive and negative weights of each variable in the input layer. The results show that v4 and v7 are apparently always top ranked (see below). But considering the output layer and the distribution of negative weights, I suspect that v4 and v7 are perhaps not the best variables for classification after all.
[Image: summed positive and negative input-layer weights per variable across repeated runs]
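The repeated-runs procedure described above can be sketched as follows. This is a minimal illustration using scikit-learn's `MLPClassifier` on synthetic random data; the data, the number of runs, and the ranking criterion are assumptions, not the asker's actual setup:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 14))    # 14 input variables, as in the question
y = rng.integers(0, 3, size=300)  # 3 possible classes

# Sum positive and negative input-layer weights per variable over repeated runs
n_runs = 5
pos_sums = np.zeros((n_runs, 14))
neg_sums = np.zeros((n_runs, 14))
for run in range(n_runs):
    model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=run)
    model.fit(X, y)
    W_in = model.coefs_[0]  # shape (14, 8): input-to-hidden weights
    pos_sums[run] = np.where(W_in > 0, W_in, 0).sum(axis=1)
    neg_sums[run] = np.where(W_in < 0, W_in, 0).sum(axis=1)

# Rank variables by mean total weight magnitude across runs
# (pos_sums - neg_sums equals the sum of absolute weights per variable)
ranking = np.argsort(-(pos_sums - neg_sums).mean(axis=0))
print("Variables ranked by summed |weight|:", [f"v{i+1}" for i in ranking])
```

Because the data here are random, the resulting ranking is meaningless; the point is only the mechanics of extracting `coefs_[0]` and aggregating per variable.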

Best Answer

The interpretation of weights is generally a challenging task.

Regarding your specific concern:

  • Negative weights are as important as positive ones. A negative weight means that most of the data records have "voted" for it to be negative in order to improve the fit. If you initialized the weights close to zero, then it was the data, via backpropagation, that drove the weight negative.
  • Since you consider v4 and v7 your favorites, I would recommend repeating the training with these two variables only and comparing the quality (accuracy/recall/precision) of the new model with that of the model based on all variables.
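The comparison in the second point can be sketched like this, again assuming scikit-learn and synthetic data; the 0-based column indices for v4 and v7 are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 14))
y = rng.integers(0, 3, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def evaluate(cols):
    """Train on a subset of columns and report accuracy/precision/recall."""
    model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    model.fit(X_tr[:, cols], y_tr)
    pred = model.predict(X_te[:, cols])
    return (accuracy_score(y_te, pred),
            precision_score(y_te, pred, average="macro", zero_division=0),
            recall_score(y_te, pred, average="macro", zero_division=0))

all_vars = list(range(14))
favorites = [3, 6]  # v4 and v7 as 0-based indices (an assumption)
print("all variables:", evaluate(all_vars))
print("v4 + v7 only :", evaluate(favorites))
```

If the reduced model's scores stay close to the full model's, that supports treating v4 and v7 as the carriers of most of the classification signal; a large drop suggests the other variables still matter.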