Neural Networks – Binary vs Discrete/Continuous Input in Neural Networks

neural networks

Are there any good reasons for preferring binary values (0/1) over discrete or continuous normalized values, e.g. {1;3}, as inputs for a feedforward network, for all input nodes (with or without backpropagation)?

Of course, I'm only talking about inputs that could be transformed into either form; e.g., when you have a variable that can take several discrete values, you can either feed it directly as the value of one input node, or create a binary node for each discrete value. The assumption is that the range of possible values would be the same for all input nodes. See the pics for an example of both possibilities.

While researching this topic, I couldn't find any cold hard facts; it seems to me that, more or less, it will always come down to "trial and error" in the end. Of course, a binary node for every discrete input value means more input layer nodes (and thus more hidden layer nodes), but would it really produce a better classification than feeding the same values into a single node, with a well-fitting threshold function in the hidden layer?

Would you agree that it's just "try and see", or do you have another opinion on this?
Possibility one: direct input of the possible values {1;3}
Possibility two: give each input value its own binary node

Best Answer

Whether to convert input variables to binary depends on the input variable. You could think of neural network inputs as representing a kind of "intensity": i.e., larger values of the input variable represent greater intensity of that input variable. After all, assuming the network has only one input, a given hidden node of the network is going to learn some function $f(wx + b)$, where $f$ is the transfer function (e.g. the sigmoid) and $x$ the input variable.

This setup does not make sense for categorical variables. If categories are represented by numbers, it makes no sense to apply the function $f(wx + b)$ to them. E.g. imagine your input variable represents an animal, and sheep=1 and cow=2. It makes no sense to multiply sheep by $w$ and add $b$ to it, nor does it make sense for cow to be always greater in magnitude than sheep. In this case, you should convert the discrete encoding to a binary, 1-of-$k$ encoding.
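A 1-of-$k$ encoding can be sketched in a few lines; the `one_hot` helper name and the animal list are illustrative choices, not part of any particular library:

```python
def one_hot(value, categories):
    """1-of-k encoding: one binary input node per category,
    with exactly one node active for any given value."""
    return [1 if value == c else 0 for c in categories]

animals = ["sheep", "cow", "goat"]
print(one_hot("cow", animals))  # -> [0, 1, 0]
```

With this encoding, each category gets its own weight in the network, so "cow" is no longer forced to be numerically twice "sheep".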

For real-valued variables, just leave them real-valued (but normalize inputs). E.g. say you have two input variables, one the animal and one the animal's temperature. You'd convert animal to 1-of-$k$, where $k$=number of animals, and you'd leave temperature as-is.
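Putting both pieces together, a full input vector might be built as below; the function name, the temperature range used for min-max normalization, and the animal list are all assumptions for illustration:

```python
def encode_example(animal, temp_c, categories, temp_min=35.0, temp_max=42.0):
    """Concatenate a 1-of-k animal encoding with a min-max
    normalized temperature. temp_min/temp_max are assumed bounds."""
    onehot = [1.0 if animal == c else 0.0 for c in categories]
    temp_norm = (temp_c - temp_min) / (temp_max - temp_min)
    return onehot + [temp_norm]

animals = ["sheep", "cow"]
print(encode_example("sheep", 38.5, animals))  # -> [1.0, 0.0, 0.5]
```

The categorical variable occupies $k$ binary input nodes, while the real-valued variable occupies a single node scaled into a comparable range.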
