MATLAB: How to work with categorical variables in Neural Network Toolbox 6.0 (R2008a)

classification, Deep Learning Toolbox, nominal, ordinal

I want to use the NEWSOM neural network for clustering. My dataset contains both numerical and categorical variables; in the documentation, however, I can only find how to work with numerical variables.

Best Answer

The ability to work with categorical variables is not available in the Neural Network Toolbox. Categorical data is supported in the Statistics Toolbox instead.

To work around this issue, you need to represent your categories as numerical values. In general there are two approaches:

1. Represent each category as an integer. For example, if you have the categories 'small', 'medium' and 'large', you could say 'small' = 0, 'medium' = 1, 'large' = 2. This should work fine when you only have categorical variables, but you may need to pay attention to scaling when you also have numerical variables. The idea is that a numeric change in one input of the net (for example, the integers representing one of your categorical variables) should have roughly the same importance as the same numeric change in another input (for example, one of your numerical variables). The Neural Network Toolbox provides several functions that help you scale your inputs; see the "Processing Functions" section in the Neural Network Toolbox documentation.

Normally, start with MAPMINMAX. If this does not perform well, you could try MAPSTD or possibly PROCESSPCA. You can apply these functions manually, or you can have them applied automatically to an input by setting the 'processFcns' property for that input.

2. Use 1-of-N encoding. For example, if you once again have the 'small', 'medium' and 'large' categories, you could say 'small' = [1 0 0], 'medium' = [0 1 0], 'large' = [0 0 1]. With this method, scaling should not be as much of an issue as with the first approach. This type of encoding might also work better for "unordered" categories. For small, medium, large, assigning 0, 1, 2 in the first approach seemed pretty straightforward, but what if you have 'red', 'green', 'blue'? Then (ignoring the frequency of light, for argument's sake) there is no natural ordering, so it is difficult to say whether 'red' = 0, 'green' = 1, 'blue' = 2 will work better than 'red' = 2, 'green' = 1, 'blue' = 0. The 1-of-N encoding 'red' = [1 0 0], 'green' = [0 1 0], 'blue' = [0 0 1] should not have this problem.
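
The two approaches above could be sketched roughly as follows. This is only an illustration, not code from the toolbox documentation: the sample data and the variable names (sizes, weights, cats and so on) are made up, and it assumes the usual toolbox layout of one row per input variable and one column per sample.

```matlab
% Hypothetical example data: one categorical and one numerical variable,
% four samples. Names and values are invented for illustration.
sizes   = {'small', 'large', 'medium', 'large'};
weights = [10 250 40 130];
cats    = {'small', 'medium', 'large'};

% Position of each sample's category within cats (1, 2 or 3).
[tf, idx] = ismember(sizes, cats);

% Approach 1: integer codes 0, 1, 2 stacked with the numerical variable,
% then every input row is scaled to [-1, 1] with MAPMINMAX.
codes      = idx - 1;
Xint       = [codes; weights];
XintScaled = mapminmax(Xint);

% Approach 2: 1-of-N encoding. Column k of the identity matrix is the code
% for category k, so indexing with idx gives one column per sample.
I       = eye(numel(cats));
onehot  = I(:, idx);
Xonehot = [onehot; mapminmax(weights)];   % only the numerical row is scaled
```

Either Xint (or XintScaled) or Xonehot could then be passed as the input matrix when creating and training the network. Which one works better depends on the data.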

Related Solutions

MATLAB: How to code Categorical Variables in NARX neural network data input

T.MONTH_C = categorical(T.MONTH, {'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'}, 'ordinal', false);
T.HE_C = categorical(T.HOUR, 1:24, {'01:00', '02:00', '03:00', '04:00', ....... '24:00'}, 'ordinal', false);
T.WEEKDAY_C = categorical(T.WEEKDAY, 1:7, {'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'}, 'ordinal', false);

I prefer not to convert months into 1-12, as MATLAB will assume some scale (month 12 is higher than month 6, etc.).

I do not know what part of the world you live in, but in the part of the world that I live in, the electrical demands in adjacent calendar months are strongly correlated. The relationship between the demands for January and February is much stronger than the relationship between the demands for January and June.
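
If adjacency between months matters, including the wrap-around from December to January, one encoding that is sometimes used is to place the months on a circle and feed the sine and cosine of the angle as two inputs. This is not something the answer above proposes; it is a minimal sketch in base MATLAB with hypothetical variable names, shown only to illustrate the point about neighbouring months.

```matlab
% Hypothetical month numbers 1..12 (January = 1); not taken from the table T.
month = 1:12;

% Map each month to an angle on the unit circle and use (sin, cos) as inputs.
theta = 2*pi*(month - 1)/12;
enc   = [sin(theta); cos(theta)];   % 2-by-12, one column per month

% Distances between encoded months.
dJanFeb = norm(enc(:,1) - enc(:,2));    % adjacent months
dJanDec = norm(enc(:,1) - enc(:,12));   % also adjacent, across the year end
dJanJul = norm(enc(:,1) - enc(:,7));    % half a year apart
```

With this encoding, January is as close to December as it is to February, and July is the farthest month from January. A plain 1-12 integer code would instead put January and December at opposite ends of the scale.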

MATLAB: Neural network fitting for PWM synthesis

According to your code:

net = network( ...
1, ... % numInputs, number of inputs,
2, ... % numLayers, number of layers
[1; 0], ... % biasConnect, numLayers-by-1 Boolean vector,
[1; 0], ... % inputConnect, numLayers-by-numInputs Boolean matrix,
[0 0; 1 0], ... % layerConnect, numLayers-by-numLayers Boolean matrix
[0 1] ... % outputConnect, 1-by-numLayers Boolean vector
);

The network has only 2 layers.

>> view(net)
>> net.layers
ans =
  2×1 cell array
    {1×1 nnetLayer}
    {1×1 nnetLayer}

Refer to the following for more information: network, Neural Network Object Properties, Neural Network Subobject Properties
