MATLAB: Are the sample input and output matrices stored in the model itself (net.inputs{1}.exampleInput) in Neural Network Toolbox 6.0 (R2008a)?

When initializing a neural network with the NEWFF function, I noticed that the sample input and output matrices are stored in the model itself (net.inputs{1}.exampleInput). With very large training data sets, this consumes too much memory. The data and the model should be decoupled rather than stored together.

Best Answer

NEWFF needs sample data to create an initialized network that can later be reinitialized properly with INIT. The sample data is used to configure data ranges, processing function settings, and so on.
You should use a small subset of the data to create the network and then use the rest of the data to train it.
Changing or removing the stored sample data would automatically change the pre- and post-processing settings that were calibrated from that data, so it is not recommended.
The processing settings are also updated whenever the list of processing functions or their parameters changes, and storing the data in the network object allows this to happen reliably.
Currently, the sample input and target data are stored in the network so that the network's input and output processing functions can have their settings reconfigured automatically if you change those processing functions or their parameters before training.
If you want to limit the amount of data stored in the network for better memory efficiency, the recommended workaround is to create the network from a subset of the data (that is, supply only a subset of the columns of the inputs and targets). As long as that subset is still representative of the value ranges and of the presence of NaNs (only a concern in applications where unknown inputs are being used), the network will still train well.
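One way to pick such a subset is sketched below; the selection strategy is an assumption, not part of the answer above. It keeps a few hundred random sample columns and then adds, for every row, the columns holding that row's minimum and maximum (plus one NaN-containing column, if any), so the subset provably spans the same value ranges as the full data. The data, the sizes, and the helper `rangeOf` are hypothetical; base MATLAB MIN/MAX are used so the sketch runs without the toolbox.

```matlab
% Hypothetical full data set: 4 input rows, 2 target rows, 50000 samples
inputs = randn(4, 50000);
targets = randn(2, 50000);

% Hypothetical helper: per-row [min max] ranges (same layout as MINMAX)
rangeOf = @(x) [min(x, [], 2), max(x, [], 2)];

% Start from a small random subset of sample columns
nKeep = 200;
perm = randperm(size(inputs, 2));
idx = perm(1:nKeep);

% Add, for every row, the columns holding that row's minimum and maximum,
% so the subset still spans the full value range of each input and target
[~, iMinIn] = min(inputs, [], 2);
[~, iMaxIn] = max(inputs, [], 2);
[~, iMinT] = min(targets, [], 2);
[~, iMaxT] = max(targets, [], 2);
% Also keep one column containing a NaN input, if any exists (empty otherwise)
iNaN = find(any(isnan(inputs), 1), 1);
idx = unique([idx, iMinIn', iMaxIn', iMinT', iMaxT', iNaN]);

% Select the same columns from inputs and targets so sample pairs stay matched
inputsSub = inputs(:, idx);
targetsSub = targets(:, idx);

% The subset reproduces the per-row ranges of the full data
assert(isequal(rangeOf(inputsSub), rangeOf(inputs)));
assert(isequal(rangeOf(targetsSub), rangeOf(targets)));
fprintf('Kept %d of %d sample columns\n', numel(idx), size(inputs, 2));
```

The network would then be created from `inputsSub` and `targetsSub` instead of the full matrices (with your own hidden layer sizes as the third NEWFF argument) and trained on the full `inputs` and `targets` with TRAIN.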
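In applications where there are no unknown input values, the ranges of the inputs and targets (as returned by MINMAX) can be used instead, along with a third vector such as the first sample column. (Having more than two vectors in the inputs and targets is important to distinguish calls that supply input and target data from old calls to NEWFF that supplied only input ranges followed by layer sizes, etc.)
% Per-row [min max] ranges plus one real sample column (3 columns in total)
inputs2 = [minmax(inputs) inputs(:,1)];
targets2 = [minmax(targets) targets(:,1)];
% hiddenSizes is a placeholder for your hidden layer sizes and any further NEWFF arguments
net = newff(inputs2, targets2, hiddenSizes);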
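The sketch below checks why this reduction is safe for range-based settings: the two MINMAX columns plus one sample give exactly the same per-row minimum and maximum as the full data, in only three columns. It assumes hypothetical data with no NaNs (as the workaround requires) and uses base MATLAB MIN/MAX in place of MINMAX so it runs without the toolbox.

```matlab
% Hypothetical data: 3 input rows, 1 target row, 100000 samples, no NaNs
inputs = rand(3, 100000);
targets = rand(1, 100000);

% Per-row [min max] ranges; MINMAX from the toolbox returns the same layout
rangeOf = @(x) [min(x, [], 2), max(x, [], 2)];

% Reduced sample data: the two range columns plus the first real sample
inputs2 = [rangeOf(inputs) inputs(:,1)];
targets2 = [rangeOf(targets) targets(:,1)];

% Same per-row ranges as the full data ...
assert(isequal(rangeOf(inputs2), rangeOf(inputs)));
assert(isequal(rangeOf(targets2), rangeOf(targets)));
% ... stored in 3 columns instead of 100000
fprintf('%d elements instead of %d\n', numel(inputs2), numel(inputs));
```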
This workaround is only needed if memory efficiency of the network object is a concern.