Solved – How to combine two models (neural network and KNN) in Matlab

ensemble learning, MATLAB, neural networks

I am conducting research on credit risk prediction using neural networks and K-Nearest Neighbour models (K=10) in Matlab. The dataset has 490 samples with 19 features, from which I want to predict one binary output variable (the credit risk of companies: Bankrupt/Non-Bankrupt). The data is split into 75% for training and the remaining 25% for validation and testing.

Now I want to combine both models to get one accuracy rate, as I expect it to be higher than using only the NN or the KNN model alone. My question therefore is: how do I combine the two to give me one accuracy rate in Matlab? I know that stacking and bagging techniques exist – how do I use/implement them in Matlab and test their real performance?

Simple example of the neural network setup:

  1. input layer: 19 input variables (X1–X19), accounting ratios (liquidity, profitability, … ratios), for 420 samples (companies across different years).

  2. hidden layers: 2 hidden layers with 10 neurons each, sigmoid activation function; training uses the training/validation split described above, with the backpropagation algorithm for learning and adjusting the weights.

  3. output layer: Y (the company status: $<0.5$ indicates bankrupt, $>0.5$ indicates non-bankrupt).
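
A minimal sketch of such a network, assuming `X` is a 19-by-420 matrix of accounting ratios (features in rows) and `Y` is a 1-by-420 vector with 0 = bankrupt and 1 = non-bankrupt; the variable names and the validation/test ratios are placeholders, not taken from the original post:

```matlab
net = feedforwardnet([10 10]);           % two hidden layers with 10 neurons each
net.layers{1}.transferFcn = 'logsig';    % sigmoid activations, as described
net.layers{2}.transferFcn = 'logsig';
net.layers{3}.transferFcn = 'logsig';    % keep the output in [0,1]
net.divideParam.trainRatio = 0.75;       % training share
net.divideParam.valRatio   = 0.15;       % placeholder validation share
net.divideParam.testRatio  = 0.10;       % placeholder test share
net = train(net, X, Y);                  % gradient (backpropagation) based training
scoresNN = net(X);                       % network outputs in [0,1]
predNN   = scoresNN > 0.5;               % >0.5 -> non-bankrupt, otherwise bankrupt
```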

Simple example of the KNN setup:

I use the knnclassify function with K=5 and Euclidean distance. Input and output are the same as in the ANN example.
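
A minimal sketch of the KNN part; since knnclassify is an older function that has been superseded by fitcknn in recent MATLAB releases, fitcknn is used here. `Xtrain`, `Ytrain` and `Xtest` are placeholder names with samples in rows:

```matlab
mdlKNN  = fitcknn(Xtrain, Ytrain, 'NumNeighbors', 5, 'Distance', 'euclidean');
predKNN = predict(mdlKNN, Xtest);        % predicted 0/1 class labels
```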

I see that I can use, as you said, bagging or stacking; I may try both, since Matlab already has ready-to-use functions for them. My main problem is that I cannot find a guide on combining both models to give me ONE prediction and its accuracy, so the ensemble model I want to build in Matlab is as follows:

  1. NN –> output
  2. KNN –> output
  3. Stacking or bagging
  4. get final output
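
A minimal sketch of steps 1–4, assuming the trained `net` and `mdlKNN` from the sketches above and a held-out test set `Xtest`/`Ytest` (samples in rows, labels 0/1 with 1 = non-bankrupt); plain score averaging stands in for step 3 here, while bagging and stacking variants are sketched in the answer below:

```matlab
pNN       = net(Xtest')';                 % step 1: NN output in [0,1]
[~, post] = predict(mdlKNN, Xtest);       % step 2: KNN class posteriors
pKNN      = post(:, 2);                   %         column of class 1 = non-bankrupt
pFinal    = (pNN + pKNN) / 2;             % step 3: combine the two outputs
predFinal = pFinal > 0.5;                 % step 4: final 0/1 output
accuracy  = mean(predFinal == Ytest)      % one accuracy rate for the combined model
```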

How to achieve this?

Best Answer

Independently of your exact setup (model types, number of samples and features), you could use a number of ensemble techniques. Note that you don't necessarily need the APIs provided by your ML tool/language for this – e.g. model averaging, bagging, and stacking can usually be implemented with a few extra lines of code.

  • Model averaging: you train $N$ models on the training data, then use all $N$ trained models on new samples to obtain $N$ predictions per sample. Per sample, the $N$ predictions are usually averaged to obtain the scalar ensemble prediction (hence the name "model averaging"). For classification, class probabilities can also be derived from the number of votes for each class (e.g. 4 votes / 10 models = 0.4). You can easily do model averaging yourself: it does not need a modified training procedure, so you can use any number and type of already-trained models you have – just average the outputs as mentioned above (see the averaging sketch after this list).

  • Bagging: nearly the same as model averaging, but it requires a slightly modified training procedure, as it uses a (bootstrap) subset of samples to train each model$^1$. You will therefore want to use more than 1 KNN and 1 ANN model. As with averaging, the prediction outputs of all $N$ models are averaged to obtain a scalar ensemble output. Like model averaging, you can easily implement this yourself: select a subset of samples, train one model on it, and repeat the process $N$ times until you have the desired number of models whose predictions you average afterwards (see the bagging sketch after this list).

  • Stacking: also requires a slightly modified training procedure. You train $N$ models to predict the output for a new sample. You then use the $N$ predicted outputs for the training samples as input to another model that is "stacked" upon the others (so it becomes a layered chain of models). This final model predicts the actual output for new samples. Again, you can implement this yourself: use your $N$ models to generate $N$ predictions for the training samples, then train a final model that takes these $N$ outputs as inputs (see the stacking sketch after this list). Note that you will likely need more than 2 base models for the final model to yield a noticeable boost in results.
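
A sketch of vote-based model averaging, assuming the placeholder variables `Xtrain`/`Ytrain`/`Xtest`/`Ytest` from above (samples in rows, labels 0/1); the choice of 10 KNN base models with different K values is arbitrary and only meant to illustrate the idea:

```matlab
Ks    = 1:2:19;                                 % 10 base models
votes = zeros(size(Xtest, 1), numel(Ks));
for m = 1:numel(Ks)
    mdl = fitcknn(Xtrain, Ytrain, 'NumNeighbors', Ks(m));
    votes(:, m) = predict(mdl, Xtest);          % hard 0/1 vote of model m
end
pVote   = mean(votes, 2);                       % e.g. 4 votes / 10 models = 0.4
accVote = mean((pVote > 0.5) == Ytest)          % single accuracy for the ensemble
```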
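
A sketch of bagging with the same placeholder variables; each of the $N$ base models (here K=5 KNN classifiers, an arbitrary choice) is trained on a bootstrap sample of the training rows, drawn with replacement:

```matlab
N = 10;
n = size(Xtrain, 1);
scores = zeros(size(Xtest, 1), N);
for m = 1:N
    idx = randi(n, n, 1);                       % bootstrap sample of the rows
    mdl = fitcknn(Xtrain(idx, :), Ytrain(idx), 'NumNeighbors', 5);
    scores(:, m) = predict(mdl, Xtest);         % 0/1 prediction of model m
end
pBag   = mean(scores, 2);                       % average over the N models
accBag = mean((pBag > 0.5) == Ytest)            % single accuracy for the bagged ensemble
```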
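
A sketch of stacking, assuming the trained `net` and `mdlKNN` from the question's sketches and the same placeholder variables; logistic regression via fitglm is used here as the stacked meta-model, which is only one possible choice:

```matlab
pNNtr     = net(Xtrain')';                      % base model 1: ANN score on training data
[~, post] = predict(mdlKNN, Xtrain);
pKNNtr    = post(:, 2);                         % base model 2: KNN posterior of class 1
meta = fitglm([pNNtr, pKNNtr], Ytrain, ...
              'linear', 'Distribution', 'binomial');   % level-1 ("stacked") model

% At prediction time, feed the base models' outputs on new samples to the meta-model.
pNNte     = net(Xtest')';
[~, post] = predict(mdlKNN, Xtest);
pKNNte    = post(:, 2);
pStack    = predict(meta, [pNNte, pKNNte]);     % probability of class 1 (non-bankrupt)
accStack  = mean((pStack > 0.5) == Ytest)       % single accuracy for the stacked ensemble
```

In practice you would rather train the meta-model on out-of-fold (cross-validated) predictions of the base models, otherwise the stack tends to overfit to the base models' behaviour on their own training data.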

There is also boosting, but it is a bit more complex and probably not what you are aiming for right now.

$^1$ Note that bagging can also be applied to features ("random subspace").
