What are good techniques for modeling small datasets?

Tags: logistic, neural-networks, small-sample

I’m working on a classification problem. However, my training dataset is very small (just 800 items), and each item contains only a small number of features (just 5). I first used logistic regression to build a model for this dataset; unfortunately, its prediction accuracy was very poor. I then tried a neural network model, but saw no improvement.

I suspect the number of training items and the number of features per item are not sufficient to train logistic regression or a neural network.

So my question is: what are good techniques for modeling small datasets?

Best Answer

Neural networks can be notoriously difficult to work with (there are too many potential pitfalls, such as local minima, over-fitting, etc.). A kernel method is likely to be easier to work with, such as kernel ridge regression, the support vector machine, kernel logistic regression, or a Gaussian process classifier, using a radial basis function kernel/covariance function. As long as the hyper-parameters are tuned using a sensible procedure (e.g. cross-validation), they are likely to provide good results with much less effort/risk than neural networks.
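To make this concrete, here is a minimal sketch of the suggested approach using scikit-learn: an RBF-kernel SVM whose hyper-parameters (C and gamma) are tuned by cross-validated grid search. The dataset here is synthetic, generated with `make_classification` as a stand-in for your 800-item, 5-feature set, and the grid values are illustrative, not prescribed by the answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset: 800 items, 5 features.
X, y = make_classification(n_samples=800, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Feature scaling matters for RBF kernels; putting it in a pipeline
# keeps the scaler fitted only on the training folds during CV.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Tune C and gamma on a logarithmic grid via 5-fold cross-validation.
param_grid = {
    "svc__C": np.logspace(-2, 3, 6),
    "svc__gamma": np.logspace(-3, 2, 6),
}
search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)

print("best params:  ", search.best_params_)
print("CV accuracy:  ", search.best_score_)
print("test accuracy:", search.score(X_test, y_test))
```

With only 800 items, the full grid search runs in seconds, and the cross-validation guards against the over-fitting risk mentioned above. The same pattern works for the other kernel methods named in the answer (e.g. swapping `SVC` for `KernelRidge` or `GaussianProcessClassifier`).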
