Solved – how to handle small datasets with large dimensions

dataset, deep learning, deep-belief-networks, machine learning, neural networks

I have 48 samples (case and control) and 27,000 features per sample, so my matrix is [48 × 27000]. I am using a deep belief network (DBN) as my algorithm and want to estimate prediction accuracy on this dataset, but my results are random: whenever I run the DBN on the same samples with the same parameter values, I get a different accuracy. Is there a reason for this? Also, can I concatenate the same dataset multiple times and rerun? By concatenating I mean stacking the same dataset, e.g. 48 + 48 = 96 samples. If that is a valid approach, can anyone point me to a reference paper?

Best Answer

What you have is not a problem suitable for a DBN. There is no way not to overfit this data. You need linear, strongly regularized models; linear SVMs are often used for this. There is a whole chapter in The Elements of Statistical Learning about dealing with similar (p ≫ n) problems, and you might want to check it out (it's freely available).
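As a minimal sketch of the kind of strongly regularized linear model the answer recommends (this is my illustration, not code from the answer), the snippet below fits an L2-regularized linear SVM with scikit-learn and evaluates it with leave-one-out cross-validation. `X` and `y` are stand-ins with the question's shape; replace them with the real feature matrix and case/control labels.

```python
# Sketch: strongly regularized linear SVM for a 48 x 27000 (p >> n) dataset.
# Assumes NumPy and scikit-learn; X and y below are random placeholders.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((48, 27000))   # stand-in for the real feature matrix
y = rng.integers(0, 2, size=48)        # stand-in case/control labels

# Small C = strong L2 regularization, which is what keeps a linear model
# from simply memorizing 48 points in a 27000-dimensional space.
model = make_pipeline(StandardScaler(), LinearSVC(C=0.01, max_iter=10000))

# With only 48 samples, leave-one-out CV gives a less noisy accuracy
# estimate than a single train/test split.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("Leave-one-out accuracy: %.3f" % scores.mean())
```

On the random placeholder data this should hover around chance level; with real, informative features the cross-validated accuracy is the number to report, not training accuracy.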

Edit: the whole point of MLPs and DBNs is that they can learn complex, nonlinear features from the data. You don't have enough data, so any method that allows such complex models will just overfit.

Another issue is that in such a high-dimensional problem there is always some hyperplane that separates your classes, so there is no need for nonlinear methods.
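To make that last point concrete, here is a quick check (my own illustration, assuming scikit-learn and NumPy): with 48 samples and 27,000 purely random features, a weakly regularized linear SVM fits arbitrary labels perfectly on the training set, while cross-validated accuracy stays near chance.

```python
# Illustration: 48 points in 27000 dimensions are (almost surely) linearly
# separable, even when features and labels are pure noise.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((48, 27000))    # pure noise features
y = rng.integers(0, 2, size=48)         # random labels

clf = LinearSVC(C=1e3, max_iter=10000)  # large C = weak regularization
print("training accuracy:", clf.fit(X, y).score(X, y))               # 1.0 (or very close)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())  # ~0.5, i.e. chance
```

The gap between the two numbers is exactly the overfitting the answer warns about, and it only gets worse with a more flexible model such as a DBN.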
