Solved – How To Determine The Number Of Dimensions To A Machine Learning Problem

dimensionality reduction, high-dimensional, machine learning

I have a bit to learn about machine learning, so please pardon me if I am asking the wrong type of question. I have read some about neural networks and SVMs, so I'm not completely in the dark.

I am wondering how to tell how many dimensions a problem has: is it the number of possible outcomes or the total number of inputs? Does it depend on the type of machine learning algorithm or only on the particular problem at hand? Or am I missing something entirely?

*Most literature I have read refers to 'high-dimensionality', but I am wondering if an exact number of dimensions can be calculated. I am then hoping to use this when trying to reduce the number of dimensions (when I get that far) to judge the overall effectiveness of a strategy. But first I must better understand the dimensions of a machine learning problem.

**If necessary, please use neural networks or SVMs as a reference point, but I am also interested in hearing about genetic algorithms and anything else you might like to mention.

Best Answer

Generally, the dimensionality of the problem is, as you suspected, equal to the number of inputs (also known as features or measurement variables).

So in a neural network model, that would be the number of nodes in the input layer, not the number of possible outcomes. There may be unmeasured features of the problem, but dimensionality normally refers only to the measurements you have.
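As a minimal sketch of that idea, assuming the data is stored as one row per sample and one column per measured feature, the dimensionality is just the number of columns. The dataset values and the helper name `dimensionality` are made up for illustration.

```python
# Hypothetical toy dataset: each row is one sample, each column one measured feature.
samples = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
]
labels = [0, 0, 1, 1]  # two possible outcomes; this does not change the dimensionality


def dimensionality(rows):
    """Return the number of features, checking that every row has the same length."""
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError("rows have inconsistent numbers of features")
    return widths.pop()


n_features = dimensionality(samples)  # would be the size of a neural network's input layer
n_classes = len(set(labels))          # number of outcomes, a separate quantity
print(n_features, n_classes)
```

Here the problem has three dimensions regardless of how many classes the labels contain.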

You may synthesise extra features from the ones you have, for example by squaring one of the existing features to make a new one, if you think this might be helpful. That would add one extra feature to the dimensionality.
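A small sketch of the squared-feature example, again assuming one row per sample. The function name `add_squared_feature` and the sample values are hypothetical, chosen only to show the dimensionality going from two to three.

```python
def add_squared_feature(rows, column):
    """Return new rows with the square of `column` appended as an extra feature."""
    return [row + [row[column] ** 2] for row in rows]


original = [[1.0, 2.0], [3.0, 4.0], [-2.0, 0.5]]
augmented = add_squared_feature(original, column=0)

# The original rows are left unchanged; each augmented row has one more column.
print(len(original[0]), len(augmented[0]))
```

Whether the extra feature actually helps depends on the problem; the sketch only shows the effect on the feature count.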

On a separate point, if you are looking at SVMs you may encounter something called the Vapnik–Chervonenkis dimension (VC dimension). This is a generalised concept which refers to the 'power', or expressiveness, of a learning algorithm. It is based on the largest number of points that an algorithm, with a certain set of parameters, can 'shatter' (separate correctly under every possible labelling of those points). It is a property of the learning algorithm rather than of the data, and it is not directly related to the dimensionality of the learning problem.
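To make 'shattering' concrete, here is a brute-force sketch for a deliberately tiny classifier family, not an SVM: one-sided thresholds on a single input, h_t(x) = 1 if x >= t, else 0. The function names are my own. The check enumerates every labelling the family can produce on a set of points and compares that with all 2^n possible labellings.

```python
from itertools import product


def threshold_labelings(points):
    """All labellings of `points` achievable by h_t(x) = 1 if x >= t else 0."""
    xs = sorted(set(points))
    # One threshold below every point, one between each neighbouring pair, one above all.
    candidates = [xs[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    candidates.append(xs[-1] + 1.0)
    return {tuple(int(x >= t) for x in points) for t in candidates}


def is_shattered(points):
    """True if every one of the 2**n labellings of `points` is achievable."""
    return threshold_labelings(points) == set(product((0, 1), repeat=len(points)))


print(is_shattered([0.0]))       # a single point
print(is_shattered([0.0, 1.0]))  # two distinct points
```

A single point can be shattered, but two points cannot, because this family never labels the smaller point 1 and the larger point 0. Its VC dimension is therefore 1, a statement about the classifier family rather than about how many features a dataset has.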
