What's the meaning of dimensionality, and what is it for this data?

classification, data mining, dataset, descriptive statistics

I'm doing my assignment for my "Modeling and Optimization" course, and I have a doubt about the first question:

What is the dimensionality of the data? What are the min, median, max,
mean, standard deviation and percentage missing data of each feature?

I can calculate those, but I'm not sure about the "dimensionality" of the data. Here's a sample of my dataset:

Sample  mcg   gvh   alm   mit   erl   pox   vac   nuc   Class1  Class2
1       0.58  0.61  0.47  0.13  0.5   0     0.48  0.22  MIT     non-CYT
2       0.43  0.67  0.48  0.27  0.5   0     0.53  0.22  MIT     non-CYT
3       0.64  0.62  0.49  0.15  0.5   0     0.53  0.22  MIT     non-CYT
4       0.58  0.44  0.57  0.13  0.5   0     0.54  0.22  NUC     non-CYT
5       0.42  0.44  0.48  0.54  0.5   0     0.48  0.22  MIT     non-CYT
6       0.51  0.4   0.56  0.17  0.5   0.5   0.49  NA    CYT     CYT
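For the descriptive-statistics part of the question, here is a minimal stdlib-only sketch over the six sample rows above. It assumes that missing values ("NA", stored here as `None`) are excluded before computing each statistic, and that "standard deviation" means the sample standard deviation (n - 1 denominator); both are assumptions, and the helper name `describe` is hypothetical. With the full dataset you would load the file instead of hard-coding rows.

```python
import statistics

# The six sample rows from the question; None marks the "NA" entry in nuc.
FEATURES = ["mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc"]
ROWS = [
    [0.58, 0.61, 0.47, 0.13, 0.5, 0.0, 0.48, 0.22],
    [0.43, 0.67, 0.48, 0.27, 0.5, 0.0, 0.53, 0.22],
    [0.64, 0.62, 0.49, 0.15, 0.5, 0.0, 0.53, 0.22],
    [0.58, 0.44, 0.57, 0.13, 0.5, 0.0, 0.54, 0.22],
    [0.42, 0.44, 0.48, 0.54, 0.5, 0.0, 0.48, 0.22],
    [0.51, 0.40, 0.56, 0.17, 0.5, 0.5, 0.49, None],
]


def describe(rows, features):
    """Return min, median, max, mean, std and % missing for each feature."""
    summary = {}
    for j, name in enumerate(features):
        column = [row[j] for row in rows]
        # Assumption: statistics are computed on the observed values only.
        present = [v for v in column if v is not None]
        summary[name] = {
            "min": min(present),
            "median": statistics.median(present),
            "max": max(present),
            "mean": statistics.mean(present),
            # Assumption: sample standard deviation (n - 1 denominator).
            "std": statistics.stdev(present),
            "missing_pct": 100.0 * (len(column) - len(present)) / len(column),
        }
    return summary


if __name__ == "__main__":
    for name, stats in describe(ROWS, FEATURES).items():
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

If the course expects the population standard deviation instead, `statistics.pstdev` is the drop-in replacement.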

I've been told that dimensionality usually refers to the attributes (columns) of the dataset. But in this case, does it include Class1 and Class2? And does dimensionality mean the number of columns, or the names of the columns?
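Dimensionality is a number (a count of columns), not the list of column names. Whether Class1 and Class2 are counted is a matter of convention; a common one in classification treats them as labels rather than input features. The sketch below simply makes both readings concrete for the sample header. It assumes "Sample" is only a row index, and the helper `dimensionality` and its `include_labels` flag are hypothetical.

```python
# Header of the sample table from the question.
HEADER = ["Sample", "mcg", "gvh", "alm", "mit", "erl",
          "pox", "vac", "nuc", "Class1", "Class2"]

# Assumptions: "Sample" is just a row index, and Class1/Class2 are class labels.
ID_COLUMNS = {"Sample"}
LABEL_COLUMNS = {"Class1", "Class2"}


def dimensionality(header, include_labels):
    """Count the columns that describe each sample (a number, not names)."""
    kept = [c for c in header if c not in ID_COLUMNS]
    if not include_labels:
        kept = [c for c in kept if c not in LABEL_COLUMNS]
    return len(kept)


if __name__ == "__main__":
    print("with class columns:   ", dimensionality(HEADER, include_labels=True))
    print("without class columns:", dimensionality(HEADER, include_labels=False))
```

For this header that gives 10 columns with the class labels and 8 feature columns without them; which number the assignment wants is worth stating explicitly in the answer.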

Best Answer

Your assumption is correct, and you are also noticing a real subtlety. In a perfect world, the number of columns is the number of dimensions of a data set. In practice, however, some columns are similar, some are correlated, some duplicate others in some way, and some are junk or otherwise useless, so the actual number of dimensions can be unknown. It's a knotty problem. In your case, I would go with your first assumption.
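To illustrate the point about useless and correlated columns, one simple heuristic is to flag columns that never vary and pairs of columns that are strongly correlated. This is a sketch of my own choosing, not a method from the answer: the 0.9 threshold and the helper names (`redundancy_report`, `pearson`) are hypothetical, and six rows are far too few to trust any correlation estimate. On the sample rows, erl is constant, and nuc is constant among its observed values.

```python
import math

FEATURES = ["mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc"]
ROWS = [
    [0.58, 0.61, 0.47, 0.13, 0.5, 0.0, 0.48, 0.22],
    [0.43, 0.67, 0.48, 0.27, 0.5, 0.0, 0.53, 0.22],
    [0.64, 0.62, 0.49, 0.15, 0.5, 0.0, 0.53, 0.22],
    [0.58, 0.44, 0.57, 0.13, 0.5, 0.0, 0.54, 0.22],
    [0.42, 0.44, 0.48, 0.54, 0.5, 0.0, 0.48, 0.22],
    [0.51, 0.40, 0.56, 0.17, 0.5, 0.5, 0.49, None],  # None = "NA"
]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundancy_report(rows, features, threshold=0.9):
    """Return (constant columns, strongly correlated column pairs)."""
    # A column is constant if its observed (non-missing) values never change.
    constant = [
        name for j, name in enumerate(features)
        if len({row[j] for row in rows if row[j] is not None}) <= 1
    ]
    # Only compare columns that are non-constant and fully observed.
    usable = [
        j for j, name in enumerate(features)
        if name not in constant and all(row[j] is not None for row in rows)
    ]
    correlated = []
    for a in range(len(usable)):
        for b in range(a + 1, len(usable)):
            i, j = usable[a], usable[b]
            r = pearson([row[i] for row in rows], [row[j] for row in rows])
            if abs(r) >= threshold:
                correlated.append((features[i], features[j], round(r, 3)))
    return constant, correlated


if __name__ == "__main__":
    constant, correlated = redundancy_report(ROWS, FEATURES)
    print("constant columns:", constant)
    print("strongly correlated pairs:", correlated)
```

Dropping flagged columns would lower the effective dimensionality, but on real data this should be checked against the full dataset rather than a handful of rows.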
