Solved – Clustering a dataset with both discrete and continuous variables

Tags: clustering, continuous data, discrete data, gaussian mixture distribution, k-means

I have a dataset X with 10 dimensions, 4 of which are discrete.
In fact, those 4 discrete variables are ordinal, i.e. a higher value means a higher/better level.

2 of these discrete variables behave more like categorical ones, in the sense that the spacing between their values is not uniform: the distance from 11 to 12, for example, is not the same as the distance from 5 to 6. A higher value does imply a higher level in reality, but the scale is not necessarily linear (in fact, it is not really defined).

My question is:

  • Is it a good idea to apply a common clustering algorithm (e.g. K-Means, and then a Gaussian Mixture Model (GMM)) to this dataset, which contains both discrete and continuous variables?

If not:

  • Should I remove the discrete variables and focus only on the continuous ones?
  • Would it be better to discretize the continuous ones and use a clustering algorithm for discrete data?
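
To make the setup concrete, here is a minimal sketch of one way to compare rows that mix ordinal and continuous columns without assuming a linear scale for the ordinal ones. It is a Gower-style distance, which is not something the question itself specifies. The function names `rank_levels` and `gower_matrix` and the toy rows are hypothetical. The key assumption is that each ordinal value is replaced by the rank of its level, so only the ordering is used and every step between adjacent observed levels counts the same; that may or may not fit data whose true spacing is undefined.

```python
from typing import List, Sequence


def rank_levels(column: Sequence[float]) -> List[float]:
    """Replace each ordinal value by the 0-based rank of its level.

    Assumption: only the order of the levels matters, not their spacing.
    """
    levels = sorted(set(column))
    position = {level: i for i, level in enumerate(levels)}
    return [float(position[v]) for v in column]


def gower_matrix(rows: Sequence[Sequence[float]],
                 ordinal_cols: Sequence[int]) -> List[List[float]]:
    """Pairwise Gower-style distances for mixed ordinal/continuous rows.

    Each column contributes |x - y| / range, so every column lies in [0, 1];
    the per-column contributions are then averaged over all columns.
    """
    n = len(rows)
    d = len(rows[0])
    # Work column-wise so each column can be transformed and scaled on its own.
    columns = [[float(r[j]) for r in rows] for j in range(d)]
    for j in ordinal_cols:
        columns[j] = rank_levels(columns[j])
    ranges = [max(c) - min(c) for c in columns]

    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(d):
                if ranges[j] > 0:  # a constant column contributes nothing
                    total += abs(columns[j][a] - columns[j][b]) / ranges[j]
            dist[a][b] = dist[b][a] = total / d
    return dist


# Toy data (hypothetical): one continuous column, one ordinal column
# whose observed levels are 5, 6, 11 and 12.
rows = [
    (0.10, 5),
    (0.20, 6),
    (0.90, 11),
    (1.00, 12),
]
D = gower_matrix(rows, ordinal_cols=[1])
```

The resulting matrix `D` could then be passed to any clustering method that accepts precomputed distances, such as hierarchical clustering or k-medoids, rather than to K-Means, which works on coordinates and means.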