Solved – Dimension reduction using space filling curve to avoid “Curse of dimensionality”

high-dimensional, information theory, machine learning, manifold-learning, mathematical-statistics

In machine learning, we want to train a model. If the dimension of the training data is high, we run into problems (the curse of dimensionality), so we want to reduce the dimension of our data.

We know that $\mathbb{R}^n$ and $\mathbb{R}$ have the same cardinality, so there is always some map (e.g., built from a space-filling curve) that matches points uniquely in both directions; that is, we can bijectively map any $n$-dimensional data point to one dimension. So what is the problem in the first place?

I can think of two problems that remain:

  1. With this space-filling-curve map we don't actually reduce the size of the data: we have to increase the precision of the single coordinate when we write the point in one dimension.
  2. This is where I have doubts. I think that representing data in $\mathbb{R}^n$ carries more information than representing it in $\mathbb{R}$. There is structure when we write data in $\mathbb{R}^n$, such as which point is near to which point, and that is lost under this map (the space-filling curve), i.e., the map is not a homeomorphism.
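To make the locality loss in point 2 concrete, here is a small sketch. The `d2xy` function is the standard Hilbert-curve coordinate conversion (as given in the Wikipedia "Hilbert curve" article); the grid size and the pair of test points are arbitrary illustrative choices.

```python
# Sketch: a finite Hilbert curve indexes an n-by-n grid bijectively,
# yet grid neighbours can land far apart along the curve.
def d2xy(n, d):
    """Standard Hilbert decode: curve index d -> (x, y), n a power of 2."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/reflect the sub-quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

n = 8
index_of = {d2xy(n, d): d for d in range(n * n)}   # inverse map: (x, y) -> d

p, q = (3, 0), (4, 0)        # adjacent grid cells (Euclidean distance 1)
d1, d2 = index_of[p], index_of[q]
print(d1, d2, abs(d1 - d2))  # the two 1-D indices are far apart
```

Refining the grid only makes this worse: nearby points in the plane can be sent arbitrarily far apart along the curve, which is one way to make the "lost neighbourhood structure" rigorous (the inverse map is not continuous, and the bijection is not bi-Lipschitz).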

My question:

  1. Is what I am thinking right?
  2. I don't know exactly what information I am talking about in point 2; could you help me make it more rigorous?
  3. Is there any other problem?

Example:

Suppose we have training data $x^i \in \mathbb{R}^n$ with labels $y^i \in \mathbb{R}$, where $i \in \{1,2,\dots,N\}$, and we train a neural network to fit this data. If we change the order of the coordinates, i.e.

$$x^i = (x_1^i,x_2^i,\dots,x_n^i)$$

if we take

$$\tilde{x}^i = (x_{\rho(1)}^i,x_{\rho(2)}^i,\dots,x_{\rho(n)}^i)$$

where $\rho$ is some permutation (the same for all $i$), then the training of the neural network is unaffected. So the order of the coordinates doesn't matter. Then what if I transform all the data from $\mathbb{R}^n$ to $\mathbb{R}$? That shouldn't matter either. But it does matter; otherwise, there would be no such thing as the curse of dimensionality.
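The permutation invariance above can be checked directly: permuting the input coordinates and the first-layer weight columns by the same $\rho$ leaves a network's output unchanged, so the function class the network can represent is the same. A minimal sketch with a made-up one-hidden-layer net:

```python
# Sketch: a fixed permutation rho of the input coordinates is harmless,
# because permuting the first-layer weight columns the same way gives an
# identical forward pass. The tiny net and its weights are made up.
import numpy as np

rng = np.random.default_rng(0)
n, h = 5, 4
x = rng.normal(size=n)            # one input vector in R^n
W1 = rng.normal(size=(h, n))      # first-layer weights
W2 = rng.normal(size=(1, h))      # output weights

def net(x, W1):
    return W2 @ np.tanh(W1 @ x)   # one-hidden-layer forward pass

perm = rng.permutation(n)         # the permutation rho, same for all i
out_original = net(x, W1)
out_permuted = net(x[perm], W1[:, perm])

print(np.allclose(out_original, out_permuted))  # True
```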

I think that when we transform the data from $\mathbb{R}^n$ to $\mathbb{R}$, we lose some information or do something wrong. What is it?

Best Answer

I think your intuition is right; moving from $\mathbb{R}^n$ to an affine parameter along a space-filling curve will discard information about what points are close to one another in the high-dimensional space. Points in the same neighborhood can be separated by arbitrarily large distances along the curve.

Consider, as an example, a problem where your prediction targets lie in a compact region in $\mathbb{R}^n$, and your machine learning task is to characterize that region. In the space-filling curve representation, the curve likely dips in and out of that region over an infinite number of ranges of the affine parameter $\lambda$. Finding these segments of the curve is not only much harder than finding the boundaries of the region in $\mathbb{R}^n$; it is likely impossible, because arbitrarily large values of $\lambda$ can lie in the region. Your generalization error will be terrible, since any new case that lies along a segment of the curve you haven't explored yet will produce a missed prediction, even if it differs imperceptibly from a training point in the $\mathbb{R}^n$ representation.
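The "dips in and out" behaviour is easy to quantify on a finite grid: restrict a Hilbert curve to a simple square region and count how many disjoint runs of the curve parameter it occupies. A sketch (the grid size and the square are arbitrary illustrative choices; `d2xy` is the standard Hilbert decode):

```python
# Sketch: one axis-aligned square shatters into many disjoint intervals
# of the 1-D Hilbert curve parameter.
def d2xy(n, d):
    """Standard Hilbert decode: curve index d -> (x, y), n a power of 2."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/reflect the sub-quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

n = 32
# Curve indices whose grid cell lies in the centred square 8 <= x, y < 24.
inside = [d for d in range(n * n)
          if 8 <= d2xy(n, d)[0] < 24 and 8 <= d2xy(n, d)[1] < 24]

# Each maximal run of consecutive indices is a separate curve segment.
segments = 1 + sum(b != a + 1 for a, b in zip(inside, inside[1:]))
print(len(inside), segments)
```

A region described by two inequalities in $\mathbb{R}^2$ needs a whole list of parameter intervals in the 1-D representation, and the list grows as the grid is refined.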

Dimension reduction does have a place in machine learning, but the trick is to discard dimensions that are not providing useful information for your prediction problem. Just forcing everything into one dimension using a construct like a space-filling curve doesn't accomplish that.
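For contrast, here is a sketch of dimension reduction that does respect structure: PCA via the SVD keeps the directions of largest variance and drops the rest. The data is synthetic (two informative directions embedded in $\mathbb{R}^{10}$), and all sizes and seeds are arbitrary.

```python
# Sketch: PCA via the SVD as structure-preserving dimension reduction.
# Synthetic data: N points with 2 informative directions embedded in R^10
# plus small isotropic noise.
import numpy as np

rng = np.random.default_rng(1)
N, n, k = 500, 10, 2
latent = rng.normal(size=(N, k))                      # true 2-D structure
mixing = rng.normal(size=(k, n))                      # embedding into R^10
X = latent @ mixing + 0.01 * rng.normal(size=(N, n))  # observed data

Xc = X - X.mean(axis=0)                 # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()         # variance fraction per component
Z = Xc @ Vt[:k].T                       # k-dimensional representation

print(explained.round(3))  # nearly all variance in the first two components
```

Unlike the space-filling curve, this map is linear, so points that are close in $\mathbb{R}^{10}$ stay close in the reduced representation; only the low-variance (here, pure-noise) directions are discarded.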