How does hyper-plane equation divide vector space into cells convex polyhedrons

decision treesgeometrylinear algebratreesvector-spaces

I came across some specific algorithm that divides high-dimensional vector space into non-overlapping cells of convex polyhedron.

It does this by using tree based binary partitions (which might not be precisely relevant to this site), but in order to divide vector space into cells of convex polyhedrons, random tests are utilized at each node of the tree, until it reaches a leaf node.

What's intersting, is what happens in the random tests itself.


Random tests are based on subspace projections of high dimensional vectors (variable $d$ is dimension of vector space, variable $r$ is the split ratio of the tree):

The test is based on subspace projections of high dimensional
features. Suppose the node $l_i$ contains $n_i$ data points $x_0, x_1,
… ,x_{n_i}$
where $x_j=(x_{j0}, x_{j1}, …, x_{jd-1})$. We randomly
select an index set $I_i=\{d_{i0}, d_{i1}, … , d_{ik-1}\}$ of size
$K$ ($K \leq d$) where $d_{ij} \in (0, 1, …, d-1)$ and random
coefficients $\Im{}_i=\{\xi_{i0}, \xi_{i1},…,\xi_{ik-1}\}$ where
$\xi_{ij} \in [0, 1]$, and compute for every feature vector $x_j$ a
scale value $y_j=\sum^{K-1}_{k=0} x_{jd_{ik}} \xi_{ik}$ for
$j=0,…,n_i-1$. We then sort out the $y_j{s}$ and denote the sorted
sequence as $\{\tilde{y_j}\}$. Randomly select $\psi_i$ between the
$100*r$ and $100*(1-r)$ percentile of points $\{\tilde{y_j}\}$, i.e.,
$\{\tilde{y_j}\} \in [\tilde{y}_{n_i * r}, \tilde{y}_{n_i * (1-r)}]$.

The random test (equation 1):

$$t_i(x)=\sum^{K-1}_{k=0} x_{d_{ik}} {\xi_{ik}}-\psi_i \geq 0$$

defines a hyper-plane which splits the polyhedron at node $l_i$ into
two polyhedrons each containing no less than $r*n_i$ data points.

I do not have much knowledge of geometry, but I know from my linear algebra textbook that convex sets are sets such that every point of every line segment in the set belongs in the set.

Also, as explained in this brilliant answer, convex polyhedron equals the intersection of finitely many closed half-spaces and is not contained in a plane, and in three dimensions, can be represented by $ax+by+cz \geq 0$, which is very similar to the equation 1.

What would be the simple explanation of this process? let's say hyper-plane was a line (if it could be), how would it divide convex polyhedron? Furthermore, what do subspace projections have to do with this?

Thank you very much!

Best Answer

Note that you can calculate $y_i$ as $$\bbox{ y_i = \sum_{k=0}^{d} x_{i k} \zeta_k }$$ and $t_i(x)$ as $$\bbox{ t_i(x) = -\psi_i + \sum_{k=0}^{d} x_{i k} \zeta_k = y_i - \psi_i }$$ where $\zeta$ is a $d$-dimensional vector with $d-K$ components zero, and all other components between $0$ and $1$.

This means that the underlying idea is to project the $n_i$ data points along a random, essentially $K$-dimensional vector; sort the $n_i$ data points according to their signed distance to origin along that vector, $y_i$; and pick $\psi_i$ so that there are at least $p n_i$ points with larger, and at least $p n_i$ points with smaller signed distance to origin.

(In practice, this can be done using quickselect or some other selection algorithm; if the selection algorithm modifies (partially sorts) the data, use tuples $(i, y_i)$, so that you do not need to recompute the $y_i$ to partition the data.)

The reason for the index set and coefficients is to reduce the computational complexity, by using a subspace of $K$ dimensions, instead of the full $d$-dimensional vectors. When $d$ is very large, using a smallish $K$ (say a dozen or so) makes a huge difference: instead of $d$ multiplications and $d-1$ additions per point to compute $y_i$, only $K$ multiplications and $K-1$ additions are needed.

This works as long as the points $n_i$ are not planar with respect to vector $\zeta$. (If they are, then all $y_i$ are the same.)

Consider the following four-level 2D tree:
2D subdivision
At the top level, the vector $\zeta$ is perpendicular to the red line marked 1. At least fraction $p$ of the points are on the right side, and $p$ on the left side. $\psi$ is where that vector and the red line intersect.

At the second level, the vectors $\zeta$ are perpendicular to the reddish-brown lines marked 2; at the third level, perpendicular to the brown lines marked 3; and at the fourth level, perpendicular to the green lines marked 4. In all cases, $\psi$ is where $\zeta$ and the shown line intersect.

This four-level, 15-node tree divides the space into 16 convex polygons. In general, an $N$-level tree has $2^N - 1$ nodes, and divides the space into $2^N$ convex polygons.

The cells are convex, because when subdividing, only the points inside a cell are subdivided into two subcells. In geometric terms, the dividing line (hyperplane) does not cross existing lines (hyperplanes), and each one only splits one cell into two (unless subdivided further).

If you have a 2D point, and you compute the four $t(x)$ tests for each level in the tree to descend into a leaf node, you arrive in the cell that contains your 2D point.

In the above diagram, $K = d = 2$, but as I already said, the splitting vector $\zeta$ is still random, just chosen in a way that reduces the computational complexity when $d \gg K$.

Related Question