Metric Geometry – Partitioning Unit Square with Equal Frequency Rectangles

Tags: discrete-geometry, mg.metric-geometry, oc.optimization-and-control

If I had to partition the unit square $[0,1]\times[0,1]$ into $k^2$ rectangles so that the sum of their diagonal lengths is as small as possible, I would simply choose the $k \times k$ grid of squares. Now suppose we are also given a collection of $nk^2$ points in general position inside the unit square, and we additionally require that each rectangle in the partition contain exactly $n$ points.

  1. Is there an efficient algorithm to determine an optimal/near-optimal partition?

  2. What if we slightly relax the requirement that each rectangle contain an equal number of points (as is necessary when the total number of points is not divisible by $k^2$)?

This question is related to a procedure called data discretization in data mining and statistics; see [1].

[1] James Dougherty, Ron Kohavi, Mehran Sahami, Supervised and Unsupervised Discretization of Continuous Features, Machine Learning Proceedings 1995

Best Answer

Here is an answer inspired by redistricting and the shortest split line algorithm.

For any rectangle containing $mn$ points, consider the $2m-2$ ways of dividing it by a single horizontal or vertical cut ($m-1$ of each kind) into two rectangles that each contain an integer multiple of $n$ points. Among these possible divisions of the rectangle into $R$ and $R'$, we choose the one for which $E[R]+E[R']$ is minimal, where $E$ is an appropriate scoring function.

Now apply this technique to divide the unit square into two rectangles, and apply it recursively to each of the rectangles that result.
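The recursion can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the names `split_recursively`, `Score` and `diagonal` are hypothetical, the scoring function is passed in as a parameter, the number of points is assumed to be a multiple of $n$, and each cut is placed midway between the two neighbouring points (any position between them would do). The `diagonal` function is a stand-in used only to make the example run; it is not the scoring function proposed in this answer.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, x1, y0, y1)
Score = Callable[[Rect, int], float]      # score(rect, m) for a rect holding m*n points


def split_recursively(points: List[Point], rect: Rect, n: int, score: Score) -> List[Rect]:
    """Split rect until every piece holds n points, choosing each cut greedily."""
    m = len(points) // n  # assumes len(points) is a multiple of n
    if m <= 1:
        return [rect]
    x0, x1, y0, y1 = rect
    best = None
    for axis in (0, 1):  # 0: vertical cut (split by x), 1: horizontal cut (split by y)
        ordered = sorted(points, key=lambda p: p[axis])
        for j in range(1, m):  # m - 1 candidate cuts per axis, 2m - 2 in total
            first, second = ordered[: j * n], ordered[j * n:]
            # Assumption: cut midway between the two neighbouring points.
            cut = (first[-1][axis] + second[0][axis]) / 2
            if axis == 0:
                r1, r2 = (x0, cut, y0, y1), (cut, x1, y0, y1)
            else:
                r1, r2 = (x0, x1, y0, cut), (x0, x1, cut, y1)
            cost = score(r1, j) + score(r2, m - j)
            if best is None or cost < best[0]:
                best = (cost, first, r1, second, r2)
    _, first, r1, second, r2 = best
    return split_recursively(first, r1, n, score) + split_recursively(second, r2, n, score)


def diagonal(rect: Rect, m: int) -> float:
    """Placeholder score (ignores m): the diagonal length of the rectangle itself."""
    x0, x1, y0, y1 = rect
    return math.hypot(x1 - x0, y1 - y0)


# Four points in general position, n = 1, so the square is cut into 4 rectangles.
points = [(0.2, 0.3), (0.7, 0.1), (0.3, 0.8), (0.9, 0.6)]
cells = split_recursively(points, (0.0, 1.0, 0.0, 1.0), 1, diagonal)
```

The sketch re-sorts the points at every level for simplicity; a more careful implementation would sort once per axis and reuse the orderings.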

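For the scoring function of a rectangle $I\times J$ with $mn$ points, we can take $E[I\times J]=\sqrt{A^2 m^2/i^2+B^2 i^2}$ where $A=|I|$, $B=|J|$, $i=\max(1,\min(m,\sqrt{mA/B}))$. Here $i$ plays the role of the number of columns in a grid of $m$ cells: cutting the rectangle into $i$ columns and $m/i$ rows gives cells of size $(A/i)\times(Bi/m)$, whose diagonals sum to $m\sqrt{A^2/i^2+B^2i^2/m^2}=\sqrt{A^2m^2/i^2+B^2i^2}$, and this is minimized over real $i$ at $i=\sqrt{mA/B}$. So $E$ gives an optimal sum of diagonal lengths if the distribution of points is uniform and $i$ is integral. Each score is computed in constant time, and the whole algorithm requires $O(k^4)$ steps.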
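As a quick sanity check of the formula, here is a small Python sketch. The function names `score` and `grid_diagonal_sum` are hypothetical, and the code assumes $A, B > 0$. It evaluates $E$ in closed form and compares it with the sum of diagonals of an explicit grid with `cols` columns and `m / cols` rows.

```python
import math


def score(A: float, B: float, m: int) -> float:
    """Closed-form score E for an A-by-B rectangle holding m*n points (A, B > 0)."""
    # Real-valued "number of columns", clamped to the range [1, m].
    i = max(1.0, min(float(m), math.sqrt(m * A / B)))
    return math.sqrt(A * A * m * m / (i * i) + B * B * i * i)


def grid_diagonal_sum(A: float, B: float, m: int, cols: int) -> float:
    """Sum of diagonals when the rectangle is cut into a cols x (m / cols) grid."""
    rows = m / cols
    return m * math.hypot(A / cols, B / rows)


# Unit square with m = 4: the score should agree with the 2 x 2 grid of squares.
print(score(1.0, 1.0, 4), grid_diagonal_sum(1.0, 1.0, 4, 2))
```

For a $3\times 2$ rectangle with $m=12$, the unclamped optimum $i=\sqrt{18}$ is not an integer, so every actual grid has a diagonal sum strictly larger than the score; in that case $E$ is only a lower bound.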