MATLAB: Check if 2D points are evenly distributed

image processingMATLABspatial statisticsstatisticsuniformity

Hi,
I have a plane and a few data points (2 to 20) on that plane. Now I want to know how evenly they are distributed.
For example if I have 4 points and one is in every corner it would be perfect. If all 4 points are in the same spot it would be worst case.
I am not really interested in the perfect way to calculate this. I want to do this very often so my main focus is on speed.
I tried to find a solution to this problem but I think I don't know what I am even looking for. I hope someone can give me the right idea.
Best Regards

Best Answer

Your goal is a measure of the how evenly points are distributed in what I assume is a rectangular domain. The problem is, you need to explain your goals here. How do you define goodness?
Suppose we had two points that were perfectly coincident, but all other points in the rectangle were perfectly evenly distributed. Would this be worse than having 3 points that were all moderately close together, and the remainder of the points all again perfectly evenly distributed? Or perhaps you might have two separate pairs that are each close together, but not coincident?
In order to define some measure of even-ness, it seems that you need to explain your goals far more clearly. How do we know that something is bad?
My suggestion, having now heard more of what you want, is to compute the interpoint distance matrix between all points in the set. I've got a tool on the file exchange that does this (IPDM), but it is trivial to compute, especially for such a small set of points.
For example, given a set of points in the unit square, lets see what we can do.
N = 20;
% the first column here is x, the second column y.
P = rand(N,2);
% D will be an NxN matrix.
D = sqrt(bsxfun(@minus,P(:,1),P(:,1).').^2 + bsxfun(@minus,P(:,2),P(:,2).').^2);
% compute the minimum interpoint distance, for
% each point. This is essentially the distance for
% each point to its nearest neighbor.
Dmin = min(D + realmax*eye(N));
Dmin
Dmin =
Columns 1 through 11
0.20294 0.11087 0.29086 0.11087 0.0066339 0.10327 0.13755 0.26768 0.23623 0.244 0.25469
Columns 12 through 20
0.10327 0.0066339 0.10841 0.13755 0.2456 0.13776 0.20294 0.10841 0.11378
As you can see, the minimum distance will in general appear twice in that list, because point 5 is closest to point 13, and point 13 is closest to point 5.
Alternatively, if you have the stats toolbox, you can use the function pdist. It too gives us the set of interpoint distances, although it returns only what is essentially the upper triangle of that distance matrix.
min(pdist(P))
ans =
0.0066339
One idea you might consider is to compare that overall minimum distance to the some measure of a best case possible. For this, think about the idea of circle packing in a rectangular region. While I could probably do some online research on the best case circle packing for N circles in a unit square, a simple measure is given by a ratio of areas.
% area of the rectangle, length*height
Arect = 1*1;
% for N circles, we assign each circle 1/N of that area,
% therefore compute an effective radius, IF we could pack
% N circles perfectly with no overlap.
Crad = sqrt(Arect/N/pi)
Crad =
0.12616
So if we could have N perfectly evenly distributed points (here, N=20 over a unit square) then the minimum interpoint distance could at best be roughly TWICE that radius. (This argument has some flaws in it, which are worst for a small number of points, where we would want to consider the optimum circle packing in some domain. I could spend some time and improve that estimate, were it of interest. This would take only a wee bit of algebra.)
The point is, we can now compare the minimum interpoint distance to that value. A truly even sampling of points will have no pair of points that are significantly closer together than twice Crad. As you can see, the random sampling I generated above is flawed in that respect, since
min(Dmin)/(2*Crad)
ans =
0.026292
is quite small. This is actually to be expected from the "uniform" random sampling, for anyone who has seen the birthday paradox in action. In fact, the closest point from such a random sampling will be much smaller than that predicted for an "even" sampling as I am choosing to define it.
As I said, if it is of interest, I can give you a better value to compare against.
Edit: I decided that I had some time to do this right.
Given a rectangle of length L, height H, with N points to be distributed around the rectangle. We want to find a measure of the best possible circle packing inside that rectangle, where the center of any circle can go as far as a corner of the rectangle, but no circle is allowed to overlap. Assume the circles have radius r.
syms N L H r
rmax = solve((L+2*r)*(H + 2*r) - N*(2*r)^2)
rmax =
(H + L - (H^2 + L^2 - 2*H*L + 4*H*L*N)^(1/2))/(4*(N - 1))
(H + L + (H^2 + L^2 - 2*H*L + 4*H*L*N)^(½))/(4*(N - 1))
I'll let the reader figure out how the expression I solve for arises. It seems clear to me. :)
As it turns out, only the second of those solutions is positive. I'll substitute 20 for N, and do it over the unit square, so L=H=1 here.
rmax = rmax(2);
subs(rmax,{N,L,H},[20 1 1])
ans =
80^(1/2)/76 + 1/38
vpa(ans)
ans =
0.14400357776314682612679861414375
So twice that radius would correspond to a maximally evenly packed set of 20 points over the unit square.
There are still some flaws with the above argument, wherein a truly optimal circle packing for N points would be a somewhat better result, but all we want is a basic measure to compare against. So it should be entirely adequate, although it will be only approximate for very small sets of points, like 2. So in fact, for two points in the unit square, the best packing would allow circles of radius sqrt(2)/2 = 0.7071...
subs(rmax,{N,L,H},[2 1 1])
ans =
8^(1/2)/4 + 1/2
vpa(ans)
ans =
1.2071067811865475244008443621048
It is only a simple measure as I said, and we need not have it be perfect.