How we can find a straight line so that at least p percent of the points are exactly on that line

algorithmsgeometry

For N points on the plane, how we can determine that there is a straight line such that at least $p(20<=p<=100)$ percent of the points are exactly on that line?

For example: With $n=5,p=55$ and coordinates of $5$ points:

  • 0 0
  • 10 10
  • 10 0
  • 0 10
  • 3 3

The answer is possible (it means existing a straight line satisfied problem, that's line: $y=x$,because we have $(0,0),(10,10),(3,3)$ are in this line and these points occupies $60\text{%}>55\text{%}$ of total points)

This is my try: I try build all lines that through two points of $n$ points and compute each case, but it's too waste time when $n$ is large!

Best Answer

We can solve this in expected $O(n/p^2)$ time with a randomized algorithm. Pick two distinct points at random, and see if the line through them passes through at least $p$ fraction of the points. If not, repeat. The check takes $O(n)$ time to visit all the points, and the probability that both of the chosen points are on the line is $p^2$, so we will iterate the procedure an expected $1/p^2$ times.

This algorithm only returns true positives, but if you don't know a priori that such a line exists, you can cut it off after some number of iterations to get high confidence that no such line exists.


The problem of finding a short algorithm to the nonexistence of a "heavy line" among the points can be connected to some graph theory as follows. Suppose the algorithm decides to visit the pairs $(p_i,p_j)\in E$ for some set $E$, and check that none of the resulting lines are heavy (that is, they do not hit $p$ fraction of all the points). This takes time $O(n|E|)$ to check by the obvious method. This forms a graph $G=(V,E)$ with the points as vertices and $E$ as the edges.

But now consider the complement graph $G^c$ which uses all the edges not in $E$. This is the graph of all pairs that were skipped in the algorithm. If the algorithm did not detect a heavy line even though one exists, then since every pair of points in the line would witness it they must all be in $G^c$. Thus $G^c$ contains a clique of size $pn$. So it suffices to ensure that $G^c$ has no cliques of size $pn$ to ensure correctness.

By Turán's theorem, a graph with no cliques of size $r+1$ (where $r=pn-1$) has at most $(1-1/r)n^2/2$ edges, hence $G$ has at least $n^2/2r=n/2p+O(1)$ edges. The theorem also gives a concrete graph, the Turán graph, that achieves this bound. So we can find a heavy line or prove nonexistence deterministically in $O(n^2/p)$ time.

Related Question