Finding a subset minimizing distance of the mean from a specific point.

algorithmscomputational complexitycomputational mathematicsdiscrete optimizationoptimization

Let $X := \{x_1, x_2,…, x_n\} \subset \mathbb{R}$, $d\in \mathbb{R}$, $k \in \mathbb{N}$. I am interested in the following optimization problem

$$\min_{A \subseteq X, |A|=k} \left|\sum_{x_i\in A}\frac{x_i}{k}-d\right|.$$

In other words, I want to find such a subset $A \subset X$ having $k$ points for which the distance between the mean and $d$ is minimal. Numbers $k$ and $d$ are given.

Are there any (efficient) algorithms for finding $A$?

Best Answer

You can solve the problem via integer linear programming as follows. For $x_i \in X$, let binary decision variable $y_i$ indicate whether $x_i \in A$. Let $z$ be a nonnegative variable that represents the absolute value. The problem is to minimize $z$ subject to linear constraints \begin{align} \sum_{x_i \in X} y_i &= k \tag1\\ \sum_{x_i \in X} \frac{x_i}{k} y_i - d &\le z \tag2\\ \sum_{x_i \in X} \frac{x_i}{k} y_i - d &\ge -z \tag3 \end{align} Constraint $(1)$ enforces $|A|=k$. Constraints $(2)$ and $(3)$ linearize the absolute value.