Minimizing distance of the mean from a point

algorithmscomputational complexitydiscrete optimizationnonlinear optimizationoptimization

This is similar to this question.

Let $X := \{x_1, x_2,…, x_n\} \subset \mathbb{R}$, $d\in \mathbb{R}$. I am interested in the following optimization problem

$$\min_{A \subseteq X} \left|\sum_{x_i\in A}\frac{x_i}{|A|}-d\right|,$$

i.e. finding a subset of $A$ which minimizes the distance of the mean from $d$.

Please note that right now I do not specify number of points in $A$ and it seems to me that it cannot be re-expressed as a linear programming problem.

Is there any efficient way to solve this problem?

Best Answer

As in my answer to the linked question, let binary decision variable $y_i$ indicate whether $x_i \in A$. You want to minimize $$\left|\frac{\sum\limits_{x_i \in X} x_i y_i}{\sum\limits_{x_i \in X} y_i} - d\right|.$$ Let $z$ be a nonnegative variable that represents the absolute value. The problem is to minimize $z$ subject to constraints \begin{align} \frac{\sum\limits_{x_i \in X} x_i y_i}{\sum\limits_{x_i \in X} y_i} - d &\le z \tag1\\ \frac{\sum\limits_{x_i \in X} x_i y_i}{\sum\limits_{x_i \in X} y_i} - d &\ge -z \tag2 \end{align} Equivalently (because $\sum_i y_i > 0$), \begin{align} \sum_{x_i \in X} (x_i - d) y_i &\le z \sum_{x_i \in X} y_i \tag3\label3\\ \sum_{x_i \in X} (x_i - d) y_i &\ge -z \sum_{x_i \in X} y_i \tag4\label4 \end{align} The LHS of constraints \eqref{3} and \eqref{4} is linear. It remains to linearize the RHS. Introduce decision variable $w_i$ to represent the product $z y_i$. Assuming $0 \le z \le M$, the following linear constraints do the job: \begin{align} 0\le w_i &\le M y_i &&\text{for all $i$} \tag5\label{5} \\ -M (1-y_i) \le w_i - z &\le 0 &&\text{for all $i$} \tag6\label{6} \\ \end{align} Constraint \eqref{5} enforces $y_i = 0 \implies w_i = 0$. Constraint \eqref{6} enforces $y_i = 1 \implies w_i = z$. Now rewrite \eqref{3} and \eqref{4} as \begin{align} \sum_{x_i \in X} (x_i - d) y_i &\le \sum_{x_i \in X} w_i \tag7\label7\\ \sum_{x_i \in X} (x_i - d) y_i &\ge -\sum_{x_i \in X} w_i \tag8\label8 \end{align}

In summary, a linear formulation is to minimize $z$ subject to \eqref{5} through \eqref{8}.

Related Question