Optimise allocation to minimise variance

linear algebralinear programmingnumerical optimizationoptimizationproblem solving

Background

I am trying to allocate customers $C_i$ to financial advisers $P_j$. Each customer has a policy value $x_i$. I'm assuming that the number of customers ($n$) allocated to each adviser is the same, and that the same customer cannot be assigned to multiple advisers. Therefore each partner will have an allocation of policy values like so:

$P_1 = [x_1,x_2,x_3]$,
$P_2 = [x_4,x_5,x_6]$,
$P_3 = [x_7,x_8,x_9]$

etc.

The Problem

After allocating customers, the average policy value is calculated for each adviser. I want to allocate customers to advisers in a way that minimises the variance in average policy values between each of the advisers.

What I have tried

My current algorithm randomly samples $n$ customers without replacement from the dataset and assigns each sample to an adviser. Once the allocations are made, the average policy value for each adviser is calculated and I compute the variance between each of the advisers. I repeat this over a defined number of iterations and return the allocation with the lowest variance. As expected, when dealing with larger volumes of advisers and customers this is not the most efficient process.

My Question

  • Is there an algorithm that converges towards the optimal allocation rather than randomly allocating on every iteration?
  • How can I frame this problem as an objective function to minimise the variance?
  • Any links to implementations (preferably Python) would be appreciated if possible

Best Answer

Here's a formulation via mixed integer programming (MIP). Define the following decision variables:

  • $y_{ij} = 1$ if we allocate customer $C_i$ to adviser $P_j$, $0$ otherwise
  • $z^\max = $ the maximum policy value among all advisers
  • $z^\min = $ the minimum policy value among all advisers

Then the formulation is: $$\begin{align} \text{minimize} \quad & z^\max - z^\min \\ \text{subject to} \quad & \sum_{i\in C} x_iy_{ij} \le z^\max \quad \forall j\in P \\ & \sum_{i\in C} x_iy_{ij} \ge z^\min \quad \forall j\in P \\ & \sum_{j\in P} y_{ij} = 1 \quad \forall i \in C \\ & \sum_{i\in C} y_{ij} = n \quad \forall j\in P \\ & y_{ij} \in \{0,1\} \quad \forall i \in C, j\in P \end{align}$$ The objective function calculates the spread between the min and max policy value. The first constraint says that $z^\max$ has to be greater than or equal to each policy value; since the objective function encourages smaller values of $z^\max$, this means that $z^\max$ will equal the largest policy value. Similarly, the second constraint sets $z^\min$ equal to the smallest policy value. The third constraint says that each customer must be assigned to exactly one adviser. The fourth says that each adviser must have $n$ customers assigned to him/her. The last constraint says the $y$ variables must be binary.

If you're using Python, then I would implement this using the package PuLP, or one of the solver-specific packages such as gurobipy or docplex.

Related Question