Find a float matrix with given row and column sums and predetermined arbitrary values

linear algebramatricessystems of equations

Sorry for the naivety and poor exposition of the question but I've been banging my head against this one for way too long and, as you'll be able to tell, math is not my forte.

My problem is similar to this one, with some additional information.

The simple version of it can be stated like so: given an empty matrix with known row and column equal sums, find a matrix that satisfies the aforementioned sum constraints. Since there is no constraints for the numbers to be integers, a possible solution can be found using 𝑟𝑖𝑐𝑗/𝑇 in position (𝑖,𝑗), where 𝑟𝑖 is the sum for row 𝑖, 𝑐𝑗 the sum for column 𝑗 and 𝑇 the total. For instance:

\begin{array}{cc|c}
a & b & 2 \\
c & d & 4 \\
\hline
5 & 1 & 6 \\
\end{array}

gives us:

\begin{array}{cc|c}
1.667 & 0.333 & 2 \\
3.333 & 0.667 & 4 \\
\hline
5 & 1 & 6 \\
\end{array}

So far, so good. Now my actual problem is more complex as arbitrary values are set to 0 in the starting matrix, for instance:

\begin{array}{ccc|c}
a & 0 & b & 4 \\
0 & c & d & 8 \\
\hline
4 & 2 & 6 & 12\\
\end{array}

This is where I'm stuck. I can get the columns to add up but then the rows stop adding up, or vice-versa, doing e.g. something like: $r_ic_j/(T-x)$ where $x$ is the column sum corresponding to the 0 value in the row. Note that I would ideally keep with that kind of proportional approach as it fits the real-world scenario behind it best. I've looked into solving this using linear programming but to no avail.

Let me know if anything needs clarifying.

Best Answer

$ \def\o{{\tt1}} $The standard iterative proportional fitting (IPF) algorithm in Julia/Matlab looks like

function ipf(A,x,y)
   u,v = ones(size(x)...), ones(size(y)...)
   for k = 1:2000
       B = A .* (x*v') ./ ((A*v)*v')
       A = B .* (u*y') ./ (u*(u'*B))
       if norm(A-B) < 1e-6; break; end
   end
   return A
end

A,x,y = [1 0 1; 0 1 1], [4;8],  [4;2;6];

ipf(A,x,y)
 4.0  0.0  0.00133422
 0.0  2.0  5.99867

where the $(x,y)$ vectors contain the marginal totals,
$(u,v)$ are all-ones vectors the same size as $(x,y)$, and
$A$ is an initial (seed) matrix.

In matrix notation, the algorithm is $$\eqalign{ x &= \pmatrix{ 4 \\ 8},\quad y=\pmatrix{ 4 \\ 2 \\ 6},\qquad u=\pmatrix{\o \\ \o},\quad v=\pmatrix{\o \\ \o \\ \o} \\ A_0 &= \pmatrix{\o&0&\o \\ 0&\o&\o} \\ B_{k+\o} &= \left(\frac{A_k\odot xv^T}{A_k\cdot vv^T}\right) \\ A_{k+\o} &= \left(\frac{uy^T\odot B_{k+\o}}{uu^T\cdot B_{k+\o}}\right) \\ }$$ The symbol $\odot$ denotes a Hadamard product, so any zeros in the seed matrix are propagated through each iteration and into the solution.