My data is categorized by two different parameters (say F having n groups and S having m groups) and I want to get a relationship between the two. For example $F = ${$f_1 , f_2 , f_3$} = {$ 10,10,5$} and $S = ${$s_1, s_2 , s_3, s_4$} = {$8,8,8,4$}. (Read $f_1$ has 10 elements in it, $s_1$ has 8 elements in it.)
The problem is to get a relationship like $f_1 = p_{11}.s_1 + p_{12}.s_2 + p_{13}.s_3$ for all groups in F. The exact relationship is not possible (due to constraint 1 below), so we have to find the most approximate solution. The constraints are:
-
Sum of all the numbers in F (10+10+5) < sum of all the numbers in S
(8+8+8+4). Note that, this is a default property of data and we don't have to apply this. -
Total contribution of any $s_i$ can not be more than 100%. That is if $s_1$ contribute 40% in all 3 equations its total contribution will be 120% which should not happen.
-
Contribution can not be negative.
I framed this as an optimization problem as
Given:
$F = (f_1, f_2, … f_n)$
$S = (s_1, s_2, … s_m)$
Each $f_i$ can be represented as: $f_i = p_{i1} s_1 + p_{i2}s_2 + … + p_{im}s_m + \epsilon_i$
That is: $P = ${$p_{ij}$} is a $n * m$ matrix of row vectors $P_1, P_2, …, P_n$ then $f_i = P_i.S+ \epsilon_i$
where . is the dot product (element by element multiplication).
I have to minimize $\sum_{i} \epsilon_i^2$, with following constraints:
-
$p_{11} + p_{12}+ … + p_{1m}<=1$ That is some of each row of
matrix P is less than 1. -
$p_{ij} >=0 $ for all i,j
I am trying to use constrOptim
in R to solve this problem, however I am getting stuck with following:
-
Framing the constraints
-
How to create a code in which I just pass F and S and it gives me the matrix P and error vector as output.
Below is my code:
F = c(10,10,5)
S = c(8,8,8,4)
n = length(F)
m = length(S)
P_init = matrix(rep(0,m*n),nrow=n, ncol=m)
loss_fun <- function(P){
T = S*P #proportion matrix * S
F2 = rowSums(T) # Predicted values of F
E = F - F2 # Error
return(sum(E*E))
}
x = loss_fun(P_init)
z = constrOptim(P_init,loss_fun,NULL,ui=c(rep(-1,n)), ci=rep(-1,m))
Since the constraints are defined by: ui %*% theta - ci >= 0
, I believe in my case ui = {-1,-1,-1}
, theta = P_init
( a 3 x 4 matrix) and ci = {-1,-1,-1,-1}
. However I get the following if I run the code. Error in ui %\*% theta : non-conformable arguments
. Is theta = P_init is not true? Or there is some other error. As this is just one approach, I can also explore other approaches like optim
or any other function.
Best Answer
Using answer given by @crayfish and a detailed answer on how to put construct constraints here, I was able to come up with a solution.
Since creation of constraints may not be very easy to understand from the code, here is a visualization of constraints when n = 3 and m = 4