Solved – How to experiment with Lagrange multiplier in PCA optimization

machine-learning, optimization, pca, r

Suppose we want to solve the following optimization problem (it is a PCA problem, as in this post):

$$
\underset{\mathbf w}{\text{maximize}}~~ \mathbf w^\top \mathbf{Cw} \\
\text{s.t.}~~~~~~ \mathbf w^\top \mathbf w=1
$$

As mentioned in the linked post, using a Lagrange multiplier we can change the problem into

$$
\underset{\mathbf w}{\text{minimize}} ~~ \mathbf w^\top \mathbf{Cw}-\lambda(\mathbf w^\top \mathbf w-1)
$$
Differentiating and setting the derivative to zero, we obtain $\mathbf{Cw}-\lambda\mathbf w=\mathbf 0$, which is the eigenvector equation. Problem solved, and $\lambda$ is the largest eigenvalue.
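As a quick numerical check of this claim, here is a small sketch in R with a made-up symmetric matrix (the matrix and variable names are just for illustration): the leading eigenvector satisfies both the eigenvector equation and the constraint, and its objective value is the largest eigenvalue.

# Toy symmetric, positive semi-definite matrix standing in for a covariance matrix
C <- matrix(c(3, 1,
              1, 2), nrow = 2, byrow = TRUE)
r <- eigen(C)
w <- r$vectors[, 1]       # leading eigenvector (eigen() returns unit-length vectors)
lambda <- r$values[1]     # largest eigenvalue

max(abs(C %*% w - lambda * w))   # ~0: Cw = lambda * w
sum(w * w)                       # 1: the constraint w'w = 1 holds
as.numeric(t(w) %*% C %*% w)     # equals lambda, the maximal objective value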


I am trying to work through a numerical example here to understand better how the Lagrange multiplier changes the problem, but I am not sure my validation process is correct.

I experimented with the covariance matrix of the iris data (see the code below). The figure shows the geometric solution to the problem: the black curves are the contours of the objective function, the green curve is the constraint, and the red curve marks the optimal solution that maximizes the objective while satisfying the constraint.

[Figure: contours of the objective (black), the unit-norm constraint (green), and the optimal solution (red)]
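For reference, a figure like this could be reproduced roughly as follows (my own self-contained sketch, not the original plotting code; it uses the same iris covariance as the code further below):

# Contours of w'Cw, the unit circle w'w = 1, and the leading eigenvector direction
C <- cov(iris[, c(1, 3)])                 # covariance of Sepal.Length and Petal.Length
v1 <- eigen(C)$vectors[, 1]

w1 <- seq(-1.5, 1.5, length.out = 200)
w2 <- seq(-1.5, 1.5, length.out = 200)
obj <- outer(w1, w2, function(a, b) C[1, 1] * a^2 + 2 * C[1, 2] * a * b + C[2, 2] * b^2)

plot(0, 0, type = "n", xlim = range(w1), ylim = range(w2), asp = 1,
     xlab = "w1", ylab = "w2")
contour(w1, w2, obj, add = TRUE)                     # objective contours (black)
theta <- seq(0, 2 * pi, length.out = 400)
lines(cos(theta), sin(theta), col = "green")         # constraint w'w = 1
arrows(0, 0, v1[1], v1[2], col = "red", lwd = 2)     # optimal direction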

In my code, I am trying to use optimx to minimize an unconstrained objective function, replacing $\lambda$ with the largest eigenvalue from the eigendecomposition.

# Center the two iris columns (Sepal.Length, Petal.Length) and take the covariance
X = iris[, c(1, 3)]
X$Sepal.Length = X$Sepal.Length - mean(X$Sepal.Length)
X$Petal.Length = X$Petal.Length - mean(X$Petal.Length)
C = cov(X)
r = eigen(C)

# Unconstrained objective with lambda fixed at the largest eigenvalue
# (the sign of the lambda term here is "+", while the formula above uses "-")
obj_fun <- function(x) {
  w = matrix(x, ncol = 1)
  lambda = r$values[1]
  v = t(w) %*% C %*% w + lambda * (t(w) %*% w - 1)
  return(as.numeric(v))
}

# Gradient of the objective above
gr <- function(w) {
  lambda = r$values[1]
  v = 2 * C %*% w + 2 * lambda * w
  return(as.numeric(v))
}

res = optimx::optimx(c(1, 2), obj_fun, gr, method = "BFGS")

[optimx output: parameter estimates p1, p2 and the objective value]

I am getting the following results: the objective function value is the negative of the optimal value from the graphical solution, and the two parameters p1 and p2 are both 0.
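These numbers can be checked directly (a quick sanity check of my own, reusing the objects defined in the code above):

r$values[1]        # largest eigenvalue = optimal value of the constrained problem
obj_fun(c(0, 0))   # objective at the reported minimizer (0, 0); equals -r$values[1]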

My question is: is such a validation method right? That is, can we replace $\lambda$ with the largest eigenvalue and minimize the objective function $\mathbf w^\top \mathbf{Cw}-\lambda(\mathbf w^\top \mathbf w-1)$ to get a solution?

Best Answer

I think I found the answer myself, but I hope some experts can confirm it.

The confusion is this: in the CVX book we convert an optimization problem with constraints into an unconstrained problem and solve the dual problem, but for the PCA optimization we cannot do that.

For example, on page 227, we convert

$$ \underset{x}{\text{minimize}}~~ x^\top x \\ \text{s.t.}~~~~~~ Ax=b $$

into maximizing the dual function $g(v)=-(1/4)v^\top A A^\top v -b^\top v$, which is

$$ \underset{v}{\text{maximize}}~~\left(-(1/4)v^\top A A^\top v -b^\top v \right) $$
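A quick numerical check of this example (a sketch with a randomly generated A and b, not from the book): setting the gradient of $g$ to zero gives $v^\star = -2(AA^\top)^{-1}b$, the primal solution is recovered as $x^\star = -(1/2)A^\top v^\star$, and the dual optimal value matches the primal one.

set.seed(1)
A <- matrix(rnorm(8), nrow = 2)   # 2 x 4 matrix, full row rank with probability 1
b <- rnorm(2)

# Dual function g(v) = -(1/4) v' A A' v - b' v
g <- function(v) as.numeric(-0.25 * t(v) %*% A %*% t(A) %*% v - sum(b * v))

v_star <- -2 * solve(A %*% t(A), b)   # maximizer of g (gradient of g set to zero)
x_star <- -0.5 * t(A) %*% v_star      # primal solution recovered from v*

max(abs(A %*% x_star - b))   # ~0: x* satisfies Ax = b
g(v_star)                    # dual optimal value ...
sum(x_star^2)                # ... equals the primal optimal value x'x (strong duality)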


In the PCA optimization problem, the Lagrangian is (for an equality constraint we can use $-\lambda$)

$$ \mathcal{L}(\mathbf w,\lambda)=\mathbf w^\top \mathbf{Cw}-\lambda(\mathbf w^\top \mathbf w-1) $$

For fixed $\lambda$, we take the partial derivative and set it to $\mathbf 0$:

$$ \frac{\partial \mathcal{L}}{\partial \mathbf w}=\mathbf 0=2\mathbf {Cw}-2\lambda\mathbf w $$

which is the eigenvector equation

$$ \mathbf {Cw}=\lambda\mathbf w $$

As pointed out by Matthew Gunn in the comments, in the PCA problem the objective is not convex (see this discussion). Therefore we should not try to fix $\lambda$ and minimize the resulting unconstrained function (the dual approach) to solve the original problem.
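To see the difference numerically: with $\lambda$ fixed at the largest eigenvalue, the Lagrangian $\mathbf w^\top \mathbf{Cw}-\lambda(\mathbf w^\top \mathbf w-1)$ has the leading eigenvector as a stationary point, but that point is not a minimizer; the function is unbounded below. A small sketch of my own, reusing the iris covariance from the question:

C <- cov(iris[, c(1, 3)])
r <- eigen(C)
lambda <- r$values[1]

# Lagrangian with lambda fixed at the largest eigenvalue
L <- function(w) as.numeric(t(w) %*% C %*% w - lambda * (t(w) %*% w - 1))

L(r$vectors[, 1])        # at the leading eigenvector: equals lambda (stationary value)
L(10 * r$vectors[, 2])   # along the second eigenvector, scaled up: already negative
L(100 * r$vectors[, 2])  # much more negative: the Lagrangian has no finite minimum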
