You already know a lot. Two observations.
Take linear regression. Minimizing the squared error turns out to be equivalent to maximizing the likelihood (under Gaussian errors). Loosely speaking, minimizing the squared error is an intuitive method, while maximizing the likelihood is a more formal approach that permits proofs using properties of, for example, the normal distribution. The two can coincide.
Second, the choice between minimizing and maximizing is, as far as I know, often arbitrary: minimizing the negative of a function is the same as maximizing the function itself. Many optimization routines are written in minimization mode; that this became the convention is something of a historical accident, kept for parsimony and readability.
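To make the second observation concrete, here is a minimal sketch (the parabola $f$ is a hypothetical example, not from the question) showing that the argument maximizing $f$ is exactly the argument minimizing $-f$:

```python
import numpy as np

# Hypothetical objective: a downward parabola with its maximum at x = 3.
def f(x):
    return -(x - 3.0) ** 2 + 4.0

xs = np.linspace(-10, 10, 2001)  # grid with step 0.01, includes x = 3 exactly

# Maximizing f ...
x_max = xs[np.argmax(f(xs))]
# ... gives the same argument as minimizing -f.
x_min_of_neg = xs[np.argmin(-f(xs))]

print(x_max, x_min_of_neg)  # both 3.0
```

This is why a library that only offers a `minimize` routine loses no generality: you simply hand it the negated objective.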
I think I worked out the answer myself, but I would appreciate it if an expert could confirm.
My confusion was this: in the CVX book we convert an optimization problem with constraints into one without constraints and solve the dual problem, but for the PCA optimization problem we cannot.
For example, on page 227, we convert
$$
\underset{x}{\text{minimize}}~~ x^\top x \\
\text{s.t.}~~~~~~ Ax=b
$$
into maximizing the dual function $g(v)=-(1/4)v^\top A A^\top v -b^\top v$, i.e.,
$$
\underset{v}{\text{maximize}}~~\left(-(1/4)v^\top A A^\top v -b^\top v \right)
$$
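The primal/dual pair above can be checked numerically. A sketch (the random `A` and `b` are illustrative, assuming `A` has full row rank): the minimum-norm primal solution is $x^\star = A^\top (AA^\top)^{-1} b$, and setting the gradient of $g$ to zero gives $v^\star = -2(AA^\top)^{-1} b$; for this convex problem the two optimal values match.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))  # wide matrix; full row rank with prob. 1
b = rng.standard_normal(2)

# Primal: minimize x^T x subject to Ax = b (minimum-norm solution).
x_star = A.T @ np.linalg.solve(A @ A.T, b)
primal_opt = x_star @ x_star

# Dual: maximize g(v) = -(1/4) v^T A A^T v - b^T v.
# Gradient zero: -(1/2) A A^T v - b = 0, so v* = -2 (A A^T)^{-1} b.
v_star = -2.0 * np.linalg.solve(A @ A.T, b)
dual_opt = -0.25 * v_star @ (A @ A.T) @ v_star - b @ v_star

print(primal_opt, dual_opt)  # equal: strong duality holds for this problem
```

The agreement of the two values is exactly the strong duality that the convex setting guarantees, and that the PCA problem below lacks.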
The PCA optimization problem (maximize $\mathbf w^\top \mathbf{Cw}$ subject to $\mathbf w^\top \mathbf w=1$) has the Lagrangian (for an equality constraint the multiplier can take either sign, so we may write $-\lambda$)
$$
\mathcal{L}(\mathbf w,\lambda)=\mathbf w^\top \mathbf{Cw}-\lambda(\mathbf w^\top \mathbf w-1)
$$
For fixed $\lambda$, we take the partial derivative with respect to $\mathbf w$ and set it to $\mathbf 0$:
$$
\frac{\partial \mathcal{L}}{\partial \mathbf w}=2\mathbf {Cw}-2\lambda\mathbf w=\mathbf 0
$$
which is the eigenvector equation
$$
\mathbf {Cw}=\lambda\mathbf w
$$
As Matthew Gunn pointed out in the comments, in the PCA problem the objective is not convex (see this discussion). Strong duality is therefore not guaranteed, and we should not try to solve the original problem by optimizing the dual function.
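Even though duality does not apply, the stationarity condition $\mathbf{Cw}=\lambda\mathbf w$ still identifies the solution: every eigenvector of $\mathbf C$ is a critical point, and the maximizer of $\mathbf w^\top \mathbf{Cw}$ over unit vectors is the eigenvector with the largest eigenvalue. A small numerical sketch (the data matrix `X` is an invented example):

```python
import numpy as np

rng = np.random.default_rng(1)
# Invented data with different variances per coordinate.
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.5])
C = np.cov(X, rowvar=False)

# Cw = lambda*w: eigenvectors are the critical points of the Lagrangian.
eigvals, eigvecs = np.linalg.eigh(C)
w = eigvecs[:, -1]      # eigh sorts eigenvalues ascending; take the top one
top = w @ C @ w         # equals eigvals[-1], the largest eigenvalue

# No other unit vector attains a larger value of w^T C w.
for _ in range(1000):
    u = rng.standard_normal(3)
    u /= np.linalg.norm(u)
    assert u @ C @ u <= top + 1e-9
```

So PCA is solved by an eigendecomposition rather than by a dual problem.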
Best Answer
Consider a function $f$, which for the purpose of this discussion I am going to take to be positive. The aim of optimization is to find the value(s) of the argument(s) at which the function achieves its 'best' value (if you're maximizing, to find the argmax).
Here's an example showing an $f$ we want to maximize, along with $\log f$, $-f$ and $1/f$. Now, any time you move from an $x$ with some value of $f$ to another $x$ with a higher value of $f$, any strictly monotonic increasing transformation of $f$ (e.g. $\log$) will also increase, and any monotonic decreasing transformation of $f$ will decrease (e.g. $-f$, $1/f$). That is, if $f(x_2)>f(x_1)$, then $\log f(x_2)> \log f(x_1)$, and $-f(x_2)< -f(x_1)$, and so on (as long as they're all finite).
As long as the transformed values are also finite, if $f$ achieves its optimum at $x^*$, any monotonic increasing transformation of $f$ will achieve its optimum there too. So $f$ and $\log f$ will have maxima at the same $x$-value, and $-f$ and $1/f$ will have minima there:
The value of $x$ at which $f$ attains its maximum is marked ($x=5$), and the corresponding position is marked for $\log f$, $-f$ and $1/f$; the same value of $x$ ($x=5$) is the argmax or argmin as appropriate.
(In this example, $f$ happened to be a logistic density, with mean $5$.)
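This invariance can be checked directly. A sketch using the answer's logistic density with mean $5$ (scale $1$ assumed, since the answer doesn't state it):

```python
import numpy as np

# Logistic density with location 5, scale 1 (scale is an assumption).
def f(x):
    z = np.exp(-(x - 5.0))
    return z / (1.0 + z) ** 2

xs = np.linspace(0, 10, 1001)  # grid with step 0.01, includes x = 5 exactly
vals = f(xs)

argmax_f    = xs[np.argmax(vals)]           # maximize f
argmax_logf = xs[np.argmax(np.log(vals))]   # maximize log f
argmin_negf = xs[np.argmin(-vals)]          # minimize -f
argmin_invf = xs[np.argmin(1.0 / vals)]     # minimize 1/f

print(argmax_f, argmax_logf, argmin_negf, argmin_invf)  # all 5.0
```

All four locations coincide, which is exactly why maximizing a likelihood, maximizing a log-likelihood, and minimizing a negative log-likelihood are interchangeable.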