I wrote the post you linked to so I'll take a stab at this.
Short answer: In terms of $\phi$, the dual problem is to maximize $-\phi^*(0,z)$ with respect to $z$. But
\begin{align*}
-\phi^*(0,z) &= - \sup_{x,y} \, \langle 0,x \rangle + \langle z, y \rangle - \phi(x,y) \\
&= \inf_x \, \inf_y \, \phi(x,y) - \langle z, y \rangle.
\end{align*}
The "Lagrangian" is the function
\begin{equation*}
L(x,z) = \inf_y \, \phi(x,y) - \langle z, y \rangle.
\end{equation*}
The "dual function" $G(z) = -\phi^*(0,z)$ is obtained by minimizing $L(x,z)$ with respect to $x$, as we are accustomed to.
This expression for the Lagrangian appears on p. 54 of Ekeland and Temam.
That's what the Lagrangian looks like in the general setting. The significance of the Lagrangian is that for convex problems, under mild assumptions, $x$ and $z$ are primal and dual optimal if and only if $(x,z)$ is a saddle point of the Lagrangian. We sometimes think of convex optimization problems as coming in pairs (primal and dual problem), but actually they come in triples if we remember to include the saddle point problem. If we only knew about the primal problem and the dual problem, but not the saddle point problem, then we'd be missing $1/3$ of the picture.
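To make the triple explicit: when $\phi(x,\cdot)$ is a closed convex function for each $x$ (so that the biconjugate recovers it), the three problems can be written side by side in terms of this Lagrangian:
\begin{align*}
\text{primal:} &\quad \inf_x \, \sup_z \, L(x,z) = \inf_x \, \phi(x,0), \\
\text{dual:} &\quad \sup_z \, \inf_x \, L(x,z) = \sup_z \, -\phi^*(0,z), \\
\text{saddle point:} &\quad L(\bar x, z) \leq L(\bar x, \bar z) \leq L(x, \bar z) \quad \text{for all } x, z.
\end{align*}
(The first identity uses $\sup_z \, L(x,z) = \sup_z \, \langle z, 0\rangle - (\phi(x,\cdot))^*(z) = (\phi(x,\cdot))^{**}(0) = \phi(x,0)$.)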
Longer answer:
The function $\phi$ is the minimal notation we need to describe the perturbation viewpoint. The Lagrangian only appears later, when we write the dual problem more explicitly.
The perturbation viewpoint provides one possible explanation (my favorite explanation) of where the Lagrangian comes from, where the dual problem comes from, and why we expect strong duality to hold for convex problems. In the thread you linked to, note that the Lagrangian appears naturally at the very end. (The appearance of the Lagrangian should have been highlighted more explicitly.)
I'll repeat myself a bit here but I will add something new -- a derivation of the Lagrangian when we have equality constraints in addition to inequality constraints. I've been meaning to write this down anyway. I'll work in finite-dimensional spaces $\mathbb R^n$ and $\mathbb R^m$ rather than abstract real vector spaces, and I'll use slightly different notation.
The primal problem is to minimize $\phi(x,0)$ with respect to $x$ (where $\phi:\mathbb R^n \times \mathbb R^m \to \mathbb R \cup \{\infty\}$). The perturbed problems are to minimize $\phi(x,y)$ with respect to $x$. We should also introduce the "value function" $v(y) = \inf_x \, \phi(x,y)$. ($v$ is called $h$ in the other thread.) So the primal problem is to evaluate $v(0)$. Now, using some highly intuitive facts about the convex conjugate, we note that $v(0) \geq v^{**}(0)$, and that when $\phi$ is convex we typically have equality. Thus, the dual problem is to evaluate $v^{**}(0)$. This is a very clear way of understanding what the dual problem is and why we care about it. And we can express this dual problem in terms of $\phi$: the dual problem is to maximize $-\phi^*(0,z)$ with respect to $z$. (Here $\phi^*$ is the convex conjugate of $\phi$. See the other thread for details.)
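As a concrete illustration (a toy example of my own, not from the original thread), one can approximate $v$, $v^*$, and $v^{**}$ on a grid and observe $v(0) = v^{**}(0)$ for a convex problem. Here the problem is to minimize $x^2$ subject to $x \geq 1$, perturbed to $x \geq 1 - y$, so $v(y) = (1-y)^2$ for $y \leq 1$ and $v(y) = 0$ otherwise:

```python
import numpy as np

# Toy example: minimize x^2 subject to x >= 1, perturbed to x >= 1 - y.
# Value function: v(y) = (1 - y)^2 for y <= 1, and 0 for y >= 1.
ys = np.linspace(-3.0, 3.0, 601)
v = np.where(ys <= 1.0, (1.0 - ys) ** 2, 0.0)

# Grid approximations of the conjugate v*(z) = sup_y (z*y - v(y))
# and the biconjugate v**(w) = sup_z (w*z - v*(z)).  The truncation of
# the grid makes these inexact near the boundary, but y = 0 is fine.
zs = np.linspace(-3.0, 3.0, 601)
v_star = np.max(zs[:, None] * ys[None, :] - v[None, :], axis=1)
v_star_star = np.max(ys[:, None] * zs[None, :] - v_star[None, :], axis=1)

i0 = np.argmin(np.abs(ys))      # index of y = 0
print(v[i0], v_star_star[i0])   # both close to 1: v(0) = v**(0)
```

Since $v$ is convex and closed here, $v^{**}$ agrees with $v$ everywhere, not just at $0$.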
We haven't yet mentioned the Lagrangian. But the Lagrangian will appear when we work these ideas out in detail for the optimization problem
\begin{align*}
\text{minimize} & \quad f_0(x) \\
\text{subject to} & \quad f(x) \leq 0,\\
& \quad h(x) = 0,
\end{align*}
where $f:\mathbb R^n \to \mathbb R^m$ and $h:\mathbb R^n \to \mathbb R^p$.
The inequality $f(x) \leq 0$ should be interpreted component-wise.
We can perturb this problem as follows:
\begin{align*}
\text{minimize} & \quad f_0(x) \\
\text{subject to} & \quad f(x) + y_1 \leq 0,\\
& \quad h(x) +y_2= 0.
\end{align*}
This perturbed problem can be expressed as minimizing $\phi(x,y_1,y_2)$ with respect to $x$, where
\begin{align*}
\phi(x,y_1,y_2) =
\begin{cases}
f_0(x) & \quad \text{if } f(x) + y_1 \leq 0 \text{ and } h(x) + y_2 = 0, \\
\infty & \quad \text{otherwise}.
\end{cases}
\end{align*}
To find the dual problem, we need to evaluate $-\phi^*(0,z_1,z_2)$, which is a relatively straightforward calculation.
\begin{align*}
-\phi^*(0,z_1,z_2) &= - \sup_{\substack{x, \, y_1, \, y_2 \, : \\ f(x) + y_1 \leq 0 \\ h(x) + y_2 = 0}} \langle 0,x\rangle + \langle z_1,y_1 \rangle + \langle z_2,y_2 \rangle - f_0(x) \\
&= -\sup_{x, \, q \geq 0} \, \langle z_1,-f(x) - q \rangle + \langle z_2,-h(x)\rangle - f_0(x) \\
&= \begin{cases}
\inf_x \, f_0(x) + \langle z_1,f(x) \rangle + \langle z_2, h(x) \rangle
& \quad \text{if } z_1 \geq 0 \\
-\infty & \quad \text{otherwise}.
\end{cases}
\end{align*}
In the final step, the Lagrangian
\begin{equation}
L(x,z_1,z_2) = f_0(x) + \langle z_1,f(x) \rangle + \langle z_2,h(x) \rangle
\end{equation}
has appeared, as has the usual description of the dual function (where you minimize the Lagrangian with respect to $x$). Up until this point, we didn't know what the Lagrangian was or why it would be relevant.
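To see these formulas in action, here is a small numerical sketch on a toy problem of my own (one inequality, one equality), where the strictly convex Lagrangian has a closed-form minimizer, so the dual function can be evaluated exactly:

```python
import numpy as np

# Toy problem (my own example, not from the post):
#   minimize    x1^2 + x2^2
#   subject to  1 - x1 - x2 <= 0   (one inequality constraint f)
#               x1 - x2      = 0   (one equality constraint h)
# Primal optimum: x1 = x2 = 1/2, optimal value p* = 1/2.

def dual(z1, z2):
    """g(z) = inf_x L(x, z), using the closed-form minimizer of the
    strictly convex Lagrangian: x1 = (z1 - z2)/2, x2 = (z1 + z2)/2."""
    x1, x2 = (z1 - z2) / 2, (z1 + z2) / 2
    return x1**2 + x2**2 + z1 * (1 - x1 - x2) + z2 * (x1 - x2)

p_star = 0.5
rng = np.random.default_rng(0)
z1 = rng.uniform(0, 3, 1000)                 # z1 >= 0 as required
z2 = rng.uniform(-3, 3, 1000)
g = dual(z1, z2)
print(bool(g.max() <= p_star + 1e-9))        # weak duality: g(z) <= p*
print(abs(dual(1.0, 0.0) - p_star) < 1e-12)  # strong duality at z = (1, 0)
```

Here $g(z_1,z_2) = z_1 - z_1^2/2 - z_2^2/2$, which is maximized at $(1,0)$ with value $1/2$, matching the primal optimum.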
I'll assume that $H$ is symmetric positive definite (as stated in the question).
The Lagrangian is
\begin{align}
L(x,\lambda) &= \frac12 x^T Hx + g^T x + \lambda^T (A^Tx - b) \\
&= \frac12 x^T H x+ x^T (A \lambda + g) - \lambda^T b.
\end{align}
The dual function is
\begin{align}
g(\lambda) &= \inf_x \, L(x,\lambda) \\
&= -\frac12 (A \lambda + g)^T H^{-1} (A \lambda + g) - \lambda^T b.
\end{align}
The dual problem is to maximize $g(\lambda)$ (with no constraints on $\lambda$). Is this equivalent to the dual problem given in the question?
From the constraint
$Hx + g - A \lambda = 0$, it follows that $x = H^{-1}(A \lambda - g)$.
Plugging into the objective function, the stated dual problem reduces to
\begin{align}
\text{maximize} & \quad -\frac12 (A \lambda - g)^T H^{-1}(A \lambda - g) + b^T \lambda \\
\text{subject to} & \quad \lambda \geq 0.
\end{align}
This is similar to the dual problem I derived, except that we have $-g$ in place of $g$ and we have an extra constraint $\lambda \geq 0$.
It seems like the dual problem given in the question is wrong.
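As a sanity check on the closed-form dual function derived above, here is a small numerical sketch with random data of my own choosing (`g_vec` stands for the vector $g$, to avoid clashing with the dual function's name):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)        # symmetric positive definite
g_vec = rng.standard_normal(n)
A = rng.standard_normal((n, m))
b = rng.standard_normal(m)
lam = rng.standard_normal(m)

def L(x, lam):
    """Lagrangian L(x, lambda) = (1/2) x^T H x + g^T x + lambda^T (A^T x - b)."""
    return 0.5 * x @ H @ x + g_vec @ x + lam @ (A.T @ x - b)

# Unconstrained minimizer of L(., lam): x* = -H^{-1} (A lam + g)
x_star = -np.linalg.solve(H, A @ lam + g_vec)
# Closed-form dual function derived above
g_closed = -0.5 * (A @ lam + g_vec) @ np.linalg.solve(H, A @ lam + g_vec) - lam @ b
print(np.isclose(L(x_star, lam), g_closed))   # True
```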
Best Answer
First, setting the stage: typically $f$ is assumed to be convex, $l_i$ is assumed to be convex, and $h_j$ is assumed to be affine. If you do not assume this, then (P2) will be hard to solve but will still exist nevertheless. Now to the specific points,
Yes, it does. $L(x, \alpha, \lambda)$ is convex in $x$ and concave in $(\alpha, \lambda)$. That $L(\cdot, \alpha, \lambda)$ is convex follows directly from the convexity of $f$ and each $l_i$ and the affineness of each $h_j$. Let's make this clear: fix $a \in \mathbb{R}^m$ with $a \geq 0$ and $b \in \mathbb{R}^p$, define $r(x) = L(x, a, b)$, and let $x, y \in \mathbb{R}^d$ and $t \in [0,1]$. Then \begin{align} r(tx + (1-t)y) &= f(tx + (1-t)y) + \sum_{i=1}^{m} a_i l_i (tx + (1-t)y) + \sum_{j=1}^{p} b_j h_j (tx + (1-t)y). \end{align} By the convexity of $f$, we have $$f(tx + (1-t)y) \leq t f(x) + (1-t) f(y).$$ Similarly, by the convexity of $l_i$ we have $$ l_i (tx + (1-t)y) \leq t l_i (x) + (1-t) l_i (y),$$ and since $a_i$ is assumed nonnegative, multiplying by it preserves this inequality. Finally, since $h_j$ is affine, we have equality: $$b_j h_j (tx + (1-t)y) = t b_j h_j (x) + (1-t) b_j h_j (y).$$ Combining the last three displays, we can easily see that $$ r(tx + (1-t)y) \leq t r(x) + (1-t) r(y). $$ Hence, $L(\cdot, a, b)$ is convex. Similarly, if you fix $x$, then $L(x, \xi)$ is affine in $\xi = (\alpha, \lambda)$, and is therefore both convex and concave (as all affine functions are).
The definition of the dual function $g(\alpha, \lambda)$ minimizes $L(x, \alpha, \lambda)$ over $x$, and this is convex minimization since, as shown, $L$ is convex in $x$. This minimization is therefore a tractable convex problem (though the infimum may equal $-\infty$ for some values of $(\alpha, \lambda)$).
Optimizing (P1) is a different problem from the problems (P2) and (P3). These problems are generally not equivalent, but under certain conditions (such as Slater's condition) they are. In that case you can either solve the primal problem directly (optimize (P1) using some form of projected gradient descent, for example) or solve the dual problem by solving (P3), which requires solving (P2) in the process, possibly multiple times. The dual optimization problem is $$ \max_{\alpha, \lambda} \min_{x} L(x, \alpha, \lambda). $$ If you assume access to $g(\alpha, \lambda) = \min_{x} L(x, \alpha, \lambda)$, then this is a concave maximization problem: $g(\alpha, \lambda)$ is concave regardless of the convexity of $L(\cdot, \alpha, \lambda)$, because it is a pointwise minimum of functions that are affine in $(\alpha, \lambda)$. Of course, if $L(\cdot, \alpha, \lambda)$ is not convex, then it will be hard to obtain the value of $g(\alpha, \lambda)$ for any given $(\alpha, \lambda)$, and the dual problem may be hard.
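This concavity of the dual function, independent of any convexity in $x$, can be illustrated numerically (a toy example of my own): even with a nonconvex objective $f(x) = \sin(3x)$ and constraint $x^2 - 1 \leq 0$, the dual function is concave, because it is a pointwise minimum of functions affine in the multiplier. Here the minimum over $x$ is approximated on a grid, so the approximate dual is a minimum of finitely many affine functions and is exactly piecewise-linear concave:

```python
import numpy as np

# Nonconvex objective f(x) = sin(3x), constraint x^2 - 1 <= 0.
# Dual function g(lmb) = min_x [ sin(3x) + lmb * (x^2 - 1) ],
# approximated by a minimum over a fixed grid of x values.
xs = np.linspace(-2.0, 2.0, 2001)

def g(lmb):
    return np.min(np.sin(3 * xs) + lmb * (xs**2 - 1))

# Midpoint concavity check: g((a+c)/2) >= (g(a) + g(c)) / 2.
lmbs = np.linspace(0.0, 4.0, 50)
ok = all(g((a + c) / 2) >= (g(a) + g(c)) / 2 - 1e-9
         for a in lmbs for c in lmbs)
print(ok)   # True: g is concave even though f is not convex
```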
No.
I assume you meant $h_j$ linear and $l_i$ nonlinear but convex, because $h_j$ must be affine in this formulation. In that case, you will try to solve (P3), which will require solving (P2), possibly many times for different values of $(\alpha, \lambda)$. Since $f$ is non-convex, solving (P2) cannot generally be done, and you must exploit the problem structure in some way. If you can find a closed-form solution of (P2) in terms of $(\alpha, \lambda)$, then you would use it to solve (P3).
I hope this answers your questions.