Suppose your underdetermined system looks like this:
$$Ax=y$$
The least squares solution can be determined using the Moore-Penrose pseudoinverse:
$$x=A^T(AA^T)^{-1}y$$
where it is assumed that the inverse of $AA^T$ exists. Royi's answer discusses the case when $AA^T$ is singular.
In any case, you do not need an initial guess. The solution you'll get is the solution with the smallest norm of all possible solutions.
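As a quick sanity check, here is a minimal NumPy sketch (the matrix and vector are illustrative random data) comparing the explicit formula $x = A^T(AA^T)^{-1}y$ against `np.linalg.pinv`, which also covers the rank-deficient case:

```python
import numpy as np

# Hypothetical underdetermined system: 2 equations, 4 unknowns (data is illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))
y = rng.standard_normal(2)

# Explicit minimum-norm formula x = A^T (A A^T)^{-1} y,
# assuming A A^T is invertible (A has full row rank).
x_explicit = A.T @ np.linalg.solve(A @ A.T, y)

# The pseudoinverse route; np.linalg.pinv also handles singular A A^T.
x_pinv = np.linalg.pinv(A) @ y

print(np.allclose(x_explicit, x_pinv))  # the two routes agree for full row rank
print(np.allclose(A @ x_explicit, y))   # x_explicit solves the system exactly
```

Any other solution differs from `x_explicit` by a vector in the null space of $A$, which only increases the norm.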
Let $f(x) = \lVert Ax - b \rVert^2 = f_1(x) - 2 f_2(x) + f_3(x)$, where
$$
f_1(x) = b\cdot b ~,~~ f_2(x) = (A^T b)\cdot x ~,~~ f_3(x) = x\cdot(A^T A x)
$$
We can write these in terms of components (with summation over repeated indices implied) as
$$
f_1(x) = b_i b_i ~,~~ f_2(x) = A_{ij} b_i x_j ~,~~ f_3(x) = A_{ij} A_{ik} x_j x_k \,.
$$
Gradient
The gradient is defined as
$$
[\nabla f(x)]_p = \frac{\partial f}{\partial x_p}
$$
Therefore
$$
\begin{align}
[\nabla f_1(x)]_p = \frac{\partial f_1}{\partial x_p} &= 0 \\
[\nabla f_2(x)]_p = \frac{\partial f_2}{\partial x_p} &= A_{ij} b_i \frac{\partial x_j}{\partial x_p} = A_{ij} b_i \delta_{jp} = A_{ip} b_i \\
[\nabla f_3(x)]_p = \frac{\partial f_3}{\partial x_p} &= A_{ij} A_{ik} \left[\frac{\partial x_j}{\partial x_p} x_k + x_j \frac{\partial x_k}{\partial x_p}\right]= A_{ij} A_{ik} \left[ \delta_{jp} x_k + x_j \delta_{kp}\right] = A_{ip} A_{ik} x_k + A_{ij} A_{ip} x_j \\
&= 2A_{ip} A_{ij} x_j
\end{align}
$$
Going back to matrix notation,
$$
\nabla f_1(x) = 0 ~,~~ \nabla f_2(x) = A^T b ~,~~ \nabla f_3(x) = 2 A^TAx \,.
$$
Therefore
$$
\nabla f(x) = -2 A^T b + 2 A^T A x \,.
$$
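One way to verify this gradient is a central finite-difference check; the sketch below uses illustrative random sizes for $A$, $b$, and $x$:

```python
import numpy as np

# Finite-difference check of grad f(x) = -2 A^T b + 2 A^T A x
# for f(x) = ||Ax - b||^2 (dimensions are illustrative).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

f = lambda v: np.sum((A @ v - b) ** 2)
grad_analytic = -2 * A.T @ b + 2 * A.T @ A @ x

eps = 1e-6
grad_numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)  # central difference per coordinate
    for e in np.eye(3)
])
print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))
```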
Hessian
The Hessian is defined as
$$
[\nabla\nabla f(x)]_{pq} = \frac{\partial^2 f}{\partial x_p \partial x_q} = \frac{\partial}{\partial x_q}\left(\frac{\partial f}{\partial x_p}\right)
$$
Repeating the process that we used for the gradient,
$$
\begin{align}
[\nabla\nabla f_2(x)]_{pq} &= \frac{\partial}{\partial x_q} (A_{ip} b_i) = 0 \\
[\nabla\nabla f_3(x)]_{pq} &= \frac{\partial}{\partial x_q} (
2A_{ip} A_{ij} x_j) = 2A_{ip} A_{ij} \delta_{jq} = 2A_{ip} A_{iq}
\end{align}
$$
Therefore
$$
\nabla\nabla f(x) = 2 A^T A \,.
$$
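The constant Hessian $2A^TA$ can likewise be confirmed numerically by finite-differencing the analytic gradient (again with illustrative random data):

```python
import numpy as np

# Check that the Hessian of f(x) = ||Ax - b||^2 is the constant matrix 2 A^T A
# by finite-differencing the gradient (sizes are illustrative).
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

grad = lambda v: -2 * A.T @ b + 2 * A.T @ A @ v
H_analytic = 2 * A.T @ A

eps = 1e-6
H_numeric = np.column_stack([
    (grad(x + eps * e) - grad(x - eps * e)) / (2 * eps)  # one Hessian column per coordinate
    for e in np.eye(3)
])
print(np.allclose(H_analytic, H_numeric, atol=1e-4))
```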
Positive definite
You have to show that, for every nonzero vector $z$,
$$
t = z \cdot ( A^T A z) > 0 \,.
$$
Once again, going to component notation,
$$
t = z_p A_{ip} A_{iq} z_q = (Az)\cdot(Az) = \lVert Az \rVert^2
$$
which is positive unless $Az = 0$. You now have to exclude the case where $Az = 0$ for nonzero $z$, which you have done by invoking linear independence of the columns of $A$. Therefore, the Hessian is positive definite.
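For a concrete illustration, a tall random matrix almost surely has independent columns, and then $2A^TA$ passes the standard positive-definiteness checks:

```python
import numpy as np

# When A has linearly independent columns, 2 A^T A is positive definite:
# all eigenvalues are strictly positive and a Cholesky factorization exists.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))  # tall; columns independent almost surely
H = 2 * A.T @ A

print(np.all(np.linalg.eigvalsh(H) > 0))
L = np.linalg.cholesky(H)  # would raise LinAlgError if H were not positive definite
print(np.allclose(L @ L.T, H))
```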
Best Answer
The squared magnitude of the vector $y-\Phi\theta$ is
$$\lVert y - \Phi \theta \rVert^2=(y-\Phi\theta)^T(y-\Phi\theta)=(y^T-\theta^T\Phi^T)(y-\Phi\theta)=y^Ty-y^T\Phi\theta-\theta^T\Phi^Ty+\theta^T\Phi^T\Phi\theta$$
But $y^T\Phi\theta$ is a scalar, so $y^T\Phi\theta=(y^T\Phi\theta)^T=\theta^T\Phi^Ty$. So the right-hand side is
$$y^Ty-2\theta^T\Phi^Ty+\theta^T\Phi^T\Phi\theta$$
Differentiating with respect to $\theta$ now gives $-2\Phi^Ty+2\Phi^T\Phi\theta$, and scaling by $1/2$ gives the result.
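Setting that gradient to zero yields the normal equations $\Phi^T\Phi\theta = \Phi^Ty$. A short sketch (with illustrative random data) checks that solving them matches NumPy's least-squares routine:

```python
import numpy as np

# Setting the gradient -2 Phi^T y + 2 Phi^T Phi theta to zero gives the
# normal equations Phi^T Phi theta = Phi^T y (data below is illustrative).
rng = np.random.default_rng(4)
Phi = rng.standard_normal((6, 3))
y = rng.standard_normal(6)

theta_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
theta_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.allclose(theta_normal, theta_lstsq))
```

In practice `np.linalg.lstsq` (a QR/SVD-based solver) is preferred over forming $\Phi^T\Phi$ explicitly, since the normal equations square the condition number.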