J.M. is correct: the difficulty is that $C$ does not necessarily have full column rank, which makes your augmented system singular. The standard approach in this case is to regularize the problem, i.e., replace the constraint $C^T q + c_0 = 0$ with $C^T q + \delta r + c_0 = 0$ for some small $\delta > 0$ (e.g., between $10^{-6}$ and $10^{-8}$) and a new variable $r$ representing the residual of your constraints. Your constraint matrix is now
$$
\begin{bmatrix}
C^T & \delta I
\end{bmatrix}
$$
which always has full row rank. At the same time, you'll want to add a term $\delta^2 \|r\|^2$ to your objective so as to minimize this residual.
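To illustrate why the regularized block has full row rank, here is a minimal NumPy sketch with a made-up rank-deficient $C$ (the value of $\delta$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: C is n x p but its third column repeats the first,
# so C^T does not have full row rank.
n, p = 6, 3
C = rng.standard_normal((n, 2))
C = np.hstack([C, C[:, :1]])

delta = 1e-7                                  # arbitrary small regularization

# Regularized constraint block [C^T  delta*I]
CT_reg = np.hstack([C.T, delta * np.eye(p)])

print(np.linalg.matrix_rank(C.T))     # 2: rank deficient
print(np.linalg.matrix_rank(CT_reg))  # 3: full row rank thanks to delta*I
```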
If everything is nonsingular, you have several options for solving the problem. One is a symmetric indefinite factorization of the augmented system (the one you call the "normal equations"). This can be done very efficiently, and in parallel if the system is large. In Matlab, look into the lbl function. In Fortran, look into the HSL library: http://www.hsl.rl.ac.uk/academic.html (the relevant packages are MA27, MA47, MA57 and MA86).
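The routines above are the tools of choice at scale; for a small-scale illustration of the same idea, SciPy's dense solver with `assume_a='sym'` dispatches to LAPACK's symmetric-indefinite (Bunch-Kaufman $LDL^T$) routine. The sketch below assumes the augmented system corresponding to $\min \tfrac12\|q-q_0\|_M^2$ subject to $C^T q + c_0 = 0$, with random data:

```python
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(1)
n, p = 8, 3

G = rng.standard_normal((n, n))
M = G @ G.T + n * np.eye(n)                 # symmetric positive definite
C = rng.standard_normal((n, p))             # full column rank (generically)
q0 = rng.standard_normal(n)
c0 = rng.standard_normal(p)

# Augmented system for  min 1/2 ||q - q0||_M^2  s.t.  C^T q + c0 = 0:
#   [M   C ] [q  ]   [M q0]
#   [C^T 0 ] [lam] = [-c0 ]
K = np.block([[M, C], [C.T, np.zeros((p, p))]])
rhs = np.concatenate([M @ q0, -c0])

# assume_a='sym' selects LAPACK's symmetric-indefinite LDL^T solver
sol = solve(K, rhs, assume_a='sym')
q, lam = sol[:n], sol[n:]

print(np.allclose(C.T @ q + c0, 0))         # True: constraint satisfied
```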
Regarding iterative methods, aside from MINRES, you can also try SYMMLQ if the system is nonsingular. See Mike Saunders' website for links to those codes: http://www.stanford.edu/group/SOL/software.html
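SciPy ships an implementation of MINRES (though not SYMMLQ), so the same saddle-point system can be solved iteratively with only matrix-vector products; a small sketch with random data:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import minres

rng = np.random.default_rng(2)
n, p = 50, 10

M = sp.diags(1.0 + rng.random(n))              # SPD diagonal mass matrix
C = sp.csr_matrix(rng.standard_normal((n, p)))
q0 = rng.standard_normal(n)
c0 = rng.standard_normal(p)

# Symmetric indefinite saddle-point matrix; MINRES only needs matvecs.
K = sp.bmat([[M, C], [C.T, None]], format='csr')
rhs = np.concatenate([M @ q0, -c0])

sol, info = minres(K, rhs)                     # info == 0 on convergence
q = sol[:n]
print(info, np.linalg.norm(C.T @ q + c0))
```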
There is an alternative if you have a means to identify efficiently a particular solution $\bar{q}_0$ of $C^T q + c_0 = 0$ (by factorization or using some special structure of the problem). You can "shift" your linear system by defining $\bar{q} := q - \bar{q}_0$:
$$
\begin{bmatrix}
M & C \\
C^T & 0
\end{bmatrix}
\begin{bmatrix}
\bar{q} \\ \lambda
\end{bmatrix}
=
\begin{bmatrix}
M(q_0 - \bar{q}_0) \\ 0
\end{bmatrix}.
$$
These are the optimality conditions of the (unconstrained) linear least-squares problem
$$
\min_{\lambda} \ \tfrac{1}{2} \| C \lambda - M(q_0 - \bar{q}_0) \|_{M^{-1}}^2.
$$
Now you can use conjugate gradients or, better yet, LSQR (also from Michael Saunders' page). LSQR is a variant of CG that will be more accurate if $C$ is ill-conditioned. Once you have found $\lambda$, you can recover $\bar{q} = q_0 - \bar{q}_0 - M^{-1} C \lambda$, and your solution is $q = \bar{q} + \bar{q}_0$.
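For diagonal $M$, the whole shift-and-LSQR recipe fits in a few lines. A NumPy/SciPy sketch with random data; the particular solution $\bar{q}_0$ is computed here by dense least squares purely for illustration:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(3)
n, p = 40, 8

m_diag = 1.0 + rng.random(n)                # diagonal of M (SPD)
C = rng.standard_normal((n, p))
q0 = rng.standard_normal(n)
c0 = rng.standard_normal(p)

# Any particular solution of C^T q + c0 = 0 will do; here, least squares.
qbar0 = np.linalg.lstsq(C.T, -c0, rcond=None)[0]

# The M^{-1}-weighted problem  min ||C lam - M(q0 - qbar0)||_{M^{-1}}
# becomes ordinary least squares after scaling by M^{-1/2}:
A = C / np.sqrt(m_diag)[:, None]            # M^{-1/2} C
b = np.sqrt(m_diag) * (q0 - qbar0)          # M^{1/2} (q0 - qbar0)
lam = lsqr(A, b, atol=1e-12, btol=1e-12)[0]

# Recover qbar = q0 - qbar0 - M^{-1} C lam, then q = qbar + qbar0.
q = q0 - (C @ lam) / m_diag
print(np.linalg.norm(C.T @ q + c0))         # small: constraint satisfied
```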
For more general (non-diagonal) $M$, there is a nice hybrid alternative. It consists of interpreting the shifted linear system as the optimality conditions of the quadratic program
$$
\min_{\bar{q}} \ -(q_0 - \bar{q}_0)^T M \bar{q} + \tfrac{1}{2} \bar{q}^T M \bar{q} \quad
\text{s.t.} \ C^T \bar{q} = 0.
$$
The method is a hybrid between a factorization and CG. It needs to factorize the matrix
$$
\begin{bmatrix}
I & C \\
C^T & 0
\end{bmatrix}
$$
and applies CG to $M$. At each iteration, the search direction is projected into the nullspace of $C^T$, ensuring that all iterates satisfy $C^T \bar{q} = 0$. The main reference is http://epubs.siam.org/sisc/resource/1/sjoce3/v23/i4/p1376_s1
Clearly, for diagonal $M$, there is no advantage over a direct factorization of your original system. But in many applications, replacing $M$ by $I$ leads to a much sparser matrix. Moreover, $M$ no longer needs to be positive definite; it only needs to be positive definite on the nullspace of $C^T$.
Your problem can be formulated as a convex quadratic programming problem with a linear equality constraint and nonnegativity constraints on $x$:
$$
\min_x \ \tfrac{1}{2}\left( x^{T}(A^{T}A)x - 2(b^{T}A)x + b^{T}b \right)
$$
subject to
$$
\sum_{i=1}^{n} x_{i}=1, \qquad x \geq 0.
$$
As a practical matter, you're probably best off using a well-written library routine for linearly constrained quadratic programming; many are available in a variety of programming languages. If you need to implement this yourself, an active-set QP solver is relatively straightforward.
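As one example of the library route, SciPy's general-purpose SLSQP solver handles this small QP directly; this is not an active-set method, just a quick sketch with random data:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def obj(x):
    # 1/2 ||Ax - b||^2, equal to the expanded quadratic above up to a constant
    return 0.5 * np.sum((A @ x - b) ** 2)

def grad(x):
    return A.T @ (A @ x - b)

res = minimize(
    obj, np.full(n, 1.0 / n), jac=grad, method='SLSQP',
    bounds=[(0, None)] * n,                       # x >= 0
    constraints=[{'type': 'eq',                   # sum(x) == 1
                  'fun': lambda x: np.sum(x) - 1.0,
                  'jac': lambda x: np.ones(n)}],
)
x = res.x
print(res.success, x.sum())
```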
Best Answer
Since you need help on the "$\Leftarrow$" part, this is how it can be done.
Suppose that there exists $z\in\mathbb R^{p}$ satisfying \begin{align} A^TAx^* + C^Tz &= A^Tb\tag{1}\label{1}\\ Cx^* &= d\tag{2}\label{2}. \end{align}
We need to show that $x^*$ solves $(\star)$, or equivalently \begin{equation} \|Ax-b\|_2^2 \ge \|Ax^*-b\|_2^2 \quad\mbox{for all } x \mbox{ such that } Cx=d. \end{equation} We know the equality will occur when $x=x^*$, so it is natural to decompose the LHS as \begin{align} \|Ax-b\|_2^2 &= \|Ax-Ax^* + Ax^* - b\|_2^2 \\ &= \|A(x-x^*)\|_2^2 + \|Ax^* - b\|_2^2 + 2(x-x^*)^TA^T(Ax^* - b). \end{align} The last term vanishes: \begin{align} (x-x^*)^TA^T(Ax^* - b) &= (x-x^*)^T(A^TAx^* - A^Tb)\\ &=-(x-x^*)^TC^Tz\\ &=-z^T(Cx-Cx^*)\\ &=0. \end{align} The conclusion follows.
By the way, you should write the original linear constraint as $d = Cx$ so that the Lagrangian becomes $L(x,z) = f(x) + z^T(d - Cx)$; this avoids the change of variables $z = -\lambda$.
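The "$\Leftarrow$" argument is easy to sanity-check numerically: solve the system $(1)$-$(2)$ for a random instance, then verify that no other feasible point achieves a smaller residual (a sketch; feasible perturbations are generated from a nullspace basis of $C$):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(6)
m, n, p = 15, 8, 3
A = rng.standard_normal((m, n))
C = rng.standard_normal((p, n))
b = rng.standard_normal(m)
d = C @ rng.standard_normal(n)     # consistent right-hand side

# Solve (1)-(2):  [A^T A  C^T; C  0] [x*; z] = [A^T b; d]
K = np.block([[A.T @ A, C.T], [C, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([A.T @ b, d]))
xstar, z = sol[:n], sol[n:]

# Any feasible x is xstar plus a step in null(C); the residual cannot drop,
# since the cross term in the decomposition above vanishes.
N = null_space(C)
best = np.linalg.norm(A @ xstar - b)
for _ in range(5):
    x = xstar + N @ rng.standard_normal(N.shape[1])
    assert np.linalg.norm(A @ x - b) >= best - 1e-10
print("x* attains the minimal residual over {x : Cx = d}")
```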