When you use the Einstein summation convention, you sum over the repeated indices in each term. Strictly speaking, an index to be summed over must appear exactly once upstairs and once downstairs (I'll explain the reason in a moment).
If we insist on summing over repeated indices even when they are all downstairs, as in your expression, the second way of thinking about it is correct: you must treat each term independently of the others. For example, the relation $A_i B_i + A_j C_j B_i D_i$ means a summation within each term separately: $\sum_i A_i B_i + \sum_j \sum_i A_j C_j B_i D_i$.
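To make the per-term reading concrete, here is a small sketch using `numpy.einsum`; the component values below are made up purely for illustration:

```python
import numpy as np

# Made-up sample components for A, B, C, D (n = 3), just for illustration.
A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 5.0, 6.0])
C = np.array([0.5, 1.5, 2.5])
D = np.array([2.0, 0.0, 1.0])

# First term: A_i B_i  ->  sum_i A_i B_i
term1 = np.einsum('i,i->', A, B)

# Second term: A_j C_j B_i D_i  ->  (sum_j A_j C_j) * (sum_i B_i D_i),
# because the two dummy indices are summed independently.
term2 = np.einsum('j,j->', A, C) * np.einsum('i,i->', B, D)

# Equivalently, one einsum call over both dummy indices at once:
term2_alt = np.einsum('j,j,i,i->', A, C, B, D)
assert np.isclose(term2, term2_alt)

print(term1 + term2)
```

Note that the subscript string itself plays the role of the convention: any index letter appearing in more than one operand (and not in the output) is summed over.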
When I first learned this notation I found it pretty confusing, so even though the question has already been addressed I'll give a quick overview of it below.
In my opinion this notation really pays off only when we sum over one index upstairs and one downstairs. The point is: given a set of vectors (the vectors themselves, not their components), we can index them. The convention is:
Notation: If we are given a set of $n$ vectors to index, we index the vectors downstairs, i.e.: we write the set as $\{v_1, \dots, v_n\}$
Now, the second convention is that when we index the coefficients of a linear combination, they should be written upstairs:
Notation: If we are given a set of $n$ scalars and $n$ vectors and we are to form the linear combination of the vectors with the scalars, we write each scalar with index upstairs, so the set of scalars will be $\{a^1, \dots, a^n\}$.
Now the convention is:
Einstein Summation Convention: if in an expression an index appears once upstairs and once downstairs, it is summed over without writing the summation sign explicitly.
In this case, note that the linear combination of the vectors $v_i$ with coefficients $a^i$ is denoted simply $a^iv_i$, the summation being understood. Now there's just one more thing:
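As a quick sanity check, the contraction $a^iv_i$ can be sketched with `numpy.einsum`; the basis vectors and coefficients below are made up for illustration:

```python
import numpy as np

# Three vectors v_1, v_2, v_3 in R^3, stored as rows (toy example).
v = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])

# Coefficients a^1, a^2, a^3.
a = np.array([2.0, -1.0, 3.0])

# a^i v_i: the repeated index i (one up, one down) is summed over.
combo = np.einsum('i,ij->j', a, v)
assert np.allclose(combo, a @ v)   # same as an ordinary matrix product

print(combo)   # [5., 2., 3.]
```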
Notation: If we are given a set of $n$ linear functionals, i.e. linear functions defined on a vector space with values in the scalar field, then the functionals are indexed with upstairs indices and the set becomes $\{\omega^1, \dots, \omega^n\}.$
And the convention to write the scalars when linearly combining functionals is just the opposite again:
Notation: If we are given a set of $n$ scalars and $n$ linear functionals and we are to form the linear combination of the linear functionals with the scalars, we write each scalar with index downstairs, so the set of scalars will be $\{a_1, \dots, a_n\}$.
The consequence is that a linear combination of linear functionals is written $a_i\omega^i$. The convention extends to tensors directly from this: covariant tensors have indices downstairs and contravariant tensors have indices upstairs, and writing linear combinations and so on becomes natural. Just to finish:
Notation: When giving coordinates to a point or indexing the components of a function, we write the indices upstairs. So if $a \in \mathbb{R}^n$ is a point we write $a = (a^1, \dots, a^n)$, and if $f : \mathbb{R}^n \to \mathbb{R}^m$ we write its components as $f^i$. These notations extend to manifolds in general.
If you understand all of this and can get along without getting confused, go ahead and use the notation; it makes life easier (especially in differential geometry).
Calculate $\Gamma^\nu_{\mu\nu}=\frac{1}{2}g^{\nu\kappa}(\partial_\mu g_{\nu\kappa}+\partial_\nu g_{\mu\kappa}-\partial_\kappa g_{\mu\nu})=\frac{1}{2}g^{\nu\kappa}\partial_\mu g_{\nu\kappa}$, where the last two terms cancel by the symmetry of $g^{\nu\kappa}$. Using the well-known formula (Jacobi's formula) $$ \frac{1}{\det A}\frac{d}{dt}\det A=\operatorname{Tr}\left(A^{-1}\frac{d A}{dt}\right), $$ we obtain $$ \Gamma_\mu\equiv\Gamma^\nu_{\mu\nu}=\frac{1}{2}\frac{1}{g}\partial_\mu g=\frac{1}{2}\partial_\mu\ln|g|=\partial_\mu\ln \sqrt{|g|}, $$ where $g$ denotes the determinant of the matrix with entries $g_{\mu\nu}$.
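Jacobi's formula used above is easy to check numerically. Here is a small sketch with a made-up matrix curve $A(t) = A_0 + tB$, comparing a finite difference of $\det A$ against $\operatorname{Tr}(A^{-1}\,dA/dt)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A(t): a smooth curve of invertible matrices; A(t) = A0 + t*B is a toy choice.
A0 = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)   # shifted to stay invertible
B = rng.normal(size=(4, 4))

def A(t):
    return A0 + t * B

# Left-hand side: (1/det A) d(det A)/dt via a central finite difference at t = 0.
h = 1e-6
lhs = (np.linalg.det(A(h)) - np.linalg.det(A(-h))) / (2 * h) / np.linalg.det(A(0.0))

# Right-hand side: Tr(A^{-1} dA/dt), where dA/dt = B exactly for this curve.
rhs = np.trace(np.linalg.inv(A(0.0)) @ B)

print(lhs, rhs)   # agree to finite-difference accuracy
assert np.isclose(lhs, rhs, rtol=1e-4)
```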
The expression $\rho=\sqrt{|g|}$ transforms under a change of chart as follows: we have $g_{\mu^\prime \nu^\prime}=\partial_{\mu^\prime}x^\mu\,\partial_{\nu^\prime}x^\nu\, g_{\mu\nu}=J^\mu_{\mu^\prime}J^\nu_{\nu^\prime}g_{\mu\nu}$, and taking determinants gives $$ g^\prime=(\det J)^2 g, \qquad \sqrt{|g^\prime|}=|\det J|\sqrt{|g|}. $$ Such an object is called a "scalar density of weight 1". It also makes sense in a coordinate-free manner, as a section of the density bundle, but never mind that. One can show that the components of the covariant derivative of such an object are $$\nabla_\mu\rho=\partial_\mu\rho-\Gamma_\mu\rho=\partial_\mu\rho-\left(\partial_\mu\ln\sqrt{|g|}\right)\rho.$$ Inserting $\rho=\sqrt{|g|}$ gives $$ \nabla_\mu\sqrt{|g|}=\partial_\mu\sqrt{|g|}-\frac{1}{\sqrt{|g|}}\left(\partial_\mu\sqrt{|g|}\right)\sqrt{|g|}=\partial_\mu\sqrt{|g|}-\partial_\mu\sqrt{|g|}=0. $$
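The determinant identity $g' = (\det J)^2 g$ is also easy to verify numerically. This is a sketch with a random symmetric matrix standing in for $g_{\mu\nu}$ and a random invertible matrix standing in for the Jacobian:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# A random symmetric "metric" g_{mu nu} and a random invertible Jacobian J.
g = rng.normal(size=(n, n))
g = g + g.T
J = rng.normal(size=(n, n)) + 2.0 * np.eye(n)   # shifted to stay invertible

# Transformation law g'_{m'n'} = J^m_{m'} J^n_{n'} g_{mn}, i.e. g' = J^T g J.
g_prime = J.T @ g @ J

det_g = np.linalg.det(g)
det_gp = np.linalg.det(g_prime)

# At the level of determinants: g' = (det J)^2 g, hence sqrt|g'| = |det J| sqrt|g|.
assert np.isclose(det_gp, np.linalg.det(J)**2 * det_g)
assert np.isclose(np.sqrt(abs(det_gp)), abs(np.linalg.det(J)) * np.sqrt(abs(det_g)))
```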
Basically, $\partial^\beta$ are the components of the dual vector to $\partial_\beta$. In Minkowski space, with signature $(-,+,+,+)$, the sum $\partial_\beta g\,\partial^\beta g$ is indeed $-g_t^2 + |\nabla g|^2$.
And because $\alpha$ and $\beta$ are dummy indices, the two sums $\partial_\beta g\,\partial^\beta g$ and $\partial^\alpha g\,\partial_\alpha g$ are the same; the product of functions is commutative.
When we talk about the dual of $\partial_\beta$ we mean as follows:
Suppose we are given a smooth manifold of dimension $n$. If we look at the tangent space at a point $p$ of the manifold, we can prove that it is isomorphic to the space of derivations, which is spanned by the $\partial_\beta$. Therefore we can treat the tangent space as the vector space spanned by the $\partial_\beta$, which has dimension $n$.
For this vector space we can define the dual space, as in linear algebra, which is spanned by the $\partial^\beta$ satisfying $\partial^\alpha(\partial_\beta) = \delta^\alpha{}_\beta$.
If you have a metric $g$ (or a pseudo-metric, as in general relativity), which makes the tangent space an inner product space, then the components of a vector and its dual are related by $$v_\alpha=g_{\alpha\beta} v^\beta, \qquad v^\alpha=g^{\alpha\beta} v_\beta,$$ where subscript components belong to the dual vector and superscript components to the vector in the tangent space.
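A minimal sketch of raising and lowering indices, using the Minkowski metric with signature $(-,+,+,+)$ as the example metric (the vector components are made up):

```python
import numpy as np

# Minkowski metric eta_{alpha beta} with signature (-, +, +, +).
g_lower = np.diag([-1.0, 1.0, 1.0, 1.0])
g_upper = np.linalg.inv(g_lower)        # here equal to g_lower itself

# A tangent vector with components v^alpha.
v_up = np.array([2.0, 1.0, -3.0, 0.5])

# Lower the index: v_alpha = g_{alpha beta} v^beta.
v_down = np.einsum('ab,b->a', g_lower, v_up)

# Raise it back: v^alpha = g^{alpha beta} v_beta recovers the original.
assert np.allclose(np.einsum('ab,b->a', g_upper, v_down), v_up)

# The contraction v_alpha v^alpha is the Minkowski "square": -t^2 + |spatial|^2.
print(np.einsum('a,a->', v_down, v_up))   # -4 + 1 + 9 + 0.25 = 6.25
```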