Your answer is not going to be the same as the answer in the paper, because you are calculating the canonical stress-energy tensor, which is conserved, but which is not symmetric, and which has a complicated relation to the angular momentum tensor. The issue is that there two different conserved quantities, the stress energy tensor, and the angular momentum tensor, and the information in the two partially overlaps for a theory with both rotational/Lorentz and translation invariance.
The standard easy fix is to calculate the stress tensor by differentiating with respect to $g_{\mu\nu}$. This is how you automatically get a symmetric stress tensor with the right properties that it makes the angular momentum tensor by just multiplying by x factors
$$ L_{\mu\nu\alpha} = x_{\alpha} T_{\mu\nu} - x_{\mu}T_{\alpha\nu} $$
In some convention, and assuming you are using a symmetric stress tensor. The derivative of the action with respect to $g_{\mu\nu}$ gives Harvey's result.
I should point out that you missed a term of $-\eta^{\mu\nu} L$ in your stress tensors, both in yours and in Harvey's. This doesn't affect the question.
Formula (3) defines the canonical energy-momentum tensor of the field, which is defined only in flat Minkowski spacetime and is the Noether current associated to translation symmetry.
By contrast, formula (2) defines the Einstein-Hilbert energy momentum tensor, which is also defined in a general spacetime background, but is not a Noether current, at least not in the sense the canonical tensor is.
It is well-known that the canonical EM tensor often requires "improvement", whereas it is generally accepted that the Einstein-Hilbert EM tensor is the "correct" EM tensor of a given field.
Formula (4) would be valid if the EHEM tensor (Einstein-Hilbert energy momentum tensor) and the CEM tensor (canonical energy-momentum tensor) turned out to be always equivalent to one another. This is not true in general however.
This paper by Forger and Römer examines the relationship between these two tensors in great detail.
I will give a simpler analysis here. Suppose that the Lagrangian is $\mathcal L(\phi,\partial\phi,g)$ (the factor of $\sqrt{|\mathfrak g|}$ is considered part of the Lagrangian) and the field theory has some concept of diffeomorphism symmetry and the Lagrangian is diffeomorphism invariant. The Lie derivative of the $\phi=(\phi^i)$ field is written generally as $$ \delta_X\phi^i=X^\mu\phi^i_{,\mu}+Q^{i,\mu}{}_\nu(\phi,\partial\phi)X^\nu_{,\mu} $$for any vector field $X^\mu$. (For fields which are not in the representation of the diffeomorphism group, see the paper by Forger and Römer).
Under a diffeomorphism, the Lagrangian is a scalar density, thus $$ \delta_X\mathcal L=\partial_\mu(X^\mu\mathcal L)=\frac{1}{2}\mathfrak{T}^{\mu\nu}\delta_X g_{\mu\nu}+E_i\delta_X\phi^i+\partial_\mu(P^\mu_i\delta_X\phi^i), $$
where $$ \mathfrak T^{\mu\nu}:=2\frac{\delta \mathcal L}{\delta g_{\mu\nu}},\quad E_i =\frac{\delta\mathcal L}{\delta\phi^i},\quad P^\mu_i=\frac{\partial\mathcal L}{\partial\phi^i_{,\mu}},$$and finally, $\delta_Xg_{\mu\nu}=2\nabla_{[\mu}X_{\nu]}$ is the metric Lie derivative.
If we expand the Lie derivatives and integrate by parts on the vector field $X$ and rearrange the terms to $0$, we get $$ 0=[E_i\phi^i_{,\nu}-\partial_\mu(E_i Q^{i,\mu}{}_\nu)-\nabla_\mu\mathfrak T^{\mu}{}_\nu]X^\nu+\partial_\mu[\mathfrak T^\mu{}_\nu X^\nu+P^\mu_i\delta_X\phi^i+E_iQ^{i,\mu}{}_\nu X^\nu]. $$
If we integrate this and take $X^\mu$ to be compactly supported, we get that the first bracket $[...]$ must vanish identically. This gives $$ \nabla_\mu\mathfrak T^\mu{}_\nu=E_i\phi^i_{,\nu}-\partial_\mu(E_iQ^{i,\mu}{}_\nu) $$as an off-shell relation. If the field $\phi$ is on-shell, then this reduces to $\nabla_\mu\mathfrak T^\mu{}_\nu=0$. Let the vector density whose total divergence appears in the second bracket $\partial_\mu[...]$ be denoted $S^\mu_X$.
Then we also get that $\partial_\mu S^\mu_X=0$ for all possible choices of $X$ and $\phi$ and $g$. If we expand the Lie derivative in $S^\mu_X$ as well, we get $$S^\mu_X=(\mathfrak T^\mu{}_\nu+P^\mu_i\phi^i_{,\nu}-\mathcal L\delta^\mu_\nu+E_iQ^{i,\mu}{}_\nu)X^\nu+P^\mu_iQ^{i,\kappa}{}_\nu X^\nu_{,\kappa}. $$ It is then fairly easy to show that $\partial_\mu S^\mu_X=0$ for all choices of $X^\mu$ implies that $$ S^\mu_X=\partial_\rho K^{\mu\rho}_X,\quad K^{\mu\rho}_X=P^{[\mu}_i Q^{i,\rho]}{}_\nu X^\nu, $$ so $K^{\mu\rho}_X$ is skew-symmetric in $\mu\rho$, moreover $K^{\mu\rho}_X$ is unique (the higher order analogue of this proof is much more difficult and the uniqueness is lost!).
Now evaluate this entire relation ($S^\mu_X=\partial_\rho K^{\mu\rho}_X$) in flat spacetime with $X^\mu=\tau^\mu$ a constant translation vector! Moreover, evaluate it on-shell for the $\phi$ field, i.e. $E_i=0$. Then we get $$ S^\mu=(T^\mu{}_\nu+P^\mu_i\phi^i_{,\nu}-\mathcal L\delta^\mu_\nu)\tau^\nu=\partial_\rho(P^{[\mu}_i Q^{i,\rho]}{}_\nu)\tau^\nu, $$ and using that $\tau$ is any constant vector field, we get $$ T^\mu{}_\nu=\mathcal L\delta^\mu_\nu-P^\mu_i\phi^i_{,\nu}+\partial_\rho(P^{[\mu}_i Q^{i,\rho]}{}_\nu). $$
Here $T^\mu{}_\nu$ is the (flat spacetime limit of the) Einstein-Hilbert energy momentum tensor, the $\Theta^\mu{}_\nu:=\mathcal L\delta^\mu_\nu-P^\mu_i\phi^i_{,\nu}$ is the canonical energy momentum tensor, and the divergence terms are correction terms.
This is the correct relation at least under the assumptions made on the initial form of the Lagrangian.
Best Answer
This is just elaborating a little more on the 'behind the scenes', since OP's confusions seem to be resolved in the comments already.
Consider the following simplified situation. Let $f_1,f_2:\Bbb{R}^2\to\Bbb{R}$ be two functions defined as $f_1(x,y)=x^2y^3$ and $f_2(x,y)=xy^2$. These are clearly two different functions. Consider now two curves, $\gamma_1,\gamma_2:\Bbb{R}\to\Bbb{R}^2$ defined as $\gamma_1(t)=(t,t)$ and $\gamma_2(t)=(t,t^2)$. Then, you can easily verify that the composed maps are equal: for all $t\in\Bbb{R}$, we have $(f_1\circ\gamma_1)(t)=(f_2\circ\gamma_2)(t)=t^5$. On the other hand, let us calculate their partial derivative: \begin{align} \frac{\partial f_1}{\partial x}\bigg|_{\gamma_1(t)}=2t^4,\quad\text{but}\quad\frac{\partial f_2}{\partial x}\bigg|_{\gamma_2(t)}=t^4. \end{align} This shouldn't be surprising: we started off with two different functions $f_1,f_2$, and we just happened to find two curves $\gamma_1,\gamma_2$ such that $f_1\circ\gamma_1=f_2\circ\gamma_2$. There's no reason to expect that this implies $\frac{\partial f_1}{\partial x}\circ \gamma_1= \frac{\partial f_2}{\partial x}\circ \gamma_2$, and in fact as shown above, this equality is false.
How does this relate to the Lagrangian? Fix any smooth manifold $M$, and consider the mappings
In words, $\mathscr{L}_1$ eats a $(0,2)$-tensor field in its first slot, and a vector field (a $(1,0)$ tensor field) in its second slot, and it outputs a smooth function by contracting the tensor field and vector field completely; $\mathscr{L}_2$ does a similar thing (contraction) except it has a different domain. Now, without any doubt, $\mathscr{L}_1$ and $\mathscr{L}_2$ are completely different maps.
Now, let us fix a scalar field $\phi$ on $M$. We now get two induced mappings via composition, denoted $\mathcal{L}_1$ and $\mathcal{L}_2$, defined on the space of metric tensors and taking values in $C^{\infty}(M)$, such that
Here, $g^{\sharp}$ denotes the musical isomorphism which converts covector fields into vector fields (the index-raising operation), and $g^{``-1"}$ denotes the 'inverse' metric tensor (I put 'inverse' in quotation marks since a $(0,2)$ tensor field strictly speaking doesn't have an inverse; rather we refer to a corresponding $(2,0)$ tensor).
So you see, the composed functions $\mathcal{L}_1$ and $\mathcal{L}_2$ are equal. However, the variations $\frac{\delta \mathscr{L}_1}{\delta H}\bigg|_{(g,\text{grad}_g\phi)}$ and $\frac{\delta\mathscr{L}_2 }{\delta K}\bigg|_{(g^{``-1"}, d\phi)}$ are not equal.
Hopefully the analogy with the above simple case is clear: $\mathscr{L}_i$ is like $f_i$, the map $g\mapsto (g,\text{grad}_g(\phi))$ is like the curve $\gamma_1$, and the map $g\mapsto (g^{``-1"},d\phi)$ is like the curve $\gamma_2$, and it turns out their compositions are equal: $\mathcal{L}_1=\mathcal{L}_2$. But that doesn't mean the original maps are equal, nor does it imply the composition of the derivatives along these 'curves'are equal (remember in variational calculus, we always perform the variation first, and only afterwards evaluate).
So roughly speaking, your first calculation corresponds to $\mathscr{L}_1$, where we view $\partial^a\phi$ as indpendent variables, whereas in the latter case we view $\partial_a\phi$ as the independent variables. In going from one to the other, there are factors of the metric which appear. Lastly, in physics, we view the second situation as more 'fundamental', i.e $\partial_a\phi$ is the basic quantity (afterall the exterior derivative $d\phi$ can be defined without any metric).