This formula is a concise and expressive version of the Koszul formula; it is just a matter of regrouping the terms.
It shows that the Levi-Civita covariant derivative is given by a formula that employs only the Lie derivative, the exterior derivative, and the given Riemannian metric.
I find this formula very illuminating, because the Lie derivative and the exterior derivative are always present on a smooth manifold (they do not require any choice), and the only choice made is that of the Riemannian metric.
This formula also exhibits an important property of the covariant derivative: it is linear over $C^\infty(M)$ in $Y$, but not in $X$ (in $X$ it is only $\mathbb{R}$-linear).
Furthermore, it reveals that the Levi-Civita covariant derivative depends on the Lie derivative of the metric, which is the source of the non-linearity in the slot $X$. This observation may also lead to other interesting insights.
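For concreteness, here is one standard way to carry out the regrouping. It is a sketch under assumed notation: $X^\flat := g(X,\cdot)$, $d\alpha(Y,Z) = Y\alpha(Z) - Z\alpha(Y) - \alpha([Y,Z])$ for a $1$-form $\alpha$, and the derivative is written $\nabla_Y X$ so that the slot names agree with the remarks above.
\begin{align}
2\,g(\nabla_Y X, Z) &= (\mathcal{L}_X g)(Y,Z) + (dX^\flat)(Y,Z), \\
(\mathcal{L}_X g)(Y,Z) &= X g(Y,Z) - g([X,Y],Z) - g(Y,[X,Z]), \\
(dX^\flat)(Y,Z) &= Y g(X,Z) - Z g(X,Y) - g(X,[Y,Z])
\end{align}
Adding the last two lines gives exactly the six terms of the Koszul formula for $2\,g(\nabla_Y X, Z)$. Since $\mathcal{L}_X g$ and $dX^\flat$ are tensors, the right-hand side is $C^\infty(M)$-linear in $Y$ and $Z$, while replacing $X$ by $fX$ produces extra derivative terms of $f$.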
Both (1) and (2) are right; it's just that the $v^{\phi}$ in your two formulas means different things, and you've unknowingly abused notation by calling both of them $v^{\phi}$. The issue boils down to the distinction between the tangent vectors $\frac{\partial}{\partial \phi}$ and $e_{\phi}$: the first vector has norm $r$, while the second has norm $1$, and it is precisely this factor of $r$ that accounts for the "discrepancy" you observed among the components.
Note that in the formula
\begin{align}
D_vf(p) &= \sum_{i=1}^n \frac{\partial f}{\partial x^i}\bigg|_p \cdot v^i
\end{align}
we often say "$v^i$ is the component of the vector $v$", but strictly speaking this is an incomplete sentence: components with respect to which basis? For this formula to work, we have to interpret it as follows: we write the vector $v$ as
\begin{align}
v &= \sum_{i=1}^n v^i \frac{\partial}{\partial x^i}\bigg|_p
\end{align}
In other words, the $v^i$ are the components of $v$ with respect to the basis $\left\{\frac{\partial}{\partial x^i}(p)\right\}_{i=1}^n$ of the tangent space $T_pM$. Said differently, $v^i = dx^i(p)[v]$ (the evaluation of a covector on a vector). In differential geometry, we often deal with such coordinate-induced bases.
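The identity $v^i = dx^i(p)[v]$ holds because $\{dx^i(p)\}_{i=1}^n$ is the dual basis, $dx^i(p)\big[\frac{\partial}{\partial x^j}\big|_p\big] = \delta^i_j$; applying $dx^i(p)$ to the expansion of $v$ picks out a single coefficient:
\begin{align}
dx^i(p)[v] &= \sum_{j=1}^n v^j\, dx^i(p)\left[\frac{\partial}{\partial x^j}\bigg|_p\right] = \sum_{j=1}^n v^j \delta^i_j = v^i
\end{align}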
However, in vector calculus, people often work with the normalized versions of these vectors:
\begin{align}
e_i := \dfrac{\frac{\partial}{\partial x^i}(p)}{\lVert \frac{\partial}{\partial x^i}(p)\rVert}
\end{align}
In the case of polar coordinates in the plane, we have the following vectors: $\frac{\partial}{\partial r}, \frac{\partial}{\partial \phi}$ and their normalized counterparts $e_r, e_{\phi}$. The relation between them is:
\begin{align}
\frac{\partial}{\partial r} &= e_r \quad \text{and} \quad \frac{\partial}{\partial \phi} = re_{\phi} \tag{$*$}
\end{align}
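A quick numerical sanity check of $(*)$, as a minimal sketch: it assumes the Euclidean plane, computes $\frac{\partial}{\partial r}$ and $\frac{\partial}{\partial \phi}$ as the columns of the Jacobian of $(r,\phi) \mapsto (r\cos\phi, r\sin\phi)$, and uses an arbitrary sample point. The function names are illustrative only.

```python
import math

def polar_coordinate_basis(r, phi):
    """Cartesian components of d/dr and d/dphi at the point with polar coords (r, phi).

    They are the columns of the Jacobian of (r, phi) -> (r cos phi, r sin phi).
    """
    d_dr = (math.cos(phi), math.sin(phi))
    d_dphi = (-r * math.sin(phi), r * math.cos(phi))
    return d_dr, d_dphi

def norm(u):
    """Euclidean norm of a plane vector."""
    return math.hypot(u[0], u[1])

def normalize(u):
    """Rescale u to unit length."""
    n = norm(u)
    return (u[0] / n, u[1] / n)

r, phi = 2.0, 0.7  # an arbitrary sample point
d_dr, d_dphi = polar_coordinate_basis(r, phi)
e_r, e_phi = normalize(d_dr), normalize(d_dphi)

print(round(norm(d_dr), 12), round(norm(d_dphi), 12))  # 1.0 2.0
# (*): d/dphi = r e_phi, checked componentwise
print(all(abs(a - r * b) < 1e-12 for a, b in zip(d_dphi, e_phi)))  # True
```

The norms come out as $1$ and $r$, which is exactly the factor of $r$ behind the discrepancy.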
So, now given a vector $v$, we can write it as
\begin{align}
v &= v^r \frac{\partial}{\partial r} + v^{\phi} \frac{\partial}{\partial \phi}
\end{align}
for some numbers $v^r, v^{\phi}\in \Bbb{R}$, OR, we can also write it as
\begin{align}
v &= \xi^r e_r + \xi^{\phi} e_{\phi}
\end{align}
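for some OTHER numbers $\xi^r, \xi^{\phi}\in \Bbb{R}$. Now, substituting $(*)$ into the first expansion gives $v = v^r e_r + (r v^{\phi}) e_{\phi}$, and comparing coefficients in the basis $\{e_r, e_{\phi}\}$ we deduce that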
\begin{align}
\begin{cases}
\xi^r = v^r \\
\xi^{\phi} = r v^{\phi}
\end{cases} \tag{$**$}
\end{align}
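The relation $(**)$ can be checked numerically with a short sketch (assumptions: Euclidean plane, an arbitrary sample point and vector, illustrative function names). The coordinate components are obtained by evaluating the dual covectors $dr = \cos\phi\,dx + \sin\phi\,dy$ and $d\phi = (-\sin\phi\,dx + \cos\phi\,dy)/r$ on $v$, while the orthonormal components are inner products with $e_r, e_{\phi}$.

```python
import math

def coordinate_components(v, r, phi):
    """(v^r, v^phi): components of the Cartesian vector v w.r.t. d/dr, d/dphi.

    Evaluates the dual covectors on v, i.e. v^r = dr[v] and v^phi = dphi[v].
    """
    v_r = math.cos(phi) * v[0] + math.sin(phi) * v[1]
    v_phi = (-math.sin(phi) * v[0] + math.cos(phi) * v[1]) / r
    return v_r, v_phi

def orthonormal_components(v, phi):
    """(xi^r, xi^phi): components of v w.r.t. the orthonormal basis e_r, e_phi."""
    e_r = (math.cos(phi), math.sin(phi))
    e_phi = (-math.sin(phi), math.cos(phi))
    # For an orthonormal basis, each component is an inner product with v.
    xi_r = e_r[0] * v[0] + e_r[1] * v[1]
    xi_phi = e_phi[0] * v[0] + e_phi[1] * v[1]
    return xi_r, xi_phi

r, phi = 2.0, 0.7   # sample point (arbitrary)
v = (0.3, -1.1)     # sample Cartesian vector (arbitrary)
v_r, v_phi = coordinate_components(v, r, phi)
xi_r, xi_phi = orthonormal_components(v, phi)

# (**): xi^r = v^r and xi^phi = r * v^phi
print(abs(xi_r - v_r) < 1e-12, abs(xi_phi - r * v_phi) < 1e-12)  # True True
```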
One last thing: when Wikipedia says $\nabla f = \left( \frac{\partial f}{\partial r}, \frac{1}{r}\frac{\partial f}{\partial \phi}\right)$, it should really specify the basis being used. The explicit expression is:
\begin{align}
\nabla f &= \frac{\partial f}{\partial r} e_r + \frac{1}{r}\frac{\partial f}{\partial \phi} e_{\phi} \\
&= \frac{\partial f}{\partial r}\frac{\partial }{\partial r} + \frac{1}{r^2} \frac{\partial f}{\partial \phi}\frac{\partial }{\partial \phi} \tag{$\ddot{\frown}$}
\end{align}
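Both lines of $(\ddot{\frown})$ can be compared against the Cartesian gradient in a small numerical sketch. The test function $f(x,y) = x^2 y$, the sample point, the step size, and the helper names are all illustrative assumptions; the polar partial derivatives are approximated by central differences.

```python
import math

def f(x, y):
    """Hypothetical test function f(x, y) = x^2 y."""
    return x**2 * y

def f_polar(r, phi):
    """The same function expressed in polar coordinates."""
    return f(r * math.cos(phi), r * math.sin(phi))

def polar_partials(r, phi, h=1e-6):
    """Central finite differences for df/dr and df/dphi."""
    f_r = (f_polar(r + h, phi) - f_polar(r - h, phi)) / (2 * h)
    f_phi = (f_polar(r, phi + h) - f_polar(r, phi - h)) / (2 * h)
    return f_r, f_phi

def grad_orthonormal(r, phi):
    """First line: f_r e_r + (1/r) f_phi e_phi, in Cartesian components."""
    f_r, f_phi = polar_partials(r, phi)
    e_r = (math.cos(phi), math.sin(phi))
    e_phi = (-math.sin(phi), math.cos(phi))
    return tuple(f_r * a + (f_phi / r) * b for a, b in zip(e_r, e_phi))

def grad_coordinate(r, phi):
    """Second line: f_r d/dr + (1/r^2) f_phi d/dphi, in Cartesian components."""
    f_r, f_phi = polar_partials(r, phi)
    d_dr = (math.cos(phi), math.sin(phi))
    d_dphi = (-r * math.sin(phi), r * math.cos(phi))
    return tuple(f_r * a + (f_phi / r**2) * b for a, b in zip(d_dr, d_dphi))

r, phi = 2.0, 0.7
x, y = r * math.cos(phi), r * math.sin(phi)
grad_cartesian = (2 * x * y, x**2)  # exact gradient of x^2 y
g1, g2 = grad_orthonormal(r, phi), grad_coordinate(r, phi)
print(max(abs(a - b) for a, b in zip(g1, grad_cartesian)) < 1e-6)  # True
print(max(abs(a - b) for a, b in zip(g2, grad_cartesian)) < 1e-6)  # True
```

Note the $\frac{1}{r}$ in front of $e_{\phi}$ becomes $\frac{1}{r^2}$ in front of $\frac{\partial}{\partial\phi}$, precisely because $\frac{\partial}{\partial\phi} = r e_{\phi}$.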
Now, we are finally ready to resolve the issue. Starting from your equation $(1)$, we have
\begin{align}
D_vf &= \frac{\partial f}{\partial r}v^r + \frac{\partial f}{\partial \phi}v^{\phi}
\end{align}
Next, if we do this from $(2)$, then we have
\begin{align}
\langle \nabla f, v\rangle &= \left\langle\frac{\partial f}{\partial r} e_r + \frac{1}{r}\frac{\partial f}{\partial \phi} e_{\phi},\,\,\, \xi^r e_r + \xi^{\phi} e_{\phi} \right\rangle \\
&= \frac{\partial f}{\partial r} \xi^r + \frac{1}{r}\frac{\partial f}{\partial \phi} \xi^{\phi}
\end{align}
where I used the fact that $\{e_r,e_{\phi}\}$ is an orthonormal basis, so the inner product is just the sum of the products of the coefficients. Finally, if we plug in $(**)$ above, we find that
\begin{align}
\langle \nabla f, v\rangle &=
\frac{\partial f}{\partial r} \xi^r + \frac{1}{r}\frac{\partial f}{\partial \phi} \xi^{\phi}
=\frac{\partial f}{\partial r}v^r + \frac{\partial f}{\partial \phi}v^{\phi}
= D_vf
\end{align}
which is of course what we expect, since $\nabla f$ is DEFINED so as to make the equation $\langle \nabla f(p), v\rangle = D_vf(p) = df_p(v)$ work out.
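The whole resolution can be condensed into one numerical sketch: equation $(1)$ with coordinate components, equation $(2)$ with orthonormal components related by $(**)$, and a direct finite-difference directional derivative should all agree. The test function, sample point, vector, and step size are illustrative assumptions.

```python
import math

def f(x, y):
    """Hypothetical test function f(x, y) = x^2 y (any smooth f would do)."""
    return x**2 * y

def f_polar(r, phi):
    """The same function in polar coordinates."""
    return f(r * math.cos(phi), r * math.sin(phi))

h = 1e-6
r, phi = 2.0, 0.7   # sample point
v = (0.3, -1.1)     # sample Cartesian vector
x, y = r * math.cos(phi), r * math.sin(phi)

# Polar partial derivatives (central differences).
f_r = (f_polar(r + h, phi) - f_polar(r - h, phi)) / (2 * h)
f_phi = (f_polar(r, phi + h) - f_polar(r, phi - h)) / (2 * h)

# Components of v: coordinate basis (v^r, v^phi), then orthonormal basis via (**).
v_r = math.cos(phi) * v[0] + math.sin(phi) * v[1]
v_phi = (-math.sin(phi) * v[0] + math.cos(phi) * v[1]) / r
xi_r, xi_phi = v_r, r * v_phi

from_eq1 = f_r * v_r + f_phi * v_phi            # D_v f via equation (1)
from_eq2 = f_r * xi_r + (f_phi / r) * xi_phi    # <grad f, v> in the basis e_r, e_phi
direct = (f(x + h * v[0], y + h * v[1]) - f(x - h * v[0], y - h * v[1])) / (2 * h)

print(abs(from_eq1 - direct) < 1e-6, abs(from_eq2 - direct) < 1e-6)  # True True
```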
Summary:
Whenever you speak of "components of a vector", you MUST ALWAYS keep track of which basis you're referring to. Very often in differential geometry/Riemannian geometry, people work with the coordinate-induced basis vectors $\frac{\partial}{\partial x^i}$ (so when people write $v^i$ in this context, they mean the components relative to this basis), whereas in elementary vector calculus, people work with the normalized vectors $e_i$ (and because this is the only basis they use, when they write $v^i$, they mean the components relative to this basis).
In my experience, Wikipedia isn't very consistent about this usage, and I recall seeing a single article with both uses simultaneously... which is of course very confusing. My suggestion for the future is to always be mindful of this distinction (there are also several other questions on this site where the entire confusion boils down to the difference between a normalized and an unnormalized basis).
Given a smoothly varying $1$-parameter family $\Psi[t]$ of tensor fields, which we can regard as a map $\Psi: J \to \Gamma(\bigotimes^l TM \otimes \bigotimes^k T^*M)$ for some interval $J \ni 0$, we can differentiate $\Psi$ with respect to $t$ to produce another such family, $$\partial_t \Psi[t] = \lim_{h \to 0} \frac{\Psi[t + h] - \Psi[t]}{h} \in \Gamma({\textstyle \bigotimes^l TM \otimes \bigotimes^k T^*M}).$$ (With respect to any local coordinates, $\Psi[t]$ has components $\hat{\Psi}[t]^{b_1 \cdots b_l}{}_{a_1 \cdots a_k}$, and the components of $\partial_t \Psi[t]$ are just the usual single-variable derivatives of these with respect to $t$, i.e., $$\widehat{\partial_t \Psi[t]}^{b_1 \cdots b_l}{}_{a_1 \cdots a_k} = \partial_t \hat{\Psi}[t]^{b_1 \cdots b_l}{}_{a_1 \cdots a_k}.$$)
Now, we can just as easily compute higher derivatives $\partial_t^m \Psi[t]$. Then, just as in the familiar case of Taylor series of functions, we can evaluate all of these derivatives at $t = 0$ and assemble the result into a (formal) Taylor series $$\sum_{m = 0}^{\infty} \frac{t^m}{m!} \partial_t^m \Psi[t] \vert_{t = 0}.$$ The coefficient of the first-order term here is the tensor field $$\phantom{(\ast)} \qquad \partial_t \Psi[t]\vert_{t = 0} = \lim_{h \to 0} \frac{\Psi[h] - \Psi[0]}{h}. \qquad (\ast)$$
Now, given a tensor field $\Phi \in \Gamma(\bigotimes^l TM \otimes \bigotimes^k T^*M)$ and a vector field $X \in \Gamma(TM)$, the flow $\theta_t$ of $X$ gives us (modulo the usual issues concerning the existence of flows) a $1$-parameter family of tensor fields $$\Phi[t] := \theta_t^* \Phi$$ with $\Phi[0] = \Phi$. Then, substituting $\Phi[t]$ into the formula $(\ast)$ for the first-order coefficient of the series recovers the usual definition of the Lie derivative of a tensor field, $$\mathcal{L}_X \Phi = \lim_{h \to 0} \frac{\theta_h^* \Phi - \Phi}{h},$$ as claimed.
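To make $(\ast)$ concrete, here is a minimal numerical sketch on the plane with an example chosen for illustration (it is not from the text): the rotation field $X = -y\,\partial_x + x\,\partial_y$, whose flow $\theta_t$ is rotation by the angle $t$, and the $1$-form $\alpha = x\,dy$. Cartan's formula gives $\mathcal{L}_X\alpha = d(\iota_X\alpha) + \iota_X d\alpha = d(x^2) + \iota_X(dx \wedge dy) = x\,dx - y\,dy$, and the one-sided difference quotient from $(\ast)$ applied to $\theta_h^*\alpha$ should approach it. All function names are hypothetical.

```python
import math

def flow(t, p):
    """Flow of the rotation field X = -y d/dx + x d/dy: rotation by angle t."""
    c, s = math.cos(t), math.sin(t)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def alpha(q, w):
    """The 1-form alpha = x dy, evaluated at the point q on the vector w."""
    return q[0] * w[1]

def pullback_alpha(t, p, v):
    """(theta_t^* alpha)_p(v) = alpha_{theta_t(p)}(d theta_t (v)).

    The flow is linear, so its differential is the same rotation applied to v.
    """
    return alpha(flow(t, p), flow(t, v))

def lie_derivative_fd(p, v, h=1e-6):
    """Difference quotient (Phi[h] - Phi[0]) / h from (*), with Phi[t] = theta_t^* alpha."""
    return (pullback_alpha(h, p, v) - pullback_alpha(0.0, p, v)) / h

def lie_derivative_exact(p, v):
    """(L_X alpha)_p(v) for L_X alpha = x dx - y dy (from Cartan's formula)."""
    return p[0] * v[0] - p[1] * v[1]

p, v = (1.5, -0.4), (0.3, 2.0)  # arbitrary sample point and tangent vector
print(abs(lie_derivative_fd(p, v) - lie_derivative_exact(p, v)) < 1e-4)  # True
```

The one-sided quotient is used to mirror $(\ast)$ literally; its error is of order $h$, which is why the tolerance is loose.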