It seems to me that people usually include the $\mathbb{R}$-linearity as part of the definition (correct me if I'm wrong); i.e. define $\nabla$ on the simple tensors by the product rule above, and then extend it linearly over $\mathbb{R}$ to the whole space $C^{\infty}(E\otimes F)$.
Actually, due to the fact that $C^{\infty}(E\otimes F)\simeq C^{\infty}(E)\otimes_{C^{\infty}(M)}C^{\infty}(F)$ which OP has mentioned in the comment above, it follows that additivity (i.e.
\begin{align}
\nabla_X(s\otimes t+s'\otimes t')=\nabla_X(s\otimes t)+\nabla_X(s'\otimes t')
\end{align}
which is weaker than $\mathbb{R}$-linearity) is sufficient.
With this in mind, here is an alternative proof, which is essentially a direct computation, though perhaps it may look less elegant.
Let $A\in C^{\infty}(E\otimes F)$. Such $A$ can be expressed as a finite sum of simple tensors, though not uniquely in general. So we want to show that if it can be written in the following two ways
\begin{align}
A=\sum_is_i\otimes t_i=\sum_j\tilde{s}_j\otimes\tilde{t}_j
\end{align}
(the number of summands in these two expressions may be different in general), then
\begin{align}
\nabla_X\left(\sum_is_i\otimes t_i\right)
=\nabla_X\left(\sum_j\tilde{s}_j\otimes\tilde{t}_j\right) & & (*)
\end{align}
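To see concretely why there is something to check, here is a minimal example of my own (not taken from the question): for $f\in C^{\infty}(M)$, the section $(fs)\otimes t$ equals $s\otimes(ft)$, and the product rule gives the same result on both representatives:
\begin{align}
\nabla_X\big((fs)\otimes t\big)
&=\nabla_X(fs)\otimes t+fs\otimes\nabla_Xt
=(Xf)\,s\otimes t+f\,\nabla_Xs\otimes t+f\,s\otimes\nabla_Xt, \\
\nabla_X\big(s\otimes(ft)\big)
&=\nabla_Xs\otimes ft+s\otimes\nabla_X(ft)
=f\,\nabla_Xs\otimes t+(Xf)\,s\otimes t+f\,s\otimes\nabla_Xt.
\end{align}
The computation that follows does the same thing for arbitrary finite sums.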
Let $(e_{\alpha})$ and $(\epsilon_{\beta})$ be smooth local frames for $E$ and $F$ respectively. Then one can write
\begin{align}
s_i=s_i^{\alpha}e_{\alpha},\qquad
t_i=t_i^{\beta}\epsilon_{\beta},\qquad
\tilde{s}_j=\tilde{s}_j^{\alpha}e_{\alpha},\qquad
\tilde{t}_j=\tilde{t}_j^{\beta}\epsilon_{\beta}
\end{align}
(Einstein summation convention is assumed.) Then we will have
\begin{align}
\sum_is_i\otimes t_i
=\left(\sum_is_i^{\alpha}t_i^{\beta}\right)e_{\alpha}\otimes\epsilon_{\beta},
\qquad
\sum_j\tilde{s}_j\otimes\tilde{t}_j
=\left(\sum_j\tilde{s}_j^{\alpha}\tilde{t}_j^{\beta}\right)e_{\alpha}\otimes\epsilon_{\beta}
\end{align}
Since $\{e_{\alpha}\otimes\epsilon_{\beta}\}$ is a smooth local frame for $E\otimes F$, by uniqueness of local components we must have
\begin{align}
\sum_is_i^{\alpha}t_i^{\beta}=\sum_j\tilde{s}_j^{\alpha}\tilde{t}_j^{\beta} & & (1)
\end{align}
Taking the exterior derivative of both sides, we then have
\begin{align}
\sum_i\left(t_i^{\beta}ds_i^{\alpha}+s_i^{\alpha}dt_i^{\beta}\right)
=\sum_j\left(\tilde{t}_j^{\beta}d\tilde{s}_j^{\alpha}
+\tilde{s}_j^{\alpha}d\tilde{t}_j^{\beta}\right) & & (2)
\end{align}
Now let $\omega_{\alpha}^{\beta}$ and $\theta_{\alpha}^{\beta}$ be the connection 1-forms of $\nabla^E$ and $\nabla^F$ respectively, so that e.g.
\begin{align}
\nabla_X s_i=\left[Xs_i^{\alpha}+s_i^{\beta}\omega_{\beta}^{\alpha} (X)\right]e_{\alpha} & & (3)
\end{align}
and similar identities hold for $\nabla_Xt_i$, $\nabla_X\tilde{s}_j$ and $\nabla_X\tilde{t}_j$. Then we can compute
\begin{align}
&\nabla_X\left(\sum_is_i\otimes t_i\right) \\
%%%
&=\sum_i\left(\nabla_Xs_i\otimes t_i+s_i\otimes\nabla_Xt_i\right) \\
%%%
&=\underbrace{\left(\sum_i\left((Xs_i^{\alpha})t_i^{\beta}
+s_i^{\alpha}(Xt_i^{\beta})\right)\right)}_{=:I}e_{\alpha}\otimes\epsilon_{\beta}
+\underbrace{\left(\sum_is_i^{\gamma}t_i^{\beta}\right)}_{=:II}
\omega_{\gamma}^{\alpha}(X)e_{\alpha}\otimes\epsilon_{\beta}
+\underbrace{\left(\sum_is_i^{\alpha}t_i^{\gamma}\right)}_{=:III}
\theta_{\gamma}^{\beta}(X)e_{\alpha}\otimes\epsilon_{\beta}
\end{align}
where the first step follows by additivity and the product rule, while the second step is obtained by substituting (3) and rearranging the terms.
Now $\nabla_X\left(\sum_j\tilde{s}_j\otimes\tilde{t}_j\right)$ will have the same expression, except that all of the $s_i,t_i$ are replaced by $\tilde{s}_j,\tilde{t}_j$; say
\begin{align}
\nabla_X\left(\sum_j\tilde{s}_j\otimes\tilde{t}_j\right)
=\tilde{I}\cdot e_{\alpha}\otimes\epsilon_{\beta}
+\tilde{II}\cdot \omega_{\gamma}^{\alpha}(X)e_{\alpha}\otimes\epsilon_{\beta}
+\tilde{III}\cdot \theta_{\gamma}^{\beta}(X)e_{\alpha}\otimes\epsilon_{\beta}
\end{align}
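Then:
- By (2) evaluated on $X$ (recall that $df(X)=Xf$ for a smooth function $f$), we have $I=\tilde{I}$.
- By (1), with the free indices relabeled, we have $II=\tilde{II}$ and $III=\tilde{III}$.
Hence, (*) holds as desired.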
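As a sanity check, the computation can be packaged into a single local formula. Writing $A^{\alpha\beta}:=\sum_is_i^{\alpha}t_i^{\beta}$ for the local components of $A$ (this notation is mine, introduced only for this remark), we have $I=XA^{\alpha\beta}$, $II=A^{\gamma\beta}$ and $III=A^{\alpha\gamma}$, so that
\begin{align}
\nabla_XA=\left[XA^{\alpha\beta}
+A^{\gamma\beta}\omega_{\gamma}^{\alpha}(X)
+A^{\alpha\gamma}\theta_{\gamma}^{\beta}(X)\right]e_{\alpha}\otimes\epsilon_{\beta}
\end{align}
The right-hand side depends only on the components of $A$ and not on the chosen decomposition into simple tensors, which is another way to read (*).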
$\Gamma(E \otimes T^*M)$ is the set of sections of the vector bundle $E \otimes T^*M = \bigcup_{p \in M} E_p \otimes T_p^*M$. The usual second definition (that I will be using) is
Definition: A connection on a vector bundle $\pi:E \to M$ is a map
$$D: \Gamma(E) \to \Gamma(T^*M \otimes E)$$ such that for any $s_1, s_2 \in \Gamma(E)$, $D(s_1 + s_2) = Ds_1 + Ds_2$ and for any section $s \in \Gamma(E)$ and $\alpha \in C^\infty(M)$, $D(\alpha s) = d\alpha \otimes s + \alpha Ds$.
Now I will sketch one way to get between the two definitions.
($D \to \nabla$): For fixed $X \in \Gamma(TM)$, define $\nabla_X(s) := i_X(Ds)$ where, for $\omega \otimes s \in \Gamma(T^*M \otimes E)$, $i_X(\omega \otimes s) = \omega(X) s \in \Gamma(E)$. Then it's a straightforward exercise to check that this has the desired properties.
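For instance, the Leibniz rule can be checked as follows (a sketch; I am assuming that $i_X$ is extended $C^\infty(M)$-linearly from simple tensors to all of $\Gamma(T^*M \otimes E)$, which is not spelled out above). For $\alpha \in C^\infty(M)$,
$$\nabla_X(\alpha s) = i_X\big(D(\alpha s)\big) = i_X(d\alpha \otimes s + \alpha\, Ds) = d\alpha(X)\, s + \alpha\, i_X(Ds) = X(\alpha)\, s + \alpha \nabla_X s,$$
and $C^\infty(M)$-linearity in $X$ follows from $i_{gX}(\omega \otimes s) = \omega(gX)\, s = g\,\omega(X)\, s = g\, i_X(\omega \otimes s)$ for $g \in C^\infty(M)$.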
($\nabla \to D$): Let $x^1, \dots, x^n$ be local coordinates for some chart $U \subseteq M$. Define in local coordinates
$$D(s) = \sum_{i=1}^n dx^i \otimes \nabla_{\frac{\partial}{\partial x^i}} s.$$
Then you can check that this definition behaves correctly under a change of coordinates, so that it defines a global object. It is then an easy coordinate-based calculation to check the desired properties.
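To illustrate the change-of-coordinates check, here is a sketch assuming $y^1, \dots, y^n$ are coordinates on a second, overlapping chart (the $y^j$ are my notation) and using that $\nabla_X s$ is $C^\infty(M)$-linear in $X$. On the overlap, $\frac{\partial}{\partial x^i} = \sum_j \frac{\partial y^j}{\partial x^i}\frac{\partial}{\partial y^j}$ and $dx^i = \sum_k \frac{\partial x^i}{\partial y^k}\, dy^k$, so
$$\sum_i dx^i \otimes \nabla_{\frac{\partial}{\partial x^i}} s = \sum_{i,j,k} \frac{\partial x^i}{\partial y^k}\frac{\partial y^j}{\partial x^i}\, dy^k \otimes \nabla_{\frac{\partial}{\partial y^j}} s = \sum_{j,k} \delta^j_k\, dy^k \otimes \nabla_{\frac{\partial}{\partial y^j}} s = \sum_j dy^j \otimes \nabla_{\frac{\partial}{\partial y^j}} s.$$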
Finally, by a calculation in local coordinates, you can check that these constructions are inverse to each other so that the definitions are equivalent.
The second definition views a connection as a map $\nabla : \Gamma(E) \to \Gamma(T^*M\otimes E)$, so given $s \in \Gamma(E)$, we have $\nabla s \in \Gamma(T^*M\otimes E)$. Using the isomorphism $T^*M\otimes E \cong \operatorname{Hom}(TM, E)$, the section $\nabla s \in \Gamma(T^*M\otimes E)$ corresponds to a section $\overline{\nabla s} \in \Gamma(\operatorname{Hom}(TM, E))$. If $X \in \mathfrak{X}(M) = \Gamma(TM)$, then $(\overline{\nabla s})(X) \in \Gamma(E)$; this is precisely $\nabla_Xs$ according to the first definition, i.e. $(\overline{\nabla s})(X) = \nabla_Xs$.
With this in mind, the Leibniz rules for the two definitions correspond in the same way. Note that
$$(\overline{\nabla(fs)})(X) = \nabla_X(fs) = X(f)s + f\nabla_Xs= df(X)s + f(\overline{\nabla s})(X).$$
Recall that for finite-dimensional vector spaces $V$ and $W$, the isomorphism $V^*\otimes W \to \operatorname{Hom}(V, W)$ is generated by $\alpha\otimes w \mapsto L$ where $L(v) = \alpha(v)w$. So under the isomorphism $\operatorname{Hom}(TM, E) \cong T^*M\otimes E$, the map $X \mapsto df(X)s$ corresponds to $df\otimes s$, i.e. $(\overline{df\otimes s})(X) = df(X)s$. Therefore
$$(\overline{\nabla(fs)})(X) = df(X) s + f(\overline{\nabla s})(X) = (\overline{df\otimes s})(X) + f(\overline{\nabla s})(X) = (\overline{df\otimes s + f\nabla s})(X),$$
so $\overline{\nabla(fs)} = \overline{df\otimes s + f\nabla s}$, and hence $\nabla(fs) = df\otimes s + f\nabla s$.
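As a quick illustration in a local frame (a sketch; the frame $(e_\alpha)$ for $E$ and the 1-forms $\omega_\beta^\alpha$ defined by $\nabla e_\beta = \omega_\beta^\alpha\otimes e_\alpha$ are notation assumed here, with summation over repeated indices), write $s = s^\alpha e_\alpha$. Then the Leibniz rule just derived gives
$$\nabla s = ds^\alpha\otimes e_\alpha + s^\beta\,\omega_\beta^\alpha\otimes e_\alpha, \qquad\text{so}\qquad \nabla_Xs = (\overline{\nabla s})(X) = \left[X(s^\alpha) + s^\beta\,\omega_\beta^\alpha(X)\right]e_\alpha.$$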