This is a bit too long for a comment, so let me answer your first question.
Let $(\cdot,\cdot)$ denote the pairing of covector fields with vector fields, e.g., $(\lambda,Y) := \lambda(Y)$ for $\lambda$ a covector field and $Y$ a vector field. Think of the scalar field $(\lambda,Y)$ as the product of the covector field $\lambda$ with the vector field $Y$, and think of covariant differentiation as a consistent way to define the directional differentiation of objects more complicated than scalar fields, particularly on curved spaces. For all these notions of directional differentiation to be truly consistent, they should satisfy the obvious Leibniz rule with respect to taking the product of a covector field and a vector field, i.e., for any given covector field $\lambda$ and vector field $Y$,
$$
X(\lambda,Y) = (\nabla_X \lambda,Y) + (\lambda,\nabla_X Y)
$$
for any vector field $X$. What's nice is that this requirement is actually enough to define the covariant derivative $\nabla_X \lambda$ of a covector field $\lambda$ along the vector field $X$: simply declare $\nabla_X \lambda$ to be the covector field such that
$$
X(\lambda,Y) = (\nabla_X \lambda,Y) + (\lambda,\nabla_X Y),
$$
or equivalently,
$$
\nabla_X \lambda (Y) = X(\lambda(Y)) - \lambda(\nabla_X Y),
$$
for all vector fields $Y$.
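As a sanity check (not needed for the construction, but worth doing once), one can verify that this formula really defines a covector field, i.e., that the right-hand side is $C^\infty$-linear in $Y$. For any scalar field $f$,
$$
\nabla_X \lambda(fY) = X(f\,\lambda(Y)) - \lambda(\nabla_X (fY)) = X(f)\lambda(Y) + fX(\lambda(Y)) - X(f)\lambda(Y) - f\lambda(\nabla_X Y) = f\,\nabla_X \lambda(Y),
$$
where the middle equality uses the Leibniz rule for $X$ acting on a product of scalar fields together with the Leibniz rule $\nabla_X(fY) = X(f)Y + f\nabla_X Y$.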
Indeed, let $\Sigma \subset \mathbb{R}^3$ denote the image of $f$, and let $\{e_1,e_2\}$ be your favourite frame for $T\Sigma$, i.e., a pair of vector fields such that for each $p \in U$, $\{e_1(p),e_2(p)\}$ is a basis of the tangent space $T_p \Sigma \subset \mathbb{R}^3$ to $\Sigma$ at $p$. Let $\{e^1,e^2\}$ be the corresponding coframe for $T^\ast \Sigma$, i.e., the pair of covector fields such that for each $p \in U$, $\{e^1(p),e^2(p)\}$ is the dual basis of the cotangent space $T^\ast_p \Sigma = (T_p \Sigma)^\ast$ corresponding to the basis $\{e_1(p),e_2(p)\}$ of $T_p \Sigma$. Recall that any covector field $\omega$ can be written uniquely in terms of $\{e^1,e^2\}$ as
$$
\omega = (\omega,e_1)e^1 + (\omega,e_2)e^2 = \omega(e_1)e^1 + \omega(e_2)e^2.
$$
Then, we can solve the Leibniz rule
$$
X(\lambda,e_j) = (\nabla_{X}\lambda,e_j) + (\lambda,\nabla_{X}e_j),
$$
i.e.,
$$
X(\lambda(e_j)) = \nabla_{X}\lambda(e_j) + \lambda(\nabla_{X} e_j),
$$
for $\nabla_{X}\lambda(e_j)$ to get
$$
\nabla_{X} \lambda(e_j) = X(\lambda(e_j)) - \lambda(\nabla_{X} e_j),
$$
which therefore forces
$$
\nabla_{X} \lambda = (X(\lambda(e_1)) - \lambda(\nabla_{X} e_1))e^1 + (X(\lambda(e_2)) - \lambda(\nabla_{X} e_2))e^2.
$$
In particular, we find that
$$
\nabla_{e_i} \lambda = (e_i(\lambda(e_1)) - \lambda(\nabla_{e_i} e_1))e^1 + (e_i(\lambda(e_2)) - \lambda(\nabla_{e_i} e_2))e^2,
$$
which is exactly what you learnt in class.
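If you prefer index notation, write $\lambda_j := \lambda(e_j)$ and introduce the connection coefficients $\Gamma^k_{ij}$ of your connection in this frame, defined (up to ordering conventions, which vary between textbooks) by $\nabla_{e_i} e_j = \Gamma^k_{ij} e_k$. Then the formula above reads
$$
(\nabla_{e_i} \lambda)(e_j) = e_i(\lambda_j) - \Gamma^k_{ij}\lambda_k,
$$
the familiar "partial derivative minus Christoffel term" rule for differentiating a lower index.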
I think it's much more common to let the last index position be the one introduced by differentiation, not the first. But regardless of which convention you choose, you're never going to have a Leibniz rule for total covariant derivatives of the form $\nabla (T\otimes S) = \nabla T \otimes S + T\otimes \nabla S$. (Notice, for example, that if both $S$ and $T$ have positive rank, then the differentiation index in $\nabla T \otimes S $ is never the last one, and in $T\otimes \nabla S$ it's never the first one.)
The Leibniz rule for covariant derivatives of tensor fields applies to the covariant derivative in the direction of a vector field (or vector):
$$
\nabla_V(S\otimes T) = \nabla_V S \otimes T + S \otimes \nabla_V T.
$$
This is true whether you put the differentiated index last or first (or somewhere else).
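For instance, taking $S = \lambda$ a covector field and $T = Y$ a vector field, this directional Leibniz rule gives
$$
\nabla_V(\lambda \otimes Y) = \nabla_V \lambda \otimes Y + \lambda \otimes \nabla_V Y,
$$
and applying the contraction of the covector slot with the vector slot (which commutes with $\nabla_V$ under the standard extension of a connection to tensor fields) recovers the scalar Leibniz rule
$$
V(\lambda(Y)) = \nabla_V \lambda(Y) + \lambda(\nabla_V Y).
$$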
Best Answer
Without a metric
you are immediately able to take the covariant derivative $\nabla f$ of a scalar field $f$, which coincides with its exterior derivative $\mathrm{d}f$: $$\nabla f = \mathrm{d}f = \sum_i \partial_i f \,\omega^i,$$ where the $\omega^i$ are the basis covector fields. Obviously this is a covector field.
Then the derivative of $f$ in the direction of a vector $v$ admits the following notations:
$$vf = \nabla_{v}f = (\nabla f)(v) = (\mathrm{d}f)(v) \tag{1}$$
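Concretely, in local coordinates with $v = \sum_i v^i \partial_i$, all of these notations compute the same scalar:
$$
vf = (\mathrm{d}f)(v) = \sum_i v^i \,\partial_i f.
$$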
If you have a metric
say $g$, then it induces the so-called musical isomorphisms $\sharp$ (which maps covector fields to vector fields) and $\flat$ (which maps in the other direction). You can then define the gradient vector field of a scalar field $f$ as $$\vec{\nabla}f := (\nabla f)^\sharp.$$ In this case, the directional derivative $vf$ can be expressed (in addition to the notations in $(1)$) as
$$g(\vec{\nabla}f,v).$$
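In components (writing $g^{ij}$ for the inverse metric in a coordinate frame), the isomorphism $\sharp$ raises the index of $\mathrm{d}f$, so
$$
\vec{\nabla} f = \sum_{i,j} g^{ij}\,\partial_j f\,\partial_i,
$$
and in $\mathbb{R}^n$ with the Euclidean metric ($g^{ij} = \delta^{ij}$) this reduces to the usual gradient $(\partial_1 f, \dots, \partial_n f)$.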
In short
"Del operator" may be a bit ambiguous. When applied to functions, in my experience people use it to refer to the gradient vector field, but judging by what you wrote, you have found a place where "del operator" is used to talk about the covariant derivative.