Yes, you're right: Fréchet-Urysohn spaces are precisely the spaces in which the sequential closure is the same as the ordinary closure. For a specific example you can use the Arens space, which is discussed in Dan Ma’s Topology Blog: the sequential closure of the set of isolated points of the Arens space is not closed.
It's true that the sequential closure operator is not a true closure operator in the usual sense of the term; this is a minor nuisance, and one might wish for a better term, but I hardly think that it qualifies as exceptionally inconvenient.
Added: To turn the sequential closure operator into a true closure operator, you have to iterate it, possibly transfinitely. That is, if $\operatorname{scl}$ denotes the sequential closure, define
$$\operatorname{scl}^\eta A=\begin{cases}
A,&\text{if }\eta=0\\\\
\operatorname{scl}\bigcup_{\xi<\eta}\operatorname{scl}^\xi A,&\text{if }\eta>0\;.
\end{cases}$$
There is always an ordinal $\eta$ such that $\operatorname{scl}^\xi A=\operatorname{scl}^\eta A$ for all $\xi\ge\eta$; if we define $\operatorname{scl}^*A=\operatorname{scl}^\eta A,$ then $\operatorname{scl}^*$ is a true closure operator, and a space $X$ is sequential iff $\operatorname{scl}_X^*$ is identical to the ordinary closure.
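For instance (if I'm reading the Arens space example from Dan Ma's blog correctly), taking $A$ to be the set of isolated points of the Arens space gives a case in which the iteration takes exactly two steps:
$$A\subsetneq\operatorname{scl}A\subsetneq\operatorname{scl}^2A=\operatorname{cl}A\;:$$
one application of $\operatorname{scl}$ adds the limit of each column of isolated points, and a second application adds the one remaining point, which is a limit of the sequence of column limits but not of any sequence in $A$.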
As you've undoubtedly noticed, you can't just argue as in the case of finite products, thinning out the sequence again and again to get convergence in more and more components. After any finite number of steps, you still have an infinite subsequence of your original sequence, but if you do infinitely many steps, then every term of your original sequence might eventually get removed. Then, instead of having a subsequence at the end of the process, you've got nothing.
The idea of the diagonal argument is to modify the process slightly so that your sequence doesn't entirely disappear. Very roughly, you restrict your thinning-out operations to ensure that an infinite subsequence remains at the end of the process. Here are the details:
Start with your original sequence, and, before doing any thinning, promise yourself that you will never delete the first of its terms; call that term $a_1$. Now thin out the sequence so that the first components converge, but, in accordance with your promise, keep $a_1$ in your new, thinned-out sequence. This does not harm the first-component-convergence. Keeping $a_1$ means that the sequence of first-components has one unavoidable term at the beginning, namely the first component of $a_1$, but one term at the beginning doesn't affect convergence.
So now you have your first thinned-out sequence, starting with $a_1$, and having its first-components converging. Now make a second promise, namely that the second term of this thinned-out sequence, which I'll call $a_2$, will never be deleted. Then thin out the sequence again, just as in your finite-product proof, to make the sequence of second-components converge, but, while thinning it out, keep your two promises. That is, $a_1$ and $a_2$ are in this second thinned-out sequence. Again, you can do this because two terms at the beginning have no effect on convergence.
Continue in this way, alternating promises with thinnings. After $n$ steps, you have a subsequence of your original sequence with two crucial properties. (1) Its first, second, $\dots$, $n$-th components are convergent sequences, and (2) its first, second, $\dots$, $n$-th terms, which I'm calling $a_1,a_2,\dots,a_n$, will be the same in all future thinned-out sequences.
Now look at the infinite sequence $a_1,a_2,\dots$ consisting of the subjects of all your promises. For each $n$, its $n$-th components converge, because you have a subsequence of what you had after $n$ thinnings, and you ensured convergence of the $n$-th components at that stage.
This means that $a_1,a_2,\dots$ converges in the product topology. Since it's clearly a subsequence of the sequence you began with, the proof is complete.
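The promise-and-thin bookkeeping above can be sketched in code. Here is a minimal Python model (the names `thin` and `diagonal` are mine, and the thinning rule is a toy stand-in for whatever your finite-product proof uses): points of the product are natural numbers viewed as $0$/$1$-sequences via their binary digits, and the $m$-th thinning keeps exactly the terms whose $m$-th digit is $0$, which makes the $m$-th components of the survivors converge. Making a promise amounts to consuming the first term of the current sequence, so later thinnings act only on the tail and the promised term can never be deleted.

```python
def thin(indices, coord):
    # Toy "thinning": keep only the indices whose coord-th binary digit
    # is 0, so the coord-th components of the surviving terms are
    # constantly 0 (in particular, convergent).  Infinitely many
    # indices survive each such thinning.
    return (k for k in indices if (k >> coord) & 1 == 0)

def diagonal(indices, n):
    # Alternate promises with thinnings: promise never to delete the
    # first term of the current sequence, then thin the remaining tail
    # so that the m-th components converge.
    promised = []
    seq = iter(indices)
    for m in range(n):
        promised.append(next(seq))  # promise: this term stays forever
        seq = thin(seq, m)          # m-th thinning, applied to the tail
    return promised

kept = diagonal(range(10**6), 8)
```

Running it with eight coordinates yields the promised terms $0,2,4,8,16,32,64,128$; from the $(m+2)$-nd promised term on, every $m$-th binary digit is $0$, so each coordinate sequence is eventually constant, hence convergent, just as in the argument above.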
There may be contexts where the first definition is appropriate, but it does seem somewhat pathological in that a sequentially compact subspace may not be relatively sequentially compact. In this respect it differs from the second definition.
For example, take the Tychonoff plank $X = ([0, \omega_1] \times [0, \omega]) \setminus \{(\omega_1, \omega)\}$ and the subspace $A = [0, \omega_1) \times [0, \omega]$. Then $A$ is sequentially compact: the first coordinates of any sequence in $A$ form a countable set of ordinals below $\omega_1$, so they are bounded by some $\alpha < \omega_1$, and the sequence is confined to the subspace $[0, \alpha] \times [0, \omega]$, which is compact and first countable. On the other hand $\overline{A} = X$, which is not sequentially compact, since $\{ (\omega_1, n) \}_{n=0}^\infty$ has no cluster point.
To decide which is the most useful definition would probably involve looking at a large number of applications, which I don't have available. I could even imagine some use for a definition $1\frac12$: a subspace $A$ of a topological space $X$ is relatively sequentially compact if there is a sequentially compact $B \subset X$ such that $A \subset B$. This is strictly weaker than definition 1 and stronger than definition 2, although I don't know if it is strictly stronger than definition 2.