The two definitions are certainly not equivalent.
Suppose that $A$ is any infinite Dedekind-finite set. Let $S$ be the set of finite injective sequences from $A$, namely, all the injective functions from a natural number into $A$.
Claim. $S$ is Dedekind-finite.
Proof. Suppose not, then we have a countably infinite subset of sequences, $f_n$ for $n<\omega$, and without loss of generality $f_n\neq\varnothing$ for all $n<\omega$ as well, otherwise just omit that one empty function. Define the following function by recursion $F(0)=f_0(0)$; $F(n+1)=f_n(k)$ where $n$ is the least such that for some $k\in\operatorname{dom}(f_n)$, $F(i)\neq f_n(k)$ for all $i\leq n$. This is well-defined since the first $n$ functions enumerated only a finite union of finite sets, and there are only finitely many injections from a finite ordinal into a finite set, so there will be some $n$ and $k$.
In other words, go through the functions in order, and along their domains, in order, and when you find a new element you've not encountered before, add it to your list.
Easily, $F$ is injective and with domain $\omega$. So $A$ must be Dedekind-infinite, which is a contradiction. So, indeed, $S$ is Dedekind-finite. $\square$
Claim. There is a surjection from $S$ onto itself which is not a bijection.
Proof. Let $f\in S$, if $f=\varnothing$, then $f\mapsto f$; otherwise, the domain of $f$ is some $n+1$, simply define $f\mapsto f\restriction n$. In other words, remove the last point from each sequence, if such point exists.
Easily, this function is a surjection, and it is certainly not a bijection since any $1$-tuple is mapped to the empty function, as well as the empty function itself.
So, what we have shown is that if there is any kind of Dedekind-finite set, then there is one which is $*$-Dedekind-infinite.
To your second question, since you've proved the equivalences of all the others, let me use that.
Assuming (2), or equivalently (4), we can take either $B$ to be a proper subset of $A$ of the same cardinality, or $A=\cup\{x\}$ for some $x\notin A$. So we get the case that $B\subsetneq A$ or $A\subsetneq B$, respectively. In both cases, exactly one of $A\setminus B$ and $B\setminus A$ is empty.
So (1)–(4) all imply (5).
In the other direction, assume that there is no bijection between $A\setminus B$ and $B\setminus A$ and let $f\colon A\to B$ be a bijection.
Then we know that either there is some $a\in A\setminus B$ such that $f(a)\in A\cap B$ or there is some $b\in B\setminus A$ such that $f^{-1}(b)\in A$. Since otherwise, $f\restriction A\setminus B\to B\setminus A$ would have been a bijection.
Let's consider all the images and preimages of points in $A$ that are in $A$; and let's do the same for those in $B$ (by $f^{-1}$). So given a point $a\in A$, if $f(a)\in A$, or $f^{-1}(a)\in A$, we will chart that course. Once we've hit $B\setminus A$ or $A\setminus B$, we stop. Now let's consider our cases.
There is some $a\in A$ with an infinite orbit. In this case, we have (3) holds, so $A$ is Dedekind-infinite.
There are no infinite orbits. But if $a\in A\setminus B$ or $b\in B\setminus A$, then the orbits of $a$ and $b$ must not only be finite, but linear and satisfy these two properties:
If $f^n(a)$ is the last point in the orbit, then $f^{n+1}(a)\in B\setminus A$. Otherwise, we can continue and apply $f$ for longer.
If $a\notin A\setminus B$, then it is the only point in that orbit satisfying that property. Because $f\colon A\to B$, it is impossible for a point in $A\setminus B$ to be in the image of $f$.
The two properties have versions for $b\in B\setminus A$, of course. However, what it means is that we can identify for each $a\in A\setminus B$, some $b\in B\setminus A$ which is uniquely determined by some finitely many applications of $f$. In other words, $f$ determines an injection from $A\setminus B$ into $B\setminus A$, and likewise $f^{-1}$ determines an injection in the other direction. By Cantor–Bernstein, this means that $A\setminus B$ and $B\setminus A$ have the same cardinality.
Therefore, either $A$ is Dedekind-infinite, or $A\setminus B$ and $B\setminus A$ have the same cardinality. Since we assumed the latter failed, it must be that the former holds. $\square$
The second Fraenkel model (in the terminology of Jech's AOC monagraph, see section 4.4) is a permutation model that has these properties.
The model is constructed as follows: We start with countably-many atoms grouped like $A = \bigcup_n \{a_n,b_n\}$, consider the group of permutations $\pi$ such that for all $n,$ either $\pi a_n= a_n$ and $\pi b_n = b_n,$ or $\pi a_n= b_n$ and $\pi b_n = a_n$, and then take the permutation model $\mathcal V$ of this group with finite supports.
As usual for finite supports, $A$ is infinite and Dedekind-finite in $\mathcal V$: For any $f:\omega \to A$ with infinite range and any finite $E\subseteq A,$ there is an $n$ such that $a_n,b_n\notin E$ and either $a_n$ or $b_n$ in the range of $f.$ Then the $\pi$ that just swaps $a_n$ and $b_n$ fixes $E$ and has $\pi f \ne f,$ and thus $f\notin \mathcal V.$
But unlike the other two permutation models Jech studies in chapter $4,$ note that $A$ is not weakly Dedekind-finite: The surjection $A\to \omega$ given by $a_n\mapsto n$ and $b_n\mapsto n$ is clearly symmetric with support $\emptyset$.
To show that every infinite set is weakly Dedekind-infinite in $\mathcal V,$ we can use the fact that $\mathcal V$ satisfies the axiom of multiple choice (i.e. for any collection $X$ of nonempty sets, there is a function $f$ on $X$ such that for all $x\in X,$ $f(x)$ is a finite subset of $x$), which is proven as part of Jech's theorem 9.2. The idea of the proof is that for any nonempty $X\in \mathcal V$ and $x\in X,$ the set $G(X,x) = \{\pi x: \pi X = X\}$ is finite, since $\pi x$ only determined by what $\pi$ does on the support of $x$, and is clearly unchanged by any $\pi$ with $\pi X= X.$ By choosing a canonical $X$ and $x$ in each orbit of the full group and defining $F(\rho X) = \rho G(X,x)$, we can see $F$ is a symmetric multiple choice function on $\mathcal V.$
So with that result, it just remains to show multiple choice implies every infinite set is weakly Dedekind-infinite. Let $X$ be infinite and for $n\in \omega,$ let $X_n = \{Y\subseteq X: |Y|=n\}.$ Let $Y_n$ be a multiple choice sequence from $X_n$ and let $Y'_n = \cup Y_n,$ so each $Y_n'$ is a finite subset of $X$ with size $\ge n.$ Then by finding the next $Y_n$ with new elements (which always will exist since the size keeps growing) and taking the new elements, we can get a sequence $Z_n$ of disjoint nonempty subsets of $X.$ Then for $x\in X,$ define $g(x)$ be the unique $n$ such that $x\in Z_n$ if $x\in \bigcup_n Z_n,$ and otherwise let $g(x) =0$, and $g$ is a surjection $X\to \omega.$
It should be noted that this only establishes independence of the two propositions over ZFA, and the argument seems unpromising to transfer to ZF since multiple choice is equivalent to choice over ZF. (Also, according to Howard and Rubin, which may be outdated, it's unknown if every infinite set is weakly Dedekind infinite in the second Cohen model.) However, as noted in the comments, the basic Cohen model witnesses independence over ZF, and exercise 5.20 in Jech sketches the argument.
Best Answer
Assuming the axiom of choice, yes, quite easily. If $f\colon A\to A$ is a surjection, there is some $g\colon A\to A$ which is an injection and $f(g(a))=a$. Simply prove that since $f$ is not injective, $g$ cannot be surjective.
Not assuming choice, however, this is not necessarily true. It is consistent that there is a Dedekind-finite set $A$ and $f\colon A\to A$ which is surjective and not injective. For example, if $A$ is an infinite Dedekind-finite set, the set $S(A)$ of all finite injective sequences from $A$ is also Dedekind-finite and the function $f\colon S(A)\to S(A)$ given by erasing the last coordinate of a sequence is surjective but far from injective.