Everything works as it would in a pure functional programming language. Yes, it is lexical scoping.
$(\lambda x.(x\lambda x.x))a = a\lambda x.x$.
Basically, $\lambda x.x = \lambda y.y$: the name of a bound variable does not matter. You are defining operations as you would expect in a programming language, so you do have to be careful about cases like $\lambda x.(x\lambda x.x)$, where the inner $\lambda x$ shadows the outer one.
So $\lambda a . (\lambda x.(ax))$ is an operation which, when applied to $b$, returns $\lambda x.(bx)$.
The worst case is something like this:
$$(\lambda a.\lambda x.(ax))x$$
A naive approach would yield $\lambda x.(xx)$, in which the free $x$ of the argument has been captured by the inner binder. The rigorous definition to deal with this is a bit nightmarish. The Wikipedia page for $\lambda$-calculus has this fairly opaque language:
The freshness condition (requiring that y is not in the free variables of r) is crucial in order to ensure that substitution does not change the meaning of functions. ...
In general, failure to meet the freshness condition can be remedied by alpha-renaming with a suitable fresh variable.
Basically, we have to be careful when applying a function whose body contains $\lambda x.E$ to an expression with a free $x$ variable in it.
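As a rough illustration of what the freshness condition amounts to in practice, here is a minimal Python sketch of capture-avoiding substitution. The term representation (`Var`, `Lam`, `App`) and the helper names (`free_vars`, `fresh`, `subst`, `beta`, `show`) are my own choices for this example, not a standard library, and the renaming scheme (appending a number to the variable name) is just one possible convention.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical term representation, chosen for this sketch only.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def free_vars(t):
    """Set of names occurring free in t."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fn) | free_vars(t.arg)

def fresh(base, avoid):
    """First name of the form base1, base2, ... that is not in avoid."""
    for i in count(1):
        candidate = f"{base}{i}"
        if candidate not in avoid:
            return candidate

def subst(t, name, value):
    """Return t[name := value], renaming binders where capture would occur."""
    if isinstance(t, Var):
        return value if t.name == name else t
    if isinstance(t, App):
        return App(subst(t.fn, name, value), subst(t.arg, name, value))
    # t is a Lam from here on
    if t.param == name:
        return t  # name is shadowed by this binder (lexical scoping)
    if t.param in free_vars(value):
        # Freshness condition fails: alpha-rename the binder first.
        new = fresh(t.param, free_vars(value) | free_vars(t.body) | {name})
        body = subst(t.body, t.param, Var(new))
        return Lam(new, subst(body, name, value))
    return Lam(t.param, subst(t.body, name, value))

def beta(redex):
    """Contract a top-level redex (a Lam applied to an argument)."""
    assert isinstance(redex, App) and isinstance(redex.fn, Lam)
    return subst(redex.fn.body, redex.fn.param, redex.arg)

def show(t):
    """Fully parenthesised rendering of a term."""
    if isinstance(t, Var):
        return t.name
    if isinstance(t, Lam):
        return f"(λ{t.param}.{show(t.body)})"
    return f"({show(t.fn)} {show(t.arg)})"

# (λa.λx.(a x)) x : the argument's free x must not be captured
capture_case = App(Lam("a", Lam("x", App(Var("a"), Var("x")))), Var("x"))
# (λx.(x (λx.x))) a : the inner λx shadows the outer one
shadow_case = App(Lam("x", App(Var("x"), Lam("x", Var("x")))), Var("a"))

print(show(beta(capture_case)))  # (λx1.(x x1))
print(show(beta(shadow_case)))   # (a (λx.x))
```

In the first case the binder is renamed to `x1` before substituting, so the result is $\lambda x_1.(x\,x_1)$ rather than the naive $\lambda x.(xx)$. In the second case nothing is substituted under the inner $\lambda x$, which gives $a\lambda x.x$.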
Your reduction sequences according to the normal order and the unspecified order are correct, but your second reduction sequence does not follow the applicative order. In order to understand why, some definitions are required.
A redex is a term of the form $(\lambda x.M)N$. An outermost redex is a redex that is not contained inside another one. An innermost redex is one that has no redexes inside it.
In your attempt to follow the applicative order (i.e. reducing the leftmost innermost redex), your first reduction step is correct. Now, consider $(\lambda x.xx)((\lambda a.a)(\lambda b.b))$: its leftmost innermost redex is $\color{red}{(\lambda a.a)}\color{blue}{(\lambda b.b)}$, and not $\color{red}{(\lambda x.xx)}\color{blue}{((\lambda a.a)(\lambda b.b))}$ because the latter contains a redex $\color{blue}{(\lambda a.a)(\lambda b.b)}$.
Therefore, the correct reduction sequence following the applicative order is:
\begin{align}
&(\lambda x.x(\color{red}{(\lambda y.y)}\color{blue}{x}))((\lambda a.a)(\lambda b.b)) \\
\equiv_{\beta} \ & (\lambda x.xx)(\color{red}{(\lambda a.a)}\color{blue}{(\lambda b.b)}) \\
\equiv_\beta \ & \color{red}{(\lambda x. xx)}\color{blue}{(\lambda b.b)} \\
\equiv_\beta \ & \color{red}{(\lambda b. b)}\color{blue}{(\lambda b.b)} \\
\equiv_\beta \ & \lambda b.b
\end{align}
To see that the normal order can reach a normal form where the applicative order diverges, consider the term $M = (\lambda y. x)(\delta\delta)$ where $\delta = \lambda z.zz$. Indeed, the leftmost outermost redex in $M$ is the whole term, which reduces to $x$; while the leftmost innermost redex in $M$ is $\delta\delta$, which reduces to itself, so the applicative order never terminates on $M$.
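The two strategies can be compared mechanically. The following is a small Python sketch, not a reference implementation: terms are encoded as tuples, and the function names (`step_normal`, `step_applicative`, `run`) and the step limit of 50 are assumptions made for this example. The step limit is only there so that the diverging case stops.

```python
from itertools import count

# Terms as tuples: ("var", name) | ("lam", param, body) | ("app", fn, arg)
def var(n): return ("var", n)
def lam(p, b): return ("lam", p, b)
def app(f, a): return ("app", f, a)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, name, value):
    """Capture-avoiding substitution t[name := value]."""
    if t[0] == "var":
        return value if t[1] == name else t
    if t[0] == "app":
        return app(subst(t[1], name, value), subst(t[2], name, value))
    param, body = t[1], t[2]
    if param == name:
        return t  # shadowed
    if param in free_vars(value):
        avoid = free_vars(value) | free_vars(body) | {name}
        new = next(f"{param}{i}" for i in count(1) if f"{param}{i}" not in avoid)
        param, body = new, subst(body, param, var(new))
    return lam(param, subst(body, name, value))

def is_redex(t):
    return t[0] == "app" and t[1][0] == "lam"

def contract(t):
    return subst(t[1][2], t[1][1], t[2])

def step_normal(t):
    """Contract the leftmost outermost redex; None if t is in normal form."""
    if is_redex(t):
        return contract(t)
    if t[0] == "lam":
        body = step_normal(t[2])
        return None if body is None else lam(t[1], body)
    if t[0] == "app":
        fn = step_normal(t[1])
        if fn is not None:
            return app(fn, t[2])
        arg = step_normal(t[2])
        return None if arg is None else app(t[1], arg)
    return None

def step_applicative(t):
    """Contract the leftmost innermost redex; None if t is in normal form."""
    if t[0] == "lam":
        body = step_applicative(t[2])
        return None if body is None else lam(t[1], body)
    if t[0] == "app":
        fn = step_applicative(t[1])
        if fn is not None:
            return app(fn, t[2])
        arg = step_applicative(t[2])
        if arg is not None:
            return app(t[1], arg)
        if is_redex(t):  # no redex inside, so t itself is innermost
            return contract(t)
    return None

def run(t, step, max_steps=50):
    """Iterate step; return (term, steps_taken, reached_normal_form)."""
    for n in range(max_steps):
        nxt = step(t)
        if nxt is None:
            return t, n, True
        t = nxt
    return t, max_steps, False

delta = lam("z", app(var("z"), var("z")))
M = app(lam("y", var("x")), app(delta, delta))

# The term from the applicative-order sequence above.
identity_b = lam("b", var("b"))
T = app(lam("x", app(var("x"), app(lam("y", var("y")), var("x")))),
        app(lam("a", var("a")), identity_b))

print(run(M, step_normal))       # reaches x after one step
print(run(M, step_applicative))  # gives up: M keeps reducing to itself
print(run(T, step_applicative))  # reaches λb.b after four steps
```

Under these assumptions, `run(M, step_normal)` stops at $x$ after one step, whereas `run(M, step_applicative)` exhausts its step budget with the term unchanged, since contracting $\delta\delta$ gives $\delta\delta$ back.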
Best Answer
As you said, the lambda expression $(\lambda y.yx)$ just adds a final $x$ to the term it's applied to. In the case at hand, it's applied to $x$, not to $xz$. Applying it to $x$ produces $xx$ while the $z$ at the end just sits there.
Remember the standard convention of lambda calculus that $abc$ means $(ab)c$, not $a(bc)$. That's why, in $(\lambda y.yx)xz$, the $(\lambda y.yx)$ is applied only to the $x$, not to the $xz$.
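Spelled out with the implicit parentheses made explicit, and writing $\to_\beta$ for a single $\beta$-step, the reduction looks like this:
\begin{align}
(\lambda y.yx)xz \ &= \ ((\lambda y.yx)x)z \\
\to_\beta \ & (xx)z \\
= \ & xxz
\end{align}
For contrast, if the argument had been parenthesized as $xz$, the result would be different: $(\lambda y.yx)(xz) \to_\beta (xz)x = xzx$.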