The distinction you are making can be useful and is actually used in some proof assistants.
There are many possible rewriting rules in lambda-calculus, the main ones being $\alpha$-conversion, $\beta$-reduction and $\eta$-conversion. The fundamental equivalence relation between terms is syntactic equality $\equiv$, which is usually defined so that it contains $\alpha$-conversion. But there are also other equivalence relations which identify terms up to repeated application of $\beta$-reduction and/or $\eta$-conversion.
For example, intensional equality $=_\beta$ between terms is defined by the rules:
$$\frac {M \to_\beta N} {M =_\beta N} \qquad \frac {M \equiv N} {M =_\beta N} \qquad \frac {M =_\beta N} {N =_\beta M} \qquad \frac {L =_\beta M \quad M =_\beta N} {L =_\beta N}$$
which together express the fact that $=_\beta$ is the reflexive, symmetric and transitive closure of $\to_\beta$.
Extensional equality $=_{\beta\eta}$ is another relation between terms, defined by the rules:
$$\frac {M \to_\beta N} {M =_{\beta\eta} N} \qquad \frac {M \to_\eta N} {M =_{\beta\eta} N} \qquad \frac {M \equiv N} {M =_{\beta\eta} N} \qquad \frac {M =_{\beta\eta} N} {N =_{\beta\eta} M} \qquad \frac {L =_{\beta\eta} M \quad M =_{\beta\eta} N} {L =_{\beta\eta} N}$$
So, when we say that two terms are "the same" or "equal", we need to specify whether we are talking about syntactic equality, intensional equality or extensional equality.
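As a small worked illustration (taking $f$ to be a free variable, an assumption made only for this example), the two relations already differ on very simple terms:
$$\lambda x. f\,x \to_\eta f \quad\text{hence}\quad \lambda x. f\,x =_{\beta\eta} f, \qquad\text{whereas}\qquad \lambda x. f\,x \neq_\beta f.$$
The inequality holds because $\lambda x. f\,x$ and $f$ are distinct $\beta$-normal forms, and by confluence of $\beta$-reduction two $\beta$-equal normal forms must coincide up to $\alpha$-conversion.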
In your case, what you call "creation of definitions" might be seen as either an extension of syntactic equality with rules such as
$$\mathsf {id} \equiv \lambda x. x$$
or as giving introduction rules for another rewriting rule that is sometimes known as $\delta$-conversion. One such rule could be
$$\mathsf {id}\, M \to_\delta M$$
which morally reflects the fact that $\mathsf {id}$ is the identity. In this case, we would then define new equivalence relations $=_{\beta\delta}$ and $=_{\beta\delta\eta}$, just like before but respectively with the additional rules
$$\frac {M \to_\delta N} {M =_{\beta\delta} N} \qquad \frac {M \to_\delta N} {M =_{\beta\delta\eta} N}$$
The choice depends on what we want to do. Most of the time we do treat $\mathsf {id}$ as a shortcut for $\lambda x. x$, so we may choose to consider them syntactically equivalent. But in some contexts it makes sense to also consider $\delta$-conversion: for example, in Coq the so-called "unfolding of transparent constants", which essentially corresponds to $\to_\delta$, allows one to rewrite a new constant only when it makes sense to do so.
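To make the rule $\mathsf{id}\, M \to_\delta M$ concrete, here is a minimal Python sketch of a single $\delta$-step. The tuple encoding of terms, the `DELTA_RULES` table and the function name `delta_step` are all hypothetical choices for this illustration; this is not how Coq implements unfolding.

```python
# Terms are tuples (an encoding assumed for this sketch):
#   ("var", x) | ("const", c) | ("lam", x, body) | ("app", fun, arg)

# Each defined constant c maps to a function computing the contractum of `c M`.
DELTA_RULES = {
    "id": lambda m: m,  # id M  ->_delta  M
}

def delta_step(t, rules=DELTA_RULES):
    """Contract the leftmost delta-redex `c M`; return None if there is none."""
    if t[0] == "app":
        fun, arg = t[1], t[2]
        if fun[0] == "const" and fun[1] in rules:
            return rules[fun[1]](arg)
        f = delta_step(fun, rules)
        if f is not None:
            return ("app", f, arg)
        a = delta_step(arg, rules)
        return None if a is None else ("app", fun, a)
    if t[0] == "lam":
        b = delta_step(t[2], rules)
        return None if b is None else ("lam", t[1], b)
    return None  # variables and unapplied constants are delta-normal

# id y rewrites to y, while a bare `id` is left alone.
print(delta_step(("app", ("const", "id"), ("var", "y"))))
print(delta_step(("const", "id")))
```

With this formulation an unapplied $\mathsf{id}$ is never rewritten, so the constant only disappears when it is actually used as a function.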
Remark. There are reasons why it is preferable to have $\mathsf{id} \, M \to_\delta M$ instead of $\mathsf{id} \to_\delta \lambda x. x$, see Taroccoesbrocco's comments below.
Your reduction sequences according to the normal order and the unspecified order are correct, but your second reduction sequence does not follow the applicative order. In order to understand why, some definitions are required.
A redex is a term of the form $(\lambda x.M)N$. An outermost redex is a redex that is not contained inside another one. An innermost redex is one that has no redexes inside it.
In your attempt to follow the applicative order (i.e. reducing the leftmost innermost redex), your first reduction step is correct. Now, consider $(\lambda x.xx)((\lambda a.a)(\lambda b.b))$: its leftmost innermost redex is $\color{red}{(\lambda a.a)}\color{blue}{(\lambda b.b)}$, and not $\color{red}{(\lambda x.xx)}\color{blue}{((\lambda a.a)(\lambda b.b))}$, because the latter contains a redex $\color{blue}{(\lambda a.a)(\lambda b.b)}$.
Therefore, the correct reduction sequence following the applicative order is:
\begin{align}
&(\lambda x.x(\color{red}{(\lambda y.y)}\color{blue}{x}))((\lambda a.a)(\lambda b.b)) \\
\equiv_{\beta} \ & (\lambda x.xx)(\color{red}{(\lambda a.a)}\color{blue}{(\lambda b.b)}) \\
\equiv_\beta \ & \color{red}{(\lambda x. xx)}\color{blue}{(\lambda b.b)} \\
\equiv_\beta \ & \color{red}{(\lambda b. b)}\color{blue}{(\lambda b.b)} \\
\equiv_\beta \ & \lambda b.b
\end{align}
To see that the normal order can reach a normal form where the applicative order diverges, consider the term $M = (\lambda y. x)(\delta\delta)$ where $\delta = \lambda z.zz$. Indeed, the leftmost outermost redex in $M$ is the whole term, which reduces to $x$, while the leftmost innermost redex in $M$ is $\delta\delta$, which reduces to itself.
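The two strategies can be compared mechanically. The following Python sketch implements one step of each order; the tuple encoding, the helper names (`V`, `L`, `A`, `step_normal`, `step_applicative`, `normalize`) and the step limit are assumptions of this illustration, not a standard library.

```python
from itertools import count

# Terms are tuples (an encoding assumed for this sketch):
#   ("var", x) | ("lam", x, body) | ("app", fun, arg)
V = lambda x: ("var", x)
L = lambda x, body: ("lam", x, body)
A = lambda fun, arg: ("app", fun, arg)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, x, s):
    """Capture-avoiding substitution t[x := s]."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "app":
        return A(subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:  # x is shadowed by this binder
        return t
    if y in free_vars(s) and x in free_vars(body):
        avoid = free_vars(s) | free_vars(body) | {x}
        z = next(f"v{i}" for i in count() if f"v{i}" not in avoid)
        y, body = z, subst(body, y, V(z))  # rename the bound variable
    return L(y, subst(body, x, s))

def is_redex(t):
    return t[0] == "app" and t[1][0] == "lam"

def contract(t):
    """Beta-contract the redex (lam x. body) arg."""
    return subst(t[1][2], t[1][1], t[2])

def step_normal(t):
    """One leftmost-outermost step, or None if t is in normal form."""
    if is_redex(t):
        return contract(t)
    if t[0] == "lam":
        b = step_normal(t[2])
        return None if b is None else L(t[1], b)
    if t[0] == "app":
        f = step_normal(t[1])
        if f is not None:
            return A(f, t[2])
        a = step_normal(t[2])
        return None if a is None else A(t[1], a)
    return None

def step_applicative(t):
    """One leftmost-innermost step, or None if t is in normal form."""
    if t[0] == "lam":
        b = step_applicative(t[2])
        return None if b is None else L(t[1], b)
    if t[0] == "app":
        f = step_applicative(t[1])
        if f is not None:
            return A(f, t[2])
        a = step_applicative(t[2])
        if a is not None:
            return A(t[1], a)
        if is_redex(t):  # no redex inside, so t itself is innermost
            return contract(t)
    return None

def normalize(t, step, limit=50):
    """Iterate `step`; return None if no normal form is detected within `limit` iterations."""
    for _ in range(limit):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    return None

ID_B = L("b", V("b"))
T = A(L("x", A(V("x"), A(L("y", V("y")), V("x")))), A(L("a", V("a")), ID_B))
DELTA = L("z", A(V("z"), V("z")))
M = A(L("y", V("x")), A(DELTA, DELTA))

print(normalize(T, step_normal), normalize(T, step_applicative))
print(normalize(M, step_normal), normalize(M, step_applicative))
```

On the term from the question both strategies end at $\lambda b.b$, whereas on $M$ only `step_normal` terminates; `normalize` gives up on the applicative order because $\delta\delta$ keeps rewriting to itself.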
Everything works as it would in a pure functional programming language. Yes, it is lexical scoping.
$(\lambda x.(x\lambda x.x))a = a\lambda x.x$.
Basically, $\lambda x.x = \lambda y.y$. You are defining operations as you would expect in a programming language. So you do have to be careful about cases like $\lambda x.(x\lambda x.x)$.
So $\lambda a . (\lambda x.(ax))$ is an operation which, when applied to $b$, returns $\lambda x.(bx)$.
The worst case is something like this:
$$(\lambda a.\lambda x.(ax))x$$
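A naive approach would yield $\lambda x.(xx)$, in which the free $x$ of the argument has been captured by the inner binder; renaming the bound variable first gives the correct result $\lambda y.(xy)$. The rigorous definition to deal with this is a bit nightmarish. The Wikipedia page for $\lambda$-calculus has this fairly opaque language: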
Basically, we have to be careful when substituting into $\lambda x.E$ an expression that has a free $x$ in it.
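The renaming that avoids this capture can be sketched in a few lines of Python. The tuple encoding of terms and the names `free_vars`, `fresh`, `subst` and `beta` are hypothetical choices for this illustration, and fresh variables are simply called `v0`, `v1`, and so on.

```python
from itertools import count

# Terms are tuples (an encoding assumed for this sketch):
#   ("var", x) | ("lam", x, body) | ("app", fun, arg)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def fresh(avoid):
    """Return a variable name of the form v0, v1, ... that is not in `avoid`."""
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def subst(t, x, s):
    """Capture-avoiding substitution t[x := s]."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:  # x is shadowed (lexical scoping): nothing to substitute
        return t
    if y in free_vars(s) and x in free_vars(body):
        # The binder y would capture a free y of s, so rename it first.
        z = fresh(free_vars(s) | free_vars(body) | {x})
        body = subst(body, y, ("var", z))
        y = z
    return ("lam", y, subst(body, x, s))

def beta(redex):
    """Contract a top-level redex (lam x. body) arg."""
    (_, (_, x, body), arg) = redex
    return subst(body, x, arg)

# The worst case above: (lam a. lam x. a x) x
worst = ("app",
         ("lam", "a", ("lam", "x", ("app", ("var", "a"), ("var", "x")))),
         ("var", "x"))
print(beta(worst))

# The shadowing case: (lam x. x (lam x. x)) a
shadow = ("app",
          ("lam", "x", ("app", ("var", "x"), ("lam", "x", ("var", "x")))),
          ("var", "a"))
print(beta(shadow))
```

In the first example the inner binder is renamed before substituting, so the result is $\lambda v_0.(x\,v_0)$ rather than $\lambda x.(xx)$; in the second the inner $\lambda x.x$ is left untouched because its $x$ is a different, shadowing variable.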