I like you. Unlike most people, you actually think about things.
The usual way of doing this is to think of a tuple $x \in X^n$ as a function $n \rightarrow X$, not necessarily injective, where $n$ is identified with the set $\{0,\ldots,n-1\}$. Then:
First.
$a \in x$ can be interpreted as shorthand for $a \in \mathrm{img}(x)$, so yes, this makes sense.
You may wish to define a new symbol $\dot{\in}$ and assert that $a \mathbin{\dot{\in}} x$ is the number of times $a$ occurs in the tuple $x$. Explicitly, if $x \in X^n$, then $$a \mathbin{\dot{\in}} x = |i \in n : x_i = a|.$$
You can write $x^{-1}(a)$ for the set of all indices at which the value $a$ occurs. Explicitly, if $x \in X^n$, then $$x^{-1}(a) = \{i \in n : x_i = a\}.$$
Second. Yes, you can take Cartesian products of functions, so Cartesian products of tuples is a special case. In particular:$$\frac{f : A \rightarrow B \qquad f' : A' \rightarrow B'}{f \times f' : A \times A' \rightarrow B \times B'}$$
$$(f \times f')(a,a') = (f(a),f(a'))$$
According to category theory, this is all packaged up into something called the "Cartesian product functor." Judging by your interests, you should definitely learn some category theory.
As you noted, the Cartesian product of tuples won't usually be a tuple. You can solve this as follows. For each pair of natural numbers $a$ and $b$, there's a function $\lambda_{a,b} : ab \rightarrow a \times b$ that implements of the lexicographical order. Then if $x \in X^a$ and $y \in Y^b$, we can build $x \times y : a \times b \rightarrow Y \times X$, and then compose with $\lambda$ to obtain $$(x \times y) \circ \lambda_{a,b} : ab \rightarrow X \times Y,$$ which is a tuple. This has something to do with commutativity.
Third. Instead of defining functions on tuples, the usual thing is to define functions on the domain or codomain. For instance, if you want to count the number occurrences in a tuple $x \in X^n$, this could be viewed as the function $X \rightarrow \mathbb{N}$ defined by $a \mapsto a \mathbin{\dot{\in}} x$.
Fourth. Everything I've said ignores your initial injectivity assumption. If this is fundamental to what you're doing, perhaps what you're really looking for is the notion of a "finite totally-ordered set." These can be treated as if they were sets in a rather direct way. You're correct to note that if $A$ and $B$ are finite totally-ordered sets, then $A \times B$ isn't totally-ordered with the usual order. It is, however, partially ordered. And you can use to lexicographic order to totalize this, if desired.
Well, you could define $(a,b,c)$ as $((a,b),(c,c))$, and then it is an element of $\mathscr{P}^4(X)$. More generally, if $n\leq 2^m$, you can represent $n$-tuples as a tree of $m$-tuply nested ordered pairs, and thus as elements of $\mathscr{P}^{2m}(X)$. This shows that $$k^n\leq f^{2\lceil \log_2 n\rceil}(k)$$ where $f(x)=2^x$. (Of course, this bound can also be obtained by just iterating $k^2\leq 2^{2^k}$ to get $k^4=(k^2)^2\leq (2^{2^k})^2\leq 2^{2^{2^{2^k}}}$ and so on.)
You can do better at least in some cases, though. For instance, you can represent triples in $\mathscr{P}^3(X)$ by defining $(a,b,c)$ as $\{(a,b),(a,c),(b,c)\}$. More generally, you can represent an $n$-tuple as the set of all the $(n-1)$-tuples obtained by deleting one entry in the $n$-tuple (for any $n>2$), so inductively this represents $n$-tuples as elements of $\mathscr{P}^n(X)$. This gives the bound $$k^n\leq f^n(k),$$ which is better than the previous bound for $n=3$ and $n=5$. Or, you can use the earlier representation of $2^m$-tuples as elements of $\mathscr{P}^{2m}(X)$ to represent $(2^m+1)$-tuples as elements of $\mathscr{P}^{2m+1}(X)$, which is slightly better than the $\mathscr{P}^{2m+2}(X)$ that would be given by the first method.
To prove that this works, let $s$ be an $n$-tuple and let $R$ be the set of $(n-1)$-tuples obtained by removing an entry from $s$; we will recover $s$ from $R$. If all elements of $R$ have the same first entry (say, $a$), then $a$ must be the first entry of $s$. Moreover, there is then a unique element of $R$ which starts with fewer $a$s than every other element of $R$ (namely the $(n-1)$-tuple obtained by removing the first entry of $s$), and that element is the remaining $n-1$ entries of $s$.
So, we may assume that not all elements of $R$ have the same first entry. If there is some $a$ such that two different elements of $R$ start with $a$, then $a$ must be the first entry of $s$ and the remaining entries are given by the unique element of $R$ that doesn't start with $a$. Thus, we may assume that $R$ has only two different elements, say one starting with $a$ and another starting with $b$. But this means the entries of $s$ can only be $a$ and $b$ (if $s$ had three distinct entries, they would give three distinct elements of $R$). Moreover, all the $a$s must be consecutive, since removing $a$s in different consecutive blocks would give different elements of $R$, and similarly the $b$s must be consecutive. We can count how many $a$s there are in $s$ (the maximum that there are in any element of $R$) and similarly for the $b$s, and we can tell whether the $a$s come first or the $b$s come first since $n>2$. Thus, we can recover $s$ from the set $R$.
For $n=3$, at least, this is optimal in the following sense: it is impossible to represent ordered triples as elements of $\mathscr{P}(\mathscr{P}(X))$ (by a formula that sends a triple $(a,b,c)$ to some doubly nested set expression in $a,b,$ and $c$). Clearly such a representation of a triple $(a,b,c)$ would have to involve all three of $a,b,$ and $c$. But now consider the 6 triples $(a,a,b),(b,b,a),(a,b,a),(b,a,b),(b,a,a),(a,b,b)$. Each of these must be represented by a distinct element of $\mathscr{P}(\mathscr{P}(\{a,b\}))$ which is not fixed if you swap $a$ and $b$. Thus, each one must contain exactly one of $\{a\}$ and $\{b\}$. This means that each one also must contain $\{a,b\}$, since the formula for $(a,b,c)$ must involve all three of $a,b,$ and $c$. But now we have a problem: there are only 4 different subsets of $\mathscr{P}(\{a,b\})$ satisfying this constraint, so our 6 triples cannot all be distinct.
Best Answer
$X^n$ denotes the $n$-th Cartesian power of the set $X$. Your guess is correct, $X^2$ is the set of ordered pairs $(x_1,x_2)$ with $x_1, x_2 \in X$. Similarly, $X^3$ is the set of ordered triples $(x_1,x_2,x_3)$ with $x_1, x_2, x_3 \in X$. Unless you choose to define pairs, triples and $k$-tuples in a very uncommon way, ordered pairs are 2-tuples and ordered triples are 3-tuples. Some examples that you might already be familiar with are $\mathbb{R}^2$,$\mathbb{R}^3$ and $\mathbb{R}^n$.
This notation, Cartesian powers and a $k$-tuple as a mathematical object are very universal in mathematics. Here is the Wikipedia entry about them.