How can one feel comfortable with non-solvable algebraic numbers?
The nice thing about solvable numbers is this idea that they have a formula. You can manipulate the formula as if it were actually a number, using an algebra formalism you have probably felt comfortable with for a while. For instance $\sqrt{3+\sqrt{6}}+2$ is an algebraic number. What do you get if you add it to $7$? Well $\left(\sqrt{3+\sqrt{6}}+2\right)+7=\sqrt{3+\sqrt{6}}+9$ seems like a decent answer. As a side note: there are actually some reasonably hard algorithmic questions along these lines, but I'll assume they don't worry you. :-)
We'd like to be able to manipulate other algebraic numbers with similar comfort. The first method I was taught is pretty reasonable:
Kronecker's construction: If $x$ really is an algebraic number, then it is the root of some irreducible polynomial $x^n - a_{n-1} x^{n-1} - \ldots - a_1 x - a_0$. But how do we manipulate $x$? It's almost silly: we treat it just like $x$, and add and multiply as usual, except that $x \cdot x^{n-1}$ needs to be replaced by $a_{n-1} x^{n-1} + \ldots + a_1 x + a_0$, and division is handled by replacing $1/x$ with $( x^{n-1} - a_{n-1} x^{n-2} - \ldots - a_2 x - a_1)/a_0$. This is very similar to "integers mod $n$", where you replace big numbers by their remainder mod $n$. In fact this is just replacing a polynomial in $x$ with its remainder mod $x^n - a_{n-1} x^{n-1} - \ldots - a_1 x - a_0$.
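If it helps to see this as an algorithm, here is a minimal sketch in Python (the helper names and the choice of $x^5 - x - 1$, an irreducible polynomial whose roots are not expressible in radicals, are just my illustration): an element of $K[x]/(f)$ is stored as a coefficient list, and every product is reduced by repeatedly replacing $x^n$ with $a_{n-1} x^{n-1} + \ldots + a_1 x + a_0$.

```python
# Sketch: arithmetic in Q[x]/(f) for f(x) = x^5 - x - 1, i.e. x^5 = 1 + x.
# An element is a coefficient list [c0, c1, ..., c4] for c0 + c1*x + ... + c4*x^4.
from fractions import Fraction

a = [Fraction(c) for c in (1, 1, 0, 0, 0)]   # a_0..a_4 with x^5 = a_0 + a_1*x + ...
n = len(a)

def add(u, v):
    return [s + t for s, t in zip(u, v)]

def mul(u, v):
    """Multiply two elements and reduce modulo f."""
    prod = [Fraction(0)] * (2 * n - 1)
    for i, s in enumerate(u):
        for j, t in enumerate(v):
            prod[i + j] += s * t
    # Replace x^k (k >= n) by x^(k-n) * (a_0 + a_1*x + ... + a_{n-1}*x^{n-1}),
    # working from the highest degree down.
    for k in range(2 * n - 2, n - 1, -1):
        c, prod[k] = prod[k], Fraction(0)
        for j in range(n):
            prod[k - n + j] += c * a[j]
    return prod[:n]

# Example: with x a root of x^5 - x - 1,
# (1 + x^2)(x^3 - 2) reduces to -1 + x - 2x^2 + x^3.
u = [Fraction(c) for c in (1, 0, 1, 0, 0)]
v = [Fraction(c) for c in (-2, 0, 0, 1, 0)]
print(mul(u, v))
```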
I found it somewhat satisfying, but in many ways it is very mysterious. We use the same symbol for many different algebraic numbers; each time we have to keep track of the defining polynomial $f(x)$ floating in the background. It also raises deep questions about how to tell two algebraic numbers apart. Luckily more or less all of these questions have clean algorithmic answers, and they are described in Cohen's textbook CCANT (A Course in Computational Algebraic Number Theory, Henri Cohen, 1993).
Companion matrices: But years later, it still bugged me. Then I studied splitting fields of group representations. The crazy thing about these fields is that they are subrings of matrix rings, so “numbers” are actually matrices. You've probably seen tricks like this: $$\mathbb{C} = \left\{ \begin{bmatrix} a & b \\ -b & a \end{bmatrix} : a,b \in \mathbb{R} \right\}$$ where we make a bigger field out of matrices over a smaller field. It turns out that this is always possible: if $K \leq F$ are fields, then $F$ is a $K$-vector space, and the function $f:F \to M_n(K) : x \mapsto ( y \mapsto xy )$ is an injective homomorphism of fields, so that $f(F)$ is a field isomorphic to $F$ but whose “numbers” are just $n \times n$ matrices over $K$, where $n$ is the dimension of $F$ as a $K$-vector space (and yes, $n$ could be infinite if you want, but here it is not).
That might seem a little complicated, but $f$ just says "what does multiplying look like?" For instance if $\mathbb{C} = \mathbb{R} \oplus \mathbb{R} i$ then multiplying $a+bi$ sends $1$ to $a+bi$ and $i$ to $-b+ai$. The first row is $[a,b]$ and the second row $[-b,a]$. Too easy.
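One can check directly that these matrices multiply the same way complex numbers do:
$$\begin{bmatrix} a & b \\ -b & a \end{bmatrix}\begin{bmatrix} c & d \\ -d & c \end{bmatrix} = \begin{bmatrix} ac-bd & ad+bc \\ -(ad+bc) & ac-bd \end{bmatrix},$$
which is exactly the rule $(a+bi)(c+di) = (ac-bd) + (ad+bc)i$.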
Ok, fine, but that assumes you already know how to multiply, and perhaps you are not yet comfortable enough to multiply non-solvable algebraic numbers! Again we use the polynomial $x^n - a_{n-1} x^{n-1} - \ldots - a_1 x - a_0$, but this time to build a matrix. We use the same rule, viewing $F=K \oplus Kx \oplus Kx^2 \oplus \ldots \oplus Kx^{n-1}$, and ask what $x$ does to each basis element: well, $x^i$ is typically sent to $x^{i+1}$. It's only at the last one that things get funny:
$$f(x) = \begin{bmatrix} 0 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 0 & 0 & 1 & 0 & \ldots & 0 & 0 \\
0 & 0 & 0 & 1 & \ldots & 0 & 0 \\
& & & & \ddots & & \\
0 & 0 & 0 & 0 & \ldots & 1 & 0 \\
0 & 0 & 0 & 0 & \ldots & 0 & 1 \\
a_0 & a_1 & a_2 & a_3 & \ldots & a_{n-2} & a_{n-1}
\end{bmatrix}$$
So this fancy “number” $x$ just becomes a matrix, most of whose entries are $0$. For instance the polynomial $x^2 - (-1) = x^2 + 1$ gives the matrix $i = \left[\begin{smallmatrix} 0 & 1 \\ -1 & 0 \end{smallmatrix}\right]$.
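Here is a quick numerical sanity check (a sketch using numpy; the polynomial $x^5 - x - 1$, whose roots are not solvable in radicals, is just my example): the companion matrix built by the rule above really satisfies its defining polynomial, so sums, products, and powers of it behave exactly like the algebraic number $x$.

```python
import numpy as np

# Row convention as above: rows 1..n-1 shift each basis element along,
# and the last row holds a_0, ..., a_{n-1} with x^n = a_0 + a_1*x + ... + a_{n-1}*x^{n-1}.
a = [1.0, 1.0, 0.0, 0.0, 0.0]                # x^5 = 1 + x
n = len(a)
M = np.zeros((n, n))
M[np.arange(n - 1), np.arange(1, n)] = 1.0   # the shifted identity part
M[n - 1, :] = a                              # the coefficient row

print(np.allclose(np.linalg.matrix_power(M, 5), M + np.eye(n)))  # True: M^5 = M + I
print(np.sort_complex(np.linalg.eigvals(M)))  # the five roots of x^5 - x - 1
```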
The nice part here is that different algebraic numbers actually get different matrix representations. The dark part is making sure that if you have two unrelated algebraic numbers, they actually multiply together like elements of a field. You see, $M_n(K)$ has many subfields, but is not itself a field, so you have to choose matrices that both lie within a single subfield. Now for splitting fields and centralizer fields and all sorts of handy dandy fancy fields, you absolutely can make sure everything you care about comes from the field. Starting from just a bunch of polynomials, though, you need to be careful and find a single polynomial that works for both. This is called the primitive element theorem.
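A classical example of what the primitive element theorem buys you: $\sqrt{2}$ and $\sqrt{3}$ both live inside the field generated by the single number $\theta = \sqrt{2}+\sqrt{3}$, since
$$\theta^4 - 10\theta^2 + 1 = 0, \qquad \sqrt{2} = \tfrac{1}{2}\left(\theta^3 - 9\theta\right), \qquad \sqrt{3} = \tfrac{1}{2}\left(11\theta - \theta^3\right),$$
so a single $4 \times 4$ companion matrix (the one for $x^4 - 10x^2 + 1$) handles both at once.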
This also lets you see the difference between eigenvalues in the field $K$ and eigenvalues (“numbers”) in the field $F$: the former are actually numbers, or multiples of the identity matrix, while the latter are full-fledged matrices that happen to lie in a subfield. If you ever studied the “real form” of the eigenvalue decomposition with $2\times 2$ blocks, those $2 \times 2$ blocks are exactly the $\begin{bmatrix}a&b\\-b&a\end{bmatrix}$ complex numbers.
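A small example of the latter: $A=\left[\begin{smallmatrix}0&1\\-5&2\end{smallmatrix}\right]$ has characteristic polynomial $x^2-2x+5$ and no eigenvalues in $\mathbb{R}$, but its eigenvalue over the matrix copy of $\mathbb{C}$ is the full-fledged matrix
$$1+2i = \begin{bmatrix}1&2\\-2&1\end{bmatrix} = 1\cdot I + 2\cdot\begin{bmatrix}0&1\\-1&0\end{bmatrix},$$
which is exactly the kind of $2\times 2$ block that appears in the real form.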
Re-posting this comment as an answer, because it's the best answer that's made itself available so far; but I might come back and edit this if I make time to properly educate myself on this subject!
My understanding of the historical development is that Emil Artin developed much of modern Galois theory (at least the stuff that undergrads see) with the specific aim of proving the fundamental theorem without the Primitive Element Theorem (which he regarded as unnecessary, because it is tantamount to choosing a convenient basis), and he succeeded. I would venture to say that most modern courses in the UK do not use PET (my own undergrad course did not even mention it!).
Source: the comment of JS Milne on this MathOverflow question.
As for mathematical references, the aforementioned Milne has a very nice set of notes on Galois Theory on his website, which I believe takes the approach you are looking for.
Best Answer
This paper by J.F. Ritt, titled "Elementary functions and their inverses", addresses your question.
Some time ago, in searching for why some functions don't have elementary integrals, I was led to the work of Liouville, as digested by Ritt in his book "Integration in finite terms; Liouville's theory of elementary methods". It was written in 1948 so I think the copyright has expired, and you can find a download link via a Google search.
Liouville's results on elementary integrals were derived using quite basic tools (it is hard to be precise here about what I mean by "basic tools"; it is best that you see for yourself). The latter portions of Ritt's book explore elementary solutions of differential equations, which are based on the work of mathematicians after Liouville, and it is only from that point that some differential Galois theory is used.
Ritt's paper uses methods somewhat similar to Liouville's and, in particular, does not seem to use differential Galois theory. However, there may be a more modern approach to your question that does use differential Galois theory, since later generalizations of Liouville's original work develop it.
Alternatively, if you can express your inverse function in terms of the Lambert W function as Nicholas suggests, then you can answer your question via the more specific methods of this paper.