It's not a question where people are typically much concerned with efficiency. When one contemplates implementing the lambda calculus in practice, the target model is usually not one of arithmetic on a single huge integer at a time, but a more realistic model such as a RAM with bounded word size. And then you would be thinking in terms of pointers and linked data structures (or perhaps even byte sequences) rather than arithmetization.
On the other hand, for purely theoretic considerations, Gödel numbers do the job well enough that an improved scheme is not really a publishable matter.
However, for "recreational" purposes, what you would do is start by selecting some pairing function $f:\mathbb N\times\mathbb N\hookrightarrow\mathbb N$. There are various choices which strike different balances between representation efficiency and how cumbersome they are to encode and decode.
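One classical choice is the Cantor pairing function, which is a genuine bijection $\mathbb N\times\mathbb N\to\mathbb N$. A minimal Python sketch (the language and the function names `pair`/`unpair` are my own choices, not anything from the question):

```python
import math

def pair(m, n):
    """Cantor pairing: a bijection from N x N onto N."""
    return (m + n) * (m + n + 1) // 2 + n

def unpair(z):
    """Inverse of the Cantor pairing function."""
    # w = m + n is the largest w with w*(w+1)/2 <= z.
    w = (math.isqrt(8 * z + 1) - 1) // 2
    n = z - w * (w + 1) // 2
    return w - n, n
```

Using `math.isqrt` keeps the inverse exact for arbitrarily large codes, which matters since Gödel numbers grow quickly.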
Once you have that, one plan could be to represent lambda terms with de Bruijn indices:
$$ \begin{align} [\![ M N ]\!] &= 3 f( [\![ M ]\!], [\![ N ]\!] ) \\
[\![ \lambda . M ]\!] &= 1+3 [\![ M ]\!] \\
[\![ v^n ]\!] &= 2 + 3n \end{align} $$
(Remember that with de Bruijn indices, the variable in a lambda binder is implicit, and an occurrence of a variable is represented by the number of binders nested between that occurrence and the binder it refers to. Here I'm showing that as $v^n$, which means "the variable bound $n$ levels above here" -- some sources, such as the Wikipedia article, just use the number $n$ itself as the syntax for a variable occurrence.)
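As a sanity check, this scheme is easy to implement. Here is a Python sketch, where I arbitrarily pick the Cantor pairing function for $f$ and represent terms as nested tuples `('app', M, N)`, `('lam', M)`, `('var', n)` (all of these representation choices are mine, not forced by the encoding):

```python
import math

def pair(m, n):                 # Cantor pairing, a bijection N x N -> N
    return (m + n) * (m + n + 1) // 2 + n

def unpair(z):                  # its inverse
    w = (math.isqrt(8 * z + 1) - 1) // 2
    n = z - w * (w + 1) // 2
    return w - n, n

def encode(t):
    """Goedel-number a de Bruijn term given as nested tuples."""
    if t[0] == 'app':           # [[M N]]  = 3 * f([[M]], [[N]])
        return 3 * pair(encode(t[1]), encode(t[2]))
    if t[0] == 'lam':           # [[l.M]]  = 1 + 3 * [[M]]
        return 1 + 3 * encode(t[1])
    if t[0] == 'var':           # [[v^n]]  = 2 + 3 * n
        return 2 + 3 * t[1]
    raise ValueError(t)

def decode(k):
    """Partial inverse: only valid on numbers that are codes of terms."""
    r, q = k % 3, k // 3
    if r == 0:
        m, n = unpair(q)
        return ('app', decode(m), decode(n))
    if r == 1:
        return ('lam', decode(q))
    return ('var', q)
```

Note that `decode` is only a partial inverse: not every natural number is the code of a term under this scheme (for instance, `decode(0)` would recurse forever), which is harmless for arithmetization purposes.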
De Bruijn notation is convenient for manipulating lambda terms, because it abstracts away from $\alpha$-conversion, but if you would rather represent lambda terms closer to how they are usually written down, you could do
$$ \begin{align} [\![ M N ]\!] &= 3 f( [\![ M ]\!], [\![ N ]\!] ) \\
[\![ \lambda x_n . M ]\!] &= 1+3 f( n, [\![ M ]\!] ) \\
[\![ x_n ]\!] &= 2 + 3n \end{align} $$
Alternatively, you could represent a combinator calculus such as $\mathbf{SKI}$ like you propose:
$$ \begin{align} [\![ \mathbf I ]\!] &= 0 \\
[\![ \mathbf K ]\!] &= 1 \\
[\![ \mathbf S ]\!] &= 2 \\
[\![ M N ]\!] &= 3 + f( [\![ M ]\!], [\![ N ]\!] ) \end{align} $$
The possibilities are practically endless. And note that all of this makes as much sense for representing logical formulas as it does for lambda terms. The prevalence of prime factorizations in the literature about arithmetization even in logic settings is to a large extent just a matter of inertia and the legacy of Gödel's original work. From a technical standpoint a more structured encoding such as the above ones would work just as well.
One point in favor of using an encoding based on symbol sequences (such as traditional Gödel numbers) rather than one based on tree structures (like the above) is that it -- supposedly -- makes it easier to convince an uninitiated reader that what goes on in the arithmetic is indeed exactly the same as what one can do mechanically with symbols on paper. However, that depends on the background of the assumed reader. For example, if one is writing for a computer-science audience, going directly to trees will very likely feel more natural to them.
Untyped $\lambda$-calculus and untyped combinatory terms both have a form of the $Y$ combinator, so we introduce types to avoid infinite reductions. As (untyped) systems, they are equivalent only as equational theories. To elaborate:
Take $(\Lambda , \rightarrow_{\beta\eta})$ to be the set of $\lambda$-terms equipped with the $\beta\eta$-relation (abstraction and application only; I am considering the simplest case) and $(\mathcal{C}, \rightarrow_R)$ to be the set of combinatory terms equipped with reduction. We want the translation $T$ to be a bijection between the two sets that respects reduction. The problem is that reduction in combinatory terms happens "too fast" compared with $\beta\eta$, so such a translation does not exist. When viewed as equational theories, i.e. $(\Lambda , =_{\beta\eta})$ and $(\mathcal{C}, =_R)$, we can find a translation that respects the equalities.
An interesting fact is that $=_{\beta\eta} \approx =_{ext}$, where $=_{ext}$ is a form of extensional equality on $\lambda$-terms.
To answer your question: when you view these systems as equational theories, they are both expressible (in FOL), since one of them is and the translation exists. But using just reduction, $\lambda$-calculus has the notion of bound variables, while combinatory logic does not. Bound variables require more expressive power than FOL provides: you need access to FOL expressions in order to differentiate them from free variables, which you cannot do from inside the system.
Concepts related to, and explanatory of, my abuse of terminology when I say "you need to have access to expressions of FOL" are the $\lambda$-cube and higher-order logics from formal systems, and "linguistic levels" from linguistics. Informally, it is the level of precision the system in question has. Take for example the alphabet $\{a,b\}$ and consider all words over this alphabet. This language requires the minimum amount of precision to be expressed, since it is the largest language I can create from this alphabet. Now consider only the words of the form $a^nb^n$. This language requires a higher level of precision for a system to be able to express it, because you need to be able to "look inside" a word to tell whether it belongs to the language. You can read more about this topic at https://devopedia.org/chomsky-hierarchy
In general it is not the case that given lambda terms $f,g,h$, then $(fg)h = f(gh)$. For example, let $I=\lambda x.x$, and take $$f =\lambda x.\lambda y.x,\quad g=h = I$$ then $(fg)h = I$, however $f(gh) = \lambda y.I$.
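The counterexample can even be watched in action using ordinary Python closures to stand in for the lambda terms (the variable names below are my own, mirroring the terms above):

```python
I = lambda x: x                  # the identity term
f = lambda x: (lambda y: x)      # the term (lambda x. lambda y. x), i.e. K
g = h = I

lhs = (f(g))(h)   # (f g) h  reduces to I, the identity
rhs = f(g(h))     # f (g h)  reduces to (lambda y. I), a constant function

print(lhs(42))       # 42: lhs behaves as the identity
print(rhs(42)(99))   # 99: rhs applied to anything returns I
```

The two results behave differently on any argument, confirming that application is not associative.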
However, you could define a composition term $$c = \lambda f.\lambda g.\lambda x.f(gx)$$ and show that $$((c((cf)g))h) = ((cf)((cg)h)).$$ Notice that this notation is hard to read, and for this reason it is common to associate application of lambda terms to the left, that is we can instead write $$c(cfg)h = cf(cgh).$$
To show this we just expand $c$ as follows (renaming the inner bound variable to $y$ to avoid shadowing $x$): \begin{align*} c(cfg)h &= \lambda x.(cfg)(hx)\\ &=\lambda x.(\lambda y.f(gy))(hx)\\ &=\lambda x. f(g(hx))\\ &=\lambda x.f((\lambda y.g(hy))x)\\ &=\lambda x.f((cgh)x)\\ &=cf(cgh). \end{align*}
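The associativity identity $c(cfg)h = cf(cgh)$ can also be checked extensionally with Python functions standing in for lambda terms (the sample functions below are arbitrary choices of mine, just to exercise both sides):

```python
# The composition combinator c = lambda f. lambda g. lambda x. f (g x)
c = lambda f: lambda g: lambda x: f(g(x))

# Arbitrary sample functions on integers for the check.
f = lambda x: x + 1
g = lambda x: 2 * x
h = lambda x: x - 3

# Both sides compute f(g(h(x))) for every input x.
for x in range(-5, 6):
    assert c(c(f)(g))(h)(x) == c(f)(c(g)(h))(x) == f(g(h(x)))
```

Of course, agreeing on sample inputs is weaker than the equational proof above, but it is a quick way to catch a bracketing mistake.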