The deduction theorem says that if {$\gamma$, $\alpha$} $\vdash$ $\beta$, then $\gamma$ $\vdash$ ($\alpha$$\rightarrow$$\beta$). Now you can represent a proof of $\gamma$ from $\alpha$ by a sequence
$\alpha$
$\vdash$ $\beta_1$
$\vdash$ $\beta_2$
$\vdots$
$\vdash$ $\gamma$.
Consequently, by the deduction theorem we can obtain ($\alpha$$\rightarrow$$\alpha$), ($\alpha$$\rightarrow$$\beta_1$), ..., ($\alpha$$\rightarrow$$\gamma$). I repeat for emphasis: we have ($\alpha$$\rightarrow$$\gamma$) because we have the deduction theorem. Now there exists a theorem which says:
((p$\rightarrow$q)$\rightarrow$(p$\rightarrow$(r$\rightarrow$q))).
Thus to obtain ($\alpha$$\rightarrow$($\beta$$\rightarrow$$\gamma$)) we substitute p with $\alpha$, q with $\gamma$, and r with $\beta$ in ((p$\rightarrow$q)$\rightarrow$(p$\rightarrow$(r$\rightarrow$q))), and from ($\alpha$$\rightarrow$$\gamma$) we detach ($\alpha$$\rightarrow$($\beta$$\rightarrow$$\gamma$)).
We iterate this process, substituting p with $\alpha$, q with ($\beta$$\rightarrow$$\gamma$), and r with $\delta$, and then we can detach
($\alpha$$\rightarrow$($\delta$$\rightarrow$($\beta$$\rightarrow$$\gamma$))).
The catch here is that we should obtain ($\alpha$$\rightarrow$($\beta_n$$\rightarrow$$\gamma$)) first, then ($\alpha$$\rightarrow$($\beta_m$$\rightarrow$($\beta_n$$\rightarrow$$\gamma$))) next [where $m = n-1$], until we finally obtain ($\alpha$$\rightarrow$($\beta_1$$\rightarrow$($\beta_2$$\rightarrow$($\dots$$\rightarrow$$\gamma$)))).
Now let's suppose that we have ($\alpha$$\rightarrow$($\beta_1$$\rightarrow$($\beta_2$$\rightarrow$($\dots$$\rightarrow$$\gamma$)))) and $\alpha$. By detachment we can obtain ($\beta_1$$\rightarrow$($\beta_2$$\rightarrow$($\dots$$\rightarrow$$\gamma$))). Thus, if we also have ($\alpha$$\rightarrow$$\beta_1$), we can obtain $\beta_1$ and thus ($\beta_2$$\rightarrow$($\dots$$\rightarrow$$\gamma$)). Eventually we can obtain $\gamma$, and hence, by the deduction theorem, ($\alpha$$\rightarrow$$\gamma$) again.
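The iterated detachment just described is completely mechanical, so it may help to see it as code. Here is a minimal Python sketch; the tuple encoding of implications and the names alpha, beta1, etc. are my own assumptions made for illustration, not part of the original argument:

```python
# Formulas: a propositional variable is a string; an implication (a -> b)
# is encoded as the tuple ('->', a, b).

def detach(conditional, antecedent):
    """Modus ponens: from (a -> b) and a, return b."""
    op, a, b = conditional
    assert op == '->' and a == antecedent, "detachment does not apply"
    return b

# alpha -> (beta1 -> (beta2 -> gamma)):
curried = ('->', 'alpha', ('->', 'beta1', ('->', 'beta2', 'gamma')))

# Suppose we also have alpha and each (alpha -> beta_i), so every beta_i
# is itself detachable.
beta1 = detach(('->', 'alpha', 'beta1'), 'alpha')
beta2 = detach(('->', 'alpha', 'beta2'), 'alpha')

# Peel off one antecedent at a time, exactly as in the text above.
step = detach(curried, 'alpha')   # beta1 -> (beta2 -> gamma)
step = detach(step, beta1)        # beta2 -> gamma
step = detach(step, beta2)        # gamma
print(step)  # prints: gamma
```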
So, the answer to your question, I think, is "yes".
But, notice that the deduction theorem itself doesn't directly give us such a transformation as your original post suggests. We used
((p$\rightarrow$q)$\rightarrow$(p$\rightarrow$(r$\rightarrow$q))) for this to work.
That said, if we have the deduction theorem, then ((p$\rightarrow$q)$\rightarrow$(p$\rightarrow$(r$\rightarrow$q))) will follow, as we can see from the following:
1  | (p→q)   (hypothesis)
2  || p   (hypothesis)
3  ||| r   (hypothesis)
4  ||| q   (detachment, 1, 2)
5  || (r→q)   (conditional-in, 3–4)
6  | (p→(r→q))   (conditional-in, 2–5)
7  ((p→q)→(p→(r→q)))   (conditional-in, 1–6)
If we have {(p→(q→p)), ((p→(q→r))→((p→q)→(p→r)))} as the basis used for proving the deduction theorem (other bases exist) we could also prove ((p→q)→(p→(r→q))) as follows:
1  (p→(q→p))   (axiom)
2  ((p→(q→r))→((p→q)→(p→r)))   (axiom)
3  (q→(r→q))   (from 1, substituting p/q, q/r)
4  ((q→(r→q))→(p→(q→(r→q))))   (from 1, substituting p/(q→(r→q)), q/p)
5  (p→(q→(r→q)))   (detachment, 4, 3)
6  ((p→(q→(r→q)))→((p→q)→(p→(r→q))))   (from 2, substituting r/(r→q))
7  ((p→q)→(p→(r→q)))   (detachment, 6, 5)
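Since a Hilbert-style proof is just a sequence of substitutions and detachments, it can be checked mechanically. Here is a minimal Python sketch of the seven steps above; the tuple encoding of formulas is my own choice for illustration:

```python
# Formulas as nested tuples: a variable is a string; (a -> b) is ('->', a, b).

def subst(formula, mapping):
    """Uniform (simultaneous) substitution of variables."""
    if isinstance(formula, str):
        return mapping.get(formula, formula)
    return ('->', subst(formula[1], mapping), subst(formula[2], mapping))

def detach(conditional, antecedent):
    """Modus ponens: from (a -> b) and a, return b."""
    op, a, b = conditional
    assert op == '->' and a == antecedent, "detachment does not apply"
    return b

ax1 = ('->', 'p', ('->', 'q', 'p'))                       # (p->(q->p))
ax2 = ('->', ('->', 'p', ('->', 'q', 'r')),
             ('->', ('->', 'p', 'q'), ('->', 'p', 'r')))  # ((p->(q->r))->((p->q)->(p->r)))

step3 = subst(ax1, {'p': 'q', 'q': 'r'})     # (q->(r->q))
step4 = subst(ax1, {'p': step3, 'q': 'p'})   # ((q->(r->q))->(p->(q->(r->q))))
step5 = detach(step4, step3)                 # (p->(q->(r->q)))
step6 = subst(ax2, {'r': ('->', 'r', 'q')})  # ((p->(q->(r->q)))->((p->q)->(p->(r->q))))
step7 = detach(step6, step5)                 # ((p->q)->(p->(r->q)))

assert step7 == ('->', ('->', 'p', 'q'), ('->', 'p', ('->', 'r', 'q')))
print("proved:", step7)
```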
If we ignore the substitutions, it looks like, in this case at least, an axiomatic proof can come out shorter than a natural deduction derivation.
Model theory is an excellent tool for talking about the semantics of formal languages, particularly first order logic with equality and with constants, functions and non-logical predicates, by building models out of sets. You can do other, fancier things with it like considering models of set theory itself, but I don't understand those applications and can't comment on them.
One interesting fact from model theory is the existence of nonstandard models of arithmetic. I think this is a good motivating example because it shows that axiomatizations of seemingly simple things like arithmetic do not pin down a semantics as much as we would like.
I can try to answer this from the perspective of someone who found out about model theory through the proof theory and programming language theory route. I don't know the subject well; I'm just a beginner. As a meta note, I think Noah Schweber is right and you should phrase your question more politely.
Basically, a model is a structure that assigns interpretations to various symbols and has a domain of discourse. The interpretations are all drawn from our universe of sets in our underlying set theory.
A symbol is an abstract component in a signature. It can be a constant symbol (like a nullary function), a function symbol, or a predicate symbol. Functions and predicates can have different arities. Normally, there's a single domain of discourse, but you can generalize this to multiple sorts if you want to.
So, symbols in signatures can be used in two ways. They can be used in your formal language. For instance, if $f$ is a function symbol, you can express properties about $f$ in your language.
$$ \forall x \mathop. \forall y \mathop. f(x, y) = f(y, x) $$
Second, the function symbol $f$ is given an interpretation in any particular structure.
A set of sentences using non-logical symbols taken from a particular signature can be used to define a theory. A structure where all the sentences in a theory are true is called a model of the theory.
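For a finite structure, "all the sentences in the theory are true" can be checked by brute force. A minimal Python sketch, where the particular domain and the two candidate interpretations of $f$ are my own assumed examples:

```python
from itertools import product

def is_commutative_model(domain, f):
    """Check the sentence  forall x. forall y. f(x, y) = f(y, x)
    by brute force over a finite domain."""
    return all(f(x, y) == f(y, x) for x, y in product(domain, repeat=2))

# Toy structures with domain {0, 1, 2}:
domain = [0, 1, 2]
print(is_commutative_model(domain, lambda x, y: (x + y) % 3))  # True: addition mod 3 is a model
print(is_commutative_model(domain, lambda x, y: (x - y) % 3))  # False: subtraction mod 3 is not
```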
But, structures are also made of sets. This means that their domains have cardinalities.
This means that you can talk about the spectrum of a theory, which is the number of models up to isomorphism for each cardinality.
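The spectrum proper concerns infinite cardinalities, but for finite cardinalities the analogous count can be computed directly. As an illustration (the choice of theory, a single axiom of commutativity for one binary function symbol, and the dict encoding of structures are my own assumptions), here is a brute-force Python sketch counting the models of size 2 up to isomorphism:

```python
from itertools import product, permutations

def models(n):
    """All binary operations on {0, ..., n-1} satisfying commutativity,
    each encoded as a dict mapping (x, y) to a value."""
    dom = range(n)
    cells = list(product(dom, repeat=2))
    found = []
    for values in product(dom, repeat=len(cells)):
        op = dict(zip(cells, values))
        if all(op[(x, y)] == op[(y, x)] for x, y in cells):
            found.append(op)
    return found

def canonical(op, n):
    """Least relabeling of op under all permutations of the domain; two
    operations get the same canonical form iff they are isomorphic."""
    return min(tuple(sorted(((p[x], p[y]), p[v]) for (x, y), v in op.items()))
               for p in permutations(range(n)))

ms = models(2)
spectrum_point = len({canonical(op, 2) for op in ms})
print(len(ms), spectrum_point)  # 8 models of size 2, but only 4 up to isomorphism
```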
The two results that Noah Schweber mentioned in a comment constrain the shape of a theory's spectrum at infinite cardinalities. NOTE: I'm counting models up to isomorphism here, not "raw" models.
The Löwenheim–Skolem theorem says that if your theory (in a countable language) has an infinite model, then it has at least one model of every infinite cardinality.
Morley's categoricity theorem says that if a countable theory has exactly one model (up to isomorphism) of some uncountable cardinality $\kappa$, then it has exactly one model of every uncountable cardinality.
About intuition.
Aristotle introduced a notion of consequence along the following lines. Suppose $A$ and $B$ are sentences, each of them either true or false. We study the relation :

(1) $B$ follows of necessity from $A$, i.e. it is impossible for $A$ to be true and $B$ false.
Many logicians reckoned that the main use of logic is to derive conclusions from premises, with the obvious goal that if we start from true premises, and we use “logical rules”, we will derive true conclusions. This idea of logic gives rise to the relation :

(2) $B$ can be correctly derived from $A$ by way of logical rules.
Now we try to translate all this into modern terms.
(1) translates into the relation of logical consequence $A \vDash B$ : every interpretation that makes $A$ true also makes $B$ true.
We “generalize” it, putting in place of the sentence $A$ a set $\Gamma$ of sentences : $\Gamma \vDash \varphi$, which means that : every interpretation that makes all the sentences in $\Gamma$ true also makes $\varphi$ true.
Lastly, we have the “special case” when $\Gamma = \emptyset$ : $\vDash \varphi$ means that the sentence $\varphi$ is valid (in propositional logic it is a tautology), i.e. it is true under every interpretation.
In this case, we call it also logical (or universally) valid sentence, meaning that it is true “under all possible circumstances”.
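In propositional logic, “true under every interpretation” can be checked mechanically by a truth table. A minimal Python sketch of such a check (the example formulas are my own choices for illustration):

```python
from itertools import product

def implies(a, b):
    # Truth-functional conditional: (a -> b) is false only when a is true and b is false.
    return (not a) or b

def is_tautology(f, nvars):
    """|= f holds iff f comes out true under every assignment of truth values."""
    return all(f(*vals) for vals in product([False, True], repeat=nvars))

# Example: ((p -> q) -> (p -> (r -> q))) is valid...
def phi(p, q, r):
    return implies(implies(p, q), implies(p, implies(r, q)))

print(is_tautology(phi, 3))                            # True
# ...while a plain conditional like (p -> q) is not:
print(is_tautology(lambda p, q, r: implies(p, q), 3))  # False
```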
For (2) [see Neil Tennant, Natural logic (1978), page 5], we have that, in order to demonstrate the validity of an argument one needs a proof.
A proof of an argument is a sequence of steps leading from its premisses, taken as starting points, to its conclusion as end. Within a proof, each step, or 'inference', must be obviously valid.
A system of proof is a codification of these obviously valid kinds of inference, which we call rules of inference. A proof in accordance with these rules must, in order to meet the demands of certainty, satisfy the following conditions :
(i) It must be of finite size.
(ii) Every one of its steps must be effectively checkable.
A system of this type is a logical calculus (i.e. a proof system) formalizing the relation of derivability : $A \vdash B$, which is : $B$ can be derived from $A$ using the rules of inference of the calculus.
We “generalize” it, putting in place of the sentence $A$ a set $\Gamma$ of sentences : $\Gamma \vdash \varphi$.
Lastly, we have the “special case” when $\Gamma = \emptyset$ : $\vdash \varphi$ means that the sentence $\varphi$ is a theorem of the calculus (it is provable), i.e. it is derivable from no premisses.
Obviously there must be no invalid proofs in our system : the system must be sound.
Moreover we wish to be able to prove any valid argument expressible in the language. The system, that is, must be complete (or adequate).