First, I'm surprised that nobody has pointed out that reading $\vdash$ as "infers" is simply wrong: implies versus infers.
You might read $\vdash$ as "proves" or "entails". On the other hand, "infers" is roughly the same as "deduces". Saying $A\vdash B$ means that one can deduce $B$ from $A$; if you read $\vdash$ as "infers" you're reading $A\vdash B$ as "$A$ deduces $B$", which, regardless of whether it makes any sense, certainly does not mean the same thing.
On the difference between $\to$ and $\vdash$: $A\to B$ is just a formula in ur formal system; it does not say anything (it's not an assertion). On the other hand, $A\vdash B$ is a statement about the formulas $A$ and $B$; it says that given $A$ there is a proof of $B$ in whatever formal proof system we're taking about.
If the proof system is sound and complete then $A\vdash B$ is equivalent to "$A\to B$ is a tautology". But jumping from there to the conclusion that $A\vdash B$ is equivalent to $A\to B$ is wrong; "$A\to B$ is a tautology" is a statement about $A$ and $B$, while $A\to B$ is simply not a statement at all.
An analogy from algebra: if $x$ and $y$ are numbers then $x>y$ is a statement about $x$ and $y$, while $x-y$ is not a statement at all, it's just a number. It is true that "$x>y$ is equivalent to $x-y>0$", but if you concluded that "$x>y$ is equivalent to $x-y$" that would be clearly nonsense. Going from the true fact "$A\vdash B$ is equivalent to the statement that $A\to B$ is a tautology" to "$A\vdash B$ is equivalent to $A\to B$" is making exactly the same error
To answer the last paragraph first, there is no presumption of an interpretation here. To say whether or not $\to$ could be any arbitrary connective in this case, it depends on what you mean. If the question is whether, from the same axioms and rules of inference, you could derive the same thing for $\vee$ or $\wedge$, the answer is no; the proof rules treat the connectives differently. If the question is whether you can derive it regardless of what kind of changes you made to the semantics, the answer is yes; changing the semantics may break things like soundness or completeness, but it won't change the notion of proof.
And since this theorem is stated outside of the logic in question, it means the proof lives in some kind of metalanguage, but the metalanguage may be another formal language (like ZFC or something). In practice, the proofs you read will usually be presented informally, as a sketch of how you could potentially implement it in some more formal language, but lots of these theorems have been proved in a completely formal way with software like Coq.
As for how to prove it, I steal this approach for a Hilbert system from Bell & Machover's A Course in Mathematical Logic. First, we need to understand what a deduction is for a Hilbert-style proof system. Given a set of formulas $\Phi$, a deduction of $\alpha$ from $\Phi$ is a sequence of formulas $\varphi_1,\ldots,\varphi_n$ such that (A) $\varphi_n=\alpha$, and (B) for each $k$ with $1\leq k\leq n$, $\varphi_k$ is either an axiom, a member of $\Phi$, or there are $i,j\lt k$ with $\varphi_j=\varphi_i\to\varphi_k$ (i.e. $\varphi_k$ can be obtained by modus ponens from previous formulas in the proof). If there exists a sequence like this, we say $\Phi\vdash \alpha$.
So let's single out a particular member of $\Phi$ and call it $\beta$, and assume we have any particular deduction $D=[\varphi_1,\ldots,\varphi_n=\alpha]$. There are basically two cases.
Case 1: $\varphi_k$ is an axiom, or in $\Phi$, we have already that $\varphi_k\to (\beta\to\varphi_k)$ is an axiom, then we can add to our deduction $D$ the three additional steps
- $\varphi_k$
- $\varphi_k\to(\beta\to\varphi_k)$
- $\beta\to\varphi_k$
The new deduction is now a deduction $\Phi\vdash\beta\to\alpha$.
Case 2: If there are $i,j\lt k$ with $\varphi_j=\varphi_i\to\varphi_k$, then we can add to $D$ the new lines $$[\varphi_i\to(\beta\to\varphi_i),\:\beta\to\varphi_i,\:(\varphi_i\to\varphi_k)\to(\beta\to(\varphi_i\to\varphi_k),\:\beta\to(\varphi_i\to\varphi_k)].$$ Another axiom of most Hilbert systems is $$(\beta\to(\varphi_i\to\varphi_k))\to((\beta\to\varphi_i)\to(\beta\to\varphi_k)).$$ If in addition to the four steps we just added we also add this axiom and the obvious ones that follow from applying modus ponens, then the new deduction is again one that $\Phi\vdash\beta\to\varphi_k$. In particular, we obtain $\beta\to\alpha$ since $\alpha=\varphi_n$.
Best Answer
The theorem says that to prove an implication it is enough to assume the hypothesis and proceed to prove the conclusion. Proofs of that kind tend to be more natural than proofs that conclude the implication directly.
Just as in regular mathematical practice: many theorems have the form "Assuming $A$, then we have $B$", and we usually prove them by assuming $A$ and using it along the way to conclude $B$. And we feel we are done, even though the task was not to prove $B$, but instead "$A$ implies $B$".
So the theorem is saying that the rules of propositional calculus "capture" a common way of reasoning in practice. The more of such rules we manage to establish the easier it becomes to use the calculus to prove statements. The first few formal proofs always feel strange, perhaps even artificial. These results, such as the deduction theorem, aim at turning this rigid, "artificial" proof system into a natural and useful tool.
There is another key reason why the theorem is useful, now in the context of propositional logic itself, namely, it is a basic tool that helps us prove the completeness theorem. We establish other metatheorems along the way (such as: We can argue by contradiction), and all combined help us in making feasible the task of showing that the proof system is complete: We define a notion of truth for propositional formulas (via truth tables), and completeness says that the proof system is enough to deduce all true implications. This is a very useful first step in more elaborate contexts where we may want to mechanize proofs in more complicated logics.