I'm going to answer this in two parts. First I'll try to demystify the argument above, and then I'll say a bit about what an algorithm actually is or isn't.
The issue you raise is a common one - that the way we "defeat" $A$ feels circular, or at the very least slippery:
> The part that trips me up is that evaluating $A(B, int(B))$ in a programmatic manner would basically require invoking $A$ once again, resulting in an infinite loop.
The "loopiness" of $A(B,int(B))$ - namely that there isn't really any "coherent interpretation" of what that computation should be doing - is exactly what we're shooting for: the weirdness of $A(B,int(B))$ is a sign that our original $A$ is dubious.
I think the reason that this feels slippery is that our intuition frequently assigns fault incorrectly. The shape of the argument is basically: "given $A$, we build $B$, and then weird stuff happens." This can make it feel like the weirdness is $B$'s fault, that is, that $B$ is the source of the "loopiness." However, this isn't right. Rather, $A$ itself (were it to exist) has "loopiness," and our construction of $B$ is merely unveiling the weird (and indeed impossible) behavior of $A$.
It may help to rephrase the theorem as follows. Say that a machine $A$ is halting-correct iff for every machine $C$ and number $n$ we have $$A(C,n)=1\implies C(n)\downarrow\quad\mbox{and}\quad A(C,n)=0\implies C(n)\uparrow.$$ Basically, $A$ might not answer, but if $A$ answers then $A$ gets the question "Does $C(n)$ halt?" correct. There are lots of halting-correct programs, such as:
- On input $(C,n)$, run $C(n)$ for $17$ stages and output $1$ if the result halts; don't output anything otherwise.
- Just don't do anything on any input whatsoever.
- Etc.
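To make the first example concrete, here is a minimal Python sketch. The convention is entirely made up for illustration: a "program" is a generator function, each `yield` counts as one computation stage, and returning means halting.

```python
def run_for(prog, n, stages):
    """Run prog(n) for at most `stages` stages.
    True if it halted within the budget, else False."""
    g = prog(n)
    try:
        for _ in range(stages):
            next(g)
    except StopIteration:
        return True
    return False

def A(prog, n):
    """Halting-correct but far from total: answers 1 only when prog(n)
    demonstrably halts within 17 stages; otherwise gives no answer (None).
    It may fail to answer, but it never gives a *wrong* answer."""
    if run_for(prog, n, 17):
        return 1
    return None

def halts_fast(n):      # a program that halts after a few stages
    for _ in range(3):
        yield

def loops_forever(n):   # a program that never halts
    while True:
        yield
```

Here `A(halts_fast, 0)` answers $1$, while `A(loops_forever, 0)` declines to answer at all, exactly as halting-correctness permits.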
However, the argument above shows the following:
There is a "computable function from programs to programs," $\mathfrak{B}$, such that for every halting-correct $A$ we have $A(\mathfrak{B}(A),int(\mathfrak{B}(A)))\uparrow$.
(Compare this with the constructive version of Cantor's theorem: that there is a functional $\mathfrak{F}$ which takes in any map $f:\mathbb{N}\rightarrow\mathbb{R}$ and spits out a real $\mathfrak{F}(f)$ with $\mathfrak{F}(f)\not\in ran(f)$.)
Specifically, given a program $C$ the program $\mathfrak{B}(C)$ does the following: on input $X$, it runs $C(X, int(X))$. If that "subcomputation" never halts, then $\mathfrak{B}(C)(X)$ doesn't halt either. Otherwise, if that "subcomputation" halts and outputs $0$, $\mathfrak{B}(C)(X)$ halts and outputs $1$, and if that subcomputation halts and outputs $1$ then $\mathfrak{B}(C)(X)$ goes into an endless loop.

Note that there's nothing hypothetical here: $\mathfrak{B}$ really does make sense, and for every program $C$ the program $\mathfrak{B}(C)$ really does exist and behave as described. And in the particular case that $C$ is halting-correct, $\mathfrak{B}(C)$ happens to have the additional nice property that $C(\mathfrak{B}(C), int(\mathfrak{B}(C)))$ does not halt - which tells us in particular that there is no total halting-correct program, or to put it another way that the halting problem is incomputable.
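To stress how non-hypothetical $\mathfrak{B}$ is, here is a Python sketch. The `code` helper is a stand-in for a genuine Gödel numbering of programs (I've stubbed it with Python's `id`; any injective map into the integers would do), and programs are just Python functions:

```python
def code(prog):
    # Stand-in for a real Goedel numbering; Python's id() is injective
    # enough for this sketch.
    return id(prog)

def frak_B(C):
    """Given any program C, build the program B = frak_B(C):
    on input X, run the subcomputation C(X, code(X)) and invert it."""
    def B(X):
        result = C(X, code(X))   # may diverge; then B(X) diverges too
        if result == 0:          # C claims X(code(X)) doesn't halt...
            return 1             # ...so halt, with output 1
        while True:              # C claims X(code(X)) halts...
            pass                 # ...so loop forever
    return B

# A toy (NOT halting-correct) C that always answers 0, just to watch
# the inversion happen without risking an infinite loop:
always_no = lambda X, i: 0
```

For a halting-correct $C$ we of course cannot *run* the divergent computation $C(\mathfrak{B}(C), int(\mathfrak{B}(C)))$ to completion, but the flipping behavior itself is perfectly concrete: `frak_B(always_no)` halts with output $1$ on every input.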
OK, now on to the other issue: how do we think about what a program actually is?
You get at this when you write:
> I guess in this proof we are essentially thinking of the program $A$ as a lookup table which has a $0$ or $1$ for each $A(X, i)$. However, I feel like this goes against my understanding of computing in which I understand programs as things which operate upon inputs, with specific implementations, not predetermined lookup tables.
First of all, let me make a slight quibble. Actually $A$ is a "delayed lookup" table, or a lookup table with three variables: cell $(X,i,s)$ has either $0$, $1$, or $\Box$, depending on whether $A(X,i)$ has halted and output $0$ by stage $s$, has halted and output $1$ by stage $s$, or has not yet halted by stage $s$. (Without this delay we'd be able to tell ahead of time what a program is going to do!)
This is extremely important, but it's not relevant to what I think is your actual concern here: that lookup tables, no matter how many "dimensions" they have, are structureless or arbitrary in a way that real algorithms aren't. And the answer, unfortunately, is that this is just the way it is. Genuinely arbitrary programs really are morally equivalent to tables of values. Of course they'll often be presented as dynamical phenomena, like Turing machines, but that additional structure is really superficial once we look at the whole class of computable functions.
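To illustrate the quibble above: under the same toy convention as before (a program is a generator whose `yield`s count as stages and whose return value is its output — a made-up interface, not anything canonical), the three-variable "delayed lookup table" can be sketched as:

```python
BOX = object()   # the blank entry: "has not yet halted by stage s"

def table_entry(prog, X, i, s):
    """Cell (X, i, s) of the delayed lookup table for `prog`:
    the output of prog(X, i) if it has halted by stage s, else BOX."""
    g = prog(X, i)
    for _ in range(s):
        try:
            next(g)
        except StopIteration as stop:
            return stop.value   # halted by stage s: record its output
    return BOX                  # not yet halted by stage s

def example(X, i):
    # Runs a few stages, then halts with output 1.
    yield
    yield
    yield
    return 1
```

Reading off early cells gives `BOX`, and only later cells reveal the output — without the delay, the table would "know the future" of the computation.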
The disconnect between computable functions in full generality and the specific concrete algorithms we play around with in everyday life is a genuine intuitive hurdle. I think my take on it is the following. When I describe (say) the Euclidean algorithm to you, I'm not just describing an algorithm; I'm also describing its semantics, namely that the running of the program is paralleled by the transformation of some better-understood mathematical object (a pair of integers in this case). That semantic behavior is what I really care about, and it's what I have in mind when I prove that the Euclidean algorithm always halts.
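For instance, here is the Euclidean algorithm in Python, with the semantic interpretation recorded in the comments: the loop is paralleled by a transformation of a pair of integers, and that mathematical object is what carries the termination and correctness proofs.

```python
def euclid(a, b):
    """Euclidean algorithm for gcd(a, b), with a, b >= 0.
    Semantics: the invariant gcd(a, b) is preserved at every step,
    while b strictly decreases -- the 'better-understood mathematical
    object' whose behavior proves the program always halts."""
    while b != 0:
        a, b = b, a % b   # gcd(a, b) == gcd(b, a % b), and a % b < b
    return a
```

The program text alone is just a state-transition rule; the invariant in the comment is the semantics that makes it meaningful.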
The point is that this semantic interpretation is useful because it's more natural than the program itself. However, an arbitrary "Turing machine in the wild" need not have any obvious "natural semantics" describing its behavior; of course for any reasonable definition of "semantics" we can artificially produce one, but this will in general amount to just rephrasing the original machine itself. (For that matter, the Turing machine model is itself a semantics in some sense!)
So yes, we do in general have to adopt a more "austere" notion of what a program is or is doing. This will however become much more intuitive and comfortable over time. (Incidentally, a natural reaction at this point is "Well, why don't we study 'meaningful algorithms' instead of arbitrary computations?" Well, it turns out that there are some issues there.)
For the first question, I'll give a proof at about the same level of rigor as the notion of "behavior" of a computable function; without a slightly firmer definition, there's not much more to be done. Indeed, as Noah pointed out in the comments, for sufficiently loose definitions of "behavior" the theorem becomes false. This sounds like Rice's theorem, and so I will basically reproduce a proof of that.
As you've noticed, behavior is a more general condition than halting. So we want to imitate the proof that $\mathsf{Halts}$ is undecidable and see what happens. The biggest issue is that we cannot directly plug our function into itself for a contradiction. We will need the following fact:
> Enumerate the Turing machines as $\mathsf{TM}_n$. For every computable $Q(x,y)$, there is an $e$ so that $\mathsf{TM}_e(y) = Q(e,y)$.
That is, the $e$th Turing machine is the same as $Q(e,-)$. This is Kleene's recursion theorem; it is commonly used in diagonalization arguments with Turing machines, and it's a good one to have in your back pocket.
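A deliberately crude Python analogue can take some of the mystery out of this. Instead of indices in a fixed enumeration, we let a program receive the function object itself in place of its "index" — far weaker than the real theorem, but it shows the self-reference in action:

```python
def fixed_point(Q):
    """Toy analogue of the recursion theorem: given computable Q(x, y),
    produce a program p with p(y) == Q(p, y).  Here a program's 'index'
    is simply the function object itself, standing in for a genuine
    index in an enumeration of Turing machines."""
    def p(y):
        return Q(p, y)   # p computes Q applied to (its own 'index', y)
    return p

# Example: a program defined in terms of its own index -- it calls
# 'itself' on smaller inputs via the index it was handed.
count_down = fixed_point(lambda e, y: 0 if y == 0 else e(y - 1) + 1)
```

Here `count_down(y)` unwinds to `y` by repeatedly invoking its own index, just as $\mathsf{TM}_e$ may freely mention $e$.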
Fix a behavior $b$. We claim there is no computable $B$ so that $B(f) = 1$ if and only if $f$ behaves like $b$.

Indeed, towards a contradiction, say $B$ exists. We will assume the behavior is nontrivial; that is, we can find computable functions $f$ and $g$ which do, and don't, satisfy the behavior (respectively). Then we can define a new function $\mathsf{UhOh}$ as follows:
$$\mathsf{UhOh}(x,y) = \begin{cases} g(y) & B(\mathsf{TM}_x) \\ f(y) & \lnot B(\mathsf{TM}_x) \end{cases}$$
Note, as in the proof of the undecidability of the halting problem, that $\mathsf{UhOh}$ does the opposite of its input. That is, $$B(\mathsf{UhOh}(x,-)) \iff \lnot B(\mathsf{TM}_x).$$
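Since no such $B$ can actually exist, any executable illustration has to cheat. Here is a deliberately toy Python model — a finite "enumeration" of just two machines, and a behavior that *is* decidable by inspection in this tiny world — built only to watch the flip in the displayed equivalence:

```python
# Toy model: the behavior b is "outputs 0 on input 0".
f = lambda y: 0          # has behavior b
g = lambda y: 1          # lacks behavior b
TM = [f, g]              # our (tiny, fake) enumeration of machines

def B(h):
    # 'Decides' the behavior by just running h -- only possible
    # because this toy behavior is decidable; the real B cannot exist.
    return h(0) == 0

def UhOh(x, y):
    # Do the opposite of machine x: if TM[x] has the behavior,
    # act like g (which lacks it); otherwise act like f (which has it).
    return g(y) if B(TM[x]) else f(y)
```

In this model one can check directly that $B(\mathsf{UhOh}(x,-)) \iff \lnot B(\mathsf{TM}_x)$ for each $x$; the recursion theorem is what lets the real argument turn this flip into a contradiction.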
Do you see the contradiction around the bend? Let's use the theorem cited above! We know there is some $e$ so that $\mathsf{TM}_e(y) = \mathsf{UhOh}(e,y)$. Now, we ask the question: $B(\mathsf{TM}_e)$?
$$B(\mathsf{TM}_e) \iff B(\mathsf{UhOh}(e,-)) \iff \lnot B(\mathsf{TM}_e)$$
UhOh indeed.
Thankfully, the second question is much easier. We want to know that "$h(x)$ is undefined" is not a semidecidable thing to check.
Recall that "$h(x)$ is defined" is a semidecidable thing to check. This is because if $h(x)$ is defined, then its computation will halt in, say, $N$ steps. Then by waiting long enough we can confirm that $h(x)$ is defined.
Now we remember the following fact: whenever $P$ and $\lnot P$ are semidecidable, they must both be decidable. This is because we can run our semideciders for $P$ and $\lnot P$ in parallel, and we know that one of them will give us an answer. But once we've answered one, we've answered both!
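The parallel-running trick can be sketched in Python. Here a semidecider is modeled (an assumption for this sketch) as a function `(x, steps) -> bool` reporting whether it has accepted `x` within `steps` steps; dovetailing through larger and larger step budgets combines the two into a full decider:

```python
def decide(semi_P, semi_notP, x):
    """Combine semideciders for P and for not-P into a decider by
    dovetailing: run each for 1 step, then 2, then 3, ...
    Terminates because exactly one of P(x), not-P(x) holds, so one
    semidecider eventually accepts."""
    steps = 1
    while True:
        if semi_P(x, steps):
            return True       # P(x) holds
        if semi_notP(x, steps):
            return False      # not-P(x) holds
        steps += 1

# Toy example: P(x) = "x is even", with artificial detection delays
# standing in for the unknown halting times.
semi_even = lambda x, steps: x % 2 == 0 and steps >= 5
semi_odd  = lambda x, steps: x % 2 == 1 and steps >= 3
```

Neither semidecider alone ever rejects; it is only by racing them that we get a yes-or-no answer for every input.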
So if "$h(x)$ is undefined" were semidecidable, then "$h(x)$ is defined" would be decidable. But this is obviously false (cf. the halting problem).
(Notice that while we derive the contradiction via the halting problem, this is not a reduction to $\mathsf{Halts}$.)
I hope this helps ^_^
Best Answer
Reduction. That is almost always the method that we use. Details of the reduction process differ, depending on the problem. But we almost always show that if there were an algorithm for our specific kind of problem, then we could use that algorithm as a subroutine to produce an algorithm that solves the Halting Problem.
Define machine $M_n$ as follows. Given input $0$, $M_n$ produces output $111$ and halts. Given any other input, such as $1$, $M_n$ produces the number $n$, then imitates the behaviour of a universal Turing machine on input $(n,n)$, except that if it halts, it produces output $111$. The index of $M_n$ is a recursive function of $n$.
If we had a general algorithm for determining whether a machine has the same halting behaviour on inputs $0$ and $1$, we could apply it to the machines $M_n$, and have an algorithm for determining whether a universal Turing machine halts on input $(n,n)$. However, the usual diagonalization argument shows that there is no such algorithm.
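As a sketch of the construction, assume a universal simulator with the (hypothetical) interface `U(a, b)`, which runs machine `a` on input `b` and may diverge. Then $M_n$ looks like:

```python
def make_M(n, U):
    """Build the machine M_n from the reduction above.  U(a, b) is an
    assumed universal simulator: it runs machine a on input b, and it
    may diverge -- in which case M_n diverges too on that input."""
    def M(inp):
        if inp == 0:
            return "111"   # input 0: halt immediately with output 111
        U(n, n)            # any other input: imitate U on (n, n)...
        return "111"       # ...and output 111 if that simulation halts
    return M

# With a U that halts on (n, n), M_n halts (with 111) on every input;
# with a diverging U, M_n halts on 0 but not on 1.
halting_U = lambda a, b: None   # stub simulator that always halts
M = make_M(42, halting_U)
```

So $M_n$ has the same halting behaviour on inputs $0$ and $1$ exactly when $U(n,n)$ halts, and a decider for that sameness would decide the diagonal halting problem.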