Reduction. That is almost always the method that we use. Details of the reduction process differ, depending on the problem. But we almost always show that if there were an algorithm for our specific kind of problem, then we could use that algorithm as a subroutine to produce an algorithm that solves the Halting Problem.
Define machine $M_n$ as follows. Given input $0$, $M_n$ produces output $111$ and halts. Given any other input, such as $1$, $M_n$ produces the number $n$, then imitates the behaviour of a universal Turing machine on input $(n,n)$, except that if it halts, it produces output $111$. The index of $M_n$ is a recursive function of $n$.
If we had a general algorithm for determining whether a machine has the same halting behaviour on inputs $0$ and $1$, we could apply it to the machines $M_n$, and have an algorithm for determining whether a universal Turing machine halts on input $(n,n)$. However, the usual diagonalization argument shows that there is no such algorithm.
This is not what you asked for, because it's not a proof. But it is an argument.
Consider some famous and unresolved problem of mathematics, such as the twin primes conjecture. (Or the Collatz conjecture, the Goldbach conjecture, or, until recently, the Fermat conjecture.) I am sure you recall that the conjecture is that there are arbitrarily large numbers $p$ and $p+2$ that are both prime. ("Twin primes")
It's easy to write a computer program which, given an input $N$, looks for a pair of twin primes larger than $N$, and which halts when it finds such a pair.
The twin primes conjecture is true if, and only if, this program halts on all inputs.
Therefore, if there were a reliable way to tell if a program halts on all inputs, the twin primes conjecture would be easy to resolve, and we could conclude from the fact that it is not resolved that mathematicians are all a bunch of dummies.
Now maybe your students don't care about the twin primes conjecture and perhaps they are not sure that mathematicians are not all a bunch of dummies. But they are familiar with problems and puzzles in general, and they are probably familiar with the idea that it is sometimes not only hard to find solutions, but it can even be hard to see ahead of time if if there is a solution. They can probably be persuaded that you can get a computer to search exhaustively for the solution to some problem, and halt and print it out if it finds one.
A solution to the halting problem would mean that a very large class of problems could be easily solved. You would write a program to search for a solution, and then, instead of running it, you would analyze the program to see if it would halt. If the answer was yes, you would know there was a solution, and all you had to do to find it was to run the program. On the other hand if the program would not halt, you wouldn't have to bother to run it because you would know there was no solution.
That would be a very different world from the one we actually live in, and it may be possible to persuade the students of that.
Best Answer
For the first question, I'll give a proof with about the same level of rigor as the notion of "behavior" of a computable function. Without a slightly firmer definition, there's not much more to be done. Indeed, as Noah pointed out in the comments, for sufficiently loose definitions of "behavior", the theorem becomes false. This sounds like rice's theorem, and so I will basically reproduce a proof of that.
As you've noticed, behavior is a more general condition than halting. So we want to imitate the proof that $\mathsf{Halts}$ is undecidable and see what happens. The biggest issue is that we cannot directly plug our function into itself for a contradiction. We will need the following fact:
That is, the $e$th turing machine is the same as $Q(e,-)$. This theorem is commonly used in diagonalization arguments with turing machines, and it's a good one to have in your back pocket.
Let $f$ be a computable function, and $b$ be a behavior that we want it to have. We claim there is no function $B$ so that $B(f) = 1$ if and only if $f$ behaves like $b$.
Indeed, towards a contradiction say $B$ exists. We will assume the behavior is nontrivial. That is, we can find functions $f$ and $g$ which do, and don't, satisfy the behavior. Then we can define a new function $\mathsf{UhOh}$ as follows:
$$\mathsf{UhOh}(x,y) = \begin{cases} g(y) & B(\mathsf{TM}_x) \\ f(y) & \lnot B(\mathsf{TM_x}) \end{cases}$$
Note, as in the solution to the halting problem, that $\mathsf{UhOh}$ does the opposite of its input. That is, $$B(\mathsf{UhOh}(x,-)) \iff \lnot B(\mathsf{TM}_x).$$
Do you see contradiction around the bend? Let's use the theorem cited above! We know there is some $e$ so that $\mathsf{TM}_e(y) = \mathsf{UhOh}(e,y)$. Now, we ask the question: $B(\mathsf{TM_e})$?
$$B(\mathsf{TM_e}) \iff B(\mathsf{UhOh}(e,-)) \iff \lnot B(\mathsf{TM}_e)$$
UhOh indeed.
Thankfully, the second question is much easier. We want to know that "$h(x)$ is undefined" is not a semidecidable thing to check.
Recall that "$h(x)$ is defined" is a semidecidable thing to check. This is because if $h(x)$ is defined, then its computation will halt in say, $N$ steps. Then by waiting long enough we can check that $h(x)$ is defined.
Now we remember the following fact: Whenever $P$ and $\lnot P$ are semidecidable, they must both be decidable. This is because we can run our semideciders for $P$ and $\lnot P$ in parallel, and we know that one of them will give us an answer. But once we've answered, one, we've answered both!
So if "$h(x)$ is undefined" were semidecidable, then "$h(x)$ is defined" would be decidable. But this is obviously false (cf. the halting problem).
(Notice that while we contradict by using the halting problem, this is not a reduction to $\mathsf{Halts}$.)
I hope this helps ^_^