Good questions; I'm sure a lot of people are confused on this stuff (as I was the first time I used Jackson).
Essentially your confusion boils down to being careful to consider the following fact:
The Green's function for a particular boundary value problem depends on the boundary conditions.
In particular, let's say you have a Dirichlet boundary value problem. Then, as Jackson shows on page 39, the appropriate Green's function for such a boundary value problem must (a) satisfy Poisson's equation with a delta function source in that region and (b) vanish on the boundary (see eq. 1.43) of that region. If you can find a function that has these two properties in the region you are considering, then you have found the Green's function for the Dirichlet problem.
So if we consider the half space ($z>0$) with Dirichlet boundary conditions at $z=0$, then we are looking for a function that satisfies Poisson's equation in the upper half space with unit source and which vanishes on the boundary, which in this case is the plane $z=0$ plus the "boundary at infinity."
You can check yourself that the function
$$
G(\mathbf x, \mathbf x') = \frac{1}{|\mathbf x - \mathbf x'|} - \frac{1}{|\mathbf x - \mathbf x'^*|}, \qquad \mathbf x'^* \equiv (x', y', -z')
$$
has these properties. The intuition for this, and the reason why most people say you can just immediately write this down, is that the first term clearly satisfies Poisson's equation with unit source in the upper half space, where $\mathbf x'$ is taken to have $z'>0$, and the second term corresponds to a unit source of opposite sign at the reflection of $\mathbf x'$ through the plane $z=0$, i.e. in the lower half space. Our intuition about the potentials of point charges indicates that this combination will vanish at $z=0$. Also, the Laplacian of the second term vanishes in the upper half space, so it does not affect the fact that, in the upper half space, this function satisfies Poisson's equation with unit source at $\mathbf x'$.
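These two properties are easy to check numerically. Here is a short sketch of my own (the function name is just illustrative, not from Jackson), taking the image point at $(x', y', -z')$:

```python
import math

def green_halfspace(x, xp):
    """Sketch of the Dirichlet Green's function for the upper half space z > 0:
    a unit source at xp plus an opposite-sign image source at the reflection
    of xp through the plane z = 0."""
    img = (xp[0], xp[1], -xp[2])          # image point (x', y', -z')
    return 1.0 / math.dist(x, xp) - 1.0 / math.dist(x, img)

# On the boundary z = 0 the source and image terms cancel exactly,
# while at interior points the function is nonzero.
on_boundary = green_halfspace((1.0, 2.0, 0.0), (0.5, 0.3, 1.7))
interior    = green_halfspace((1.0, 2.0, 1.0), (0.5, 0.3, 1.7))
```

For a point on $z=0$ the two distances are equal by construction, so the cancellation is exact rather than approximate.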
I hope that was clear? I can definitely try to clean it up or expand on this. I know from personal experience that it's a confusing topic!
Addendum in response to comments.
Green's functions are associated with two pieces of data: (1) a region and (2) boundary conditions on that region. The function $1/|\mathbf x-\mathbf x'|$ is the Green's function for (1) all of space with (2) Dirichlet boundary conditions. This is because it (a) satisfies Poisson's equation with unit source in that region and (b) vanishes on the boundary of that region (which in this case is at infinity). In general, for any region $R$ with Dirichlet boundary conditions, as long as we find a function $G(\mathbf x,\mathbf x')$ that (a) satisfies Poisson's equation with a unit source placed in that region and (b) vanishes on the boundary of the region, then we have found the Green's function for that Dirichlet problem (by the definition of a Green's function).
When we are trying to find the Green's function for the Dirichlet problem on the upper half space, we first imagine putting a point charge in the upper half space so that condition (a) is satisfied; this leaves us with the function $1/|\mathbf x-\mathbf x'|$, where $\mathbf x'$ is a point in the upper half space. Then we notice that although this function is an appropriate solution to Poisson's equation, it does not vanish for $\mathbf x$ on the boundary, so it can't be the Green's function for this Dirichlet problem. We need to do something to this function that does not spoil the fact that it satisfies the unit-source Poisson equation in the upper half space, but such that the resulting function additionally satisfies the appropriate boundary condition.
So we ask ourselves "what can we do to this function so that (a) remains satisfied, but so that (b) is also satisfied in the upper half space?" Well, we notice that if we add to $1/|\mathbf x-\mathbf x'|$ any function that satisfies Laplace's equation (Laplacian equal to zero) in the upper half space, then the resulting function will still satisfy (a).
Now what sorts of functions satisfy Laplace's equation in the upper half space?
The answer is that any charge distribution whose charge density is nonzero only outside of the upper half space will create a potential that satisfies Laplace's equation in the upper half space.
So if we can find a charge distribution that when placed in the lower half space produces a potential that when added to $1/|\mathbf x-\mathbf x'|$ causes their sum to vanish on the boundary, then their sum will satisfy the properties required of a Green's function. This is where we notice that an "image" point charge will do exactly this!
All we are doing with these image charges is using an intuitive way of finding a function that satisfies the mathematical properties (a) and (b) that a Green's function for a Dirichlet problem must satisfy.
There are many possible choices for the overall scaling coefficients, as well as for the coefficient relating time and frequency in the exponent. It is possible to summarize these conventions succinctly using two numbers $a$ and $b$. I use the same notation as the Mathematica FourierTransform function (its FourierParameters option).
We define the Fourier Transform:
$$
\mathcal{FT}_{a,b}[f(t)](\omega) = \sqrt{\frac {|b|}{(2\pi)^{1-a}}}\int_{-\infty}^{+\infty} e^{+i b \omega t} f(t) dt
$$
And the inverse Fourier Transform
$$
\mathcal{FT}_{a,b}^{-1}[\tilde{f}(\omega)](t) = \sqrt{\frac{|b|}{(2\pi)^{1+a}}}\int_{-\infty}^{+\infty} e^{-i b \omega t} \tilde{f}(\omega) d\omega
$$
Let
$$
\tilde{f}_{a,b}(\omega) = \mathcal{FT}_{a,b}[f(t)](\omega)
$$
$$
\check{f}_{a,b}(t) = \mathcal{FT}_{a,b}^{-1}[\tilde{f}_{a,b}(\omega)](t)
$$
It can be shown via the Fourier inversion theorem that, for the classes of functions we care about in physics, $\check{f}_{a,b}(t) = f(t)$ for any $a$ and $b$. That is, for these definitions of the Fourier transform and inverse Fourier transform, the two operations are inverses of each other.
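This round trip can be checked numerically. The sketch below (my own illustration; the function names `ft` and `ift` are made up) implements both definitions with a crude midpoint rule and verifies $\check{f}_{a,b}(t) = f(t)$ on a Gaussian for two choices of $(a,b)$:

```python
import cmath, math

def ft(f, omegas, a, b, tmin=-10.0, tmax=10.0, n=800):
    """FT_{a,b}[f] sampled on the grid `omegas`, midpoint rule on [tmin, tmax]."""
    dt = (tmax - tmin) / n
    ts = [tmin + (k + 0.5) * dt for k in range(n)]
    pref = math.sqrt(abs(b) / (2 * math.pi) ** (1 - a))
    return [pref * dt * sum(cmath.exp(1j * b * w * t) * f(t) for t in ts)
            for w in omegas]

def ift(vals, omegas, t, a, b):
    """Inverse FT_{a,b} at time t from samples `vals` on the uniform grid `omegas`."""
    dw = omegas[1] - omegas[0]
    pref = math.sqrt(abs(b) / (2 * math.pi) ** (1 + a))
    return pref * dw * sum(v * cmath.exp(-1j * b * w * t)
                           for v, w in zip(vals, omegas))

# Round trip on a Gaussian for two common conventions.
f = lambda t: math.exp(-t * t)
omegas = [-9.0 + (k + 0.5) * 0.06 for k in range(300)]
recovered = {(a, b): ift(ft(f, omegas, a, b), omegas, 0.4, a, b)
             for (a, b) in [(1, -1), (0, 1)]}
```

Since the Gaussian and its transform both decay fast, truncating the integrals at finite ranges costs essentially nothing, and the same recovered value comes out for either convention.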
It turns out that in the engineering and scientific literature there are many conventions, chosen mostly according to what people are used to.
The first convention in the OP is $(a,b) = (1,-1)$ which is commonly used in physics, about as commonly as $(a,b) = (1,+1)$ which is the second convention you have shown.
In addition you will also see conventions where $(a,b) = (0,\pm1)$ where the factor of $2\pi$ is split evenly between the transform and inverse transform showing up with a square root.
Furthermore, in math or signal processing you will often come across the $(a,b) = (0,\pm 2\pi)$ convention, in which there is no prefactor of $2\pi$ on either the transform or the inverse transform, but now the transform variable is a cyclic frequency rather than the angular frequency $\omega$, and a $2\pi$ appears in all of the exponentials.
All of these different conventions have advantages and disadvantages which may make one choice of convention more attractive than another depending on the application. The main point is that in any problem, whichever convention is chosen should be kept the same throughout the whole problem.
To get back to the OP's main question now. In the language set up in this answer the OP is basically asking if it matters whether $b=+1$ or $b=-1$. The short answer is that it does not matter. Either way works and converts the original signal as a function of time into a function of frequency. The difference has to do with how we interpret positive and negative frequencies.
Consider
$$
f^1(t) = e^{+i\omega_0 t}
$$
$$
f^2(t) = e^{-i \omega_0 t}
$$
The phasor for the first function rotates counterclockwise in the complex plane, whereas the second rotates clockwise.
If we choose the $b=-1$ convention then $\tilde{f}^1_{1,-1}(\omega)$ will have a nonzero contribution at $+\omega_0$ whereas $\tilde{f}^2_{1,-1}(\omega)$ will have a nonzero contribution at $-\omega_0$. We might say $f^1$ is a positive frequency signal while $f^2$ is negative.
However, if we choose $b=+1$ then everything reverses: $\tilde{f}^1_{1,+1}(\omega)$ will have a nonzero contribution at $-\omega_0$, while $\tilde{f}^2_{1,+1}(\omega)$ will have a contribution at $+\omega_0$. Now $f^1$ is negative frequency and $f^2$ is positive frequency!
Thus we see that both $b=+1$ and $b=-1$ give answers that we can interpret as frequencies, with the only difference between the two being what we call positive and negative frequencies. As a note, I personally prefer $(a,b)=(1,+1)$ because it makes the formula for the Fourier transform (which I use more often than the inverse transform) as simple as possible: no prefactor and no minus sign in the exponent.
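The sign flip is easy to see numerically. A small sketch of my own (the helper name `windowed_ft` is made up) computes the finite-window transform integral $\int_{-T/2}^{T/2} e^{i b \omega t} f(t)\,dt$ and shows where $f^1(t)=e^{+i\omega_0 t}$ peaks for each sign of $b$:

```python
import cmath

def windowed_ft(f, omega, b, T=200.0, n=20000):
    """Finite-window transform integral of f against exp(i*b*omega*t), midpoint rule."""
    dt = T / n
    ts = (-T / 2 + (k + 0.5) * dt for k in range(n))
    return dt * sum(cmath.exp(1j * b * omega * t) * f(t) for t in ts)

w0 = 2.0
f1 = lambda t: cmath.exp(1j * w0 * t)   # counterclockwise phasor

# b = -1: the spectrum of f1 is concentrated at +w0 ...
peak_minus = abs(windowed_ft(f1, +w0, b=-1))   # ~ T (large)
off_minus  = abs(windowed_ft(f1, -w0, b=-1))   # ~ O(1/w0) (small)
# ... while b = +1 moves the same signal to -w0.
peak_plus  = abs(windowed_ft(f1, -w0, b=+1))
off_plus   = abs(windowed_ft(f1, +w0, b=+1))
```

The same signal produces a large spike at $+\omega_0$ for $b=-1$ and at $-\omega_0$ for $b=+1$, which is exactly the relabeling of positive and negative frequencies described above.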
edit: As you have pointed out sometimes these signs can have a substantial effect on some physical quantity such as reversing the sign (inverting the phase) of the complex impedance of a capacitor. Unfortunately this is something we just have to deal with and try to be consistent with our own conventions and those used by the references we consult. Of course you will find both conventions give the same answer for a real measurable quantity such as $V(t)$ across the resistor.
Your problem is not specific to d'Alembert's PDE, but can be traced back to ODEs, even the simple harmonic oscillator. Say you want to solve: $$ \frac{d^2G}{dt^2}+G = \delta $$ Going to Fourier transform: $$ \tilde G(\omega)=\int dt\, e^{i\omega t}G(t) \\ G(t)=\int \frac{d\omega}{2\pi}e^{-i\omega t}\tilde G(\omega) \\ (-\omega^2+1)\tilde G = 1 $$ You want to divide by $1-\omega^2$. However, when $\omega=\pm1$ this is zero, so the "value" there is undefined. Using the theory of distributions, you can prove that $\tilde G$ is necessarily: $$ \tilde G = PV\frac{1}{1-\omega^2}+C_+\delta(\omega-1)+C_-\delta(\omega+1) $$ with $PV$ denoting the Cauchy principal value and $C_\pm$ constants. The converse is easy to check, using the fact that: $$ f(t)\delta(t) = f(0)\delta(t) $$ Going back to time: $$ G = \frac{\sin |t|}{2} +C_+e^{-it}+C_-e^{it} $$ (rescaling the $C_\pm$ to absorb the $1/2\pi$), and you see that $C_\pm$ are specified by the initial conditions $G(t_0),\frac{dG}{dt}(t_0)$.
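One can check the $PV$ part directly: away from $t=0$, $\sin|t|/2$ solves the homogeneous equation, and its slope jumps by exactly $1$ across the origin, which is what the $\delta$ source requires. A quick finite-difference sketch (my own illustration):

```python
import math

def g(t):
    """Inverse transform of PV 1/(1 - w^2): a particular solution of G'' + G = delta."""
    return math.sin(abs(t)) / 2.0

h = 1e-5
# Away from t = 0 it solves the homogeneous equation: G'' + G = 0.
t = 1.0
residual = (g(t + h) - 2 * g(t) + g(t - h)) / h**2 + g(t)
# Across t = 0 the first derivative jumps by 1, reproducing the delta source.
jump = (g(h) - g(0)) / h - (g(0) - g(-h)) / h
```

The homogeneous pieces $C_\pm e^{\mp it}$ would change neither the residual away from the origin nor the jump, which is why they remain free until initial conditions are imposed.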
Back to your case: the same situation happens when you divide by $k^2-\omega^2$. From: $$ (-\omega^2+k^2)\tilde G = 1 $$ you deduce: $$ \tilde G = PV\frac{1}{-\omega^2+k^2}+C_+(k)\delta(\omega-|k|)+C_-(k)\delta(\omega+|k|) $$ where now the $C_\pm$ are functions defined on $\mathbb R^3$. They can be determined from the Fourier transforms of $G,\partial_tG$ at $t_0$, which gives the unique solution to the Cauchy problem.
In classical physics, you are often interested in the causal solutions, i.e. $G$ supported on positive times (in field theory, you are more often interested in the Feynman propagator, which you obtain by Wick rotation or time ordering). It turns out, using Jordan's lemma and the residue theorem, that with the conventions above this amounts to taking the limit $\epsilon\to 0^+$ of: $$ \tilde G = \frac{1}{k^2-(\omega+i\epsilon)^2} $$ and more generally making the substitution $\omega\to \omega+i\epsilon$, which pushes both poles into the lower half plane.
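With the $e^{-i\omega t}$ inverse-transform kernel used above, this prescription can be checked numerically: keeping $\epsilon$ small but finite, the inverse transform of $1/(k^2-(\omega+i\epsilon)^2)$ is (up to truncation error) zero for $t<0$ and an exponentially damped $\sin(kt)/k$ for $t>0$. A rough sketch of my own, not optimized:

```python
import cmath, math

def g_ret(t, k=1.0, eps=0.2, omega_max=400.0, n=80000):
    """Inverse transform (1/2pi) * integral of e^{-i w t} / (k^2 - (w + i*eps)^2),
    approximated by a midpoint rule on [-omega_max, omega_max]."""
    dw = 2 * omega_max / n
    s = 0j
    for j in range(n):
        w = -omega_max + (j + 0.5) * dw
        s += cmath.exp(-1j * w * t) / (k * k - (w + 1j * eps) ** 2)
    return (s * dw / (2 * math.pi)).real

# Causal support: (essentially) zero before t = 0, damped sin(k*t)/k after.
before = g_ret(-2.0)
after  = g_ret(+2.0)       # analytically exp(-eps*t) * sin(k*t) / k
```

Closing the contour in the residue calculation reproduces exactly these two behaviors: for $t<0$ the contour closes in the (pole-free) upper half plane, for $t>0$ it picks up both poles at $\omega = \pm k - i\epsilon$.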
Hope this helps.
Answer to comment
a) Actually the prescription retarded/advanced/Feynman will determine the $C_\pm$. Once again, there is an ambiguity in defining $\frac{1}{k^2-\omega^2}$ which these prescriptions help resolve. If you want to interpret it as the Cauchy $PV$, then you need to add the appropriate $C_\pm$.
However, in this case, it is easier to think directly in terms of contour integration since you are not interested in solving the Cauchy problem. Indeed, writing explicitly what the prescriptions mean for the $C_\pm$ is needlessly complicated.
b) It depends. The problem arises when you divide by zero, in which case you need additional information to uniquely define your Green's function (boundary conditions, causality, ...). However, this is not always necessary. Take for example the operator $-\Delta+m^2$ with $m$ real in arbitrary dimensions. Its Green's function is (in Fourier): $$ \tilde G = \frac{1}{k^2+m^2} $$ which makes sense no matter what, since the denominator never vanishes. Adding boundary conditions would overdetermine it.
An important caveat is that when you are taking Fourier transforms, you are restricting your attention to tempered solutions. In general, you could have more solutions thus requiring further restrictions. Take for example a modification of the previous simple case: $$ -\frac{d^2G}{dt^2}+G = \delta $$ By Fourier transform, you obtain the unique tempered Green's function
$$ \tilde G = \frac{1}{1+\omega^2} \\ G = \frac{e^{-|t|}}{2} $$
however, if you are looking for distribution solutions in general, you could get for example: $$ G = \frac{e^{-|t|}}{2}+C_+e^t+C_-e^{-t} $$ with $C_\pm$ arbitrary constants that can be fixed by Cauchy boundary conditions. Note that when the coefficients are not zero, $G$ is not tempered, so the usual Fourier transform is ill defined.
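As a numerical aside (my own sketch), the unique tempered solution above can be recovered by inverting $1/(1+\omega^2)$ directly; the denominator never vanishes, so no $PV$ or pole prescription is needed:

```python
import cmath, math

def g_tempered(t, omega_max=200.0, n=40000):
    """Inverse transform (1/2pi) * integral of e^{-i w t} / (1 + w^2),
    approximated by a midpoint rule on [-omega_max, omega_max]."""
    dw = 2 * omega_max / n
    s = 0j
    for k in range(n):
        w = -omega_max + (k + 0.5) * dw
        s += cmath.exp(-1j * w * t) / (1.0 + w * w)
    return (s * dw / (2 * math.pi)).real

val = g_tempered(1.3)      # should be close to exp(-1.3)/2
```

The non-tempered pieces $C_\pm e^{\pm t}$ are invisible to this computation, which is precisely the point: the Fourier route only ever sees the tempered solution.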
In physics, you usually sweep such technical assumptions (temperedness, ...) under the rug. To really guarantee uniqueness you'll need to do some rigorous math; check out any course on distributions for more information.