As $t \to 0$ the kernel $K(x,t)$ tends to the Dirac delta function $\delta(x)$ (which is not actually a function, but a so-called distribution, or generalized function).
You can recognize this at least informally because the area under $K(x,t)$ stays constant, equal to $1$, but as $t \to 0$ it becomes more and more concentrated at the point $x =0$, with a higher and higher peak there.
Formally, what this convergence statement means is that the equation that you wrote down holds, namely that $$a(x) = \lim_{t \to 0} \int_{-\infty}^{\infty} a(x') K(x-x',t)\, dx'.$$
(Note also that in your first convolution integral, there is a typo, as pointed out by Pacciu; the $dt$ there should be $dx'$.)
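If you want to see this limit numerically, here is a minimal sketch. It assumes the standard kernel $K(x,t) = \frac{1}{\sqrt{4\pi t}} e^{-x^2/(4t)}$ for $u_t = u_{xx}$ (the kernel is not written out above, so the normalisation is an assumption), and it uses a made-up initial profile $a(x) = e^{-x^2}$; the names `heat_kernel`, `smoothed` and `a` are just illustrative.

```python
import math

def heat_kernel(x, t):
    """Assumed kernel for u_t = u_xx: a Gaussian with variance 2t and unit area."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def smoothed(a, x, t, half_width=10.0, n=20000):
    """Midpoint-rule approximation of the integral of a(x') K(x - x', t) dx'."""
    dx = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        xp = -half_width + (k + 0.5) * dx
        total += a(xp) * heat_kernel(x - xp, t) * dx
    return total

def a(x):
    """Hypothetical initial profile, chosen only for illustration."""
    return math.exp(-x * x)

times = (1.0, 0.1, 0.01)
# Area under K(., t): convolve the constant function 1 with the kernel.
areas = [smoothed(lambda xp: 1.0, 0.0, t) for t in times]
# Distance between the convolution and a(x) at the sample point x = 0.5.
errors = [abs(smoothed(a, 0.5, t) - a(0.5)) for t in times]

for t, area, err in zip(times, areas, errors):
    print(f"t = {t:<5} area = {area:.6f}   |a*K - a| at x=0.5: {err:.4f}")
```

The area column should stay at $1$ while the last column shrinks as $t$ decreases, which is the informal picture of $K(\cdot,t) \to \delta$.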
Now think about $K(x,t)$ as $t$ tends from $0$ to $\infty$: it flows from being
a peak concentrated at $x = 0$ to a more-and-more shallow graph that spreads out over the whole $x$-axis. You should think of this as heat flowing from a point source at the origin and slowly spreading out over the whole axis. (Imagine blasting a point on a long steel beam with a blow torch for an instant, and then think about how the heat will diffuse along the beam.)
Now when you have initial conditions $a(x)$, this describes heat being applied at $t = 0$ not just at the point $x = 0$, but along the whole $x$-axis, according to the density $a(x)$. The convolution $a(x)*K(x,t)$ then describes how this heat has diffused through the line at time $t$; as sos440 writes in their answer, it is the superposition of the diffusion of the heat from each point $x$ that was present at time $t = 0$.
(If we took $a(x)$ to be $\delta(x)$, then we would be back at the situation of all the heat being initially concentrated at the single point $x = 0$; mathematically this corresponds to the formula $\delta(x)*K(x,t) = K(x,t)$ ---
i.e. the $\delta$ function is the identity for convolution.)
Added in response to a question in the comments below:
Imagine for a moment that we had a certain amount of heat $A_i$ initially
applied at the points $x_i$, for $i = 1, \ldots, n$. When one unit of heat is placed at $x =0$, it diffuses according to $K(x,t)$. So the amount $A_i$ of heat at $x_i$ diffuses according to $A_i K(x-x_i,t)$. (I am just changing the variable in $K(x,t)$ to shift its centre from $x = 0$ to $x = x_i$, and scaling it by the amount $A_i$.)
So the total heat at a point $x$ and time $t$ will be $\sum_{i = 1}^n A_i K(x-x_i,t)$. (I am just adding up the heat which has arrived at the point $x$ at time $t$ from each of the points $x_1, \ldots, x_n$.)
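Here is a small sketch of that sum, again assuming the standard kernel $K(x,t) = \frac{1}{\sqrt{4\pi t}} e^{-x^2/(4t)}$; the source positions and amounts are invented for the example. It also checks that the total heat on the line stays equal to $\sum_i A_i$, as it should, since each $K(x - x_i, t)$ has area $1$.

```python
import math

def heat_kernel(x, t):
    """Assumed kernel for u_t = u_xx (unit diffusivity)."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def heat_from_point_sources(x, t, positions, amounts):
    """Superposition sum of A_i * K(x - x_i, t) over all point sources."""
    return sum(A * heat_kernel(x - xi, t) for xi, A in zip(positions, amounts))

positions = [-1.0, 0.0, 2.0]  # hypothetical source locations x_i
amounts = [2.0, 0.5, 1.5]     # hypothetical amounts of heat A_i

t = 0.3
dx = 0.01
grid = [-20.0 + k * dx for k in range(4001)]

# Integrate the temperature profile over the line with a simple Riemann sum.
total_heat = dx * sum(
    heat_from_point_sources(x, t, positions, amounts) for x in grid
)
print(total_heat, sum(amounts))
```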
Now imagine that instead of just having heat concentrated at $n$ point sources
at time $t = 0$, we instead have heat distributed throughout the line with density $a(x)$, so that the amount of heat in the (infinitesimally) small interval $[x',x' + dx']$ is $a(x')\, dx'$. Then the above sum becomes the integral
$\int_{-\infty}^{\infty} a(x') K(x-x',t) dx'$, i.e. $a(x) * K(x,t)$.
Hence the amount of heat at a point $x$ at time $t$ is exactly given by
$a(x) * K(x,t)$, as your professor explained.
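The passage from the sum to the integral can be sketched numerically: chop the line into $n$ cells, treat each cell as a point source carrying heat $a(x_i)\,\Delta x$, and watch the sum approach the convolution as $n$ grows. This assumes the standard kernel $K(x,t) = \frac{1}{\sqrt{4\pi t}} e^{-x^2/(4t)}$ and the made-up density $a(x) = e^{-x^2}$, for which the convolution has the closed form $e^{-x^2/(1+4t)}/\sqrt{1+4t}$ (a Gaussian convolved with a Gaussian).

```python
import math

def heat_kernel(x, t):
    """Assumed kernel for u_t = u_xx (unit diffusivity)."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def a(x):
    """Hypothetical initial heat density, chosen because a*K is known exactly."""
    return math.exp(-x * x)

def heat_from_cells(x, t, n, half_width=6.0):
    """Treat [-half_width, half_width] as n point sources with A_i = a(x_i) * dx."""
    dx = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        xi = -half_width + (i + 0.5) * dx  # centre of the i-th cell
        total += a(xi) * dx * heat_kernel(x - xi, t)
    return total

def exact_convolution(x, t):
    """Closed form of a(x) * K(x, t) for the Gaussian density above."""
    return math.exp(-x * x / (1.0 + 4.0 * t)) / math.sqrt(1.0 + 4.0 * t)

x, t = 0.5, 0.5
cell_counts = (10, 40, 160)
errors = [abs(heat_from_cells(x, t, n) - exact_convolution(x, t)) for n in cell_counts]
for n, err in zip(cell_counts, errors):
    print(f"n = {n:<4} |sum - integral| = {err:.2e}")
```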
Best Answer
Your procedure cannot find all of the solutions that satisfy $u(x,0)=0$.
Because the PDE is in general inhomogeneous and comes with only the single condition $u(x,0)=0$, it is better to take the Laplace transform in $t$ (with transform variable $s^2$):
$\mathcal{L}_{t\to s^2}\{u_t\}-\mathcal{L}_{t\to s^2}\{u_{xx}\}=\mathcal{L}_{t\to s^2}\{g(x,t)\}$
$s^2U(x,s)-u(x,0)-U_{xx}(x,s)=G(x,s)$
$U_{xx}(x,s)-s^2U(x,s)=-G(x,s)$
$U(x,s)=C_1(s)e^{xs}+C_2(s)e^{-xs}-\dfrac{e^{xs}}{2s}\int_0^xG(\xi,s)e^{-\xi s}~d\xi+\dfrac{e^{-xs}}{2s}\int_0^xG(\xi,s)e^{\xi s}~d\xi$
$u(x,t)=\mathcal{L}^{-1}_{s^2\to t}\{C_1(s)e^{xs}\}+\mathcal{L}^{-1}_{s^2\to t}\{C_2(s)e^{-xs}\}-\mathcal{L}^{-1}_{s^2\to t}\left\{\dfrac{e^{xs}}{2s}\int_0^xG(\xi,s)e^{-\xi s}~d\xi\right\}+\mathcal{L}^{-1}_{s^2\to t}\left\{\dfrac{e^{-xs}}{2s}\int_0^xG(\xi,s)e^{\xi s}~d\xi\right\}$
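As a sanity check on the two integral terms in $U(x,s)$, here is a minimal numerical sketch: it evaluates them by quadrature for a made-up right-hand side $G$ and confirms that the result satisfies $U_{xx}-s^2U=-G$ up to discretisation error. The function names and the choice $G(x,s)=\cos(x)/s$ are hypothetical; the homogeneous part $C_1(s)e^{xs}+C_2(s)e^{-xs}$ is left out because it solves the homogeneous equation for any $C_1$, $C_2$.

```python
import math

def particular_solution(G, x, s, n=2000):
    """Evaluate the two integral terms of U(x, s) with the midpoint rule on [0, x]."""
    h = x / n
    int_minus = 0.0  # integral of G(xi, s) * exp(-xi * s)
    int_plus = 0.0   # integral of G(xi, s) * exp(+xi * s)
    for k in range(n):
        xi = (k + 0.5) * h
        g = G(xi, s)
        int_minus += g * math.exp(-xi * s) * h
        int_plus += g * math.exp(xi * s) * h
    return (-math.exp(x * s) * int_minus + math.exp(-x * s) * int_plus) / (2.0 * s)

def ode_residual(G, x, s, dx=1e-3):
    """Central-difference estimate of U_xx - s^2 U + G, which should be near zero."""
    U = lambda y: particular_solution(G, y, s)
    U_xx = (U(x + dx) - 2.0 * U(x) + U(x - dx)) / dx**2
    return U_xx - s * s * U(x) + G(x, s)

def G(x, s):
    """Hypothetical transformed forcing term, used only to exercise the formula."""
    return math.cos(x) / s

residual = ode_residual(G, 0.8, 1.5)
print(f"ODE residual at x = 0.8, s = 1.5: {residual:.2e}")

# For G = 1 the integrals can be done by hand, giving (1 - cosh(x s)) / s^2.
constant_case = particular_solution(lambda x, s: 1.0, 0.8, 1.5)
closed_form = (1.0 - math.cosh(0.8 * 1.5)) / 1.5**2
print(constant_case, closed_form)
```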