Both of these methods (LASSO vs. spike-and-slab) can be interpreted as Bayesian estimation problems where you are specifying different parameters. One of the main differences is that the LASSO method does not put any point-mass on zero for the prior (i.e., the parameters are almost surely non-zero a priori), whereas the spike-and-slab puts a substantial point-mass on zero.
In my humble opinion, the main advantage of the spike-and-slab method is that it is well-suited to problems where the number of parameters is more than the number of data points, and you want to completely eliminate a substantial number of parameters from the model. Because this method puts a large point-mass on zero in the prior, it will yield posterior estimates that tend to involve only a small proportion of the parameters, hopefully avoiding over-fitting of the data.
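To make the contrast concrete, here is a minimal sketch (illustrative parameter values, not from the answer) comparing draws from a spike-and-slab prior with draws from the Laplace (double-exponential) prior implied by LASSO:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Spike-and-slab prior: a point mass at zero with probability p0 (the "spike"),
# and a Gaussian "slab" otherwise.  (p0 = 0.8 is an illustrative choice.)
p0 = 0.8
spike_slab = np.where(rng.random(n) < p0, 0.0, rng.normal(0.0, 1.0, n))

# LASSO corresponds to a Laplace prior: a continuous density, so draws
# from it are almost surely non-zero.
laplace = rng.laplace(0.0, 1.0, n)

print((spike_slab == 0).mean())  # close to 0.8: substantial mass exactly at zero
print((laplace == 0).mean())     # 0.0: no point mass at zero
```

The fraction of exact zeros is what a posterior built on the spike-and-slab prior can exploit to drop parameters from the model entirely.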
When your professor tells you that the former does not perform variable selection, what he probably means is this. Under LASSO, each of the parameters is almost surely non-zero a priori (i.e., they are all in the model). Since the likelihood is also non-zero over the parameter support, each parameter will also be almost surely non-zero a posteriori (i.e., they all remain in the model). You might supplement this with a hypothesis test, and rule parameters out of the model that way, but that would be an additional test imposed on top of the Bayesian model.
The results of Bayesian estimation will reflect a contribution from the data and a contribution from the prior. Naturally, a prior distribution that is more closely concentrated around zero (like the spike-and-slab) will indeed "shrink" the resultant parameter estimators, relative to a prior that is less concentrated (like the LASSO). Of course, this "shrinking" is merely the effect of the prior information you have specified. The shape of the LASSO prior means that it shrinks all parameter estimates towards its mean of zero, relative to a flatter prior.
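The effect of prior concentration is easiest to see in the simplest conjugate case. The sketch below (a hypothetical normal-normal example, using the standard precision-weighted posterior mean) shows a tightly concentrated prior pulling the estimate much further towards zero than a nearly flat one:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
x = rng.normal(2.0, sigma, 20)   # made-up data with true mean 2
xbar, n = x.mean(), len(x)

def posterior_mean(mu0, tau2):
    # Conjugate normal-normal posterior mean: a precision-weighted
    # average of the prior mean and the sample mean.
    w_prior, w_data = 1.0 / tau2, n / sigma**2
    return (w_prior * mu0 + w_data * xbar) / (w_prior + w_data)

tight = posterior_mean(0.0, 0.01)   # prior tightly concentrated at 0
loose = posterior_mean(0.0, 100.0)  # nearly flat prior
print(tight, loose, xbar)
# The tight prior pulls the estimate much closer to 0; the nearly flat
# prior leaves it essentially at the sample mean.
```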
A prior distribution that integrates to 1 is a proper prior; an improper prior is one that does not.
For example, consider estimation of the mean $\mu$ of a normal distribution, and the following two prior distributions:
$\qquad f(\mu) = N(\mu_0,\tau^2)\,,\: -\infty<\mu<\infty$
$\qquad f(\mu) \propto c\,,\qquad\qquad -\infty<\mu<\infty.$
The first is a proper density. The second is not - no choice of $c$ can yield a density that integrates to $1$. Nevertheless, both lead to proper posterior distributions.
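A quick numerical check of that last claim for the flat prior (a sketch, with made-up data): the posterior is proportional to the likelihood viewed as a function of $\mu$, which normalizes to a $N(\bar{x}, \sigma^2/n)$ density.

```python
import numpy as np

# With a flat (improper) prior f(mu) ∝ c and n observations from N(mu, sigma^2),
# the posterior is proportional to the likelihood in mu, which normalizes
# to N(xbar, sigma^2/n) -- a proper density.
rng = np.random.default_rng(2)
sigma = 1.0
x = rng.normal(0.5, sigma, 10)
xbar, n = x.mean(), len(x)

mu = np.linspace(xbar - 8, xbar + 8, 20_001)
dmu = mu[1] - mu[0]
unnorm = np.exp(-n * (mu - xbar) ** 2 / (2 * sigma**2))  # ∝ likelihood in mu
post = unnorm / (unnorm.sum() * dmu)                     # normalize numerically

print(post.sum() * dmu)           # integrates to 1: a proper posterior
print(mu[np.argmax(post)], xbar)  # posterior mode sits at xbar
```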
See the following posts, which throw additional light on the use of improper priors and some closely related issues:
Flat, conjugate, and hyper- priors. What are they?
What is an "uninformative prior"? Can we ever have one with truly no information?
First, let's look at Mitchell and Beauchamp (1988)[1] for a description of what a spike-and-slab prior is: roughly, a mixture that places a point mass at zero (the "spike") and spreads the remaining prior probability uniformly over a wide interval $(-f_j, f_j)$ (the "slab").
Now if $f_j$ is large but finite, this is a proper prior - we can even write down the cdf explicitly.
(You'll sometimes see people draw a picture of this prior, with a vertical spike at zero sitting on top of a flat slab, which might help with picturing it in some sense. But the problem is: what does the y-axis then represent? It can't be density, because the spike represents probability, and it can't be probability, because the uniform part represents density. The two parts are on completely different scales, which seems to encourage the mistake of conflating probability with density.)
Mitchell and Beauchamp keep $f_j$ finite but assume that it's large enough that the relevant integrals from $-f_j$ to $f_j$ can be well approximated by integrals from $-\infty$ to $\infty$.
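That approximation is easy to check numerically. In the sketch below (illustrative numbers, not from the paper), the mass a Gaussian-shaped likelihood places on $(-f_j, f_j)$ approaches its mass on the whole real line as $f_j$ grows:

```python
from math import erf, sqrt

# Under a uniform slab on (-f_j, f_j), the relevant integrals involve a
# roughly Gaussian likelihood in beta_j restricted to (-f_j, f_j).  When
# f_j is large relative to the likelihood's spread, that restricted
# integral is essentially the integral over the whole real line.
def gauss_mass(lo, hi, mean, sd):
    """P(lo < X < hi) for X ~ N(mean, sd^2)."""
    cdf = lambda t: 0.5 * (1 + erf((t - mean) / (sd * sqrt(2))))
    return cdf(hi) - cdf(lo)

mean, sd = 0.3, 0.5      # a likelihood centered near a small effect
masses = [gauss_mass(-f_j, f_j, mean, sd) for f_j in (1.0, 5.0, 50.0)]
print(masses)            # approaches 1 (the integral over the whole line)
```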
If, however, we take the limit as $f_j\to\infty$, then of course it is no longer a proper prior. When the prior is being used for variable selection, this limit is generally avoided because of the way it impacts the selection (try it for a simple case).
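Here is one such simple case, sketched with a Gaussian slab for tractability (my own illustrative numbers): as the slab variance grows without bound, the marginal likelihood under the slab vanishes, so the posterior prefers the spike no matter what the data say - an instance of Bartlett's paradox.

```python
from math import exp, pi, sqrt

# One observation ybar ~ N(beta, s2); compare a spike at beta = 0 against
# a Gaussian slab N(0, tau2).  As tau2 grows, the slab's marginal
# likelihood for the data goes to zero, so the posterior probability of
# the slab (i.e., of including the variable) collapses.
def normal_pdf(x, var):
    return exp(-x * x / (2 * var)) / sqrt(2 * pi * var)

ybar, s2, p0 = 2.0, 1.0, 0.5   # a clearly nonzero effect, 50/50 prior odds
probs = []
for tau2 in (1.0, 100.0, 1e6):
    m_spike = normal_pdf(ybar, s2)        # marginal likelihood under beta = 0
    m_slab = normal_pdf(ybar, s2 + tau2)  # marginal likelihood under the slab
    probs.append(p0 * m_slab / (p0 * m_slab + (1 - p0) * m_spike))
print(probs)   # inclusion probability shrinks as the slab widens
```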
Other priors have been given the name "spike and slab" since then, including the case with a Gaussian slab that you mention. In that case, the prior is proper as long as the variance of the normal is finite.
[1]: Mitchell, T. J., and Beauchamp, J. J. (1988), "Bayesian Variable Selection in Linear Regression," Journal of the American Statistical Association, Vol. 83, No. 404 (Dec.), pp. 1023-1032.