Mutual information $I(X, Y)$ can be thought of as a measure of the reduction in uncertainty about $X$ after observing $Y$:
$$ I(X, Y) = H(X) - H(X|Y)$$
where $H(X)$ is the entropy of $X$ and $H(X|Y)$ is the conditional entropy of $X$ given $Y$. By symmetry it follows that
$$ I(X, Y) = H(Y) - H(Y|X)$$
However, the mutual information of a variable with itself is equal to the entropy of that variable,
$$ I(X, X) = H(X)$$
and is called self-information. This holds because $H(X|Y) = 0$ whenever the values of $X$ are completely determined by $Y$, which is certainly the case for $H(X|X)$: entropy is a measure of uncertainty, and there is no uncertainty left about the values of $X$ once the values of $X$ are given, so
$$ H(X) - H(X|X) = H(X) - 0 = H(X) $$
This is immediately obvious if you think of it in terms of Venn diagrams as illustrated below.
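If you prefer to see numbers, here is a minimal sketch in Python (the joint distribution is an assumed toy example, not anything from your question) that checks the identities via the equivalent form $I(X, Y) = H(X) + H(Y) - H(X, Y)$, and that $I(X, X) = H(X)$:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability table p."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(pxy):
    """I(X, Y) = H(X) + H(Y) - H(X, Y) for a joint probability table pxy."""
    px = pxy.sum(axis=1)   # marginal distribution of X (rows)
    py = pxy.sum(axis=0)   # marginal distribution of Y (columns)
    return entropy(px) + entropy(py) - entropy(pxy)

# A toy joint distribution of two binary variables X (rows) and Y (columns).
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px = pxy.sum(axis=1)

# The joint distribution of X with itself: all mass on the diagonal, p(x, x) = p(x).
pxx = np.diag(px)

print("I(X, Y) =", mutual_information(pxy))   # finite, between 0 and H(X)
print("I(X, X) =", mutual_information(pxx))   # equals H(X)
print("H(X)    =", entropy(px))               # 1 bit for a fair binary X
```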
You can also show this using the formula for mutual information and substituting the conditional entropy part, i.e.
$$ H(X|Y) = -\sum_{x \in X, y \in Y} p(x, y) \log \frac{p(x,y)}{p(y)} $$
by changing the $y$'s into $x$'s and recalling that $X \cap X = X$, so $p(x, x) = p(x)$. [Notice that this is an informal argument, since for continuous variables $p(x, x)$ would not have a density function, although it would still have a cumulative distribution function.]
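Spelling out the substitution (a short worked step using only the definitions already given):
$$ H(X|X) = -\sum_{x \in X} p(x, x) \log \frac{p(x, x)}{p(x)} = -\sum_{x \in X} p(x) \log \frac{p(x)}{p(x)} = -\sum_{x \in X} p(x) \log 1 = 0, $$
hence $I(X, X) = H(X) - H(X|X) = H(X)$.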
So yes, once you know $X$, learning $X$ again gives you no more information.
Check Chapter 2 of Elements of Information Theory by Cover and Thomas, or Shannon's original 1948 paper itself, to learn more.
As for your second question, it is a common problem that some values that can possibly occur are not observed in your data. In this case the classical estimator for probability, i.e.
$$ \hat p = \frac{n_i}{\sum_i n_i} $$
where $n_i$ is the number of occurrences of the $i$th value (out of $d$ categories), gives you $\hat p = 0$ if $n_i = 0$. This is called the zero-frequency problem. The easy and commonly applied fix is, as your professor told you, to add some constant $\beta$ to your counts, so that
$$ \hat p = \frac{n_i + \beta}{(\sum_i n_i) + d\beta} $$
The common choices for $\beta$ are $1$, i.e. a uniform prior based on Laplace's rule of succession; $1/2$ for the Krichevsky-Trofimov estimate; or $1/d$ for the Schurmann-Grassberger (1996) estimator. Notice, however, that what you do here is inject out-of-data (prior) information into your model, so it acquires a subjective, Bayesian flavor. When using this approach you have to remember the assumptions you made and take them into consideration.
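As a concrete sketch (the helper name is assumed, not taken from any particular package), the smoothed estimator above is a one-liner; here it is in Python for the three choices of $\beta$ just mentioned:

```python
import numpy as np

def smoothed_probs(counts, beta):
    """Pseudocount estimator: p_hat_i = (n_i + beta) / (sum_i n_i + d * beta)."""
    counts = np.asarray(counts, dtype=float)
    d = len(counts)                                    # number of categories
    return (counts + beta) / (counts.sum() + d * beta)

counts = [5, 3, 0, 2]                                  # the third category was never observed

print(smoothed_probs(counts, beta=0.0))                # classical estimator: contains a zero
print(smoothed_probs(counts, beta=1.0))                # Laplace / rule of succession
print(smoothed_probs(counts, beta=0.5))                # Krichevsky-Trofimov
print(smoothed_probs(counts, beta=1.0 / len(counts)))  # Schurmann-Grassberger (1/d)
```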
This approach is commonly used, e.g. in the R entropy package. You can find some further information in the following paper:
Schurmann, T., and P. Grassberger. (1996). Entropy estimation of symbol sequences. Chaos, 6, 414-427.
Best Answer
If you define $I(X; X)$ for continuous random variables at all, the proper value for it is infinite, not $I(X; X) = H(X)$. Essentially, the value of $X$ gives you an infinite amount of information about $X$. If $X$ is simply a uniformly random real number, for instance, it almost surely takes an infinite number of bits to describe it (there's no pattern like in e.g. pi).
OTOH, for different variables $X$ and $Y$ (no matter how similar), the value of $X$ always gives you only a finite amount of information about $Y$. If you zoom in far enough on some point of $p(x, y)$, it will look flat, so $X$ and $Y$ are practically independent inside that region. Nevertheless, describing where that region is takes a finite number of bits, and specifying the exact point within the region takes an infinite number of bits. The information shared between $X$ and $Y$ lives in that finite number of bits, so the mutual information is finite. If, however, $X = Y$, then no matter how much you zoom in, knowing $X$ will always tell you exactly where $Y$ is, giving you an infinite amount of information. That's why $I(X; X)$ is very different from $I(X; Y)$.
If that's not convincing, you can just try some calculations. Example: the mutual information of $(x, y)$ for a bivariate Gaussian with $\mathrm{Var}(x) = \mathrm{Var}(y) = 1$ and $\mathrm{Cov}(x, y) = r$ is $I(x; y) = -\frac{1}{2}\log(1 - r^2)$, which goes to infinity as $r$ goes to $1$.
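A quick numeric check of that formula (a small assumed script, nothing more) makes the divergence as $r \to 1$ visible:

```python
import numpy as np

def gaussian_mi(r):
    """I(x; y) = -1/2 * log(1 - r^2), in nats, for a unit-variance bivariate Gaussian."""
    return -0.5 * np.log(1.0 - r**2)

for r in [0.5, 0.9, 0.99, 0.999, 0.999999]:
    print(f"r = {r:<8}  I(x; y) = {gaussian_mi(r):6.3f} nats")
```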