You should note that a solution $f$ of your differential equation, $\mathcal{L}[f] = 0$, is the steady-state solution of the second equation, since there $\partial_t f = 0$. By turning the problem into a parabolic equation, only the error term depends on $t$, and it decays with time. This can be seen by letting
$$h(x,y,t) = f(x,y) + \triangle f(x,y,t),$$
where $f$ is as before and $\triangle f$ denotes the time-dependent error (here $\triangle$ is not the Laplacian). Then, since $\mathcal{L}[f] = 0$ and assuming $\mathcal{L}$ is linear,
$$\mathcal{L}[h] = \mathcal{L}[\triangle f] = \partial_t \triangle f,$$
so the error itself evolves under the parabolic flow.
In general, this method makes the equations amenable to minimization routines like steepest descent.
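As a minimal numerical sketch of this relaxation idea, take $\mathcal{L} = \Delta$ on the unit square; the grid size, step count, and the boundary data $f^* = x + y$ below are all illustrative choices, not from the question. Marching $\partial_t f = \Delta f$ forward in pseudo-time drives the error to zero, leaving the steady state $\Delta f = 0$:

```python
import numpy as np

# Pseudo-time relaxation: march f_t = Lap(f) to steady state, whose
# limit solves Lap(f) = 0 with the prescribed boundary values.
n = 41
x = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
f = np.zeros((n, n))
# boundary data f* = x + y on all four sides (an exactly harmonic choice)
f[0, :], f[-1, :] = Y[0, :], 1 + Y[-1, :]
f[:, 0], f[:, -1] = X[:, 0], 1 + X[:, -1]
h = x[1] - x[0]
tau = 0.24 * h**2          # explicit stability requires tau <= h^2/4

for _ in range(5000):
    lap = (f[2:, 1:-1] + f[:-2, 1:-1] + f[1:-1, 2:] + f[1:-1, :-2]
           - 4 * f[1:-1, 1:-1]) / h**2
    f[1:-1, 1:-1] += tau * lap   # only the error decays in pseudo-time

err = np.abs(f - (X + Y)).max()
print(f"max error vs harmonic solution x+y: {err:.2e}")
```

The update step is exactly one steepest-descent iteration on the Dirichlet energy, which is why the relaxation and minimization viewpoints coincide here.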
Edit: Since you mentioned that you wanted a book to reference: when I took numerical analysis we used the third edition of Numerical Mathematics and Computing by Cheney and Kincaid, and I found it very useful. It lacked depth at points, but it provided a good jumping-off point. They also have a more mathematically in-depth book, Numerical Analysis: Mathematics of Scientific Computing, which may be useful to you, though I have not read it.
Notice the flux within any infinitesimal region inside your domain of interest is
$$
F = -C\,\nabla I,
$$
which measures the transport of the density out of this infinitesimal region under the effect of the diffusion coefficient $C$. This $F$ is called the flux; by the divergence theorem, its divergence gives the net amount of $I$ flowing out of this infinitesimal region per unit volume:
$$
\nabla \cdot F := \lim_{\delta V \to 0}\frac{1}{\delta V}\iint_{\text{boundary of }\delta V} F\cdot (\text{outward normal})\,dS.
$$
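The boundary-flux definition above can be checked numerically on a tiny square cell; the test field $F = (x^2, y^2)$, the cell centre, and the cell size below are illustrative choices.

```python
import numpy as np

# Check that (outward flux through the cell boundary) / (cell area)
# recovers div F at the cell centre, for the test field F = (x^2, y^2)
# with exact divergence div F = 2x + 2y.
cx, cy, eps = 0.3, 0.7, 1e-3     # illustrative cell centre and size

Fx = lambda x, y: x**2
Fy = lambda x, y: y**2

# midpoint-rule flux through each of the four edges (edge length = eps)
flux = eps * (Fx(cx + eps / 2, cy) - Fx(cx - eps / 2, cy)    # right - left
              + Fy(cx, cy + eps / 2) - Fy(cx, cy - eps / 2)) # top - bottom

div_numeric = flux / eps**2      # flux per unit area
div_exact = 2 * cx + 2 * cy
print(div_numeric, div_exact)
```

Both numbers agree closely, which is the finite-volume reading of the definition: divergence is outward flux per unit volume in the small-cell limit.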
By the conservation law, the diffusion equation is actually
$$
I_t +\nabla \cdot F =0,
$$
and this is
$$
I_t = \nabla \cdot (C\nabla I) = \nabla C \cdot \nabla I + C\,\nabla \cdot (\nabla I) = \nabla C \cdot \nabla I + C\Delta I.
$$
When the diffusion is constant, $C = k$, the first term vanishes, which is why you recover the heat equation. But now $C = g(\lVert \nabla I\rVert)$ varies in space, so you can't drop that term.
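The product-rule expansion above can be verified numerically with finite differences; the fields $I$ and $C$ and the grid below are illustrative choices, not taken from the question.

```python
import numpy as np

# Numerical check of the product rule
#   div(C grad I) = grad C . grad I + C lap I
# on smooth test fields, using central differences.
n = 201
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")
I = np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)
C = 1.0 + 0.5 * X * Y            # smooth, non-constant "diffusivity"

gx = lambda u: np.gradient(u, h, axis=0)
gy = lambda u: np.gradient(u, h, axis=1)

lhs = gx(C * gx(I)) + gy(C * gy(I))                  # div(C grad I)
rhs = gx(C) * gx(I) + gy(C) * gy(I) + C * (gx(gx(I)) + gy(gy(I)))

interior = (slice(2, -2), slice(2, -2))              # skip one-sided edges
diff = np.abs(lhs - rhs)[interior].max()
print(f"max |div(C grad I) - expansion| = {diff:.2e}")
```

The two sides agree up to discretization error, and the $\nabla C \cdot \nabla I$ term is exactly the convection-like contribution that a constant $C$ would kill.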
Backstory of this: non-constant diffusion causes convection. Think of hot air (larger diffusion constant) versus cold air (smaller diffusion constant); you get blowing wind (convection).
For equation $(1)$, the problem asks you to consider functions $f:\Omega\subset \Bbb R^2\longrightarrow \Bbb R$ that agree with $f^*$ on the boundary $\partial \Omega$ of $\Omega$, that is, $f=f^*$ on $\partial\Omega$. There are many such functions. For each of them, we may consider the value
$$I(f) = \int_\Omega{\lVert\nabla f\rVert}^2\,dx\,dy.$$
Because $\Omega\subset\Bbb R^2$, we have that $f$ is a function of two variables, say $(x,y)$. Then, $\nabla f(x,y)$ is the vector of partial derivatives $\left(\frac\partial{\partial x}f(x,y), \frac\partial{\partial y}f(x,y)\right)$. Hence,
$${\lVert\nabla f(x,y)\rVert}^2 = {\left(\frac\partial{\partial x}f(x,y)\right)}^2 + {\left(\frac\partial{\partial y}f(x,y)\right)}^2,$$
which is a real number, so the integral $I(f)$ is a real number too. Equation $(2)$ tells you that the 'simplest interpolant $f$ of $f^*$ over $\Omega$' is the $f$ which minimizes $I(f)$; call it $f_0$.
Moreover, equation $(2)$ tells you that $f_0$ satisfies
$$\Delta f_0 = 0$$
on $\Omega$, meaning that for all $(x,y)$ in $\Omega$,
$$\Delta f_0(x,y) = \frac{\partial^2}{\partial x^2}f_0(x,y) + \frac{\partial^2}{\partial y^2}f_0(x,y) = 0.$$
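To see the minimization claim concretely, one can compare the Dirichlet energy of a harmonic $f_0$ against that of a perturbed function with the same boundary values; the choices $f_0 = xy$ and the sine bump below are illustrative.

```python
import numpy as np

# The harmonic interpolant minimizes I(f) = int |grad f|^2: adding any
# bump that vanishes on the boundary can only increase the energy.
n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")

f0 = X * Y                                     # harmonic: f0_xx + f0_yy = 0
bump = np.sin(np.pi * X) * np.sin(np.pi * Y)   # zero on the boundary

def dirichlet_energy(f):
    fx = np.gradient(f, h, axis=0)
    fy = np.gradient(f, h, axis=1)
    return np.sum(fx**2 + fy**2) * h**2        # Riemann sum of int |grad f|^2

E0 = dirichlet_energy(f0)
E_pert = dirichlet_energy(f0 + 0.1 * bump)
print(E0, E_pert)
```

The perturbed energy comes out strictly larger; the first-order (cross) term vanishes precisely because $\Delta f_0 = 0$, which is the Euler-Lagrange condition equation $(2)$ expresses.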