Here are a few remarks. Perhaps someone else can fill in the details.
1) Basis representations are always a good idea. It's hard to avoid them if you want to actually do something computational with your covariance function. The basis expansion can give you an approximation to the kernel and something to work with. The hope is that you can find a basis that makes sense for the problem you are trying to solve.
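To make this concrete, here is a minimal sketch (my own illustration, not from the original post) of a finite basis expansion approximating a kernel, using random Fourier features for the RBF kernel $k(x,y)=e^{-(x-y)^2/2}$; the dot product of the feature vectors converges to the kernel as the number of basis functions grows.

```python
import numpy as np

# Sketch: approximate the RBF kernel k(x, y) = exp(-(x - y)^2 / 2)
# with a finite basis of random Fourier features.
rng = np.random.default_rng(0)
n_features = 500
omega = rng.normal(size=n_features)  # frequencies drawn from the kernel's spectral density

def features(x):
    # Finite-dimensional basis representation phi(x) of a scalar input x.
    return np.concatenate([np.cos(omega * x), np.sin(omega * x)]) / np.sqrt(n_features)

x, y = 0.3, 1.1
print(features(x) @ features(y))        # basis-expansion approximation to k(x, y)
print(np.exp(-0.5 * (x - y) ** 2))      # exact kernel value, for comparison
```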
2) I'm not quite sure what you mean by a configuration in this question. At least one of the basis functions will need to be a function of $\theta$ for the kernel to depend on $\theta$. So yes, the eigenfunctions will vary with the parameter. They would also vary with different parametrizations.
Typically, the number of basis functions will be (countably) infinite, so the number won't vary with the parameter, unless some parameter values cause the kernel to become degenerate.
I also don't understand what you mean in point 2 about the process having the Bayesian prior $w \sim \mathcal{N}(0,\mathrm{diag}[\lambda_1^2, \ldots])$, since $w$ has not been mentioned up to this point. Also, $\mathrm{diag}[\lambda_1^2, \ldots]$ seems to be an infinite-dimensional matrix, and I have a problem with that.
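For readers, here is a hedged guess at what a *finite* truncation of that construction would look like (the basis and the names are mine, not the question's): draw $w_i \sim \mathcal{N}(0, \lambda_i^2)$ and set $f(x) = \sum_i w_i \phi_i(x)$, so that $\mathrm{Cov}[f(x), f(y)] = \sum_i \lambda_i^2 \phi_i(x)\phi_i(y)$.

```python
import numpy as np

# Illustrative finite truncation of the weight-space construction.
rng = np.random.default_rng(0)
m = 5
lam = 1.0 / (1.0 + np.arange(m)) ** 2       # decaying weights lambda_i
phi = lambda x: np.cos(np.arange(m) * x)    # an arbitrary finite basis

x, y = 0.4, 1.3
w = rng.normal(size=(200_000, m)) * lam     # many draws of w_i ~ N(0, lambda_i^2)
fx, fy = w @ phi(x), w @ phi(y)             # corresponding draws of f(x), f(y)
print(np.mean(fx * fy))                     # empirical covariance ...
print(np.sum(lam**2 * phi(x) * phi(y)))     # ... agrees with the kernel value
```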
3) Which sets of basis functions form valid kernels? If you're thinking about an eigenbasis, then the functions need to be orthogonal with respect to some measure. There are two requirements. 1) The resulting kernel has to be positive definite, which is fine as long as the $\lambda_i$ are positive. And 2) the expansion has to converge. This depends on the $\lambda_i$, which need to decay fast enough to ensure convergence of the expression. Convergence will also depend on the domain of the $x$'s.
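To spell out the positive-definiteness part: for any points $x_1, \ldots, x_n$ and coefficients $c_1, \ldots, c_n$, a kernel of the form $k(x,y)=\sum_i \lambda_i \phi_i(x)\phi_i(y)$ satisfies
$$\sum_{j,k} c_j c_k\, k(x_j, x_k) = \sum_i \lambda_i \Big( \sum_j c_j \phi_i(x_j) \Big)^2 \ge 0,$$
provided the $\lambda_i$ are nonnegative and the interchange of sums is justified by the convergence just mentioned.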
If the basis functions are not orthogonal then it will be more difficult to show that a covariance defined from them is positive definite. Obviously, in that case you are not dealing with an eigen-expansion, but with some other way of approximating the function of interest.
However, I don't think people typically start from a bunch of functions and then try to build a covariance kernel from them.
RE: Differentiability of the kernel and differentiability of the basis functions. I don't actually know the answer to this question, but I would offer the following observation.
Functional analysis proceeds by approximating functions (from an infinite dimensional space) by finite sums of simpler functions. To make this work, everything depends on the type of convergence involved. Typically, if you are working on a compact set with strong convergence properties (uniform convergence or absolute summability) on the functions of interest, then you get the kind of intuitive result that you are looking for: the properties of the simple functions pass over to the limit function -- e.g. if the kernel is a differentiable function of a parameter, then the expansion functions must be differentiable functions of the same parameter, and vice-versa. Under weaker convergence properties or non-compact domains, this does not happen. In my experience, there's a counter-example to every "reasonable" idea one comes up with.
Note: To forestall possible confusion from readers of this question, note that the Gaussian expansion of point 1 is not an example of the eigen-expansion of point 2.
First of all, your question is not quite well-posed. The reason is that Mercer's theorem only applies to kernels defined on a finite measure space. Practically, this means that in order to apply the theorem, the eigenfunctions $\phi_i$ are in fact taken with respect to the operator $$K_{\mu}(f)= \left( x\mapsto \int_{\mathbb{R}} K(x,y)f(y)\,\mu(dy)\right)$$ where $\mu(dy)=p(y)\,dy$ is a probability measure. The $\phi_i$ are then orthonormal with respect to the inner product defined by $\langle f,g\rangle=\int f(x)g(x)\,\mu(dx)$.
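As an illustrative sketch (not part of the original answer), the spectrum of $K_\mu$ can be approximated by Monte Carlo: sample points from $\mu$, form the kernel matrix, and take the eigenvalues of the scaled matrix. Here I assume $K(x,y)=e^{-(x-y)^2}$ and $\mu=\mathcal{N}(0,1)$.

```python
import numpy as np

# Nystrom-style discretization of the operator K_mu (assumptions as stated above).
rng = np.random.default_rng(1)
n = 2000
xs = rng.normal(size=n)                         # samples from the probability measure mu
K = np.exp(-(xs[:, None] - xs[None, :]) ** 2)   # kernel matrix K(x_i, x_j)
eigvals = np.linalg.eigvalsh(K / n)             # (1/n) K discretizes the integral operator
print(np.sort(eigvals)[::-1][:5])               # leading eigenvalues of K_mu, approximately
```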
It is simple to see that the condition $\mu(\mathbb{R})<\infty$ is necessary for Mercer's theorem to hold. Consider the identity:
$$\int_{\mathbb{R}} e^{-(x-y)^2}xdx=\sqrt{\pi}y$$
This shows that the function $f(x)=x$ is an eigenfunction of the operator $f \mapsto \int_{\mathbb{R}} K(x,y)f(y)\,dy$ with $K(x,y)=e^{-(x-y)^2}$. But evidently $\int_{\mathbb{R}} f(x)^2\,dx=\infty$, which shows that it is not possible to construct an orthonormal eigenbasis for $K$ without introducing a weighting function $p(y)$.
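A quick numerical check of the displayed identity (illustrative only, using scipy):

```python
import numpy as np
from scipy.integrate import quad

# Verify  int exp(-(x - y)^2) * x dx = sqrt(pi) * y  for a few values of y.
for y in (-1.0, 0.5, 2.0):
    val, _ = quad(lambda x: np.exp(-(x - y) ** 2) * x, -np.inf, np.inf)
    print(val, np.sqrt(np.pi) * y)   # the two numbers agree
```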
Secondly, I am assuming there should be a minus sign in the definition of the kernel $e^{-|x-y|}$, otherwise the resulting kernel fails to be positive definite.
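A two-point check makes the sign issue concrete (my own illustration): with $x_1=0$ and $x_2=1$, the Gram matrix of $e^{|x-y|}$ already has a negative eigenvalue, while that of $e^{-|x-y|}$ does not.

```python
import numpy as np

# Gram matrices at the points x = 0 and x = 1.
xs = np.array([0.0, 1.0])
D = np.abs(xs[:, None] - xs[None, :])
print(np.linalg.eigvalsh(np.exp(D)))    # e^{|x-y|}: one eigenvalue is negative -> not positive definite
print(np.linalg.eigvalsh(np.exp(-D)))   # e^{-|x-y|}: both eigenvalues are positive
```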
Neural networks are not kernels; they are a learning algorithm.
Plenty of kernel functions exist, such as the linear, polynomial, and Gaussian/RBF kernels.
The right kernel depends very much on the nature of the data. Often the best kernel is a custom-made one, particularly in bioinformatics. The Gaussian/RBF and linear kernels are by far the most popular ones, followed by the polynomial one.
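For concreteness, here is a minimal sketch of the three kernels named above in plain numpy (the parameter choices are illustrative, not recommendations):

```python
import numpy as np

def linear_kernel(x, y):
    return x @ y

def polynomial_kernel(x, y, c=1.0, d=3):
    return (x @ y + c) ** d

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, y), polynomial_kernel(x, y), rbf_kernel(x, y))
```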