I will assume that you are looking for a generalization of the Gaussian distribution in order to generalize Brownian motion.
As far as I know, treating the heat kernel as the generalization of the Gaussian distribution has long been standard in the literature. It comes from the following observation.
In $\mathbb {R}^1$, the following notions coincide:
(1) the density $f(t,x,y)=\frac{1}{\sqrt{2\pi t}}e^{-\frac{(y-x)^2}{2t}}$ of the Gaussian distribution $N(x,t)$,
(2) the transition function $p(t,x,y)$ of the Brownian motion $B_t$,
(3) the heat kernel, i.e. the fundamental solution $k_t(x,y)$ of the heat equation $\partial_t k=\frac{1}{2}\Delta_y k$ with initial data $\delta_x$ (the factor $\frac{1}{2}$ matches the variance $t$ in (1); without it the variance would be $2t$).
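As a quick sanity check of the coincidence between (1) and (3), the following sketch verifies by central finite differences that the density in (1) satisfies $\partial_t f=\frac{1}{2}\partial_y^2 f$, which is the heat equation in the normalization where the variance at time $t$ equals $t$. The function names and the step size `h` are illustrative choices, not taken from any reference.

```python
import math

def gaussian_density(t, x, y):
    """Density of N(x, t) evaluated at y, as in item (1)."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def heat_residual(t, x, y, h=1e-3):
    """Central-difference estimate of d/dt f - (1/2) d^2/dy^2 f at (t, x, y)."""
    dt = (gaussian_density(t + h, x, y) - gaussian_density(t - h, x, y)) / (2.0 * h)
    dyy = (gaussian_density(t, x, y + h) - 2.0 * gaussian_density(t, x, y)
           + gaussian_density(t, x, y - h)) / (h * h)
    return dt - 0.5 * dyy

# Residuals at a few sample points; they should be small (discretization error only).
residuals = [abs(heat_residual(t, 0.0, y))
             for t in (0.5, 1.0, 2.0) for y in (-1.0, 0.0, 1.5)]
print(max(residuals))
```

The residual is limited only by the $O(h^2)$ error of the difference quotients, so it shrinks as `h` is reduced (until rounding error takes over).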
Thus, on manifolds, one way to define Brownian motion is to construct a Markov process on the manifold whose transition function is exactly the heat kernel (let us identify the heat kernel with the Gaussian distribution in this setting). Since a Riemannian manifold always carries the Laplace-Beltrami operator $\Delta$, it makes sense to talk about the heat equation and thus the heat kernel, and Brownian motion in this sense is known to exist for a large class of manifolds.
But on metric spaces we no longer have the Laplace-Beltrami operator. So, in order to talk about the heat kernel/Gaussian distribution, we need to generalize it. The key concept along this line is the so-called Dirichlet form. A Dirichlet form on a metric measure space $(X,d,\mu)$ is a closed symmetric form $(\cdot,\cdot)$ with dense domain in $L^2(X,\mu)$. It should further satisfy a couple of conditions so that it behaves like its prototype $(f,g)=\int_{M} \nabla f\cdot \nabla g \, dx$ on a manifold $M$. Notice that $(f,g)=(-\Delta f,g)_{L^2(M)}$ on manifolds; in the general case, one obtains the desired "Laplacian" from the same formula. Therefore every Dirichlet form corresponds to a "Laplacian" and thus to a Gaussian distribution (and thus a Brownian motion). What is more, a reasonable Dirichlet form always exists provided the space is sufficiently nice.
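To make the chain "Dirichlet form, then Laplacian, then heat kernel" concrete, here is a minimal sketch on a toy example that is my own choice and not taken from the references: the cycle graph with `N` vertices, graph distance, and counting measure. The form is the sum of squared edge differences, the "Laplacian" $L=-\Delta$ is read off from $(f,g)=(Lf,g)_{\ell^2}$, and the heat kernel $e^{-tL}$ is written in the Fourier eigenbasis of the cycle. All names are hypothetical.

```python
import math

N = 8  # number of vertices on the cycle (hypothetical toy space)

def dirichlet_form(f, g):
    """E(f, g) = sum over edges {i, i+1} of (f_i - f_{i+1}) * (g_i - g_{i+1})."""
    return sum((f[i] - f[(i + 1) % N]) * (g[i] - g[(i + 1) % N]) for i in range(N))

def laplacian(f):
    """Nonnegative graph Laplacian L = -Delta: (L f)_i = 2 f_i - f_{i-1} - f_{i+1}."""
    return [2.0 * f[i] - f[(i - 1) % N] - f[(i + 1) % N] for i in range(N)]

def heat_kernel(t, i, j):
    """Entry (i, j) of exp(-t L), computed from the Fourier eigenbasis of the cycle."""
    total = 0.0
    for k in range(N):
        eigenvalue = 2.0 - 2.0 * math.cos(2.0 * math.pi * k / N)
        total += math.exp(-t * eigenvalue) * math.cos(2.0 * math.pi * k * (i - j) / N)
    return total / N

f = [math.sin(i) for i in range(N)]
g = [float(i * i) for i in range(N)]
# E(f, g) agrees with the l^2 pairing (L f, g), the discrete analogue of (-Delta f, g).
form_value = dirichlet_form(f, g)
pairing = sum(a * b for a, b in zip(laplacian(f), g))
print(form_value, pairing)
# Each row of the heat kernel is a probability distribution on the vertices.
row_sum = sum(heat_kernel(0.7, 0, j) for j in range(N))
print(row_sum)
```

The rows of `heat_kernel` play the role of the "Gaussian distributions" on this space, and they are the transition probabilities of the continuous-time random walk generated by $-L$, which is the analogue of Brownian motion here.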
In sum, if the space you are considering has both a metric and a measure structure, then the theory of Dirichlet forms may give you satisfactory results regarding the construction and properties of the Gaussian distribution (and thus of Brownian motion). Roughly speaking, if we do not have a given measure, we may not be able to construct a reasonable probability space; if we do not have a metric, it is hard to quantify the regularity and decay of the Gaussian distribution. So a metric measure structure might be the minimal structure needed for a reasonable construction of the Gaussian distribution.
Some reference books can be found via the link above. This paper by Sturm may give you a glimpse of the whole picture. I am not an expert in this field, and I apologize in advance for any mistakes or naivety.
Using the Poisson summation formula, I find that the variance is
$$ \sigma^2 + \dfrac{1}{12} + \sum_{k=1}^\infty (-1)^k e^{-2\sigma^2 k^2 \pi^2} (4 \sigma^2 + 1/(\pi^2 k^2)) $$
If $\sigma$ is not too small, the series converges quite rapidly.
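The series can be checked numerically. The sketch below assumes, and this is my reading rather than something stated above, that the quantity in question is the variance of $\operatorname{round}(X)$ for $X\sim N(0,\sigma^2)$, i.e. a centered normal variable rounded to the nearest integer. Under that assumption it compares the series with a direct summation over the integers. The function names and the truncation parameters are illustrative.

```python
import math

def series_variance(sigma, terms=50):
    """Variance from the Poisson-summation series quoted above, truncated at `terms`."""
    total = sigma ** 2 + 1.0 / 12.0
    for k in range(1, terms + 1):
        total += ((-1) ** k * math.exp(-2.0 * sigma ** 2 * k ** 2 * math.pi ** 2)
                  * (4.0 * sigma ** 2 + 1.0 / (math.pi ** 2 * k ** 2)))
    return total

def normal_cdf(x, sigma):
    """CDF of N(0, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def rounded_normal_variance(sigma):
    """Variance of round(X), X ~ N(0, sigma^2), by direct summation (ASSUMED setting)."""
    cutoff = int(12.0 * sigma) + 12  # integers beyond this carry negligible mass
    second_moment = 0.0
    for n in range(-cutoff, cutoff + 1):
        prob = normal_cdf(n + 0.5, sigma) - normal_cdf(n - 0.5, sigma)
        second_moment += n * n * prob
    return second_moment  # the mean is 0 by symmetry, so this is the variance

for sigma in (0.3, 0.5, 1.0, 2.0):
    print(sigma, series_variance(sigma), rounded_normal_variance(sigma))
```

For $\sigma\ge 1$ the exponential factor $e^{-2\pi^2\sigma^2}$ is already below $10^{-8}$, so the result is essentially $\sigma^2+\frac{1}{12}$.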
Yes, your $P(x) \propto \exp(a x +b x^2 + c x^3 +d x^4)$ maximises the entropy $-\int P(x)\log P(x)\,dx$ for prescribed first four moments, if the skewness and kurtosis lie in a certain range:
M. Rockinger and E. Jondeau, Entropy densities with an application to autoregressive conditional skewness and kurtosis (2002).
This simple generalization of the normal distribution also holds if higher moments are prescribed, provided that the highest prescribed moment is of even order. If, for example, only the mean, variance, and skewness are prescribed, then $d$ would be zero and the distribution fails to normalize, because $c x^3$ grows without bound in one direction. In this case the maximum entropy distribution exists, but it has a more complicated form, as discussed here.
You'll still have to determine the coefficients $a,b,c,d$ from the given first four moments. The cited 2002 paper gives a description of an efficient method. There may be no solution (typically, if the skewness is too large relative to the kurtosis), meaning that a maximum entropy solution of this simple form does not exist (see Figure 1 of the paper). The combination of a prescribed skewness and kurtosis typically leads to a bimodal distribution (see Figures 2 and 3), which may or may not be desirable for your application.
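To get a feel for the map between coefficients and moments, the following sketch goes in the forward direction: given $a,b,c,d$ it computes the mean, variance, skewness, and kurtosis by plain trapezoidal quadrature and counts the modes. This is not the method of the cited paper; fitting the coefficients to prescribed moments would amount to inverting this map numerically. The function names, the integration window $[-12,12]$, and the example coefficients are my own assumptions.

```python
import math

def quartic_moments(a, b, c, d, lo=-12.0, hi=12.0, steps=48000):
    """Mean, variance, skewness, kurtosis of P(x) proportional to
    exp(a x + b x^2 + c x^3 + d x^4), by trapezoidal quadrature on [lo, hi].
    Assumes the mass outside [lo, hi] is negligible."""
    h = (hi - lo) / steps
    xs = [lo + i * h for i in range(steps + 1)]
    exponents = [a * x + b * x ** 2 + c * x ** 3 + d * x ** 4 for x in xs]
    shift = max(exponents)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(e - shift) * h for e in exponents]
    weights[0] *= 0.5
    weights[-1] *= 0.5
    norm = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / norm
    central = [sum(w * (x - mean) ** p for w, x in zip(weights, xs)) / norm
               for p in (2, 3, 4)]
    variance = central[0]
    return mean, variance, central[1] / variance ** 1.5, central[2] / variance ** 2

def count_modes(a, b, c, d, lo=-12.0, hi=12.0, steps=48000):
    """Number of interior local maxima of the log-density on the grid."""
    h = (hi - lo) / steps
    e = [a * x + b * x ** 2 + c * x ** 3 + d * x ** 4
         for x in (lo + i * h for i in range(steps + 1))]
    return sum(1 for i in range(1, steps) if e[i] > e[i - 1] and e[i] >= e[i + 1])

# Standard normal as a special case: a = c = d = 0, b = -1/2.
print(quartic_moments(0.0, -0.5, 0.0, 0.0), count_modes(0.0, -0.5, 0.0, 0.0))
# A quartic example with b > 0 and d < 0, which has two modes at x = +/- sqrt(2).
print(quartic_moments(0.0, 1.0, 0.0, -0.25), count_modes(0.0, 1.0, 0.0, -0.25))
```

The second example illustrates the bimodality mentioned above: a positive $b$ combined with a negative $d$ pushes the mass away from the origin, and the kurtosis drops below the Gaussian value of 3.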