(I have edited the question changing $f$ to $g$ in order to keep the symbol $f$ for probability density functions).
First, indeed, when the transformation is linear (affine, actually) with a non-zero linear coefficient, the covariance involving the transformed variable will remain non-zero. For non-linear transformations you cannot obtain very specific conditions. My approach is the following:
For compactness, set $Z=g\left(X\right)$. Then you require the conditions under which
$$\operatorname{Cov}(Z,Y)\neq 0? \;\Rightarrow\; E\left(ZY\right)\,-\,E\left(Z\right)E\left(Y\right)\, \neq0?\qquad [1]$$
Expressing the expected values as integrals, we require (assuming certain regularity conditions) $$\int_{D_Z} \int_{D_Y} zyf_{Z,Y}\left(z,y\right)\,dy\,dz \;-\;\left(\int_{D_Z}zf_{Z}\left(z\right)dz\right) \left(\int_{D_Y} yf_{Y}\left(y\right)\,dy\right)\neq0?$$
$$\Rightarrow \int_{D_Z} \int_{D_Y} zyf_{Z,Y}\left(z,y\right)\,dy\,dz \;-\;\int_{D_Z} \int_{D_Y} zyf_{Z}\left(z\right)f_{Y}\left(y\right)\,dy\,dz\neq0?$$
$$\Rightarrow \int_{D_Z} \int_{D_Y} zy\left[f_{Z,Y}\left(z,y\right) \;-\;f_{Z}\left(z\right)f_{Y}\left(y\right)\right]\,dy\,dz\neq0?\qquad[2]$$
where the $f(\cdot)$ functions are joint or marginal probability density functions, and $D$ denotes the support. Now define $\omega \equiv(z,y)$ and note that $zy$, $z$ and $y$ can all be written as functions of $\omega$, say $zy=\tau_0(\omega)$, $z=\tau_1(\omega)$, $y=\tau_2(\omega)$; the last two are in fact linear transformations, for example $z=\omega[1\; 0]'$, the prime denoting the transpose. Also define $C\equiv D_Z\times D_Y$. Then we can compact even further by writing
$$\cdots\Rightarrow \int_C h\left[\omega,\tau_0(\omega),\tau_1(\omega),\tau_2(\omega)\right]d\omega\neq0?\qquad[3]$$
Why all this trouble? Because it is now clear that what you are asking for are the conditions under which a double definite integral of a function does not vanish when taken over the whole domain of its variables. Conversely, we want to exclude the conditions under which this integral equals zero.
First, since $\operatorname{Cov}(X,Y)\neq 0$, the random variables $X$ and $Y$ are not independent. Therefore (unless the transformation itself destroys this dependence) we have $f_{Z,Y}\left(z,y\right) \;\neq\;f_Z \left(z\right)f_Y \left(y\right)$. But this does not guarantee that the double integral will be non-zero. It may still equal zero if the integrand in eq. $[2]$ is an odd function w.r.t. at least one of $z$ or $y$, and if the domain w.r.t. this variable is symmetric around the origin (which will hold if, for example, $D= \Bbb R$, extended). Assume that indeed $C=\Bbb R\times\Bbb R$ (extended). Then we must guarantee that the integrand
$zy\left[f_{Z,Y}\left(z,y\right) \;-\;f_{Z}\left(z\right)f_{Y}\left(y\right)\right]$
is not an odd function w.r.t. either $z$ or $y$ separately. Now if you work through the integrand, substituting $-z$ for $z$, etc., you will be able to reduce the condition to the following: "The non-linear transformation should not be such as to make both $f_{Z,Y}\left(z,y\right)$ and $f_{Z}\left(z\right)$ even functions of $z$."
For $Y$ this could be stated as: "If the pdf of $Y$ is an even function, then the non-linear transformation should not be such as to make $f_{Z,Y}\left(z,y\right)$ an even function of $y$." Obviously this is distribution-specific (or at least distribution-family-specific), and you cannot get more specific than that. If it looks disappointingly general, imagine a true jungle. Then independence is a straight clear path through this jungle, and linear dependence is an elliptical clear path approaching asymptotically the straight path of independence. The rest of the jungle is non-linear dependence.
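A minimal numerical sketch of this even-function condition (the setup is illustrative, not from the question: $X$ standard normal, $Y = X + \varepsilon$ so that $\operatorname{Cov}(X,Y)\neq 0$, and the even transformation $g(x)=x^2$):

```python
import numpy as np

# Illustration: X standard normal (f_X even), Y = X + noise, so
# Cov(X, Y) != 0. The even transformation g(x) = x^2 makes f_Z even,
# turning the integrand in [2] into an odd function of z, so the
# covariance of g(X) with Y vanishes: Cov(X^2, X + eps) = E[X^3] = 0.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.standard_normal(n)
y = x + 0.5 * rng.standard_normal(n)   # linearly dependent on X

z = x**2                               # even non-linear transformation

print(np.cov(x, y)[0, 1])   # clearly non-zero (close to Var(X) = 1)
print(np.cov(z, y)[0, 1])   # approximately 0
```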
Afterthought: A less "mathematical" and more "statistical" condition (although not necessarily more useful) would be "The non-linear transformation must be such as to not render the variables $Z$ and $Y$ sub-independent". In a nutshell, sub-independent random variables are dependent but uncorrelated: their dependence is "perfectly non-linear".
Let's assume you have $X=(X_1, \dots, X_n)$, a random vector with a multivariate normal distribution with expectation vector $\mu$ and covariance matrix $\Sigma$. We are interested in the quadratic form $Q(X)= X^T A X = \sum_i \sum_j a_{ij} X_i X_j$. Define $Y = \Sigma^{-1/2} X$, where we assume $\Sigma$ is invertible. Write also $Z=Y-\Sigma^{-1/2} \mu$, which will have expectation zero and covariance matrix equal to the identity.
Now
$$
Q(X) = X^T A X= (Z+\Sigma^{-1/2} \mu)^T \Sigma^{1/2}A\Sigma^{1/2} (Z+\Sigma^{-1/2} \mu).
$$
Use the spectral theorem now and write $\Sigma^{1/2}A \Sigma^{1/2} = P^T \Lambda P$,
where $P$ is an orthogonal matrix (so that $P P^T=P^T P=I$) and $\Lambda$ is diagonal with real diagonal elements $\lambda_1, \dotsc, \lambda_n$ (positive when $A$ is positive definite). Write $U = P Z$, so that $U$ is multivariate normal with identity covariance matrix and expectation zero.
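This decomposition is easy to check numerically; a small sketch with made-up $2\times 2$ matrices `Sigma` and `A` (both positive definite, so the $\lambda_j$ come out positive):

```python
import numpy as np

# Check Sigma^{1/2} A Sigma^{1/2} = P^T Lambda P with P orthogonal,
# using arbitrary positive-definite example matrices.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 0.3],
              [0.3, 2.0]])

# Symmetric square root of Sigma via its own eigendecomposition.
w, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T

M = Sigma_half @ A @ Sigma_half
lam, Q = np.linalg.eigh(M)     # M = Q diag(lam) Q^T
P = Q.T                        # so M = P^T Lambda P

print(np.allclose(P @ P.T, np.eye(2)))          # P is orthogonal
print(np.allclose(P.T @ np.diag(lam) @ P, M))   # decomposition holds
```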
We can compute
$$
\begin{aligned}
Q(X) &= (Z+\Sigma^{-1/2} \mu)^T \Sigma^{1/2}A\Sigma^{1/2} (Z+\Sigma^{-1/2} \mu) \\
&= (Z+\Sigma^{-1/2} \mu)^T P^T \Lambda P (Z+\Sigma^{-1/2} \mu) \\
&= (PZ+P\Sigma^{-1/2} \mu)^T \Lambda (PZ+P\Sigma^{-1/2} \mu) \\
&= (U+b)^T \Lambda (U+b)
\end{aligned}
$$
where now
$b = P \Sigma^{-1/2} \mu$. So:
$$
Q(X) = X^T A X = \sum_{j=1}^n \lambda_j (U_j+b_j)^2
$$
In your case $\mu=0$, so $b=0$ and your quadratic form is a linear combination of independent chi-square variables, each with one degree of freedom. In the general case, we get a linear combination of independent non-central chi-square variables.
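A Monte Carlo sketch of this representation (with the same made-up `Sigma` and `A` as assumptions, and $\mu = 0$): samples of $X^T A X$ should match samples of $\sum_j \lambda_j \chi^2_1$ in distribution, and both means should estimate $\operatorname{tr}(A\Sigma) = \sum_j \lambda_j$.

```python
import numpy as np

# With mu = 0, X^T A X has the same distribution as
# sum_j lambda_j * chi^2_1, where the lambda_j are the
# eigenvalues of Sigma^{1/2} A Sigma^{1/2}.
rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 0.3],
              [0.3, 2.0]])

w, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T
lam = np.linalg.eigvalsh(Sigma_half @ A @ Sigma_half)

n = 500_000
X = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
Q = np.einsum('ni,ij,nj->n', X, A, X)        # X^T A X, one value per sample

U = rng.standard_normal((n, 2))
mix = (lam * U**2).sum(axis=1)               # sum_j lambda_j chi^2_1

print(Q.mean(), mix.mean())   # both estimate tr(A Sigma) = sum(lam)
```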
If you want to work numerically with this distribution, there is a CRAN package (that is, a package for R) implementing it, called CompQuadForm.
If you want (much) more detail, there is a book dedicated to the topic, Mathai & Provost: "Quadratic forms in random variables".
Best Answer
For jointly (per @Did) normal random variables, uncorrelated implies independent. In particular, it is easy to see that the joint density function factors, giving the product of the two marginal density functions.
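In the bivariate case the factorization is easy to see explicitly: the joint density
$$f_{X,Y}(x,y)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\!\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2}-\frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)$$
reduces, at $\rho=0$, to
$$\frac{1}{\sqrt{2\pi}\,\sigma_X}e^{-(x-\mu_X)^2/2\sigma_X^2}\cdot\frac{1}{\sqrt{2\pi}\,\sigma_Y}e^{-(y-\mu_Y)^2/2\sigma_Y^2},$$
the product of the two marginal normal densities.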
Also, for normal data, the sample mean $\bar X$ and sample SD $S$ are independent. (Proof via linear algebra or moment generating functions.) But $\bar X$ and $S$ are not independent except for normal data.
In the left panel below, $S$ is plotted against $\bar X$ for 30,000 randomly generated standard normal datasets of size $n = 5.$ As a 'naturally occurring' instance where zero correlation and dependence coexist, the right panel does the same for 30,000 samples of size $n = 5$ from $\mathrm{Beta}(.5, .5).$ For these beta data $\bar X$ and $S$ are uncorrelated, but not independent.
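The experiment can be sketched numerically (a minimal simulation without the plots, using the distributions and sample sizes stated above):

```python
import numpy as np

# 30,000 samples of size n = 5. Corr(xbar, S) is ~0 for both the
# normal and the Beta(.5,.5) data, but only the normal case gives
# independence.
rng = np.random.default_rng(2)
m, n = 30_000, 5

norm = rng.standard_normal((m, n))
beta = rng.beta(0.5, 0.5, size=(m, n))

for data in (norm, beta):
    xbar = data.mean(axis=1)
    s = data.std(axis=1, ddof=1)
    print(np.corrcoef(xbar, s)[0, 1])          # ~0 in both cases

# Dependence for the beta data: when xbar is near 0 or 1 the sample
# is squeezed against the boundary and S must be small, so S is
# negatively correlated with (xbar - 1/2)^2.
xbar = beta.mean(axis=1)
s = beta.std(axis=1, ddof=1)
print(np.corrcoef((xbar - 0.5)**2, s)[0, 1])   # clearly negative
```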