If you are only interested in correlation between the two feature values, then there are a lot of ways to compute it (simple correlation, rank correlation, linear or nonlinear regression, etc.).
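For the correlation route, here is a minimal sketch of a few of those measures on made-up data (the data and the relationship are illustrative assumptions):

```python
import numpy as np
from scipy import stats

# Two hypothetical feature vectors; replace with your own data.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)          # linearly related, plus noise

pearson_r, _ = stats.pearsonr(x, y)       # simple (linear) correlation
spearman_r, _ = stats.spearmanr(x, y)     # rank correlation
slope, intercept = np.polyfit(x, y, 1)    # linear regression fit

print(pearson_r, spearman_r, slope)
```

Each measure answers a slightly different question: Pearson detects linear association, Spearman any monotone association, and the regression slope quantifies the fitted linear relationship.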
If you are interested in causality, a few places to look are Granger causality and the NIPS workshops on causality (2008, 2009).
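The idea behind Granger causality can be sketched by hand: compare a model that predicts $y_t$ from its own past against one that also uses past $x$, via an F-test. The series below is a toy example in which $x$ leads $y$ by one step (all data and coefficients are made up for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical series where x leads y by one step.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = np.empty(n)
y[0] = rng.normal()
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.3 * rng.normal()

lag = 1
Y = y[lag:]
# Restricted model: lags of y only.  Unrestricted: lags of y and x.
Xr = np.column_stack([np.ones(n - lag), y[:-lag]])
Xu = np.column_stack([np.ones(n - lag), y[:-lag], x[:-lag]])

def ssr(X, Y):
    """Sum of squared residuals from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    r = Y - X @ beta
    return r @ r

ssr_r, ssr_u = ssr(Xr, Y), ssr(Xu, Y)
q = Xu.shape[1] - Xr.shape[1]             # number of restrictions
dof = (n - lag) - Xu.shape[1]
F = ((ssr_r - ssr_u) / q) / (ssr_u / dof)
p = stats.f.sf(F, q, dof)
print(F, p)
```

A small p-value says past $x$ improves prediction of $y$ beyond $y$'s own past, which is exactly the (limited) sense in which Granger causality speaks about "causality."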
You seem to be suffering the usual confusion about what "linear regression" means. Nonlinearity in the $x$ values does not mean you're doing nonlinear regression. If you're fitting the sort of polynomial you describe above under the simplest conventional assumptions, that's linear regression, notwithstanding the fact that you're fitting a curve rather than a line.
You're also using the lower-case $n$ to denote two different things in the same problem! Confusing at best!
Think about this:
$$
Y = X\beta + \varepsilon
$$
where $Y$ is an $n\times1$ vector, $X$ is an $n\times k$ matrix with (typically) $k \ll n$, and $\varepsilon$ is an $n\times 1$ random vector of "errors" that is normally distributed with expected value the $n\times 1$ column vector of $0$s and variance $\sigma^2$ times the $n\times n$ identity matrix. You can observe $X$ and $Y$ and you want to estimate $\beta$, which is of course a $k\times 1$ fixed but unobservable vector.
In the case where $Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_{k-1} x_i^{k-1} + \varepsilon_i$, we have
$$
X= \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{k-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{k-1} \end{bmatrix}
$$
Then the least-squares estimate of $\beta$ is $\hat\beta = (X^TX)^{-1}X^T Y$. The vector of fitted values of $Y$ is $\hat Y = HY = X(X^TX)^{-1}X^TY$. The "hat matrix" $H$ is an $n\times n$ symmetric idempotent matrix of rank $k \ll n$. It's the orthogonal projection onto the column space of $X$. The vector of residuals is $\hat\varepsilon = Y - \hat Y = (I - H)Y$. The matrix $I-H$ represents the complementary orthogonal projection.
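These formulas can be checked numerically. A minimal sketch, with made-up data and an assumed quadratic model:

```python
import numpy as np

# Polynomial "linear regression" via the design matrix described above.
rng = np.random.default_rng(2)
n, k = 50, 3                              # n observations, k coefficients
x = np.linspace(0, 1, n)
beta_true = np.array([1.0, -2.0, 3.0])    # illustrative assumption
X = np.vander(x, k, increasing=True)      # columns 1, x, x^2
Y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (X'X)^{-1} X'Y
H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
Y_hat = H @ Y                                  # fitted values
resid = Y - Y_hat                              # residuals, (I - H)Y

# H is symmetric, idempotent, and has rank k (= trace, for a projection):
print(np.allclose(H, H.T), np.allclose(H @ H, H), round(np.trace(H)))
```

Since $H$ is a projection, its rank equals its trace, which is why `np.trace(H)` comes out to $k$.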
Now remember: If $A$ is an $m\times n$ fixed matrix and $B$ is an $\ell\times n$ fixed matrix, then $E(AY) = A E(Y)$, $\operatorname{var}(AY) = A(\operatorname{var}(Y))A^T$, and $\operatorname{cov}(AY,BY) = A(\operatorname{cov}(Y,Y))B^T$.
Consequently we have
$$
\hat\beta \sim N_k (\beta, \sigma^2 (X^TX)^{-1}),
$$
(a $k$-dimensional normal distribution)
$$
\hat\varepsilon \sim N_n(0, \sigma^2(I - H))
$$
(an $n$-dimensional normal distribution whose variance matrix $\sigma^2(I-H)$ is singular, of rank $n-k$), and
$$
\operatorname{cov}(\hat\beta, \hat\varepsilon) = 0
$$
(a $k\times n$ matrix of zeros).
Therefore the distribution of the sum of squares of the observed residuals (not the unobservable errors) is given by
$$
\frac{\|\hat\varepsilon\|^2}{\sigma^2} = \frac{\|(I-H)Y\|^2}{\sigma^2} \sim \chi^2_{n-k},
$$
a chi-square distribution with $n-k$ degrees of freedom, and this random variable is actually independent of the least-squares estimator $\hat\beta$, which is normally distributed.
Now if you think about the definition of the Student's t-distribution, you can derive various confidence intervals.
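Concretely, since $\hat\beta_j$ is normal with variance $\sigma^2[(X^TX)^{-1}]_{jj}$ and is independent of the chi-square residual sum of squares, the studentized ratio is $t$-distributed with $n-k$ degrees of freedom. A minimal sketch of the resulting interval, on made-up data:

```python
import numpy as np
from scipy import stats

# t-based confidence intervals for the coefficients; data are illustrative.
rng = np.random.default_rng(4)
n, k = 30, 2
x = np.linspace(0, 1, n)
X = np.vander(x, k, increasing=True)      # intercept and slope columns
beta_true = np.array([0.5, 2.0])          # assumed for this example
Y = X @ beta_true + 0.3 * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
s2 = resid @ resid / (n - k)              # unbiased estimate of sigma^2
se = np.sqrt(s2 * np.diag(XtX_inv))       # standard errors of beta_hat
t_crit = stats.t.ppf(0.975, df=n - k)     # two-sided 95% critical value
ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])
print(ci)
```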
Usually you predict a dependent variable $y$ and compute a confidence (or prediction) interval: given $x_0$, you calculate $[y^-, y^+]$, an interval in which $y$ will probably lie.
For the reverse, where you have a $y_0$ and want to find $[x^-, x^+]$ for whatever reason, ordinary regression will not help.
An appropriate tool for this kind of analysis could be structural equation modeling [1].
[1] http://en.wikipedia.org/wiki/Structural_equation_modeling