[Math] How to tell if y is a function of x in a random sample

st.statistics

I have some data and believe that one metric is a function of another. I have many paired observations of both metrics. Can I tell whether one is a function of the other through some simple exercise like a regression? I'm not sure whether the function is linear. I'm not a math expert, so apologies if this is a trivial question.


Edit: Here's my (Anton's) interpretation of the question. If I misunderstood, I hope gitkin corrects it.

Given a bunch of data points $\{(x_i,y_i)\}$ in the plane, I can find the line best fitting the data. Then I can compute the coefficient of determination $R^2$ to see how good the fit is. More generally, given a model $y=f(x)$ (where $f$ may not be linear), I can do various things to determine how well the model fits the data.
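To make the linear case concrete, here is a minimal sketch of fitting a line by ordinary least squares and computing $R^2$. The helper name `linear_fit_r2` and the two small data sets are made up for illustration. The second data set is an exact function of $x$ (a parabola), yet the best-fitting line explains none of it, which is why a linear fit alone cannot settle the question.

```python
def linear_fit_r2(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Illustrative data (not from the question).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys_line = [1.0, 3.0, 5.0, 7.0, 9.0]    # exactly y = 1 + 2x
ys_parab = [4.0, 1.0, 0.0, 1.0, 4.0]   # exactly y = (x - 2)^2

a, b, r2_line = linear_fit_r2(xs, ys_line)
_, _, r2_parab = linear_fit_r2(xs, ys_parab)
print(r2_line, r2_parab)
```

Here the line fits the first data set perfectly, while for the parabola the fitted slope is zero and $R^2$ is zero even though $y$ is completely determined by $x$.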

Is there some way to determine if there exists a model $y=f(x)$ fitting the data well? In other words, is there a way to measure your confidence that the $x$ values completely determine the $y$ values (in some reasonable way) in the system you've sampled? Intuitively, you should somehow vary over all possible functions $f$, measure how much the model $y=f(x)$ fails to explain the data, add some penalty depending on the complexity of $f$ relative to the size of the sample, and return the lowest value you get. Is there a precise, theoretically justified way to do this?

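For example, the penalty should be very high if $f$ is a polynomial of degree comparable to the number of data points, since such a polynomial can pass through every point whether or not any real relationship exists.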
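One standard instance of this "misfit plus complexity penalty" idea is the Bayesian information criterion (BIC). The sketch below is only an illustration under strong assumptions: it restricts $f$ to polynomials, assumes Gaussian noise, and uses synthetic data. The helper name `bic_for_degree` is hypothetical, and this is not claimed to be the full answer to the question of varying over all possible $f$.

```python
import numpy as np


def bic_for_degree(x, y, degree):
    """BIC of a least-squares polynomial fit, assuming Gaussian noise.

    Up to an additive constant: n * log(RSS / n) + k * log(n),
    where k = degree + 1 is the number of fitted coefficients.
    """
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals ** 2))
    k = degree + 1
    return n * np.log(rss / n) + k * np.log(n)


# Synthetic data for illustration: a noisy parabola.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 60)
y = x ** 2 + rng.normal(scale=0.5, size=x.size)

# Score each candidate degree; lower BIC is better.
scores = {d: bic_for_degree(x, y, d) for d in range(0, 9)}
best_degree = min(scores, key=scores.get)
print(best_degree, scores[best_degree])
```

The first term rewards a small residual sum of squares and the second grows with the number of coefficients, so a high-degree polynomial has to reduce the error substantially to be preferred. Cross-validation is another common way to get the same kind of trade-off.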

Best Answer

If you are only interested in correlation between the two feature values, then there are a lot of ways to compute it (simple correlation, rank correlation, linear or nonlinear regression, etc.).
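As a small illustration of how simple (Pearson) correlation and rank (Spearman) correlation can differ, here is a sketch in plain Python. The function names and the data are made up for the example, and the ranking helper assumes there are no tied values.

```python
def pearson(xs, ys):
    """Pearson (linear) correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Rank values from 1..n (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Illustrative data: y is a monotone but nonlinear function of x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 3 for x in xs]

r_pearson = pearson(xs, ys)
r_spearman = spearman(xs, ys)
print(r_pearson, r_spearman)
```

Rank correlation picks up any monotone relationship exactly, while Pearson correlation falls short of 1 because the relationship is not linear. Neither detects a non-monotone dependence such as a parabola centred on the data.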

If you are interested in causality, a few places to look are Granger causality and the NIPS workshops on causality (2008, 2009).