There may be more to it, but to me it seems that you just want to determine goodness of fit (GoF) for a function f(a) fitted to a particular data set (a, f(a)). So, the following answers only your third sub-question (I don't think the first and second are directly relevant to the third).
Usually, GoF can be determined parametrically (if you know the distribution's parameters) or non-parametrically (if you don't). You may be able to figure out the parameters here, as the function appears to be exponential or gamma/Weibull (assuming the data are continuous); nevertheless, I will proceed as if you didn't know them. In that case, it's a two-step process. First, you determine the distribution parameters for your data set. Second, you perform a GoF test for that fitted distribution. To avoid repeating myself, at this point I will refer you to my earlier answer to a related question, which contains some helpful details. Obviously, that answer can easily be applied to distributions other than the one mentioned within.
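As a sketch of that two-step process (the placeholder data, and the gamma candidate, are my assumptions for illustration; your own data vector goes where `a` is):

```r
# Two-step GoF sketch: 'a' stands in for your data vector, and the
# gamma candidate is an assumption, not a conclusion about your data
library(MASS)  # for fitdistr()

set.seed(1)
a <- rgamma(200, shape = 2, rate = 0.5)   # placeholder data

# Step 1: estimate the distribution's parameters by maximum likelihood
fit <- fitdistr(a, densfun = "gamma")
fit$estimate

# Step 2: test GoF against the fitted distribution (K-S here)
ks.test(a, "pgamma",
        shape = fit$estimate["shape"],
        rate  = fit$estimate["rate"])
```

Keep in mind that plugging estimated parameters into the K-S test makes the reported p-value conservative; a Lilliefors-type correction or a parametric bootstrap is the cleaner route.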
In addition to the GoF tests mentioned there, you may consider another test: the chi-square GoF test. Unlike the K-S and A-D tests, which are applicable only to continuous distributions, the chi-square GoF test is applicable to both discrete and continuous ones. It can be performed in R using one of several packages: the built-in stats package (function chisq.test()) or the vcd package (function goodfit(), for discrete data only). More details are available in this document.
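For continuous data, the chi-square GoF test amounts to binning the data and comparing observed with expected bin counts. Here is a sketch with placeholder exponential data; the binning choices are mine:

```r
# Chi-square GoF sketch: bin the data, then compare observed bin counts
# with the counts expected under the fitted distribution
set.seed(1)
a <- rexp(300, rate = 2)                 # placeholder data

rate.hat <- 1 / mean(a)                  # MLE of the exponential rate

# 10 equal-count bins covering (0, Inf)
breaks <- quantile(a, probs = seq(0, 1, by = 0.1))
breaks[1] <- 0
breaks[length(breaks)] <- Inf
observed   <- as.vector(table(cut(a, breaks)))
expected.p <- diff(pexp(breaks, rate = rate.hat))

chisq.test(x = observed, p = expected.p, rescale.p = TRUE)

# For discrete data, vcd::goodfit() handles the binning and fitting:
# library(vcd); summary(goodfit(x, type = "poisson"))
```

Note that chisq.test() does not know that a parameter was estimated from the data, so its degrees of freedom (and hence its p-value) are only approximate here.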
That F table is built from sequential (Type I) sums of squares; each effect is tested given that the previous terms are already in the model. So Sepal length is tested as if Petal length were not there (the comparison is between a regression on Sepal length alone and the intercept-only model), while Petal length is tested given that Sepal length is present.
By contrast, the earlier regression table corresponds to each coefficient being tested with all the other terms in the model, regardless of their order.
Note that the last row of each table should give the same p-value (in your example, that's the petal length variable, and indeed they match).
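In R this is the difference between anova() and summary() applied to the same fit. A sketch using the built-in iris data follows; Sepal.Width as the response is purely an assumption for illustration, and your actual response may differ:

```r
# Sequential (Type I) tests vs. marginal coefficient tests on one model
# (Sepal.Width as the response is an assumption for illustration)
fit <- lm(Sepal.Width ~ Sepal.Length + Petal.Length, data = iris)

anova(fit)    # Type I: Sepal.Length alone, then Petal.Length given Sepal.Length
summary(fit)  # each coefficient tested with the other term already in the model

# The Petal.Length p-value agrees across the two tables, because for the
# last term entered, the F statistic equals the squared t statistic
```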
Yes, you can have high p-values for individual coefficients with a good fit and low p-values with a poor fit. The reason for this is straightforward: goodness of fit is a different question than whether the slope of the $X,\ Y$ relationship is $0$ in the population. Generally, when running a regression, we are trying to determine a fitted line that traces the conditional means of $Y$ at different values of $X$. (It is also possible to wonder about other aspects of a model, but that is the most basic and common feature.) Thus, a goodness of fit assessment is whether the model's fitted conditional means actually match the data's conditional means. The answer to this latter question can be either yes or no independently of whether the best estimate of the slope is $0$.
Consider the following examples, which are coded in R. (I don't have access to MATLAB, but the code here is intended to be as close to pseudocode as I can make it.)
What these examples show are a model that has high / non-significant p-values, but a good fit for the predicted means (because the true slopes are $0$), and a model with very low / highly significant p-values, but a poor fit for the predicted means (because, although the slopes within the regions spanned by the data are far from $0$, the relationships are also not very close to straight lines). The p-values are easy to see and understand in the output. To see the quality of the models' fits to the conditional means, I plotted the true data generating process (in this case I have it, because the data are simulated, but in general you won't). In a more typical case, you would just see if the predicted means do a reasonable job of tracing the observed conditional means in your dataset; here I did that by plotting LOWESS lines. (The plots only display x1 and collapse over x2, but I could make analogous plots with x2, or various kinds of fancy plots with both x1 and x2, and they would show the same thing.)
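Since the original code isn't reproduced here, the following is my own minimal reconstruction of the kind of simulation described (the variable names and the curved mean function are my assumptions):

```r
# Reconstruction sketch: high p-values with a good fit to the conditional
# means vs. low p-values with a poor fit (functional forms assumed)
set.seed(1)
N  <- 100
x1 <- runif(N)
x2 <- runif(N)

# Case 1: true slopes are 0, so p-values are high, yet the flat fitted
# line traces the conditional means well (good fit)
y.flat <- 5 + rnorm(N)
summary(lm(y.flat ~ x1 + x2))

# Case 2: the true mean function is curved; a straight line earns very
# low p-values but is a poor description of the conditional means
y.curve <- exp(2 * x1) + exp(2 * x2) + rnorm(N, sd = 0.2)
summary(lm(y.curve ~ x1 + x2))

# Overlaying lowess(x1, y.curve) on the fitted line's slice over x1
# would make the case 2 mismatch visible
```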