I had a look here and at a couple of other questions, but I couldn't find what I am looking for.
My question is purely theoretical (although I do have an example at hand).
Suppose I have some data $(x_i,y_i)$ for $i=1,\dots,n$.
Suppose I fit the following models with IID $\epsilon_i \sim N(0, \sigma^2)$ for $i=1,\dots,n$:
- $M_1: \log(y_i)= \beta_0+\beta_1x_i+\epsilon_i$
- $M_2: \log(y_i)= \beta_0+\beta_1x_i+\beta_2x_i^2+\epsilon_i$
- $M_3: \log(y_i)= \beta_0+\beta_1x_i+\beta_2x_i^2+\beta_3x_i^3+\epsilon_i$
Now I want to see which of these models is best, so I use the following (maybe weird, but stay with me) method to evaluate their "predictive power":
- Use $(x_i, \log(y_i))$ for $i=1,\dots,\frac{n}{2}$ to fit $M_1$, $M_2$ and $M_3$ respectively.
- Use each fitted model to predict the $y_i$'s from the $x_i$'s of the remaining $\frac{n}{2}$ data points, i.e. for $i = \frac{n}{2}+1,\dots,n$ (careful: predict $y_i$, not $\log(y_i)$).
- Compute the Mean Absolute Error, $MAE = \frac{1}{n/2}\sum_{i=\frac{n}{2}+1}^{n}|y_i-\hat{y}_i|$, being careful that $\hat{y}_i$ is on the original scale of the values!
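The three steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the data are synthetic (generated from $M_1$), the function name `holdout_mae` is hypothetical, and the polynomial fits use plain least squares via `numpy.polyfit`.

```python
import numpy as np

def holdout_mae(x, y, degree):
    """Fit log(y) on a polynomial of the given degree using the first half
    of the data, then return the MAE on the original scale over the second half."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half = len(x) // 2
    # Step 1: least-squares fit on the log scale, first half only.
    coefs = np.polyfit(x[:half], np.log(y[:half]), deg=degree)
    # Step 2: predict log(y) for the held-out x values, then exponentiate.
    log_pred = np.polyval(coefs, x[half:])
    y_pred = np.exp(log_pred)
    # Step 3: mean absolute error on the original scale.
    return np.mean(np.abs(y[half:] - y_pred))

# Synthetic data, purely illustrative (assumption: the true model is M_1).
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0.0, 2.0, size=n)
y = np.exp(0.5 + 1.2 * x + rng.normal(0.0, 0.3, size=n))

maes = {f"M_{d}": holdout_mae(x, y, d) for d in (1, 2, 3)}
for name, mae in maes.items():
    print(name, round(mae, 4))
```

Note that splitting by index only gives a fair hold-out set if the data are not ordered by $x_i$; otherwise the second half is an extrapolation.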
So now my question:
If I do the first step and fit the three models (hence obtaining estimates of the parameters, their standard errors, etc.) and then use these parameters (for each model respectively, of course!) to predict the responses for the other $x_i$'s:
- Will I be predicting the $\log(y_i)$'s? And if so, is it also true that to get the $\hat{y}_i$'s, rather than the $\widehat{\log(y)}_i$'s, I should just take the exponential of those predictions? So in general, is it true that $\hat{y}_i = e^{\widehat{\log(y)}_i}$?
- Once I find the three MAEs, how do I judge the models? Should I be looking for the one with the smallest MAE?
EDIT
For example, suppose I have $1000$ data points. I use the first $500$ to fit model $M_1$. Once I've fitted it, I can predict new values, so I predict the responses for the remaining $500$ $x_i$'s. Of course, the predictions will be on the logarithmic scale, but I want to calculate the MAE on the original scale.
This is the context of my question. Of course, I would do this procedure for all three models and compare the MAEs.
Best Answer
IMO which model is better will depend on many factors.
These include:
In my opinion these should be done first, since their results should be used to see which assumptions can be made in each model.
Answering your questions:
Yes, given what you have written.
Not quite. For example, you define your first model $M_1$ as:
$$\log(y_i)=\beta_0+\beta_1x_i+\epsilon_i$$
Hence
$$\hat{y}_i=\widehat{e^{\beta_0+\beta_1x_i+\epsilon_i}}=e^{\hat{\beta}_0}e^{\hat{\beta}_1x_i}e^{\hat{\epsilon}_i}$$
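A short note on this back-transformation, assuming the stated $\epsilon_i \sim N(0,\sigma^2)$ holds. For a held-out $x_i$ the residual is not observed, so one common reading is to set $\hat{\epsilon}_i = 0$, which gives the plug-in prediction $e^{\hat{\beta}_0+\hat{\beta}_1x_i}$. Under the model, $y_i$ given $x_i$ is log-normal, so this plug-in value estimates the conditional median rather than the conditional mean:

$$\operatorname{median}(y_i \mid x_i)=e^{\beta_0+\beta_1x_i}, \qquad E[y_i \mid x_i]=e^{\beta_0+\beta_1x_i+\sigma^2/2}$$

Since the median is the predictor that minimises expected absolute error, the plain exponential is a reasonable choice when the criterion is MAE; the $\sigma^2/2$ term would matter if the target were the mean (e.g. under squared error).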
Taking the model with the smallest MAE would make sense; however, I would take the one with the highest $R^2$.
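One caveat worth checking when comparing by $R^2$: for nested least-squares fits such as $M_1$, $M_2$, $M_3$, the in-sample $R^2$ cannot decrease as terms are added, so on its own it will tend to favour $M_3$. A minimal sketch on synthetic data (the data and the helper name `r_squared` are assumptions made for illustration):

```python
import numpy as np

def r_squared(x, log_y, degree):
    """In-sample R^2 of a least-squares polynomial fit to log(y)."""
    coefs = np.polyfit(x, log_y, deg=degree)
    resid = log_y - np.polyval(coefs, x)
    ss_res = np.sum(resid ** 2)                      # residual sum of squares
    ss_tot = np.sum((log_y - np.mean(log_y)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Synthetic log-scale responses, generated from a straight line (M_1).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=500)
log_y = 0.5 + 1.2 * x + rng.normal(0.0, 0.3, size=500)

r2 = [r_squared(x, log_y, d) for d in (1, 2, 3)]
print(r2)  # non-decreasing in the degree, even though the truth is linear
```

An adjusted $R^2$ or an out-of-sample criterion such as the hold-out MAE does not share this property.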
Most importantly, to be able to use any of these models, they need to be statistically significant. This is typically measured via p-values: depending on the hypothesis being tested, a p-value below a chosen threshold (e.g. $0.05$) is taken to indicate significance.
http://www.dummies.com/education/math/statistics/what-a-p-value-tells-you-about-statistical-data/