Prediction model for forecasting using linear regression

regression, statistical-inference, statistics

I am very new to inferential statistics. I am trying to build a model that forecasts physicians' revenue from historical data. I plan to use a multiple linear regression model in which the monthly payment is the dependent variable and the predictors include the number of patient visits and the number of charges for that month.

The data are structured as follows:

| Payment (monthly) | Patient visits | Charges count | Month date |
|---|---|---|---|

Now that I have built a regression model, I need to use it to forecast the physician's payments. Because forecasting involves extrapolation, I assume I cannot plug in predictor values outside the range of the data the model was built on. For example, patient visits range from 100 to 1000 in my data, but I want to predict the payment for 2000 patient visits, and the model I have built does not give correct results there.
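To see why a fit can break down at 2000 visits, here is a minimal Python sketch on made-up numbers. It assumes, purely for illustration, that payment rises with visits but with diminishing returns, via a hypothetical `true_payment` curve. A straight line fitted on 100 to 1000 visits stays within 1200 of that curve inside the observed range, yet overshoots it by 20,200 at 2000 visits, and nothing in the fitted line itself signals the problem.

```python
# Hypothetical illustration: all numbers are made up, and the concave
# "true" payment curve is an ASSUMPTION chosen to show the extrapolation risk.

def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def true_payment(visits):
    """Assumed payment curve with diminishing returns per extra visit."""
    return 50 * visits - visits ** 2 / 100

visits = list(range(100, 1001, 100))    # observed range: 100 to 1000 visits
payments = [true_payment(v) for v in visits]
a, b = fit_line(visits, payments)       # here a = 2200, b = 39 per visit

for v in (500, 1000, 2000):
    print(v, round(a + b * v), round(true_payment(v)))
# 500 21700 22500    inside the observed range: fitted line is close
# 1000 41200 40000   edge of the range
# 2000 80200 60000   extrapolated: off by 20200
```

The point is not this particular curve but that the data between 100 and 1000 visits cannot tell you which curve holds beyond 1000.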

Another option I have considered is a time series model. In that case time would be on the x-axis and payments on the y-axis, and we would look at the trend in payments over time to make future predictions. However, I also want to use the effect of the other predictors when making those predictions.
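One common way to combine the time-series idea with other predictors is a regression that uses last month's payment as a predictor alongside this month's visits (an autoregressive model with exogenous inputs). The Python sketch below uses made-up data generated from assumed coefficients (intercept 2000, lag coefficient 0.4, 25 per visit), so you can check that ordinary least squares recovers them; `ols` is a small hypothetical helper, not a library function. Note that a forecast from such a model needs next month's visits, which you have to plan or forecast separately.

```python
# Hypothetical sketch: made-up monthly data generated from ASSUMED coefficients.

def ols(X, y):
    """Least squares: solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

visits = [320, 410, 380, 450, 500, 470, 520, 610, 580, 640, 600, 690]
payments = [15000.0]  # assumed payment in the first month
for t in range(1, len(visits)):
    # assumed data-generating rule: payment_t = 2000 + 0.4*payment_{t-1} + 25*visits_t
    payments.append(2000 + 0.4 * payments[t - 1] + 25 * visits[t])

# Regress payment_t on a constant, payment_{t-1} and visits_t (months 1..11).
X = [[1.0, payments[t - 1], visits[t]] for t in range(1, len(visits))]
y = payments[1:]
alpha, phi, beta = ols(X, y)

# A one-step-ahead forecast needs next month's visits; 650 is an assumed
# planned value that stays inside the observed range (320 to 690).
next_visits = 650
forecast = alpha + phi * payments[-1] + beta * next_visits
print(f"alpha={alpha:.1f}  phi={phi:.3f}  beta={beta:.2f}  forecast={forecast:.0f}")
```

Forecasting further ahead means feeding each forecast back in as the next month's lagged payment, so errors compound with the horizon.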

Please let me know how best to achieve this. As I am new to this, any guidance will be highly appreciated.

Thanks in advance!

Best Answer

If the "model" is truly linear and revenue depends only on your covariates, i.e., $$revenue_i=\alpha+\beta X_i+\varepsilon_i,$$ with $X_i$ the vector of covariates (patient visits, charges count, ...), then your predictions should be close to the actual values. If they are not, there are at least two things you can do:

1) Try other model specifications. For example, take the log of revenue and run $$\log(revenue_i)=\alpha+\beta X_i+\varepsilon_i.$$ Does it predict any better? Do some of the covariates have a nonlinear impact on revenue? Try including squared terms of some covariates, e.g., $$revenue_i=\alpha+\beta_1 patientvis_i+\beta_2 patientvis_i^2+\beta X_i+\varepsilon_i.$$ If no linear specification works at all, try nonlinear least squares. In particular, if you know the revenue function and know it isn't linear, use that knowledge and estimate a nonlinear model that exploits it. Do you have several different physicians? Then you should use physician fixed effects, which account for time-invariant, physician-specific factors. If you expect some kind of serial correlation (autocorrelation), try including a lagged term, i.e., $$revenue_{i,t}=\alpha+\beta_1 revenue_{i,t-1}+\beta X_{i,t}+\varepsilon_{i,t}.$$ You can do all of that with the data you already have. If that doesn't work...

2) Get more data. If revenue depends on a variable that you don't have or don't include in your regressions, your predictions will naturally be off whenever that omitted variable changes a lot from month to month. Predicting without the essential factors is like clairvoyance; it won't work.
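A minimal Python sketch of the "try other specifications" advice: it fits a linear, a squared-term and a log-revenue model to the same hypothetical data and compares their errors on 20 held-out months. The data are simulated under an assumed diminishing-returns curve, so the squared-term model should win here; with real data the ranking may differ, which is exactly why the comparison should be made out of sample. Visits are rescaled to hundreds so the normal equations stay better conditioned, and `ols`, `predict` and `rmse` are small hypothetical helpers rather than library calls.

```python
import math
import random

def ols(X, y):
    """Least squares: solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(b, X):
    """Linear predictions (X times b) for a list of feature rows."""
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]

def rmse(pred, actual):
    """Root-mean-squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

# Hypothetical data: 80 months, revenue concave in visits (ASSUMED
# diminishing returns) plus Gaussian noise.
random.seed(1)
visits = [random.uniform(100, 1000) for _ in range(80)]
revenue = [1000 + 60 * v - 0.03 * v ** 2 + random.gauss(0, 300) for v in visits]

u = [v / 100 for v in visits]            # visits in hundreds
fit, hold = slice(0, 60), slice(60, 80)  # fit on 60 months, score on the other 20

# 1) linear: revenue = a + b*u
X_lin = [[1.0, ui] for ui in u]
b_lin = ols(X_lin[fit], revenue[fit])
rmse_lin = rmse(predict(b_lin, X_lin[hold]), revenue[hold])

# 2) squared term: revenue = a + b1*u + b2*u^2
X_quad = [[1.0, ui, ui ** 2] for ui in u]
b_quad = ols(X_quad[fit], revenue[fit])
rmse_quad = rmse(predict(b_quad, X_quad[hold]), revenue[hold])

# 3) log outcome: log(revenue) = a + b*u, mapped back with exp()
#    (a naive back-transform that ignores retransformation bias)
b_log = ols(X_lin[fit], [math.log(r) for r in revenue[fit]])
rmse_log = rmse([math.exp(p) for p in predict(b_log, X_lin[hold])], revenue[hold])

print(f"hold-out RMSE  linear={rmse_lin:.0f}  squared={rmse_quad:.0f}  log={rmse_log:.0f}")
```

In practice a statistics library would do the fitting and also report standard errors; the hand-written solver only keeps the sketch dependency-free. Physician fixed effects or a lagged revenue term fit into the same pattern by adding the corresponding columns (physician dummies, last month's revenue) to the feature rows.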
