Solved – P-value for multiple regression changes as more points are added

Tags: multiple-regression, p-value, regression

I am working on a multiple regression model that will forecast the value of loans being granted within the current month.

The data points are broken down per day (January will have 31 data points, etc.), and I will be updating the model every week.

I have 4 independent variables driving the model.
I am currently testing the model and trying to understand what is happening to the $p$-values in particular.

On day 8, two of my variables (loans and declines) have these $p$-values:

| Variable | $p$-value |
|---|---|
| loans | 0.014030324 |
| declines | 0.980464984 |

On day 15, when I run the regression, I get these:

| Variable | $p$-value |
|---|---|
| loans | 0.003114471 |
| declines | 0.023498327 |

I am just wondering why the $p$-value for "declines" is now showing as significant. Is it because the model is working with more data points, or is there something in the data suggesting that this variable is becoming more significant?

Best Answer

Both options amount to the same answer.

Sampling error depends on sample size. If I understand correctly, you are increasing the amount of evidence when moving from day 8 to day 15 (roughly twice as many daily observations). As the sample size grows, the "true relationships" in the population, both among the independent variables and between them and the dependent variable, are expected to show up more clearly. Think of it as a signal-to-noise ratio that increases with more data, or as the number of times you need to toss a coin before you are confident whether it is fair.
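To make the coin analogy concrete, here is a minimal Python sketch (standard library only) of an exact two-sided binomial test against a fair coin. It holds the observed proportion of heads fixed at an assumed 60% and only increases the number of tosses; the function name `binomial_two_sided_p` is hypothetical. The same 60/40 split that is unremarkable after 10 tosses becomes very hard to attribute to chance after a few hundred.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided binomial test p-value against a fair coin (p = 0.5).

    Binomial(flips, 0.5) is symmetric, so the two-sided p-value is twice the
    probability of a count at least as far above flips / 2 as the observed
    one, capped at 1.
    """
    # Reflect counts below the midpoint so we always sum the upper tail.
    k = max(heads, flips - heads)
    upper_tail = sum(comb(flips, i) for i in range(k, flips + 1)) / 2**flips
    return min(1.0, 2 * upper_tail)


# Same observed proportion of heads (60%), increasing number of tosses.
for flips in (10, 50, 100, 200, 500):
    heads = 3 * flips // 5  # exactly 60% of the tosses
    p = binomial_two_sided_p(heads, flips)
    print(f"{flips:4d} tosses, {heads:3d} heads -> p = {p:.4g}")
```

Nothing about the coin changes between rows; only the amount of evidence does, which is the same thing that happens to your regression between day 8 and day 15.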

In any case, I would track the progress of those relationships by watching how stable the coefficients are, rather than how the $p$-values change. $P$-values tell you about null hypotheses, which I assume here test whether each coefficient equals zero in the population. The confidence intervals narrow as the sample size increases (provided the model assumptions hold; standard errors typically shrink roughly like $1/\sqrt{n}$), so any coefficient that stays constant at a nonzero value will eventually "be different from zero" (that is, the null hypothesis will be rejected) once there is enough data.
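The following sketch mimics weekly, expanding-window refits on simulated data. Everything about the data is an assumption made for illustration: four standard-normal predictors (two labelled `loans` and `declines` only to echo the question), made-up true coefficients with `x4` having no effect, and unit-variance noise. `simulate_days` and `ols_fit` are hypothetical helpers built on NumPy. The estimates should settle near their true values and the standard errors shrink as days accumulate; the $t$-statistic (coefficient divided by its standard error), which drives the $p$-value, tends to grow for the nonzero coefficients but not for `x4`.

```python
import numpy as np

# Hypothetical "true" model: intercept plus four predictors. Names and
# coefficient values are made up for illustration; x4 has no real effect.
NAMES = ["intercept", "loans", "declines", "x3", "x4"]
TRUE_BETA = np.array([10.0, 2.0, -0.5, 1.0, 0.0])


def simulate_days(n_days: int, seed: int = 0):
    """Simulate n_days of daily data from the hypothetical model above."""
    rng = np.random.default_rng(seed)
    predictors = rng.normal(size=(n_days, 4))
    X = np.column_stack([np.ones(n_days), predictors])  # add intercept column
    y = X @ TRUE_BETA + rng.normal(scale=1.0, size=n_days)  # unit-variance noise
    return X, y


def ols_fit(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares: coefficients and classical standard errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - p)  # unbiased noise variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # estimated Var(beta_hat)
    return beta, np.sqrt(np.diag(cov))


# Expanding window: refit on the first 8, 15, 22, ... days of one data
# stream, as when the model is updated each week.
X_all, y_all = simulate_days(365)
for n in (8, 15, 22, 29, 90, 365):
    beta, se = ols_fit(X_all[:n], y_all[:n])
    t = beta / se
    summary = ", ".join(
        f"{name}={b:+.2f} (se {s:.2f}, t {tv:+.1f})"
        for name, b, s, tv in zip(NAMES[1:], beta[1:], se[1:], t[1:])
    )
    print(f"day {n:3d}: {summary}")
```

With only 8 observations and 5 parameters there are just 3 residual degrees of freedom, so the early estimates can swing a lot; that swing is the instability worth monitoring before reading much into any single $p$-value.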

[Note 1. You should also be aware of collinearity among the "independent" variables. It can also make your (partial regression) coefficients change.]
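One common way to check for collinearity is the variance inflation factor (VIF): regress each predictor on all the others and compute $1/(1-R^2_j)$. The standard error of coefficient $j$ is inflated by a factor of $\sqrt{\text{VIF}_j}$ relative to the case of uncorrelated predictors. The sketch below (NumPy, hypothetical function name `variance_inflation_factors`, simulated data) compares two independent predictors with two nearly identical ones. A frequently quoted rule of thumb treats VIF values above about 5 to 10 as a warning sign, though that threshold is a convention rather than a strict rule.

```python
import numpy as np


def variance_inflation_factors(predictors: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
    on all the other predictors (plus an intercept)."""
    n, k = predictors.shape
    vifs = np.empty(k)
    for j in range(k):
        target = predictors[:, j]
        others = np.delete(predictors, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
independent = np.column_stack([a, rng.normal(size=n)])
collinear = np.column_stack([a, a + 0.1 * rng.normal(size=n)])  # near-copy of a

print("independent:", variance_inflation_factors(independent))  # both close to 1
print("collinear:  ", variance_inflation_factors(collinear))    # both around 100
```

With two predictors the VIF is the same for both columns, because regressing either one on the other gives the same $R^2$ (the squared correlation).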

[Note 2. I would also try to follow the advice from Peter Flom above.]