To get started, let's look at an example of what your regression output might look like.
| Pred | Estimate | StdErr | t     | p       | sig |
|------|----------|--------|-------|---------|-----|
| A1   | 1.0      | 0.2    | 5.00  | 0.0005  | *   |
| A2   | -1.9     | 2.0    | -0.95 | 0.1850  |     |
| A3   | 4        | 0.1    | 40.0  | <0.0001 | *   |
| d1   | -2       | 1.1    | -1.81 | 0.0539  |     |
| d2   | 0.5      | 0.1    | 5.00  | 0.0005  | *   |
Of special interest to you is the sig column, which has an * if and only if the p-value for the corresponding variable is statistically significant given all the other variables in the model.
> When I estimate the model with all the variables included, some of the independent variables are not significant, but when I add just one of the dummy variables, all of the independent variables are significant.
Think of each variable as carrying some information about the response Y, and of a variable as significant if it carries "enough" of that information, in some sense. You can read an * in the table as meaning that if we dropped that one variable and kept the others, we would lose a significant amount of information. The lack of an * then means that we could drop that variable and, as long as we kept the rest, we wouldn't lose too much information.
Now let's say you dropped d1 because it wasn't significant, and your table now looks like this:
| Pred | Estimate | StdErr | t    | p       | sig |
|------|----------|--------|------|---------|-----|
| A1   | 1.1      | 0.2    | 5.50 | 0.0003  | *   |
| A2   | -4.1     | 1.2    | -3.42 | 0.0045 | *   |
| A3   | 4.2      | 0.1    | 42.0 | <0.0001 | *   |
| d2   | 0.4      | 0.1    | 4.00 | 0.0020  | *   |
Let's pretend A2 is weight and d1 is sex. It might be that weight and sex carry much of the same information about Y, especially since they are correlated. So when both weight (A2) and sex (d1) were in the model, each was somewhat redundant given the other, and we could drop either one as long as we kept the rest. Once we've dropped sex, all the information that was shared between weight and sex is carried only by weight, and if we now drop weight, we will lose that information. Thus weight (A2) has become significant.
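Here is a minimal simulated sketch of this effect (not your data; the "weight"/"sex" names and all the numbers are invented for illustration). Two correlated predictors share information about Y, so the t statistic for weight is much smaller when sex is also in the model than when weight stands alone:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, n).astype(float)        # dummy variable
weight = 60 + 15 * sex + rng.normal(0, 3, n)     # correlated with sex
y = 2.0 + 0.05 * weight + 0.8 * sex + rng.normal(0, 1, n)

def ols_t(predictors, y):
    """OLS with intercept: return coefficients and their t statistics."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Full model: weight and sex compete for the shared information.
b_full, t_full = ols_t([weight, sex], y)
# Reduced model: weight alone now carries all of that information.
b_red, t_red = ols_t([weight], y)

print("t(weight), full model:   ", round(t_full[1], 2))
print("t(weight), reduced model:", round(t_red[1], 2))
```

Dropping sex makes weight's t statistic jump, because weight is now the only carrier of the information the two predictors shared.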
> And, when I estimate the model in the form of f(A1, A2, A3, d1), I get different coefficients for the independent variables in comparison with the ones for f(A1, A2, A3, d2).
Recall that the regression model looked like this:
$$
\hat{\bar{Y}}_i = \hat{\beta}_0 + \hat{\beta}_1 A_{1,i} + \hat{\beta}_2 A_{2,i} + \hat{\beta}_3 A_{3,i} + \hat{\beta}_4 d_{1,i} + \hat{\beta}_5 d_{2,i}.
$$
Now once we've dropped d1, it looks like this:
$$
\hat{\bar{Y}}_i = \hat{\beta}_0 + \hat{\beta}_1 A_{1,i} + \hat{\beta}_2 A_{2,i} + \hat{\beta}_3 A_{3,i} + \hat{\beta}_5 d_{2,i}.
$$
If we kept the estimates $\hat{\beta}_p$ the same, each $\hat{\bar{Y}}_i$ would now be decreased by $\hat{\beta}_4 d_{1,i}$, which is the difference between the right-hand sides of the two equations. That doesn't really make sense, though, so the coefficient estimates have to change when the model is refit. The rest of the changes in the table follow from those new estimates.
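A quick simulated sketch of this (again invented data; the names A1 and d1 just mirror the table). When the dropped variable d1 is correlated with A1, refitting without d1 shifts the estimate of A1's coefficient, because A1 absorbs part of d1's effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
d1 = rng.integers(0, 2, n).astype(float)
A1 = 1.5 * d1 + rng.normal(0, 1, n)        # A1 correlated with d1
y = 1.0 + 2.0 * A1 - 1.0 * d1 + rng.normal(0, 1, n)

def fit(predictors, y):
    """OLS coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_full = fit([A1, d1], y)   # [b0, b_A1, b_d1]
b_red = fit([A1], y)        # [b0, b_A1] after dropping d1

# A1's estimate shifts once d1 is gone, since d1's (negative) effect
# leaks into A1 through their correlation.
print("b_A1, full model:   ", round(b_full[1], 3))
print("b_A1, reduced model:", round(b_red[1], 3))
```

The full-model estimate sits near the true value 2.0, while the reduced-model estimate is pulled downward by the omitted negative effect of d1.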
Best Answer
I am not a big fan of converting a continuous variable into multiple dummy variables, though I gather the binning procedure is considered standard practice in scorecard development.
Regarding dummy-variable insignificance: when you add dummy variables in a regression, the omitted group acts as the reference group, and each dummy's group is compared with that reference. When a variable has a nonlinear (e.g., quadratic) relationship with the log-odds, some dummy variables may come out insignificant, namely those for groups whose effect is close to the reference group's. My suggestion is to look at the pattern of log-odds in each bin before merging. Depending on the pattern, you can either make fewer final bins or change the reference group. I know this is a bit abstract, but I cannot be more specific without knowing the case.
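A small sketch of the suggested check, on simulated data (all names and numbers are illustrative, not from the question): compute the empirical log-odds of the outcome within each bin, so you can see whether the pattern is nonlinear before deciding which bins to merge or which to use as the reference group:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0, 10, n)
# Simulated binary outcome with a quadratic effect on the log-odds
logit = -2 + 0.9 * x - 0.08 * x**2
p = 1 / (1 + np.exp(-logit))
outcome = rng.binomial(1, p)

bins = np.linspace(0, 10, 6)          # 5 equal-width bins
idx = np.digitize(x, bins[1:-1])      # bin index (0..4) per observation
for b in range(5):
    rate = outcome[idx == b].mean()
    log_odds = np.log(rate / (1 - rate))
    print(f"bin {b}: empirical log-odds = {log_odds:.2f}")
```

Here the log-odds rise and then fall across the bins, so the bins nearest the reference group's level would look insignificant in a dummy-coded regression even though the variable clearly matters.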
You could also drop the insignificant dummy variable. Doing so effectively merges that dummy's group into the reference group, which may not be appropriate if merging those two groups doesn't make business sense.