Regression – How to Perform Logistic Regression Using Excel Solver

excellikelihoodlogisticregressionregression coefficients

Suppose there is a problem where a business analyst works for an energy company and they want to find out the customer probability that a given set of customers will churn and move over to other energy providers. Where 1 = yes (they will churn) and 0 = no (they will not churn).

Using the 2 models below, I need to be able to predict the probability of the churn:

First optimisation model

Second Optimisation Model

So far this is what I have managed to do along with the formulas implemented for each, but I am not sure how/if anything needs to be amended in order to see which model works best as a solution to the problem.

In my example b0, b1 and b2 have been named as x1, x2, and x3 where b0 = x3 (the intercept). I have set this as 0.001 to start off with but the solver will optimise these values accordingly.

Step 1 – Calculating the Logit:

Calculating the Logit

Step 2 – Calculating the exp of the Logit:

Calculating the exp of the logit

Step 3 – Calculating the probability:

Calculating the probability

With this one I have used 2 formulas to satisfy what I understood to be this part of the models. Where churn = 1 the formula used is exp(logit)/1+exp(logit) but where the churn = 0 I have used the formula 1/1+exp(logit). Is this the right approach? As I was unable to find anything directly related to the equations already shared above

Step 4 – Calculating the log likelihood:

Calculating the log likelihood

Is this only supposed to be used for equation 2 or for both equations? Because in my example equation 2 is the only one where I see the use of log.

The last step is the objective function which I have currently calculated as a sum of the log likelihood column, although again according to the equations I should use, only the second one seems to be a sum where as the first one is a multiplication?

After this I then run the excel solver set it to maximise and make sure the relevant cells have been selected:

Excel Solver

The end goal is to have 2 separate (but equivalent) models that I can then compare the output of. This all needs to be done using the standard excel solver and not any add-ins. I feel I am very close to a solution for both, but may be missing something rather simple.

Thank you in advance

Best Answer

The "two models" you are referring to are mathematically equivalent representations of the same model. Recall that $\log(x y) = \log(x) + \log(y)$. The second formula uses sum of logs because this is more numerically stable. Using log transformation doesn't change the relative ordering of the values, if $x > y$ then $\log(x) > \log(y)$, so the transformation wouldn't change the maximum of the function, it wouldn't matter for optimization. You just need to maximize the second formula, if there are any difference they would be only due to numerical precision issues.

Related Question