Translate the GLM model with coefficients into a formula


I wasn't sure whether to post here or on stats.stackexchange, but I'm trying here first since my goal is to find or derive a math formula for my R-generated model.

If you look at my post history you can see I've been self-studying logs at the high school level for the past few months. Now, in my day job, I have a real-life problem where I can attempt to apply what I've learned, so I'm pretty motivated and excited about reinforcing a few concepts. But I need some hand holding…

I have created a Poisson GLM with the following formula (in R, but I don't think it matters here):

mod = glm(formula = CUMAMT ~ log(Lag_CUMAMT) + TENURE + log(TENURE), family = "poisson", data = mytrainingdata)

Here CUMAMT is a cumulative dollar amount for a segment of customers and TENURE is the length of time in days that this segment has been a customer with us. My goal is to predict future cumulative revenue (CUMAMT).

Further context: the model starts from tenure day 21. I have an 'initial value' called 'Initial Value Day 20' in the sheet, which is the actual cumulative revenue for the group as of day 20. I use this initial amount to predict cumulative revenue for day 21. Then, in turn, I use the prediction for tenure day 21 to predict cumulative revenue for day 22, and so on. In other words, the model uses the prediction from the previous row (day) as an input to the current prediction, i.e. a lag.
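
As a minimal sketch of this recursion in R: the coefficient values below are hypothetical placeholders rather than the fitted ones (with a fitted model they would come from `coef(mod)`), and `predict_path` is an illustrative helper name, not part of any package. It assumes the default log link of the Poisson family, so each prediction is the exponential of the linear predictor.

```r
# Hypothetical coefficients for illustration only; replace with the fitted values.
b <- c(intercept = 0.05, log_lag = 0.995, tenure = -0.0002, log_tenure = 0.01)

# Predict cumulative revenue recursively from an initial value.
# initial:   actual CUMAMT on start_day
# start_day: tenure day of the initial value
# days:      number of days to predict ahead
predict_path <- function(initial, start_day, days, b) {
  preds <- numeric(days)
  prev <- initial
  for (i in seq_len(days)) {
    tenure <- start_day + i
    # Log link: the linear predictor is the log of the expected CUMAMT.
    eta <- b[["intercept"]] + b[["log_lag"]] * log(prev) +
      b[["tenure"]] * tenure + b[["log_tenure"]] * log(tenure)
    preds[i] <- exp(eta)
    prev <- preds[i]  # the prediction becomes the next day's lag
  }
  data.frame(TENURE = start_day + seq_len(days), PRED = preds)
}

# Initial value 255152 on day 20, predicting days 21 to 30.
path <- predict_path(initial = 255152, start_day = 20, days = 10, b = b)
print(path)
```

Only the first row uses an actual observation; every later row feeds the previous prediction back in as the lag, which mirrors the spreadsheet layout described above.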

Using a spreadsheet I am able to plug in my coefficients to actual data and make predictions. Here is a link to the online document I created with scrubbed and disguised data.

Here is a screenshot of the contents of that spreadsheet (image not reproduced here).

I've added the values of the coefficients in the table at the top left. Then, in the two tables below I've made my prediction.

I would like to turn the coefficients and initial value into a formula.

Looking at my textbook, there is an exponential growth formula of the form:

$$A(t)=A^{rt}$$

Read as 'The amount at time $t$ is a function of the initial amount $A$ to the power of the rate * the time periods'.

My formula is different but I wondered if I can somehow write it in a similar way?

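On the spreadsheet, the column 'Empirical Prediction' is just the exponential of the sum of the coefficient calculations, i.e. each coefficient multiplied by its input:

$$e^{\beta_0 + \beta_1\log(\mathrm{lag\_CUMAMT}) + \beta_2\,\mathrm{TENURE} + \beta_3\log(\mathrm{TENURE})}$$

where $\beta_0$ is the intercept and $\beta_1, \beta_2, \beta_3$ are the coefficients of the three inputs.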

I could almost use this formula as is, except that I hope (or guess) there's a clever way to handle TENURE as a 'number of time periods since day 20'. E.g. for the first row, predicting CUMAMT for day 21, the number of time periods between tenure 20 and 21 is just 1. For predicting day 25, the number of time periods between the initial amount and the prediction day is 25 - 20 = 5, and so on.

Is there some algebra I can do here to derive a formula that uses the coefficients, the initial value 255152, and the tenure in days (time units since the initial value) to predict CUMAMT? That is, instead of (TENURE) + log(TENURE), is there a way to simplify this based on the corresponding coefficients of these inputs, somewhat like the exponential growth formula $A(t)=A^{rt}$, which just takes the amount to the power of the product of two values?

What's the 'right' way to express my model as a formula?

Best Answer

I'm not sure this answers your question, but here are some comments and pointers (too long for a comment):

  1. Your code does not fit a Poisson regression, because it is missing the family parameter; it should be

glm( formula = CUMAMT ~ log(Lag_CUMAMT) + TENURE + log(TENURE), family = "poisson", data = mytrainingdata )

  2. If you fit a Poisson regression, make sure that your dependent variable, CUMAMT, takes only non-negative integer values. Otherwise, I'm not sure Poisson is the appropriate regression model.

  3. The general form of the Poisson model is $$ \mathbb{E}[Y|x] = e^{\beta_0 + \beta_1x_1 + \dots + \beta_p x_p } $$ where the $x_i$ are your explanatory variables, i.e., $\log ( \text{lag}( CUMAMT))$, $TENURE$, etc.

  4. If you want to fit a simple exponential model, i.e., $$ y_i = e^{\beta_0 + \beta_1x_1 + \dots + \beta_p x_p }, $$ then just linearize it, i.e., $$ \log(y_i) = \beta_0 + \beta_1x_1 + \dots + \beta_p x_p + \epsilon_i. $$ This is not a GLM, and its only restriction is that $y_i > 0$ for all $i$.
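
To connect the general Poisson form above to the model in the question, here is a worked expansion. It is only a sketch under stated assumptions: $\beta_0$ is the intercept, $\beta_1, \beta_2, \beta_3$ are the coefficients of log(Lag_CUMAMT), TENURE and log(TENURE), $\hat{A}_t$ denotes the predicted CUMAMT at tenure day $t$, and $\hat{A}_{20} = 255152$ is the actual initial value. The symbols $\hat{A}_t$ and $c_t$ are notation introduced here, not taken from the post.

Using $e^{a+b} = e^a e^b$ and $e^{\beta \log x} = x^{\beta}$, one step of the prediction is

$$ \hat{A}_t = e^{\beta_0 + \beta_1 \log \hat{A}_{t-1} + \beta_2 t + \beta_3 \log t} = e^{\beta_0}\, \hat{A}_{t-1}^{\,\beta_1}\, e^{\beta_2 t}\, t^{\beta_3}. $$

Taking logs gives a first-order linear recurrence,

$$ \log \hat{A}_t = \beta_1 \log \hat{A}_{t-1} + c_t, \qquad c_t = \beta_0 + \beta_2 t + \beta_3 \log t, $$

and unrolling it $n$ days past day 20 (so that $t = 20 + n$) gives

$$ \log \hat{A}_{20+n} = \beta_1^{\,n} \log \hat{A}_{20} + \sum_{k=1}^{n} \beta_1^{\,n-k}\, c_{20+k}. $$

In the special case $\beta_1 = 1$ and $\beta_2 = \beta_3 = 0$ this collapses to $\hat{A}_{20+n} = \hat{A}_{20}\, e^{\beta_0 n}$, which has the shape of the usual exponential growth formula $A(t) = A_0 e^{rt}$ with $r = \beta_0$. With the TENURE and log(TENURE) terms present, the sum does not reduce to a single rate times the number of periods, so the unrolled expression is about as compact as the formula gets.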