Can I use the coefficients of a trained logistic regression model as the result itself, without applying the model to unseen data?

data analysis, data mining, machine learning, statistics

I'm trying to figure out whether I can use logistic regression to model the probability of a user's response in a CRM. I have the predictors and I also have the class (detractor, not detractor). The thing is that I don't need to predict the probability of detraction, since I already know who the detractors are: I already have the class and I'm always going to have it. What I was thinking was to train the model, look at the probability given the predictors, and then study the coefficients to understand how each predictor affects that probability. I will receive the data periodically, and it will always include the class. So would it be OK to retrain the model every time we get newly labeled data (since we will make decisions after each fit, the data should change and so should the coefficients), and treat the coefficient values and their influence on the probability as the result, without ever applying the model to unseen data?

Basically I want to know whether this is valid in a statistical sense, and also whether it would be a useful result for the business, since what they want to know is how the independent variables we capture affect the outcome of a client saying that they will not recommend the product.

Thanks so much in advance, guys. Sorry if I'm saying silly things; I'm not an expert in data science yet, just starting.

Best Answer

The way I understand it, you are asking if you can use the parameter estimates of the logistic model directly, without feeding new data into that fitted model and making a direct prediction for that new data that way.

The answer is yes. Indeed, this is what is almost exclusively done in academia (social sciences, in particular economics). If you are interested not so much in a probability forecast for a specific case (where you would feed in new data), but in the general relationship between your input $x$-variables and your binary $y$ outcome variable, then this is the way to go.

In fact, since the prediction for a specific case is a deterministic function of your estimated model, you do not gain any information about the relationship between $x$ and $y$ by feeding new data into an already estimated model.
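
To make the "deterministic function" point concrete, here is the fitted logistic model written out. This is only a sketch in my own notation: $\hat\beta_0, \dots, \hat\beta_k$ for the estimated coefficients and $x_1, \dots, x_k$ for the predictors are assumed symbols, not taken from the question.

$$\hat{P}(y = 1 \mid x) = \frac{1}{1 + \exp\left(-(\hat\beta_0 + \hat\beta_1 x_1 + \dots + \hat\beta_k x_k)\right)}$$

Once the $\hat\beta_j$ are estimated, scoring a new case only evaluates this formula at a new $x$; the coefficients themselves do not change.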

In general, holding the other variables fixed, any variable with a positive coefficient increases the probability of $y=1$, while any variable with a negative coefficient decreases it. The magnitudes of these effects are harder to interpret, however, because the change in probability depends on where on the logistic curve an observation sits. That is why people sometimes compute "average marginal effects" for nonlinear models such as the logistic.
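
If it helps, here is a minimal sketch of what that looks like in practice, using only NumPy. Everything in it is an assumption for illustration: the data are simulated, the predictors `x1` and `x2` and the helper names `fit_logistic` and `average_marginal_effects` are hypothetical, and the model is fitted with a plain Newton iteration rather than any particular library routine. The average marginal effect of a continuous predictor is computed as the sample mean of $p(1-p)$ times its coefficient.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton's method.
    X must already contain a leading column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # logistic variance weights
        grad = X.T @ (y - p)                  # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])         # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def average_marginal_effects(X, beta):
    """Average marginal effect of each continuous predictor:
    the sample mean of p * (1 - p), times the coefficient."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.mean(p * (1.0 - p)) * beta[1:]  # skip the intercept

# Simulated data (assumption: two continuous predictors, known true effects).
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
true_beta = np.array([-1.0, 0.8, -0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logistic(X, y)
odds_ratios = np.exp(beta_hat[1:])            # multiplicative effect on the odds
ame = average_marginal_effects(X, beta_hat)   # average change in P(y = 1)

print("coefficients:", beta_hat)
print("odds ratios:", odds_ratios)
print("average marginal effects:", ame)
```

The signs of the coefficients and of the average marginal effects always agree, but the marginal effects are on the probability scale, which is usually easier to explain to a business audience. For binary (0/1) predictors, a discrete difference in predicted probabilities would be the more appropriate quantity than this derivative-based version.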
