Logistic Regression – What to Call the Output

logisticregression

I'm currently working on a scientific paper and I'm struggling to answer the following question: what is the right term for the output (ranging from 0 to 1) of a logistic regression?

None of these work, since they are already taken: certainty, probability, confidence.

Ideas: rank, probability of occurrence, probability of success..?

Best Answer

It is simply a probability; you can call it a "predicted probability", as others have suggested.

I see from the discussion that you disagree with that name, so let me prove to you that it is a probability.

First, recall that if $X$ is a Bernoulli-distributed random variable with parameter $p$, then $E(X) = p$. Second, take an intercept-only logistic regression model: such a model estimates the mean of your response variable $Y$, which is the same as computing the sample mean directly, $\hat y = \frac{1}{N} \sum_{i=1}^N y_i$. By the law of large numbers this mean converges to the expected value as $N\rightarrow\infty$, i.e. to $E(Y) = p$. In fact, the sample mean is the maximum likelihood estimator of $p$ for a Bernoulli-distributed random variable. In a more complicated logistic regression model you predict conditional means, i.e. conditional probabilities $P(Y = 1 \mid X)$.
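To make the intercept-only case explicit, here is a short derivation sketch. The symbols $\beta_0$ (the intercept), $\ell$ (the log-likelihood) and $\hat p$ (the fitted probability) are notation introduced here for illustration. With $p = 1/(1 + e^{-\beta_0})$, so that $dp/d\beta_0 = p(1-p)$:

$$
\begin{aligned}
\ell(\beta_0) &= \sum_{i=1}^N \left[ y_i \log p + (1 - y_i) \log(1 - p) \right], \\
\frac{d\ell}{d\beta_0} &= \sum_{i=1}^N \left[ \frac{y_i}{p} - \frac{1 - y_i}{1 - p} \right] p(1 - p) = \sum_{i=1}^N (y_i - p), \\
\frac{d\ell}{d\beta_0} = 0 \;&\Longrightarrow\; \hat p = \frac{1}{N} \sum_{i=1}^N y_i .
\end{aligned}
$$

So the fitted value of the intercept-only model is exactly the sample proportion of ones, which estimates $E(Y) = p$.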

See also "Why isn't Logistic Regression called Logistic Classification?"

If this still does not convince you, below is a simple R example showing exactly this:

set.seed(123)
p1 <- 0.75
Y1 <- sample(0:1, 500, replace = TRUE, prob = c(1-p1, p1))

fit1 <- glm(Y1~1, family = "binomial")

p1
## [1] 0.75

fitted(fit1)[1] # only the first one since all predictions are the same
##     1 
## 0.762

mean(Y1)
## [1] 0.762

q <- 0.3
p2 <- c(0.4, 0.7)
X <- sample(0:1, 500, replace = TRUE, prob = c(1-q, q))
Y2 <- numeric(500)
Y2[X==0] <- sample(0:1, sum(X==0), replace = TRUE, prob = c(1-p2[1], p2[1]))
Y2[X==1] <- sample(0:1, sum(X==1), replace = TRUE, prob = c(1-p2[2], p2[2]))

fit2 <- glm(Y2~X, family = "binomial")

# predicted probabilities vs the true ones
table( ifelse(X==0, p2[1], p2[2]), round(fitted(fit2), 3)) 
##      
##      0.359 0.658
##  0.4   348     0
##  0.7     0   152

# empirical conditional probabilities (conditional means)
tapply( Y2, X, mean )
##         0         1 
## 0.3591954 0.6578947
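As a further check, the fitted probabilities can be recovered by hand from the coefficients using the inverse logit, `plogis()`. This is a minimal sketch in base R. The names `x`, `y`, `fit`, `p_hat` and `p_emp` are introduced here for illustration and are not part of the example above, and the seed is arbitrary.

```r
# Illustrative data: binary predictor x, binary outcome y with
# P(y = 1 | x = 0) = 0.4 and P(y = 1 | x = 1) = 0.7 (assumed values)
set.seed(1)
x <- rbinom(500, 1, 0.3)
y <- rbinom(500, 1, ifelse(x == 1, 0.7, 0.4))

fit <- glm(y ~ x, family = binomial)

# inverse logit of the linear predictor for x = 0 and x = 1
b <- coef(fit)
p_hat <- plogis(c(b[1], b[1] + b[2]))

# empirical conditional means of y within each level of x
p_emp <- tapply(y, x, mean)

# the two should agree up to the fitting algorithm's tolerance
all.equal(unname(p_hat), as.vector(p_emp), tolerance = 1e-6)

# predict() on the response scale returns the same probabilities
predict(fit, newdata = data.frame(x = c(0, 1)), type = "response")
```

Because the model with a single binary predictor is saturated, its fitted values coincide with the within-group sample proportions, which is why the output of the model can be read as an estimated conditional probability.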
