Are thresholds for logistic regression models prevalence-specific?

Tags: logistic, prevalence, r, regression, threshold

I wonder if thresholds for logistic regression models are prevalence-specific. I assume that they are; however, I am not sure about the basic statistical principles behind this or how to deal with the implications for clinical practice.

Example:

A hospital wants to deploy a logistic regression model to predict lymph node metastasis in prostate cancer patients. The model is recommended by a specialist society and widely accepted in the medical community.

For model development, a research group used a large dataset where the prevalence of lymph node metastasis was low (15%). They used a lab value (PSA) and age as predictors.
After external validation with data from hospitals with a similar prevalence (15%), decision curve analysis, and a discussion of the benefits and harms of the treatment, the specialist society found a threshold probability of ≥0.10 appropriate for deciding whether a patient needs specific surgery (a medically reasonable balance of true positive and false positive results).

Now the hospital is deploying the model in its surgery consultation-hour (expected prevalence of patients with lymph node metastasis = 30%).

Questions:

  1. Can they deploy the same threshold probability if they want to have similar true positive and false positive results?
  2. If not, how should the model and/or threshold probability be adjusted (to get similar true positive and false positive results)?

What I already found on this topic:

An interesting blog post about prevalence and probability; however, it does not answer my question regarding thresholds.

The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement (W17):

In general, models will be more generalizable when the case mix of the new population is within the case mix range of the development population (186). However, as we describe under item 10e (see also Box C and Table 3), one may adjust or update a previously developed prediction model that is applied in another setting to the local circumstances of the new setting to improve the model transportability.

from W17 Table 3:

Updating Method: Adjustment of the intercept (baseline risk/hazard)

Reason for Updating: Difference in the outcome frequency (prevalence or
incidence) between development and validation sample

Reproducible Example in R:

# libraries
library(tidyverse)
library(rmda)

# train data (prevalence= 15%)
train <- tibble(id=1:1000,
                    class=c(rep(1,150),rep(0,850)))

set.seed(1)
train %>% 
  group_by(id) %>% 
  mutate(
  PSA=case_when(class==1 ~ runif(1,1,100),TRUE ~ runif(1,1,40)),
  Age=case_when(class==1 ~ runif(1,30,80),TRUE  ~runif(1,30,60))) -> d.train
  

# test data same prevalence (15%)
test <- tibble(id=1:1000,
                class=c(rep(1,150),rep(0,850)))

set.seed(23)
test %>% 
  group_by(id) %>% 
  mutate(
    PSA=case_when(class==1 ~ runif(1,1,100),TRUE ~ runif(1,1,50)),
    Age=case_when(class==1 ~ runif(1,30,80),TRUE  ~runif(1,25,60))) -> d.test_same_prev



# test data high prevalence (30%)
test1 <- tibble(id=1:1000,
               class=c(rep(1,300),rep(0,700)))

set.seed(123)
test1 %>% 
  group_by(id) %>% 
  mutate(
    PSA=case_when(class==1 ~ runif(1,1,100),TRUE ~ runif(1,1,50)),
    Age=case_when(class==1 ~ runif(1,30,80),TRUE  ~runif(1,25,60))) -> d.test_higher_prev


# train logistic regression model
glm(class ~ Age+PSA, data=d.train,family = binomial) -> model


# make predictions in cohort with same prevalence
predict(model,d.test_same_prev, type="response") -> preds1
plot(preds1)

# make predictions in cohort with high prevalence
predict(model,d.test_higher_prev, type="response") -> preds2
plot(preds2)



# decision curve analysis same prevalence
d.dca.same <- data.frame(reference=d.test_same_prev$class,predictor=preds1)

dca.same <- decision_curve(reference ~ predictor, d.dca.same, fitted.risk = TRUE, bootstraps = 10)

plot_decision_curve(dca.same,confidence.intervals=FALSE)



# decision curve analysis high prevalence
d.dca.high <- data.frame(reference=d.test_higher_prev$class,predictor=preds2)

dca.high <- decision_curve(reference ~ predictor, d.dca.high, fitted.risk = TRUE, bootstraps = 10)

plot_decision_curve(dca.high,confidence.intervals=FALSE)
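One way to implement the intercept update that TRIPOD Table 3 describes ("recalibration in the large") is to keep the fitted coefficients and re-estimate only the baseline risk in the new cohort, using the original linear predictor as an offset. A sketch continuing the reprex above (it assumes `model` and `d.test_higher_prev` from the code above are in scope):

```r
# Update only the intercept ("recalibration in the large"):
# keep the original coefficients fixed as an offset and
# re-estimate the baseline risk in the higher-prevalence cohort.
lp  <- predict(model, d.test_higher_prev, type = "link")  # original linear predictor
upd <- glm(class ~ 1 + offset(lp), data = d.test_higher_prev, family = binomial)

coef(upd)  # estimated intercept shift (difference in baseline log-odds)

# Updated predicted risks for the new cohort
preds2_updated <- plogis(coef(upd)[1] + lp)
```

This shifts all predicted risks up or down on the log-odds scale without changing the ranking of patients, so discrimination is unchanged while calibration-in-the-large is corrected.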

Created on 2021-08-08 by the reprex package (v2.0.0)

Best Answer

Three intertwined issues need to be disentangled: (1) calibration of a probability model, (2) whether the model should be used to generate a hard probability threshold, and (3) if so, where the threshold should be. Let's take them in reverse order.

(3) If you have a well-calibrated probability model and there is to be a probability threshold, then the choice should be based on the costs and benefits of true and false assignments to each class. This answer explains the choice for the two-class situation, with links to the complications with multi-class models.
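For context on where such a threshold can come from: under the standard decision-theoretic argument, one treats when the expected benefit exceeds the expected harm, $p \cdot B > (1-p) \cdot H$, giving a threshold $p_t = H/(H+B)$. A minimal sketch in R, with hypothetical harm/benefit weights chosen to reproduce the society's ≥0.10 threshold from the question:

```r
# Decision-theoretic threshold: treat when expected benefit exceeds expected harm,
#   p * B > (1 - p) * H   =>   p > H / (H + B)
# Hypothetical weights: missing a nodal metastasis is judged
# 9 times worse than an unnecessary lymph node dissection.
H <- 1   # harm of operating on a node-negative patient (false positive)
B <- 9   # benefit of operating on a node-positive patient (true positive)
p_t <- H / (H + B)
p_t  # 0.10, matching the society's threshold of >= 0.10
```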

The threshold is not part of the logistic regression, although the title of this question seems to imply otherwise. The threshold is chosen based on the intended application's costs and benefits, after the probability model (however devised, it doesn't have to be logistic regression) is in place.

(2) As Frank Harrell said in a comment, "Optimum decisions are independent of prevalence but are completely dependent on the probability of an outcome for an individual person." The probability of an outcome for an individual might depend on clinical considerations outside of what's captured in your probability model.

Furthermore, the cost/benefit tradeoff discussed above might differ among individuals. An 85-year-old with prostate cancer might have less willingness to undergo surgery to search for potentially positive lymph nodes than a 60-year-old. All of that argues against setting firm probability thresholds for individuals based solely on a model.

(1) The heart of this question is thus whether a probability model based on "a large dataset where the prevalence of lymph node metastasis was low (15%)" can be used in a "surgery consultation-hour (expected prevalence of Patients with lymph node metastasis = 30%)." That's a more complicated question about model calibration, in particular whether the logistic-regression intercept should be adjusted for that prevalence difference.

A logistic regression model for probability $p$ of a condition ($D$) as a function of covariates $X$

$$\log \frac {p}{1-p} = \alpha + \beta^T X $$

has an intercept $\alpha$ representing the log-odds of $D$ in the sampled population at a baseline situation when covariate values are 0 (or at reference levels for categorical predictors). (The answer from @Eoin explores the situation when populations differ in baseline prevalence.) The probability of $D$ given $X$ in that same population is:

$$ p(D|X) = \frac {\exp(\alpha + \beta^T X)}{1+\exp(\alpha + \beta^T X)}.$$
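A quick numerical check of this formula with made-up coefficients (the values of `alpha`, `beta`, and `x` below are purely illustrative; `plogis()` is R's built-in inverse logit):

```r
# Hypothetical coefficients and covariate values for illustration
alpha <- -3                        # intercept: log-odds at PSA = 0, Age = 0
beta  <- c(PSA = 0.04, Age = 0.02)
x     <- c(PSA = 25, Age = 65)

lp <- alpha + sum(beta * x)        # linear predictor: alpha + beta^T X
p  <- exp(lp) / (1 + exp(lp))      # probability from the formula above
all.equal(p, plogis(lp))           # TRUE: same quantity via the inverse logit
```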

McCullagh and Nelder show (Section 4.3.3) a situation that might need adjustment of the intercept to take the sampled population into account. A retrospective study might evaluate all cases with $D$ but only a subset of those without the condition ($\bar D$). Then to estimate $p(D|X)$ with the above formula in the entire population, you need to adjust the intercept to $\alpha^*=\alpha + \log(\pi_0/\pi_1)$, where $\pi_0,\pi_1$ are the fractions of non-cases $\bar D$ and cases $D$ sampled, respectively. But they warn:

It is essential here that the sampling proportions depend only on $D$ and not on $X$.

That's probably not the case in your example of positive-node probability in prostate cancer patients evaluated in a "surgery consultation-hour." Those patients were chosen in part because their covariate values $X$ (probably including PSA and age) indicate that they already are at higher risk of nodal spread than the overall population of prostate cancer patients.
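The adjustment itself can be illustrated with simulated data (all numbers below are made up): fit the model on a retrospective sample that keeps every case but only a fraction of the non-cases, then correct the intercept back to the population scale with $\log(\pi_0/\pi_1)$, where $\pi_1$ is the sampling fraction of cases and $\pi_0$ that of non-cases:

```r
# Sketch of the intercept adjustment for outcome-dependent sampling
set.seed(42)
n <- 20000
x <- rnorm(n)
p <- plogis(-2.5 + 0.8 * x)               # true low-prevalence population model
y <- rbinom(n, 1, p)

# Retrospective sample: all cases, but only 20% of non-cases.
# Crucially, sampling depends only on D, not on X.
keep <- y == 1 | runif(n) < 0.20
fit  <- glm(y[keep] ~ x[keep], family = binomial)

pi1 <- 1.00   # fraction of cases sampled
pi0 <- 0.20   # fraction of non-cases sampled
alpha_star <- coef(fit)[1] + log(pi0 / pi1)
alpha_star  # should be close to the true population intercept of -2.5
```

Note that the slope estimate needs no correction; only the intercept absorbs the outcome-dependent sampling.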

If the original probability model was properly calibrated for the overall population of prostate cancer patients (15% node-positive), the question is whether that overall population is adequately representative of your overall prostate cancer population. In part: is the probability of node-positivity at baseline covariate conditions in the original study similar to yours?

Patients discussed in the "surgery consultation-hour" presumably aren't at baseline covariate conditions. They were pre-selected based on suspected higher risk and thus should have higher expected node-positive probability. If the original model is well calibrated with respect to your overall prostate cancer population, there should be no problem applying it to this pre-selected higher-risk subset.
