Can one do GLM with LOESS-transformed variables?

generalized linear model, loess, nonparametric, r

I have a binary response variable and predictors that do not perform well in a GLM with a probit/logit link. Some of the predictors are also correlated with each other. I am considering applying a transformation to the predictors, such as a loess fit in R. Loess is normally applied where the dependent variable is continuous, but my dependent variable is binary.

How can this approach be extended to GLM probit/logit models? I might need a non-parametric transformation of the predictors before feeding them into the GLM. The problem is how to find that non-parametric transform.

Edit 1: Here is an example where loess is applied directly to the binary response and its fitted values are then fed into a GLM, so the procedure has two stages. The AUC jumps from 0.76 to 0.94. I would be glad to learn of any other ways to improve this nonlinear predictor.

# nonlinear transformation ------------------------------------------------
set.seed(102)
a  <- runif(1000)
d  <- ifelse((a-0.3)^2 > 0.03, 1, 0)
d[ sample.int(1000, 50)]  <- 1
d[ sample.int(1000, 50)]  <- 0

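par(mfrow = c(2, 2))

df <- data.frame(a, d)

# Baseline: plain logistic regression on the raw predictor
glmmod <- glm(d ~ a, df, family = binomial(link = "logit"))
plot(a, glmmod$fitted.values)

# Stage 1: loess smooth of the binary response on a
lf <- loess(d ~ a, df, model = TRUE, span = 1)
plot(a, d)
lines(a[order(a)], predict(lf)[order(a)])

# Stage 2: logistic regression on the loess fitted values
df2 <- data.frame(aT = predict(lf), d)
glmmod2 <- glm(d ~ aT, df2, family = binomial(link = "logit"))

plot(a, glmmod2$fitted.values)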

library(ROCR)

# ROC curve and in-sample AUC for the two-stage (loess + GLM) model
pred <- prediction(glmmod2$fitted.values, d)
roc.perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(roc.perf, col = "blue")
auc.perf <- performance(pred, measure = "auc")
auc.perf@y.values[[1]]

# ROC curve and in-sample AUC for the plain GLM
pred <- prediction(glmmod$fitted.values, d)
roc.perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(roc.perf, add = TRUE, col = "red")
auc.perf <- performance(pred, measure = "auc")
auc.perf@y.values[[1]]

Best Answer

You don't use loess to transform variables.

You may be looking for generalized additive models (GAMs), which extend GLMs in the same way that additive models/nonparametric regression (including smoothing splines and local linear or local polynomial regression) extend linear regression.

https://en.wikipedia.org/wiki/Generalized_additive_model
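
For reference, the model class can be written out as below. The notation is an assumption made here for illustration (it does not come from the answer): $Y$ is the binary response, $x_1, \dots, x_p$ are the predictors, $g$ is the link function, and each $f_j$ is a smooth function estimated from the data.

```latex
g\bigl(\mathbb{E}[Y \mid x_1, \dots, x_p]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j),
\qquad
g(\mu) = \log\frac{\mu}{1-\mu} \ \text{(logit)}
\quad\text{or}\quad
g(\mu) = \Phi^{-1}(\mu) \ \text{(probit)}
```

An ordinary GLM is the special case $f_j(x_j) = \beta_j x_j$, so the smooth terms are estimated inside the binomial likelihood rather than in a separate first stage.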

Example in R (picking your code up from df <- ...), using the gam package:

df <- data.frame(a, d)
library(gam)  # assuming you already have the package installed
gammod <- gam(d ~ s(a, 4), df, family = binomial(link = "logit"))  # spline model
plot(a, d)
oa <- order(a)
lines(a[oa], fitted(gammod)[oa], col = 3)

gammod2 <- gam(d ~ lo(a, span = .5), df, family = binomial(link = "logit"))  # loess-like
plot(a, d)
lines(a[order(a)], fitted(gammod2)[order(a)], col = 4)
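
The AUC figures quoted in the question are computed on the same data used for fitting, so a held-out comparison may be more informative. The following is a minimal sketch under several assumptions: it uses mgcv rather than the gam package (mgcv's gam and s mask the gam package's versions if both are attached, so run it in a fresh session), the 70/30 split is arbitrary, and auc() is a hypothetical helper written here with the rank formula instead of ROCR.

```r
# Sketch only: mgcv is an assumption; the answer above uses the gam package.
library(mgcv)

# Simulated data, as in the question
set.seed(102)
n <- 1000
a <- runif(n)
d <- ifelse((a - 0.3)^2 > 0.03, 1, 0)
d[sample.int(n, 50)] <- 1   # label noise
d[sample.int(n, 50)] <- 0
df <- data.frame(a, d)

# Hold out 30% of the rows so AUC is not measured on the training data
test_idx <- sample.int(n, 300)
train <- df[-test_idx, ]
test  <- df[test_idx, ]

# Plain logistic GLM versus a GAM with one smooth term in a
glm_fit <- glm(d ~ a, data = train, family = binomial(link = "logit"))
gam_fit <- gam(d ~ s(a), data = train, family = binomial(link = "logit"),
               method = "REML")

# Hypothetical helper: AUC from the rank (Mann-Whitney) statistic, base R only
auc <- function(score, label) {
  r  <- rank(as.numeric(score))
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_glm <- auc(predict(glm_fit, newdata = test, type = "response"), test$d)
auc_gam <- auc(predict(gam_fit, newdata = test, type = "response"), test$d)
c(glm = auc_glm, gam = auc_gam)
```

With several correlated predictors, the same formula interface should extend to something like d ~ s(x1) + s(x2), where x1 and x2 are placeholder names, though concurvity between the smooths would then be worth checking.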
