Solved – How to simulate artificial data for logistic regression

generalized linear modellogisticrregressionsimulation

I know I'm missing something in my understanding of logistic regression, and would really appreciate any help.

As far as I understand it, the logistic regression assumes that the probability of a '1' outcome given the inputs, is a linear combination of the inputs, passed through an inverse-logistic function. This is exemplified in the following R code:

#create data:
x1 = rnorm(1000)           # some continuous variables 
x2 = rnorm(1000)
z = 1 + 2*x1 + 3*x2        # linear combination with a bias
pr = 1/(1+exp(-z))         # pass through an inv-logit function
y = pr > 0.5               # take as '1' if probability > 0.5

#now feed it to glm:
df = data.frame(y=y,x1=x1,x2=x2)
glm =glm( y~x1+x2,data=df,family="binomial")

and I get the following error message:

Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred

I've worked with R for some time now; enough to know that probably I'm the one to blame..
what is happening here?

Best Answer

No. The response variable $y_i$ is a Bernoulli random variable taking value $1$ with probability $pr(i)$.

> set.seed(666)
> x1 = rnorm(1000)           # some continuous variables 
> x2 = rnorm(1000)
> z = 1 + 2*x1 + 3*x2        # linear combination with a bias
> pr = 1/(1+exp(-z))         # pass through an inv-logit function
> y = rbinom(1000,1,pr)      # bernoulli response variable
> 
> #now feed it to glm:
> df = data.frame(y=y,x1=x1,x2=x2)
> glm( y~x1+x2,data=df,family="binomial")

Call:  glm(formula = y ~ x1 + x2, family = "binomial", data = df)

Coefficients:
(Intercept)           x1           x2  
     0.9915       2.2731       3.1853  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1355 
Residual Deviance: 582.9        AIC: 588.9