# Solved – R regression output – Factors vs numeric variables


Let's say I have the following logistic regression models:

 df = data.frame(income = c(5, 5, 3, 3, 6, 5),
                 won    = c(0, 0, 1, 1, 1, 0),
                 age    = c(18, 18, 23, 50, 19, 39),
                 home   = c(0, 0, 1, 0, 0, 1))

> md1 = glm(factor(won) ~ income + age + home,
+           family = binomial(link = "logit"), data = df)
> md2 = glm(factor(won) ~ factor(income) + factor(age) + factor(home),
+           family = binomial(link = "logit"), data = df)
> summary(md1)

Call:
glm(formula = factor(won) ~ income + age + home, family = binomial(link = "logit"),
data = df)

Deviance Residuals:
1        2        3        4        5        6
-1.0845  -1.0845   0.8017   0.4901   1.7298  -0.8017

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.784832   6.326264   0.756    0.449
income      -1.027049   1.056031  -0.973    0.331
age          0.007102   0.097759   0.073    0.942
home        -0.896802   2.252894  -0.398    0.691

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 8.3178  on 5  degrees of freedom
Residual deviance: 6.8700  on 2  degrees of freedom
AIC: 14.87

Number of Fisher Scoring iterations: 4

> summary(md2)

Call:
glm(formula = factor(won) ~ factor(income) + factor(age) + factor(home),
family = binomial(link = "logit"), data = df)

Deviance Residuals:
1           2           3           4           5           6
-6.547e-06  -6.547e-06   6.547e-06   6.547e-06   6.547e-06  -6.547e-06

Coefficients: (3 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept)      2.457e+01  1.310e+05       0        1
factor(income)5 -4.913e+01  1.605e+05       0        1
factor(income)6 -2.573e-30  1.853e+05       0        1
factor(age)19           NA         NA      NA       NA
factor(age)23   -1.383e-30  1.853e+05       0        1
factor(age)39   -3.479e-14  1.605e+05       0        1
factor(age)50           NA         NA      NA       NA
factor(home)1           NA         NA      NA       NA

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 8.3178e+00  on 5  degrees of freedom
Residual deviance: 2.5720e-10  on 1  degrees of freedom
AIC: 10


So depending on the class of the predictors (numeric versus factor), R produces different output. For factor predictors, R splits the coefficients into separate rows, one for each level, but it does not do so for the model with numeric predictors. I'm wondering about a couple of things.
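For background on what R is doing here, `model.matrix()` shows the design matrix that `glm()` actually fits. A small sketch (the `gender` variable below is hypothetical, just for illustration): with R's default treatment coding, a factor's first level becomes the reference and each remaining level gets its own 0/1 indicator column.

```r
# A factor with two levels; "Female" sorts first, so it becomes the reference level.
gender <- factor(c("Female", "Male", "Male", "Female"))

# The design matrix R builds under treatment (dummy) coding:
mm <- model.matrix(~ gender)

colnames(mm)        # "(Intercept)" "genderMale"
mm[, "genderMale"]  # 0 1 1 0 -- an indicator for Gender == "Male"
```

This is why each non-reference level gets its own coefficient row in the summary: each row corresponds to one indicator column of the design matrix.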

1. Is it ever useful to have the levels of a factor predictor expressed as individual coefficient rows?

2. To express the general regression equation, how does one go from a model where each level has its own coefficient to an equation with a single B_i? For example, if gender has two coefficients, 3.5 for Male and 2.3 for Female, how does one use those in an equation such as the following (besides converting them into numeric values)?

Y = B0 + B1 (Gender)
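One way to reconcile the two forms, sketched with the hypothetical gender numbers above and assuming Female is taken as the reference level: under R's default coding, Gender enters the equation as a 0/1 indicator, the reference level's effect folds into B0, and the single B1 is the difference between the two level effects.

```r
# Hypothetical effects from the question: 3.5 for Male, 2.3 for Female.
b_female <- 2.3
b_male   <- 3.5

b0 <- b_female            # reference level (Female) is absorbed into the intercept
b1 <- b_male - b_female   # single slope on the indicator I(Gender == "Male")

# Y = B0 + B1 * I(Gender == "Male")
pred <- function(is_male) b0 + b1 * is_male
pred(1)  # 3.5, the Male effect
pred(0)  # 2.3, the Female effect
```

So the "two coefficients" and the "single B1" representations describe the same fit; they differ only in which level is folded into the intercept.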