I have a data set of employees who have worked at a company throughout 2017. The data set is broken out month by month; someone can work all 12 months of the year (12 rows) and someone can have worked for 3 months before leaving/getting terminated (3 rows).
I wanted to see if there was a relationship between termination (1 for terminated; 0 for still at the company) and location (1 for CA; 0 for US), but because this is panel data, someone suggested that I have to use the plm package.
I tried fitting a basic model like so:
fixed <- plm(Term ~ CANADA, data = df, index = c("Month"), model = "within")
summary(fixed)
These were my results:
Unbalanced Panel: n = 12, T = 116-207, N = 1972
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-0.37179 -0.24446 -0.17718 -0.04075 0.95925
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
CANADA -0.084064 0.018344 -4.5826 4.882e-06 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 320.25
Residual Sum of Squares: 316.85
R-Squared: 0.010606
Adj. R-Squared: 0.0045455
F-statistic: 21 on 1 and 1959 DF, p-value: 4.8822e-06
This is my first time using the plm package, but based on the p-value, we reject the null hypothesis: there is a significant association between where you work (CA vs. US) and whether or not you leave.
Have I done this right? I ask especially to make sure I have chosen the correct model. Just looking at the data, American workers have only a slightly higher termination rate than Canadian workers (mostly because there's a smaller total).
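For reference, the raw comparison described above could be computed along these lines. This is only a sketch on made-up toy data: the column name EmployeeID is an assumption (the question does not name the employee identifier), and the data frame df_toy is purely illustrative. It collapses the person-month rows to one row per employee before taking the share terminated in each group.

```r
# Toy person-month data; EmployeeID and the values below are hypothetical,
# chosen only to illustrate the layout described in the question.
df_toy <- data.frame(
  EmployeeID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  Month      = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2),
  CANADA     = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  Term       = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Collapse to one row per employee: terminated if any monthly row has Term == 1.
per_person <- aggregate(Term ~ EmployeeID + CANADA, data = df_toy, FUN = max)

# Share of employees terminated in each group (0 = US, 1 = CA).
rate_by_group <- tapply(per_person$Term, per_person$CANADA, mean)
print(rate_by_group)
```

Comparing per-employee shares rather than per-row means avoids weighting people by how many months they appear in the data.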
EDIT:
I've been informed that I should have used pglm.
I've tried something like this:
fixed <- pglm(Term ~ CANADA, data = df, na.action = na.omit, family = binomial(link = "logit"), index = "Fiscal.Period")
These are the results:
Estimates:
Estimate Std. error t value Pr(> t)
(Intercept) -1.0153 0.1458 -6.962 3.36e-12 ***
CANADA -0.5200 0.1124 -4.625 3.74e-06 ***
sigma 0.6048 0.1620 3.733 0.000189 ***
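To read the CANADA coefficient on the odds scale, one option is to exponentiate it. The sketch below uses only the estimate and standard error printed above; the Wald interval is a normal approximation, and because the model includes a random effect (the sigma row), the odds ratio is conditional on that effect rather than a population average.

```r
# Values copied from the pglm output above.
beta <- -0.5200   # CANADA coefficient (log-odds scale)
se   <- 0.1124    # its standard error

# Odds ratio for CA relative to US.
odds_ratio <- exp(beta)

# Approximate 95% Wald interval, computed on the log-odds scale
# and then transformed to the odds scale.
ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

print(round(odds_ratio, 3))
print(round(ci, 3))
```

The odds ratio comes out at roughly 0.59, i.e. the odds of termination for CA are estimated to be about 40% lower than for US under this model.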
Best Answer
I would suggest using a logit panel model instead, since that constrains the predicted probabilities to fall between 0 and 1, which a linear model on a 0/1 outcome does not. For this, you need the function pglm(), which requires installing the pglm package first. Off the top of my head, I think you need to specify the option family = "binomial" in order to get a logit model, but double check this in the documentation. Let me know if you have any more questions.
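A minimal sketch of such a call follows, under some assumptions. The column EmployeeID is hypothetical; the question never names an employee identifier. In the plm output above, n = 12 suggests that the month was treated as the panel unit, so this sketch passes the individual first and the time period second. The model = "random" choice is also an assumption, made because the pglm output in the question reports a sigma term; check the pglm documentation for which models are available for the binomial family.

```r
# Sketch only: requires the pglm package (install.packages("pglm")).
# EmployeeID is a hypothetical employee identifier; Term, CANADA and
# Fiscal.Period are the column names used in the question.
fit_re_logit <- function(df) {
  pglm::pglm(
    Term ~ CANADA,
    data   = df,
    family = binomial(link = "logit"),         # logit link for the 0/1 outcome
    model  = "random",                         # random effect per employee
    index  = c("EmployeeID", "Fiscal.Period")  # individual first, then time
  )
}

# Usage, assuming df has the columns above:
# fit <- fit_re_logit(df)
# summary(fit)
```

Since CANADA does not change within an employee, a within (fixed effects) estimator on employees could not identify its coefficient, which is one reason a random effects or pooled specification is the natural starting point here.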