Solved – Using PLM in R for Panel Data

I have a data set of employees who worked at a company during 2017. The data set is broken out month by month: an employee who worked all 12 months of the year has 12 rows, while one who worked 3 months before leaving or being terminated has 3 rows.
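A minimal sketch of the long layout described above (one row per employee per month), built from a made-up toy data set. The column `EmployeeID` is a hypothetical name; only `Term`, `CANADA`, and `Month` appear in the question. Coding `Term` as 1 only in the month of termination is also an assumption.

```r
# Hypothetical toy panel: one row per employee per month worked in 2017.
# Employee "A" stays all year; employee "B" leaves after 3 months.
toy <- data.frame(
  EmployeeID = c(rep("A", 12), rep("B", 3)),  # hypothetical ID column
  Month      = c(1:12, 1:3),
  CANADA     = c(rep(1, 12), rep(0, 3)),      # 1 = CA, 0 = US
  Term       = c(rep(0, 12), 0, 0, 1)         # assumed: 1 only in B's last month
)

# Rows per employee: 12 for A, 3 for B
table(toy$EmployeeID)
```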

I wanted to see if there was a relationship between termination (1 for terminated; 0 for still at the company) and department (1 for CA; 0 for US), but because this is panel data, someone suggested that I use the plm package.

I tried a basic fixed-effects (within) model like so:

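library(plm)
# The first element of index is the individual (unit) dimension, so each Month is treated as a unit here
fixed <- plm(Term ~ CANADA, data = df, index = c("Month"), model = "within")
summary(fixed)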
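To see what `model = "within"` does when months are the panel units, here is a base-R sketch on simulated data (the numbers are made up, not the question's data). The within estimator subtracts each month's mean from `Term` and `CANADA` and then runs least squares on the 0/1 outcome (a linear probability model). That gives the same slope as an ordinary regression with one dummy per month. This sketch only reproduces the slope; plm's standard errors additionally account for the estimated month means.

```r
set.seed(1)
# Simulated stand-in for df: 12 months, 50 employee rows per month (made-up data)
sim <- data.frame(Month = rep(1:12, each = 50))
sim$CANADA <- rbinom(nrow(sim), 1, 0.4)
sim$Term   <- rbinom(nrow(sim), 1, 0.3 - 0.08 * sim$CANADA)

# Within transformation: demean both variables by the index group (Month)
demean <- function(x, g) x - ave(x, g)
y_w <- demean(sim$Term, sim$Month)
x_w <- demean(sim$CANADA, sim$Month)
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

# Same slope from least squares with a dummy for every month
b_dummy <- coef(lm(Term ~ CANADA + factor(Month), data = sim))[["CANADA"]]

c(within = b_within, dummies = b_dummy)
```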

These were my results:

Unbalanced Panel: n = 12, T = 116-207, N = 1972
Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-0.37179 -0.24446 -0.17718 -0.04075  0.95925 

Coefficients:
        Estimate Std. Error t-value  Pr(>|t|)    
CANADA -0.084064   0.018344 -4.5826 4.882e-06 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    320.25
Residual Sum of Squares: 316.85
R-Squared:      0.010606
Adj. R-Squared: 0.0045455
F-statistic: 21 on 1 and 1959 DF, p-value: 4.8822e-06
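The printed numbers hang together, which is a quick way to read this output. The t-value is the estimate divided by its standard error. With a single regressor the F-statistic is t squared. The residual degrees of freedom are N - n - k: observations, minus the 12 month effects, minus one slope. A short R check using only the values printed above:

```r
# Values copied from the plm summary above
est <- -0.084064; se <- 0.018344
N <- 1972; n <- 12; k <- 1

t_val  <- est / se                     # matches the printed t-value, about -4.58
F_stat <- t_val^2                      # one restriction, so F = t^2 (printed as 21)
df_res <- N - n - k                    # 1972 - 12 - 1 = 1959, as in "1 and 1959 DF"
p_val  <- 2 * pt(-abs(t_val), df_res)  # two-sided p-value, printed as 4.882e-06
r2     <- 1 - 316.85 / 320.25          # 1 - RSS/TSS; near the printed 0.0106 (sums are rounded)

round(c(t = t_val, F = F_stat, df = df_res, p = p_val, R2 = r2), 6)
```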

This is my first time using the plm package, but based on the p-value, we reject the null hypothesis: there is a significant relationship between where you work (CA vs. US) and whether or not you leave.

Have I done this right? In particular, I want to make sure I have chosen the correct model function, because just looking at the data, American workers have only a slightly higher termination rate than Canadian workers (mostly because there's a smaller total).
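The "just looking at the data" comparison can be made explicit before fitting any model. The sketch below uses a made-up toy data frame in place of the question's `df`. It assumes one row per employee-month with `Term` and `CANADA`, plus a hypothetical `EmployeeID` column. It computes two different rates: the share of employee-months with `Term == 1` (what a row-level model sees), and the share of employees ever terminated. They can disagree because employees who leave contribute fewer rows. In this toy example the per-employee rates are identical while the per-row rates are not.

```r
# Made-up toy data; EmployeeID is a hypothetical column name
toy <- data.frame(
  EmployeeID = c(rep("A", 12), rep("B", 3), rep("C", 6), rep("D", 12)),
  CANADA     = c(rep(1, 12),  rep(1, 3),  rep(0, 6),  rep(0, 12)),
  Term       = c(rep(0, 12),  0, 0, 1,    rep(0, 5), 1, rep(0, 12))
)

# Rate per employee-month (row level)
row_rate <- tapply(toy$Term, toy$CANADA, mean)

# Rate per employee: was the person ever terminated?
per_emp  <- aggregate(Term ~ EmployeeID + CANADA, data = toy, FUN = max)
emp_rate <- tapply(per_emp$Term, per_emp$CANADA, mean)

rbind(row_rate, emp_rate)
```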

EDIT:

I've been informed that I should have used pglm.

I've tried something like this:

library(pglm)
fixed <- pglm(Term ~ CANADA, data = df, na.action = na.omit, family = binomial(link = "logit"), index = "Fiscal.Period")

These are the results:

Estimates:
            Estimate Std. error t value  Pr(> t)    
(Intercept)  -1.0153     0.1458  -6.962 3.36e-12 ***
CANADA       -0.5200     0.1124  -4.625 3.74e-06 ***   
sigma         0.6048     0.1620   3.733 0.000189 ***
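The logit coefficients are on the log-odds scale, so exponentiating them is one common way to read them. The sketch below uses only the `CANADA` estimate and standard error printed above. The 95% interval uses a normal approximation, which is my assumption rather than something pglm prints. Because the output includes a `sigma` term, this looks like a random-effects logit, so the ratio should be read as conditional on the random effect rather than as a population-average effect.

```r
# CANADA estimate and standard error copied from the pglm output above
b  <- -0.5200
se <- 0.1124

odds_ratio <- exp(b)                                  # about 0.59
ci_95      <- exp(b + c(-1, 1) * qnorm(0.975) * se)   # normal-approximation 95% CI

round(c(OR = odds_ratio, lower = ci_95[1], upper = ci_95[2]), 3)
```

Read this way, CA employee-months have roughly 40% lower odds of `Term == 1` than US ones, holding the random effect fixed.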

Best Answer

I would suggest using a logit panel model instead, since that would constrain the predicted values of your outcome variable to fall between 0 and 1. For this, you need the function pglm(), which requires installing the pglm package first. Off the top of my head, I think you need to specify the option family = "binomial" to get a logit model, but double-check this in the documentation. Let me know if you have any more questions.
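The point about constraining predictions can be seen by passing the linear predictor through the logistic function, which maps any real number into the interval between 0 and 1. The sketch below plugs in the intercept and `CANADA` estimates from the pglm output in the question. For illustration, it assumes the random effect is set to zero. `plogis()` is base R's logistic distribution function.

```r
# Intercept and CANADA coefficient from the pglm output in the question
b0 <- -1.0153
b1 <- -0.5200

# Linear predictor for US (CANADA = 0) and CA (CANADA = 1), random effect assumed 0
eta <- b0 + b1 * c(0, 1)

# The inverse logit turns each linear predictor into a probability
p <- plogis(eta)
round(p, 3)

# Large linear predictors approach 0 or 1 but stay inside the interval
plogis(c(-10, 0, 10))
```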
