R Regression – A Comprehensive Guide on Censored Regression in R

rregressiontobit-regression

I have a Tobit model of the form:
\begin{align}
Y^*_i &= X_i\beta + \epsilon_i \\
Y_i &= \max(Y^*_i,0).
\end{align}

The regressors are one continuous variable and 30 factors (modelling seasonal effects during the day). In my sample there are among $3116$ observations $194$ left-censored and $2922$ uncensored.

I tried the R packages AER and censReg. With tobit() in AER I get an error message:

Error in solve.default(L %*% V %*% t(L)) : 
Lapack routine dgesv: system is exactly singular: U[2,2] = 0

Thus a first question: do I have too many factors? However AER has a predict function. Given the data $Y_i$ (non-negative) I would like to make a prediction of the censored data – $E[Y|Y>0,X_i]$ – is this what the predict function of AER gives me?

How can I predict in the package censReg?

EDIT 1) as Achim Zeileis said, there is a data problem with my factors, they are 0 or NAN too often and the tobit algorithm cannot find a solution. I have to fix the data.

EDIT 2) Please look at the following example:

library(AER)
N = 10
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
set.seed(100) 
x = rnorm(8*N)+1
beta = 5
epsilon = rnorm(8*N,sd = sqrt(1/5))
y.star = x*beta+fcoeff+epsilon ## latent response
y = y.star 
y[y<0] <- 0 ## censored response

fit <- tobit(y~0+x+f)
summary(fit)

coef(fit) # very satisfying estimates

my.range = range(y, y.star, predict(fit))

plot(y, ylim=my.range)
lines(predict(fit), col="red")
lines(y.star, col="blue")

As Achim Zeileis writes predict() predicts $Y^*$ but how can we predict $Y$ efficiently and in an unbiased fashion?

Best Answer

The predict() and fitted() methods for tobit objects compute the estimates for the latent mean $\mu = E[y^*] = x^\top \beta$. Additionally, the scale parameter $\sigma$ is available as $scale in the objects:

mu <- fitted(fit)
sigma <- fit$scale

The probability of a non-zero observation is then $P(y > 0) = \Phi(\mu/\sigma)$, i.e.:

p0 <- pnorm(mu/sigma)

The conditional expectation of the censored $y$ given that it is non-zero is $E[y | y > 0] = \mu + \sigma \cdot \lambda(\mu/\sigma)$, where $\lambda(\cdot)$ is the inverse Mills ratio $\lambda(x) = \phi(x)/\Phi(x)$:

lambda <- function(x) dnorm(x)/pnorm(x)
ey0 <- mu + sigma * lambda(mu/sigma)

Finally, the unconditional expectation is $E[y] = P(y > 0) \cdot E[y | y > 0]$, i.e.:

ey <- p0 * ey0

If you want to visualize everything together in a time series style plot:

plot(y, ylim = my.range)
lines(mu, col = "slategray")
lines(y.star, col = "black")
lines(ey0, col = "green")
lines(ey, col = "blue")

The reason that the predict() method for tobit objects does not provide all of this automatically is that for all the distributions other than the normal / Gaussian, the relationship is not that easy. But maybe we should at least support the normal case.

Related Question