I have a Tobit model of the form:
\begin{align}
Y^*_i &= X_i\beta + \epsilon_i \\
Y_i &= \max(Y^*_i,0).
\end{align}
The regressors are one continuous variable and 30 factors (modelling seasonal effects during the day). In my sample there are among $3116$ observations $194$ left-censored and $2922$ uncensored.
I tried the R packages AER
and censReg
. With tobit()
in AER
I get an error message:
Error in solve.default(L %*% V %*% t(L)) :
Lapack routine dgesv: system is exactly singular: U[2,2] = 0
Thus a first question: do I have too many factors? However AER
has a predict function. Given the data $Y_i$ (non-negative) I would like to make a prediction of the censored data – $E[Y|Y>0,X_i]$ – is this what the predict function of AER
gives me?
How can I predict in the package censReg
?
EDIT 1) as Achim Zeileis said, there is a data problem with my factors, they are 0 or NAN too often and the tobit
algorithm cannot find a solution. I have to fix the data.
EDIT 2) Please look at the following example:
library(AER)
N = 10
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
set.seed(100)
x = rnorm(8*N)+1
beta = 5
epsilon = rnorm(8*N,sd = sqrt(1/5))
y.star = x*beta+fcoeff+epsilon ## latent response
y = y.star
y[y<0] <- 0 ## censored response
fit <- tobit(y~0+x+f)
summary(fit)
coef(fit) # very satisfying estimates
my.range = range(y, y.star, predict(fit))
plot(y, ylim=my.range)
lines(predict(fit), col="red")
lines(y.star, col="blue")
As Achim Zeileis writes predict()
predicts $Y^*$ but how can we predict $Y$ efficiently and in an unbiased fashion?
Best Answer
The
predict()
andfitted()
methods fortobit
objects compute the estimates for the latent mean $\mu = E[y^*] = x^\top \beta$. Additionally, the scale parameter $\sigma$ is available as$scale
in the objects:The probability of a non-zero observation is then $P(y > 0) = \Phi(\mu/\sigma)$, i.e.:
The conditional expectation of the censored $y$ given that it is non-zero is $E[y | y > 0] = \mu + \sigma \cdot \lambda(\mu/\sigma)$, where $\lambda(\cdot)$ is the inverse Mills ratio $\lambda(x) = \phi(x)/\Phi(x)$:
Finally, the unconditional expectation is $E[y] = P(y > 0) \cdot E[y | y > 0]$, i.e.:
If you want to visualize everything together in a time series style plot:
The reason that the
predict()
method fortobit
objects does not provide all of this automatically is that for all the distributions other than the normal / Gaussian, the relationship is not that easy. But maybe we should at least support the normal case.