Tobit Regression – How to Explain a Censored Regression Model


I'm wondering how to interpret logSigma and its p-value in a censored regression model:

library(censReg)
data("Affairs", package = "AER")
estResult <- censReg(affairs ~ age + yearsmarried + religiousness +
                       occupation + rating, data = Affairs)
summary(estResult)

Call:
censReg(formula = affairs ~ age + yearsmarried + religiousness + 
    occupation + rating, data = Affairs)

Observations:
         Total  Left-censored     Uncensored Right-censored 
           601            451            150              0 

Coefficients:
              Estimate Std. error t value  Pr(> t)    
(Intercept)    8.17420    2.74145   2.982  0.00287 ** 
age           -0.17933    0.07909  -2.267  0.02337 *  
yearsmarried   0.55414    0.13452   4.119 3.80e-05 ***
religiousness -1.68622    0.40375  -4.176 2.96e-05 ***
occupation     0.32605    0.25442   1.282  0.20001    
rating        -2.28497    0.40783  -5.603 2.11e-08 ***
logSigma       2.10986    0.06710  31.444  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Newton-Raphson maximisation, 7 iterations
Return code 1: gradient close to zero
Log-likelihood: -705.6 on 7 Df

My question is how to interpret logSigma and its significant p-value in the above model.

Best Answer

By default, censReg reports the estimated standard deviation of the residuals ($\sigma$) as $\ln(\sigma)$, since that is the parameter over which the Tobit log-likelihood is maximized; working on the log scale keeps $\sigma$ positive without having to constrain the optimizer. If you use coef(estResult, logSigma = FALSE), you will get $\sigma$ instead; alternatively, you can exponentiate $\ln(\sigma)$ yourself and use the delta method for its variance. $\sigma$ is analogous to the square root of the residual variance in OLS regression, so it can be compared to the standard deviation of affairs: if it is much smaller, you may have a reasonably good model. You will also need $\sigma$ to construct some of the marginal effects.
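To see why the optimizer works with $\ln \sigma$, here is a minimal Python sketch of a Tobit log-likelihood left-censored at zero, written as a function of $\ln \sigma$ rather than $\sigma$. It is an illustration under assumptions (plain Python lists, no numerical safeguards, hypothetical function names such as tobit_loglik), not the code censReg itself uses: any real value of log_sigma maps to a positive $\sigma$, so a Newton-Raphson search can move freely without ever proposing an invalid $\sigma \le 0$.

```python
import math

def norm_logpdf(z):
    # log of the standard normal density
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_logcdf(z):
    # log Phi(z) via erfc; fine for moderate z, underflows for very negative z
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def tobit_loglik(beta, log_sigma, X, y, left=0.0):
    """Hypothetical helper: Tobit log-likelihood, left-censored at `left`,
    parametrized by ln(sigma) instead of sigma."""
    sigma = math.exp(log_sigma)  # every real log_sigma gives sigma > 0
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, xi))
        if yi <= left:
            # censored observation: P(y* <= left) = Phi((left - xb) / sigma)
            ll += norm_logcdf((left - xb) / sigma)
        else:
            # uncensored observation: log of phi((y - xb) / sigma) / sigma
            ll += norm_logpdf((yi - xb) / sigma) - log_sigma
    return ll

# Toy usage with made-up (hypothetical) data: intercept plus one regressor
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [1.2, 0.0, 3.0]
print(tobit_loglik([0.1, 1.0], 0.0, X, y))
```

Because the function takes log_sigma directly, the same unconstrained parameter vector (the betas plus log_sigma) is what a generic maximizer would update, and $\sigma$ is only recovered afterwards by exponentiating.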

I don't think the hypothesis test about $\ln \sigma$ and the corresponding p-value have a clear interpretation. The other coefficients, by contrast, can be interpreted as marginal effects on the latent (uncensored) outcome, so a p-value for the null that the marginal effect is zero makes sense for them. I believe censReg is simply treating $\ln \sigma$ as one more parameter and reporting the same test for it as for the coefficients.
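As a sketch of where $\sigma$ enters the marginal effects: with censoring from below at zero, $\beta_j$ is the marginal effect on the latent outcome $y^*$, while the marginal effect on the expected observed outcome $E[y \mid x] = \Phi(z)\,x'\beta + \sigma\,\phi(z)$, with $z = x'\beta/\sigma$, is $\beta_j\,\Phi(x'\beta/\sigma)$ (the McDonald-Moffitt decomposition). The minimal Python sketch below assumes a hypothetical value of the linear index $x'\beta$ for one respondent; in practice you would compute it from the covariates and all estimated coefficients.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_observed(xb, sigma):
    """E[y | x] when y = max(0, y*) and y* ~ N(xb, sigma^2)."""
    z = xb / sigma
    return std_normal_cdf(z) * xb + sigma * std_normal_pdf(z)

def me_observed(beta_j, xb, sigma):
    """dE[y | x]/dx_j = beta_j * Phi(xb / sigma); this needs sigma, not ln(sigma)."""
    return beta_j * std_normal_cdf(xb / sigma)

sigma = 8.24708         # /sigma from the Stata output, i.e. exp(logSigma)
beta_rating = -2.28497  # coefficient on rating (ratemarr in Stata)
xb_hyp = -3.0           # HYPOTHETICAL linear index x'beta for one respondent

print("ME on latent y*:  ", beta_rating)
print("ME on E[y | x]:   ", me_observed(beta_rating, xb_hyp, sigma))
```

The observed-outcome effect is the latent coefficient scaled down by the probability of being uncensored, which is why $\sigma$ (rather than $\ln \sigma$) is the quantity you actually need.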

Here's my replication of your analysis in Stata (where I am also treating the categorical variables as continuous) confirming what I wrote above.

First we load the affairs data:

. ssc install bcuse
checking bcuse consistency and verifying not already installed...
all files already exist and are up to date.

. bcuse affairs

Contains data from http://fmwww.bc.edu/ec-p/data/wooldridge/affairs.dta
  obs:           601                          
 vars:            19                          22 May 2002 11:49
 size:        15,626                          
-------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------------------------
id              int     %9.0g                 identifier
male            byte    %9.0g                 =1 if male
age             float   %9.0g                 in years
yrsmarr         float   %9.0g                 years married
kids            byte    %9.0g                 =1 if have kids
relig           byte    %9.0g                 5 = very relig., 4 = somewhat, 3 = slightly, 2 = not at
                                                all, 1 = anti
educ            byte    %9.0g                 years schooling
occup           byte    %9.0g                 occupation, reverse Hollingshead scale
ratemarr        byte    %9.0g                 5 = vry hap marr, 4 = hap than avg, 3 = avg, 2 = smewht
                                                unhap, 1 = vry unhap
naffairs        byte    %9.0g                 number of affairs within last year
affair          byte    %9.0g                 =1 if had at least one affair
vryhap          byte    %9.0g                 ratemarr == 5
hapavg          byte    %9.0g                 ratemarr == 4
avgmarr         byte    %9.0g                 ratemarr == 3
unhap           byte    %9.0g                 ratemarr == 2
vryrel          byte    %9.0g                 relig == 5
smerel          byte    %9.0g                 relig == 4
slghtrel        byte    %9.0g                 relig == 3
notrel          byte    %9.0g                 relig == 2
-------------------------------------------------------------------------------------------------------
Sorted by:  id

Here's the Stata equivalent of your censReg call:

. tobit naffair age yrsmarr relig occup ratemarr , ll(0)

Tobit regression                                  Number of obs   =        601
                                                  LR chi2(5)      =      78.32
                                                  Prob > chi2     =     0.0000
Log likelihood = -705.57622                       Pseudo R2       =     0.0526

------------------------------------------------------------------------------
    naffairs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.1793326   .0790928    -2.27   0.024    -.3346672    -.023998
     yrsmarr |   .5541418   .1345172     4.12   0.000     .2899564    .8183273
       relig |   -1.68622   .4037495    -4.18   0.000    -2.479165   -.8932758
       occup |   .3260532   .2544235     1.28   0.201    -.1736224    .8257289
    ratemarr |  -2.284973   .4078258    -5.60   0.000    -3.085923   -1.484022
       _cons |   8.174197   2.741432     2.98   0.003     2.790155    13.55824
-------------+----------------------------------------------------------------
      /sigma |    8.24708   .5533582                      7.160311    9.333849
------------------------------------------------------------------------------
  Obs. summary:        451  left-censored observations at naffairs<=0
                       150     uncensored observations
                         0 right-censored observations

Stata reports $\sigma$ rather than $\ln \sigma$, but we can take logs too:

. nlcom logSigma: ln(_b[/sigma])

    logSigma:  ln(_b[/sigma])

------------------------------------------------------------------------------
    naffairs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    logSigma |   2.109859   .0670975    31.44   0.000     1.978351    2.241368

Note that this matches your R output. The z statistic and the p-value test the null hypothesis that the log of the residual standard deviation is zero, i.e. that $\sigma = 1$, which is definitely not the case here and is rarely a hypothesis of substantive interest.
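The numbers in the two tables can be reproduced by hand with the delta method, which is what nlcom applies to ln(_b[/sigma]): for $g(\sigma) = \ln \sigma$, $\mathrm{se}(\ln\hat\sigma) \approx \mathrm{se}(\hat\sigma)/\hat\sigma$, and in the other direction $\mathrm{se}(\hat\sigma) \approx e^{\ln\hat\sigma}\,\mathrm{se}(\ln\hat\sigma)$. A minimal Python sketch using the rounded estimates copied from the output above; the z statistic is for $H_0\colon \ln\sigma = 0$, i.e. $\sigma = 1$.

```python
import math

# Rounded estimates copied from the Stata /sigma row and the R logSigma row
sigma, se_sigma = 8.24708, 0.5533582
log_sigma, se_log_sigma = 2.10986, 0.06710

# Delta method with g(s) = ln(s): se(ln sigma) ~= se(sigma) / sigma
se_log_from_sigma = se_sigma / sigma

# Delta method with g(l) = exp(l): se(sigma) ~= exp(l) * se(ln sigma)
sigma_from_log = math.exp(log_sigma)
se_sigma_from_log = sigma_from_log * se_log_sigma

# Wald test of H0: ln(sigma) = 0, i.e. sigma = 1, with a two-sided normal p-value
z = log_sigma / se_log_sigma
p = math.erfc(abs(z) / math.sqrt(2.0))

print(f"se(ln sigma) from Stata's sigma: {se_log_from_sigma:.7f}")
print(f"sigma and se(sigma) from R's logSigma: {sigma_from_log:.5f}, {se_sigma_from_log:.5f}")
print(f"z = {z:.3f}, p = {p:.3g}")
```

Small discrepancies in the last digits come only from the rounding of the printed estimates.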

Here are the summary stats for the outcome for comparison to $\sigma$:

. sum naffairs        
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
        naffairs |       601    1.455907    3.298758          0         12

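In this case, the model looks pretty bad: $\sigma \approx 8.25$ is about two and a half times the standard deviation of naffairs ($\approx 3.30$) rather than much smaller. That is often the case with Tobit models, especially "toy" ones meant to illustrate syntax.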
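The comparison above is simple arithmetic; a tiny Python sketch with the two numbers copied from the Stata output:

```python
sigma = 8.24708   # /sigma from the Stata tobit output (exp(logSigma) in R)
sd_y = 3.298758   # Std. Dev. of naffairs from `sum naffairs`

ratio = sigma / sd_y
print(f"sigma / sd(naffairs) = {ratio:.2f}")
```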
