Tobit model explanation

tobit-regression

We have 100 participants in two groups, $n=50$ in each group. We used an assessment of basic functioning at 4 time points. The assessment comprises 6 questions, each scored 0 to 5. We do not have individual scores for each question, only total scores, which range from 0 to 30. Higher scores indicate better functioning. The problem is that the assessment is very basic and has a substantial ceiling effect, so the results are strongly negatively skewed: the majority of participants scored close to 30, especially at the 3 follow-up time points. It is likely that not all of the participants who scored at the upper limit are truly equal in ability. Some were only just able to score 30, while others scored 30 with ease and would have scored much higher if that were possible, so the data are censored from above.
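To make the ceiling effect concrete, here is a small simulation sketch in Python. The latent abilities are made-up values (mean 28, SD 4), not the study's data; the point is only that everyone above 30 is recorded as exactly 30, so people of quite different ability end up with the same score, and the average recorded score understates the average ability.

```python
import numpy as np

# Hypothetical latent abilities (illustrative values, not the study's data)
rng = np.random.default_rng(seed=1)
latent = rng.normal(loc=28.0, scale=4.0, size=100)

# The instrument cannot record anything above 30: higher abilities are recorded as 30
recorded = np.minimum(latent, 30.0)

at_ceiling = recorded == 30.0
print(f"share recorded as 30: {at_ceiling.mean():.2f}")
print(f"latent abilities among those recorded as 30: "
      f"{latent[at_ceiling].min():.1f} to {latent[at_ceiling].max():.1f}")
print(f"mean latent ability: {latent.mean():.2f}, mean recorded score: {recorded.mean():.2f}")
```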

I want to compare the two groups and look at change over time, but obviously this is very difficult given the nature of the results. None of the transformations I have tried make any difference. I have been advised that the Tobit model is the best-suited approach for this assessment, and I can run the analysis in R using the examples in Arne Henningsen's paper, Estimating censored regression models in R using the censReg package.

However, I have only a basic knowledge of statistics and have found the information on the Tobit model quite complicated. I need to be able to explain this model in plain language, and I cannot find a plain-language, nuts-and-bolts explanation of what the Tobit model actually does and how it does it. Can anyone explain the Tobit model or point me towards a readable reference without complicated statistical and mathematical explanations?

Extremely grateful for any help.

Best Answer

The Wikipedia article describes the Tobit model as follows:

$$y_i = \begin{cases} y_i^* &\text{if} \quad y_i^* > 0 \\ 0 &\text{if} \quad y_i^* \le 0 \end{cases}$$

$$y_i^* = \beta x_i + u_i$$

$$u_i \sim N(0,\sigma^2)$$
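A minimal Python sketch of this standard model shows the mechanism: the latent $y_i^*$ is generated first, and anything at or below 0 is recorded as 0. The function name `simulate_tobit_below_zero` is hypothetical, and the values `beta = 1.5`, `sigma = 1` and the single covariate `x` are assumed purely for illustration.

```python
import numpy as np

def simulate_tobit_below_zero(x, beta, sigma, rng):
    """Draw from the standard Tobit model: y* = beta * x + u, y = max(y*, 0)."""
    u = rng.normal(loc=0.0, scale=sigma, size=len(x))  # u_i ~ N(0, sigma^2)
    y_star = beta * x + u                               # latent outcome y_i^*
    y = np.maximum(y_star, 0.0)                         # values at or below 0 are recorded as 0
    return y_star, y

# Illustrative (assumed) inputs
rng = np.random.default_rng(seed=0)
x = np.linspace(-2.0, 2.0, 9)
y_star, y = simulate_tobit_below_zero(x, beta=1.5, sigma=1.0, rng=rng)
for xi, ys, yi in zip(x, y_star, y):
    print(f"x={xi:+.1f}  y*={ys:+.2f}  y={yi:.2f}")
```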

I will adapt the above model to your context and offer a plain-English interpretation of the equations, which may be helpful. In your case the censoring is from above at 30 rather than from below at 0:

$$y_i = \begin{cases} y_i^* &\text{if} \quad y_i^* \le 30 \\ 30 &\text{if} \quad y_i^* > 30 \end{cases}$$

$$y_i^* = \beta x_i + u_i$$

$$u_i \sim N(0,\sigma^2)$$

In the equations above, $y_i^*$ represents a subject's true ability, expressed on the scale of the assessment but not limited to 30, while $y_i$ is the score actually recorded. The equations therefore state the following:

  1. Our measurement of ability is cut off at 30 on the upper side (i.e., we capture the ceiling effect). In other words, if a person's ability is greater than 30, our measurement instrument fails to record the actual value and instead records 30 for that person. Note that the model states $y_i = 30$ if $y_i^* > 30$.

  2. If, on the other hand, a person's ability is 30 or less, our measurement instrument records the actual value. Note that the model states $y_i = y_i^*$ if $y_i^* \le 30$.

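  3. We model ability, $y_i^*$, as a linear function of the covariates $x_i$ (in your case, for example, group and time point) plus an error term $u_i$ that captures noise and is assumed to be normally distributed with mean 0 and variance $\sigma^2$.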
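As for how the model is actually fitted: Tobit models are usually estimated by maximum likelihood. Each score below 30 contributes the normal density of that observed score, just as in ordinary regression, while each score of 30 contributes only the probability that the person's latent ability exceeds 30, i.e. "30 or more". The sketch below uses Python with NumPy/SciPy; the helper names `neg_loglik` and `fit_tobit`, the constant `UPPER`, and the simulated data (one group indicator, a single time point, assumed coefficients) are all hypothetical, and this is not the censReg implementation. It also ignores the repeated measurements on the same participants, which a real analysis of four time points would need to handle.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

UPPER = 30.0  # ceiling of the assessment's total score

def neg_loglik(params, X, y, upper=UPPER):
    """Negative log-likelihood of a Tobit model censored from above at `upper`."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)   # optimise log(sigma) so sigma stays positive
    mu = X @ beta               # predicted latent ability, x_i * beta
    censored = y >= upper
    # Scores below the ceiling: normal density of the observed score
    ll_obs = norm.logpdf(y[~censored], loc=mu[~censored], scale=sigma)
    # Scores at the ceiling: probability that latent ability exceeds the ceiling
    ll_cens = norm.logsf((upper - mu[censored]) / sigma)
    return -(ll_obs.sum() + ll_cens.sum())

def fit_tobit(X, y, upper=UPPER):
    """Maximum-likelihood estimates of beta and sigma, starting from OLS."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    start = np.append(beta0, np.log(y.std()))
    res = minimize(neg_loglik, start, args=(X, y, upper), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])

# Hypothetical data: 50 participants per group, one time point
rng = np.random.default_rng(seed=42)
group = np.repeat([0.0, 1.0], 50)
X = np.column_stack([np.ones(100), group])     # intercept + group indicator
true_beta = np.array([28.0, 3.0])              # assumed values, for illustration only
latent = X @ true_beta + rng.normal(0.0, 3.0, size=100)
y = np.minimum(latent, UPPER)                  # the instrument records at most 30

ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
tobit_beta, tobit_sigma = fit_tobit(X, y)
print("OLS   group effect:", round(float(ols_beta[1]), 2))
print("Tobit group effect:", round(float(tobit_beta[1]), 2), " sigma:", round(float(tobit_sigma), 2))
```

Because scores of 30 are treated as "30 or more" rather than as exact values, the Tobit estimate of the group difference is not pulled towards zero in the way the OLS estimate tends to be when one group hits the ceiling more often than the other.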

I hope that is helpful. If any aspect is unclear, feel free to ask in the comments.