Solved – Using PROC QLIM OUTPUT to get predicted values from a two-step Tobit model

prediction sas tobit-regression

I am fitting a two-step Tobit model through PROC QLIM in SAS. The first step of the model is a probit model for whether someone "responds" (e.g. makes a donation). The second step of the model is linear for the amount (e.g. amount of the donation, given that someone made a donation). I am using a two-step Tobit model rather than a Tobit-1 model because in my actual data I suspect some selection bias in terms of who responds, and also because I may want to use different covariates for each step (presently, I am using the same ones).

Since PROC QLIM does not appear to support PREDICT or SCORE statements, I created dummy data in mydata by appending a copy of my dataset with the outcomes (response and amount) removed and the covariates modified, so that I can get predictions for a hypothetical dataset where test=0 throughout. Here is a sample of my code:

proc qlim data=mydata;
    class test classvar1 classvar2 classvar3;
    model response = test classvar1 classvar2 classvar3 test*classvar1 test*classvar2 test*classvar3 / discrete;
    model amount = test classvar1 classvar2 classvar3 test*classvar1 test*classvar2 test*classvar3 / select(response=1);
    output out=tempout conditional expected predicted prob mills;
run;

mydata has the following relevant fields:

  • response: 0/1 indicator of donation
  • amount: continuous value indicating the amount of donation; missing if response=0
  • test: 0/1 indicator of whether individual is in test or control group
  • classvar1 – classvar3: various categorical characteristics of individuals
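
For reference, the stacked data set described earlier could be built roughly as follows. This is only a sketch: the input name original and the flag score_row are hypothetical names introduced here, not part of my actual code.

```sas
/* Sketch: stack the real rows with a scoring copy whose outcomes are missing. */
/* "original" and "score_row" are hypothetical names used for illustration.   */
data mydata;
    set original(in=is_real)    /* first pass: rows used for estimation   */
        original(in=is_copy);   /* second pass: rows used only for scoring */
    score_row = is_copy;        /* keep a flag; IN= variables are not saved */
    if is_copy then do;
        response = .;           /* outcomes removed from the copy           */
        amount   = .;
        test     = 0;           /* hypothetical scenario: test=0 throughout */
    end;
run;
```

Keeping a flag such as score_row makes it easy to subset the output data set to the scoring rows afterwards.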

What I am trying to get out of this is a predicted value that reflects each individual's expected donation amount, unconditional on whether they donated (so the predicted value should incorporate the probability of donation in some way). However, in the output data set, I get only the following variables related to amount:

  • P_amount (Predicted value of amount)
  • Expct_amount (Unconditional expected value of amount)

I do not get a "conditional" expected value of amount at all. Instead, the P_amount and Expct_amount values above are both equal to what I would expect the conditional expected value to be (and they are also equal to the Xbeta values for the amount model). In other words, those predicted values do not appear to include any adjustment for the probability of response.

For other PROC QLIM models, such as a simple one-equation Tobit-1 model, I have seen both the conditional and unconditional expected values appear in output, and they differ from each other (i.e. the unconditional values are usually smaller, in some way related to the probability of response). Is there something I'm not specifying correctly that is causing me to get this output? The only clue I found in the logs is this:

Note: The Mills Ratio is not calculated for an ordinal discrete variable
or continuous variable without censoring or truncation

Happy to clarify further if needed. Thank you!

Best Answer

After consulting with SAS Support, it appears that PROC QLIM does not provide direct output to answer my question in the case of selection models like this one. However, other outputs can be used to compute what I seek. Here is the solution in case anyone else has this question.

In the output statement, I can request prob, xbeta, and mills:

output out=tempout xbeta mills prob;

And then, in a separate data step, I can calculate the following:

probability_of_response = (1-prob_response)*(response=0) + (prob_response)*(response=1);
amount_given_a_response = xbeta_amount + mills_response * SIGMA * RHO;

Where:

  • SIGMA is the numeric value of _Sigma.amount from my model results
  • RHO is the numeric value of _Rho from my model results

Note that prob_response (output from PROC QLIM) is the "probability that the response equals this record's actual response," so it has to be inverted for non-responders in order to get the actual probability of responding. The second calculation follows the econometric formula:

E(y_i | z_i=1) = x_i * B + rho * sigma * (Mills ratio)
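
Written out with the Mills ratio made explicit, the standard selection-model result looks like this. The symbols w_i and \gamma for the covariates and coefficients of the response (probit) equation are notation introduced here for illustration, and \beta corresponds to B above. The last line is the product of the response probability and the conditional amount, i.e. the expected donation when non-responders count as zero.

```latex
\begin{aligned}
\lambda_i &= \frac{\phi(w_i\gamma)}{\Phi(w_i\gamma)}, \\
E(y_i \mid z_i = 1) &= x_i\beta + \rho\,\sigma\,\lambda_i, \\
P(z_i = 1)\,E(y_i \mid z_i = 1) &= \Phi(w_i\gamma)\,x_i\beta + \rho\,\sigma\,\phi(w_i\gamma).
\end{aligned}
```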

After creating these variables, I can simply multiply everyone's probability_of_response by their amount_given_a_response to get everyone's expected donation, accounting for their probability of donating rather than conditional on donating.
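
A minimal sketch of the whole calculation is below. It makes several assumptions that should be checked against your own run (for example with ODS TRACE ON and PROC CONTENTS): that the estimates table is named ParameterEstimates with columns Parameter and Estimate, and that the XBETA option writes a variable called xbeta_response for the probit equation. It also swaps probnorm(xbeta_response) in for the indicator-based inversion of prob_response. Under a probit the two agree for rows with an observed response, whereas the indicator version evaluates to 0 in SAS when response is missing, as it is in the appended scoring rows.

```sas
/* Sketch only: table, column, and variable names are assumptions to verify. */

/* Capture the parameter estimates while fitting the model. */
ods output ParameterEstimates=pe;          /* assumed ODS table name */
proc qlim data=mydata;
    class test classvar1 classvar2 classvar3;
    model response = test classvar1 classvar2 classvar3
                     test*classvar1 test*classvar2 test*classvar3 / discrete;
    model amount   = test classvar1 classvar2 classvar3
                     test*classvar1 test*classvar2 test*classvar3 / select(response=1);
    output out=tempout xbeta mills prob;
run;

/* Store _Sigma.amount and _Rho in macro variables. */
data _null_;
    set pe;
    if strip(Parameter) = '_Sigma.amount' then call symputx('sigma', Estimate);
    else if strip(Parameter) = '_Rho'     then call symputx('rho', Estimate);
run;

/* Combine the pieces into an expected donation per record. */
data expected;
    set tempout;
    /* Probit probability of responding (xbeta_response is an assumed name). */
    probability_of_response = probnorm(xbeta_response);
    /* Conditional amount: x*B + rho * sigma * Mills ratio. */
    amount_given_a_response = xbeta_amount + mills_response * (&sigma) * (&rho);
    /* Expected donation, not conditional on donating. */
    expected_donation = probability_of_response * amount_given_a_response;
run;
```

The parentheses around &sigma and &rho are there so that a negative estimate still substitutes into a valid expression.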