Solved – Ideal transformation for consumption variable in a probit model

data-transformation, econometrics

I am trying to figure out the best transformation of my consumption variable. I am running a probit regression to look at whether or not a household enrolls in health insurance. Consumption per capita is an independent variable, and in my current model I use both consumption and consumption squared (two separate variables) to show that the likelihood of enrollment increases with consumption, but with diminishing returns. This makes for fairly straightforward interpretation. However, using the log of consumption gives a slightly better fit, because it makes the distribution of consumption closer to normal and contributes a bit more to the overall pseudo-R² of the model, but it is more difficult to interpret. Which would you suggest I use: the log of consumption, or consumption plus its square? My research is focused on health economics, so I'm not sure what the preference is in that discipline. Any insight would be much appreciated. Thank you!

Best Answer

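Using variables in logs is quite common in economics, since the estimated coefficients can be interpreted as sensitivities to relative changes in the RHS variables (or as elasticities, if both the LHS and RHS variables are in logs). For example, say that you have the model $y = b \ln(x)$, and $x$ changes to $x(1+r)$. Then you can use the approximation $\ln(1+r) \approx r$ to see how $y$ changes: $$b \ln(x(1+r)) = b \ln(x) + b \ln(1+r) \approx b \ln(x) + b r.$$ So if $r$ is 0.01 ($x$ increases by 1%), $y$ increases by $b r = 0.01 b$ (of course, this works only for small $r$). In your probit model, $y$ plays the role of the latent index inside the normal CDF, not the probability itself. If the coefficient on log-consumption is $b$, a 1% increase in consumption raises the index by about $0.01 b$, and the probability of enrollment changes by approximately $0.01\, b\, \phi(\cdot)$, where $\phi$ is the standard normal density evaluated at the household's fitted index. The sign of $b$ gives the direction of the effect, but its size on the probability scale varies across households, so it is usually reported as a marginal effect (at the means, or averaged over the sample).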
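To make the log-consumption interpretation concrete, here is a minimal sketch in Python using only the standard library. In a probit the coefficient acts on the index inside the normal CDF, so the effect of a 1% rise in consumption on the enrollment probability also depends on the normal density at that index. The intercept `a`, the coefficient `b`, and the consumption level `x` below are made-up illustrative values, not estimates from any data, and the function names are hypothetical.

```python
from math import log
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal: mean 0, sd 1


def enroll_prob(consumption, a, b):
    """Probit probability: Phi(a + b * ln(consumption))."""
    return _std_normal.cdf(a + b * log(consumption))


def approx_change(consumption, a, b, pct=1.0):
    """First-order change in probability for a `pct` percent rise in
    consumption: phi(index) * b * pct / 100."""
    index = a + b * log(consumption)
    return _std_normal.pdf(index) * b * pct / 100.0


# Hypothetical values, chosen only for illustration.
a, b = -3.0, 0.4
x = 1000.0

# Exact change in probability when consumption rises by 1%.
exact = enroll_prob(1.01 * x, a, b) - enroll_prob(x, a, b)
# Approximation based on ln(1 + r) ~ r and the normal density.
approx = approx_change(x, a, b, pct=1.0)

print(f"exact change:  {exact:.6f}")
print(f"approx change: {approx:.6f}")
print(f"naive 0.01*b:  {0.01 * b:.6f}")
```

The approximate and exact changes should agree closely for a 1% change, and both are smaller than `0.01 * b`, because the standard normal density never exceeds about 0.4. Evaluating `approx_change` at different consumption levels shows how the same coefficient translates into different probability effects across households.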
