Solved – Ratios in Regression, aka Questions on Kronmal

interaction, modeling, ratio, regression, weighted-regression

Recently, while randomly browsing questions, I was reminded of an off-hand comment from one of my professors a few years back warning about the use of ratios in regression models. So I started reading up on this, which eventually led me to Kronmal (1993).

I want to make sure that I'm correctly interpreting his suggestions on how to model these.

  1. For a model with a ratio with the same denominator on both the dependent and independent side:
    $ Z^{-1}Y = Z^{-1}1_n\beta_0 + Z^{-1}X\beta_X + \beta_Z + Z^{-1}\epsilon $

    • Regress dependent ratio on the (inverse) denominator variable in addition to the other ratios
    • Weight by the (inverse) denominator variable
  2. For a model with dependent variable as a ratio:
    $ Y = \beta_0 + \beta_XX + Z1_n\alpha_0 + ZX\alpha_X + Z^{-1}\epsilon $

    • Regress the numerator on the original variables, the denominator, and the denominator times the original variables [what about categorical variables?]
    • Weight by (inverse) denominator
  3. For a model with only independent variable ratios:
    $ Y = \beta_0 + X\beta_X + Z^{-1}1_n\beta_{Z^{-1}} + W\beta_W + Z^{-1}W\beta_{Z^{-1}W} + \epsilon $

    • Include numerator and (inverse) denominator as main effects, ratio as interaction term.
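As a check on item 1, the equation can be recovered by dividing a full linear model through by the denominator. This sketch assumes that $Z$ is the diagonal matrix of denominator values and $1_n$ is a vector of ones, which is how I read the notation; it is not quoted from the paper.

$ Y = 1_n\beta_0 + X\beta_X + Z1_n\beta_Z + \epsilon $

$ Z^{-1}Y = Z^{-1}1_n\beta_0 + Z^{-1}X\beta_X + 1_n\beta_Z + Z^{-1}\epsilon $

Under that reading, the intercept of the ratio model ($\beta_Z$, written without the $1_n$ in item 1) is the coefficient of the denominator in the undivided model, and the coefficient on the inverse denominator is the original intercept $\beta_0$.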

Are my interpretations here correct?

Best Answer

You should really have linked to the Kronmal paper and explained your notation, which is taken directly from it. Your reading of the paper is too literal. Specifically, he does not give advice about weighting; he only mentions it as a possibility and says it can be done in the usual ways, so it needs no separate discussion. Read your cases as examples of how to analyze such situations rather than as fixed recipes.

In section 6 he does give some general advice, which I will quote here:

The message of this paper is that ratio variables should only be used in the context of a full linear model in which the variables that make up the ratio are included and the intercept term is also present. The common practice of using ratios for either the dependent or the independent variable in regression analysis can lead to misleading inferences, and rarely results in any gain. This practice is widespread and entrenched, however, and it may be difficult to convince some researchers that they should give up their most prized ratio or index.
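The warning in this passage can be illustrated with a small simulation. This is only a sketch on made-up data, not Kronmal's analysis: the variable names, distributions, and sample size are assumptions chosen for illustration. The two numerators `x` and `y` are generated independently, so any association between the ratios comes only from the shared denominator `z`.

```r
# Simulated illustration (hypothetical data, not from the paper):
# x and y are independent, but both are divided by the same z.
set.seed(42)
n <- 500
z <- runif(n, min = 1, max = 10)    # shared denominator
x <- rnorm(n, mean = 50, sd = 10)   # numerator of the predictor ratio
y <- rnorm(n, mean = 100, sd = 10)  # numerator of the response ratio, unrelated to x

# Naive ratio-on-ratio regression: the common 1/z induces an association.
naive_fit <- lm(I(y / z) ~ I(x / z))

# Ratio model with the inverse denominator added and the intercept kept
# (the form of item 1 in the question).
ratio_full_fit <- lm(I(y / z) ~ I(1 / z) + I(x / z))

# Full linear model on the undivided variables.
full_fit <- lm(y ~ x + z)

# Coefficients are picked by position:
# naive_fit:      (Intercept), x/z
# ratio_full_fit: (Intercept), 1/z, x/z
# full_fit:       (Intercept), x, z
naive_slope      <- unname(coef(naive_fit)[2])
ratio_full_slope <- unname(coef(ratio_full_fit)[3])
full_slope       <- unname(coef(full_fit)[2])

print(c(naive = naive_slope,
        ratio_full = ratio_full_slope,
        full = full_slope))
print(c(cor_ratios = cor(y / z, x / z), cor_numerators = cor(x, y)))
```

In this setup the naive slope should come out clearly away from zero even though `x` has no effect on `y`, while the slope on `x` (or on `x / z` once `1 / z` is included) should be close to zero. That is the kind of misleading inference the quoted passage warns about.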

The paper uses the (fictitious) example by Neyman on births and storks. To play with that example, you can access it from R by

data(stork, package="TeachingDemos")

I will leave the fun for the readers, but one interesting plot is this coplot:

[Figure: conditioning plot for the Neyman storks example]
