Ratio of explanatory variables in multiple regression

Tags: data transformation, interaction, multiple regression, references, regression

Does anyone have links or advice on specifying a ratio of two explanatory variables in a linear regression, that is, including the two independent variables plus their ratio? In our data the ratio term appears to be significant. Are there any known issues with this functional form, or other issues (e.g., collinearity)?

To clarify, we are considering the model $y = ax_1 + bx_2 + cx_1/x_2$, where $y$, $x_1$, and $x_2$ are all continuous, and we wonder whether this is reasonable, or whether there are good references on assessing the appropriate functional form of multiple covariates.

I think we were "inspired" to try a ratio of covariates as we saw a similar approach taken in this paper.

Best Answer

The "ratios" in the paper you cite were determined (according to its "Experimental procedures") as the difference between two measurements: cycles-to-threshold in polymerase chain reaction (PCR), which are related to the logarithms of the underlying variables (amounts of messenger RNA, mRNA, for each of two gene transcripts). Because $\log(x_1/x_2) = \log x_1 - \log x_2$, only two of the three log-scale variables $\log x_1$, $\log x_2$, and $\log(x_1/x_2)$ are linearly independent, so a regression cannot include all three.
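A quick numerical check of that point, as a sketch on simulated positive data (the variables below are hypothetical, not the paper's measurements): on the linear scale, $x_1$, $x_2$, and $x_1/x_2$ form a full-rank design matrix, but on the log scale the third column is exactly the first minus the second, so the matrix rank drops to 2.

```python
import numpy as np

# Hypothetical positive measurements, for illustration only.
rng = np.random.default_rng(0)
x1 = rng.lognormal(mean=0.0, sigma=1.0, size=50)
x2 = rng.lognormal(mean=0.0, sigma=1.0, size=50)

# Linear scale: x1/x2 is not a linear combination of x1 and x2.
X_linear = np.column_stack([x1, x2, x1 / x2])

# Log scale: log(x1/x2) = log(x1) - log(x2), so the third column is
# (up to floating-point rounding) the first column minus the second.
X_log = np.column_stack([np.log(x1), np.log(x2), np.log(x1 / x2)])

rank_linear = np.linalg.matrix_rank(X_linear)
rank_log = np.linalg.matrix_rank(X_log)
print(rank_linear, rank_log)  # 3 2
```

`np.linalg.matrix_rank` uses a default tolerance based on the largest singular value, which absorbs the tiny rounding differences between `np.log(x1 / x2)` and `np.log(x1) - np.log(x2)`.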

Log-transformed measurements of quantities like mRNA are often better behaved in statistical analyses than their linear-scale values. If this applies to your study, try a regression with $\log x_1$ and $\log x_2$ as independent variables. If their ratio is "really" the important variable, the two regression coefficients should be close to equal in magnitude and opposite in sign.
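A minimal sketch of that check, assuming simulated data in which the response truly depends only on $\log(x_1/x_2)$ (the coefficients 1.0 and 2.0 and the noise level are arbitrary choices for illustration). It fits ordinary least squares on $\log x_1$ and $\log x_2$, then estimates $b_1 + b_2$ and its standard error from the usual OLS covariance matrix $\hat\sigma^2 (X^\top X)^{-1}$; a sum near zero relative to its standard error is consistent with the ratio formulation.

```python
import numpy as np

# Hypothetical simulation: y depends only on log(x1) - log(x2) = log(x1/x2).
rng = np.random.default_rng(1)
n = 1000
log_x1 = rng.normal(0.0, 1.0, n)
log_x2 = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * (log_x1 - log_x2) + rng.normal(0.0, 0.5, n)

# Ordinary least squares with an intercept, log(x1) and log(x2).
X = np.column_stack([np.ones(n), log_x1, log_x2])
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance and the OLS covariance matrix sigma^2 (X'X)^-1.
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)

# Linear contrast b1 + b2: close to zero if only the ratio matters.
contrast = np.array([0.0, 1.0, 1.0])
sum_b = contrast @ beta
se_sum = np.sqrt(contrast @ cov @ contrast)
print(f"b1 = {beta[1]:.3f}, b2 = {beta[2]:.3f}, "
      f"b1 + b2 = {sum_b:.3f} (SE {se_sum:.3f})")
```

With real data, `sum_b / se_sum` can be compared against a t distribution with `n - 3` degrees of freedom as a Wald-type test of the constraint $b_1 = -b_2$.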

And if you are getting inspiration from that paper, also take inspiration from the multi-stage discovery and validation process the authors used: discovery of candidates by microarray analysis of thousands of genes, technical validation of the candidates with a different technology (PCR), and biological validation by manipulating the expression levels of the two genes used to form the "ratio" and finding results consistent with predictions. Even with such effort, today the authors would probably be expected to perform a more thorough statistical validation of their model than they did for this study a decade ago.