Linear Model – Can Standard Deviations Be Modeled in a Linear Model?

generalized linear model, variance

Is it possible to put standard deviations or variances into a linear model, as the data to be explained? I have a predictor which I think will linearly increase the standard deviation of a measure, and it is this variability that is of interest.

For each condition, I calculated the standard deviation, so that I have a vector of standard deviations which I'd like to model. I then fed this into a linear model:

std_i( y_ik ) = sum_j X_kj * beta_j + error_k

where X is something like

[ 1  -2 
  1  -1
  1   0
  1   1
  1   2 ]
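To make the setup concrete, here is a minimal sketch of how the response vector and the design matrix above might be built. The `measurements` dictionary and its values are made up purely for illustration; only the coding of the five conditions (-2 to 2) comes from the matrix above.

```python
import statistics

# Hypothetical raw data: condition code -> repeated measurements.
measurements = {
    -2: [4.9, 5.1, 5.0, 5.2],
    -1: [4.7, 5.3, 5.1, 4.8],
     0: [4.5, 5.6, 5.0, 4.6],
     1: [4.2, 5.9, 5.3, 4.1],
     2: [3.8, 6.3, 5.6, 3.9],
}

conditions = sorted(measurements)

# Response vector: one sample standard deviation per condition.
sd = [statistics.stdev(measurements[k]) for k in conditions]

# Design matrix: intercept column plus the linear predictor.
X = [[1, k] for k in conditions]

for row, s in zip(X, sd):
    print(row, round(s, 3))
```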

I realise that standard deviations are not normally distributed, so this isn't quite right. Can I transform the variable so that the error terms would be normally distributed? Or can I use a "generalised" linear model with a link function?

(I actually want to feed it into a mixed model, since several subjects perform the experiment. Each subject will have a different baseline variability, and I want to look at the variability across subjects by condition. I will also need to compare groups of subjects. A mixed model seems appropriate for that purpose.)

Best Answer

It sounds like you are proposing essentially a two-stage least-squares procedure, where stage one reduces each cluster to its standard deviation about a cluster-specific mean. This seems fine, although you could also model on the observational level, i.e., let the variance of each observation be a linear function of covariates. That said, I don't know of any off-the-shelf software that would allow for exactly that.

Returning to the two-stage approach: if the observations in cluster $i=1,\dots,N$ are normally distributed, e.g. $Z_{ij} \sim N(\mu_i, \rho^2_i)$ for $j=1,\dots,N_i$, where $N_i$ is the number of observations in cluster $i$, then the sample variances follow a scaled chi-square distribution with $N_i - 1$ degrees of freedom. Letting $S^2_i$ denote the sample variance in cluster $i$, we have $$S^2_i \sim \frac{\rho^2_i}{N_i-1} \times \chi^2(N_i-1).$$

In particular, since a $\chi^2(k)$ variable has mean $k$ and variance $2k$, we have that \begin{align*} E S^2_i & = \rho^2_i, \\ Var S^2_i & = 2\frac{\rho_i^4}{N_i - 1}. \end{align*}
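The two moments can be checked with a quick simulation. This is only a sketch using the Python standard library; the cluster standard deviation `rho`, the cluster size `n_i` and the number of replicates are arbitrary values chosen for illustration.

```python
import random
import statistics

def simulate_sample_variances(rho, n_i, n_rep, seed=0):
    """Draw n_rep clusters of size n_i from N(0, rho^2); return their sample variances."""
    rng = random.Random(seed)
    return [
        statistics.variance([rng.gauss(0.0, rho) for _ in range(n_i)])
        for _ in range(n_rep)
    ]

rho, n_i = 2.0, 10  # hypothetical cluster SD and cluster size
s2 = simulate_sample_variances(rho, n_i, n_rep=20000)

mean_s2 = statistics.fmean(s2)    # should be close to rho^2
var_s2 = statistics.variance(s2)  # should be close to 2 * rho^4 / (n_i - 1)

print("E S^2:   simulated", round(mean_s2, 3), " theory", rho**2)
print("Var S^2: simulated", round(var_s2, 3), " theory", round(2 * rho**4 / (n_i - 1), 3))
```

The simulated values should agree with the theoretical ones up to Monte Carlo error.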

A gamma GLM assumes that $Var Y = \phi (E Y)^2$, and here $Var S^2_i = \frac{2}{N_i-1} (E S^2_i)^2$, so this might actually be a case for gamma regression with an identity link! (Which is a first for me, I think.) If the $N_i$ differ very much, then you need precision weights proportional to $N_i-1$, because the dispersion for cluster $i$ is $2/(N_i-1)$.
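As a rough illustration of the second stage, here is a sketch of a gamma regression with identity link, fitted by iteratively reweighted least squares (IRLS) for a single covariate. It uses the usual GLM convention $Var Y_i = \phi V(\mu_i)/w_i$ with prior weights $w_i = N_i - 1$. The function names, the simulated data and the "true" line $3 + x$ are all assumptions made for the example. In practice you would more likely use the GLM routine of a statistics package.

```python
import random
import statistics

def weighted_line_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def gamma_identity_fit(x, y, prior_w, n_iter=50, tol=1e-10):
    """IRLS for a gamma GLM with identity link and one covariate.

    With an identity link the working response is y itself, and the
    working weight is prior_w / mu^2 (gamma variance function mu^2).
    """
    # Start from a constant mean, so the first pass is plain weighted least squares.
    ybar = sum(pw * yi for pw, yi in zip(prior_w, y)) / sum(prior_w)
    mu = [ybar] * len(y)
    b0 = b1 = 0.0
    for _ in range(n_iter):
        w = [pw / m ** 2 for pw, m in zip(prior_w, mu)]
        new_b0, new_b1 = weighted_line_fit(x, y, w)
        mu = [new_b0 + new_b1 * xi for xi in x]
        if min(mu) <= 0:
            raise ValueError("fitted variance is not positive; identity link failed")
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Simulated stage-one output: sample variances of clusters with unequal sizes.
rng = random.Random(1)
x, s2, prior_w = [], [], []
for cond in [-2, -1, 0, 1, 2]:
    true_var = 3.0 + 1.0 * cond       # hypothetical: variance linear in the covariate
    for n_i in [5, 10, 20, 40] * 25:  # 100 clusters per condition
        z = [rng.gauss(0.0, true_var ** 0.5) for _ in range(n_i)]
        x.append(cond)
        s2.append(statistics.variance(z))
        prior_w.append(n_i - 1)       # precision weight: degrees of freedom

b0, b1 = gamma_identity_fit(x, s2, prior_w)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
```

The estimates should land near the intercept 3 and slope 1 used to simulate the data. One caveat of the identity link is that nothing keeps the fitted variances positive, which is why the sketch stops with an error if that happens.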