Solved – Bayesian approach and least-squares approach to multivariate regression with structural design

bayesian, generalized-least-squares, least-squares, multivariate-regression, r

Assume for example a trivariate Gaussian model:
$$
{\boldsymbol Y}_1, \ldots, {\boldsymbol Y}_n \sim_{\text{iid}} {\cal N}_3\left({\boldsymbol \mu}, \Sigma\right) \quad (*)
$$
with ${\boldsymbol \mu} = (\mu_1,\mu_2,\mu_3)'$.

The Bayesian conjugate theory of this model is well known.
This model is the simplest case of a multivariate linear regression model.
More generally, there is a well-known Bayesian conjugate theory of multivariate linear regression, the extension to the case where the multivariate mean ${\boldsymbol \mu}= {\boldsymbol \mu}(x_i)$ is allowed to depend on the covariates $x_i$ of individual $i \in \{1, \ldots, n \}$, with linear constraints on the multivariate means ${\boldsymbol \mu}(x_i)$. See for instance Rossi et al.'s book, accompanied by the crantastic bayesm package for R.
We know in addition that the Jeffreys prior is a limiting form of the conjugate prior distributions.

Let us come back to the simple multivariate Gaussian model $(*)$. Instead of generalizing this model to the case of multivariate means ${\boldsymbol \mu}(x_i)$ depending linearly on the covariates of the individuals $i=1,\ldots,n$, we can consider a more restrictive model by assuming linear constraints on the components $\mu_1$, $\mu_2$, $\mu_3$ of the multivariate mean ${\boldsymbol \mu}$.

Example

Consider some concentrations $x_{i,t}$ of $4$ blood samples $i=1,2,3,4$ measured at $3$ timepoints $t=t_1,t_2,t_3$.
Assume that the samples are independent and that, for each sample $i$, the series of the three measurements satisfies
$(x_{i,t_1},x_{i,t_2},x_{i,t_3}) \sim {\cal N}_3\left({\boldsymbol \mu}, \Sigma\right)$ with a mean
${\boldsymbol \mu} = (\mu_1,\mu_2,\mu_3)'$ whose three components are
linearly related to the timepoints: $\mu_j = \alpha + \beta t_j$.

This example falls into the context of multivariate linear regression with a within-design structure.
See for instance O'Brien & Kaiser (1985) and Fox & Weisberg's appendix.

My example is a simple instance of this situation because there are predictors (the timepoints) only for the components of the mean, and no predictors for the individuals. This example could be written as follows:
$$
(**) \left\{\begin{matrix}
{\boldsymbol Y}_1, \ldots, {\boldsymbol Y}_n \sim_{\text{iid}} {\cal N}_3\left({\boldsymbol \mu}, \Sigma\right) \\
{\boldsymbol \mu} = X {\boldsymbol \beta}
\end{matrix}\right.
$$
where ${\boldsymbol \beta}=(\alpha, \beta)'$ and $X=\begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \end{pmatrix}$ is the matrix of covariates for the components of the multivariate mean ${\boldsymbol \mu}$. The second line of $(**)$ could be termed the within design, the repeated-measures design, or the structural design (I would appreciate comments from a specialist about this vocabulary).

I think such a model can be fitted as a generalized least-squares model, as follows in R:

gls(response ~ "between covariates" , data=dat, 
  correlation=corSymm(form=  ~ "within covariates" | individual ))

(after stacking the data in long format).
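
For the timepoints example, assuming the data are stacked in a long-format data frame dat with columns response, timepoint and Subject (the names used in the simulation code of the answer below), a concrete call might look like this; an unstructured covariance needs both a corSymm correlation structure and a varIdent variance structure:

library(nlme)
# hypothetical concrete instantiation for the timepoints example
fit <- gls(response ~ timepoint, data=dat, method="ML",
  correlation=corSymm(form= ~ timepoint | Subject),
  weights=varIdent(form= ~ 1 | timepoint))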

My first question is Bayesian: what about the Bayesian analysis of model $(**)$, and more generally the Bayesian analysis of the multivariate linear regression model with a structural design?
Is there a conjugate family? What about the Jeffreys prior? Is there an appropriate R package to perform this Bayesian analysis?

My second question is not Bayesian: I have recently discovered some capabilities of John Fox's great car package for analysing
such models with ordinary least-squares theory (the Anova() function with the idesign argument; see Fox & Weisberg's appendix). Perhaps I'm wrong, but I am under the impression that this package only allows one to obtain the MANOVA table (sum-of-squares analysis) with an orthogonal matrix $X$, whereas I'd like to get (exact) confidence intervals for the within-design parameters with an arbitrary matrix $X$, as well as prediction intervals. Is there a way to do this in R using ordinary least squares?

Best Answer

Partial answer to question 1: I have not really checked, but I think that one simply obtains a conjugate family for model $(**)$ by taking the normal distributions on the within subspace (the range of the $X$ matrix) as prior distributions (conditionally on $\Sigma$) for the within parameter ${\boldsymbol \beta}$. In practice it suffices to project the posterior distributions of model $(*)$ onto the within subspace, and that should correspond to the Jeffreys prior of model $(**)$ when taking the Jeffreys prior for model $(*)$.
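
To make the projection step explicit (this is just the standard fact that a linear transformation of a Gaussian is Gaussian, with $P=(X'X)^{-1}X'$ mapping the unconstrained mean to the within parameters):
$$
{\boldsymbol \mu} \mid \Sigma, {\boldsymbol y} \sim {\cal N}_3\left({\boldsymbol m}, V\right)
\quad \Longrightarrow \quad
{\boldsymbol \beta} = P{\boldsymbol \mu} \mid \Sigma, {\boldsymbol y} \sim {\cal N}_2\left(P{\boldsymbol m}, PVP'\right).
$$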

Question 1bis: I am not entirely satisfied with the previous answer. What about the posterior distribution of $\Sigma$? (I will think about this question later because I'm too hungry now.)

Partial answer to question 1bis: Indeed, the partial answer to question 1 seems to be wrong. The Jeffreys posterior for the covariance matrix of the multinormal model is inverse-Wishart with a mean related to the frequentist estimate $\hat\Sigma$ of $\Sigma$ in model $(*)$. Here we should instead have the variation around the fitted values of each response (see question 2bis), which are not the means of each response when there is a within design.

Partial answer to question 2: Since the within parameters are simply estimated by applying a linear projection to the estimated parameters of model $(*)$, the least-squares theory of model $(**)$ is inherited from the least-squares theory of model $(*)$. No? As for $\hat\Sigma$ in model $(**)$, it should be $\frac{1}{n-1}({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta}){({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta})}'$ where $X \hat{\boldsymbol \beta} = P \hat{\boldsymbol \mu}^*$, $\hat{\boldsymbol \mu}^*$ is the estimate of ${\boldsymbol \mu}$ in model $(*)$, and $P$ is the projection onto the range of $X$. In the example below, gls() returns $\frac{1}{n}({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta}){({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta})}'$ as the maximum likelihood estimate of $\Sigma$.

Below is R code for the Bayesian analysis using the bayesm package, with the timepoints example from my original post.

################################
####     SIMULATED DATA     ####
################################
library(mvtnorm)

m <- 3 #  number of timepoints
n <- 24 #  number of individuals
timepoints <- c(1,2,3) 
dat <- data.frame(Subject=gl(n,m), timepoint=rep(timepoints,n))  

# model parameters 
alpha <- 0  # within intercept
beta <- 1   # within slope
Mean <- alpha+beta*timepoints
Sigma <- matrix(c(4,1,1,1,2,1,1,1,2), ncol=3)/2

# simulated responses
set.seed(666)
dat$response <- c(t(rmvnorm(n=n, mean=Mean, sigma=Sigma)))

# data in wide format
wdat <- reshape(dat, direction="wide", v.names="response", idvar="Subject", timevar="timepoint")


################################
## BAYESIAN ANALYSIS          ##
## WITH JEFFREYS PRIOR        ##
## IGNORING THE WITHIN DESIGN ##
################################
library(bayesm)  

 ##### set data parameters #####
 Y <- as.matrix(wdat[c("response.1","response.2","response.3")]) 
 X <- model.matrix(Y~1) # matrix of covariates
 p <- dim(X)[2] # number of covariates
 ##### (noninformative) prior parameters #####
 theta0 <- matrix(rep(0,p*m), ncol=m) # prior mean of the coefficient matrix (Bbar)
 A0 <- 0.0001*diag(p)                 # prior precision of the coefficients (A), close to zero
 nu0 <- 0                             # prior degrees of freedom for Sigma (nu)
 Omega0 <- matrix(rep(0,m*m),ncol=m)  # prior scale matrix for Sigma (V)
 ##### posterior simulations #####
 n.sims <- 10000 # number of simulations
 Beta.sims <- array(NA, dim=c(p,m,n.sims)) # for storing simulations of beta
 Sigma.sims <- array(NA, dim=c(m,m,n.sims)) # for storing simulations of Sigma
 for(sim in 1:n.sims){
    reg <- rmultireg(Y=Y,X=X,Bbar=theta0,A=A0,nu=nu0,V=Omega0)
    Beta.sims[,,sim] <- reg$B
    Sigma.sims[,,sim] <- reg$Sigma
 }

> # compare expected posterior means with true means:
> rowMeans(Beta.sims, dims = 2)
         [,1]     [,2]     [,3]
[1,] 1.053459 1.956234 2.861015
> Mean
[1] 1 2 3
> # theoretically these are also the frequentist estimates:
> colMeans(Y)
response.1 response.2 response.3 
  1.045015   1.956583   2.858293 

> # compare posterior median Sigma with true Sigma:
> apply(Sigma.sims, c(1, 2), quantile, probs = 0.5)
          [,1]      [,2]      [,3]
[1,] 2.5455707 0.6033634 0.9018785
[2,] 0.6033634 1.4969095 0.6485045
[3,] 0.9018785 0.6485045 1.0620879
> Sigma
     [,1] [,2] [,3]
[1,]  2.0  0.5  0.5
[2,]  0.5  1.0  0.5
[3,]  0.5  0.5  1.0

##############################################
## PROJECTIONS OF THE POSTERIOR SIMULATIONS ##
## ON THE WITHIN-DESIGN SUBSPACE            ##
##############################################

# matrix P mapping the multivariate mean to the within parameters
# (so that X %*% P is the projection onto the range of X):
X <- model.matrix(~1+timepoints)
P <- solve(t(X)%*%X)%*%t(X)
rownames(P) <- c("alpha","beta")
# projecting the posterior draws of the mean gives posterior draws of the within parameters:
sims <- P%*%Beta.sims[1,,]

> # posterior summaries
> summary(data.frame(alpha=sims["alpha",], beta=sims["beta",]))
     alpha              beta       
 Min.   :-1.5637   Min.   :0.3387  
 1st Qu.:-0.1171   1st Qu.:0.8134  
 Median : 0.1472   Median :0.9038  
 Mean   : 0.1493   Mean   :0.9038  
 3rd Qu.: 0.4106   3rd Qu.:0.9953  
 Max.   : 2.0783   Max.   :1.5786

> # frequentist estimates
> P%*%colMeans(Y)
           [,1]
alpha 0.1400198
beta  0.9066386

# note: maximum likelihood estimates from gls() differ from "frequentist estimates" 
library(nlme)
fit <-  gls(response ~ timepoint,  data=dat, method="ML",  
  correlation=corSymm(form=  ~ timepoint | Subject),  
    weights=varIdent(form = ~1 | timepoint))      
> coef(fit)
(Intercept)   timepoint 
  0.1430734   0.9054123 

> ## the ML estimate of Sigma is the mean variation around the fitted values:
> # calculate this variation 
> alpha.f <- (P%*%colMeans(Y))["alpha",]
> beta.f <- (P%*%colMeans(Y))["beta",]
> fitted.frequentist <- alpha.f + beta.f*timepoints
> crossprod(Y-rep(1,n)%*%t(fitted.frequentist)) 
           response.1 response.2 response.3
response.1   54.45126   13.18208   19.65210
response.2   13.18208   32.16598   14.20923
response.3   19.65210   14.20923   22.76616
> # compare:
> getVarCov(fit) * n
Marginal variance covariance matrix
       [,1]   [,2]   [,3]
[1,] 54.451 13.182 19.652
[2,] 13.182 32.166 14.209
[3,] 19.652 14.209 22.766
  Standard Deviations: 7.3791 5.6715 4.7714 

Update

After these thoughts, I finally conjecture that the Jeffreys posterior is given by
\begin{align*}
\Sigma \mid {\boldsymbol y} & \sim {\cal IW}_m(n, \Omega_n^{-1}) \\
{\boldsymbol \mu} \mid \Sigma, {\boldsymbol y} & \sim {\cal N}_{n \times m}\left(X\hat{\boldsymbol \beta}, \Sigma \otimes J_n^{-1} \right)
\end{align*}
with
\begin{align*}
J_n & = {(1,1,\ldots,1)}'(1,1,\ldots,1) \\
\Omega_n & = ({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta}){({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta})}'
\end{align*}
The posterior distributions of the within parameters are then obtained by applying the projection $P$ onto the within subspace to ${\boldsymbol \mu}$.

When there is no within design, this reduces to the Jeffreys posterior for the multinormal model.

Now I don't have the courage to check my conjecture.
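
For what it's worth, here is a minimal simulation sketch of the conjecture, not a check of it. It reuses Y, X, P, n and the loaded mvtnorm package from the code above; the names beta.hat, fitted.w, Omega.n and beta.sims2 are mine. It also rests on two assumptions I have not verified: that $\Sigma \otimes J_n^{-1}$ should be read as $\Sigma/n$, and that ${\cal IW}_m(n, \Omega_n^{-1})$ means the distribution of the inverse of a Wishart draw with $n$ degrees of freedom and scale matrix $\Omega_n^{-1}$.

# sketch: direct simulation from the conjectured Jeffreys posterior (assumptions above)
beta.hat <- P %*% colMeans(Y)                       # frequentist within parameters
fitted.w <- as.vector(X %*% beta.hat)               # fitted mean vector X beta.hat
Omega.n  <- crossprod(Y - rep(1,n) %*% t(fitted.w)) # variation around the fitted values
n.sims2 <- 10000
beta.sims2 <- matrix(NA, nrow=2, ncol=n.sims2, dimnames=list(c("alpha","beta"), NULL))
for(s in 1:n.sims2){
  Sigma.s <- solve(rWishart(1, df=n, Sigma=solve(Omega.n))[,,1]) # Sigma | y (assumed parametrization)
  mu.s <- c(rmvnorm(1, mean=fitted.w, sigma=Sigma.s/n))          # mu | Sigma, y with Sigma (x) J_n^{-1} read as Sigma/n
  beta.sims2[,s] <- P %*% mu.s                                   # project onto the within parameters
}
# compare with the posterior summaries obtained via bayesm above
summary(data.frame(alpha=beta.sims2["alpha",], beta=beta.sims2["beta",]))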
