Partial answer to question 1: I have not really checked, but I think one simply obtains a conjugate family for model $(**)$ by taking normal distributions on the within subspace (the range of the $X$ matrix) as prior distributions (conditionally on $\Sigma$) for the within parameter ${\boldsymbol \beta}$. In practice it should suffice to project the posterior distributions of model $(*)$ onto the within subspace, and the Jeffreys prior of model $(*)$ should then induce the Jeffreys prior of model $(**)$.
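Concretely, here is the reasoning as a sketch: writing $P = {(X'X)}^{-1}X'$ for the map from the mean vector to the within coordinates (this is the $P$ of the code below, not the orthogonal projection $X{(X'X)}^{-1}X'$), a linear image of a normal distribution is normal, so the standard conditional posterior of the multinormal model projects to
\begin{align*}
{\boldsymbol \mu} \mid \Sigma, {\boldsymbol y} \sim {\cal N}_m\left(\bar{\boldsymbol y}, \Sigma/n\right)
\quad \Longrightarrow \quad
P{\boldsymbol \mu} \mid \Sigma, {\boldsymbol y} \sim {\cal N}\left(P\bar{\boldsymbol y},\, P \Sigma P'/n\right).
\end{align*}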
Question 1bis: I am not entirely satisfied with the previous answer. What about the posterior distribution of $\Sigma$? (I will think about this question later because I'm too hungry now.)
Partial answer to question 1bis: Indeed, the partial answer to question 1 seems to be wrong. The Jeffreys posterior for the covariance matrix of the multinormal model is inverse-Wishart, with a mean related to the frequentist estimate $\hat\Sigma$ of $\Sigma$ in model $(*)$. Here we should instead have the variation around the fitted values of each response (see question 2bis), which are not the means of each response when there is a within-design.
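For reference (from memory, and in the same parameterization as the Update below), the Jeffreys posterior of the multinormal model without within design is
\begin{align*}
\Sigma \mid {\boldsymbol y} & \sim {\cal IW}_m\left(n, S^{-1}\right), \qquad S = \sum_{i=1}^n ({\boldsymbol y}_i - \bar{\boldsymbol y}){({\boldsymbol y}_i - \bar{\boldsymbol y})}', \\
{\boldsymbol \mu} \mid \Sigma, {\boldsymbol y} & \sim {\cal N}_m\left(\bar{\boldsymbol y}, \Sigma/n\right),
\end{align*}
so the question is what replaces $S$ and $\bar{\boldsymbol y}$ when there is a within design.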
Partial answer to question 2: Since the within parameters are estimated simply by applying a linear projection to the estimated parameters of model $(*)$, the least-squares theory of model $(**)$ is inherited from the least-squares theory of model $(*)$. No?
And $\hat\Sigma$ in model $(**)$ should be $\frac{1}{n-1}({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta}){({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta})}'$, where $X \hat{\boldsymbol \beta} = P \hat{\boldsymbol \mu}^*$, $\hat{\boldsymbol \mu}^*$ is the estimate of ${\boldsymbol \mu}$ in model $(*)$, and $P$ is the projection on the range of $X$. In the example below, gls() returns $\frac{1}{n}({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta}){({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta})}'$ as the maximum likelihood estimate of $\Sigma$.
Below is R code for the Bayesian analysis using the bayesm package, with the timepoints example from my OP.
################################
#### SIMULATED DATA ####
################################
library(mvtnorm)
m <- 3 # number of timepoints
n <- 24 # number of individuals
timepoints <- c(1,2,3)
dat <- data.frame(Subject=gl(n,m), timepoint=rep(timepoints,n))
# model parameters
alpha <- 0 # within intercept
beta <- 1 # within slope
Mean <- alpha+beta*timepoints
Sigma <- matrix(c(4,1,1,1,2,1,1,1,2), ncol=3)/2
# simulated responses
set.seed(666)
dat$response <- c(t(rmvnorm(n=n, mean=Mean, sigma=Sigma)))
# data in wide format
wdat <- reshape(dat, direction="wide", v.names="response", idvar="Subject", timevar="timepoint")
################################
## BAYESIAN ANALYSIS ##
## WITH JEFFREYS PRIOR ##
## IGNORING THE WITHIN DESIGN ##
################################
library(bayesm)
##### set data parameters #####
Y <- as.matrix(wdat[c("response.1","response.2","response.3")])
X <- model.matrix(Y~1) # matrix of covariates
p <- dim(X)[2] # number of covariates
##### (noninformative) prior parameters #####
theta0 <- matrix(rep(0,p*m), ncol=m)
A0 <- 0.0001*diag(p)
nu0 <- 0
Omega0 <- matrix(rep(0,m*m),ncol=m)
##### posterior simulations #####
n.sims <- 10000 # number of simulations
Beta.sims <- array(NA, dim=c(p,m,n.sims)) # for storing simulations of beta
Sigma.sims <- array(NA, dim=c(m,m,n.sims)) # for storing simulations of Sigma
for(sim in 1:n.sims){
  reg <- rmultireg(Y=Y, X=X, Bbar=theta0, A=A0, nu=nu0, V=Omega0)
  Beta.sims[,,sim] <- reg$B
  Sigma.sims[,,sim] <- reg$Sigma
}
> # compare expected posterior means with true means:
> rowMeans(Beta.sims, dims = 2)
[,1] [,2] [,3]
[1,] 1.053459 1.956234 2.861015
> Mean
[1] 1 2 3
> # theoretically these are also the frequentist estimates:
> colMeans(Y)
response.1 response.2 response.3
1.045015 1.956583 2.858293
> # compare posterior median Sigma with true Sigma:
> apply(Sigma.sims, c(1, 2), quantile, probs = 0.5)
[,1] [,2] [,3]
[1,] 2.5455707 0.6033634 0.9018785
[2,] 0.6033634 1.4969095 0.6485045
[3,] 0.9018785 0.6485045 1.0620879
> Sigma
[,1] [,2] [,3]
[1,] 2.0 0.5 0.5
[2,] 0.5 1.0 0.5
[3,] 0.5 0.5 1.0
##############################################
## PROJECTIONS OF THE POSTERIOR SIMULATIONS ##
## ON THE WITHIN-DESIGN SUBSPACE ##
##############################################
# map from the mean vector to the within parameters: P = (X'X)^{-1} X'
X <- model.matrix(~1+timepoints)
P <- solve(t(X)%*%X)%*%t(X)
rownames(P) <- c("alpha","beta")
# applying P to the posterior draws of the mean vector gives posterior draws of the within parameters:
sims <- P%*%Beta.sims[1,,]
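# quick sanity check (my addition, not part of the analysis):
# P applied to the true mean vector recovers the true within parameters
P %*% Mean # should return (alpha, beta) = (0, 1)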
> # posterior summaries
> summary(data.frame(alpha=sims["alpha",], beta=sims["beta",]))
alpha beta
Min. :-1.5637 Min. :0.3387
1st Qu.:-0.1171 1st Qu.:0.8134
Median : 0.1472 Median :0.9038
Mean : 0.1493 Mean :0.9038
3rd Qu.: 0.4106 3rd Qu.:0.9953
Max. : 2.0783 Max. :1.5786
> # frequentist estimates
> P%*%colMeans(Y)
[,1]
alpha 0.1400198
beta 0.9066386
# note: maximum likelihood estimates from gls() differ slightly from the "frequentist estimates"
# above, since gls() weights the observations by the estimated covariance structure
library(nlme)
fit <- gls(response ~ timepoint, data=dat, method="ML",
correlation=corSymm(form= ~ timepoint | Subject),
weights=varIdent(form = ~1 | timepoint))
> coef(fit)
(Intercept) timepoint
0.1430734 0.9054123
> ## the ML estimate of Sigma is the mean variation around the fitted values:
> # calculate this variation
> alpha.f <- (P%*%colMeans(Y))["alpha",]
> beta.f <- (P%*%colMeans(Y))["beta",]
> fitted.frequentist <- alpha.f + beta.f*timepoints
> crossprod(Y-rep(1,n)%*%t(fitted.frequentist))
response.1 response.2 response.3
response.1 54.45126 13.18208 19.65210
response.2 13.18208 32.16598 14.20923
response.3 19.65210 14.20923 22.76616
> # compare:
> getVarCov(fit) * n
Marginal variance covariance matrix
[,1] [,2] [,3]
[1,] 54.451 13.182 19.652
[2,] 13.182 32.166 14.209
[3,] 19.652 14.209 22.766
Standard Deviations: 7.3791 5.6715 4.7714
Update
After these thoughts, I finally conjecture that the Jeffreys posterior is given by:
\begin{align*}
\Sigma \mid {\boldsymbol y} & \sim {\cal IW}_m(n, \Omega_n^{-1}) \\
{\boldsymbol \mu} \mid \Sigma, {\boldsymbol y}
& \sim {\cal N}_{n \times m}\left(X\hat{\boldsymbol \beta}, \Sigma \otimes J_n^{-1} \right)
\end{align*}
with
\begin{align*}
J_n & = {(1,1,\ldots,1)}'(1,1,\ldots,1) \\
\Omega_n & = ({\boldsymbol y} -{\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta}){({\boldsymbol y} - {\boldsymbol 1}_n \otimes X\hat{\boldsymbol \beta})}'
\end{align*}
And then the posterior distributions of the within parameters are obtained by applying the within-subspace projection $P$ to ${\boldsymbol \mu}$.
When there is no within design this is the Jeffreys posterior for the multinormal model.
Now I don't have the courage to check my conjecture.
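For anyone with more courage, here is a minimal sketch of how the conjecture could be checked against the bayesm simulations above. Two hedged assumptions: the ${\cal IW}_m(n, \Omega_n^{-1})$ notation is taken to correspond to MCMCpack::riwish(n, Omega.n), and $\Sigma \otimes J_n^{-1}$ is read as $\Sigma/n$ (i.e. $J_n = n$); both readings are guesses, not verified.

library(MCMCpack) # for riwish(); assumed: IW_m(n, Omega_n^{-1}) corresponds to riwish(n, Omega.n)
Omega.n <- crossprod(Y - rep(1,n)%*%t(fitted.frequentist)) # Omega_n, i.e. n times the ML estimate of Sigma
mu.hat <- fitted.frequentist # X beta_hat, the fitted values
sims.conj <- matrix(NA, n.sims, 2, dimnames=list(NULL, c("alpha","beta")))
for(sim in 1:n.sims){
  Sig <- riwish(n, Omega.n)                     # draw Sigma | y
  mu <- c(rmvnorm(1, mean=mu.hat, sigma=Sig/n)) # draw mu | Sigma, y (reading Sigma (x) Jn^{-1} as Sigma/n)
  sims.conj[sim,] <- P %*% mu                   # project onto the within subspace
}
summary(data.frame(alpha=sims.conj[,"alpha"], beta=sims.conj[,"beta"])) # compare with the summaries above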
To start with, I would advise you to fit different models for the different endpoints: Leaves, CorL, CorD, FilL, AntL, AntW, StaL, StiW, HeiP. Why would you need a multiple-endpoint model in the first place? Among the independent variables there is quite some multicollinearity, as you have rightly said; is this something of concern? As you have indicated, the VIF will give us that information. Use the following R code to compute variance inflation factors (VIF) before the regression; the selected variables can then be used for the regression. Choose an appropriate threshold for the VIF; I have used 5.
vif_func <- function(in_frame, thresh=10, trace=T){
  require(fmsb)
  if(class(in_frame) != "data.frame") in_frame <- data.frame(in_frame)
  # get initial VIF value for all comparisons of variables
  vif_init <- NULL
  for(val in names(in_frame)){
    form_in <- formula(paste(val, " ~ ."))
    vif_init <- rbind(vif_init, c(val, VIF(lm(form_in, data=in_frame))))
  }
  vif_max <- max(as.numeric(vif_init[,2]))
  if(vif_max < thresh){
    if(trace==T){ # print output of each iteration
      prmatrix(vif_init, collab=c("var","vif"), rowlab=rep("", nrow(vif_init)), quote=F)
      cat("\n")
      cat(paste("All variables have VIF < ", thresh, ", max VIF ", round(vif_max, 2), sep=""), "\n\n")
    }
    return(names(in_frame))
  } else {
    in_dat <- in_frame
    # backwards selection of explanatory variables; stops when all VIF values are below thresh
    while(vif_max >= thresh){
      vif_vals <- NULL
      for(val in names(in_dat)){
        form_in <- formula(paste(val, " ~ ."))
        vif_add <- VIF(lm(form_in, data=in_dat))
        vif_vals <- rbind(vif_vals, c(val, vif_add))
      }
      max_row <- which(vif_vals[,2] == max(as.numeric(vif_vals[,2])))[1]
      vif_max <- as.numeric(vif_vals[max_row, 2])
      if(vif_max < thresh) break
      if(trace==T){ # print the variable removed at each iteration
        prmatrix(vif_vals, collab=c("var","vif"), rowlab=rep("", nrow(vif_vals)), quote=F)
        cat("\n")
        cat("removed: ", vif_vals[max_row, 1], vif_max, "\n\n")
        flush.console()
      }
      in_dat <- in_dat[, !names(in_dat) %in% vif_vals[max_row, 1]]
    }
    return(names(in_dat))
  }
}
I called your data set dep; you can run it as follows:

keep.dat <- vif_func(in_frame=dep, thresh=5, trace=T)

thresh is the threshold for the VIF; you can use 10, it depends on what you want.
Here are the results: the variables that are good to use for regression, independently of which dependent variables you want to use.
var vif
Elev 2.34467892681204
pH 4.82456111736694
OM 9.60381685354609
P 6.09927871325235
K 6.55185481475336
SoilTemp 9.76265101226991
AirTemp 10.5786139945657
removed: AirTemp 10.57861
var vif
Elev 2.10463008453203
pH 3.12149022393123
OM 5.09603402410369
P 5.56333186256743
K 6.54812601407721
SoilTemp 1.11028184053362
removed: K 6.548126
This should be a good place to start.
Best Answer
You can try PLS regression, which is suited to datasets with multiple dependent variables.
Q1: For a multivariate linear model, combine it with the clusterwise approach.
Q2: PLS is built to work with many independent variables, even with more variables than observations.
There is an R package available.
We're working on another open-source implementation which will improve on it; it will be available soon.
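As an illustration only (the answer does not name the package, so this sketch assumes the pls package from CRAN, and borrows the variable names and the dep data frame from the VIF answer above):

library(pls)
# multi-response PLS: several dependent variables on the left-hand side
fit.pls <- plsr(cbind(CorL, CorD, FilL, AntL) ~ Elev + pH + OM + P + SoilTemp,
                data=dep, validation="CV") # add the other endpoints similarly
summary(fit.pls) # explained variance per number of components
RMSEP(fit.pls)   # cross-validated prediction error for each response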