To follow up on @cardinal's comment: your $x$, $y$, and $z$ define a $(3 \times 3)$ correlation matrix $R$. Since a correlation matrix also is a possible covariance matrix (of standardized variables), it has to be positive definite. This is the case if all eigenvalues are $> 0$. If $R$ is indeed positive definite, then all vectors $\boldsymbol{s}$ of variances (i.e., numbers $> 0$) will turn $\boldsymbol{R}$ into a positive definite covariance matrix $\boldsymbol{\Sigma} = \boldsymbol{D}_{s}^{1/2} \boldsymbol{R} \boldsymbol{D}_{s}^{1/2}$, where $\boldsymbol{D}_{s}^{1/2}$ is the square root of the diagonal matrix made from $\boldsymbol{s}$.
So just construct $R$ from $x, y, z$, and check if the eigenvalues are all $> 0$. If so, you're good, and you can transform any set of data to have a corresponding covariance matrix with arbitrary variances:
x <- 0.5
y <- 0.3 # changing this to -0.6 makes it not pos.def.
z <- 0.4
R <- matrix(numeric(3*3), nrow=3) # will be the correlation matrix
diag(R) <- 1 # set diagonal to 1
R[upper.tri(R)] <- c(x, y, z) # fill in x, y, z to upper right
R[lower.tri(R)] <- c(x, y, z) # fill in x, y, z to lower left
eigen(R)$values # get eigenvalues to check if pos.def.
gives
[1] 1.8055810 0.7124457 0.4819732
So our $\boldsymbol{R}$ here is positive definite. Now construct the corresponding covariance matrix from arbitrary variances.
vars <- c(4, 16, 9) # the variances
Sigma <- diag(sqrt(vars)) %*% R %*% diag(sqrt(vars))
Generate some data matrix $\boldsymbol{X}$ that we will transform to later have exactly that covariance matrix.
library(mvtnorm) # for rmvnorm()
N <- 100 # number of simulated observations
mu <- c(1, 2, 3) # some arbitrary centroid
X <- round(rmvnorm(n=N, mean=mu, sigma=Sigma))
To do that, we first orthonormalize matrix $\boldsymbol{X}$, giving matrix $\boldsymbol{Y}$ with covariance matrix $\boldsymbol{I}$ (identity).
orthGS <- function(X) { # implement Gram-Schmidt algorithm
Id <- diag(nrow(X))
for(i in 2:ncol(X)) {
A <- X[ , 1:(i-1), drop=FALSE]
Q <- qr.Q(qr(A))
P <- tcrossprod(Q)
X[ , i] <- (Id-P) %*% X[ , i]
}
scale(X, center=FALSE, scale=sqrt(colSums(X^2)))
}
Xctr <- scale(X, center=TRUE, scale=FALSE) # centered version of X
Y <- orthGS(Xctr) # Y is orthonormal
Transform matrix $\boldsymbol{Y}$ to have covariance matrix $\boldsymbol{\Sigma}$ and centroid $\boldsymbol{\mu}$.
Edit: what's going on here: Do a spectral decomposition $\boldsymbol{\Sigma} = \boldsymbol{G} \boldsymbol{D} \boldsymbol{G}^{t}$, where $\boldsymbol{G}$ is the matrix of normalized eigenvectors of $\boldsymbol{\Sigma}$, and $\boldsymbol{D}$ is the corresponding matrix of eigenvalues. Now matrix $\boldsymbol{G} \boldsymbol{D}^{1/2} \boldsymbol{Y}$ has covariance matrix $\boldsymbol{G} \boldsymbol{D}^{1/2} Cov(\boldsymbol{Y}) \boldsymbol{D}^{1/2} \boldsymbol{G}^{t} = \boldsymbol{G} \boldsymbol{D} \boldsymbol{G}^{t} = \boldsymbol{\Sigma}$, as $Cov(\boldsymbol{Y}) = \boldsymbol{I}$.
eig <- eigen(Sigma)
A <- eig$vectors %*% sqrt(diag(eig$values))
XX1ctr <- t(A %*% t(Y)) * sqrt(nrow(Y))
XX1 <- sweep(XX1ctr, 2, mu, "+") # move centroid to mu
Check that the correlation matrix is really $\boldsymbol{R}$.
> all.equal(cor(XX1), R)
[1] TRUE
For other purposes, the question might now be: How do I find a positive definite matrix that is "very similar" to a pre-specified one that is not positive definite. That I don't know.
Edit: corrected some square roots
Let $R$ be the correlation matrix and $S$ the vector of standard deviations, so that $S\cdot S$ (where $\cdot$ is the componentwise product) is the vector of variances. Then
$$ \text{diag}(S) R \text{diag}(S) $$ is the covariance matrix. This is fully explained here.
This can be implemented in R
as
cor2cov_1 <- function(R,S){
diag(S) %*% R %*% diag(S)
}
but is inefficient. An efficient implementation is
cor2cov <- function(R, S) {
sweep(sweep(R, 1, S, "*"), 2, S, "*")
}
and you can test yourself they give the same result.
TRUTH= 0.8
R <- as.matrix(data.frame(c(1, TRUTH), c(TRUTH, 1)))
S = c(sqrt(1), sqrt(1))
cor2cov_1(R,S)
outer(S,S) * R
smat = as.matrix(S)
R * smat %*% t(smat)
Here is a microbenchmark showing the efficiency of the functions:
library(microbenchmark)
microbenchmark::microbenchmark(outer(S,S) * R ,cor2cov_1(R,S), cor2cov(R,S), R * smat %*% t(smat), times = 10000)
Unit: microseconds
expr min lq mean median uq max neval cld
outer(S, S) * R 1.968 2.214 2.724639 2.337 2.460 3611.362 10000 a
cor2cov_1(R, S) 1.722 1.886 2.778045 1.968 2.091 3743.259 10000 a
cor2cov(R, S) 113.037 116.071 125.844711 118.039 120.663 5462.020 10000 b
R * smat %*% t(smat) 1.066 1.230 1.422712 1.435 1.517 12.177 10000 a
Best Answer
Have you tried using the morphometric approaches of Strauss & Bookstein (1982)? It seems like this may give you a relatively straightforward way to compare your populations. Here's a really brief summary, but there's much more in the paper and other morphometric publications.
Strauss, R. E. and F. L. Bookstein. 1982. The truss: Body form reconstructions in morphometrics. Systematic Biology 31:113–135.