I have a matrix with 330,000 observations (rows) by 160 variables (columns). I'd like to compute the average pairwise distance between the observations in my matrix, but pdist() fails here, because apparently it would take 405.6 GB to store the distance matrix…
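For reference, a rough check of that figure. This assumes pdist() stores one double-precision value (8 bytes) per unordered pair of rows and that "GB" in the error message means 2^30 bytes; both are assumptions on my part, not something the error message states:

$$
\frac{n(n-1)}{2} = \frac{330{,}000 \times 329{,}999}{2} = 54{,}449{,}835{,}000 \ \text{pairs}
$$

$$
54{,}449{,}835{,}000 \times 8 \ \text{bytes} \approx 4.356 \times 10^{11} \ \text{bytes} \approx \frac{4.356 \times 10^{11}}{2^{30}} \approx 405.7 \ \text{GB}
$$

That agrees with the reported 405.6 GB up to rounding, so the storage cost comes from the number of pairs growing as n^2 rather than from the 160 columns.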
Is there a reasonable vectorized (or fast) way to do something like this?
Any thoughts appreciated.
Thanks.
Best Answer