Solved – ZCA-Whitening on an ImageNet-sized dataset

data-transformation, dataset

I would like to test ZCA-whitening on the ImageNet2015 dataset.

The ImageNet2015 dataset consists of over 8 million images and cannot fit in memory.

What are the best-practice methods for ZCA-whitening a large dataset that cannot fit in memory?

Best Answer

I believe many object-detection pipelines resize the ImageNet images to something like ~200x200, probably a bit bigger; AlexNet used 224x224, for example. So there are 224*224*3 = 150,528 features, call it 150K. Your feature covariance matrix will be (150K, 150K)-shaped, i.e. about $2.3 \times 10^{10}$ entries, roughly 90 GB even in float32.

First you have to compute this feature covariance matrix $C$; it's what the ZCA transform is built from, since $W = C^{-1/2} = U \Lambda^{-1/2} U^T$ where $C = U \Lambda U^T$ is its eigendecomposition. Computing $C$ involves a computation of $X^T X$, where $X$ has shape $(n, d)$ with $n \approx $ 8 million and $d \approx $ 150K. You can conceivably perform this matrix computation through mass parallelization, since it decomposes into independent inner products (equivalently, a sum of per-row outer products).
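To make the parallelization concrete, here is a rough NumPy sketch (mine, not a standard library routine) of accumulating $X^T X$ over row-chunks streamed from disk, so the full $X$ never has to sit in memory. The chunk iterator is a hypothetical stand-in for whatever loader you use, and note the $(d, d)$ result itself must still fit somewhere, so this only helps once $d$ is modest:

```python
import numpy as np

def chunked_covariance(chunks, d):
    """Accumulate X^T X over an iterable of (n_i, d) row-blocks.

    `chunks` yields blocks of the data matrix, e.g. batches of flattened
    images streamed from disk (hypothetical loader). Each block's Gram
    contribution is independent, so blocks can be farmed out to workers
    and the partial sums added at the end.
    """
    gram = np.zeros((d, d), dtype=np.float64)
    col_sum = np.zeros(d, dtype=np.float64)
    n = 0
    for block in chunks:
        gram += block.T @ block        # this block's inner products
        col_sum += block.sum(axis=0)
        n += block.shape[0]
    mu = col_sum / n
    # one-pass covariance: E[xx^T] - mu mu^T (switch to a two-pass
    # scheme if numerical precision becomes an issue)
    return gram / n - np.outer(mu, mu)

# toy usage: 10 chunks of 1000 rows, d = 192 (e.g. 8x8x3 patches)
cov = chunked_covariance((np.random.randn(1000, 192) for _ in range(10)), 192)
```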

A priori, this feature covariance matrix is dense. If it were sparse, you could cook up an algorithm that scales linearly in the number of non-zeros, or something to that effect. For example, Facebook uses a fast randomized SVD for its friend adjacency matrix, but that matrix is clearly going to be sparse, since most people aren't friends with the other ~1 billion Facebook users.
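That style of algorithm (Halko et al. randomized SVD) is available off the shelf in scikit-learn, for what it's worth. A toy sketch on a synthetic sparse matrix, just to show the same idea, not Facebook's implementation:

```python
import scipy.sparse as sp
from sklearn.utils.extmath import randomized_svd

# randomized SVD only needs fast matrix-vector products, so a sparse
# input keeps each iteration cheap
A = sp.random(100_000, 50_000, density=1e-4, format="csr", random_state=0)

# top-100 singular triplets via random projections plus power iterations
U, s, Vt = randomized_svd(A, n_components=100, n_iter=4, random_state=0)
```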

Maybe you could look into the Spark platform? I vaguely remember them having some benchmarks on SVD for the Netflix dataset: Distributed SVD using Spark. I haven't looked too closely into this. Note that their Netflix example is still an order of magnitude (possibly several) smaller than what you want to do, and it also seems to leverage sparsity: in one of their examples only 0.04% of the entries are non-zero. I'm not too familiar with their actual implementation of the distributed SVD, but it appears to exploit sparsity to fight the dimensionality.
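If you do go the Spark route, MLlib's RowMatrix exposes a distributed computeSVD. A minimal PySpark sketch, assuming you have a running cluster and (hypothetically) the flattened image features sitting on HDFS as comma-separated lines:

```python
from pyspark.sql import SparkSession
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix

spark = SparkSession.builder.appName("zca-svd").getOrCreate()

# one d-dimensional feature vector per line (hypothetical file/layout)
rows = (spark.sparkContext.textFile("hdfs:///imagenet/features.csv")
        .map(lambda line: Vectors.dense([float(x) for x in line.split(",")])))

mat = RowMatrix(rows)
svd = mat.computeSVD(500, computeU=False)  # top-500 singular values/vectors
s, V = svd.s, svd.V                        # V has shape (d, k)
```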

For a dense, large-scale method, I'm not sure. Perhaps you can look into sparse approximation methods for large covariance matrices? Here's a link: Sparse estimation of a covariance matrix by Bien and Tibshirani. You could try finding a sparse approximation to your dense feature covariance matrix, then apply a method that leverages that sparsity to compute a distributed SVD/eigendecomposition.
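As a far cruder baseline than the Bien and Tibshirani penalized estimator, you could simply hard-threshold small off-diagonal entries (in the spirit of Bickel–Levina thresholding) and store the result sparsely; a sketch:

```python
import numpy as np
import scipy.sparse as sp

def threshold_covariance(cov, tau):
    """Zero out off-diagonal entries with |C_ij| < tau, keep the diagonal.

    A crude stand-in for a proper sparse covariance estimator: the
    result is a SciPy CSR matrix that sparse eigensolvers can consume.
    """
    mask = np.abs(cov) >= tau
    np.fill_diagonal(mask, True)  # variances always stay
    return sp.csr_matrix(np.where(mask, cov, 0.0))

# toy usage on a small dense covariance
C = np.cov(np.random.randn(500, 64), rowvar=False)
C_sparse = threshold_covariance(C, tau=0.1)
```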

EDIT: You could also try performing ZCA whitening on smaller blocks, say 8x8 or 16x16 patches, since natural images have nice local statistics and far-apart pixels are usually only weakly correlated.
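At 8x8x3 the covariance is only 192x192, so the whole pipeline fits comfortably in memory. A minimal NumPy sketch, with random patches standing in for ones you'd actually extract from the images:

```python
import numpy as np

def zca_matrix(X, eps=1e-5):
    """ZCA whitening matrix W = C^{-1/2} from data X of shape (n, d)."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / X.shape[0]
    lam, U = np.linalg.eigh(C)  # C = U diag(lam) U^T
    # eps regularizes the near-zero eigenvalues before inverting
    return U @ np.diag(1.0 / np.sqrt(lam + eps)) @ U.T

# fit on a sample of 8x8x3 = 192-dimensional patches, then whiten
patches = np.random.rand(100_000, 8 * 8 * 3)
W = zca_matrix(patches)
whitened = (patches - patches.mean(axis=0)) @ W  # W is symmetric
```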

Hope this helps, good luck.