MATLAB: How to speed up Bag of Visual Words Calculation

bag of words, feature vectors, MATLAB, sift, speed, Statistics and Machine Learning Toolbox

I am building an image retrieval system that takes an input image and returns the most similar images from the dataset. I have created a bag of visual words (BoVW) with 2000 words from my dataset, using the following process:
1. Extracted SIFT features from the images in the dataset and performed k-means clustering with k = 2000. Then I stored the centroids in a mat file.
2. For an input image, I am creating its bovw as follows:
indices_f=[];
for i=1:size(features,1) % this loop is taking too much time.
    for j=1:size(centroids,1)
        d(j)=pdist2(features(i,:),centroids(j,:)); % Compute distance from cluster centroids
    end
    [~,indices_f(i)]=min(d); %Compute Minimum Distance
end
sum=1;
feature_hist_local=zeros(1,2000);
temp=indices_f;
%Compute histogram
for m=1:length(temp)
    feature_hist_local(temp(m))=feature_hist_local(temp(m))+1;
end
In my specific use case, the features matrix contains 3127 keypoints of dimension 128 each. The code converts it into a BoVW representation of size 1×2000, but it takes approximately 40 minutes to do so on a machine with 32 GB RAM and 6 cores.
Is there a better way to do this? Also, are there alternative approaches that improve upon bag of visual words, since I plan to work with large datasets now?

Best Answer

Looks like I glanced over the code too quickly and missed the key point, perhaps...
for i=1:size(features,1)
    for j=1:size(centroids,1)
        d(j)=pdist2(features(i,:),centroids(j,:));
        ...
In the above you're computing the full array of pairwise distances one element at a time, but d is overwritten on every pass through the outer loop, so only the distances for the last value of i are kept; all the rest are thrown away. I presume this isn't really what you intend? If you're really looking for the minimum across all combinations, then you get the same result at far better performance by computing the whole distance matrix in a single call:
d=pdist2(features,centroids);
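and since you're looking for the minimum later on, the previous comment regarding the optional parameters is useful. With 'Smallest', pdist2 returns, for each row of its second argument, the closest rows of its first argument, so the centroids go first in order to get one centroid index per feature:
[d,idxMin]=pdist2(centroids,features,'euclidean','Smallest',1);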
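Putting the pieces together, the whole BoVW computation from the question can be sketched without any explicit loops. This is only a sketch under stated assumptions: the random matrices are stand-ins for the real SIFT descriptors and the stored centroids, sized as described in the question, and accumarray is my substitution for the histogram loop rather than something proposed in the thread.

```matlab
% Synthetic stand-ins for the real data (sizes taken from the question).
rng(0);                                % reproducible random data
features  = rand(3127, 128);           % one SIFT descriptor per row
centroids = rand(2000, 128);           % one visual word per row

% With 'Smallest', pdist2 works per row of its SECOND argument and returns
% indices into its FIRST argument, so centroids go first. idxMin is
% 1-by-3127 and holds the index of the nearest centroid for each feature.
[~, idxMin] = pdist2(centroids, features, 'euclidean', 'Smallest', 1);

% Count how often each visual word was selected. accumarray sums the
% value 1 into the bin given by each index; the result is 1-by-2000.
numWords = size(centroids, 1);
feature_hist_local = accumarray(idxMin(:), 1, [numWords 1]).';
```

The histogram has one bin per centroid and its entries add up to the number of features, which gives a quick sanity check against the output of the original loops on the same inputs.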