I need to reshape A to dimension 6x200000.
I assume you mean you need to reshape the 1x6x200000 submatrix to 6x200000.
Reshaping should always be near instantaneous. All it involves is changing the header of the matrix to store the new size of each dimension. The actual content of the matrix stays untouched and does not need shuffling around.
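As a small illustration of that point, the sketch below reshapes a hypothetical, scaled-down stand-in for the 1x6x200000 slice (the variable names and the size 1x6x20 are assumptions made for the example) and checks that the elements keep the same column-major order:

```matlab
% Hypothetical small stand-in for the 1x6x200000 slice
slice = rand(1, 6, 20);

% Drop the leading singleton dimension; only the size metadata changes
flat = reshape(slice, 6, 20);

% The linear (column-major) order of the elements is identical
sameOrder = isequal(slice(:), flat(:));
disp(size(flat));
```

Because the element order is unchanged, nothing has to be moved around in memory to produce `flat`.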
What would take time, however, is the slicing: extracting the 1x6x200000 matrix out of the full 300xXxY matrix involves going over the whole matrix and picking out every 300th value. You may be able to get a small performance gain by swapping the order of the dimensions:
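newbigmatrix = permute(bigmatrix, [2 3 1]);  % XxYx300: each page is now contiguous in memory
for page = size(newbigmatrix, 3):-1:1
    submatrix = newbigmatrix(:, :, page);    % already XxY, so no reshape is needed
    % ... process submatrix here ...
end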
The advantage of that new order is that the elements of each submatrix are already contiguous, so the slicing should be faster. It also cuts out the reshape but, as said, that shouldn't be what took the time in the first place.
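To check that the permuted layout yields the same submatrices as the slice-and-reshape approach, here is a minimal sketch on a hypothetical 4x6x10 array (the sizes and the index k are assumptions chosen to keep the example small):

```matlab
% Hypothetical small stand-in for the 300xXxY matrix
bigmatrix = rand(4, 6, 10);
newbigmatrix = permute(bigmatrix, [2 3 1]);   % now 6x10x4

k = 3;  % arbitrary index along the original first dimension

% Original approach: strided slice, then reshape to 6x10
viaSlice = reshape(bigmatrix(k, :, :), 6, 10);

% Permuted approach: one contiguous page, no reshape
viaPage = newbigmatrix(:, :, k);

samePage = isequal(viaSlice, viaPage);
```

Both routes give the same 6x10 submatrix; only the memory access pattern differs.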
Best Answer
The intuition that you should permute the dimension you are transforming to be the last dimension is actually the opposite of what you would do if permutation could improve performance. Permuting that dimension to the front means the FFT runs down contiguous chunks of memory, which is always going to be better than computing the FFT across strided elements (i.e., along any dimension other than the first).
In practice, though, permuting an array makes a copy of it, and the cost of that copy is likely to be non-trivial. For example, consider an input:
Transforming down the first dimension of this input is certainly faster than transforming down the third dimension:
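A minimal sketch of such a comparison, assuming a hypothetical 64x64x64 input (the size and variable names are illustrative and not taken from the original example; the timings depend on the machine, so none are quoted here):

```matlab
% Hypothetical input; the size is an assumption for illustration
A = rand(64, 64, 64);

% FFT down the first dimension (contiguous memory)
tic;
F1 = fft(A, [], 1);
tFirst = toc;

% FFT along the third dimension (strided memory)
tic;
F3 = fft(A, [], 3);
tThird = toc;

fprintf('dim 1: %.6f s, dim 3: %.6f s\n', tFirst, tThird);
```

A single tic/toc measurement is noisy, so repeating the runs or using a larger array gives a more reliable picture.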
However, that does not make permuting it a good idea. For example, consider the following:
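A minimal sketch of the round trip, assuming a hypothetical 32x32x32 array (the size and names are illustrative, not from the original example). It computes the FFT along the third dimension directly, then again by permuting that dimension to the front, transforming, and permuting back:

```matlab
% Hypothetical input; the size is an assumption for illustration
A = rand(32, 32, 32);

% Direct: FFT along the third dimension
tic;
direct = fft(A, [], 3);
tDirect = toc;

% Round trip: permute (copy), FFT down dim 1, permute back (copy)
tic;
P = permute(A, [3 1 2]);              % bring dim 3 to the front
P = fft(P, [], 1);
viaPermute = ipermute(P, [3 1 2]);    % restore the original dimension order
tPermute = toc;

% Both routes give the same result up to round-off
maxDiff = max(abs(direct(:) - viaPermute(:)));
fprintf('direct: %.6f s, with permutes: %.6f s\n', tDirect, tPermute);
```

The second timing includes both copies made by `permute` and `ipermute`, which is the overhead being discussed.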
The cost of the two permutations needed here is quite large relative to the amount of time spent in FFT.
If you have control over how you store the data from the start, then we recommend storing it so that you can operate on the first dimension. This is in general a good idea for any operation -- whenever you can operate along the first dimension of a matrix/N-D array, you will be accessing contiguous memory and things will likely go faster. However, permuting just to perform the FFT down the first dimension will most likely not be a good idea -- it will either make no tangible difference or make the code slower as a whole.