MATLAB: Alternative to circshift function


In reference to this question.
The documentation states that circshift was modified in R2016b. I was wondering whether there are any newer, faster alternatives, since I end up calling this function almost 3.85 million times. Bear in mind that I did not write this code originally, so I'm still trying to understand how the function is used within the loop. I know I could try to reduce the number of iterations, but I haven't figured that part out yet, since there are many math operations within this nested loop. My shift is not constant, so the answer to the referenced question does not solve this problem.
Note: Sorry for the lack of details. I tried to keep it simple; the real variable and function names would add unnecessary complexity for those not familiar with the code. Basically, if anyone can provide an alternative to circshift, I would appreciate it!
a_matrix = zeros(num_p_considered,len); %pre-allocate
for m = 1:6
    kk = a_vector(m); %length is 6
    [complex_vec(m), a_vector2(m)] = a_function(var); % 6x1 vector outputs
    a_scalar = ceil(a_vector2(m));
    a_scalar2 = floor((a_scalar - a_vector2(m)) * 1000);
    if a_scalar2 == 0
        a_scalar2 = 1000;
    end
    a_vector3 = circshift(a__predefined_matrix(:,a_scalar2), a_constant + a_scalar); % 81880x1 vector output
    % some other math operations that use above stuff
end
Edit
I was able to write a custom replacement in a very simple form. I have a vector of length 81880 that I want to circularly shift and then truncate to a length of 8000. Here is the code snippet:
tempN = k2 + tmp_tau_m_l_integer2;
tmp_p_t3 = [single_vec(81880-tempN+1:81880); single_vec(1:8000-tempN) ];
% tmp_p_t3 is [8000x1]
tmp_p_t_m2 = circshift( single_vec, k2 + integer2);
tmp_p_t2(kth_element2,:) = abs( tmp_p_t_m2(1:len))';
% tmp_p_t2 is [8000x1]
for n = 1:8000
    OK_if_true(n) = max(max(max(max(abs((tmp_p_t3(n) - tmp_p_t2(kth_element,n))^2))))) < 1e-3;
    a = find(OK_if_true); % identify the ones exact
end
where kth_element is the inner-loop index, k2 is the outer-loop index, integer2 is the variable that changes every iteration, len = 8000, and single_vec is a constant set of numbers. Essentially, single_vec is a list of "guesses" in double precision, and we circularly shift it to guess in different ways.
Lastly, I use the last chunk of the code to compare the two results for exactness. Unfortunately, the check passes (the difference is either minute or exactly zero) for every element except 3437-3519. So how could my indexing operation be incorrect for only 87 elements of the concatenated vector?
To add some more detail: if I change the tolerance (the 'eps' value) to 1e-7, I see 56 matching elements. Similarly structured code using only integers suggests that this should work, so I'm not sure what is going on here.
Here is a simple example of the same structure by itself:
a = (2.7 : .22245786443 : 7.68).'; % length: 23
n = 5;
tic
b = circshift(a,n);
b = b(1:20); % truncate to length of 20
toc
tic
c = [ a(23-n+1 : 23) ; a(1 : 20-n) ]; % notice the structure here with the one above
toc
OK_if_true = max(max(max(max(abs(b - c))))) < eps

Best Answer

According to the documentation, all you have to change for full backward compatibility is:
a_vector3 = circshift(a__predefined_matrix(:,a_scalar2), a_constant + a_scalar, 1);
But as long as the input is a vector, omitting the dimension will produce the same result.
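A small check of that statement, as a sketch that assumes R2016b or later; the values are made up for illustration. It also shows the case where the dimension argument does matter, namely a row vector:

```matlab
% Column vector: the dimension argument makes no difference.
v = (1:5).';
sameForColumn = isequal(circshift(v, 2), circshift(v, 2, 1));   % true

% Row vector: the two calls differ on R2016b and later, because the
% default now shifts along the first dimension whose size is not 1.
r = 1:5;
shiftedDefault = circshift(r, 1);      % [5 1 2 3 4]
shiftedDim1    = circshift(r, 1, 1);   % [1 2 3 4 5], dimension 1 has size 1
```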
Before you try to optimize circshift (which can be done by inlining the code and removing the input checks), use the profiler to find the bottlenecks of the code. If the main work is done inside a_function and circshift needs only 2% of the runtime, increasing its speed by a factor of 2 yields a total speedup of only 1%. Do not waste programming time on optimizing marginal functions.
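A minimal sketch of such a profiling run. The loop here is only a hypothetical stand-in for the real nested loop, and the vector contents and iteration count are assumptions chosen for illustration:

```matlab
% Stand-in workload (hypothetical): repeated circshift calls on a long vector.
data = rand(81880, 1);

profile on
for k = 1:200
    shifted = circshift(data, k);
end
profile off

% Print the functions with the largest total time, largest first.
info = profile('info');
[~, order] = sort([info.FunctionTable.TotalTime], 'descend');
for i = order(1:min(5, numel(order)))
    fprintf('%-40s %8.4f s\n', info.FunctionTable(i).FunctionName, ...
            info.FunctionTable(i).TotalTime);
end
% profile viewer   % alternatively, open the interactive report
```

If circshift turns out to be a small fraction of the total, the time is better spent on whatever tops this list.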
The inlined circshift:
nShift = a_constant + a_scalar;
len = size(a__predefined_matrix, 1);
index = mod((0:len-1)-nShift, len) + 1;
a_vector3 = a__predefined_matrix(index, a_scalar2);
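For the shift-then-truncate case in the question's edit, the same indexing idea can be restricted to the elements that are actually kept. The following is a sketch under assumptions: single_vec is filled with random stand-in data, and nShift and outLen are illustrative names and values, not taken from the original code.

```matlab
% Illustrative stand-ins for the question's data (values are made up).
single_vec = randn(81880, 1);
nShift = 1234;          % any integer shift, positive or negative
outLen = 8000;          % number of leading elements to keep

fullLen = numel(single_vec);
% Source positions for output elements 1..outLen only.
index = mod((0:outLen-1) - nShift, fullLen) + 1;
truncated = single_vec(index);

% Reference: full circshift followed by truncation.
reference = circshift(single_vec, nShift);
reference = reference(1:outLen);
matches = isequal(truncated, reference);   % true
```

Because of the mod, this works for any integer shift, whereas the two-piece concatenation in the question only gives the right length when the shift lies between 0 and 8000. When comparing against the circshift path shown in the question, it may also be worth applying the same post-processing to both sides: that snippet takes abs() of the circshift result but not of the concatenated one, which would produce mismatches wherever single_vec is negative.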