MATLAB: Faster code in a loop


Hi,
I have a sequence of dates and firms. For each firm and quarter I want to compute the median return on equity (and return on assets) of its peers, i.e. the other firms in the same state in the same year and quarter, excluding the firm itself. I wrote the following, but it takes ages because the loop runs over every row of the dataset. Is there a way to make it faster?
% ROEpeer is a vector of NaNs, e.g. ROEpeer = NaN(size(ROE));
% ROApeer is a vector of NaNs, e.g. ROApeer = NaN(size(ROA));
% ROE is the return on equity of the firm
% ROA is the return on assets of the firm
% State is the state in which the firm operates
% ID is the ID of the firm
% Year is the year of the financial statements
% Quarter is the quarter of the financial statements
for i = 1:length(ROEpeer)
    % peers: same year, quarter and state, a different firm, non-missing value
    x0a = find(Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i) & ~isnan(ROE));
    x0b = find(Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i) & ~isnan(ROA));
    if length(x0a) > 2            % require at least 3 peers
        ROEpeer(i) = median(ROE(x0a));
    end
    if length(x0b) > 2
        ROApeer(i) = median(ROA(x0b));
    end
end

Best Answer

Are you sure that ROEpeer and ROApeer are pre-allocated properly?
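ROEpeer = NaN(size(ROE));      % pre-allocate the outputs once
ROApeer = NaN(size(ROA));
isnumROE = ~isnan(ROE);        % computed once, outside the loop
isnumROA = ~isnan(ROA);
for i = 1:length(ROEpeer)
    % same year, quarter and state, other firm; shared by ROE and ROA
    tmp = (Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i));
    x0a = (tmp & isnumROE);    % logical masks, no find()
    x0b = (tmp & isnumROA);
    if sum(x0a) > 2
        ROEpeer(i) = median(ROE(x0a));
    end
    if sum(x0b) > 2
        ROApeer(i) = median(ROA(x0b));
    end
end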
Try this first. It avoids computing ~isnan(ROE) in every iteration, and it evaluates the comparison of Year, Quarter, State and ID only once per iteration instead of twice. Omitting find() allows faster logical indexing. Perhaps this runs in half the time. But what does the profiler tell you about the bottleneck of the code? If it is median, improving the loop will not help much.
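Whether find() or median dominates can be checked in isolation. The following is a hypothetical micro-benchmark (the vector size and the mask are assumptions, not the asker's data) that uses timeit to compare logical indexing, find() followed by indexing, and the median itself. The numbers depend on the data and the MATLAB release, so no particular outcome is implied.

```matlab
% Hypothetical micro-benchmark: indexing cost vs. the cost of median
x = randn(1e6, 1);                        % assumed data size
mask = x > 0;                             % logical mask, roughly half true
tLogical = timeit(@() x(mask));           % logical indexing
tFind    = timeit(@() x(find(mask)));     %#ok<FNDSB> deliberate find()
tMedian  = timeit(@() median(x(mask)));   % indexing plus median
fprintf('logical: %.4f s, find: %.4f s, median: %.4f s\n', ...
        tLogical, tFind, tMedian);
```

If tMedian is much larger than the two indexing times, tuning the indexing alone will not change the total run time much.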
For optimizing the run time, test data are useful. Otherwise it is just guessing, plus avoiding repeated calculations of the same results.
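Test data along the following lines make such comparisons reproducible. This is a sketch: the panel layout (every firm reported in every quarter, each firm in a single state), the sizes and the distributions are assumptions, not the asker's data. It builds synthetic columns with the question's variable names and times the answer's loop with tic/toc; running the same script after `profile on` and then calling `profile viewer` shows which line dominates.

```matlab
% Hypothetical synthetic panel: nFirms firms over nQ consecutive quarters
rng(1);                                    % reproducible random numbers
nFirms = 300; nQ = 8; nStates = 10;        % assumed sizes
[firm, q] = ndgrid(1:nFirms, 0:nQ-1);      % one row per firm and quarter
ID      = firm(:);
Year    = 2015 + floor(q(:) / 4);
Quarter = mod(q(:), 4) + 1;
firmState = randi(nStates, nFirms, 1);     % each firm stays in one state
State   = firmState(ID);
ROE = 0.10 + 0.05 * randn(numel(ID), 1);
ROA = 0.02 + 0.01 * randn(numel(ID), 1);
ROE(rand(size(ROE)) < 0.05) = NaN;         % about 5% missing values
ROA(rand(size(ROA)) < 0.05) = NaN;

% Time the answer's loop on this data
ROEpeer = NaN(size(ROE));
ROApeer = NaN(size(ROA));
isnumROE = ~isnan(ROE);
isnumROA = ~isnan(ROA);
tic
for i = 1:length(ROEpeer)
    tmp = (Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i));
    x0a = (tmp & isnumROE);
    x0b = (tmp & isnumROA);
    if sum(x0a) > 2
        ROEpeer(i) = median(ROE(x0a));
    end
    if sum(x0b) > 2
        ROApeer(i) = median(ROA(x0b));
    end
end
fprintf('%d rows: %.2f s\n', numel(ROE), toc);
```

Increasing nFirms or nQ shows how the run time grows with the number of rows, since every iteration compares against the whole dataset.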
Another idea: if x0a has more than one element, the median is calculated multiple times for the different i. Does this work?
isnumROE = ~isnan(ROE);
isnumROA = ~isnan(ROA);
for i = 1:length(ROEpeer)
    tmp = (Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i));
    x0a = (tmp & isnumROE);
    x0b = (tmp & isnumROA);
    % skip row i if an earlier iteration has already filled it
    if sum(x0a) > 2 && isnan(ROEpeer(i))
        ROEpeer(x0a) = median(ROE(x0a));   % assign to all peers at once
    end
    if sum(x0b) > 2 && isnan(ROApeer(i))
        ROApeer(x0b) = median(ROA(x0b));
    end
end
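Now the median is assigned to all elements of ROEpeer(x0a) and ROApeer(x0b) at once, and rows that already hold a value are skipped instead of being recomputed. Caution: this is not equivalent to the first version. Because ID~=ID(i) excludes firm i, x0a never contains i, so ROEpeer(i) is not filled by its own iteration and may remain NaN, while the peers in x0a receive a median that includes their own value and omits firm i. Compare the results with the first version on test data before relying on it.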
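The peer set differs for every firm in a group, because each firm excludes only itself, but a single sort per group is still enough to get every firm's peer median. The sketch below is a hypothetical helper (leaveOneOutMedian and kthPeer are illustrative names, not MATLAB functions) that assumes each firm appears at most once per Year/Quarter/State group and applies the same "more than 2 peers" rule. It sorts the non-NaN values once and, for each row, reads the middle element(s) of the sorted list with that row's own value skipped.

```matlab
function m = leaveOneOutMedian(v)
% For one Year/Quarter/State group: m(j) is the median of the non-NaN
% values of v other than v(j), or NaN if fewer than 3 such peers exist.
% Assumes one row per firm in the group (hypothetical helper).
v = v(:);
m = NaN(size(v));
valid = find(~isnan(v));
[s, order] = sort(v(valid));          % one sort per group
n = numel(s);
pos = zeros(size(v));                 % 0 marks a NaN (not in s)
pos(valid(order)) = 1:n;              % sorted position of each valid entry
for j = 1:numel(v)
    p = pos(j);
    k = n - (p > 0);                  % number of peers: drop own value if present
    if k > 2
        lo = floor((k + 1) / 2);      % middle position(s) among the k peers
        hi = ceil((k + 1) / 2);
        m(j) = (kthPeer(s, p, lo) + kthPeer(s, p, hi)) / 2;
    end
end
end

function x = kthPeer(s, p, q)
% q-th smallest value of sorted s after removing position p (p = 0: none)
if p == 0 || q < p
    x = s(q);
else
    x = s(q + 1);
end
end
```

For example, leaveOneOutMedian([1;2;3;4;5]) gives [3.5;3.5;3;2.5;2.5]: the first firm's peers are 2, 3, 4, 5, the third firm's peers are 1, 2, 4, 5, and so on. Within a loop over groups it would be called as ROEpeer(rows) = leaveOneOutMedian(ROE(rows)), where rows holds the row indices of one group.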
How are Year, Quarter, State and ID defined? Do neighboring elements usually have the same value, or are the data mixed? Are the values sorted?
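Even if the rows are mixed and unsorted, grouping them first avoids comparing every row with the whole dataset. The following is a sketch under stated assumptions: Year, Quarter, State and ID are numeric vectors of equal length, and peerMediansByGroup is a hypothetical name. It numbers each (Year, Quarter, State) combination once with unique(..., 'rows'), collects the row indices of each group with accumarray, and then applies the question's peer rule inside each group only. If State holds text, findgroups(Year, Quarter, State) (available since R2015b) could provide the group numbers instead.

```matlab
function [ROEpeer, ROApeer] = peerMediansByGroup(Year, Quarter, State, ID, ROE, ROA)
% Same rule as the question's loop (peers: same Year, Quarter and State,
% different ID, non-NaN value, more than 2 of them), but each row is only
% compared with the rows of its own group. Assumes numeric inputs.
N = numel(ROE);
ROEpeer = NaN(N, 1);
ROApeer = NaN(N, 1);
[~, ~, g] = unique([Year(:), Quarter(:), State(:)], 'rows');  % group number per row
members = accumarray(g(:), (1:N)', [], @(r) {r});            % row indices per group
for k = 1:numel(members)
    rows = members{k};
    for j = rows(:)'
        peers = rows(ID(rows) ~= ID(j));   % other firms in the same group
        e = ROE(peers);  e = e(~isnan(e));
        a = ROA(peers);  a = a(~isnan(a));
        if numel(e) > 2
            ROEpeer(j) = median(e);
        end
        if numel(a) > 2
            ROApeer(j) = median(a);
        end
    end
end
end
```

The work per row now scales with the size of its group rather than with the total number of rows, which is where the original loop spends its time when there are many quarters and states.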