Hi All,
My apologies. Yet another question on how to vectorize a calculation. What I am trying to do, is recreate a script I wrote 15 years (!) ago in IDL, in Matlab, full description of the why and how here https://doi.org/10.1080/01431160500497119.
In short: I am bootstrapping a stepwise regression model, selecting the best predictor on statistics within the bootstrap. The current script results in the sort of output I am looking for. However, running it takes over an hour for just one response factor & one dataset. Taking that I have 10 response factors, and several datasets in this one study alone, I need this to become a lot quicker so it can become part of my standard data processing.
I know vectorizing is the way to go, but being a newbe, I cannot get my head around how to handle this. Someone willing to lend me a hand? I think that the actual regression is the one to be optimized, vectorizing for the "band" loop (Is that how you would refer to it?):
model = fitglm(predictors(train, [selectedbands,band]), response(train)); stats(thispredictor,band) = model.Rsquared.Ordinary;
I am not sure whether it is possible though: There are about 2000 wavebands in there. In other words: for each bootstrap iteration, 2k regressions are run. And I am running 10k iterations for each predictor.
The process it goes through (Full function in the attached file):
% Create the regression model one predictor at a time,
% as I want to bootstrap the sensitivity of each predictor
for thispredictor = 1:maxpredictors % Bootstrap loop
for iteration = 1:iterations % create the statistics for each iteration
stats = zeros(maxpredictors,numbands); outdata = zeros(numbands); % create a subset for training and testing
train = datasample(datarange, numsamples); % Correlation at each waveband
for band = 1:numbands % Set the dataset
predictorset = predictors(train, [selectedbands,band]); responseset = response(train); model = fitglm(predictors(train, [selectedbands,band]), response(train)); stats(thispredictor,band) = model.Rsquared.Ordinary; end %band loop
% Select the best band for the model
[maxrsq,maxband] = max(stats(thispredictor,:)); bandcount(thispredictor,maxband) = bandcount(thispredictor,maxband)+1; end %iterations loop
[frequency, peakband] = max(bandcount(thispredictor,:)); selectedbands = [selectedbands,peakband]; end %predictors loop
Best Answer