I would like to do a linear regression using polyfit, but only on part of the dataset. I have 2 arrays, Wavelength (x axis) and Flux (y axis). I would like to regress the data in the range Wavelength > 1515 & Wavelength < 1750, and then find the slope of the trend line that fits the fluxes (y values) in this range. I do not know how to restrict my dataset in this way (without importing the data again!). I tried rescaling my plot axes, but the polyfit function still considered all the values in my dataset.
Here is what I have so far:
%% Initialize variables.
filename = '/Users/lexiwilson/Documents/SURF/DataIrradiance/DEC/WSD_26DEC/WAIS1226201500166.asd.irr.pco.txt';
delimiter = {'\t',' '};
startRow = 39;
datetime = strcat('/Users/lexiwilson/Documents/SURF/DataIrradiance/DEC/WSD_26DEC/','122615_','00:33:56');
%% Read columns of data as strings:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to format string.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
textscan(fileID, '%[^\n\r]', startRow-1, 'ReturnOnError', false);
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'ReturnOnError', false);
%% Close the text file.
fclose(fileID);
%% Convert the contents of columns containing numeric strings to numbers.
% Replace non-numeric strings with NaN.
raw = [dataArray{:,1:end-1}];
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
    % Converts strings in the input cell array to numbers. Replaces non-numeric
    % strings with NaN.
    rawData = dataArray{col};
    for row=1:size(rawData, 1)
        % Create a regular expression to detect and remove non-numeric prefixes and
        % suffixes.
        regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
        try
            result = regexp(rawData{row}, regexstr, 'names');
            numbers = result.numbers;
            % Detect commas in non-thousand locations.
            invalidThousandsSeparator = false;
            if any(numbers==',')
                thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
                if isempty(regexp(numbers, thousandsRegExp, 'once'))
                    numbers = NaN;
                    invalidThousandsSeparator = true;
                end
            end
            % Convert numeric strings to numbers.
            if ~invalidThousandsSeparator
                numbers = textscan(strrep(numbers, ',', ''), '%f');
                numericData(row, col) = numbers{1};
                raw{row, col} = numbers{1};
            end
        catch me
        end
    end
end
%% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
%% Allocate imported array to column variable names
Wavelength = cell2mat(raw(:, 1));
Flux = cell2mat(raw(:, 2));
%% Plot wavelength vs irradiance
figure();
plot(Wavelength, Flux);
title(filename);
xlabel('Wavelength (nm)');
ylabel('Irradiance (W/m^2)');
axis([350,2200,-0.5,2]);
% zoom to 1.6 micron window
figure();
plot(Wavelength, Flux);
title(filename);
xlabel('Wavelength (nm)');
ylabel('Irradiance (W/m^2)');
axis([1374,1838,-0.05,0.15]);
Ystartindx = find(Wavelength == 1515); % index of wavelength = 1515 nm
Ystart = Flux(Ystartindx); % corresponding flux
Yendindx = find(Wavelength == 1750); % index of wavelength = 1750 nm
Yend = Flux(Yendindx); % corresponding flux
hold on;
% make linear fit and print slope to console
waverange = find(Wavelength > 1515 & Wavelength < 1750);
fluxrange = find(Flux > Ystart & Flux < Yend);
P = polyfit(waverange,fluxrange,1);
fit = P(1)*waverange + P(2);
plot(waverange,fit,'k');
disp(P(1)); % print slope to console
% save plot in directory as jpeg
%saveas(gcf,datetime,'jpeg');
%% Clear temporary variables
clearvars filename delimiter startRow formatSpec fileID dataArray ans raw numericData col rawData row regexstr result numbers invalidThousandsSeparator thousandsRegExp me R;
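The `find(Wavelength == 1515)` and `find(Wavelength == 1750)` lookups in the code above rely on exact floating-point equality, so they return an empty index whenever the instrument's wavelength grid does not contain those values exactly. A minimal sketch of a nearest-sample lookup with `min(abs(...))`, run on a synthetic, deliberately offset grid (the data here are assumptions for illustration, not the asker's file):

```matlab
% Synthetic wavelength grid that does NOT contain 1515 or 1750 exactly
% (assumption for illustration; real instrument files may differ).
Wavelength = (1500.3:1:1760.3)';
Flux = linspace(0.02, 0.08, numel(Wavelength))';

% Exact comparison finds nothing on this grid.
exactHit = find(Wavelength == 1515);   % empty on this grid

% Nearest-sample lookup: the second output of min is the position
% of the smallest distance to the target wavelength.
[~, Ystartindx] = min(abs(Wavelength - 1515));
[~, Yendindx]   = min(abs(Wavelength - 1750));
Ystart = Flux(Ystartindx);   % flux at the sample nearest 1515 nm
Yend   = Flux(Yendindx);     % flux at the sample nearest 1750 nm

fprintf('Nearest samples: %.1f nm and %.1f nm\n', ...
        Wavelength(Ystartindx), Wavelength(Yendindx));
```

On an exact 1 nm integer grid both approaches return the same index; the `min` form simply never comes back empty.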
The errors I get say that my arrays waverange and fluxrange are not the same size (which they aren't). How can I make them the same size, and restrict the X and Y values to a range in the middle of my dataset?
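The size mismatch comes from `find`, which returns row indices rather than data values. The two `find` calls select different sets of rows, so `polyfit` receives two index vectors of different lengths, and any slope it did return would be in flux-index units rather than irradiance per nm. A minimal sketch of the usual alternative uses one logical mask built from `Wavelength` and applies it to both arrays so each x stays paired with its y. It runs on synthetic data; the exactly linear `Flux` and the variable names `inRange`, `x`, `y` are assumptions for illustration:

```matlab
% Synthetic data standing in for the imported columns (assumption):
% a 1 nm grid from 350 to 2200 nm and a flux that is exactly linear.
Wavelength = (350:2200)';
Flux = 2e-4 * Wavelength - 0.1;

% One logical mask, built from Wavelength only, selects the fitting window.
inRange = Wavelength > 1515 & Wavelength < 1750;

% Apply the same mask to both arrays so x and y have equal length
% and each wavelength stays paired with its own flux value.
x = Wavelength(inRange);
y = Flux(inRange);

% A degree-1 polyfit returns [slope, intercept].
P = polyfit(x, y, 1);
fprintf('Slope: %g (irradiance units per nm)\n', P(1));

% Evaluate the trend line only over the fitted window and overlay it.
figure();
plot(Wavelength, Flux);
hold on;
plot(x, polyval(P, x), 'k', 'LineWidth', 1.5);
hold off;
xlabel('Wavelength (nm)');
ylabel('Irradiance (W/m^2)');
```

Because the mask comes from `Wavelength` alone, this version needs neither `Ystart`/`Yend` nor a second `find` on `Flux`, and the data never has to be re-imported.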
Best Answer