Hi everyone, I'm having trouble reaching good speed when reading huge CSV files, and I would like to hear your ideas.
I'm using a datastore to pre-arrange 1910 headers. Then I want to organize my output so they are held in a specific struct, and read the columns individually, because not all 1910 headers are full of data. My main problem is that the parts of every header are separated with '_', which makes them hard to read.
For example:
FruitDs = datastore('Fruits.csv'); % I'm keeping it short, but this part works fine
NumOfHeaders = length(FruitDs.VariableNames); % e.g. headers: "Apples_colour_S1_red_dated" "Apples_colour_S1_red_dated" "Apples_colour_S_green_dated" ...
for n = 1:NumOfHeaders
    if contains(FruitDs.VariableNames{n}, 'Apples')
        tmp = FruitDs.VariableNames{n};
        A = strfind(tmp, '_');
        tmp(A([1 3])) = '.'; % e.g. Apples_colour_S1_red_dated -> Apples.colour_S1.red_dated
        FruitDs.SelectedVariableNames = FruitDs.VariableNames(n);
        ApplesData = readall(FruitDs);
        eval([tmp ' = ApplesData;']); % the struct I need is out.Apples.colour_XXXXX, holding all data of that specific apple
        Fruits.Apples = tmp;
    end
end
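One likely bottleneck in that loop is that `readall` runs once per matching column, so the whole file is parsed again for every header. A minimal sketch, assuming every header follows the `Apples_colour_S1_red_dated` pattern (at least three underscores) and all columns are numeric, selects all Apples columns at once and builds the nested struct with dynamic field names instead of `eval`. The file `Fruits_demo.csv`, its contents, and the names `out` and `level1` to `level3` are hypothetical, used only so the sketch runs on its own.

```matlab
% Hypothetical demo file so the sketch is self-contained.
fname = fullfile(tempdir, 'Fruits_demo.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'Apples_colour_S1_red_dated,Apples_colour_S2_red_dated,Pears_size_S1_big_dated\n');
fprintf(fid, '1,3,5\n2,4,6\n');
fclose(fid);

ds = datastore(fname);
names = ds.VariableNames;

% Select ALL Apples columns together so the file is parsed once,
% instead of calling readall once per column.
appleNames = names(startsWith(names, 'Apples_'));
ds.SelectedVariableNames = appleNames;
T = readall(ds);

% Build out.Apples.colour_S1.red_dated etc. with dynamic field names (no eval).
% Splitting at underscores 1 and 3 mirrors tmp(A([1 3])) = '.' in the loop.
out = struct();
for k = 1:numel(appleNames)
    parts  = strsplit(appleNames{k}, '_');   % {'Apples','colour','S1','red','dated'}
    level1 = parts{1};                        % 'Apples'
    level2 = strjoin(parts(2:3), '_');        % 'colour_S1'
    level3 = strjoin(parts(4:end), '_');      % 'red_dated'
    out.(level1).(level2).(level3) = T.(appleNames{k});
end
```

Because S1 and S2 end up in different second-level fields (`colour_S1`, `colour_S2`), this single loop already keeps them apart.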
This code works fine, so my questions are as follows:
- Is there a faster way to do this?
- Is there a smart, fast way to avoid reading empty headers? 1910 × 390000 can be too much, and not all of them are full (I filled the empty ones with NA in the datastore).
- In some cases the headers differ only by a number, and I do want to separate them, e.g. "Apples_colour_S1_….", "Apples_colour_S2_…". Is there a way to avoid a second loop (one that runs over all the SX)?
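A column can't be known to be empty without looking at its data at least once, but that look can be cheap. The sketch below assumes every column is numeric and empty cells hold the text `NA`. It scans the file in chunks of `ReadSize` rows, keeping only one logical flag per column, and then reads only the non-empty columns. The S-number of every header is extracted with one vectorized `regexp` call, so picking the S1 group is a logical index rather than a second loop. The demo file name and its contents are made up for illustration.

```matlab
% Hypothetical demo file: column 2 is entirely NA.
fname = fullfile(tempdir, 'Fruits_demo2.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'Apples_colour_S1_red_dated,Apples_colour_S2_red_dated,Apples_colour_S3_red_dated\n');
fprintf(fid, '1,NA,5\n2,NA,NA\n');
fclose(fid);

ds = datastore(fname, 'TreatAsMissing', 'NA');
% Assumption: all columns are numeric; forcing %f makes NA come back as NaN.
ds.TextscanFormats = repmat({'%f'}, 1, numel(ds.VariableNames));
ds.ReadSize = 100000;   % rows per chunk; tune to available memory

% 1) Pre-scan chunk by chunk, remembering which columns have any real value.
hasValue = false(1, numel(ds.VariableNames));
reset(ds);
while hasdata(ds)
    chunk = read(ds);
    hasValue = hasValue | any(~ismissing(chunk), 1);
end
keep = ds.VariableNames(hasValue);

% 2) S-number of every kept header in one call (NaN if a header has no _S<digits>_).
sNum = str2double(regexp(keep, '(?<=_S)\d+(?=_)', 'match', 'once'));

% 3) Read only the non-empty columns, then select one S-group by logical indexing.
ds.SelectedVariableNames = keep;
reset(ds);
T = readall(ds);
S1 = T(:, keep(sNum == 1));
```

Whether the pre-scan pays off depends on how many columns are actually empty; if most are full, a single `readall` followed by `T(:, all(ismissing(T), 1)) = []` may be simpler.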
Thanks in advance!