MATLAB: Fast move in datastore (TabularTextDatastore)

datastore, tabulartextdatastore

Hi,
I'm using TabularTextDatastore to iteratively read from several huge (~GB) text files. Is there a way to skip ahead in a datastore by, say, millions of rows without actually reading the data (some kind of offset)?
Note:
a) Reading everything in one step is impractical due to the memory requirements.
b) Iteratively reading smaller portions takes some time…
Thanks for your help.
Adam

Best Answer

Hi Adam,
Unfortunately, there's no way to do this. (And I don't mean only with datastore: text files are linear, so to find the rows you have to read the rows.) Even if there were an API for it, the underlying code would still need to read the data in between to know where each "row" starts and stops in order to skip the right amount.
If you know more about the file's layout than just that it's text (for example, that each row is exactly 3000 characters long), you could possibly implement something with a custom datastore. Check the documentation here: Custom Datastore in R2017b
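To illustrate the fixed-width case, here is a minimal sketch of how a byte offset could replace reading the skipped rows. It assumes every row has exactly the same length in bytes and ends in a single newline; the demo file name, the 10-character row width, and the variable names are all made up for the example. The same fseek call could sit inside the read method of a custom datastore, but that part is not shown here.

```matlab
% Sketch: jump directly to a row in a file whose rows all have the same byte length.
% Assumptions (not from the original question): fixed-width ASCII rows, each
% terminated by one '\n' byte; file name and sizes are hypothetical.
fname = fullfile(tempdir, 'fixed_width_demo.txt');
rowChars = 10;                       % characters per row, excluding the newline
bytesPerRow = rowChars + 1;          % +1 for the '\n' terminator

% Write a small demo file so the example is self-contained.
fid = fopen(fname, 'w');
for k = 1:1000
    fprintf(fid, '%010d\n', k);      % row k holds the zero-padded number k
end
fclose(fid);

% Skip straight to row 501 without reading rows 1..500.
rowsToSkip = 500;
fid = fopen(fname, 'r');
status = fseek(fid, rowsToSkip * bytesPerRow, 'bof');
assert(status == 0, 'fseek failed');
rowText = fgetl(fid);                % first row after the skipped block
fclose(fid);
delete(fname);
```

This only works because the offset of row n can be computed as n times the row length. With variable-length rows there is nothing to compute, which is why the data in between has to be read.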
I suspect what you're really after is a performance improvement, and you might get one by setting SelectedVariableNames. The data is still being read, but no import happens for variables that aren't selected. So if you don't need all the variables, you could skip some, saving memory and time.
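ds = tabularTextDatastore(files);        % files: your file name(s); add any other options you need
ds.SelectedVariableNames = {'a';'c'};    % only import two variables
data = read(ds);                         % returns the next block of rows as a table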
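If it helps, here is a rough, self-contained sketch of combining this with chunked reading, so that neither the whole file nor the unused variables end up in memory at once. The demo file, the variable names a, b and c, and the chunk size of 25 rows are assumptions made up for the example; in practice you would pick a much larger ReadSize.

```matlab
% Sketch: process a file in chunks while importing only selected variables.
% Assumptions: demo file, variable names and chunk size are hypothetical.
fname = fullfile(tempdir, 'chunk_demo.csv');
T = table((1:100)', (101:200)', (201:300)', 'VariableNames', {'a','b','c'});
writetable(T, fname);                    % small CSV so the example runs on its own

ds = tabularTextDatastore(fname);
ds.SelectedVariableNames = {'a';'c'};    % 'b' is skipped at import time
ds.ReadSize = 25;                        % rows returned per call to read

total = 0;
while hasdata(ds)
    chunk = read(ds);                    % table with up to 25 rows, variables a and c
    total = total + sum(chunk.a) + sum(chunk.c);
end
delete(fname);
```

Each pass through the loop holds only one chunk, so peak memory is governed by ReadSize and the number of selected variables rather than by the file size.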
Hope this helps,
Jeremy