MATLAB: Way of conserving memory when extracting data from CSV


Hi everybody, I have a few questions. I have some huge CSV files that I need to load into MATLAB for analysis. Each CSV file has 5 columns. The relevant columns are:
Column 1 is the date, from early 2007 to mid-2011, in the form mm/dd/yyyy.
Column 3 is the corresponding price.
Column 5 is the number of trades.
My questions are:
1) How can I extract these 3 columns into a matrix in MATLAB without using too much memory (bear in mind that some of these CSV files have around 60 million rows)? Is there a way to reduce the memory MATLAB allocates for each element of the matrix? Code would be very helpful.
2) How can I extract the data for one specific year only (e.g. 2009) into a numeric (non-string) matrix for analysis, again within the memory limits from question 1?
Thanks so much.

Best Answer

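Something like this will do it:
function mate2u
% Preallocate compact integer columns: 8 bytes per row instead of 24 as doubles
day_number = zeros( 60*1e6, 1, 'uint16' ); % days since 1/1/2007 (0 for 1/1/2007)
price = zeros( 60*1e6, 1, 'uint32' ); % price in 1/100 of cents
volume = zeros( 60*1e6, 1, 'uint16' ); % number of trades
pivot_day = datenum( '1/1/2007', 'mm/dd/yyyy' );
chunk_size = 10; % small for this demo; use e.g. 5*1e6 for the real files
fid = fopen( 'mate2u.txt' );
while not( feof( fid ) )
    % column 1 as text, skip column 2, column 3 as single, skip column 4, column 5 as uint16
    cac = textscan( fid, '%s%*s%f32%*s%u16', chunk_size, 'Delimiter', ',' );
    % no semicolons: display the converted chunk (store it in the arrays above in real use)
    uint16( datenum( cac{1}, 'mm/dd/yyyy' ) - pivot_day )
    uint32( cac{2}*10000 )
    cac{3}
end
fclose( fid );
end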
where mate2u.txt contains
04/29/2008,38:52.0,71.35,CTN08,2
04/29/2008,38:53.0,71.35,CTN08,2
04/29/2008,38:56.0,71.35,CTN08,3
04/29/2008,38:56.0,71.35,CTN08,1
04/29/2008,38:56.0,71.35,CTN08,1
04/29/2008,38:57.0,71.35,CTN08,1
which prints the following to the Command Window:
ans =
484
484
484
484
484
484
ans =
713500
713500
713500
713500
713500
713500
ans =
2
2
3
1
1
1
>>
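The loop above only displays each converted chunk. Below is a minimal sketch of how the chunks could instead be written into the preallocated integer arrays and how the rows of a single year (question 2) could then be pulled out as a numeric matrix. It assumes the same column layout, the same 1/1/2007 pivot day and the same 1/100-of-a-cent price scale as the answer; the function names trades_for_year and read_trades_compact and the n_max row bound are hypothetical. Only the selected year is converted to double, so the full file stays at 8 bytes per row.

```matlab
function data_year = trades_for_year( filename, yr, chunk_size, n_max )
% Hypothetical helper: returns [day_number, price, trades] as doubles for
% the rows of FILENAME whose date falls in calendar year YR.
[day_number, price, volume] = read_trades_compact( filename, chunk_size, n_max );
pivot_day = datenum( 2007, 1, 1 );               % same pivot as the answer above
first_day = datenum( yr, 1, 1 ) - pivot_day;     % e.g. 731 when yr is 2009
next_day  = datenum( yr + 1, 1, 1 ) - pivot_day; % e.g. 1096 when yr is 2009
in_year   = day_number >= first_day & day_number < next_day;
% Only the selected year is converted to double (24 bytes per row)
data_year = [ double( day_number(in_year) ), ...
              double( price(in_year) ) / 10000, ...   % back to currency units
              double( volume(in_year) ) ];
end

function [day_number, price, volume] = read_trades_compact( filename, chunk_size, n_max )
% Reads columns 1, 3 and 5 in chunks into compact integer arrays.
% n_max is an assumed upper bound on the number of rows in the file.
day_number = zeros( n_max, 1, 'uint16' );   % days since 1/1/2007
price      = zeros( n_max, 1, 'uint32' );   % price in 1/100 of cents
volume     = zeros( n_max, 1, 'uint16' );   % number of trades
pivot_day  = datenum( 2007, 1, 1 );
n   = 0;                                    % rows stored so far
fid = fopen( filename );
assert( fid > 0, 'Cannot open %s', filename );
while not( feof( fid ) )
    cac = textscan( fid, '%s%*s%f32%*s%u16', chunk_size, 'Delimiter', ',' );
    m = numel( cac{3} );                    % complete rows read in this chunk
    if m == 0, break; end
    rows = n+1 : n+m;
    day_number(rows) = uint16( datenum( cac{1}(1:m), 'mm/dd/yyyy' ) - pivot_day );
    price(rows)      = uint32( cac{2}(1:m) * 10000 );
    volume(rows)     = cac{3};
    n = n + m;
end
fclose( fid );
% Trim the unused tail of the preallocated arrays
day_number = day_number(1:n);
price      = price(1:n);
volume     = volume(1:n);
end
```

Saved as trades_for_year.m, a call such as d2009 = trades_for_year( 'mate2u.txt', 2009, 5*1e6, 60*1e6 ) would return one row per 2009 trade (an empty 0-by-3 matrix for the sample file, whose rows are all from 2008). The uint16 trade count assumes no single row exceeds 65535 trades.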