MATLAB: Can I use Matlab to read in data that’s in an unusual layout

Tags: MATLAB, readtable, textscan

I've been using this software called LAMMPS which is a molecular simulator and I want to extract certain pieces of information from it. The data is outputted in one of two ways. The first is in a .dat file and looks like this:
LAMMPS (7 Dec 2018)
Created orthogonal box = (0 -1 -0.25) to (50 11 0.25)
1 by 1 by 1 MPI processor grid
Created 1 atoms
Time spent = 6.12736e-05 secs
Created 1 atoms
Time spent = 0.000876665 secs
1 atoms in group fixed
1 atoms in group free
Per MPI rank memory allocation (min/avg/max) = 4.034 | 4.034 | 4.034 Mbytes
Step Time Temp TotEng E_pair v_2
0 0 0 -0.99995177 -0.99995177 29.4142
100 0.1 4111601.7 -0.94316771 -0.99993456 29.416675
200 0.2 142194.24 -0.99615598 -0.99811919 29.383619
300 0.3 3330578.7 -0.94969838 -0.99568203 29.367122
400 0.4 12247239 -0.8288457 -0.99793725 29.382028
500 0.5 2775369 -0.96146719 -0.99978534 29.405196
600 0.6 13813605 -0.80919796 -0.99991556 29.419406
700 0.7 3394332.4 -0.95195073 -0.99881459 29.437799
800 0.8 3690647.8 -0.94890506 -0.99986 29.407367
900 0.9 10817030 -0.85044571 -0.99979107 29.405362
1000 1 39449.019 -0.99796461 -0.99850926 29.441106
Loop time of 33.1504 on 1 procs for 50000000 steps with 2 atoms
Performance: 130315002188.459 ns/day, 0.000 hours/ns, 1508275.488 timesteps/s
91.9% CPU use with 1 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 3.4592 | 3.4592 | 3.4592 | 0.0 | 10.43
Neigh | 0.095202 | 0.095202 | 0.095202 | 0.0 | 0.29
Comm | 3.3945 | 3.3945 | 3.3945 | 0.0 | 10.24
Output | 7.3908 | 7.3908 | 7.3908 | 0.0 | 22.29
Modify | 10.004 | 10.004 | 10.004 | 0.0 | 30.18
Other | | 8.806 | | | 26.56
Basically there is loads of preamble and postamble text, but the information I want is in the middle. What I have been doing so far is manually cutting off the top and bottom bits of text so all I am left with in the file is:
Step Time Temp TotEng E_pair v_2
0 0 0 -0.99995177 -0.99995177 29.4142
100 0.1 4111601.7 -0.94316771 -0.99993456 29.416675
200 0.2 142194.24 -0.99615598 -0.99811919 29.383619
300 0.3 3330578.7 -0.94969838 -0.99568203 29.367122
400 0.4 12247239 -0.8288457 -0.99793725 29.382028
500 0.5 2775369 -0.96146719 -0.99978534 29.405196
600 0.6 13813605 -0.80919796 -0.99991556 29.419406
700 0.7 3394332.4 -0.95195073 -0.99881459 29.437799
800 0.8 3690647.8 -0.94890506 -0.99986 29.407367
900 0.9 10817030 -0.85044571 -0.99979107 29.405362
1000 1 39449.019 -0.99796461 -0.99850926 29.441106
And then I have used readtable to extract the information I want (which for me is the last column):
T = readtable('corrthermmid.dat');
X = T.v_2(:)-28;
Is there a way I could use something like textscan to get out this information without manually editing it?
What would actually be more useful for me is if anyone has any idea how I could use MATLAB to extract information presented in this way:
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS mm mm pp
0.0000000000000000e+00 5.0000000000000000e+01
-1.0000000000000000e+00 1.1000000000000000e+01
-2.5000000000000000e-01 2.5000000000000000e-01
ITEM: ATOMS id type xs ys zs
1 1 0.3 0.5 0.5
2 2 0.588284 0.5 0.5
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS mm mm pp
0.0000000000000000e+00 5.0000000000000000e+01
-1.0000000000000000e+00 1.1000000000000000e+01
-2.5000000000000000e-01 2.5000000000000000e-01
ITEM: ATOMS id type xs ys zs
1 1 0.3 0.5 0.5
2 2 0.588334 0.5 0.5
ITEM: TIMESTEP
200
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS mm mm pp
0.0000000000000000e+00 5.0000000000000000e+01
-1.0000000000000000e+00 1.1000000000000000e+01
-2.5000000000000000e-01 2.5000000000000000e-01
ITEM: ATOMS id type xs ys zs
1 1 0.3 0.5 0.5
2 2 0.587672 0.5 0.5
Where I have underlined the data points I would want to extract into a matrix, array or something similar.
Thanks for any help!

Best Answer

A little tedious to set up, but easily-enough handled for a regular file format such as this.
First you have to either know there's a fixed number of header lines before the data section in question, or scan the file to find a marker line whose position relative to the beginning of the desired data is consistent. In this case, for the first data set, it appears the title line Step Time Temp TotEng E_pair v_2 is unique...
fmt1=repmat('%f',1,6); % first data section format: six numeric columns
fid=fopen('yourfile.dat','r'); % open file
l=fgetl(fid); % read first line
while ~feof(fid) % loop through file by record
if ~isempty(strfind(l,'Step Time Temp TotEng E_pair v_2')), break, end % break when find first section
l=fgetl(fid); % next record
end
data=cell2mat(textscan(fid,fmt1)); % read the data; stops on finding subsequent text line 'Loop...'
fclose(fid); % close file
data=data(:,end)-28; % last column is v_2, as in the question; what's the -28 for???
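If the column order of the thermo output might change, one option is to look the column up by name in the header line instead of by position. The following is only a sketch: it runs on two data rows and the trailing 'Loop' line copied from the question, held as in-memory text, and the variable names (hdr, names, body, v2) are illustrative. In the file-based loop above, hdr would be the line l left behind by the search and body would be replaced by fid.

```matlab
% Header line as it would be returned by fgetl once the marker is found
hdr   = 'Step Time Temp TotEng E_pair v_2';
names = strsplit(strtrim(hdr));            % column names taken from the header
fmt   = repmat('%f',1,numel(names));       % one %f per column

% Two data rows plus the first postamble line, copied from the question
body = sprintf('%s\n', ...
    '0 0 0 -0.99995177 -0.99995177 29.4142', ...
    '100 0.1 4111601.7 -0.94316771 -0.99993456 29.416675', ...
    'Loop time of 33.1504 on 1 procs for 50000000 steps with 2 atoms');

data = cell2mat(textscan(body,fmt));       % reading stops at the 'Loop' line
v2   = data(:,strcmp(names,'v_2'));        % select the column by name
```

Picking the column with strcmp keeps the script working if another thermo keyword is added in front of v_2.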
To read the remaining sections, you just rinse and repeat similar logic to find the first timestep section, write a piece of code to parse that section and then place that code in a loop, catenating the desired data as you go.
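The loop described above might look roughly like the sketch below. It is not tested against a real LAMMPS dump: it writes two frames copied from the question to a temporary file so that it runs as-is, and it assumes the first column of each ATOMS block is the atom id. The choices atomId = 2 and colName = 'xs' are guesses at what is wanted, and all variable names are illustrative.

```matlab
% Two frames copied from the question, written to a temporary file.
% For real use, set fname to your own dump file and drop this part.
sample = {
    'ITEM: TIMESTEP'
    '0'
    'ITEM: NUMBER OF ATOMS'
    '2'
    'ITEM: BOX BOUNDS mm mm pp'
    '0.0000000000000000e+00 5.0000000000000000e+01'
    '-1.0000000000000000e+00 1.1000000000000000e+01'
    '-2.5000000000000000e-01 2.5000000000000000e-01'
    'ITEM: ATOMS id type xs ys zs'
    '1 1 0.3 0.5 0.5'
    '2 2 0.588284 0.5 0.5'
    'ITEM: TIMESTEP'
    '100'
    'ITEM: NUMBER OF ATOMS'
    '2'
    'ITEM: BOX BOUNDS mm mm pp'
    '0.0000000000000000e+00 5.0000000000000000e+01'
    '-1.0000000000000000e+00 1.1000000000000000e+01'
    '-2.5000000000000000e-01 2.5000000000000000e-01'
    'ITEM: ATOMS id type xs ys zs'
    '1 1 0.3 0.5 0.5'
    '2 2 0.588334 0.5 0.5'
    };
fname = [tempname '.dump'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n',sample{:});
fclose(fid);

atomId  = 2;       % assumption: the atom whose value is wanted
colName = 'xs';    % assumption: the column that is wanted

fid = fopen(fname,'r');
steps = []; vals = []; nAtoms = 0;
l = fgetl(fid);
while ischar(l)                                 % fgetl returns -1 at end of file
    if strncmp(l,'ITEM: TIMESTEP',14)
        steps(end+1,1) = fscanf(fid,'%f',1);    % number on the following line
    elseif strncmp(l,'ITEM: NUMBER OF ATOMS',21)
        nAtoms = fscanf(fid,'%f',1);
    elseif strncmp(l,'ITEM: ATOMS',11)
        names = strsplit(strtrim(l));
        names = names(3:end);                   % column names after 'ITEM: ATOMS'
        % read exactly nAtoms rows; fscanf fills column-wise, hence the transpose
        blk = fscanf(fid,'%f',[numel(names) nAtoms])';
        vals(end+1,1) = blk(blk(:,1)==atomId, strcmp(names,colName));
    end
    l = fgetl(fid);                             % next record
end
fclose(fid);
delete(fname);                                  % remove the temporary sample file
result = [steps vals];                          % one row per frame: timestep, value
```

Reading a fixed count of numbers with fscanf, using the value from the NUMBER OF ATOMS record, avoids depending on where a failed read leaves the file position when the next ITEM line is reached.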
It would help anybody trying to actually write the code if you attached a sample data file to work with, but that's the outline.