MATLAB: Converting a .txt into a series of matrices


Hello, I am a new MATLAB user and I need to convert mol2 files (text documents that store the positions of the atoms in a molecule) into multiple matrices so that I can manipulate the data. A sample mol2 file is shown below; I've also attached the full mol2 file.
@<TRIPOS>MOLECULE
*****
83 89 0 0 0
SMALL
GASTEIGER
@<TRIPOS>ATOM
1 C -2.1071 -0.8238 0.0543 C.ar 1 LIG1 -0.0157
2 C -0.8284 -1.4433 0.0053 C.ar 1 LIG1 -0.0265
3 C 0.3551 -0.6761 -0.0339 C.ar 1 LIG1 0.0903
4 C 0.2486 0.7084 -0.0225 C.ar 1 LIG1 0.0691
5 C -0.9965 1.3355 0.0209 C.ar 1 LIG1 -0.0355
@<TRIPOS>BOND
1 1 2 ar
2 2 3 ar
3 3 4 ar
4 4 5 ar
5 5 6 ar
I want to convert the mol2 file into 3 arrays holding the information in @<TRIPOS>MOLECULE, @<TRIPOS>ATOM, and @<TRIPOS>BOND. For example, I want the "Molecule" array to look like {[83 89 0 0 0], SMALL, GASTEIGER}, and the "Atom" array to look like {[1, C, -2.1071, -0.8238, 0.0543, C.ar, 1, LIG1, -0.0157]...}.
Any help would be greatly appreciated.

Best Answer

Here is one way to achieve this:
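fileLocator = 'Bip_LAMMPS_Pract.txt' ;
content = fileread( fileLocator ) ;
% MOLECULE: the counts line after the '*****' name line, then the remaining text up to the next '@'
tokens = regexp( content, 'MOLECULE\s*\*+([\s\d]+)([^@]+)', 'tokens' ) ;
molText = strtrim( regexprep( tokens{1}{2}, '\s+', ' ' )) ;  % not named 'text', which would shadow the built-in TEXT function
molecule = { sscanf( tokens{1}{1}, '%d' ), molText } ;
% ATOM: 9 whitespace-separated columns per line
tokens = regexp( content, 'ATOM([^@]+)', 'tokens' ) ;
atom = textscan( tokens{1}{1}, '%f %s %f %f %f %s %f %s %f' ) ;
% BOND: 4 columns per line; stop at the next '@' in case further sections follow
tokens = regexp( content, 'BOND([^@]+)', 'tokens' ) ;
bond = textscan( tokens{1}{1}, '%f %f %f %s' ) ;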
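Matching the section names one at a time works for this file. If a file contains additional sections (the mol2 format also defines others, such as @<TRIPOS>SUBSTRUCTURE), a minimal alternative sketch is to split the text at every @<TRIPOS> header first and store each section in a struct field named after it. It also builds the three-element form asked for in the question. The inline sample string, the variable names sections, molLines and molInfo, and the assumption that the name, counts, molecule type and charge type are the first four lines of the MOLECULE section are illustrative; for a real file, replace the sample with fileread( fileLocator ).

```matlab
% Hypothetical inline sample standing in for fileread(fileLocator)
content = sprintf([ ...
    '@<TRIPOS>MOLECULE\n*****\n83 89 0 0 0\nSMALL\nGASTEIGER\n' ...
    '@<TRIPOS>ATOM\n' ...
    '1 C -2.1071 -0.8238 0.0543 C.ar 1 LIG1 -0.0157\n' ...
    '2 C -0.8284 -1.4433 0.0053 C.ar 1 LIG1 -0.0265\n' ...
    '@<TRIPOS>BOND\n1 1 2 ar\n' ...
    '@<TRIPOS>SUBSTRUCTURE\n1 LIG1 1\n' ]);

% Capture each section name and its body (everything up to the next '@')
tok = regexp(content, 'TRIPOS>(\w+)([^@]*)', 'tokens');
sections = struct();
for k = 1:numel(tok)
    sections.(tok{k}{1}) = tok{k}{2};   % e.g. sections.ATOM, sections.BOND
end

% MOLECULE: assumed line order is name, counts, molecule type, charge type.
% strtrim also removes the '\r' left by Windows line endings.
molLines = strtrim(strsplit(strtrim(sections.MOLECULE), newline));
molInfo  = {sscanf(molLines{2}, '%d').', molLines{3}, molLines{4}};

% ATOM and BOND bodies are parsed exactly as in the answer above
atom = textscan(sections.ATOM, '%f %s %f %f %f %s %f %s %f');
bond = textscan(sections.BOND, '%f %f %f %s');
```

Here molInfo is {[83 89 0 0 0], 'SMALL', 'GASTEIGER'}, and any extra section (such as SUBSTRUCTURE) is kept in its own field instead of leaking into the BOND text.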
Have a look at the output and let me know if you have any questions. As you are new to MATLAB, note that regular (numeric) arrays cannot store mixed data (numbers and strings). For this kind of data, we use cell arrays. In short, to access the content of a cell you index with curly braces {}, whereas parentheses () return the cell itself. In the present case, molecule is a cell array that contains two cells:
>> molecule
molecule =
[5x1 double] 'SMALL GASTEIGER'
To access the content of the first cell (which is a numeric array):
>> molecule{1}
ans =
83
89
0
0
0
To access element 2 of this numeric array:
>> molecule{1}(2)
ans =
89
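atom and bond follow the same pattern: textscan returns one cell per column, so atom{3}, atom{4} and atom{5} are numeric column vectors holding the x, y and z coordinates, while atom{2} is a cell array of element names. A minimal sketch, assuming the ATOM section text has already been extracted (a short inline sample stands in for it here; atomText, xyz and d12 are illustrative names), of turning those columns into an N-by-3 coordinate matrix and using it:

```matlab
% Hypothetical inline sample standing in for the ATOM section text
atomText = sprintf([ ...
    '1 C -2.1071 -0.8238 0.0543 C.ar 1 LIG1 -0.0157\n' ...
    '2 C -0.8284 -1.4433 0.0053 C.ar 1 LIG1 -0.0265\n' ...
    '3 C 0.3551 -0.6761 -0.0339 C.ar 1 LIG1 0.0903\n' ]);
atom = textscan(atomText, '%f %s %f %f %f %s %f %s %f');

% Columns 3 to 5 are x, y, z; atom{3:5} is a comma-separated list of three
% column vectors, and [...] concatenates them into an N-by-3 matrix
xyz = [atom{3:5}];

% Row i of xyz belongs to atom i; atom{2}{i} is its element symbol (char)
i = 2;
fprintf('Atom %d (%s): %.4f %.4f %.4f\n', atom{1}(i), atom{2}{i}, xyz(i,:));

% Euclidean distance between atoms 1 and 2
d12 = norm(xyz(1,:) - xyz(2,:));
```

Keeping the coordinates in one numeric matrix lets you apply ordinary matrix operations (translations, rotations, distances) to all atoms at once, while the text columns stay in their cell arrays.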