MATLAB: How to extract a specific series of strings from a character array

array, character, excel, export, file, indexing, text

Hi guys,
I have a massive character array (1×187253) that I imported into MATLAB. Here is a small sample:
NEW SCOMPONENT /JBRC200XT
DESC 'CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax200A'
GTYP REDU
PARA 350 200 355.6 216.3 BWD $
330.2 0
END
NEW SCOMPONENT /JBRC200XV
DESC 'CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax250A'
GTYP REDU
PARA 350 250 355.6 267.4 BWD $
330.2 0
END
What I want to do with this character array is extract all the 9-character codes that follow NEW SCOMPONENT (e.g. JBRC200XT and JBRC200XV in this case), as well as the text between the quotes on the DESC line (e.g. 'CARBON STEEL ORDINARY REDUCER… '), and place them side by side in a MATLAB table to be exported to Excel.
I know this should be possible, but I have been stuck for the last few days on the very first step of obtaining the codes JBRC200…
Thanks for your help in advance,
KR,
KMT.

Best Answer

Regular expressions are your friends:
regexp(text, 'NEW SCOMPONENT\s*\/?(?<component>[\w]{9})\s+DESC\s*''(?<desc>[^'']+)', 'names');
The result is a struct array with two fields, component and desc, containing your strings. For the first match:
ans =
struct with fields:
component: 'JBRC200XT'
desc: 'CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax200A'
I tried it on your sample, which I duplicated into a giant text file, and it runs very fast.
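To get from the struct array to the side-by-side table and Excel file asked for in the question, something along these lines should work. It is only a sketch: the variable names txt, names and T, the column names Component and Description, and the output file name components.xlsx are all assumptions made for illustration, and the sample text is a shortened copy of the excerpt from the question.

```matlab
% Shortened sample standing in for the imported 1-by-N character array
% (the variable name txt is an assumption; use your own variable here).
txt = sprintf('%s\n', ...
    'NEW SCOMPONENT /JBRC200XT', ...
    'DESC ''CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax200A''', ...
    'GTYP REDU', ...
    'END', ...
    'NEW SCOMPONENT /JBRC200XV', ...
    'DESC ''CARBON STEEL ORDINARY REDUCER JIS3452 BWD CON. 350Ax250A''', ...
    'GTYP REDU', ...
    'END');

% Same pattern as in the answer: a 9-character code after NEW SCOMPONENT,
% then the quoted text that follows DESC.
pattern = 'NEW SCOMPONENT\s*\/?(?<component>[\w]{9})\s+DESC\s*''(?<desc>[^'']+)';
names = regexp(txt, pattern, 'names');   % struct array, one element per match

% Collect each field into a column cell array and build the table.
T = table({names.component}.', {names.desc}.', ...
    'VariableNames', {'Component', 'Description'});

% Write the table to a spreadsheet (file name is hypothetical).
writetable(T, 'components.xlsx');
```

Building the table from {names.component} and {names.desc} keeps both columns as cell arrays of character vectors, so it behaves the same whether the text contains one match or thousands.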