MATLAB: Capturing partial coordinates from text file…

gcodestring

I want to read a G-code file into MATLAB. The format looks something like this:
;LAYER:0
M107
G0 F9000 X89.19 Y90.68 Z0.30
;TYPE:SKIRT
G1 F1200 X89.40 Y91.83 E0.05822
G1 X89.67 Y92.97 E0.11703
G1 X90.01 Y94.10 E0.17553
I want to read the G0 and G1 commands, which describe positions. The input is not complete and may not hold all coordinates on every line. The output I'm looking for is an array (or a table) with the X, Y and Z movements, with a NaN or zero where values are missing. So something like this:
89.19 90.68 0.30
89.40 91.83 NaN
89.67 92.97 NaN
90.01 94.10 NaN
I've been trying to use regexp, but can't really wrap my head around it.
I've already made a loop-based reader that splits the strings and examines them with if statements, but this is time-consuming, as these files may hold hundreds of thousands of commands.

Best Answer

Regular expressions may not be the right tool for this job, particularly if the order is not guaranteed to be X, Y, Z. If it is guaranteed, then it's possible, and you can even process the whole file content at once. Unfortunately, MATLAB's regular expression engine does not support captures nested within groups, so the result requires some post-processing:
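% folder and filename are assumed to be defined (path is avoided as a name: it shadows a MATLAB function)
filecontent = fileread(fullfile(folder, filename));
% one match per G0/G1 line, each with three tokens (X, Y, Z); a missing axis gives an empty token
coordinates = regexp(filecontent, '^G[01][^X\n\r]*(X\d+\.\d+)?[^Y\n\r]*(Y\d+\.\d+)?[^Z\n\r]*(Z\d+\.\d+)?.*$', 'tokens', 'dotexceptnewline', 'lineanchors');
coordinates = vertcat(coordinates{:});   % N-by-3 cell array
% drop the leading axis letter, then convert; empty tokens become NaN
coordinates = str2double(cellfun(@(x) x(2:end), coordinates, 'UniformOutput', false));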
The cellfun call strips the X/Y/Z prefix from each token, which the regular expression itself cannot exclude because of the engine limitation mentioned above.
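As a quick check, the same steps can be run on the sample lines from the question instead of a file. This is only a sketch: the sample is built in memory with sprintf, and the variable names pattern and tokens are chosen here for illustration. Assuming unmatched optional tokens come back as empty strings, the result should reproduce the table the question asks for.

```matlab
% Sample G-code from the question, one command per line.
filecontent = sprintf('%s\n', ...
    ';LAYER:0', ...
    'M107', ...
    'G0 F9000 X89.19 Y90.68 Z0.30', ...
    ';TYPE:SKIRT', ...
    'G1 F1200 X89.40 Y91.83 E0.05822', ...
    'G1 X89.67 Y92.97 E0.11703', ...
    'G1 X90.01 Y94.10 E0.17553');

% Same expression as in the answer; it assumes X before Y before Z.
pattern = '^G[01][^X\n\r]*(X\d+\.\d+)?[^Y\n\r]*(Y\d+\.\d+)?[^Z\n\r]*(Z\d+\.\d+)?.*$';
tokens = regexp(filecontent, pattern, 'tokens', 'dotexceptnewline', 'lineanchors');

% Stack the per-line token lists into an N-by-3 cell array.
tokens = vertcat(tokens{:});

% Strip the axis letter; an empty token stays empty and str2double maps it to NaN.
coordinates = str2double(cellfun(@(x) x(2:end), tokens, 'UniformOutput', false));
disp(coordinates)
```

Comment lines and M107 are skipped because they do not start with G0 or G1, so only the four move commands contribute rows.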
Two assumptions are made in the regular expression:
  • X, if present, always comes before Y, if present, which always comes before Z, if present. That restriction can't easily be removed.
  • Coordinates are of the form digits.digits, so values with a leading minus sign are not matched either. That can be changed to allow just digits (or .digits) at the expense of a more complicated regular expression.
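If either assumption does not hold, one alternative to a single expression is to pull out the G0/G1 lines first and then search each axis separately. The sketch below is a different technique from the one in the answer, not a fix to its expression. The sample lines are made up, and moveLines, numberPattern and axisLetters are names chosen here for illustration. It loops over the three axes only, not over the lines, and it assumes the letters X, Y and Z do not also appear in trailing comments on a move line.

```matlab
% Made-up sample: axes out of order, integer and signed values, one non-move line.
filecontent = sprintf('%s\n', 'G1 Y5 X-2.5', 'M107', 'G0 Z.4 F300', 'G1 X10 Y20.25 Z1');

% Keep only the G0/G1 lines, one cell per line, as a column.
moveLines = regexp(filecontent, '^G[01][ \t][^\n\r]*', 'match', 'lineanchors');
moveLines = moveLines(:);

% Optional sign, optional integer part, optional dot, at least one digit.
numberPattern = '-?\d*\.?\d+';

axisLetters = 'XYZ';
coordinates = nan(numel(moveLines), numel(axisLetters));
for k = 1:numel(axisLetters)
    % The look-behind keeps the axis letter out of the match,
    % so no prefix stripping is needed afterwards.
    expr = ['(?<=' axisLetters(k) ')' numberPattern];
    % With 'once', lines lacking this axis give an empty result,
    % which str2double converts to NaN.
    coordinates(:, k) = str2double(regexp(moveLines, expr, 'match', 'once'));
end
disp(coordinates)
```

The trade-off is three passes over the move lines instead of one pass over the file, in exchange for dropping the ordering and number-format assumptions.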
Explanation of the regular expression
^ match the beginning of a line (because of the lineanchors option)
G match G
[01] match 0 or 1
[^X\n\r]* match any character that is not X or a line break, as many times as necessary
(...)? start the first capture, which may be absent; the content of the capture is
X\d+\.\d+ match X, then 1 or more digits, a dot, 1 or more digits
repeat the three lines above for Y and Z
.* match anything except newlines (because of the dotexceptnewline option) for the rest of the line
$ match the end of the line (because of the lineanchors option)
MATLAB is the only regular expression engine I know of (among .NET, C++, Python, Ruby, Java, PHP) where the dot also matches a newline by default. Hence the need for the dotexceptnewline option.
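The difference is easy to see on a two-line string. In this small sketch the sample text and the variable names are made up for illustration:

```matlab
% Two G1 commands separated by a newline.
sample = sprintf('G1 X1\nG1 X2');

% Default behaviour: the dot also consumes the newline,
% so the first match runs to the end of the text.
defaultMatch = regexp(sample, 'G1.*', 'match');

% With dotexceptnewline the dot stops at the line break,
% so each line produces its own match.
perLineMatch = regexp(sample, 'G1.*', 'match', 'dotexceptnewline');

fprintf('default: %d match(es), dotexceptnewline: %d match(es)\n', ...
    numel(defaultMatch), numel(perLineMatch));
```

Without the option, the trailing .* in the expression above would swallow every following line after the first G0/G1 command.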