MATLAB: Why am I unable to read in GenBank data that does not contain the REFERENCE field in Bioinformatics Toolbox

Bioinformatics Toolbox, genbank, genpept, getgenbank, getgenpept

I am trying to read records from the GenBank database with the GETGENBANK function, but the records do not contain REFERENCE fields. When I run the following code:
g = getgenbank('XM_125709');
I receive the following warning and the data is incomplete:
Warning: Problems reading the GenBank data. The structure may be incomplete.
> In genbankread at 262
In bioinfo\private\getncbidata at 204
In getgenbank at 69

Best Answer

As stated in the release notes for the GenBank format, the REFERENCE field is mandatory:
REFERENCE - Citations for all articles containing data
reported in this entry. Includes seven subkeywords and may repeat.
Mandatory keyword/one or more records.
GENBANKREAD therefore expects every record to contain a REFERENCE field, and it reports the structure as incomplete when it cannot find one. To work around this problem, you can modify the GENBANKREAD file (genbankread.m). Change line 154 of the file from:
while isempty(regexp(gbtext(ln+1,:),'REFERENCE\s+(\w|\W)+','once'))
to
while isempty(regexp(gbtext(ln+1,:),'REFERENCE\s+(\w|\W)+','once')) ...
&& isempty(regexp(gbtext(ln+1,:),'COMMENT\s+(\w|\W)+','once'))
This way, even if a REFERENCE field is not found, the loop stops at the COMMENT field and MATLAB continues processing the rest of the file from there.
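To see how the modified condition behaves, here is a minimal standalone sketch that runs the same two regular expressions over a small, made-up record. The record text is hypothetical and only imitates the GenBank layout. The variable names gbtext and ln follow the snippet above, but this is not the actual GENBANKREAD code, and the extra bounds check on ln is an addition for this example so that the loop cannot run past the last line.

```matlab
% Hypothetical miniature record with no REFERENCE field.
% char() pads the rows to equal length, giving a char matrix like gbtext.
gbtext = char( ...
    'LOCUS       XM_125709', ...
    'DEFINITION  example definition line', ...
    '            continued definition', ...
    'COMMENT     example comment text', ...
    'FEATURES             Location/Qualifiers');

ln = 1;
% Advance while the next line starts neither a REFERENCE nor a COMMENT field.
% The first test (ln < number of rows) is an added safeguard for this sketch.
while ln < size(gbtext,1) ...
        && isempty(regexp(gbtext(ln+1,:),'REFERENCE\s+(\w|\W)+','once')) ...
        && isempty(regexp(gbtext(ln+1,:),'COMMENT\s+(\w|\W)+','once'))
    ln = ln + 1;
end

% Show the line on which the loop stopped.
if ln < size(gbtext,1)
    disp(strtrim(gbtext(ln+1,:)))
end
```

With only the original REFERENCE test, the loop would have no line to stop on in a record like this one. Before editing the toolbox file, it is worth running `which genbankread` to confirm which copy MATLAB is using, and keeping a backup of the original.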