MATLAB: How to find the symbol given the gene id


ID and name conversion is a common task in bioinformatics. In this problem, you will write a function symbol=geneidtosymbol(id,filename) that returns the symbol of a gene, given its GeneID. The GeneID-to-symbol conversion should be looked up from a file named "gene_info.txt". Each line in this file contains tab-delimited information for one gene. The first line of the file specifies what type of information is available in each column. Download and use the file available from http://sacan.biomed.drexel.edu/ftp/bmes201/final.20123/gene_info.txt (which contains the first 100 lines of the file available from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz).
If filename is not given, use gene_info.txt. If it is given (it may be different from gene_info.txt), use the filename provided as input.
Here is what I have so far:
function out = geneidtosymbol(x)
fid=fopen('gene_info.txt','r'); %open file
if fid<0
    fprintf('I am not able to open the pdb file');
    out=[];
    return;
end
symbol=[];
if ~feof(fid)
    line=fgetl(fid);
    str2num(line(3:10)) = x;
    line=strsplit(line);
    symbol=line{3};
end
out = symbol;

Best Answer

S - lookfor searches for a keyword in the help entries of all functions; it does not search for a substring within another string. Your line of code
lookfor(x,line)= id;
is probably generating the error "Undefined function or variable 'id'." because you are trying to use the variable id before it has been defined. And even if it were defined, it is unclear why you are attempting an assignment. What is the intent of this line?
Since you want to find a string within another string, you should use strfind instead:
while ~feof(fid)
    % get the next line of the file
    line = fgetl(fid);
    % does this line contain the gene id? (x must be a character vector)
    if ~isempty(strfind(line, x))
        % split on whitespace
        line = strsplit(line);
        % third element is the symbol
        symbol = line{3};
        % since the symbol is found, exit
        break;
    end
end
% close the file
fclose(fid);
Note that once the symbol is found, we break out of the while loop and close the file, since we assume there is only one symbol per gene id.
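One caveat with searching the whole line is that strfind returns the start index of every match (or an empty array when there is none), so a short id can also match inside a longer id or inside another column. The snippet below is only an illustration of that behaviour: the sample line and its values are made up, and the column order (tax_id, GeneID, Symbol) is an assumption about the file layout.

```matlab
% Hypothetical tab-delimited line: tax_id, GeneID, Symbol (values are made up)
line = sprintf('9606\t12345\tABC1');

% strfind returns the start index of each match, or [] when there is none
idx  = strfind(line, '234');    % matches inside the longer id 12345
none = strfind(line, '777');    % no match, so this is empty

% Comparing the whole GeneID field avoids the partial match
fields     = strsplit(line, '\t');
partialHit = ~isempty(idx);             % true, although '234' is not the GeneID
exactHit   = strcmp(fields{2}, '234');  % false, since the field is '12345'
```

If partial matches are a concern for your data, comparing the second field with strcmp is the safer test.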
Make sure you adjust your code to handle a different data file, as per the instruction "If filename is not given, use gene_info.txt. If it is given (may be different than gene_info.txt), use the filename provided as input." So you will need to add the input parameter filename.
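Putting the pieces together, a minimal sketch of the whole function might look like the following. It is one possible approach, not the only one. It assumes that the second column holds the GeneID and the third holds the symbol, that the first line is the header described in the problem, and that id may be passed either as a number or as text (the assignment does not say which). It compares the whole GeneID field rather than searching the line for a substring.

```matlab
function symbol = geneidtosymbol(id, filename)
% GENEIDTOSYMBOL  Look up a gene symbol by GeneID in a tab-delimited file.
% Assumes column 2 holds the GeneID and column 3 holds the symbol.

    % fall back to the default file when no filename is supplied
    if nargin < 2 || isempty(filename)
        filename = 'gene_info.txt';
    end

    % accept the id as a number or as text (an assumption)
    if isnumeric(id)
        id = num2str(id);
    end

    symbol = '';   % stays empty when the id is not found
    fid = fopen(filename, 'r');
    if fid < 0
        fprintf('Unable to open the file %s\n', filename);
        return;
    end

    fgetl(fid);    % skip the header line that names the columns
    while ~feof(fid)
        fields = strsplit(fgetl(fid), '\t');
        % compare the whole GeneID field, not a substring of the line
        if numel(fields) >= 3 && strcmp(fields{2}, id)
            symbol = fields{3};
            break;
        end
    end
    fclose(fid);
end
```

Calling geneidtosymbol(id) reads gene_info.txt from the current folder, while geneidtosymbol(id, filename) reads whichever file you name.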