I have a text file that I want to turn into a list. I have asked two questions recently about this topic. The problem I keep running into is that I want to parse the text file, but its sections are of different lengths, so I cannot use
textscan(fid,'%s %s %s')
because the length of each gene entry varies. I have also had trouble using fields: the code I use to set up the fields only allows one line in each field. For the "note" field in the first gene below, I would like to store multiple lines in one field and be able to read them back in. Currently I am getting an "Index exceeds matrix dimensions" error.
fieldname = regexp(line{1},'/(.+)=','tokens','once');
value = regexp(line{1},'="?([^"]+)"?$','tokens','once');
Another possible approach would be some sort of isLineEmpty check that divides up the genes by the empty line between them. Is there a way to have multiple lines in a field entry so that I can get all the information associated with "note"? Or is there a way to use an isLineEmpty check and skip using fields altogether?
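To make the isLineEmpty idea concrete, here is a minimal sketch. It assumes the file has already been read into a cell array of strings (for example with fgetl in a loop) and that genes really are separated by blank lines. The names lines, isLineEmpty, blockId and genes are made up for illustration, and the sample lines are a shortened stand-in for the real file.

```matlab
% Shortened stand-in for the file contents (illustrative data only);
% the empty entry marks the boundary between two genes.
lines = {'gene 218705..219367', ...
         '/locus_tag="Rv0187"', ...
         '', ...
         'gene 219486..219917', ...
         '/locus_tag="Rv0188"'};

% Hypothetical "isLineEmpty": true for lines that are empty or whitespace only.
isLineEmpty = cellfun(@(s) isempty(strtrim(s)), lines);

% Every blank line bumps the block counter, so each line gets a block number.
blockId = cumsum(isLineEmpty) + 1;

% Collect the non-blank lines of each block into its own cell.
nBlocks = max(blockId);
genes = cell(1, nBlocks);
for k = 1:nBlocks
    genes{k} = lines(blockId == k & ~isLineEmpty);
end
```

With this, genes{1} holds the lines of the first gene and genes{2} those of the second. Two consecutive blank lines, or a blank line at the very end, would produce an empty block, so a real file might need those filtered out.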
gene 218705..219367
/locus_tag="Rv0187"
/db_xref="GeneID:886779"
CDS 218705..219367
/locus_tag="Rv0187"
/EC_number="2.1.1.-"
/function="THOUGHT TO BE INVOLVED IN TRANSFER OF METHYL
GROUP."
/note="Rv0187, (MTCI28.26), len: 220 aa. Probable
O-methyltransferase (EC 2.1.1.-), similar to many e.g.
AB93458.1|AL357591 putative O-methyltransferase from
Streptomyces coelicolor (223 aa); MDMC_STRMY|Q00719
O-methyltransferase from Streptomyces mycarofaciens (221
aa), FASTA scores: opt: 327, E(): 2.4e-17, (35.9% identity
in 192 aa overlap). Also similar to Rv1703c, Rv1220c from
Mycobacterium tuberculosis."
/codon_start=1
/transl_table=11
/product="O-methyltransferase"
/protein_id="NP_214701.1"
/db_xref="GI:15607328"
/db_xref="GeneID:886779"
gene 219486..219917
/locus_tag="Rv0188"
/db_xref="GeneID:886776"
CDS 219486..219917
/locus_tag="Rv0188"
/function="UNKNOWN"
/experiment="experimental evidence, no additional details
recorded"
/codon_start=1
/transl_table=11
/product="transmembrane protein"
/protein_id="NP_214702.1"
/db_xref="GI:15607329"
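One likely cause of the index error is that a continuation line such as GROUP." contains no /name= part, so regexp returns an empty result and indexing into it fails. A minimal sketch of one way around this is to treat any line that arrives while a quoted value is still open as a continuation of that value. It assumes every qualifier has the form /name=value, that a line not starting with "/" outside an open quote is a feature header (key followed by location), and that values never contain embedded quotes. The variable names and the N-by-2 cell layout for qualifiers are my own choices, not anything required by the file format.

```matlab
% A few lines from the sample above, standing in for the file contents.
lines = {'CDS 218705..219367', ...
         '/locus_tag="Rv0187"', ...
         '/function="THOUGHT TO BE INVOLVED IN TRANSFER OF METHYL', ...
         'GROUP."', ...
         '/codon_start=1', ...
         '/db_xref="GI:15607328"', ...
         '/db_xref="GeneID:886779"'};

features = {};     % cell array with one struct per feature (gene, CDS, ...)
inQuote = false;   % true while a quoted value has not been closed yet

for i = 1:numel(lines)
    txt = strtrim(lines{i});
    if isempty(txt)
        continue                          % blank lines carry no data
    end
    if inQuote
        % Continuation line: append it to the most recent qualifier value.
        q = features{end}.qualifiers;
        q{end, 2} = [q{end, 2} ' ' txt];
        inQuote = (txt(end) ~= '"');
    elseif txt(1) == '/'
        % New qualifier: split "/name=value" at the first "=".
        tok = regexp(txt, '^/([^=]+)=(.*)$', 'tokens', 'once');
        value = tok{2};
        q = features{end}.qualifiers;
        q(end+1, :) = {tok{1}, value};
        % The value stays open if it starts with a quote but does not end with one.
        inQuote = ~isempty(value) && value(1) == '"' && ...
                  (numel(value) == 1 || value(end) ~= '"');
    else
        % Feature header such as "CDS 218705..219367": start a new feature.
        tok = regexp(txt, '^(\S+)\s+(\S+)$', 'tokens', 'once');
        features{end+1} = struct('key', tok{1}, 'location', tok{2}, ...
                                 'qualifiers', {cell(0, 2)});
        continue
    end
    if ~inQuote
        % Value is complete: drop the surrounding double quotes, if any.
        q{end, 2} = regexprep(q{end, 2}, '^"|"$', '');
    end
    features{end}.qualifiers = q;
end
```

Each feature then carries an N-by-2 cell array of name/value pairs, so repeated names such as db_xref simply become several rows. For example, with q = features{1}.qualifiers, the expression q(strcmp(q(:,1), 'db_xref'), 2) returns all db_xref values. I chose a cell array over struct fields because qualifier names like "function" clash with MATLAB keywords and names can repeat.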