I'm using Ubuntu 10.04 and Perl 5.10.1. The BioPerl package has some nice scripts, such as bp_genbank2gff3.pl, which converts files from GenBank format to GFF3 format.

The problem: I get unexpected results when using bp_genbank2gff3.pl: the gene features get "Name=" instead of "locus_tag=" in the last GFF3 column.

A dear BioPerl mailing list member told me he uses the latest BioPerl version from the BioPerl repository and gets the correct result ("locus_tag="). I got a fresh copy, but it didn't work for me. Weird!

Steps to recreate the situation:

$ cd ~/src
$ git clone http://github.com/bioperl/bioperl-live.git
$ export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
$ cd /tmp
$ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
$ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk

The following is line #8 of my resulting GFF3:

NC_009789    GenBank    gene    665    781    .    -    1    ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001

while this is the same line from my colleague's results:

NC_009789    GenBank    gene    665    781    .    -    1    ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001

Note that the "Name=" tag in my version (at the end of the line) is replaced by "locus_tag=" in my colleague's. I have no idea what is going on here... Same input, presumably the same script, but different outputs (the output my colleague gets is the desirable one). We even diffed the scripts (genbank2gff3.PLS), and they are identical.

Any ideas? Could anyone check whether they get the same results as me or as my colleague?
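For anyone who wants to compare their own output, here is a minimal sketch that tallies which attribute keys appear in column 9 of the "gene" features. It assumes the GFF3 file is tab-delimited as the format requires; the function name gff3_gene_attr_keys is made up for illustration and is not part of BioPerl.

```shell
# List the attribute keys used on "gene" features of a GFF3 file,
# one key per line, followed by a tab and the number of occurrences.
gff3_gene_attr_keys() {
    awk -F'\t' '
        $3 == "gene" {
            # Column 9 holds semicolon-separated key=value pairs.
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                split(attrs[i], kv, "=")
                if (kv[1] != "") count[kv[1]]++
            }
        }
        END { for (k in count) print k "\t" count[k] }
    ' "$1" | LC_ALL=C sort
}

# Usage (the output file name is an assumption):
#   gff3_gene_attr_keys NC_009789.gbk.gff
```

If the output lists Name but not locus_tag, you are seeing the same behaviour as I am.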

+3  A: 

Looking at the script source:

#?? should gene_name from /locus_tag,/gene,/product,/transposon=xxx
# be converted to or added as  Name=xxx (if not ID= or as well)
## problematic: convert_to_name ($feature); # drops /locus_tag,/gene, tags
convert_to_name($feature); 

And in convert_to_name:

elsif ($g->has_tag('locus_tag')) {
    ($gene_id) = $g->get_tag_values('locus_tag');
    $g->remove_tag('locus_tag');
    $g->add_tag_value('Name', $gene_id);
}

So it looks like the script is doing what it’s supposed to do?

zoul
What the script is supposed to do is debatable. I think it should include a "locus_tag" tag if one exists. The whole thing started when I wanted to ask for this feature to be added, and one of the BioPerl coders told me that for him it already works like that (i.e. locus tags are shown). The question is who's got it wrong... I just want to make sure it's not something weird on my side.
David B