I'm using Ubuntu 10.04 and Perl 5.10.1. The BioPerl package has some nice scripts, such as bp_genbank2gff3.pl, which converts files from GenBank format to GFF3.
The problem: I get unexpected results when using bp_genbank2gff3.pl: the gene features get "Name=" instead of "locus_tag=" in the last GFF3 column.
A dear BioPerl mailing list member told me he uses the latest BioPerl version from the BioPerl repository and gets the correct result ("locus_tag="). I got a fresh copy, but it didn't work for me. Weird!
Steps to recreate the situation:
$ cd ~/src
$ git clone http://github.com/bioperl/bioperl-live.git
$ export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
$ cd /tmp
$ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
$ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk
Following is line #8 of the resulting GFF3:
NC_009789 GenBank gene 665 781 . - 1 ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
while this is the same line from my colleague's results:
NC_009789 GenBank gene 665 781 . - 1 ID=EcE24377A_B0001;Dbxref=GeneID:5585816;**locus_tag**=EcE24377A_B0001
Note that the "Name=" tag in my version (at the end of the line) is replaced by "locus_tag=" in my colleague's output.
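To compare the two outputs less by eye, a rough sketch like the one below could list the attribute keys found in the last column of each feature line. The function name `gff3_attr_keys` is made up for illustration, and the sketch assumes the real files are tab-separated with the attributes in column 9, as the GFF3 format specifies.

```shell
#!/bin/sh
# Hypothetical helper: print the attribute keys (ID, Dbxref, Name, ...)
# from column 9 of each tab-separated GFF3 feature line read on stdin.
# Comment lines (starting with '#') and short lines are skipped.
gff3_attr_keys() {
    awk -F'\t' '!/^#/ && NF >= 9 {
        n = split($9, attrs, ";")
        for (i = 1; i <= n; i++) {
            if (attrs[i] != "") {
                split(attrs[i], kv, "=")
                print kv[1]
            }
        }
    }'
}

# The two versions of line #8 quoted above, rebuilt with tab separators.
prefix='NC_009789\tGenBank\tgene\t665\t781\t.\t-\t1\t'
printf "${prefix}%s\n" 'ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001' | gff3_attr_keys
printf "${prefix}%s\n" 'ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001' | gff3_attr_keys
```

Running the whole output through `gff3_attr_keys | sort | uniq -c` on both machines would show whether the difference is limited to gene features or affects other keys too.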
I have no idea what is going on here... Same input, presumably same script, but different outputs (the output my colleague gets is the desirable one). We even diffed the scripts (genbank2gff3.PLS), which are identical.
Any ideas? Could anyone check whether they get the same results as me or as my colleague?