Sorry about the odd title.
I am using ESearch and ESummary to map accession numbers to GI numbers, and GI numbers to TaxIDs:
accession number --> GI --> TaxID
Assume that 'accessions' is a list of 20 accession numbers (I do 20 at a time because that's the maximum that NCBI will allow).
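A minimal sketch of the 20-at-a-time batching, assuming the full accession list fits in memory. `batched` and `all_accessions` are illustrative names, not Biopython API (Python 3.12+ also ships `itertools.batched`, which yields tuples). One thing worth knowing: ESearch returns at most `retmax` IDs and `retmax` defaults to 20, so a batch larger than 20 needs an explicit `retmax`.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical usage: all_accessions would hold the full 300,000-item list.
# for accessions in batched(all_accessions, 20):
#     ...run the esearch / esummary steps on this batch...
```

The last batch is simply shorter when the list length is not a multiple of the batch size.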
I do:
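from Bio import Entrez
Entrez.email = "you@example.com"  # NCBI asks for a contact address; use your own
handle = Entrez.esearch(db="nucleotide", term=" OR ".join(a + "[accn]" for a in accessions), retmax=len(accessions))
record = Entrez.read(handle)
gids = ",".join(record["IdList"])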
This gives me the 20 corresponding GIs for those 20 accession numbers.
Followed by:
handle = Entrez.esummary(db="nucleotide", id=gids)
record = Entrez.read(handle)
This raises the following error, because one of the GIs in gids has been removed from NCBI:
File ".../biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 191, in endElement
    value = IntegerElement(value)
ValueError: invalid literal for int() with base 10: ''
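The traceback shows Biopython's parser calling `int()` on an Integer item whose text is empty. One possible workaround, sketched here, is to skip `Entrez.read` and parse the raw ESummary XML with the standard library, treating a missing or empty TaxId as `None`. This assumes the classic ESummary layout (`DocSum` elements holding an `Id` and `Item Name="TaxId" Type="Integer"` children); check that against a real response. The function name is hypothetical. `ET.fromstring` accepts either `str` or `bytes`, so it should work whichever one the handle returns.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Union


def taxids_from_esummary(xml_data: Union[str, bytes]) -> Dict[str, Optional[int]]:
    """Map each DocSum <Id> to its TaxId, using None where the value is missing."""
    root = ET.fromstring(xml_data)
    result: Dict[str, Optional[int]] = {}
    for docsum in root.iter("DocSum"):
        uid = (docsum.findtext("Id") or "").strip()
        taxid: Optional[int] = None
        # Only look at the top-level Item children of this DocSum.
        for item in docsum.findall("Item"):
            if item.get("Name") == "TaxId":
                text = (item.text or "").strip()
                # An empty Integer item is what makes int('') fail inside
                # Entrez.read; here it simply becomes None.
                taxid = int(text) if text.isdigit() else None
                break
        result[uid] = taxid
    return result
```

Hypothetical usage: `raw = Entrez.esummary(db="nucleotide", id=gids).read()` followed by `taxids_from_esummary(raw)`, then drop the entries whose value is `None`.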
I could wrap the call in try/except, but that would also throw away the other 19 GIs, which are fine.
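A try/except does not have to discard the whole batch. A minimal sketch, assuming the per-batch call can be wrapped in a function: when a batch raises, split it in half and retry each half recursively, so only the bad ID is dropped. With one bad ID in 20, this costs about a dozen requests rather than 20 single ones, and a clean batch still costs one. `fetch_with_fallback` and `esummary_batch` are hypothetical names. The sketch catches only `ValueError` (the error in the traceback), so network errors still propagate instead of silently skipping IDs.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def fetch_with_fallback(
    ids: List[str],
    fetch_batch: Callable[[List[str]], Dict[str, T]],
) -> Dict[str, T]:
    """Fetch `ids` in one call; if that call raises ValueError, split the
    batch in half and retry each half, so only the bad IDs are skipped."""
    if not ids:
        return {}
    try:
        return fetch_batch(ids)
    except ValueError:
        if len(ids) == 1:
            return {}  # this single ID is the bad one: skip it
        mid = len(ids) // 2
        results = fetch_with_fallback(ids[:mid], fetch_batch)
        results.update(fetch_with_fallback(ids[mid:], fetch_batch))
        return results


# Hypothetical wrapper around the calls from the question:
# def esummary_batch(ids):
#     handle = Entrez.esummary(db="nucleotide", id=",".join(ids))
#     try:
#         records = Entrez.read(handle)
#     finally:
#         handle.close()
#     return {r["Id"]: r for r in records}
```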
My question is:
How do I read 20 records at a time with Entrez.read and skip the missing ones without losing the other 19? Fetching them one at a time would work but would be far too slow: I have 300,000 accession numbers, and NCBI allows only 3 queries per second (in practice I get closer to 1 query per second).
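For scale: 300,000 accessions in batches of 20 is 15,000 batches, and with one ESearch and one ESummary request per batch that is 30,000 requests, about 8.3 hours at 1 request per second or about 2.8 hours at 3 per second. One-at-a-time summaries alone would be 300,000 requests. Recent Biopython versions already space out Entrez requests to respect NCBI's limit, so an explicit throttle may be redundant; if you do want one (for example, to pace a loop yourself), a minimal sketch follows. The `Throttle` class is hypothetical, and the injectable clock and sleep exist only so it can be tested without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Block so that successive wait() calls are at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the previous call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Creating `Throttle(1 / 3)` and calling `wait()` right before each Entrez request keeps the loop at or below three requests per second.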