ansaurus

Question

How can I talk to UniProt over HTTP in Python?

Answer 1

+3 A:

question #1:

This can be done using Python's urllib and urllib2 modules:

import urllib, urllib2
import time
import sys

# skip sys.argv[0] (the script name); the remaining arguments are the IDs to map
query = ' '.join(sys.argv[1:])

# encode params as a list of 2-tuples
params = ( ('from','ACC'), ('to', 'P_REFSEQ_AC'), ('format','tab'), ('query', query))
# url encode them
data = urllib.urlencode(params)    
url = 'http://www.uniprot.org/mapping/'

# fetch the data
try:
    foo = urllib2.urlopen(url, data)
except urllib2.HTTPError, e:
    if e.code == 503:
        # the service is busy: wait as long as the Retry-After header asks, then try once more
        wait_time = int(e.hdrs.get('Retry-After', 0))
        print 'Sleeping %i seconds...' % (wait_time,)
        time.sleep(wait_time)
        foo = urllib2.urlopen(url, data)
    else:
        # any other HTTP error would leave foo undefined, so re-raise it
        raise


# foo is a file-like object, do with it what you will.
foo.read()
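If you are on Python 3, where urllib2 no longer exists, the same wait-and-retry idea might look roughly like the sketch below. The function name `fetch_with_retry` and its `max_retries`, `opener` and `sleep` parameters are my own inventions for illustration (the last two only exist so the logic can be exercised without a network); it also assumes the Retry-After header carries a number of seconds and falls back to no wait otherwise.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, data=None, max_retries=3,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Open url (POST if data is given), retrying while the server answers 503.

    Hypothetical helper: opener and sleep are injectable so the retry
    logic can be tested offline.
    """
    for attempt in range(max_retries + 1):
        try:
            return opener(url, data)
        except urllib.error.HTTPError as e:
            # Give up on anything other than 503, or once retries are used up.
            if e.code != 503 or attempt == max_retries:
                raise
            try:
                wait_time = int(e.headers.get("Retry-After", 0))
            except (TypeError, ValueError):
                # Header was not a plain number of seconds (e.g. an HTTP date).
                wait_time = 0
            sleep(wait_time)
```

Called as `fetch_with_retry(url, data)` with `data` being the url-encoded parameters as bytes, it returns the same kind of file-like object as above.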

vezult 2009-04-03 20:28:44

I'm afraid it doesn't work (even after replacing that '=' with '==')... Thanks.:)

R S 2009-04-03 20:40:39

Answer 2

+1 A:

Let's assume that you are using Python 2.5. We can use httplib to directly call the web site:

import httplib, urllib
querystring = {}
#Build the query string here from the following keys (query, format, columns, compress, limit, offset)
querystring["query"] = "" 
querystring["format"] = "" # one of html | tab | fasta | gff | txt | xml | rdf | rss | list
querystring["columns"] = "" # the columns you want, comma separated
querystring["compress"] = "" # yes or no
## These may be optional
querystring["limit"] = "" # I guess if you only want a few rows
querystring["offset"] = "" # bring on paging 

##From the examples - query=organism:9606+AND+antigen&format=xml&compress=no
##Delete the following and replace with your query
querystring = {}
querystring["query"] =  "organism:9606 AND antigen" 
querystring["format"] = "xml" #make it human readable
querystring["compress"] = "no" #I don't want to have to unzip

conn = httplib.HTTPConnection("www.uniprot.org")
conn.request("GET", "/uniprot/?"+ urllib.urlencode(querystring))
r1 = conn.getresponse()
if r1.status == 200:
    data1 = r1.read()
    print data1  #or do something with it

You could then make a function around creating the query string and you should be away.
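A rough sketch of such a function, written for Python 3 (`urllib.parse`) rather than 2.5, which is an assumption on my part. The name `build_query_url` and its defaults are made up for illustration; the keys and the base URL are just the ones used above, and the service may not accept them in this form any more.

```python
from urllib.parse import urlencode


def build_query_url(query, fmt="tab", columns=None, compress="no",
                    limit=None, offset=None,
                    base="http://www.uniprot.org/uniprot/"):
    """Build a GET URL from the query keys listed above (hypothetical helper)."""
    params = {
        "query": query,
        "format": fmt,
        # columns is an optional list of column names, sent comma separated
        "columns": ",".join(columns) if columns else None,
        "compress": compress,
        "limit": limit,
        "offset": offset,
    }
    # Leave out every key that was not given a value.
    params = {key: value for key, value in params.items() if value is not None}
    return base + "?" + urlencode(params)


print(build_query_url("organism:9606 AND antigen", fmt="xml"))
```

Leaving out the unset keys keeps the URL short, and passing `limit` and `offset` lets you fetch the result a page at a time.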

Andrew Cox 2009-04-03 20:49:38

Unfortunately this has the same effect as my other attempts: it hangs for minutes. Also, it's "GET", which to my understanding limits the request to what fits in a URL.

R S 2009-04-03 21:06:31

I think this is what the limit and columns query values are for; since I don't know what you need returned, I can't suggest good values for them. GET limits what you send, not what you receive.
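To make the send-side difference concrete, here is a small Python 3 sketch that builds (but does not send) the same parameters as a GET and as a POST. `build_requests` is a name I made up, and whether the server accepts a POST at this path is an assumption I have not checked.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_requests(base_url, params):
    """Return (GET request, POST request) carrying the same parameters."""
    encoded = urlencode(params)
    # GET: the parameters travel in the URL, so URL length limits apply.
    get_req = Request(base_url + "?" + encoded)
    # POST: the parameters travel in the request body instead.
    post_req = Request(base_url, data=encoded.encode("ascii"))
    return get_req, post_req


get_req, post_req = build_requests(
    "http://www.uniprot.org/uniprot/",
    {"query": "organism:9606 AND antigen", "format": "xml"},
)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

Either way, the size of the response is unaffected; only the request side differs.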

Andrew Cox 2009-04-03 22:07:38

Answer 3

+1 A:

You're probably better off using the Protein Identifier Cross Reference service from the EBI to convert one set of IDs to another. It has a very good REST interface.

http://www.ebi.ac.uk/Tools/picr/

I should also mention that UniProt has very good web services available, though if you are tied to using simple HTTP requests for some reason, they are probably not useful to you.

2009-09-08 11:00:26
