urllib2

Unescape Python Strings From HTTP

I've got a string from an HTTP header, but it's been escaped. What function can I use to unescape it? myemail%40gmail.com -> myemail@gmail.com Would urllib.unquote() be the way to go? ...
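
A minimal sketch of the unescaping asked about above. It assumes Python 3, where the function lives in `urllib.parse`; in Python 2 the equivalent call is `urllib.unquote()`, as the question suggests.

```python
from urllib.parse import unquote

# The escaped value from the question; %40 is the percent-encoding of "@".
escaped = "myemail%40gmail.com"
unescaped = unquote(escaped)
print(unescaped)
```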

Python interface to PayPal - urllib.urlencode non-ASCII characters failing

I am trying to implement PayPal IPN functionality. The basic protocol is as follows: The client is redirected from my site to PayPal's site to complete payment. The client logs into their account and authorizes payment. PayPal calls a page on my server, passing in details as POST. Details include a person's name, address, payment info, etc. I need t...
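
One possible way to avoid encoding failures with non-ASCII values is to turn each value into bytes before calling `urlencode`. The sketch below assumes Python 3 and UTF-8; the field names and values are hypothetical, and the charset the sender actually uses may differ.

```python
from urllib.parse import urlencode

# Hypothetical IPN-style fields, made up for illustration only.
fields = {"first_name": "José", "address_city": "Zürich"}

# Encode every value to UTF-8 bytes (an assumption) so that
# percent-encoding operates on a known byte sequence.
encoded = urlencode({key: value.encode("utf-8") for key, value in fields.items()})
print(encoded)
```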

Changing user agent on urllib2.urlopen

How can I download a webpage with a user agent other than the default one on urllib2.urlopen? ...
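
A sketch of one common approach: attach a `User-Agent` header to a `Request` object and pass that object to `urlopen`. It assumes Python 3 (`urllib.request`); in Python 2 the same `Request` class is in `urllib2`. The URL and the agent string are placeholders.

```python
from urllib.request import Request

url = "http://example.com/"  # placeholder URL
# "MyCrawler/1.0" is a hypothetical agent string.
request = Request(url, headers={"User-Agent": "MyCrawler/1.0"})

# Passing `request` to urlopen() would send the custom header;
# no network call is made here. Header names are stored capitalized.
print(request.get_header("User-agent"))
```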

reading a stream made by urllib2 never recovers when connection got interrupted

While trying to make one of my Python applications a bit more robust in case of connection interruptions, I discovered that calling the read function of an HTTP stream made by urllib2 may block the script forever. I thought that the read function would time out and eventually raise an exception, but this does not seem to be the case when t...

Form Submission in Python Without Name Attribute

Background: Using urllib and urllib2 in Python, you can do a form submission. You first create a dictionary. formdictionary = { 'search' : 'stackoverflow' } Then you use the urlencode method of urllib to transform this dictionary. params = urllib.urlencode(formdictionary) You can now make a URL request with urllib2 and pass the var...
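
The steps in the background above can be sketched as follows, assuming Python 3 (`urllib.parse` and `urllib.request` in place of `urllib` and `urllib2`). The target URL is a placeholder, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The dictionary from the question.
formdictionary = {"search": "stackoverflow"}

# In Python 3 the request body must be bytes, so encode the result.
params = urlencode(formdictionary).encode("ascii")

# Supplying data makes the request a POST; the URL is a placeholder.
request = Request("http://example.com/search", data=params)
print(request.get_method())
```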

Downloading a web page and all of its resource files in Python

I want to be able to download a page and all of its associated resources (images, style sheets, script files, etc) using Python. I am (somewhat) familiar with urllib2 and know how to download individual urls, but before I go and start hacking at BeautifulSoup + urllib2 I wanted to be sure that there wasn't already a Python equivalent to...

Downloading file using post method and python

I need a little help getting a tar file to download from a website. The website is set up as a form where you pick the file you want and click submit, and then the download window opens up for you to pick the location. I'm trying to do the same thing in code (so I don't have to manually pick each file). So far I have gotten Python 2.5.2 t...

Windows Authentication with Python and urllib2

Hi, I want to grab some data off a webpage that requires my windows username and password. So far, I've got: opener = build_opener() try: page = opener.open("http://somepagewhichneedsmywindowsusernameandpassword/") print page except URLError: print "Oh noes." Is this supported by urllib2? I've found Python NTLM, but that...

How to show non-ASCII characters in Python?

Hi, I'm using the Python Shell in this way: >>> s = 'Ã' >>> s '\xc3' How can I print the s variable to show the character Ã??? This is the first and easiest question. Really, I'm getting the content from a web page that has non-ASCII characters like the previous one and others with accents like á, é, í, ñ, etc. Also, I'm trying to execute a re...
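
A small sketch of the first part of the question, assuming Python 3 and assuming the byte `\xc3` came from a Latin-1 terminal (in UTF-8 the same character would be two bytes). Decoding the byte string yields text that prints as the character itself.

```python
# The single byte shown in the shell session above.
raw = b"\xc3"

# Latin-1 is an assumption based on the one-byte representation.
text = raw.decode("latin-1")
print(text)
```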

wget vs. Python's urlretrieve

I have a task to download gigabytes of data from a website. The data is in the form of .gz files, each file being 45 MB in size. The easy way to get the files is to use "wget -r -np -A files url". This will download data recursively and mirror the website. The download rate is very high, 4 MB/sec. But, just to play around, I was also using p...

How do I draw out specific data from an opened url in Python using urllib2?

I'm new to Python and am playing around with making a very basic web crawler. For instance, I have made a simple function to load a page that shows the high scores for an online game. So I am able to get the source code of the html page, but I need to draw specific numbers from that page. For instance, the webpage looks like this: http:...

What is this function doing in Python involving urllib2 and BeautifulSoup?

So I asked a question earlier about retrieving high scores from an HTML page, and another user gave me the following code to help. I am new to Python and BeautifulSoup, so I'm trying to go through some other code piece by piece. I understand most of it, but I don't get what this piece of code is and what its function is: def parse_stri...

Why am I getting "'ResultSet' has no attribute 'findAll'" using BeautifulSoup in Python?

So I am learning Python slowly, and am trying to make a simple function that will draw data from the high scores page of an online game. This is someone else's code that I rewrote into one function (which might be the problem), but I am getting this error. Here is the code: >>> from urllib2 import urlopen >>> from BeautifulSoup import B...

User Authentication And Text Parsing in Python

Well, I am working on a multistage program, and I am having trouble getting the first stage done. What I want to do is log on to Twitter.com, and then read all the direct messages on the user's page. Eventually I am going to be reading all the direct messages looking for certain things, but that shouldn't be hard. This is my code so far ...

Python urllib2 timeout when using Tor as proxy?

Hi, I am using Python's urllib2 with Tor as a proxy to access a website. When I open the site's main page it works fine, but when I try to view the login page (not actually log in, just view it) I get the following error... URLError: <urlopen error (10060, 'Operation timed out')> To counteract this I did the following: import soc...

How to use cookielib with httplib in python?

In Python, I'm using httplib because it keeps the HTTP connection alive (as opposed to urllib(2)). Now, I want to use cookielib with httplib, but they seem to hate each other!! (There is no way to interface them together.) Does anyone know of a solution to that problem? ...

urllib2 read to Unicode

I need to store the content of a site that can be in any language. And I need to be able to search the content for a Unicode string. I have tried something like: import urllib2 req = urllib2.urlopen('http://lenta.ru') content = req.read() The content is a byte stream, so I can search it for a Unicode string. I need some way that wh...
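
One way to approach this is to read the charset from the `Content-Type` header and decode the bytes with it. The sketch below assumes Python 3; `decode_body` is a hypothetical helper, the UTF-8 fallback is an assumption, and the sample bytes stand in for a real response body. On a live Python 3 response, `response.headers.get_content_charset()` should give the same charset lookup.

```python
from email.message import Message


def decode_body(raw, content_type, default="utf-8"):
    """Decode raw bytes using the charset named in a Content-Type value."""
    header = Message()
    header["Content-Type"] = content_type
    # Fall back to `default` (an assumption) when no charset is declared.
    charset = header.get_content_charset(failobj=default)
    # An unknown charset name would raise LookupError here.
    return raw.decode(charset, errors="replace")


# Stand-in for the bytes returned by read() on a Russian-language page.
sample = "Привет".encode("windows-1251")
decoded = decode_body(sample, "text/html; charset=windows-1251")
print(decoded)
```

Once decoded, the content is ordinary text, so it can be searched for a Unicode string or re-encoded to UTF-8 for storage.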

Does urllib2 in Python 2.6.1 support proxy via HTTPS?

Does urllib2 in Python 2.6.1 support proxy via HTTPS? I've found the following at http://www.voidspace.org.uk/python/articles/urllib2.shtml: NOTE Currently urllib2 does not support fetching of https locations through a proxy. This can be a problem. I'm trying to automate logging in to a web site and downloading a document, I have ...

Python 2.6 - Upload zip file - Poster 0.4

Hi Folks, I came here via this question: http://stackoverflow.com/questions/68477/send-file-using-post-from-a-python-script And by and large it's what I need, plus some additions. Besides the zip file, some additional information is needed, and the POST_DATA looks something like this: POSTDATA =-----------------------------293432744627...

HTTPS log in with urllib2

I currently have a little script that downloads a webpage and extracts some data I'm interested in. Nothing fancy. Currently I'm downloading the page like so: import commands command = 'wget --output-document=- --quiet --http-user=USER --http-password=PASSWORD https://www.example.ca/page.aspx' status, text = commands.getstatusoutput(co...