urllib

parse query string with urllib in Python 2.4

Using Python 2.4.5 (don't ask!) I want to parse a query string and get a dict in return. Do I have to do it "manually", like this?

>>> qs = 'first=1&second=4&third=3'
>>> d = dict([x.split("=") for x in qs.split("&")])
>>> d
{'second': '4', 'third': '3', 'first': '1'}

I didn't find any useful method in urlparse. ...
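A sketch of the standard-library route, assuming plain key=value pairs: the cgi module has shipped parse_qs since well before 2.4, though it returns a list per key, so single values need flattening.

import cgi

qs = 'first=1&second=4&third=3'
parsed = cgi.parse_qs(qs)     # {'second': ['4'], 'third': ['3'], 'first': ['1']}
d = dict((k, v[0]) for k, v in parsed.items())
print d                       # {'second': '4', 'third': '3', 'first': '1'}

parse_qs keeps lists because a key may legally repeat in a query string; the manual split-based one-liner silently keeps only the last duplicate.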

How to auto log into gmail atom feed with Python?

Gmail has this sweet thing going on to get an atom feed:

def gmail_url(user, pwd):
    return "https://"+str(user)+":"+str(pwd)+"@gmail.google.com/gmail/feed/atom"

Now when you do this in a browser, it authenticates and forwards you. But in Python, at least the way I'm trying it, it isn't working right. url = gmail_url(settings.USER, setting...
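A sketch of the usual fix: urllib/urllib2 do not turn user:pass@host in the URL into HTTP credentials the way browsers do, so the login has to be installed explicitly as Basic auth (URL taken from the question):

import urllib2

def gmail_feed(user, pwd):
    url = 'https://gmail.google.com/gmail/feed/atom'
    mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, user, pwd)
    opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(mgr))
    return opener.open(url).read()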

Python3: ssl cert information

I have been trying to get information about expired SSL certificates using Python 3, and it would be nice to get as verbose a workup as possible. Any takers? So far I have been trying to use urllib.request to get this info (to no avail); does this strike anyone as foolish? I have seen some examples of similar work using o...
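urllib.request never surfaces the peer certificate, so going one layer down to the ssl module may be the way; a minimal sketch (Python 3.4+, example.com is a placeholder host):

import socket
import ssl

hostname = 'example.com'
ctx = ssl.create_default_context()
try:
    with socket.create_connection((hostname, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as ssock:
            # dict with subject, issuer, serialNumber, notBefore, notAfter, ...
            print(ssock.getpeercert())
except ssl.SSLError as exc:
    # an expired certificate fails verification, and the error text says why
    print('handshake failed:', exc)

For a certificate that cannot be verified at all, ssl.get_server_certificate((hostname, 443)) still fetches the PEM text, which can then be dissected with something like pyOpenSSL.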

How can urllib2 / httplib talk HTTP 1.1 for HTTPS connections via a Squid proxy?

When I use urllib2 to make an HTTP 1.1 connection via a Squid proxy, Squid makes a new outgoing connection in HTTP 1.0. How can I persuade Squid to talk 1.1 to the destination server? ...
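For the Python side, a sketch of the only version knob the stack exposes; the upstream hop, though, belongs to Squid (its version and squid.conf), not to the client:

import httplib

# urllib2 rides on httplib, and these class attributes set the version
# string the client sends; HTTPConnection already defaults to 1.1, so a
# 1.0 connection from Squid to the origin server is Squid's own doing.
httplib.HTTPConnection._http_vsn = 11
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.1'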

Retrieve the source of a dynamic website using python (bypassing onclick)

I wish to retrieve the source of a website that is dynamically generated upon clicking a link. The link itself is as below:

<a onclick="function(); return false" href="#">Link</a>

This stops me from directly querying for a URL that would let me get the dynamically generated site (urllib/urllib2). How would one retrieve the source...
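Since urllib/urllib2 never execute JavaScript, the two usual options are to replicate the request the onclick handler fires (found with the browser's network tools or an HTTP proxy) or to drive a real browser (Selenium and the like). A sketch of the first option, with a hypothetical endpoint:

import urllib2

req = urllib2.Request('http://example.com/ajax/content?id=42')
# some endpoints check for this header before answering AJAX calls
req.add_header('X-Requested-With', 'XMLHttpRequest')
print urllib2.urlopen(req).read()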

urllib2.urlopen() vs urllib.urlopen() - urllib2 throws 404 while urllib works! WHY?

import urllib
print urllib.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

The above script works and returns the expected results while:

import urllib2
print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/...
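One difference worth ruling out first: urllib.urlopen() hands back whatever the server sent, error status and all, while urllib2 raises HTTPError for non-2xx answers, and that exception object is itself file-like. A sketch:

import urllib2

url = ('http://www.reefgeek.com/equipment/Controllers_&_Monitors/'
       'Neptune_Systems_AquaController/Apex_Controller_&_Accessories/')
try:
    body = urllib2.urlopen(url).read()
except urllib2.HTTPError, e:
    print e.code     # e.g. 404; yet the body the server sent is still readable
    body = e.read()

The two modules also send slightly different request headers (User-Agent among them), which some servers key on, so comparing the raw requests is the next diagnostic.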

A multi-part/threaded downloader via python?

I've seen a few threaded downloaders online, and even a few multi-part downloaders (HTTP). I haven't seen them together as a class/function. If any of you have a class/function lying around that I can just drop into any of my applications where I need to grab multiple files, I'd be much obliged. If there is a library/framework ...
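Not a drop-in class, but a sketch of the core combination under two assumptions, namely that the server honours Range requests (answers 206) and that the total size is already known (say, from a HEAD request's Content-Length):

import threading
import urllib2

def fetch_part(url, start, end, parts, idx):
    # one Range request per byte span
    req = urllib2.Request(url)
    req.add_header('Range', 'bytes=%d-%d' % (start, end))
    parts[idx] = urllib2.urlopen(req).read()

def download(url, size, n=4):
    parts = [None] * n
    step = size // n
    threads = []
    for i in range(n):
        start = i * step
        end = size - 1 if i == n - 1 else (i + 1) * step - 1
        t = threading.Thread(target=fetch_part, args=(url, start, end, parts, i))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return ''.join(parts)

Real code would add retries, a fallback for servers that ignore Range, and writing parts straight to disk instead of holding them all in memory.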

Force python mechanize/urllib2 to only use A requests?

Here is a related question, but I could not figure out how to apply the answer to mechanize/urllib2: http://stackoverflow.com/questions/1540749/how-to-force-python-httplib-library-to-use-only-a-requests Basically, given this simple code:

#!/usr/bin/python
import urllib2
print urllib2.urlopen('http://python.org/').read(100)

This result...
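A sketch of how the trick from that question can carry over: mechanize/urllib2 sit on httplib, which sits on socket, so patching socket.getaddrinfo to return only AF_INET (A-record) results covers them all:

import socket
import urllib2

_orig_getaddrinfo = socket.getaddrinfo
def _ipv4_only(host, port, family=0, *args):
    # ignore the requested family and force IPv4 lookups
    return _orig_getaddrinfo(host, port, socket.AF_INET, *args)
socket.getaddrinfo = _ipv4_only

print urllib2.urlopen('http://python.org/').read(100)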

Should I use urllib or urllib2?

In Python (2.5), should I use urllib or urllib2? What's the difference? They seem to do the same thing. (Bonus points: if I'm using Google App Engine, does this change the answer?) ...
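A sketch of the practical split (placeholder URL and header values): urllib keeps urlencode() and quote(), which urllib2 never grew, while urllib2 adds Request objects, custom headers, and handler chains, so the two are often used together:

import urllib
import urllib2

data = urllib.urlencode({'q': 'python'})               # urllib only
req = urllib2.Request('http://example.com/search', data,
                      {'User-Agent': 'my-script/0.1'})  # urllib2 only
response = urllib2.urlopen(req)

On App Engine both modules are backed by the urlfetch service, so the choice there is mostly about which API reads better.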

Download file using urllib in Python with the wget -c feature

I am writing a program in Python to download PDFs over HTTP from a database. Sometimes the download stops with this message:

retrieval incomplete: got only 3617232 out of 10689634 bytes

How can I make the download restart where it stopped, using the 206 Partial Content HTTP feature? I can do it using wget -c and it works pret...
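A sketch of the wget -c idea with urllib2, since urlretrieve() has no resume hook: ask only for the bytes not yet on disk and append them. It assumes the server honours Range and answers 206 Partial Content; a plain 200 means it restarted from byte zero:

import os
import urllib2

def resume_download(url, path):
    have = os.path.getsize(path) if os.path.exists(path) else 0
    req = urllib2.Request(url)
    if have:
        req.add_header('Range', 'bytes=%d-' % have)
    resp = urllib2.urlopen(req)
    # urllib2 responses carry the status as .code
    out = open(path, 'ab' if resp.code == 206 else 'wb')
    chunk = resp.read(8192)
    while chunk:
        out.write(chunk)
        chunk = resp.read(8192)
    out.close()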

Any way to set request headers when doing a request using urllib in Python 2.x?

I am trying to make an HTTP request in Python 2.6.4, using the urllib module. Is there any way to set the request headers? I am sure that this is possible using urllib2, but I would prefer to use urllib since it seems simpler. ...
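A sketch of the closest thing plain urllib offers (placeholder URL and header values): urlopen() itself takes no headers, but the URLopener machinery behind it keeps them in an addheaders list of (name, value) pairs that can be replaced before opening:

import urllib

opener = urllib.URLopener()
opener.addheaders = [('User-Agent', 'my-script/0.1'),
                     ('Accept', 'text/html')]
f = opener.open('http://example.com/')
print f.read()

urllib2's Request.add_header() remains the cleaner route if the extra import is acceptable.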

Parsing Python Response using httplib

After connecting to a socket and capturing the response with .read(), how do I parse the input stream and read lines? I see the data is returned without any CRLF: <html><head><title>Apache Tomcat/6.0.16 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font...
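The response object httplib hands back is file-like, so it can be read incrementally, but HTML is not line-oriented: the absence of CRLF is normal, and structure should come from an HTML parser (HTMLParser, BeautifulSoup) rather than from line splitting. A sketch against a hypothetical local Tomcat:

import httplib

conn = httplib.HTTPConnection('localhost', 8080)
conn.request('GET', '/no-such-page')
resp = conn.getresponse()
print resp.status, resp.reason   # e.g. 404 Not Found
body = resp.read()               # one undivided HTML string
conn.close()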

Python: urllib2.urlopen(url, data) Why do you have to urllib.urlencode() the data?

I thought that a POST sent all the information in the HTTP headers (I'm not well informed on this subject, obviously), so I'm confused why you have to urlencode() the data into a key=value&key2=value2 format. How does that formatting come into play when using POST?

# Fail
data = {'name': 'John Smith'}
urllib2.urlopen(foo_ur...
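A sketch of what is going on, with a placeholder URL: a POST body travels after the headers as plain text, and by default a form-handling server expects it in application/x-www-form-urlencoded form; urlencode() produces exactly that wire format from a dict, and urlopen() just writes the string it is given onto the socket:

import urllib
import urllib2

data = urllib.urlencode({'name': 'John Smith'})      # -> 'name=John+Smith'
response = urllib2.urlopen('http://example.com/form', data)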

How do I unit test a module that relies on urllib2?

I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (Twitter, Flickr, YouTube, etc.) with urllib2. Here's some pseudo-code for it:

params = (url, urlencode(data),) if data else (url,)
req = Request(*params)
response = urlopen(req)
#check headers, content-length, etc...
#par...
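One sketch of the usual answer: replace the urlopen name the module actually looks up, so tests never touch the network. The module and function names below are hypothetical, and it assumes the module did `from urllib2 import Request, urlopen` as the pseudo-code suggests:

import unittest
from StringIO import StringIO

import feeds   # hypothetical module under test

class FakeResponse(StringIO):
    # minimal stand-in for urlopen()'s return value: file-like plus headers
    def __init__(self, body, headers=None):
        StringIO.__init__(self, body)
        self.headers = headers or {}

class FeedTests(unittest.TestCase):
    def test_parse_without_network(self):
        feeds.urlopen = lambda req: FakeResponse('<feed></feed>')
        self.assertTrue(feeds.fetch('http://example.com/feed'))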

Searching a website

import urllib
import re
import os
search = (raw_input('[!]Search: '))
site = "http://www.exploit-db.com/list.php?description="+search+"&author=&platform=&type=&port=&osvdb=&cve="
print site
source = urllib.urlopen(site).read()
founds = re.findall("href='/exploits/\d+",source)
print "\n[+]Search",len(founds),"Re...
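A side note on the URL building: letting urlencode() assemble the query string quotes whatever the user typed instead of splicing it in raw. A sketch:

import urllib

search = raw_input('[!]Search: ')
params = urllib.urlencode({'description': search, 'author': '', 'platform': '',
                           'type': '', 'port': '', 'osvdb': '', 'cve': ''})
site = 'http://www.exploit-db.com/list.php?' + params
source = urllib.urlopen(site).read()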

Urlretrieve and User-Agent? - Python

I'm using urlretrieve from the urllib module. I cannot seem to find how to add a User-Agent description to my requests. Is it possible with urlretrieve? Or do I need to use another method? ...
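It is possible without leaving urllib; a sketch of the usual trick (placeholder agent string and URL): urlretrieve() goes through a module-level FancyURLopener whose version attribute becomes the User-Agent header:

import urllib

class MyOpener(urllib.FancyURLopener):
    version = 'my-downloader/1.0'   # sent as the User-Agent header

# urlretrieve() uses this module-level opener under the hood
urllib._urlopener = MyOpener()
urllib.urlretrieve('http://example.com/file.pdf', 'file.pdf')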

urllib ignore authentication requests

I'm having a little trouble creating a script that works with URLs. I'm using urllib.urlopen() to get the content of the desired URL, but some of these URLs require authentication, and urlopen prompts me to type in my username and then my password. What I need is to ignore every URL that requires authentication, just skip it and continue...
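A sketch of one way to get that behaviour, with a placeholder URL: urlopen() goes through FancyURLopener, which reacts to a 401 by calling prompt_user_passwd(); overriding it to return empty credentials stops the terminal prompt, and the opener simply hands back the server's 401 page, which the script can then skip:

import urllib

class NoPromptOpener(urllib.FancyURLopener):
    def prompt_user_passwd(self, host, realm):
        return '', ''   # never ask; give up on authentication

opener = NoPromptOpener()
page = opener.open('http://example.com/maybe-protected')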

Python problems with FancyURLopener, 401, and "Connection: close"

I'm new to Python, so forgive me if I am missing something obvious. I am using urllib.FancyURLopener to retrieve a web document. It works fine when authentication is disabled on the web server, but fails when authentication is enabled. My guess is that I need to subclass urllib.FancyURLopener to override the get_user_passwd() and/or p...
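That guess matches the usual answer; a sketch with placeholder credentials and URL:

import urllib

class AuthOpener(urllib.FancyURLopener):
    def prompt_user_passwd(self, host, realm):
        # answer the 401 ourselves instead of prompting on the terminal
        return 'myuser', 'mypassword'

doc = AuthOpener().open('http://example.com/protected').read()

Whether the server's "Connection: close" behaviour needs extra handling depends on what the retry actually sends, so watching the exchange with a sniffer is a reasonable next step.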

Python: open a Windows shared folder from a Linux machine

I am using Python 2.5 on Ubuntu, and there's a machine on the same network called "machine1". I wish to get a file from a specific folder on that machine; the folder is shared. How can I do it? I thought that something like urllib.urlopen('\\machine1\folder\file.txt') would work. ...
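urllib only speaks schemes like HTTP, FTP, and local files, not SMB, so the usual Ubuntu route is to mount the share first and then treat it as a local path; a sketch with placeholder share and mount point:

# shell, once:  sudo mount -t cifs //machine1/folder /mnt/machine1 -o guest

f = open('/mnt/machine1/file.txt')
data = f.read()
f.close()

Alternatively, an SMB library for Python (e.g. pysmb) can talk to the share directly without mounting.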

Cannot fetch a web site with Python urllib.urlopen() or any web browser other than Shiretoko

Here is the URL of the site I want to fetch:

https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff%27s+tags

When I fetch the site with the following code and display the contents:

sock = urllib.urlopen("https://salami.parc.com/spartag/GetRepository?fri...
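A first diagnostic sketch, since only one browser manages to fetch the page: send a browser-like User-Agent and look at the status and headers before worrying about the body. The agent string here is a placeholder:

import urllib2

url = ('https://salami.parc.com/spartag/GetRepository?friend=jmankoff'
       '&keywords=antibiotic&option=jmankoff%27s+tags')
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
resp = urllib2.urlopen(req)
print resp.code
print resp.info()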