I have a website where I have to click a submit button on a form. This gives me a link. I know the link is built from a parameter that is passed through a hidden field. I was wondering if I could write a Python script (or something else) that would go to the website, click the buttons, and return the link that the submit button generates. If so, how could I pass the extra parameter that influences the creation of the link?

Thanks in advance.

A: 

If it is Python you're looking for, then give the Mechanize library a shot. If you are just extracting small but unique elements of the HTML document, you may as well use regexes with Python. To work with the HTML document more programmatically, BeautifulSoup may be more beneficial, and you can combine it with mechanize.
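
For instance, here is a minimal sketch of the regex approach (the input name hiddenId and the sample markup are made up for illustration):

import re

html_data = '<form><input type="hidden" name="hiddenId" value="abc123"></form>'
match = re.search(r'name="hiddenId"[^>]*value="([^"]*)"', html_data)
if match:
    print match.group(1)   # prints: abc123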

It's more straightforward than it may initially seem.

NeonNinja
I have looked at mechanize, but I'm wondering if it would work, because the website checks whether the page has loaded 3 JPEGs (advertising). Can this also be emulated with mechanize?
Silverrocker
If it's a JavaScript check, then yes, you can use the SpiderMonkey JavaScript engine with mechanize: http://pypi.python.org/pypi/python-spidermonkey
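
A minimal sketch of evaluating JavaScript from Python with it, going by the usage shown on that PyPI page (untested):

import spidermonkey

rt = spidermonkey.Runtime()
cx = rt.new_context()
print cx.execute("var x = 3; x *= 4; x;")   # prints: 12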
NeonNinja
There is also Scrapy: http://scrapy.org/, but I have not used it myself.
NeonNinja
A: 

Take a look at this documentation of the urllib2 package. Below is the code you would use, but the documentation explains (very well) what is happening.

Excerpt:

import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }

data = urllib.urlencode(values)     # encode the values into a POST body
req = urllib2.Request(url, data)    # supplying data makes this a POST request
response = urllib2.urlopen(req)
the_page = response.read()

You would need to use an HTML parser like BeautifulSoup to obtain the parameter name and value that are being posted when you click the button.

Edit:

Yes, you could use mechanize for this as well. You would do something like this (untested):

from mechanize import Browser

br = Browser()
br.open("http://www.example.com/")  # this would be your website
br.select_form(name="order")        # change this to the name of your form
response = br.submit()              # submits the form, just like if you clicked the submit button
print response.geturl()             # prints the URL you are looking for

You'll need to make this specific to your website/form, but something along these lines should do the trick.

Check out the examples/documentation for the ClientForm object if you find you need more control.
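
For example, to fill in or override a field yourself before submitting, something along these lines should work (a sketch, untested; the form and control names here are placeholders, and mechanize treats hidden controls as read-only, so that flag is cleared first):

from mechanize import Browser

br = Browser()
br.open("http://www.example.com/")      # your website
br.select_form(name="order")            # the name of your form
br.form.set_all_readonly(False)         # hidden controls are read-only by default
br["hiddenId"] = "some-value"           # set the hidden parameter (placeholder name)
response = br.submit()
print response.geturl()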

tgray
I've used urllib2 before, but I think mechanize will be more of a solution because of the JavaScript that happens on the page. Correct me if I'm wrong...
Silverrocker
Will mechanize also pass all the cookies and such? If so, then this is the perfect solution for me! :)
Silverrocker
The basic `mechanize.urlopen` function handles cookies automatically, but I couldn't get to the documentation for the Browser object to see if it does as well. Some more useful documentation: http://wwwsearch.sourceforge.net/mechanize/doc.html
tgray
I took a look at the source for the `Browser` object and the comments indicate the "cookie handling is [...] entirely automatic".
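
If you ever need to manage the cookies explicitly, you should be able to attach your own jar (a sketch, untested):

import cookielib
from mechanize import Browser

cj = cookielib.LWPCookieJar()
br = Browser()
br.set_cookiejar(cj)                # use an explicit cookie jar instead of the default
br.open("http://www.example.com/")
print len(cj)                       # number of cookies the site has set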
tgray
Thanks a lot! I wish I could somehow vote for you, but I don't have any reputation :(.
Silverrocker
+1  A: 

Once you have downloaded your HTML data with mechanize, as other users said, you could use BeautifulSoup like this:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html_data)
# 'name' goes in attrs because find()'s first argument is the tag name
hidden_tag = soup.find('input', attrs={'name': 'hiddenId', 'type': 'hidden'})
hidden_value = hidden_tag['value']

Then you could forge a POST with urllib2 like this:

import urllib
import urllib2
url = 'http://yoursite.com'
values = {'yourhiddenname': hidden_value}   # the hidden parameter extracted above
request = urllib2.Request(url, urllib.urlencode(values))   # data supplied, so this is a POST
response = urllib2.urlopen(request)
result = response.read()
systempuntoout