I have a website where I have to click a submit button on a form. This gives me a link. I know the link is built from a parameter that is passed through a hidden field. I was wondering if I could write a Python script (or something else) that would go to the website, click the buttons, and return the link that the submit button generates. If so, how could I pass the extra parameter that influences the creation of the link?

Thanks in advance.

A: 

If it is Python you're looking for, then give the Mechanize library a shot. If you are just extracting small but unique elements of the HTML document, you may as well use regexes with Python. To work with the HTML document more programmatically, BeautifulSoup may be more beneficial, and you can combine it with mechanize.
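
For instance, here is a minimal sketch of the regex approach (the input name hiddenId and the sample markup are made up for illustration):

import re

html_data = '<form><input type="hidden" name="hiddenId" value="abc123"></form>'
match = re.search(r'name="hiddenId"[^>]*value="([^"]*)"', html_data)
if match:
    print match.group(1)   # prints: abc123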

It's more straightforward than it may initially seem.

NeonNinja
I have looked at mechanize, but I'm wondering if it would work, because the website checks whether the page has loaded 3 JPEGs (advertising). Can this also be emulated with mechanize?
Silverrocker
If it's a JavaScript check, then yes, you can use the SpiderMonkey JavaScript engine with mechanize: http://pypi.python.org/pypi/python-spidermonkey
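
A minimal sketch of evaluating JavaScript from Python with it, going by the usage shown on that PyPI page (untested):

import spidermonkey

rt = spidermonkey.Runtime()
cx = rt.new_context()
print cx.execute("var x = 3; x *= 4; x;")   # prints: 12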
NeonNinja
There is also Scrapy: http://scrapy.org/, but I have not used it myself.
NeonNinja
A: 

Take a look at this documentation of the urllib2 package. Below is the code you would use, but the documentation explains (very well) what is happening.

Excerpt:

import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }

data = urllib.urlencode(values)     # encode the values into a POST body
req = urllib2.Request(url, data)    # supplying data makes this a POST request
response = urllib2.urlopen(req)
the_page = response.read()

You would need to use an HTML parser like BeautifulSoup to obtain the parameter name and value that are being posted when you click the button.

Edit:

Yes, you could use mechanize for this as well. You would do something like this (untested):

from mechanize import Browser

br = Browser()
br.open("http://www.example.com/")  # this would be your website
br.select_form(name="order")        # change this to the name of your form
response = br.submit()              # submits the form, just like if you clicked the submit button
print response.geturl()             # prints the URL you are looking for

You'll need to make this specific to your website/form, but something along these lines should do the trick.

Check out the examples/documentation for the ClientForm object if you find you need more control.
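
For example, to fill in or override a field yourself before submitting, something along these lines should work (a sketch, untested; the form and control names here are placeholders, and mechanize treats hidden controls as read-only, so that flag is cleared first):

from mechanize import Browser

br = Browser()
br.open("http://www.example.com/")      # your website
br.select_form(name="order")            # the name of your form
br.form.set_all_readonly(False)         # hidden controls are read-only by default
br["hiddenId"] = "some-value"           # set the hidden parameter (placeholder name)
response = br.submit()
print response.geturl()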

tgray
I've used urllib2 before, but I think mechanize will be more of a solution because of the JavaScript that happens on the page. Correct me if I'm wrong...
Silverrocker
Will mechanize also pass all the cookies and such? If so, then this is the perfect solution for me! :)
Silverrocker
The basic `mechanize.urlopen` function handles cookies automatically, but I couldn't get to the documentation for the Browser object to see if it does as well. Some more useful documentation: http://wwwsearch.sourceforge.net/mechanize/doc.html
tgray
I took a look at the source for the `Browser` object and the comments indicate the "cookie handling is [...] entirely automatic".
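
If you ever need to manage the cookies explicitly, you should be able to attach your own jar (a sketch, untested):

import cookielib
from mechanize import Browser

cj = cookielib.LWPCookieJar()
br = Browser()
br.set_cookiejar(cj)                # use an explicit cookie jar instead of the default
br.open("http://www.example.com/")
print len(cj)                       # number of cookies the site has set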
tgray
Thanks a lot! I wish I could somehow vote for you, but I don't have any reputation :(.
Silverrocker
+1  A: 

Once you have downloaded your HTML data with mechanize, as other users said, you could use BeautifulSoup like this:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html_data)
# 'name' goes in attrs because find()'s first argument is the tag name
hidden_tag = soup.find('input', attrs={'name': 'hiddenId', 'type': 'hidden'})
hidden_value = hidden_tag['value']

Then you could forge a POST with urllib2 like this:

import urllib
import urllib2
url = 'http://yoursite.com'
values = {'yourhiddenname': hidden_value}   # the hidden parameter extracted above
request = urllib2.Request(url, urllib.urlencode(values))   # data supplied, so this is a POST
response = urllib2.urlopen(request)
result = response.read()
systempuntoout