views: 393
answers: 3
Hi, I'm building a scraper using Python 2.5 and BeautifulSoup, but I've stumbled upon a problem... part of the web page is generated after the user clicks a button, which starts an AJAX request by calling a specific JavaScript function with the proper parameters.

Is there a way to simulate the user interaction and get this result? I came across the mechanize module, but it seems to me that it is mostly used to work with forms...

I would appreciate any links or code samples. Thanks!

+1  A: 

No, you can't do that easily. AFAIK your options are, easiest first:

  1. Read the AJAX JavaScript code yourself, as a human programmer, understand it, and then write Python code to simulate the AJAX calls by hand. You can also use capture software to record requests/responses made live and try to reproduce them in code;
  2. Use Selenium or some other browser automation tool to fetch the page in a real web browser (see the sketch after this list);
  3. Use a Python JavaScript runner like SpiderMonkey or PyV8 to run the JavaScript code, and hook it up to your copy of the HTML DOM.
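
For option 2, a minimal sketch using the Selenium WebDriver Python bindings might look like the following; the URL and the button's CSS selector are placeholders, and note that WebDriver requires a newer Python than 2.5:

import time
from selenium import webdriver
from BeautifulSoup import BeautifulSoup as bs_parse

driver = webdriver.Firefox()           # a real browser, so the JavaScript actually runs
driver.get('http://example.com/page')  # placeholder URL
# click the button that fires the AJAX call (placeholder selector)
driver.find_element_by_css_selector('#load-more').click()
time.sleep(2)                          # crude wait for the AJAX response to render
page = bs_parse(driver.page_source)    # parse the fully rendered DOM
driver.quit()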
nosklo
Hi, the first option wouldn't be that easy because the JavaScript is packed. Thanks for the heads up, I'll look it over first thing tomorrow.
nabizan
@nabizan: that's why I also suggested using capture software in option 1
nosklo
@nosklo: hi, so after a little digging into the unpacked JavaScript I found out what I should call and how. Then it was quite easy (see my answer for more info).
nabizan
A: 

OK, so I have figured it out... it was quite simple once I realised that I could use a combination of urllib, urllib2 and BeautifulSoup:

import urllib, urllib2
from BeautifulSoup import BeautifulSoup as bs_parse

# url and values are the AJAX endpoint and the request parameters
# discovered by unpacking the page's JavaScript
data = urllib.urlencode(values)    # encode the parameters as a POST body
req  = urllib2.Request(url, data)  # a Request with a body is sent as POST
res  = urllib2.urlopen(req)
page = bs_parse(res.read())        # parse the AJAX response
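
(Passing a data argument to urllib2.Request turns the request into a POST, which matches what the page's JavaScript sends here; for a GET-style AJAX call you would append the encoded parameters to the URL instead.)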
nabizan