Hello everyone! I know the title isn't very clear, so I'll give an example. There are two sites, A and B; let's say they are financial sites. I need just one page from each (the one with Italian pizza quotations) so I can compare some values and figure out where and when to sell Italian pizza at higher prices. Everything is easy with site A: it doesn't use JavaScript, and by browsing the menu and clicking the item "Italy > Italian pizza" I find the URL I need, www.siteA.com/italy/italianPizzaValues. On site B, however, clicking the "Italy" menu item redirects to www.siteB.com/italy.do, and clicking the items in the Italy menu, like "Pasta" and "Pizza", doesn't change the URL at all but just invokes (usually very complex) JavaScript functions. So for site A I use libcurl to download www.siteA.com/italy/italianPizzaValues and then parse it. What should I do with site B to get the same result and read my Italian pizza values there?
In The Productive Programmer, Neal Ford suggests using Selenium for non-testing purposes such as yours. Selenium automates interactions with a real web browser, so any JavaScript on the page runs exactly as it would for a human user. Using the Selenium IDE, you can record your interactions with the page, referencing HTML elements (including ones rendered by JavaScript), and then export the generated code to one of several high-level programming languages (Java, .NET, PHP, Python, Perl or Ruby).
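As a rough sketch of what the driven code can look like, here is a minimal example using the Python bindings. The URL, the "Pizza" link text and the "pizzaValues" element id are all hypothetical placeholders; in practice you would use whatever locators the Selenium IDE records for site B:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
try:
    driver.get("http://www.siteB.com/italy.do")

    # Click the "Pizza" menu item; the link text is a guess, use the
    # locator the Selenium IDE records for you.
    driver.find_element(By.LINK_TEXT, "Pizza").click()

    # Wait until the JavaScript has filled in the values
    # ("pizzaValues" is a hypothetical element id).
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "pizzaValues"))
    )

    # Grab the fully rendered HTML and parse it the same way you
    # already parse the page from site A.
    html = driver.page_source
    print(html)
finally:
    driver.quit()
```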
Before you go down the route of emulating a real browser and executing the JavaScript, try loading the page in question in a real browser with a network monitor open. Firefox with Firebug's 'Net' panel is one option; Fiddler works for IE.
Look through the requests and responses that occur when you click on ‘Pizza’ and see if there's an obvious XMLHttpRequest that seems to contain the data you are looking for. If so, it'll be much quicker to just make that one request.
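Once you have copied the request's URL and parameters out of the network monitor, replaying it is straightforward. As a sketch (the endpoint, form fields and response format below are purely hypothetical stand-ins for whatever site B actually sends):

```python
import requests

# Hypothetical endpoint and form data, copied from the network monitor;
# site B's real request will have its own URL, parameters and headers.
url = "http://www.siteB.com/italy/pizzaValues.do"
payload = {"country": "IT", "product": "pizza"}

resp = requests.post(url, data=payload, headers={
    # Some sites only answer XHR-style requests, so mimic the browser.
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0",
})
resp.raise_for_status()

# The response might be JSON, XML or an HTML fragment; check what the
# network monitor shows and parse accordingly.
print(resp.text)
```

The same request can of course be made with libcurl, exactly as you already do for site A; the Python above is only to keep the example short.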