What's the best way to create scripts for a browser?

I need to parse some HTML pages on different domains.

I am on Windows and mostly use Firefox.

A: 

I'd recommend BeautifulSoup.

Hank Gay
A: 

If it's just about retrieving the pages so you can do whatever you want with them, Python's built-in urllib module will do that for you.
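
A rough sketch of what that could look like, assuming Python 3 (where the fetching code lives in urllib.request). The helper name fetch_html, the 10-second timeout and the UTF-8 fallback are my own illustrative choices, not anything the module requires:

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Fetch a page and return its body as text (fetch_html is a hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        # Use the charset from the Content-Type header when the server sends one;
        # falling back to UTF-8 is an assumption that suits most modern pages.
        charset = resp.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going when a page contains bytes that do not decode cleanly.
    return raw.decode(charset, errors="replace")
```

Calling fetch_html("http://example.com/") should then hand back the page source as a string, which you can pass to whatever parser you pick.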

balpha
A: 

It sounds like you want to retrieve web pages and parse them to extract meaningful data? I would suggest something like TagSoup (for Java), which fires off SAX events that you can either use directly or feed into an XML module of your choice (raw DOM, JDOM, dom4j, XOM, etc.). The TagSoup page also lists a number of references for other languages, such as Beautiful Soup for Python, Rubyful Soup for Ruby and others.

From there, I would suggest using something like XPath to retrieve the bits of data that you want. Another option would be XSLT to transform the HTML into some unified format that you can more easily manipulate.
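
To give an idea of the XPath step, here is a small Python sketch using the standard library's xml.etree.ElementTree, which only supports a limited subset of XPath. It assumes the markup is already well-formed (for example after a TagSoup-style cleanup pass); the sample document, the "post" class and the extract_links name are all made up for illustration:

```python
import xml.etree.ElementTree as ET

# Made-up, already well-formed markup standing in for a cleaned-up page.
SAMPLE = """<html><body>
<div class="post"><a href="http://example.com/a">First</a></div>
<div class="post"><a href="http://example.org/b">Second</a></div>
<div class="ad"><a href="http://example.net/c">Ad</a></div>
</body></html>"""


def extract_links(markup):
    """Return (href, text) pairs for links inside <div class="post"> elements."""
    root = ET.fromstring(markup)
    # ElementTree understands simple paths and [@attr='value'] predicates.
    links = root.findall(".//div[@class='post']/a")
    return [(a.get("href"), a.text) for a in links]


print(extract_links(SAMPLE))
```

Real-world HTML is rarely this tidy, so ET.fromstring will usually fail on raw pages; that is why the tolerant parsing step comes first.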

Adam Batkin