Simulate Browser Resources Expansion Behavior With Python

+1 Q:

I'm looking for a way to simulate a browser's resource expansion behavior, i.e. the fetching of every resource a page pulls in when it is loaded.

The flow I'm trying to address is the following:

1. Access an initial URL (e.g. http://example.dmn/index.htm)
2. Parse the HTML response received (e.g. index.htm)
3. Find the resources that a browser will fetch as a result of parsing the index, e.g.:
   - Images
   - Flash
   - Embedded videos/audio
   - Frames/iframes
4. Repeat the process recursively for each new resource found

I'm not expecting to follow links (href), only page resources that will be fetched automatically by a browser when the page is first accessed.
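
The flow described above could be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not a complete browser model: the tag-to-attribute table AUTO_FETCH_ATTRS, the names ResourceCollector, find_resources and expand, and the injected fetch callable are all hypothetical, and CSS, scripts that load further resources, and srcset are ignored. Only frames and iframes are parsed recursively, since they are the listed resources that contain HTML themselves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attribute holding a URL that a browser fetches on page load.
# Illustrative assumption only; this table is far from complete.
AUTO_FETCH_ATTRS = {
    "img": "src",
    "script": "src",
    "frame": "src",
    "iframe": "src",
    "embed": "src",    # Flash and other plugin content
    "object": "data",
    "video": "src",
    "audio": "src",
    "source": "src",
}


class ResourceCollector(HTMLParser):
    """Collects (tag, absolute URL) pairs; <a href> links are ignored."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attr_name = AUTO_FETCH_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if value:
            self.resources.append((tag, urljoin(self.base_url, value)))


def find_resources(html, base_url):
    """Return the auto-fetched resources referenced by one HTML document."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.resources


def expand(url, fetch, seen=None):
    """Recursively collect resource URLs, starting from url.

    fetch is a caller-supplied function: fetch(url) returns the HTML text
    for that URL, or None if it cannot be retrieved.
    """
    seen = set() if seen is None else seen
    if url in seen:
        return seen  # already visited; also guards against frame cycles
    seen.add(url)
    html = fetch(url)
    if html is None:
        return seen
    for tag, resource_url in find_resources(html, url):
        if tag in ("frame", "iframe"):
            expand(resource_url, fetch, seen)  # nested HTML document
        else:
            seen.add(resource_url)
    return seen
```

Passing fetch in as a parameter keeps the sketch testable without network access; in real use it could wrap urllib.request.urlopen and decode the response body.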

Do you have a suggestion for how to perform this simulation?

Are there any Python projects/libraries that may help?

Thanks

You may wish to look at the Windmill Testing Framework, which allows you to write tests in Python for web apps.

msanders 2010-06-15 09:43:29

You might want to look at spider.py and robotparser. If neither does what you want automatically, you can dig into the HTML soup yourself with BeautifulSoup.

Nick Bastin 2010-06-15 09:45:25
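
For the robotparser suggestion, a minimal sketch of filtering candidate resource URLs against robots.txt rules might look like the following. It assumes Python 3, where the module lives at urllib.robotparser (in Python 2 it was the top-level robotparser module); the helper name allowed_urls, the user agent string, and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_lines, user_agent, urls):
    """Return the URLs that the given robots.txt lines permit for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical rules and candidate resources, for illustration only.
sample_rules = ["User-agent: *", "Disallow: /private/"]
candidates = [
    "http://example.dmn/img/a.png",
    "http://example.dmn/private/b.png",
]
permitted = allowed_urls(sample_rules, "resource-simulator", candidates)
```

A real browser does not consult robots.txt, so this check only matters if the simulation should behave like a polite crawler.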

+1 A:

You may want to take a look at Scrapy.

It may not provide all the exact features you need, but it can easily be extended to do so.

Pablo Hoffman 2010-06-15 14:16:01
