I'm looking for a way to simulate how a browser expands a page into the resources it fetches.

The flow I'm trying to address is the following:

  • Access an initial URL (e.g. http://example.dmn/index.htm)
  • Parse the HTML response received (e.g. index.htm)
  • Find the resources that a browser would fetch as a result of parsing that page, e.g.:
    • Images
    • Flash
    • Embedded videos/audio
    • Frames/iframes
  • Repeat the process recursively for each new resource found

I don't want to follow links (href), only the page resources that a browser fetches automatically when the page is first accessed.

Do you have any suggestions on how to perform this simulation?

Are there any Python projects/libraries that might help?

Thanks

A: 

You may wish to look at the Windmill Testing Framework which allows you to write tests in Python for web apps.

msanders
A: 

You might want to look at spider.py and robotparser. If those don't do what you want out of the box, you can dig into the HTML soup yourself with BeautifulSoup.
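
For the do-it-yourself route, here is a rough sketch of the flow described in the question. To keep it runnable without installing anything, it uses the standard library's html.parser instead of BeautifulSoup. The tag/attribute table, the names ResourceParser, extract_resources and expand, and the caller-supplied fetch function are all assumptions made for illustration, and the table is certainly not a complete list of what a real browser fetches.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attribute holding a URL that a browser typically fetches on load.
# Illustrative assumption only; real browsers fetch more than this.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "frame": "src",
    "iframe": "src",
    "embed": "src",    # Flash and other plugin content
    "object": "data",
    "video": "src",
    "audio": "src",
    "source": "src",
}
# Resources that are HTML documents themselves and get parsed again.
NESTED_TAGS = {"frame", "iframe"}


class ResourceParser(HTMLParser):
    """Collects (tag, absolute_url) pairs; <a href> links are ignored."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in RESOURCE_ATTRS:
            url = attrs.get(RESOURCE_ATTRS[tag])
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            url = attrs.get("href")
        if url:
            # Resolve relative references against the page's own URL.
            self.resources.append((tag, urljoin(self.base_url, url)))


def extract_resources(html, base_url):
    parser = ResourceParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


def expand(url, fetch, seen=None):
    """Return the set of URLs reachable from url via page resources.

    fetch(url) must return the HTML text for url; it is supplied by the
    caller (hypothetical hook, so the sketch does no network I/O itself).
    """
    seen = set() if seen is None else seen
    if url in seen:          # guards against frames that include each other
        return seen
    seen.add(url)
    for tag, res_url in extract_resources(fetch(url), url):
        if tag in NESTED_TAGS:
            expand(res_url, fetch, seen)   # recurse into nested documents
        else:
            seen.add(res_url)
    return seen
```

You would call it as expand("http://example.dmn/index.htm", fetch), where fetch could be a thin wrapper around urllib.request.urlopen. Note that this only sees resources named in the markup: images referenced from CSS and anything loaded by JavaScript are not covered.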

Nick Bastin
A: 

You may want to take a look at Scrapy.

It may not provide all the exact features you need, but it can easily be extended to do so.

Pablo Hoffman