views:

307

answers:

1

I am building a web app where I need to get all the images and any flash videos that are embedded (e.g. youtube) on a given URL. I'm using Python.

I've googled, but have not found any good information about this (probably because I don't know what this is called to search for), does anyone have any experience with this and knows how it can be done?

I'd love to see some code examples if there are any available.

Thanks!

+7  A: 

BeautifulSoup is a great screen-scraping library. Use urllib2 to fetch the page, and BeautifulSoup to parse it apart. Here's a code sample from their docs:

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.icc-ccs.org/prc/piracyreport.php")
soup = BeautifulSoup(page)
for incident in soup('td', width="90%"):
    where, linebreak, what = incident.contents[:3]
    print where.strip()
    print what.strip()
    print
Ned Batchelder