ansaurus

Question

Embed section of HTML from another site?

Answer 1

A:

That sounds like something that IE8's Web Slices would be perfect for. However, they are only available in IE8, and the site of origin would have to implement them for you to be able to take advantage of them.

Jeremy Sullivan 2009-06-15 20:53:35

Answer 2

+3 A:

I don't think an image should really be the last resort. You have no control over the HTML/CSS of the source page, so even if you craft a solution (probably by using JavaScript to parse out the desired snippet), there is no guarantee that the site won't change its layout tomorrow.

Even Jeff, who has control over the layout of stackoverflow.com, still prefers to screen-capture the site, rather than pull in the contents live.

Now, if your goal was to have the contents auto-update, that would be a different story. But still, unless you use some agreed-upon method of sharing content, such as RSS, your solution would be very fragile.

Eugene Katz 2009-06-15 21:05:58

Images also have the advantage of hack-free HTML support, total security, and not leeching bandwidth from the target site, whose owners are unlikely to thank you for it.

annakata 2009-06-16 12:41:28

Answer 3

A:

I'd recommend a server-side solution in Python: use urllib2 to request the page, then use BeautifulSoup to parse out the bit that you need. BeautifulSoup has a very flexible selection API with which you can craft heuristics for the section you are interested in.

To illustrate:

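# html holds the page source already fetched with urllib2
soup = BeautifulSoup(html)
# locate a text node that anchors the section of interest
text = soup.find(text="Some text on the page that is unlikely to change")
# print the element that contains that text node
print text.parent.prettify()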

That way if the webmaster later changes the markup on the page, your scraping script should still work.

EoghanM 2009-06-15 21:28:54

Answer 4

+1 A:

The concept you are describing is roughly what is called a "purple include" or "transclusion". There is a library out there for it, but it's not exactly actively developed. Here are a couple of Ajaxian articles on it.

Russell Leggett 2009-06-15 21:36:38

Answer 5

A:

On the client side, <iframe> is the only practical option. It is possible to scroll it, but that might not work in the long term, because it is technically close to a clickjacking attack.

There's also cross-site XHR, but it requires opt-in from the destination site, and today it works only in a few of the latest browsers.

Getting the HTML on the server side is easy: every decent web framework can download a page and parse its HTML, and you can use XPath/XSLT or the DOM to extract the bit you want.
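As a rough illustration of the server-side route, here is a minimal Python 3 sketch that uses only the standard library. The names `FragmentExtractor`, `extract_fragment`, `fetch`, and the `news` id are hypothetical and chosen only for this example. The sketch assumes the wanted section carries an `id` attribute and that every non-void element inside it is explicitly closed; real-world markup is often messier and may call for a more forgiving parser.

```python
import urllib.request
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FragmentExtractor(HTMLParser):
    """Collect the markup of the first element whose id equals target_id."""

    def __init__(self, target_id):
        # convert_charrefs=False keeps entities such as &amp; in their raw form
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0       # nesting depth inside the target element
        self.parts = []      # markup pieces of the captured fragment
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0 and dict(attrs).get("id") != self.target_id:
            return           # still outside the target element
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1
        elif self.depth == 0:
            self.done = True  # the target itself was a void element

    def handle_endtag(self, tag):
        if self.done or self.depth == 0 or tag in VOID_TAGS:
            return
        self.parts.append("</%s>" % tag)
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # closed the target element

    def handle_data(self, data):
        if not self.done and self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if not self.done and self.depth > 0:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.done and self.depth > 0:
            self.parts.append("&#%s;" % name)


def extract_fragment(html_text, target_id):
    """Return the markup of the element with the given id, or None."""
    parser = FragmentExtractor(target_id)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts) if parser.done else None


def fetch(url):
    """Download a page and decode it using the charset the server declares."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


page = ('<html><body><div id="nav">menu</div>'
        '<div id="news"><p>Hello &amp; <b>welcome</b><br></p></div>'
        '<p>footer</p></body></html>')
fragment = extract_fragment(page, "news")
```

In real use, `page` would come from something like `fetch(url)` for the page you want to embed from. The extracted fragment is still untrusted markup and should not be served as-is.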

Getting the styles, however, is going to be tricky: CSS rules may not work with an HTML fragment taken out of context. You'd have to parse the CSS and extract and transform the rules, or use a browser and read the currentStyle of every node.

Obviously, you have to heavily filter the HTML you extract to avoid XSS. It's harder than it seems.
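To give a feel for what such filtering involves, here is a minimal allowlist sketch in Python 3 using only the standard library. The names `AllowlistFilter`, `sanitize`, and the particular tag and attribute allowlists are assumptions made up for this example, not a vetted policy. It keeps a handful of tags, drops every attribute except absolute http(s) links, discards script and style content, and re-escapes all text. It illustrates the idea only; a maintained, audited sanitizer is the safer choice for production.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical policy: anything not listed here is dropped.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "a", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = {"http", "https"}
DROP_CONTENT = {"script", "style"}   # tags whose text content is dropped too


def is_safe_url(value):
    """Accept only absolute http(s) URLs; everything else is rejected."""
    try:
        return urlsplit(value).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False


class AllowlistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, ()):
                continue      # drops onclick, style, title, and so on
            value = value.strip()
            if name == "href" and not is_safe_url(value):
                continue      # drops javascript:, data:, and relative URLs
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag != "br":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # text arrives unescaped, so escape it again on output
            self.out.append(escape(data, quote=False))


def sanitize(fragment):
    """Return the fragment with everything outside the allowlist removed."""
    parser = AllowlistFilter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


dirty = ('<p onclick="evil()">Hi <script>alert(1)</script>'
         '<a href="javascript:alert(2)" title="x">link</a> '
         '<a href="https://example.com/?a=1&amp;b=2">ok</a> 1 &lt; 2</p>'
         '<img src=x onerror=alert(3)>')
clean = sanitize(dirty)
```

An allowlist is used rather than a blocklist because unknown tags, attributes, and URL schemes are then rejected by default. Comments and declarations are dropped because the parser's default handlers ignore them.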

If you don't need to automate this, a good HTML+CSS WYSIWYG editor might be able to extract a content fragment along with its styles.

porneL 2009-06-16 12:36:05
