I've got an input where the user can type either HTML or plain text. When the user copies and pastes text from MS Word, for example, it generates weird HTML. Then, when you view that topic, the whole page's style is affected. I don't really know whether the generated HTML has unclosed tags or something similar, but it looks like it does, and that is what breaks the page's style.

Does anybody know how to "isolate" the HTML inside that div (or whatever the container is) from the rest of the page's style?

A: 

Copying text from Word can include <style> tags. The only sure way to isolate these styles is to put the input control in an <iframe>.
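A minimal sketch of the display side of the iframe approach, assuming the page is rendered server-side in Python (the asker's stack is .NET, so this is only illustrative) and a browser that supports the HTML5 `srcdoc` and `sandbox` attributes. `wrap_in_iframe` is a hypothetical helper name:

```python
import html


def wrap_in_iframe(user_html: str) -> str:
    """Embed pasted HTML in an <iframe srcdoc=...> so it renders as its own document."""
    # quote=True escapes &, <, >, " and ' so the markup is safe inside a
    # double-quoted attribute; the browser un-escapes it when loading srcdoc.
    escaped = html.escape(user_html, quote=True)
    # An empty sandbox attribute applies all restrictions (no scripts, forms,
    # popups, same-origin access) to the framed content.
    return f'<iframe sandbox srcdoc="{escaped}"></iframe>'


if __name__ == "__main__":
    pasted = '<style>body { color: red }</style><p class="MsoNormal">Hello'
    print(wrap_in_iframe(pasted))
```

Because the framed content is a separate document, its `<style>` rules and unclosed tags stay inside the frame. The trade-off is that an iframe doesn't size itself to its content, so you usually give it a fixed height or resize it with script.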

Michael La Voie
A: 

You can either sanitize the input or display it in an IFrame.

klausbyskov
A:

Short of showing the content in an iframe, you can't really do that. What I usually do in this situation is apply tag-stripping logic to the content as it comes in. From a security perspective you really don't want to allow arbitrary HTML, but even if you don't care what your users input, you should strip out invalid HTML tags (Word has a habit of creating tags with odd namespace-looking names like o:p) and run something like Tidy over the result to ensure every tag is properly closed. There are a number of Tidy libraries for .NET out there; here's one.

Here's a quick cut-and-paste of how I've done this in the past. Note that the class implements an interface from the project I used it in, but you get the general idea.
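This is not the answerer's class; it is a rough Python stand-in for the same strip-then-close idea, using only the standard library's `html.parser` (not Tidy, and not a vetted security sanitizer). The `ALLOWED` whitelist and the names `StripAndClose` and `clean` are hypothetical:

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist. Anything else, including Word's namespaced tags
# such as <o:p>, is dropped while its text content is kept.
ALLOWED = {"p", "br", "b", "i", "strong", "em", "u", "ul", "ol", "li"}
VOID = {"br"}                              # never pushed, never closed
DROP_CONTENT = {"style", "script", "xml"}  # drop the element AND its content


class StripAndClose(HTMLParser):
    # Comments (e.g. Word's <!--[if gte mso 9]> blocks) are dropped
    # because handle_comment is not overridden.
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []      # open whitelisted tags, innermost last
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif not self.skip_depth and tag in ALLOWED:
            # All attributes (style, class, onclick, ...) are discarded.
            self.out.append(f"<{tag}>")
            if tag not in VOID:
                self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in self.stack:
            # Close anything left open inside this element, then the element.
            while True:
                open_tag = self.stack.pop()
                self.out.append(f"</{open_tag}>")
                if open_tag == tag:
                    break
        # Other end tags (stray, void, or not whitelisted) are ignored.

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text, since convert_charrefs already decoded entities.
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()       # flush any buffered text
        while self.stack:  # close whatever the user left open
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def clean(raw_html):
    parser = StripAndClose()
    parser.feed(raw_html)
    return parser.result()
```

For example, `clean('<p class="MsoNormal">Hello <o:p></o:p><b>world')` gives `<p>Hello <b>world</b></p>`. Unlike Tidy, it doesn't apply HTML's implied end-tag rules, so `<li>one<li>two` ends up nested rather than as siblings; a real Tidy binding is still the more robust choice.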

Tom
How do I include it in my project? I tried adding the DLL to the project, but I get an exception saying that the source was not found. Do I have to include the entire source code?
Brian Roisentul
You would need to link the TinyATL DLL to whatever project you're calling it from.
Tom
A: 

If it were me, I'd strip all but basic formatting (e.g., bold, italics) and use Tidy. That's what I ended up doing: I strip Word's CSS styles and convert them into <strong>, <em>, etc.
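A minimal Python sketch of that strip-and-convert step, assuming Word marks bold and italics either with `<b>`/`<i>` tags or with inline `font-weight`/`font-style` declarations on elements such as `<span>`. The tag mapping, the bold values checked, and the names `WordToBasic` and `to_basic` are hypothetical:

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping of source tags to the basic tags we keep.
TAG_MAP = {"b": "strong", "strong": "strong", "i": "em", "em": "em", "p": "p"}
DROP_CONTENT = {"style", "script", "xml"}  # discard these elements' content
BOLD_VALUES = {"bold", "bolder", "600", "700", "800", "900"}


def tags_from_style(style):
    """Map inline CSS like 'font-weight:bold' to semantic tags."""
    decls = {}
    for part in style.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            decls[name.strip().lower()] = value.strip().lower()
    tags = []
    if decls.get("font-weight") in BOLD_VALUES:
        tags.append("strong")
    if decls.get("font-style") == "italic":
        tags.append("em")
    return tags


class WordToBasic(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []      # (source tag, tags we emitted for it)
        self.skip_depth = 0  # > 0 while inside <style>/<script>/<xml>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if tag == "br":
            self.out.append("<br>")
            return
        emitted = [TAG_MAP[tag]] if tag in TAG_MAP else []
        for t in tags_from_style(dict(attrs).get("style") or ""):
            if t not in emitted:
                emitted.append(t)
        self.out.extend(f"<{t}>" for t in emitted)
        # Other tags (span, o:p, font, ...) are dropped but still tracked,
        # so their end tag closes whatever their style produced.
        self.stack.append((tag, emitted))

    def _close_top(self):
        src, emitted = self.stack.pop()
        self.out.extend(f"</{t}>" for t in reversed(emitted))
        return src

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or all(src != tag for src, _ in self.stack):
            return  # ignore stray end tags (including </br>)
        while self._close_top() != tag:
            pass

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()
        while self.stack:  # close anything left open
            self._close_top()
        return "".join(self.out)


def to_basic(word_html):
    parser = WordToBasic()
    parser.feed(word_html)
    return parser.result()
```

For example, `<span style="font-weight:bold;color:red">Hi</span>` becomes `<strong>Hi</strong>`, and the color is simply discarded. It closes anything left open, but it doesn't fix invalid nesting, so running Tidy afterwards is still worthwhile.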

Xorlev