If I have a user entering data into a rich text editor (tiny editor) and submitting data that I am storing in a database and then retrieving to show on other dynamic web pages, why do I need encoding here?

Is the only reason that someone might paste JavaScript into the rich text editor? Is there any other reason?

+15  A: 

Security is the reason.

The most obvious and common reason is cross-site scripting (XSS). It is the root cause of many of the security problems you might see on your site.

Cross-site scripting (XSS) is a type of computer security vulnerability typically found in web applications that enables malicious attackers to inject client-side script into web pages viewed by other users. An exploited cross-site scripting vulnerability can be used by attackers to bypass access controls such as the same origin policy. Cross-site scripting carried out on websites accounted for roughly 80% of all security vulnerabilities documented by Symantec as of 2007. Their impact may range from a petty nuisance to a significant security risk, depending on the sensitivity of the data handled by the vulnerable site, and the nature of any security mitigations implemented by the site's owner.

Additionally, as the comments below point out, unfiltered markup can also break the layout of your site.

You can use the Microsoft Anti-Cross Site Scripting Library for this.

More Resources

http://forums.asp.net/t/1223756.aspx

Web Logic
Additionally, design-wise, if they enter <div>.. with no closing tag, that would ruin the whole page layout. That's why public editors (like here on SO) don't accept raw HTML; they have their own subset of tags and THEY do the formatting to make HTML.
Dan Heberden
You're missing the point. He's accepting HTML-formatted text, so he cannot escape it.
SLaks
@Dan Heberden: That's true agreed :)
Web Logic
@SLaks: That's right, it can screw up the layout.
Web Logic
I just realized that tiny editor seems to do this for you, so that's why I was confused about why everything was working without me doing anything.
ooo
Yes, but you **NEED** server-side validation, or anyone will be able to easily inject a `<script>` tag by bypassing the browser.
SLaks
@SLaks: That's the way to go. Bad guys can disable JavaScript, so JavaScript validation alone won't do, of course.
Web Logic
+2  A: 

Security is the main reason.

Abe Miessler
+2  A: 

Not only could a user enter javascript code or some other naughtiness, you need to use HTML encode in order to display certain characters on the page. You wouldn't want your page to break because your database contained: "Nice Page :->".

Also, if you are entering the code into a database, be sure to "sanitize" the inputs to the database (for example, by using parameterized queries).

Rice Flour Cookies
@Rising Star - are you saying I should encode before saving to the DB?
ooo
Rice Flour Cookies
I just realized that tiny editor seems to do this for you, so that's why I was confused about why everything was working without me doing anything.
ooo
Yes, but you **NEED** server-side validation, or anyone will be able to easily inject a `<script>` tag by bypassing the browser.
SLaks
+3  A: 

You're making some mistakes.

If you're accepting HTML-formatted text from the rich-text editor, you cannot call Html.Encode, or it will encode all of the HTML tags, and you'll see raw markup instead of formatted text.

However, you still need to protect against XSS.

In other words, if the user enters the following HTML:

<b>Hello!</b>
<script>alert('XSS!');</script>

You want to keep the <b> tag, but drop (not encode) the <script> tag.
Similarly, you need to drop inline event attributes (like onmouseover) and JavaScript URLs (like <a href="javascript:alert('XSS!');">Dancing Bunnies!</a>).

You should run the user's HTML through a strict XML parser and maintain a strict white-list of tags and attributes when saving the content.
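
A minimal sketch of the whitelist idea described above, written in Python with the standard-library `html.parser` rather than a strict XML parser (a substitution made here to keep the example short). The allowed tags, attributes, and URL schemes, as well as the names `WhitelistSanitizer` and `sanitize`, are assumptions for illustration only. It does not repair unbalanced tags, and a maintained sanitizer library is likely the safer choice for real use.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative whitelist (an assumption, not a recommended production list).
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = ("http://", "https://", "mailto:")
DROP_WITH_CONTENT = {"script", "style"}  # drop these tags and everything inside
VOID_TAGS = {"br"}                        # tags that never get a closing tag


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text content is kept
        kept = []
        for name, value in attrs:
            # Event attributes such as onmouseover are never whitelisted.
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            value = (value or "").strip()
            # The parser has already decoded character references, so an
            # entity-obfuscated "javascript:" URL is rejected here too.
            if name == "href" and not value.lower().startswith(ALLOWED_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-encode plain text


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize("<b>Hello!</b>\n<script>alert('XSS!');</script>"))
print(sanitize('<a href="javascript:alert(1)" onmouseover="x()">Dancing Bunnies!</a>'))
```

The `<b>` tag survives, the `<script>` block disappears entirely, and the link keeps its text but loses both the `javascript:` URL and the event attribute.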

SLaks
The user isn't typing in HTML. The editor is rich text, so they type like in MS Word, and when I grab the data I get the encoded HTML as output.
ooo
Yes, but you **NEED** server-side validation, or anyone will be able to easily inject a `<script>` tag by bypassing the browser. Also, you're grabbing HTML, not encoded HTML.
SLaks
@SLaks - as tiny editor seems to give you already-encoded HTML, are you suggesting that I call Html.Encode() on already-encoded data? Wouldn't that cause issues in the normal use case?
ooo
**No, I'm not**. You need to filter the tags and attributes.
SLaks
+1  A: 

Yes, it is to prevent JavaScript from executing if someone were to input a malicious string into the rich text editor. However, plain-text JavaScript is not your only concern; for example, this is an XSS:

<IMG SRC=&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041>

Take a look here for a range of different XSS options; http://ha.ckers.org/xss.html
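
To see why a naive search for the word "javascript" would miss the payload above, here is a small Python sketch that decodes the character references the way an HTML parser does. The variable names are made up for illustration; `html.unescape` is the standard-library function.

```python
from html import unescape

# The SRC value from the example above: decimal character references,
# zero-padded and written without trailing semicolons.
obfuscated_src = (
    "&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114"
    "&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101"
    "&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083"
    "&#0000039&#0000041"
)

decoded_src = unescape(obfuscated_src)
print(decoded_src)  # javascript:alert('XSS')

# A plain substring check on the raw input finds nothing suspicious.
print("javascript" in obfuscated_src.lower())  # False
print("javascript" in decoded_src.lower())     # True
```

This is one reason to filter on the parsed tags and attribute values rather than pattern-matching the raw string.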

Dustin Laine
A: 

Another reason is that a user can input a few closing tags (</div></table>) and potentially break the layout of your web site. If you are using an HTML editing tool, make sure the produced HTML is valid before embedding it in the page without encoding. Some server-side parsing is required in order to do this; you can use HtmlAgilityPack for it.
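
As a rough illustration of such a server-side check (in Python rather than .NET, and not using HtmlAgilityPack), the sketch below wraps the submitted fragment in a dummy root element and asks the standard-library XML parser whether it is well-formed. The function name `is_well_formed_fragment` and the `<root>` wrapper are assumptions for this example. Note that this is stricter than HTML: named entities such as `&nbsp;` and unclosed tags such as a bare `<br>` are rejected too, and a well-formedness check does nothing against XSS on its own.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML inside a wrapper."""
    try:
        # The wrapper gives the fragment a single root element to live in.
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed_fragment("<b>Hello!</b>"))    # True
print(is_well_formed_fragment("</div></table>"))   # False: stray closing tags
print(is_well_formed_fragment("<div>no closing"))  # False: unclosed <div>
```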

korchev
No; you should use an XML parser and only accept well-formed XML.
SLaks
+1  A: 

As an aside: MVC 2 has implemented new functionality so you no longer need to call Html.Encode.

If you change your view syntax from

<%= %>

to

<%: %>

MVC will automatically encode for you. It makes things much easier and quicker. Again, MVC 2 only.

John Ptacek
Yes, but he cannot HTML encode it.
SLaks
+3  A: 

I think you're confusing "encoding" with "scrubbing."

If you want to accept text from a user, you need to encode it as HTML before you render it as HTML. In this way, the text

a < b

is HTML-encoded as

a &lt; b

and rendered in an HTML browser (just as the user entered it) as:

a < b

If you want to accept HTML from a user (which it sounds like you do in this case), it's already in HTML format, so you don't want to call Html.Encode again. However, you may want to scrub it to remove certain markup that you don't allow (like script blocks).
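
The difference can be shown with a short Python sketch (the thread is about ASP.NET's Html.Encode; Python's standard-library `html.escape` is used here only as a stand-in, and the variable names are illustrative):

```python
from html import escape, unescape

# Case 1: plain text from the user. Encode it before rendering.
plain_text = "a < b"
encoded = escape(plain_text)
print(encoded)  # a &lt; b

# The browser decodes the entity when rendering, so the user sees "a < b".
round_trip = unescape(encoded)
print(round_trip)  # a < b

# Case 2: HTML from a rich text editor. Encoding it again turns the markup
# into visible text, which is why scrubbing is needed instead.
rich_html = "<b>Hello!</b>"
over_encoded = escape(rich_html)
print(over_encoded)  # &lt;b&gt;Hello!&lt;/b&gt;
```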

C. Dragon 76
A: 

The primary reason to do what you're suggesting is to escape your output. Since you are accepting HTML and want to output it, you can't do that. What you need to do is filter out the things users can do that are insecure, or at least not what you want.

For that, let me suggest AntiSamy.

You can demo it here.

What you are doing has a lot of inherent risks, and you should consider it very carefully.

Flory