Hi all. I've been grappling with the fraught area of escaping user (text) input for web pages. The ultimate goal is to have user input displayed and stored exactly as typed in, without breaking anything.

To that end, I have been using the following test string:

'"$%^&()+=-£{}[]/n/<>\@~;|,.?#:!&amp;``&quot;&#39;

It seems to work well (even Stack Overflow and Twitter are not immune, hence the backticks). My question is: will this string catch most escaping problems, for example when the text goes from a web page via Ajax to a database and back again?

In fact, how do I display this string in Stack Overflow without the backticks?

Is there a better one, e.g. one that will highlight encoding problems too?

+1  A: 

When testing, I use something like this:

a’b<’>",!"/%$?$&?%(()%/"!"/&?%$/"&$/"?%&?-f¯Ñ112üêù

This is generally sufficient to highlight encoding issues, at least from what I can see.
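
As a rough illustration of why accented characters like these expose encoding issues, here is a minimal Python sketch. The helper name `misdecode` is made up for this example, and Latin-1 is only an assumed "wrong" charset; it shows what happens when UTF-8 bytes are read back with a different encoding somewhere along the way.

```python
# Test string from this answer (contains no backslash and no straight
# single quote, so a plain single-quoted literal is safe).
SAMPLE = 'a’b<’>",!"/%$?$&?%(()%/"!"/&?%$/"&$/"?%&?-f¯Ñ112üêù'


def misdecode(text: str, wrong_codec: str = "latin-1") -> str:
    """Encode as UTF-8, then decode with the wrong codec (hypothetical helper)."""
    return text.encode("utf-8").decode(wrong_codec)


# A correct UTF-8 round trip returns the string unchanged.
utf8_round_trip = SAMPLE.encode("utf-8").decode("utf-8")

# A mismatched decode turns every non-ASCII character into several
# unrelated characters, so the damage is easy to spot by eye.
garbled = misdecode(SAMPLE)
garbled_u_umlaut = misdecode("ü")  # "Ã¼"
```

A pure-ASCII test string would pass through `misdecode` unchanged, which is why it would not reveal this class of bug.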

Philippe
A: 

That seems to cover all of them. Depending on the language you're using, the smartest approach is to use a well-tested library to sanitize user input. Ask around to find out what other websites use.
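
To make the "use a tested library" advice concrete, here is a small sketch using only the Python standard library (the stack in question may differ, so treat it as an analogy rather than a recipe). The `comments` table and the `round_trip` function are invented for this example. The idea is to store the raw text with a parameterized query and to escape only at output time.

```python
import html
import json
import sqlite3

# The test string from the question, as a raw literal so the backslash is kept.
TEST_STRING = r"""'"$%^&()+=-£{}[]/n/<>\@~;|,.?#:!&amp;``&quot;&#39;"""


def round_trip(text: str) -> str:
    """Send text through JSON (as Ajax would) and an in-memory database."""
    payload = json.dumps({"comment": text})      # client side serialization
    received = json.loads(payload)["comment"]    # server side parsing
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE comments (body TEXT)")
        # The ? placeholder lets the driver handle quoting; no manual escaping.
        conn.execute("INSERT INTO comments (body) VALUES (?)", (received,))
        (stored,) = conn.execute("SELECT body FROM comments").fetchone()
    finally:
        conn.close()
    return stored


stored = round_trip(TEST_STRING)

# Escape only when rendering into HTML; the stored value stays as typed.
rendered = html.escape(stored, quote=True)
```

With this split, the database holds exactly what the user typed, and `html.unescape(rendered)` gives back the original string.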

gburgoon
PHP, JavaScript, MySQL, Oracle.
Jonathan Swift
+1  A: 

Including a mathematical symbol such as Unicode U+2202 (∂, the partial differential sign) might be useful too.
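
One possible reason a symbol like this helps, sketched in Python: accented letters and curly quotes still fit in the legacy Windows-1252 code page, so they can survive a pipeline that is not really Unicode-safe, while U+2202 cannot. The helper `fits` is a made-up name for this illustration.

```python
PARTIAL = "\u2202"  # the partial differential sign


def fits(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# These characters all exist in Windows-1252, so they would not expose
# a column or connection that is silently limited to that code page.
legacy_ok = fits("’£¯Ñüêù", "cp1252")

# The mathematical symbol does not exist there, so it would.
symbol_ok = fits(PARTIAL, "cp1252")

# In UTF-8 the symbol takes three bytes.
utf8_bytes = PARTIAL.encode("utf-8")
```

Here `legacy_ok` is true and `symbol_ok` is false, which is the gap the extra symbol is meant to cover.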

Erwin Smout
A: 

See here: http://gendoh.com/2511063

The post itself is written in Korean, but you can see the differences between the given patterns. (V1 to V3 are for generic web apps, while V4 and V5 are for JavaScript.)

Achimnol