The problem is you can't really tell the user how many characters are allowed in the field, because the escaped value, obviously, has more characters than the unescaped one.

I see a few solutions, but none looks very good:

  • One whitelist for each field (too much work and doesn't quite solve the problem)
  • One blacklist for each field (same as above)
  • Use a field length that could hold the data even if all characters are escaped (bad)
  • Uncap the size for the database field (worse)
  • Save the data hex-unescaped and pass the responsibility entirely to output filtering (not very good)
  • Let the user guess the maximum size (worst)

Are there other options? Is there a "best practice" for this case?

Sample code:

<?php
$string = 'javascript:alert("hello!");';
echo strlen($string), "\n";
// outputs 27
$escaped_string = filter_var($string, FILTER_SANITIZE_ENCODED);
echo strlen($escaped_string), "\n";
// outputs 41

If the length of the database field is, say, 40, the escaped data will not fit.
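The growth in the sample has a fixed upper bound. FILTER_SANITIZE_ENCODED leaves ASCII letters, digits, "-", "." and "_" alone and turns every other byte into a three-character %XX sequence, so the encoded value is at most three times the byte length of the input. A minimal sketch of that bound, assuming limits are counted in bytes; worst_case_encoded_length() and the 32-byte limit are hypothetical:

```php
<?php
// Upper bound on the length of a value after FILTER_SANITIZE_ENCODED:
// in the worst case every byte becomes a three-character "%XX" sequence.
function worst_case_encoded_length(string $raw): int
{
    return 3 * strlen($raw); // strlen() counts bytes, not characters
}

$limit = 32; // hypothetical user-facing limit, in bytes

// A value made only of characters that get encoded hits the bound exactly.
$allSpecial = str_repeat('"', $limit);
$encoded = filter_var($allSpecial, FILTER_SANITIZE_ENCODED);

echo strlen($encoded), "\n";                       // 96
echo worst_case_encoded_length($allSpecial), "\n"; // 96
```

Multibyte input is worse than it looks: a two-byte UTF-8 character such as é becomes six characters (%C3%A9).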

+2  A: 

making some wild assumptions about the context here:

  • if the field can hold 32 characters, that is 32 unescaped characters
  • let the user enter 32 characters
  • escape/unescape is not the user's problem
  • why is this an issue?
    • if this is form data-entry it won't matter, and
    • if for some reason you are escaping the data and passing it back, then unescape it before storage

without further context, it looks like you are fighting a problem that doesn't really exist, or that doesn't need to exist
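A minimal sketch of that advice, assuming the value reaches the server still percent-encoded (for example after a round trip through FILTER_SANITIZE_ENCODED): decode it with rawurldecode(), check the unescaped length against the 32-character limit, and store the decoded value. The function name accept_field() and the limit are hypothetical.

```php
<?php
const MAX_CHARS = 32; // hypothetical limit, counted on the unescaped value

// Returns the unescaped value to store, or null if it is too long.
function accept_field(string $submitted): ?string
{
    // rawurldecode() reverses %XX sequences such as those produced by
    // FILTER_SANITIZE_ENCODED; unlike urldecode() it leaves '+' alone.
    $raw = rawurldecode($submitted);

    // The limit the user was told about applies to what they typed.
    if (strlen($raw) > MAX_CHARS) {
        return null;
    }
    return $raw;
}

$escaped = filter_var('javascript:alert("hello!");', FILTER_SANITIZE_ENCODED);
var_dump(accept_field($escaped)); // string(27) "javascript:alert("hello!");"
```

Only decode data you know was encoded: applied to raw input, rawurldecode() would turn a literal "%41" typed by the user into "A".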

Steven A. Lowe
A: 

This is an interesting problem.

I think any solution will be a problem if it pushes responsibility onto the users because of the sanitization. If they are responsible for guessing the maximum length, they may well give up and pick something else (without understanding why their input was invalid).

Here's my idea: make the database field 150% of the maximum input size. The extra space serves as padding for the characters added by hex-encoding, while the maximum size shown to the user and enforced by the validator is the actual desired size. Thus, if you check the input length before sanitization and it is within that two-thirds limit, your sanitized data will usually fit. If the sanitized data still overflows the extra third of the field reserved as a buffer, then the input probably should not be accepted.
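The sizing rule comes down to two checks. A sketch, assuming a hypothetical user-facing limit of 32 characters and therefore a 48-character column (150% of 32); the constant and function names are illustrative only. The raw length is what the user is told about, and the encoded length decides whether the value still fits.

```php
<?php
const USER_LIMIT = 32;   // hypothetical maximum shown to the user
const COLUMN_WIDTH = 48; // 150% of USER_LIMIT; the extra third is padding

// Returns the encoded value to store, or null if the input is rejected.
function sanitize_for_column(string $raw): ?string
{
    // Check the length the user actually sees first.
    if (strlen($raw) > USER_LIMIT) {
        return null;
    }
    $encoded = filter_var($raw, FILTER_SANITIZE_ENCODED);
    // Even short input can overflow if most of it gets encoded.
    if (strlen($encoded) > COLUMN_WIDTH) {
        return null;
    }
    return $encoded;
}

var_dump(sanitize_for_column('javascript:alert("hello!");')); // string(41) "javascript%3Aalert%28%22hello%21%22%29%3B"
var_dump(sanitize_for_column(str_repeat('!', 20)));          // NULL (60 encoded characters)
```

Because the worst case triples the length, some inputs well under the advertised limit are still refused by the second check, which is exactly the situation where users may not understand why their input was invalid.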

The only trouble is that your database tables will be larger. If you want to avoid this, well, you could always escape only the SQL-sensitive characters and handle everything else on output.

Edit: Given your example, I think you're escaping far too much. Either escape a narrower set of characters, for instance with htmlspecialchars() on output, or make your database fields as much as 200% of their present size, which is just bloated if you ask me.
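A minimal sketch of the first alternative, assuming the value is stored unescaped and escaped only when it is written into an HTML page. For the question's example string, the stored value stays at 27 characters, under a 40-character column, and htmlspecialchars() only expands the two double quotes.

```php
<?php
$raw = 'javascript:alert("hello!");';

// Stored as-is: 27 characters fit in a 40-character column.
echo strlen($raw), "\n"; // 27

// Escaped at output time only; each double quote becomes &quot;.
$html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
echo $html, "\n";         // javascript:alert(&quot;hello!&quot;);
echo strlen($html), "\n"; // 37
```

This only makes the value safe as HTML text or inside a quoted attribute; if it is ever used as a link target, the javascript: scheme needs a separate check.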

The Wicked Flea
A: 
  • Why are you allowing users to type in escaped characters?
  • If you do need to allow explicitly escaped characters, then decode them before sanity-checking the input

You should pretty much never do any significant work on any string if it is somehow still encoded. Decode it first, then do your work.

I find some people have a tendency to apply escaping functions like PHP's addslashes() too early, or to decode things (like HTML entities) too late. Decode first, do your stuff, then apply any encoding you need to store/output/etc.
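The decode / work / encode order can be sketched as follows, assuming the input arrives containing HTML entities (for example pasted from a web page) and that the length limit applies to the decoded text. accept_comment() and the 12-character limit are hypothetical. Checking the still-encoded input instead would wrongly reject "Tom &amp; Jerry" (15 characters) even though the user typed only 11.

```php
<?php
const MAX_LENGTH = 12; // hypothetical limit, counted on the decoded text

// Decode first, then validate; returns the text to store, or null.
function accept_comment(string $input): ?string
{
    // "&amp;" is a single character as far as the user is concerned.
    $text = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

    // Do the real work (here, a length check) on the decoded text.
    if (strlen($text) > MAX_LENGTH) {
        return null;
    }
    return $text; // stored unescaped
}

$stored = accept_comment('Tom &amp; Jerry'); // 15 characters in, 11 after decoding

// Encode only at the point of output, for the destination (HTML here).
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n"; // Tom &amp; Jerry
```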

Dan
+8  A: 

Don't build your application around the database - build the database for the application!

Design how you want the interface to work for the user first, work out the longest acceptable field length, and use that.

In general, don't escape before storing in the database - store raw data in the database and format it for display. If something is going to be output many times, then store the processed version.

Remember disk space is relatively cheap - don't waste effort trying to make your database compact.
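One way to read "store the processed version", sketched under assumptions: the raw text stays the source of truth, and a rendered copy sits next to it purely as a cache. The arrays stand in for database rows; render(), make_row(), rebuild() and the column names are hypothetical. If the formatting routine later turns out to be wrong, every cached copy can be regenerated from the raw text.

```php
<?php
// Formatting for display: escape for HTML, then turn newlines into <br />.
function render(string $raw): string
{
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

// A "row" keeps the raw input plus a cached rendered copy.
function make_row(string $raw): array
{
    return ['raw' => $raw, 'html' => render($raw)];
}

// Regenerate all cached copies from the raw text, e.g. after fixing render().
function rebuild(array $rows): array
{
    return array_map(fn (array $row) => make_row($row['raw']), $rows);
}

$row = make_row("Tom & Jerry\n<3");
echo $row['html'], "\n";
```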

Peter Boughton
I just wanted to extra-agree with the point about storing the raw input in the database. If you pre-HTML-escape your data and discover a problem with your escaping routines later, you're out of luck.
Neall