The problem is that you can't tell the user how many characters the field allows, because the escaped value can be longer than the unescaped one.
I see a few solutions, but none looks very good:
- One whitelist for each field (too much work and doesn't quite solve the problem)
- One blacklist for each field (same as above)
- Use a field length that could hold the data even if all characters are escaped (bad)
- Uncap the size for the database field (worse)
- Save the data unescaped (no percent-encoding) and pass the responsibility entirely to output filtering (not very good)
- Let the user guess the maximum size (worst)
Are there other options? Is there a "best practice" for this case?
Sample code:
$string = 'javascript:alert("hello!");';
echo strlen($string);
// outputs 27
$escaped_string = filter_var($string, FILTER_SANITIZE_ENCODED);
echo strlen($escaped_string);
// outputs 41
If the length of the database field is, say, 40, the escaped data will not fit.
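To make the "field that could hold the data even if all characters are escaped" option concrete, here is a rough sketch of the arithmetic. It assumes that FILTER_SANITIZE_ENCODED turns each byte into at most three characters (%XX), so the worst case is three times the raw length. The constants and the encode_for_storage() helper are hypothetical names made up for illustration, not part of any library.

```php
<?php
// Hypothetical limits, chosen only for illustration.
const MAX_RAW_LENGTH = 40;                     // the limit you would show the user
const MAX_ENCODED_LENGTH = MAX_RAW_LENGTH * 3; // worst case: every byte becomes %XX

// Hypothetical helper: returns the encoded value, or null if the raw input is too long.
function encode_for_storage(string $raw): ?string
{
    // Check the length the user actually typed (strlen counts bytes).
    if (strlen($raw) > MAX_RAW_LENGTH) {
        return null;
    }
    $encoded = filter_var($raw, FILTER_SANITIZE_ENCODED);
    return $encoded === false ? null : $encoded;
}

$encoded = encode_for_storage('javascript:alert("hello!");');
echo strlen($encoded); // never more than MAX_ENCODED_LENGTH
```

With this approach the user sees a limit of 40, while the column has to be sized for 120, which is exactly why that option looks wasteful.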