How can I prevent double encoding of HTML entities, or fix them programmatically?

I am using the encode() function from the HTML::Entities Perl module to encode HTML entities in user input. The problem is that we also allow users to enter HTML entities directly, and these entities end up being double encoded.

For example, a user may enter:

Stackoverflow &amp; Perl = Awesome&hellip;

This ends up being encoded to

Stackoverflow &amp;amp; Perl = Awesome&amp;hellip;

This renders in the browser as

Stackoverflow &amp; Perl = Awesome&hellip;

We want this to render as

Stackoverflow & Perl = Awesome...

Is there a way to prevent this double encoding? Or is there a module or snippet of code that can easily correct these double encoding issues?

Any help is greatly appreciated!

+1  A: 

Consider saving the call to encode() until you retrieve the value for display, rather than before you store it. So long as you are consistent in your retrieval mechanism, the extra data in your database probably isn't worth fretting over.

Edit

Re-reading your question, I realize my answer doesn't fully address the issue, since calling encode() later will still produce the same result. I don't know of an alternative myself, so this may not be much help, but you may want to look for an encoding method that respects existing entities.
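One way such a method could be approximated is sketched below. It assumes that anything shaped like `&name;`, `&#123;` or `&#x1F;` is an entity the user meant to keep, and `encode_skipping_entities` is a hypothetical helper name, not something HTML::Entities provides:

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical helper (not part of HTML::Entities): encode text, but leave
# anything that already looks like an entity untouched.
sub encode_skipping_entities {
    my ($text) = @_;

    # The capturing group makes split return the entity-like tokens too,
    # so they always land at the odd indices of @parts.
    my @parts = split /(&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)/, $text;

    for my $i (0 .. $#parts) {
        next if $i % 2;                             # existing entity: keep as is
        $parts[$i] = encode_entities($parts[$i]);   # plain text: encode once
    }
    return join '', @parts;
}

print encode_skipping_entities('Fish & Chips &amp; Peas <b>'), "\n";
# prints: Fish &amp; Chips &amp; Peas &lt;b&gt;
```

The pattern only checks the shape of a token, so something like `&foo;` is passed through even though it is not a real entity, and a user can no longer display a literal `&amp;` as text.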

Nathan Taylor
I think a method that respects existing entities would be ideal. I know that the corresponding encoding function in PHP has a flag to prevent double encoding. Does such a method exist in Perl?
Bob
+4  A: 

There is an extremely simple way to avoid this:

  1. Decode all the entities on input (turn them into Unicode characters).
  2. Encode into entities again at the output stage.
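
The two steps above might look like this with HTML::Entities. This is a minimal sketch; `normalize_input` and `render_output` are hypothetical helper names chosen for illustration, not functions from the module:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical helper: turn user input into plain Unicode text before storing it.
sub normalize_input {
    my ($raw) = @_;
    return decode_entities($raw);    # "&amp;" becomes "&", "&hellip;" becomes U+2026
}

# Hypothetical helper: encode the stored plain text for display.
sub render_output {
    my ($stored) = @_;
    return encode_entities($stored);
}

my $stored = normalize_input('Stackoverflow &amp; Perl = Awesome&hellip;');
my $html   = render_output($stored);

print "$html\n";
# prints: Stackoverflow &amp; Perl = Awesome&hellip;
```

Because the stored value never contains entities, encoding it at output time cannot double encode anything, whether the user typed `&` or `&amp;`.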
Kinopiko
Always. Always store data in a known format. Don't mix and match. Always decode (transform into a known format) on input. Always encode (transform into the necessary format for display or interaction) on output. Applies to HTML-entities just as much as it applies to Unicode.
hobbs
+6  A: 

You can decode the string first:

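use HTML::Entities;    # exports encode_entities and decode_entities

my $input = from_user();    # placeholder for however you obtain the user's text

# Decode any entities the user typed, then encode the result exactly once.
my $encoded = encode_entities( decode_entities $input );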
Eric Strom