Consider the following string. It's encoded in UTF-16-LE and saved into a PHP variable. I failed to get either mbstring or iconv to replace the apostrophe's HTML entity (rendered here as ') with a plain single quote. What would be a good way to sanitize it?

String : Carl Sagan's Cosmic Connection

+1  A: 

Unless I'm misunderstanding the question, the ' isn't a UTF-16 issue. That string has had htmlspecialchars() or htmlentities() run on it, and the single quote was converted to its HTML entity representation.

To get it back to normal you need to run it through html_entity_decode(), passing ENT_QUOTES so that single-quote entities are decoded as well.

Generally you only want to do HTML encoding at output time, so you avoid persisting that kind of conversion. If you are taking in HTML input somewhere and getting these kinds of strings, you probably want to decode entities before you do the UTF conversion and storage.

zombat
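A minimal sketch of the decoding step described above, on a plain UTF-8 string. The thread does not show the exact entity, so `&#039;` (the form htmlspecialchars() emits for a single quote under ENT_QUOTES) is an assumption, as is the variable naming.

```php
<?php
// Assumption: the stored value contains the numeric entity &#039;.
// The thread does not show which entity was actually present.
$encoded = 'Carl Sagan&#039;s Cosmic Connection';

// ENT_QUOTES tells the decoder to handle single-quote entities too;
// ENT_HTML5 additionally covers the named entity &apos;.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL;
```

This only works as written when the input is UTF-8 (or another ASCII-compatible encoding); UTF-16 input needs to be converted first.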
Hey, thanks for the reply, but it doesn't do it. Even a simple str_replace refuses to work, which led me to the above conclusion.
gnosio
PHP's string functions have very little support for UTF-16. I'm guessing you're going to have to convert your UTF-16 string to an encoding that html_entity_decode() can handle. Something like `$str = html_entity_decode(iconv('UTF-16LE','UTF-8',$str),ENT_QUOTES,'UTF-8');`. You could then convert back to UTF-16 if needed. I'm not sure whether all these conversions will work for your purposes, but my hunch is that you can't operate on UTF-16 with the basic string functions.
zombat
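The round trip suggested in the comment above could be sketched roughly as follows. The helper name `decodeEntitiesUtf16le` is hypothetical, the `&#039;` entity is an assumption, and the input is assumed to be UTF-16LE without a byte-order mark. The sketch also shows why a byte-wise str_replace finds nothing: in UTF-16LE every ASCII character is followed by a NUL byte, so the ASCII search string never appears as a contiguous byte sequence.

```php
<?php
// Hypothetical helper, not from the thread: decode HTML entities in a
// UTF-16LE string by going through UTF-8 and back.
function decodeEntitiesUtf16le(string $utf16le): string
{
    $utf8 = iconv('UTF-16LE', 'UTF-8', $utf16le);
    if ($utf8 === false) {
        throw new InvalidArgumentException('Input is not valid UTF-16LE');
    }

    // Decode on the UTF-8 copy, where the entity is plain ASCII.
    $decoded = html_entity_decode($utf8, ENT_QUOTES, 'UTF-8');

    // Convert back so callers still receive UTF-16LE.
    return iconv('UTF-8', 'UTF-16LE', $decoded);
}

// Build a UTF-16LE test value (assumed entity: &#039;).
$input = iconv('UTF-8', 'UTF-16LE', 'Carl Sagan&#039;s Cosmic Connection');

// The ASCII needle does not occur byte-for-byte in the UTF-16LE data,
// so str_replace returns the input unchanged.
$unchanged = str_replace('&#039;', "'", $input) === $input;
var_dump($unchanged);

$fixed = decodeEntitiesUtf16le($input);
echo iconv('UTF-16LE', 'UTF-8', $fixed), PHP_EOL;
```

Naming the byte order explicitly (`UTF-16LE` rather than `UTF-16`) avoids depending on how a given iconv implementation guesses the endianness when no byte-order mark is present.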