For some reason, after submitting a string like Jack’s Spindle from a text form to PHP, I get:

Jack%u2019s Spindle

This is not what PHP's urlencode() would produce, which would be Jack%92s+Spindle, nor what rawurlencode() would produce, which would be Jack%92s%20Spindle.

So neither urldecode() nor rawurldecode() can decode that string. Is there another function for such strings?

--

Also, Jack&#8217;s Spindle would be the HTML-safe way to encode the above, but urlencode() and rawurlencode() applied to that yield Jack%26%238217%3Bs+Spindle and Jack%26%238217%3Bs%20Spindle respectively.

Where is the %u2019 coming from? What does it represent? How do you get it back to just that innocuous apostrophe?

+1  A: 

Well, only you can tell us where that came from. Where are you getting your text from, and what transformations is it going through? I confess I haven't seen that encoding strategy before.

That said, it's very similar to the way JavaScript encodes UTF-16 code units: \uXXXX, where each X represents a hexadecimal digit. To convert it to HTML entities, you could do:

preg_replace('/%u([a-fA-F0-9]{4})/', '&#x\\1;', $string)
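
If you want the actual character back rather than an HTML entity, here is a minimal sketch. The %uXXXX form is what JavaScript's legacy escape() function emits for characters above U+00FF, so that is a likely source, though that is a guess about your setup. The helper name decode_percent_u is hypothetical (it is not a built-in), and the sketch assumes the mbstring extension is available and that you want UTF-8 output. It treats each run of %uXXXX escapes as big-endian UTF-16 so that surrogate pairs decode correctly too.

```php
<?php
// Hypothetical helper, not part of PHP: converts %uXXXX escapes to UTF-8.
// Assumes the mbstring extension is loaded.
function decode_percent_u(string $input): string
{
    return preg_replace_callback(
        // Match one or more consecutive %uXXXX escapes as a single run,
        // so a surrogate pair such as %uD83D%uDE00 is converted together.
        '/(?:%u[0-9a-fA-F]{4})+/',
        function (array $m): string {
            // Drop the "%u" prefixes, leaving only the hex digits.
            $hex = str_replace('%u', '', $m[0]);
            // The hex digits are UTF-16 code units; turn them into raw
            // big-endian bytes and convert those bytes to UTF-8.
            return mb_convert_encoding(hex2bin($hex), 'UTF-8', 'UTF-16BE');
        },
        $input
    );
}

echo decode_percent_u('Jack%u2019s Spindle'), "\n"; // prints: Jack’s Spindle
```

Ordinary %XX escapes are left alone by this function. If the string contains those as well, run it through rawurldecode() afterwards, not before.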
Artefacto
that reg ex did the trick exactly! quotes are fixed... ty :D
ina