Assuming my project is utf-8 throughout and has always been used with utf-8 encoding, is there anything legit that could possibly break if I change all occurrences of htmlspecialchars($var) to htmlspecialchars($var, ENT_QUOTES, 'utf-8')?

I do know one thing: Obviously, ENT_QUOTES differs from ENT_COMPAT in that it also escapes single quotation marks. Assuming I know that this alone won't break anything, is there anything else left over?
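
For reference, here is a minimal sketch of that one known difference. The sample string is made up for illustration, and the flags are passed explicitly so the result does not depend on the PHP version's defaults:

```php
<?php
// Hypothetical sample input containing both kinds of quotes.
$input = "It's a \"test\"";

// ENT_COMPAT escapes double quotes only.
$compat = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES escapes single quotes as well.
$quotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $compat, "\n"; // It's &quot;test&quot;
echo $quotes, "\n"; // It&#039;s &quot;test&quot;
```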

Differently worded:

Is there a conceivable result of htmlspecialchars() when used without the charset parameter, given data only from the charset, that would differ from htmlspecialchars() when used with the charset parameter?

(Is, at any point, htmlspecialchars($stringThatIsValidUTF8, ENT_QUOTES) !== htmlspecialchars($stringThatIsValidUTF8, ENT_QUOTES, 'utf-8')?)

My understanding says no, never. Another question here on stackoverflow suggests no, too. So far, browsing my sandbox of the project with the changes also says no. However, I'm not sure if I'm overlooking something. I'm in a rather paranoid mood at the moment! :)

+2  A: 

No, it wouldn't differ, because if you don't provide a charset, PHP will guess it and will therefore use UTF-8.

erenon
+4  A: 

I think the quote from the PHP manual in the other question answers it definitely:

For the purposes of this function, the charsets ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, as the characters affected by htmlspecialchars() occupy the same positions in all of these charsets.

" & > and so on all have the same code in each of those encodings, and in UTF-8 each of them takes only one byte, because a UTF-8 character occupies multiple bytes only when necessary. Therefore, even if you have been processing UTF-8 data as ISO-8859-1 until now, the output will be identical when you switch to an explicit UTF-8 charset.

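If you want to check this yourself, a quick sketch along these lines should do. The sample strings are made up, and the non-ASCII ones are written as byte escapes so the result does not depend on how the script file is saved:

```php
<?php
// Hypothetical valid UTF-8 samples, written as raw bytes.
$samples = [
    'plain ASCII <b>&</b>',
    "Gr\xC3\xBC\xC3\x9Fe & \"more\" <tags>",        // "Grüße" in UTF-8
    "\xE6\x97\xA5\xE6\x9C\xAC 'quoted' > text",     // "日本" in UTF-8
];

$allEqual = true;
foreach ($samples as $s) {
    // Same input and flags, only the charset argument differs.
    $asLatin1 = htmlspecialchars($s, ENT_QUOTES, 'ISO-8859-1');
    $asUtf8   = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

    if ($asLatin1 !== $asUtf8) {
        $allEqual = false;
    }
}

var_dump($allEqual); // bool(true)
```

Note that this only holds for valid UTF-8, which is what the question assumes. With the charset set to UTF-8, an invalid byte sequence makes htmlspecialchars() return an empty string unless ENT_IGNORE or ENT_SUBSTITUTE is set.
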
Pekka
I love it when people just quote the manual! (i.e. +1 for RTXM)
Pascal MARTIN
It's a re-quote actually, from the question @pinkgothic is linking to, so it was not *really* meant as RTFM :)
Pekka
Yeah, I figured as much. Thanks for confirming and soothing my paranoia! :D Much appreciated.
pinkgothic