I'm comparing a string from the database to a list of strings in an array:

if (in_array($entry, array('Söme&nbsp;string', 'other-string')))

This works for other-string, but not for Söme&nbsp;string, the main difference being that this string contains an umlaut and an HTML entity. If $entry is Söme&nbsp;string in the database, the comparison fails, even though it should be the same string.

I also tried strcmp and direct comparison using === and ==, but the comparison always fails. I also tried utf8_encode before the comparison, but that made no difference.

The database uses UTF-8, I fetch the data using Drupal API functions, and my PHP file is also UTF-8 encoded. If I print $entry and Söme&nbsp;string to the output HTML, they are indistinguishable.

Any idea what could be causing this behaviour?

Update

Thanks for the help. It seems the &nbsp; is converted on the way and is stored as a real non-breaking space in the database, not as an HTML entity. Printing it converts it back to an HTML entity (or maybe Firebug does that when I look at it).

The output of var_dump() (using print function, taken from resulting html source):

$entry: string(14) "Söme string"

"Söme&nbsp;string": string(18) "Söme&nbsp;string"

(I've edited the string as the real one contains a name)

Update 2

I've changed the string to "Some&nbsp;string" and here's the output of

var_dump(bin2hex($entry));
var_dump(bin2hex('Some&nbsp;string'));

$entry: string(24) "536f6d65c2a0737472696e67"
"Some&nbsp;string": string(32) "536f6d65266e6273703b737472696e67"
+3  A: 

Then the strings are not the same. Perhaps:

  • $entry has an actual space instead of a non-breaking space.
  • One has the HTML entity &nbsp; while the other has an actual non-breaking space.
  • In one of the scripts the character ö is decomposed and in the other it isn't.

Try to var_dump the array and $entry.
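
The possibilities listed above look identical once rendered, but they differ at the byte level. As a rough sketch, the snippet below prints the hex bytes of each variant with bin2hex. The sample strings and the $candidates and $hex names are made up for illustration and are not taken from the question's real data.

```php
<?php
// Hypothetical sample values, one per case listed above.
// "\xC2\xA0" is the UTF-8 encoding of U+00A0 (non-breaking space);
// "\xC3\xB6" is a precomposed o-umlaut, "o\xCC\x88" is "o" plus a combining diaeresis.
$candidates = array(
    'plain space'          => "Some string",
    'real NBSP'            => "Some\xC2\xA0string",
    'HTML entity'          => 'Some&nbsp;string',
    'precomposed o-umlaut' => "S\xC3\xB6me",
    'decomposed o-umlaut'  => "So\xCC\x88me",
);

// bin2hex exposes the raw bytes, so look-alike strings become distinguishable.
$hex = array_map('bin2hex', $candidates);

foreach ($hex as $label => $bytes) {
    echo $label, ': ', $bytes, "\n";
}
```

Comparing the hex of $entry against the hex of the literal in the array shows which of the cases applies: c2a0 indicates a real non-breaking space, while 266e6273703b is the six characters of the entity &nbsp;.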

Artefacto
A: 

The problem was that $entry contained a UTF-8 encoded non-breaking space (bytes 0xC2 0xA0). Just calling htmlentities on it did not work, because I did not specify the charset. So my solution is the following:

htmlentities($entry, ENT_QUOTES, 'UTF-8')
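
To illustrate why the charset argument matters, here is a minimal sketch. The $entry value and the $allowed list are hypothetical stand-ins for the database value and the array from the question. ISO-8859-1 is passed explicitly to reproduce the pre-PHP 5.4 default; newer PHP versions default to UTF-8, so the problem may not appear there.

```php
<?php
// Hypothetical stand-in for the value fetched from the database:
// "Some", a UTF-8 non-breaking space (bytes C2 A0), then "string".
$entry   = "Some\xC2\xA0string";
$allowed = array('Some&nbsp;string', 'other-string');

// Reading the bytes as ISO-8859-1 (the default before PHP 5.4) encodes
// 0xC2 and 0xA0 as two separate characters, so the literal is not matched.
$latin1 = htmlentities($entry, ENT_QUOTES, 'ISO-8859-1');

// With the charset stated, the two bytes are read as one character.
$utf8 = htmlentities($entry, ENT_QUOTES, 'UTF-8');

var_dump($latin1);                     // "Some&Acirc;&nbsp;string"
var_dump(in_array($utf8, $allowed));   // bool(true)

// The other direction also works: decode the literal and compare raw strings.
$decoded = html_entity_decode('Some&nbsp;string', ENT_QUOTES, 'UTF-8');
var_dump($decoded === $entry);         // bool(true)
```

Either direction makes both sides use the same representation. Decoding the literal may be preferable if the database values should stay untouched, but that is a design choice rather than something the original answer prescribes.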
Fabian