strlen is a single-byte string function that fails on multi-byte strings, as it returns the number of bytes rather than the number of characters (the two are only equal in single-byte strings, where every byte represents one character). For multi-byte strings, use strlen’s multi-byte counterpart mb_strlen instead, and don’t forget to specify the proper character encoding.
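To make the difference concrete, here is a minimal sketch comparing the two functions on the same string. It assumes PHP 7 or later (for the "\u{...}" escape) and that the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// "Stackù" written with an escape so the bytes are unambiguous:
// "\u{00F9}" (PHP 7+) is encoded as the two UTF-8 bytes 0xC3 0xB9.
$str = "Stack\u{00F9}";

$bytes = strlen($str);             // counts bytes
$chars = mb_strlen($str, 'UTF-8'); // counts characters in the given encoding

var_dump($bytes); // int(7)
var_dump($chars); // int(6)
```

The five ASCII letters take one byte each and ù takes two, which is why strlen reports 7 while mb_strlen reports 6.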
To have HTML character references counted as single characters, use html_entity_decode to replace them with the characters they represent:
$str = html_entity_decode('Stack&#x00f9;', ENT_QUOTES, 'UTF-8');
var_dump(mb_strlen($str, 'UTF-8')); // int(6)
Note that &#00f9 is not a valid character reference, as it’s missing an x or X after &# for the hexadecimal notation and a ; after the hexadecimal value.