(I ran these tests against today's snapshot of PHP 6, which I've just recompiled.)
The following snippet:
$str = 'éléphant';
echo PHP_VERSION . ' : ' . strlen($str) . "\n";
gives this output, depending on the version of PHP:
$ /usr/local/php-5.2.9/bin/php temp.php
5.2.10-dev : 10
$ /usr/local/php-6.0.0/bin/php temp.php
6.0.0-dev : 8
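This string, encoded as UTF-8, is 10 bytes long (which explains the result with PHP 5.2, where strlen counts bytes), but it only has 8 characters (which explains the result with PHP 6, where strlen counts characters). The two extra bytes come from the two "é", which take 2 bytes each in UTF-8.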
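For comparison, here is a minimal sketch that asks for each count explicitly instead of relying on the PHP version. It assumes the mbstring extension is loaded; mb_strlen is not part of the tests in this answer and is only added here for illustration. The string is written with byte escapes so the result does not depend on how the file is saved.

```php
<?php
// "éléphant" spelled out as UTF-8 bytes: each "é" is the two bytes C3 A9.
$str = "\xC3\xA9l\xC3\xA9phant";

// strlen() counts bytes in released PHP 5.x and later.
$byteCount = strlen($str);

// mb_strlen() counts characters for the given encoding (assumes ext/mbstring).
$charCount = mb_strlen($str, 'UTF-8');

echo "bytes: $byteCount, characters: $charCount\n";
// prints "bytes: 10, characters: 8"
```

Passing the encoding explicitly avoids depending on whatever default mbstring is configured with.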
Using the (binary) cast, as you proposed, with PHP 6:
$str = 'éléphant';
echo PHP_VERSION . ' : ' . strlen((binary)$str) . "\n";
I get this output:
$ /usr/local/php-6.0.0/bin/php temp.php
6.0.0-dev : 10
So, it seems it is indeed counting the number of bytes, and not the number of characters anymore ;-)
Which means what you suggested seems OK.
As a side note, searching a bit more, I found this discussion; as pointed out there, strlen with multi-byte strings will not break in PHP 6: it'll start working ;-)
Another thing that might be interesting is the declare construct, using the encoding directive. I'm not sure what it allows one to do exactly, but, maybe...