Q:

PHP 6 and strlen()

Hi,

I know PHP 6 will have Unicode support in all built-in functions, such as strlen(). If that is the case, strlen() can no longer be used to count the number of bytes in a variable. Won't this probably break some software? I'm curious: is the proper fix the following?

$bytes = strlen((binary) $var);
A: 

utf8_decode() would probably do the trick, or maybe you're looking for mb_strlen(), where you can specify an encoding.

count_chars() also does something similar. I'm guessing at least one of these will remain a byte counter rather than become a UTF-8 character counter; there would be no need for several similar functions that all do the same thing. If any one of them remains a byte counter, my guess is count_chars(), since its documentation specifically mentions byte values 0-255.
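A minimal sketch of how these candidates behave in released PHP versions (PHP 7 or later assumed), where strlen() counts bytes. utf8_decode() is left out because it is deprecated as of PHP 8.2. mb_strlen() requires the mbstring extension, so the call is guarded. The string is 'éléphant' written as explicit UTF-8 bytes, so the result does not depend on how the source file is saved.

```php
<?php
// 'éléphant' as explicit UTF-8 bytes: each 'é' is the two bytes C3 A9.
$str = "\xC3\xA9l\xC3\xA9phant";

// strlen() counts bytes in released PHP versions.
$bytes = strlen($str);

// count_chars() in mode 1 returns byte value (0-255) => number of occurrences,
// for every byte that occurs. The occurrences add up to the byte count.
$byteFreq = count_chars($str, 1);
$bytesViaCountChars = array_sum($byteFreq);

// mb_strlen() counts characters in the encoding you pass.
// It needs the mbstring extension, so fall back to null if it is missing.
$chars = function_exists('mb_strlen') ? mb_strlen($str, 'UTF-8') : null;

echo 'strlen:      ', $bytes, "\n";               // 10
echo 'count_chars: ', $bytesViaCountChars, "\n";  // 10
echo 'byte 0xC3 occurs ', $byteFreq[0xC3], " times\n"; // 2 (once per 'é')
echo 'mb_strlen:   ', var_export($chars, true), "\n";  // 8 if mbstring is loaded
```

Both strlen() and count_chars() report bytes here; only mb_strlen() with an explicit encoding reports characters.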

Tor Valamo
A:

(The tests here were run against today's snapshot of PHP 6, which I've just recompiled.)

The following code:

$str = 'éléphant';
echo PHP_VERSION . ' : ' . strlen($str) . "\n";

gives this output, depending on the version of PHP:

$ /usr/local/php-5.2.9/bin/php temp.php
5.2.10-dev : 10

$ /usr/local/php-6.0.0/bin/php temp.php
6.0.0-dev : 8

This string, encoded as UTF-8, is 10 bytes long (which explains the result with PHP 5.2), but it has only 8 characters (which explains the result with PHP 6).
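The arithmetic is 2 × 2 bytes for the two 'é' plus 6 × 1 byte for the ASCII letters, which gives 10 bytes for 8 characters. The sketch below, assuming PHP 7 or later with the bundled PCRE extension, splits the string into characters using the /u (UTF-8) modifier and prints each character's byte length.

```php
<?php
// 'éléphant' as explicit UTF-8 bytes (each 'é' is C3 A9).
$str = "\xC3\xA9l\xC3\xA9phant";

// With the /u modifier, PCRE treats the subject as UTF-8, so splitting on the
// empty pattern yields one element per character (code point), not per byte.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chars as $c) {
    // strlen() of each piece is that character's length in UTF-8 bytes.
    echo $c, ' => ', strlen($c), " byte(s)\n";
}

echo count($chars), ' characters, ', strlen($str), " bytes\n"; // 8 characters, 10 bytes
```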


Using (binary) as you proposed, with PHP 6:

$str = 'éléphant';
echo PHP_VERSION . ' : ' . strlen((binary)$str) . "\n";

I get this output:

$ /usr/local/php-6.0.0/bin/php temp.php
6.0.0-dev : 10

So it does seem to count the number of bytes rather than the number of characters ;-)

This means that what you suggested seems OK.
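PHP 6 was never released, so the (binary) cast cannot be checked against a final version. In released PHP versions, a byte count that survives a character-aware strlen() (for example under mbstring.func_overload, which existed until its removal in PHP 8.0) is commonly obtained with mb_strlen() and the '8bit' encoding. This is a minimal sketch assuming PHP 7 or later; byte_length() is a hypothetical helper name, not a PHP function.

```php
<?php
// Hypothetical helper (not part of PHP): length of $s in bytes,
// even if strlen() has been made character-aware.
function byte_length(string $s): int
{
    if (function_exists('mb_strlen')) {
        // With the '8bit' encoding, mb_strlen() counts every byte as one unit.
        return mb_strlen($s, '8bit');
    }
    // Without mbstring, strlen() cannot be overloaded, so it counts bytes.
    return strlen($s);
}

// 'éléphant' as explicit UTF-8 bytes (each 'é' is C3 A9).
echo byte_length("\xC3\xA9l\xC3\xA9phant"), "\n"; // 10
```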


As a side note, searching a bit more, I found this discussion; as it points out, strlen() with multi-byte strings in PHP 6 will not break: it will start working ;-)

Another thing that might be interesting is the declare construct with the encoding directive. I'm not sure exactly what it allows one to do, but maybe...

Pascal MARTIN