I have PHP configured with mbstring.func_overload = 7, so all the single-byte-string functions are mapped to their multi-byte equivalents. But I still sometimes need to treat strings as byte arrays; for example, when calculating their size or doing encryption.

What's the best approach here? Can I just use the multi-byte functions and pass them a single-byte encoding, even though that's not actually how the string is encoded? For example:

mb_substr($utf8str, 0, 1, "latin1");
mb_strlen($utf8str, "latin1");

EDIT: I noticed when looking through PHP's source that they rename the original functions to mb_orig_X, as in mb_orig_strlen. Probably not safe to use, as they're not documented, but interesting.

+1  A: 

I don't think you should override these functions if you still need the original ones (i.e., if you really need to operate on binary strings); it is quite a dirty solution. That earlier choice now forces you into an even dirtier workaround, and it may break libraries you are using without you being aware of it (but the PHP team keeps inventing more and more stupid features like that).

But if you must keep it that way, you should:

  1. use a language-neutral encoding like ASCII (not for the interpreter's sake, but for those reading your code, even if that's you in 2 years), and
  2. thoroughly document why you did that, since it will be very confusing for anyone looking at that piece of code.
soulmerge
I don't think it's a dirty solution. Sometimes you just need to work with binary data. But I agree you have to be careful with it (see http://stackoverflow.com/questions/1647419/php-mbstring-funcoverload-vs-using-mbstring-functions). Also, an even better choice for the encoding name to use would be `binary` or `8bit`.
mercator
Overriding the behaviour of a well-documented function is *always* a bad idea. Think of it this way: the function is *lying* to you, i.e. it does not do what it promises to do. Or here is another one: what would happen if your arrays stopped storing NULL values, silently ignoring them without even generating a key in the array? All controlled by the configuration value `array.store_null_values = false` (I hope no one on the PHP team is reading this; I'm probably giving them bad ideas.)
soulmerge
Is `binary` a real encoding? I don't see it listed on http://php.net/manual/en/mbstring.supported-encodings.php, but it seems to work. Do you know what the differences are between `binary`, `8bit`, and `ascii`?
JW
Looked through the source. `binary` and `8bit` seem to be the same. `7bit` includes only 7-bit characters (of course), and `ascii` includes 0x20-0x80, plus 0, 0x09, 0x0a, and 0x0d.
JW
`binary` is an alias of `8bit` (http://bugs.php.net/bug.php?id=26699). In any case, none of those differences should matter for just getting the string length, except for readability, as soulmerge said.
mercator
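
A minimal sketch of the approach discussed in the comments above: pass `8bit` as the encoding so the mbstring functions count and slice bytes rather than characters. The helper names `byte_length` and `byte_substr` are hypothetical and exist only for illustration; the sketch assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helpers (the names are illustrative, not from the thread).
// With the '8bit' encoding, mbstring treats every byte as one character,
// so these return byte-based results even if mbstring.func_overload
// has replaced strlen() and substr().
function byte_length($s)
{
    return mb_strlen($s, '8bit');
}

function byte_substr($s, $start, $length)
{
    return mb_substr($s, $start, $length, '8bit');
}

// "héllo" encoded as UTF-8: 5 characters, 6 bytes.
$utf8str = "h\xC3\xA9llo";

echo mb_strlen($utf8str, 'UTF-8'), "\n";         // character count: 5
echo byte_length($utf8str), "\n";                // byte count: 6
echo bin2hex(byte_substr($utf8str, 1, 2)), "\n"; // the two bytes of "é": c3a9
```

Wrapping the calls in small, clearly named functions keeps the reason for the unusual encoding argument in one documented place, which is the readability concern raised in the answer.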