I'm building a web app in Zend Framework that needs UTF-8 support for all languages. This seems to work fine, except for functions like stripslashes() and similar.

This page talks about using mbstring: http://developer.loftdigital.com/blog/php-utf-8-cheatsheet

Is it necessary to use mbstring on my server and replace ALL occurrences of UTF-8-incapable functions with their mb_ variants?

Isn't Zend Framework supposed to support UTF-8? If not, we'd have to replace all functions in the ZF codebase with their mb_ alternatives, right? That is an impossible task, because upgrading to a new ZF version would break our code.

mail()          -> mb_send_mail()
strlen()        -> mb_strlen()
strpos()        -> mb_strpos()
strrpos()       -> mb_strrpos()
substr()        -> mb_substr()
strtolower()    -> mb_strtolower()
strtoupper()    -> mb_strtoupper()
substr_count()  -> mb_substr_count()
ereg()          -> mb_ereg()
eregi()         -> mb_eregi()
ereg_replace()  -> mb_ereg_replace()
eregi_replace() -> mb_eregi_replace()
split()         -> mb_split()
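
To make the difference in the list above concrete, here is a minimal sketch comparing strlen()/substr() with their mb_ counterparts on a UTF-8 string. It assumes the mbstring extension is loaded; the sample word and variable names are only for illustration.

```php
<?php
// "café" written with explicit bytes so the example does not depend on
// the encoding of this source file: "é" is 0xC3 0xA9 in UTF-8.
$word = "caf\xC3\xA9";

$bytes = strlen($word);              // counts bytes
$chars = mb_strlen($word, 'UTF-8');  // counts characters

$cutBytes = substr($word, 0, 4);              // cuts in the middle of "é"
$cutChars = mb_substr($word, 0, 4, 'UTF-8');  // keeps the whole character

echo $bytes, " bytes, ", $chars, " characters\n";

// The byte-wise cut is no longer valid UTF-8; the character-wise cut is.
var_dump(mb_check_encoding($cutBytes, 'UTF-8'));
var_dump(mb_check_encoding($cutChars, 'UTF-8'));
```

Passing the encoding explicitly on every call keeps the result independent of the server's mbstring.internal_encoding setting.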

What's your advice on this? I might be completely wrong. I read about using:

mbstring.func_overload  = 7 ;

to overload all functions automatically.

Will this break an existing application that doesn't need UTF-8, or does it "degrade gracefully"?
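
For reference, the value 7 in that setting is a bit mask: 1 overloads mail(), 2 overloads the str*() functions, and 4 overloads the ereg*() functions. The setting was deprecated in PHP 7.2 and removed in PHP 8.0. The sketch below decodes the mask and reads the current value; both function names are hypothetical helpers, not part of PHP or Zend Framework.

```php
<?php
// Hypothetical helper: translate a func_overload bit mask into the
// groups of functions it would replace.
function describeOverload(int $flags): array
{
    $groups = [];
    if ($flags & 1) {
        $groups[] = 'mail';    // mail() -> mb_send_mail()
    }
    if ($flags & 2) {
        $groups[] = 'string';  // strlen(), substr(), ... -> mb_*
    }
    if ($flags & 4) {
        $groups[] = 'regex';   // ereg(), split(), ... -> mb_ereg*
    }
    return $groups;
}

// Hypothetical helper: read the current setting. ini_get() returns false
// when the setting does not exist (PHP 8+, or mbstring not loaded).
function currentOverloadFlags(): int
{
    $value = ini_get('mbstring.func_overload');
    return $value === false ? 0 : (int) $value;
}

echo implode(', ', describeOverload(7)), "\n";
```

A check like currentOverloadFlags() could be used at bootstrap to warn when code that relies on byte counts is deployed to a server with overloading switched on.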

+1  A: 

Isn't Zend Framework supposed to support UTF-8?

I don't know. You could grep through the code for strlen, for example, but you would still need to read each call to determine whether it is used in a context that is not multibyte-safe. A quick search turned up http://www.iezzi.ch/archives/371, so it seems that ZF is prepared for UTF-8 apps.

What's your advice on this? I might be completely wrong. I read about using mbstring.func_overload = 7. Will this break an existing application that doesn't need UTF-8, or does it "degrade gracefully"?

It will also work for non-multibyte strings and won't break them. But before using it, I would suggest making sure that you really need it, because it costs performance.

Raoul Duke
+1  A: 

I don't think overloading all the functions with mbstring is a good idea. We all know that PHP doesn't handle UTF-8 natively, so we use things like

"SET NAMES utf8" for the database, and Zend_Mail with the encoding passed as a parameter so that Zend_Mail manages it internally.

Another example is Zend_Validate_StringLength: it has a parameter called encoding, and it uses iconv in its setEncoding() method:

    public function setEncoding($encoding = null)
    {
        if ($encoding !== null) {
            $orig   = iconv_get_encoding('internal_encoding');
            $result = iconv_set_encoding('internal_encoding', $encoding);
            if (!$result) {
                require_once 'Zend/Validate/Exception.php';
                throw new Zend_Validate_Exception('Given encoding not supported on this OS!');
            }

            iconv_set_encoding('internal_encoding', $orig);
        }

        $this->_encoding = $encoding;
        return $this;
    }

But you will still end up using mbstring in your app, in logic that is not related to the framework.

For example, yesterday I was sorting a UTF-8 array of posts and comments from a database.

I couldn't get the job done without mbstring, because PHP doesn't handle UTF-8 natively :(

I love mbstring; it has made my life easier.

EDIT: What I meant to say is: use mbstring whenever you need it and let the framework manage itself. I don't like overloading all functions automatically.

tawfekov
+2  A: 

Do not, and I can only repeat, do not use mbstring overloading. It will almost certainly break any method which, for instance, relies on strlen() returning the number of bytes. All components in Zend Framework expect UTF-8 by default, but can handle different charsets if you tell them to. That is done via iconv_*, which is built into PHP by default, so there are no dependencies on extra libraries like mbstring.
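
A minimal sketch of the kind of breakage described above, run without overloading: byte-oriented code needs strlen() to return bytes, and the overloaded behaviour is simulated here with mb_strlen(). The Content-Length line is a hypothetical example of such code, not something taken from Zend Framework, and the sketch assumes the mbstring extension is loaded.

```php
<?php
// "naïve" with "ï" written as its two UTF-8 bytes (0xC3 0xAF).
$body = "na\xC3\xAFve";

// What byte-oriented code expects from strlen(): the number of bytes.
$byteLength = strlen($body);

// What strlen() would return instead if it were overloaded by mbstring
// with UTF-8 as the internal encoding; simulated here with mb_strlen().
$overloadedLength = mb_strlen($body, 'UTF-8');

// Hypothetical use: a Content-Length value must be the byte count.
$contentLength = 'Content-Length: ' . $byteLength;

printf("%d bytes, %d characters\n", $byteLength, $overloadedLength);
```

With overloading enabled, the header would announce one byte fewer than the body actually contains, and the same mismatch would affect any code that uses strlen() to size binary data.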

The only place where you have to tell Zend Framework about UTF-8 is your database connection, which you can do simply via the charset option (see the Zend_Db or Zend_Application documentation). You will surely also want to tell the user agent which charset you deliver, via the Content-Type header. And don't forget to add accept-charset="utf-8" to your form tags.
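
The header and form attribute mentioned above might look like this in plain PHP. This is only a sketch: renderCommentForm() and its field name are hypothetical, and in a Zend Framework app the header would normally be set on the response object rather than with header().

```php
<?php
// Hypothetical helper: builds a form that asks the browser to submit UTF-8.
function renderCommentForm(string $action, string $value = ''): string
{
    // Escape with an explicit charset so multibyte input is handled as UTF-8.
    $action = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    $value  = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    return '<form method="post" action="' . $action . '" accept-charset="utf-8">'
         . '<input type="text" name="comment" value="' . $value . '">'
         . '</form>';
}

// Tell the user agent which charset is being delivered.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo renderCommentForm('/comments', "Gr\xC3\xBC\xC3\x9Fe"), "\n";
```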

DASPRiD
Thanks. Can you tell me why I keep reading that the standard string functions in PHP, like strlen, don't work well with UTF-8 or with languages like Japanese, Arabic, ...? I've read that I need to use mbstring for that.
Jorre
PHP's standard string functions are not multibyte-safe, meaning they all operate byte-wise. If you need multibyte-safe functions, you should always look at iconv_* first (e.g. iconv_strlen), as the iconv library is built into PHP by default. If you cannot find an appropriate function there, you may look into mb_*. There are even two functions which have no equivalent in either of those, namely wordwrap() and str_pad(). I've implemented those via iconv in Zend_Text_MultiByte.
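
As a rough illustration of the byte-wise versus iconv_* behaviour described in this comment, assuming the iconv extension is available (it is in default PHP builds); the sample string is only for illustration.

```php
<?php
// "日本語" as explicit UTF-8 bytes: three characters, three bytes each.
$text = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
$last = "\xE8\xAA\x9E"; // "語"

$byteCount = strlen($text);                 // byte-wise length
$charCount = iconv_strlen($text, 'UTF-8');  // character length

$bytePos = strpos($text, $last);                    // offset in bytes
$charPos = iconv_strpos($text, $last, 0, 'UTF-8');  // offset in characters

$firstChar = iconv_substr($text, 0, 1, 'UTF-8');    // "日"

printf("%d bytes, %d characters\n", $byteCount, $charCount);
printf("found at byte %d, character %d\n", $bytePos, $charPos);
```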
DASPRiD