I am using PHP 5.2.6 and my app's character set is UTF-8.

Now, how should I change PHP's default character set? NOT the one which specifies the output's MIME type and character set.

But the one which will change it for all the PHP functions like htmlspecialchars, htmlentities, etc.

I know there is a parameter in those functions which takes the character set of the input string. But I don't want to specify it for every function I use. And if I forget it somewhere, it will be a mess.

I also know that I can wrap those functions in my own wrapper, like:

function myHtmlize($str)
{
  return htmlspecialchars($str, ENT_COMPAT, 'UTF-8');
}

I don't like this solution either.

I really want to tell PHP to take 'UTF-8' as the character set by default, not 'iso-8859-1'.

Is it possible?

+2  A: 

Like this one? http://us2.php.net/manual/en/function.setlocale.php

* LC_ALL for all of the below
* LC_COLLATE for string comparison, see strcoll()
* LC_CTYPE for character classification and conversion, for example strtoupper()
* LC_MONETARY for localeconv()
* LC_NUMERIC for decimal separator (See also localeconv())
* LC_TIME for date and time formatting with strftime()
* LC_MESSAGES for system responses (available if PHP was compiled with libintl)
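
For illustration, a minimal sketch of what such a call could look like. The locale names are assumptions (they vary by system and may not be installed), and whether this changes what htmlentities() assumes is a separate question; the sketch only shows the call itself.

```php
<?php
// Locale names are system-dependent; these two are assumptions and may
// not be installed. setlocale() tries each in turn and returns the name
// it set, or false if none of them is available.
$result = setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');

if ($result === false) {
    // Passing '0' only queries the current setting without changing it.
    echo "No UTF-8 locale available; LC_CTYPE is still ", setlocale(LC_CTYPE, '0'), "\n";
} else {
    echo "LC_CTYPE is now $result\n";
}
```

Checking the return value matters here, since a failed call leaves the previous locale in place without any warning.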
Ahmet Kakıcı
I did not get what exactly I have to do.
Sabya
Did you check this one? http://us2.php.net/manual/en/function.setlocale.php
Ahmet Kakıcı
+2  A: 

There is a C-function determine_charset(char *charset_hint ...) which is used to find the "right" charset based on

in that order and depending on whether some extensions are built-in or not.
The "problem" is that when you call htmlentities('xyz'), determine_charset() is called with charset_hint=NULL, and the first thing this function does is:

/* Guarantee default behaviour for backwards compatibility */
if (charset_hint == NULL)
    return cs_8859_1;

You have to call at least htmlentities('xyz', ENT_QUOTES, '') so that the hint is not NULL and the detection actually runs.
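
To make the difference concrete, here is a small sketch contrasting an explicit 'UTF-8' with an explicit 'ISO-8859-1', which is what the NULL-hint default above amounts to. The sample string is made up for illustration; what an empty-string hint detects depends on the build and configuration, so it is left out.

```php
<?php
// Illustrative input only: "é" encoded as UTF-8 (two bytes, 0xC3 0xA9).
$s = "caf\xC3\xA9 <b>";

// Explicit UTF-8: the two-byte sequence is treated as one character.
$explicit = htmlentities($s, ENT_QUOTES, 'UTF-8');

// Explicit ISO-8859-1: each byte is treated as a character of its own,
// so the single "é" turns into two unrelated entities.
$latin1 = htmlentities($s, ENT_QUOTES, 'ISO-8859-1');

echo $explicit, "\n";
echo $latin1, "\n";
```

Passing the charset explicitly is the only form whose result does not depend on the PHP version or its configuration.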

VolkerK
That's exactly what I want to get rid of.
Sabya
I don't see how, unless you change PHP's source code (most likely for the htmlentities function).
VolkerK
A: 

I'm not entirely sure, but I think mbstring.func_overload works with htmlentities.

htmlspecialchars is charset-neutral btw. (At least as long as the charset supports the ascii subset, which utf-8 does).
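
A quick way to check the htmlspecialchars claim, as a sketch with a made-up sample string:

```php
<?php
// Illustrative input only: UTF-8 text ("Grüße") plus the five characters
// that htmlspecialchars() escapes under ENT_QUOTES.
$s = "Gr\xC3\xBC\xC3\x9Fe & <a href=\"x\">'hi'</a>";

$asUtf8   = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
$asLatin1 = htmlspecialchars($s, ENT_QUOTES, 'ISO-8859-1');

// Only ASCII characters are replaced, so valid UTF-8 input comes out
// byte-for-byte identical under either charset.
var_dump($asUtf8 === $asLatin1);
```

One difference I'm aware of: with 'UTF-8', current PHP versions validate the input and return an empty string for an invalid byte sequence (unless ENT_IGNORE or ENT_SUBSTITUTE is given), while 'ISO-8859-1' passes any bytes through.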

troelskn
A: 

Hello, I'm having a similar problem. I am trying to retrieve data from Oracle into PHP, and I have successfully accomplished that, but the data appeared as question marks (??????). So I added AMERICAN_AMERICA.AR8MSWIN1256 to the NLS language setting in the registry for Oracle, and now the data appears as (ÓÇãÑ ÝíáíÈ ÚíÏ ÇáÚÏíáí). I tried all the suggested encodings and character sets, both from IE and from the PHP code, but had no results. I tried Windows-1256, UTF-8 and UTF-16.

By the way, anything else in the page that is not generated from the database and is written in Arabic appears correctly, so I have no problem with that, only with what is generated from the database.

windows-1256 generates (ÓÇãÑ ÝíáíÈ ÚíÏ ÇáÚÏíáí)
UTF-8 generates (����� � ����� )

I would appreciate some support for this issue. Thank you
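
For what it's worth, the characters ÓÇãÑ correspond to the bytes D3 C7 E3 D1 in ISO-8859-1, and those same bytes read as Windows-1256 spell an Arabic word. A sketch of converting them once at the boundary, assuming the database hands PHP Windows-1256 bytes (which the AR8MSWIN1256 setting suggests), the page is served as UTF-8, and the iconv extension knows the CP1256 name; the variable names are made up:

```php
<?php
// Assumption: these four bytes are what the Oracle client returns for
// the first word when NLS_LANG uses AR8MSWIN1256 (Windows-1256).
$fromDb = "\xD3\xC7\xE3\xD1";

// Convert once, right after reading from the database, to the encoding
// the page is served in (assumed here to be UTF-8).
$utf8 = iconv('CP1256', 'UTF-8', $fromDb);

// Escape for HTML with the same charset the page declares.
echo htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8'), "\n";
```

This only helps if the page itself declares charset=UTF-8 in its Content-Type; otherwise the browser will misread the converted bytes again.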

Samer