I have the following code, which replaces all of å, ø, æ, etc. with just _.

$string = strtolower($string);
$regexp = '/( |å|ø|æ|Å|Ø|Æ|Ã¥|Ã¸|Ã¦|Ã…|Ã˜|Ã†)/iU';
$replace_char = '_';
$data = preg_replace($regexp, $replace_char, $string);

Now I want to replace them according to the following rules:

space to _

å, Å, Ã¥ and Ã… to a,

ø, Ø, Ã¸ and Ã˜ to o,

æ, Æ, Ã¦ and Ã† to e.

Can I use str_replace with arrays to do this? If so, how?

Or do I have to repeat the same regex three times?

Could anyone tell me a better way to write the code?

--EDIT--

Please ignore the encoding for the moment. I am NOT asking for advice about encoding now.

I asked about the encoding problem here: http://stackoverflow.com/questions/1989806/norwegian-characters-problem

A: 

Why not turn around the problem and instead state which characters are going to stay the same (e.g. all letters a to z, digits 0 to 9 etc.)?

You could then match all other characters with a regex similar to [^a-z0-9].

Edit: Of course, this suggestion only makes sense if it's actually easier to list all characters that will stay the same than listing all characters that need to be replaced.
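
A minimal sketch of this whitelist idea, assuming the input is valid UTF-8 and the mbstring extension is loaded. The function name keep_only_allowed is made up for illustration. Note that it replaces every disallowed character with _, which is not the same as mapping å to a:

```php
<?php
// Hypothetical helper: replace every character outside a-z and 0-9 with "_".
// Assumes $string is valid UTF-8 and this file is saved as UTF-8.
function keep_only_allowed($string)
{
    $lower = mb_strtolower($string, 'UTF-8');
    // The u modifier makes the pattern match whole UTF-8 characters,
    // so a two-byte letter such as "å" becomes one underscore, not two.
    return preg_replace('/[^a-z0-9]/u', '_', $lower);
}

echo keep_only_allowed('Blåbær Øl 2010'), "\n"; // prints "bl_b_r__l_2010"
```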

stakx
This would only _add_ a piece of code, without adding value, while the question was about shortening code... (-1)
xtofl
xtofl, that would not *add* a piece of code. You should've downvoted Gumbo's answer for this reason. (not that I really think the said answer should be downvoted).
Michael Krelin - hacker
+1  A: 

As Gumbo said, you have some trouble with encoding, but leaving that fix to you, the general idea would be:

$data=preg_replace('/[ åøæÅØÆ]/iu','_',mb_strtolower($string,'utf-8'));

Note the mb_ variant of strtolower, in case you want to work with unicode.

Edit: And stakx's suggestion also makes sense, but it changes the logic.
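
For the per-letter replacements asked about in the question, preg_replace also accepts arrays of patterns and replacements, so the regex does not have to be repeated in three separate calls. A rough sketch, assuming UTF-8 input and a UTF-8 source file (the sample string is only an illustration):

```php
<?php
// One pattern per target character; each pattern is paired with the
// replacement at the same index. Assumes UTF-8 input and source file.
$patterns = array('/ /u', '/[åÅ]/u', '/[øØ]/u', '/[æÆ]/u');
$replacements = array('_', 'a', 'o', 'e');

$string = 'Ærlig Ørn på Åsen';
$data = preg_replace($patterns, $replacements, mb_strtolower($string, 'UTF-8'));

echo $data, "\n"; // prints "erlig_orn_pa_asen"
```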

Michael Krelin - hacker
+4  A: 

I would use strtr, which accepts a mapping array:

$mapping = array(
    'å' => 'a', 'Å' => 'a', 'Ã¥' => 'a', 'Ã…' => 'a',
    'ø' => 'o', 'Ø' => 'o', 'Ã¸' => 'o', 'Ã˜' => 'o',
    'æ' => 'e', 'Æ' => 'e', 'Ã¦' => 'e', 'Ã†' => 'e'
);
$str = strtr($str, $mapping);

But you should fix your encoding issue first, because then you could use transliteration with iconv:

$str = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str);
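
Returning to the strtr mapping from the start of this answer, here is a sketch that also covers the space and lowercases the rest. It assumes the encoding is already fixed, so the input and the source file are both UTF-8; the function name slugify_norwegian is hypothetical:

```php
<?php
// Hypothetical helper built on strtr; assumes UTF-8 input and source file.
function slugify_norwegian($str)
{
    $mapping = array(
        ' ' => '_',
        'å' => 'a', 'Å' => 'a',
        'ø' => 'o', 'Ø' => 'o',
        'æ' => 'e', 'Æ' => 'e',
    );
    // strtr replaces each key as a whole byte sequence in a single pass;
    // the remaining unmapped letters are lowercased afterwards.
    return mb_strtolower(strtr($str, $mapping), 'UTF-8');
}

echo slugify_norwegian('Rød Grøt med Fløte'), "\n"; // prints "rod_grot_med_flote"
```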
Gumbo
My development environment is XAMPP/Windows. It must be something to do with Windows. But I don't really know how to fix it.
shin
Not sure whether `iconv` is available on all installations, or whether its absence would be a concern for the OP, but I thought I'd mention it.
Michael Krelin - hacker
shin, maybe you should (1) look into your php.ini for encodings and (2) use a UTF-8-enabled editor.
Michael Krelin - hacker
@Michael Krelin: What do I need to check with phpinfo? I have Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7, and iconv.input_encoding ISO-8859-1 ISO-8859-1, iconv.internal_encoding ISO-8859-1 ISO-8859-1, iconv.output_encoding ISO-8859-1 ISO-8859-1, etc.
shin
@shin: You need to know how your data is actually encoded. The second one is just an example of how to transliterate from UTF-8 to ISO 8859-1.
Gumbo
A: 

An alternative solution utilizing mappings is to use str_replace. I used a minimal set of your mappings as an example. Each value of $search maps to the value at the corresponding index in $replace.

$search = array(' ', 'å', 'ø', 'æ', 'Å', 'Ø','Æ','Ã¥');
$replace = array('_', 'a', 'o', 'e', 'a', 'o', 'e', 'a');
$string = str_replace($search, $replace, mb_strtolower($string, 'utf-8'));
cballou