ansaurus

Question

PHP remove accents

Answer 1

+4 A:

If you have iconv installed, try this (the example assumes your input string is in UTF-8):

echo iconv('UTF-8', 'ASCII//TRANSLIT', $string);

(iconv is a library to convert between all kinds of encodings; it's efficient and included with many PHP distributions by default. Most of all, it's definitely easier and more error-proof than trying to roll your own solution (did you know that there's a "Latin letter N with a curl"? Me neither.))
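As a rough sketch of how the call above might be wrapped: `iconv()` returns `false` when it cannot convert the input, and echoing `false` prints an empty string. The helper name `toAsciiOrNull` is made up for illustration. The transliterated text differs between iconv implementations, so no particular output is shown.

```php
<?php
// Hypothetical helper; the name is illustrative and not part of the answer.
function toAsciiOrNull(string $utf8): ?string
{
    // iconv() returns false (and raises a notice or warning) when the
    // conversion fails, e.g. on invalid input bytes or when the platform's
    // iconv does not support //TRANSLIT. The @ only silences that message.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);

    return $ascii === false ? null : $ascii;
}

$result = toAsciiOrNull('ÈâuÑ');
echo $result ?? '(conversion failed)', PHP_EOL;
```

Returning `null` instead of `false` is only a design choice here; it lets the caller use `??` to supply a fallback.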

Piskvor 2010-08-22 18:27:38

+1 Beat me to it. This should work best. However, note that this tends to fail if there are invalid characters in the input (using `ASCII//TRANSLIT//IGNORE` should help) and as so often, if encountering problems, the User Contributed Notes are a good read. http://www.php.net/manual/en/function.iconv.php

Pekka 2010-08-22 18:28:21

For some reason, sometimes I can't get this to work. See http://codepad.viper-7.com/SUufA4 But on another machine, I got "`E^au~N". Not what was desired, though.

Artefacto 2010-08-22 18:38:36

Nice, simple and small and works...for me

Mark 2010-08-22 18:38:55

This iconv has some conflicts, so I will ask a similar question

Mark 2010-08-22 18:40:35

Answer 2

+3 A:

You can use iconv to transliterate the characters to plain US-ASCII and then use a regular expression to remove non-alphabetic characters:

preg_replace('/[^a-z]/i', '', iconv("UTF-8", "US-ASCII//TRANSLIT", $text))
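The two steps of that one-liner can be spelled out as below. This is only a sketch: `asciiLettersOnly` is a hypothetical name, and what the transliteration step produces depends on the iconv implementation, so the final result may vary between machines.

```php
<?php
// Hypothetical helper; the name is illustrative and not part of the answer.
function asciiLettersOnly(string $text): ?string
{
    // Step 1: transliterate to US-ASCII. iconv() returns false on failure.
    $ascii = @iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
    if ($ascii === false) {
        return null;
    }

    // Step 2: drop everything that is not an ASCII letter. Note that this
    // also removes digits, spaces and punctuation, not just leftover marks.
    return preg_replace('/[^a-z]/i', '', $ascii);
}

var_dump(asciiLettersOnly('ÈâuÑ'));
```

If spaces or digits need to survive, the character class in step 2 would have to be widened accordingly, for example to `[^a-z0-9 ]`.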

Another way would be using the Normalizer to normalize to the Normalization Form KD (NFKD) and then remove the mark characters:

preg_replace('/\p{Mn}/u', '', Normalizer::normalize($text, Normalizer::FORM_KD))
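A minimal sketch of the Normalizer variant as a function, assuming the intl extension is loaded (it provides the `Normalizer` class). The name `stripMarks` is hypothetical, and falling back to the unchanged input on failure is a choice made for this example.

```php
<?php
// Hypothetical helper; requires the intl extension for the Normalizer class.
function stripMarks(string $text): string
{
    // NFKD splits a precomposed letter into base letter + combining mark(s).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8): leave input as-is
    }

    // \p{Mn} matches nonspacing marks such as the combining grave or tilde.
    $stripped = preg_replace('/\p{Mn}/u', '', $decomposed);

    return $stripped ?? $text; // preg_replace() returns null on error
}

echo stripMarks('ÈâuÑ'), PHP_EOL; // EauN
```

Letters that have no decomposition, such as `ø` or `ß`, pass through unchanged. NFKD also rewrites compatibility characters (the ligature `ﬁ` becomes `fi`, for instance), so it changes more than accents.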

Gumbo 2010-08-22 18:28:53

`ISO-8859-1`? Are you sure? Won't this leave at least ÄÖÜ in place (as their 8859-1 counterparts)?

Pekka 2010-08-22 18:32:18

What’s the reason for the down vote?

Gumbo 2010-08-22 18:32:44

Downvote isn't mine. However, the OP is not asking to remove non-alphabetic characters, is he?

Pekka 2010-08-22 18:34:04

It was mine. Reverted now that you fixed it.

Artefacto 2010-08-22 18:35:50

@Pekka: The transliteration of `ÈâuÑ` using `iconv` gives "`E^au~N". That’s why the following cleanup is used.

Gumbo 2010-08-22 18:39:02

@Gumbo I see. I'm sorry, we have had this discussion in a duplicate somewhere already :) +1 for the most complete solution, then, that should be made the accepted one. *Update:* If I had any votes left

Pekka 2010-08-22 18:40:40

Can you explain why NFKD?

Artefacto 2010-08-22 18:42:13

By the way, what you say and your code don't match once again. FORM_D makes more sense.

Artefacto 2010-08-22 18:47:14

@Artefacto: Thanks for the remark; fixed it. And take a look at figure 6 in http://unicode.org/reports/tr15/#Norm_Forms.

Gumbo 2010-08-22 18:52:11

@Gumbo OK, I guess it's a matter of preference, though strictly speaking that normalization won't take care only of the marks. See also the other question of the OP. I took some, erm, inspiration from you (basically only replaced the [a-z] regex you then had with \p{M} and left Normalizer::FORM_D).

Artefacto 2010-08-22 19:09:44
