ansaurus

Question

Compare letters from different languages

Answer 1

+6 A:

It's not clear what you mean by "play the same role".

They are certainly not the same character, though they may appear to be when rendered.

This is exactly analogous to the confusion between "l" (lowercase L) and "I" (uppercase i) in many fonts.

If you want to consider A and А to be the same, you have to transliterate the Cyrillic character into its Latin counterpart. Unfortunately, PHP support for transliteration is sketchy. You can use iconv, which is not great: if you transliterate to ASCII, you'll lose everything that cannot be represented in ASCII.

The Unicode PHP implementation (what was supposed to be PHP 6) had a function called str_transliterate that used the ICU transliteration API. Hopefully, transliteration will be added to the intl extension (the current ICU wrapper) in the future.
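Later PHP releases (5.4 and up) did add ICU transliteration to the intl extension. The following is only a minimal sketch, assuming intl is installed; `toLatin()` is a hypothetical helper name and `Cyrillic-Latin` is the ICU transform ID assumed to fit this case.

```php
<?php
// Sketch only: needs the intl extension (PHP 5.4+).
// toLatin() is a hypothetical helper, not part of PHP.
function toLatin(string $text): ?string
{
    if (!function_exists('transliterator_transliterate')) {
        return null; // intl is not loaded, nothing we can do here
    }
    // 'Cyrillic-Latin' is an ICU transform ID; returns false on failure.
    $result = transliterator_transliterate('Cyrillic-Latin', $text);
    return $result === false ? null : $result;
}

$latinA    = 'A';
$cyrillicA = "\xD0\x90"; // U+0410 CYRILLIC CAPITAL LETTER A as UTF-8 bytes

var_dump($latinA === $cyrillicA);          // bool(false): different bytes
var_dump(toLatin($cyrillicA) === $latinA); // expected to be true when intl is available
```

Transliteration maps by sound and convention rather than by shape, so it will not always pair up letters that merely look alike (Cyrillic "Р" becomes Latin "R", not "P").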

Artefacto 2010-09-03 21:24:00

Answer 2

+1 A:

They're certainly not the same. PHP doesn't use eyes or OCR to determine what letter a character is.

$latinA = 'A';
$cyrillicA = 'А';

var_dump($latinA == $cyrillicA); // bool(false)

BoltClock 2010-09-03 21:25:56

You cannot use `ord` on the Cyrillic character. In UTF-8 it is composed of two bytes, so you're getting the leading byte only.

Artefacto 2010-09-03 21:27:41

@Artefacto: good catch, didn't know `ord()` isn't multibyte-compatible.

BoltClock 2010-09-03 21:31:24
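To make the `ord()` point from the comments concrete, here is a small sketch. `firstCodePoint()` is a hypothetical helper written for illustration, not a PHP built-in, and it assumes its input is non-empty, valid UTF-8.

```php
<?php
// Hypothetical helper: decode the first UTF-8 character of $s
// into its Unicode code point. Assumes non-empty, valid UTF-8.
function firstCodePoint(string $s): int
{
    $b0 = ord($s[0]);
    if ($b0 < 0x80) {
        return $b0;                       // single-byte (ASCII) character
    }
    if (($b0 & 0xE0) === 0xC0) {          // 110xxxxx: 2-byte sequence
        $len = 2; $cp = $b0 & 0x1F;
    } elseif (($b0 & 0xF0) === 0xE0) {    // 1110xxxx: 3-byte sequence
        $len = 3; $cp = $b0 & 0x0F;
    } else {                              // 11110xxx: 4-byte sequence
        $len = 4; $cp = $b0 & 0x07;
    }
    // Each continuation byte contributes its low 6 bits.
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($s[$i]) & 0x3F);
    }
    return $cp;
}

$cyrillicA = "\xD0\x90"; // U+0410 written as raw UTF-8 bytes

printf("%d\n", strlen($cyrillicA));             // 2
printf("0x%02X\n", ord($cyrillicA));            // 0xD0 (leading byte only)
printf("U+%04X\n", firstCodePoint($cyrillicA)); // U+0410
printf("U+%04X\n", firstCodePoint('A'));        // U+0041
```

On PHP 7.2+ with mbstring loaded, `mb_ord($cyrillicA, 'UTF-8')` should give the same code point without a hand-written decoder.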

Answer 3

+1 A:

You might be interested in the 'spoof detection' API in ICU. I think it is designed to report that your two As are 'visually confusable'.
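In PHP, that ICU API is exposed through the intl extension's `Spoofchecker` class (PHP 5.4+). A rough sketch, assuming intl is installed; `looksAlike()` is a hypothetical wrapper name, and the result depends on the confusables data shipped with the installed ICU version.

```php
<?php
// Sketch only: needs the intl extension.
// looksAlike() is a hypothetical wrapper, not part of PHP.
function looksAlike(string $a, string $b): ?bool
{
    if (!class_exists('Spoofchecker')) {
        return null; // intl is not loaded
    }
    $checker = new Spoofchecker();
    // Asks ICU whether the two strings are visually confusable.
    return $checker->areConfusable($a, $b);
}

$latinA    = 'A';
$cyrillicA = "\xD0\x90"; // U+0410 as UTF-8 bytes

var_dump($latinA === $cyrillicA);            // bool(false): not the same character
var_dump(looksAlike($latinA, $cyrillicA));   // should report the pair as confusable if ICU lists it
```

Unlike transliteration, this compares by appearance, which is closer to the "looks the same" question being asked.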

Steven R. Loomis 2010-09-08 17:32:24
