ansaurus

Question

Check unicode in PHP

Answer 1

+1 A:

You'd usually do something like:

if (mb_strlen($ch) != strlen($ch)) ...

I should add: strlen counts bytes, while mb_strlen counts characters and handles multi-byte characters properly. I'm guessing multi-byte characters are what you're really asking about rather than Unicode as such, since Unicode also covers the 128 ASCII characters, which are single bytes in UTF-8.

searlea 2009-08-29 07:37:16

Hi searlea, thanks for your fast response! This is exactly what I was looking for.

Orion 2009-08-29 07:43:36

I've checked this and I'm getting the result below:

<?php
$ch = 'രവീഷ്'; // my name in Malayalam
echo mb_strlen($ch)."<br/>";
echo strlen($ch)."<br/>";
if (mb_strlen($ch) != strlen($ch)) echo "Unicode";
else echo "Non-Unicode";
?>

It's giving the result:

15
15
Non-unicode

What could be the problem?

Orion 2009-08-29 07:50:22

Answer 2

+2 A:

you can try with

mb_check_encoding($s,"UTF-8")
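A minimal sketch of how this call behaves, assuming the mbstring extension is loaded. Note that it tests whether the bytes form valid UTF-8, so a plain ASCII string passes too. The wrapper name is_valid_utf8 is hypothetical and only there for readability.

```php
<?php
// Hypothetical wrapper around mb_check_encoding() for readability.
function is_valid_utf8($s)
{
    return mb_check_encoding($s, 'UTF-8');
}

var_dump(is_valid_utf8("somestring")); // plain ASCII is also valid UTF-8
var_dump(is_valid_utf8("\xC3\xA9"));   // "é" encoded as UTF-8
var_dump(is_valid_utf8("\xE9"));       // "é" in ISO-8859-1: a lone high byte, not valid UTF-8
```

If the goal is to tell ASCII-only strings apart from strings with non-ASCII characters, this check alone is not enough and would need to be combined with a byte or length test.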


Svetlozar Angelov 2009-08-29 07:42:20

This code seems to be working! It would also be helpful if you could point me to a function that gets the code point of a Unicode character.

Orion 2009-08-29 08:03:38

Let $s be "somestring": if (mb_check_encoding($s, "UTF-8")) is true, the string is Unicode (valid UTF-8); otherwise it is not.

Svetlozar Angelov 2009-08-29 08:35:03

Answer 3

A:

(PHP 6 >= 6.0.0; note that PHP 6 was never released, so this is not available in PHP 5.x) is_unicode()

<?php

/*** create a unicode string ***/
$unicode = "ûПÌčöđę";

/*** check if value is unicode ***/
if(is_unicode($unicode))
    {
    echo 'Unicode string';
    }

?>

adatapost 2009-08-29 07:45:01

I am using PHP Version 5.2.9. Is there any function similar to this in 5.2.6?

Orion 2009-08-29 07:59:39

-1 And in PHP >= 9.0.0 the AI system will do this for you.

Alix Axel 2009-08-29 08:13:56

Answer 4

A:

Hi Searlea,

I've checked this with the following code:

<?php
    $ch = 'രവീഷ്'; // my name in Malayalam
    echo mb_strlen($ch)."<br/>";
    echo strlen($ch)."<br/>";
    if (mb_strlen($ch) != strlen($ch))
        echo "Unicode";
    else
        echo "Non-Unicode";
?>

and I am getting this result:

15

15

Non-unicode

That means strlen and mb_strlen return the same result. What could be the error?
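One possible explanation, offered as an assumption rather than a diagnosis: when mb_strlen() is called without an encoding argument it uses the mbstring internal encoding, and if that is a single-byte encoding rather than UTF-8, every byte is counted as one character. A minimal sketch that passes the encoding explicitly (the helper name has_multibyte_chars is hypothetical):

```php
<?php
// Hypothetical helper: true when a UTF-8 string has more bytes than characters.
function has_multibyte_chars($s)
{
    // strlen() counts bytes; mb_strlen() with an explicit encoding counts characters.
    return mb_strlen($s, 'UTF-8') !== strlen($s);
}

$ch = "\xE0\xB4\xB0\xE0\xB4\xB5"; // the two letters "രവ" written as raw UTF-8 bytes
echo strlen($ch), "\n";             // byte count
echo mb_strlen($ch, 'UTF-8'), "\n"; // character count
echo has_multibyte_chars($ch) ? "Unicode" : "Non-Unicode", "\n";
```

Writing the sample as escaped bytes keeps the sketch independent of how the source file itself is saved; a literal string behaves the same way only if the file is stored as UTF-8.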

Orion 2009-08-29 07:55:31

This is not the proper place to add follow-ups.

Alix Axel 2009-08-29 08:12:43

Sorry. I am a total newbie here and this is my first question.

Orion 2009-08-29 08:13:53

Answer 5

A:

In UTF-8, every byte of a non-ASCII character ALWAYS has its most significant bit set, no matter what the value of the character is or where the byte sits within a multi-byte sequence. You can't just check whether the string has more bytes than characters, since in single-byte encodings such as ISO-8859-1 a non-ASCII character is only one byte. If any byte in the string has a value greater than 127, that string contains non-ASCII characters.
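The byte test described above can be sketched as follows. This is an illustration under the assumption that "contains Unicode" means "contains at least one byte above 127"; the helper name has_non_ascii_byte is hypothetical.

```php
<?php
// Hypothetical helper: true if any byte in $s lies outside the 7-bit ASCII range.
function has_non_ascii_byte($s)
{
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/[\x80-\xFF]/', $s) === 1;
}

var_dump(has_non_ascii_byte("plain ascii"));  // only bytes 0-127
var_dump(has_non_ascii_byte("caf\xC3\xA9"));  // "café" in UTF-8
var_dump(has_non_ascii_byte("caf\xE9"));      // "café" in ISO-8859-1
```

Because it looks only at raw bytes, this test does not say which encoding the string is in, just that it is not pure ASCII.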

Jeff Tucker 2009-08-29 08:04:38

how can i get the code point of a unicode char ?

Orion 2009-08-29 08:05:54

This should help: http://www.joelonsoftware.com/articles/Unicode.html

Jeff Tucker 2009-08-29 10:16:07

Answer 6

+3 A:

Actually, you don't even need the mbstring extension:

if (strlen($string) != strlen(utf8_decode($string)))
{
    echo 'is unicode';
}

And to find the code point of a given character (this reads only the first character of $string):

$ord = unpack('N', mb_convert_encoding($string, 'UCS-4BE', 'UTF-8'));

echo $ord[1];
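The same conversion can be extended to every character in a string. A rough sketch, assuming the input is valid UTF-8 and mbstring is available; the helper name utf8_code_points is hypothetical.

```php
<?php
// Hypothetical helper: returns the code points of a UTF-8 string as a list of integers.
function utf8_code_points($s)
{
    if ($s === '') {
        return array();
    }
    // UCS-4BE stores each character as one 32-bit big-endian integer.
    $ucs4 = mb_convert_encoding($s, 'UCS-4BE', 'UTF-8');
    // 'N*' unpacks every 32-bit big-endian unsigned integer in the buffer;
    // unpack() keys start at 1, so re-index from 0.
    return array_values(unpack('N*', $ucs4));
}

// "A", "é" and the Malayalam letter "ര", written as raw UTF-8 bytes.
foreach (utf8_code_points("A\xC3\xA9\xE0\xB4\xB0") as $cp) {
    echo 'U+', strtoupper(str_pad(dechex($cp), 4, '0', STR_PAD_LEFT)), "\n";
}
```

On PHP 7.2 and later, mb_ord() returns the code point of a single character directly, which may be simpler when only one character is needed.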

Alix Axel 2009-08-29 08:10:36

Thanks eyeze! This code works. Thanks a lot.

Orion 2009-08-29 09:26:33

@Raveesh: If my answer solved your problem you can mark it as accepted. =)

Alix Axel 2010-01-06 06:18:17

Answer 7

A:

Thanks for all these valuable comments! Thanks for having the patience to explain the basics to a newbie like me.

Can anyone suggest a way to get the code point of a Unicode character?

Orion 2009-08-29 08:13:12

Answer 8

A:

Thanks guys. I finally got the answer I was looking for.

Got an include file from http://hsivonen.iki.fi/php-utf8/.

The following code solved my problem:

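<?php
  require_once("utf8.inc");
  /*** create a UTF-8 encoded string ***/
  $s = "حملة إلا صلاتي";
  /*** utf8ToUnicode() returns an array of code points ***/
  $out = utf8ToUnicode($s);
  /*** loop over the code points, not the bytes: strlen($s) would run past the end of $out ***/
  foreach ($out as $codePoint)
    echo dechex($codePoint).".";
?>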

Orion 2009-08-29 09:22:38

Answer 9

A:

Strings in PHP are bytestreams, not character streams. You can't actually have Unicode strings in PHP; you need to encode your characters with some encoding. If you want to cover the entire Unicode range, UTF-8 is the most obvious choice.
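To illustrate the point, here is a small sketch showing the same character stored as different bytes under different encodings. It assumes mbstring is available; the helper name encoded_hex is hypothetical.

```php
<?php
// Hypothetical helper: hex dump of a UTF-8 string after converting it to another encoding.
function encoded_hex($utf8, $to)
{
    return bin2hex(mb_convert_encoding($utf8, $to, 'UTF-8'));
}

$e = "\xC3\xA9"; // the character "é" as UTF-8 bytes
echo bin2hex($e), "\n";                   // the UTF-8 bytes as stored in the string
echo encoded_hex($e, 'UTF-16BE'), "\n";   // same character, two bytes in UTF-16
echo encoded_hex($e, 'ISO-8859-1'), "\n"; // same character, one byte in Latin-1
```

The string itself carries no record of which encoding was used, which is why functions such as mb_strlen() need to be told.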

If you want to get the codepoint of a utf-8 encoded bytestream, you can use this library: http://hsivonen.iki.fi/php-utf8/

However, I wonder what exactly you need this for? Most likely, you can solve all your woes by simply using utf-8.

troelskn 2009-08-29 19:54:00

Hi guys. My simple requirement was to find the code points of a byte stream, and I got that through the library from http://hsivonen.iki.fi/php-utf8/. Thanks a lot!

Orion 2009-08-30 10:12:05

I still don't really see what problem you were solving in the first place. I'm quite sure you are digging a hole for yourself.

troelskn 2009-08-30 13:52:24
