Assuming all you have is the binary data and no pre-canned functions, is there a pattern or algorithm to categorize the type of character?

+8  A: 

You ask an API to tell you. In Java, you use the Character class. In C++, you can use ICU. If your language doesn't have this, you download the properties database from unicode.org and incorporate it.

In other words, there is no pattern or algorithm. There are tables published by the Unicode consortium that contain the information.
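For the Java case, here is a minimal sketch of what "ask the API" can look like. The class name `CharCategory` and the coarse category names are made up for illustration; only `Character.getType` and the `Character` constants come from the JDK, and the result reflects whatever Unicode version your JDK ships with.

```java
class CharCategory {
    // Collapses the JDK's general-category constants into a few coarse,
    // illustrative buckets. The bucket names are hypothetical.
    static String categorize(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
                return "letter";
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.LETTER_NUMBER:
            case Character.OTHER_NUMBER:
                return "number";
            case Character.SPACE_SEPARATOR:
            case Character.LINE_SEPARATOR:
            case Character.PARAGRAPH_SEPARATOR:
                return "separator";
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return "punctuation";
            case Character.MATH_SYMBOL:
            case Character.CURRENCY_SYMBOL:
            case Character.MODIFIER_SYMBOL:
            case Character.OTHER_SYMBOL:
                return "symbol";
            default:
                // Control, format, surrogate, private use, unassigned, marks
                return "other";
        }
    }
}
```

Passing an `int` code point rather than a `char` lets this also handle characters outside the Basic Multilingual Plane.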

bmargulies
Well, not what I was hoping to hear, but still a very implementable solution. Thanks :)
Oorang
+1  A: 

No, there's no pattern. You will need to create some look-up-tables. (Well, I suppose you could do it with a maze of ifs but it wouldn't be nice.)

Luckily in most environments there is a pre-canned API function to do it for you, because building the character class data tables is super-boring.
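If you do end up rolling your own, a look-up table could be sketched roughly like this in Java: sorted, non-overlapping code point ranges plus a binary search. The class name `RangeTable` is hypothetical, and the four ranges are only a tiny hand-picked excerpt (ASCII digits, ASCII letters, Arabic-Indic digits). A real table would be generated from the Unicode data files, not typed in.

```java
class RangeTable {
    // Sorted, non-overlapping ranges. Illustrative excerpt only:
    // 0-9 (Nd), A-Z (Lu), a-z (Ll), Arabic-Indic digits (Nd).
    private static final int[] STARTS = {0x0030, 0x0041, 0x0061, 0x0660};
    private static final int[] ENDS   = {0x0039, 0x005A, 0x007A, 0x0669};
    private static final String[] CATEGORIES = {"Nd", "Lu", "Ll", "Nd"};

    // Binary search for the range containing codePoint.
    // Returns null when the code point is not covered by this excerpt.
    static String lookup(int codePoint) {
        int lo = 0;
        int hi = STARTS.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (codePoint < STARTS[mid]) {
                hi = mid - 1;
            } else if (codePoint > ENDS[mid]) {
                lo = mid + 1;
            } else {
                return CATEGORIES[mid];
            }
        }
        return null;
    }
}
```

Storing ranges instead of one entry per code point keeps the table small, since long runs of characters share a category.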

bobince
+1  A: 

Dear Oorang, I have recently published my FOSS Unicode Converter, which uses the latest Unicode Character Database (Annex #44, which covers Unicode 5.2).

In this (XML) database you can search for the character you want (by hex code) and see whether it is numeric, or whatever else you want to know about it.
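To give an idea of what such a lookup by hex code involves, here is a rough Java sketch. Note the assumption: it reads the semicolon-separated line format of the plain-text `UnicodeData.txt` file from the Unicode Character Database, not the XML layout this project uses. In that file the first field is the code point in hex and the third is the general category. The class and method names are hypothetical.

```java
class UcdLine {
    // Returns the general category (third field, e.g. "Nd" or "Lu") if the
    // given UnicodeData.txt line describes hexCode, otherwise null.
    static String categoryFor(String line, String hexCode) {
        String[] fields = line.split(";", -1);
        if (fields.length < 3) {
            return null; // not a data line
        }
        int lineCodePoint = Integer.parseInt(fields[0], 16);
        int wanted = Integer.parseInt(hexCode, 16);
        return lineCodePoint == wanted ? fields[2] : null;
    }

    // General categories starting with "N" (Nd, Nl, No) are the numeric ones.
    static boolean isNumeric(String category) {
        return category != null && category.startsWith("N");
    }
}
```

This sketch ignores the paired First/Last entries that the file uses for large blocks such as CJK ideographs, so those ranges would need extra handling.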

You can test this in my project, and if it is useful you can use its database.

http://unicode.codeplex.com is the main repository for the project. You can browse the code or get the executable there.

Nasser Hadjloo
I'll take a look at it, thanks!
Oorang