Hi, I know Unicode contains characters from most of the world's alphabets, but what about digits? Are they part of Unicode or not? I was not able to find a straight answer. Thanks.

+1  A: 

Yes they are - codepoints 0030 to 0039, as you can see e.g. on decodeunicode.org

By the way, code points 0000-007F are the same as ASCII (0-127; 128 and above is no longer ASCII), so anything that you can find in ASCII you can also find in Unicode.
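
A minimal sketch in Python to check this yourself. It assumes nothing beyond the standard library; the variable names are only illustrative.

```python
# Digits '0'..'9' occupy code points U+0030..U+0039, the same values as in ASCII.
digits = "0123456789"
code_points = [ord(ch) for ch in digits]
print([f"U+{cp:04X}" for cp in code_points])

# ASCII covers code points 0..127 (U+0000..U+007F). For that range,
# encoding as ASCII and encoding as UTF-8 produce identical bytes.
ascii_text = "".join(chr(cp) for cp in range(128))
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```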

Piskvor
And note that Unicode contains a lot more digits than just 0-9.
Hans Kesting
@Hans Kesting: Indeed, e.g. subscripts and superscripts: http://www.decodeunicode.org/en/superscripts_and_subscripts , ancient Greek numbers: http://www.decodeunicode.org/en/ancient_greek_numbers and others
Piskvor
+1  A: 

The Unicode points below 128 are exactly the same as ASCII so, yes, they're at U+0030 through U+0039 inclusive.

paxdiablo
+1  A: 

Yes, I think so (information taken from here):

U+0030  0   30  DIGIT ZERO
U+0031  1   31  DIGIT ONE
U+0032  2   32  DIGIT TWO
U+0033  3   33  DIGIT THREE
U+0034  4   34  DIGIT FOUR
U+0035  5   35  DIGIT FIVE
U+0036  6   36  DIGIT SIX
U+0037  7   37  DIGIT SEVEN
U+0038  8   38  DIGIT EIGHT
U+0039  9   39  DIGIT NINE
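
If you would rather not rely on a copied table, the same rows can be looked up with Python's standard `unicodedata` module. This is only a sketch; `digit_table` is a hypothetical helper name, not something from the page the table came from.

```python
import unicodedata

def digit_table(start=0x30, end=0x39):
    """Return (code point, character, official name) rows for a code point range."""
    rows = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        rows.append((f"U+{cp:04X}", ch, unicodedata.name(ch)))
    return rows

for row in digit_table():
    print(*row, sep="  ")
```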
James
+1  A: 

You can answer that question yourself: if they weren’t part of Unicode, this would rather drastically reduce the usefulness of Unicode, don’t you think?

Basically, any text that needs to use numbers couldn’t be represented using Unicode code points. (This is assuming that you don’t switch back and forth between different character encodings within one text: I don’t know of any software or programming language that supports this, and for good reason.)

If such questions crop up, you badly need to read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky. Seriously. Go read it.

Konrad Rudolph
Well, since many languages use Arabic numerals (e.g. Russian), I was not sure whether they were simply taken from ASCII.
Petr
@Petr: As I said, you cannot switch encoding in mid-text! “taken from ASCII” thus makes no sense. The whole text, every character, must be representable in Unicode.
Konrad Rudolph
You're actually very likely to be using software that allows you to "switch to and fro between different character encodings in one text" *right now* - ISO 2022 is basically a meta-encoding that allows you to switch between sub-encodings via escape sequences, and it's supported by all common web browsers.
Michael Borgwardt
@Konrad Rudolph: I'd say that "taken from ASCII" here would mean "incorporated into Unicode during its design phase" rather than "switching encoding in mid-text"
Piskvor
@Michael: Interesting, never heard of that.
Konrad Rudolph
@Piskvor: But that interpretation makes even less sense.
Konrad Rudolph
@Konrad Rudolph: What do you mean? Unicode code points 0000-007F correspond to ASCII 0-127; it can be said that they were "taken from ASCII".
Piskvor
@Piskvor: Of course. But what then is Petr’s comment supposed to mean, in particular as a response to my text? I don’t see any logical connection. Hence my different reading of his comment.
Konrad Rudolph
+5  A: 

As already stated, Indo-Arabic numerals (0, 1, ..., 9) are included in Unicode, inherited from ASCII. If you're asking about the digits used by other languages and scripts, the answer is still yes: they are also part of Unicode.

//numbers (0-9) in Malayalam (language spoken in Kerala, India)
൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯  
//numbers (0-9) in Hindi, written in Devanagari (Hindi is an official language of India)
० १ २ ३ ४ ५ ६ ७ ८ ९ 

In regular expression engines that support Unicode properties, you can use \p{N} or \p{Number} to match any kind of numeric character in any script.
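
Not every engine supports `\p{N}`. Python's standard `re` module, for instance, does not accept `\p{...}`, although its `\d` matches decimal digits from any script in `str` patterns. Here is a rough sketch of an equivalent check using only the standard library; `find_number_chars` is a hypothetical helper, and the sample string is made up for illustration.

```python
import re
import unicodedata

def find_number_chars(text):
    """Return every character whose general category starts with 'N' (like \\p{N})."""
    return [ch for ch in text if unicodedata.category(ch).startswith("N")]

# '1', Malayalam four, Devanagari four, Roman numeral eight, one half
sample = "a1 \u0d6a \u096a \u2167 \u00bd b"

all_numbers = find_number_chars(sample)      # decimal digits, letter numbers, other numbers
decimal_only = re.findall(r"\d", sample)     # decimal digits only

# int() also accepts decimal digits from other scripts.
malayalam_digits = "".join(chr(0x0D66 + i) for i in range(10))
value = int(malayalam_digits)

print(all_numbers, decimal_only, value)
```

Note that `\d` is narrower than `\p{N}`: it skips characters such as Roman numerals and fractions.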

This document (Page-3) describes the Unicode code points for Malayalam digits.

Amarghosh
+1  A: 

In short: yes, of course. There are three general categories in Unicode containing various representations of digits and numbers:

  • Number, Decimal Digit (characters) – e.g. Arabic, Thai, Devanagari digits;
  • Number, Letter (characters) – e.g. Roman numerals;
  • Number, Other (characters) – e.g. fractions.
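
These three categories have the short codes Nd, Nl and No. As a small sketch (the `describe` helper and the sample characters are my own choice, not part of the answer above), Python's `unicodedata` module reports both the category and the numeric value of a character:

```python
import unicodedata

def describe(ch):
    """Return (general category, decimal value or None, numeric value or None)."""
    return (
        unicodedata.category(ch),
        unicodedata.decimal(ch, None),   # defined only for Number, Decimal Digit
        unicodedata.numeric(ch, None),   # defined for all three number categories
    )

samples = {
    "ascii seven": "7",
    "thai seven": "\u0e57",
    "devanagari seven": "\u096d",
    "roman numeral twelve": "\u216b",
    "three quarters": "\u00be",
}

for label, ch in samples.items():
    print(label, ch, describe(ch))
```

Only the decimal digits have a `decimal` value; letter numbers and fractions still carry a `numeric` value.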
Bolo