Hi, I know unicode contains all characters from most world aphabets..but what about digits? Are they part of unicode or not? I was not able to find straight answer. Thanks
Yes they are - codepoints 0030 to 0039, as you can see e.g. on decodeunicode.org
btw, codepoints 0000-007E are the same as ASCII (0-127, 128+ isn't ASCII anymore), so anything that you can find in ASCII you can find in Unicode.
The Unicode points below 128 are exactly the same as ASCII so, yes, they're at U+0030 through U+0039 inclusive.
Yes I think so: Information Taken From Here
U+0030 0 30 DIGIT ZERO
U+0031 1 31 DIGIT ONE
U+0032 2 32 DIGIT TWO
U+0033 3 33 DIGIT THREE
U+0034 4 34 DIGIT FOUR
U+0035 5 35 DIGIT FIVE
U+0036 6 36 DIGIT SIX
U+0037 7 37 DIGIT SEVEN
U+0038 8 38 DIGIT EIGHT
U+0039 9 39 DIGIT NINE
You can answer that question yourself: if they weren’t part of Unicode, this would rather drastically reduce the usefulness of Unicode, don’t you think?
Basically, any text that needs to use numbers couldn’t be represented using Unicode code points. (This is assuming that you don’t switch to and fro between different character encodings in one text: I don’t know a single software / programming language that supports this, and for good reason.)
If such questions crop up, you badly need to read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky. Seriously. Go read it.
As already stated, Indo-Arabic numerals (0,1,..,9) are included in Unicode, inherited from ASCII. If you're talking about representation of numbers in other languages, the answer is still yes, they are also part of Unicode.
//numbers (0-9) in Malayalam (language spoken in Kerala, India)
൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯
//numbers (0-9) in Hindi (India's national language)
० १ २ ३ ४ ५ ६ ७ ८ ९
You can use \p{N}
or \p{Number}
in a regular expression to match any kind of numeric character in any script.
This document (Page-3) describes the Unicode code points for Malayalam digits.
In short: yes, of course. There are three categories in UNICODE containing various representations of digits and numbers:
- Number, Decimal Digit (characters) – e.g. Arabic, Thai, Devanagari digits;
- Number, Letter (characters) – e.g. Roman numerals;
- Number, Other (characters) – e.g. fractions.