ansaurus

Question

Answer 1

+1 A:

Yes they are - codepoints 0030 to 0039, as you can see e.g. on decodeunicode.org

btw, codepoints 0000-007F are the same as ASCII (0-127; 128 and above isn't ASCII anymore), so anything that you can find in ASCII you can also find in Unicode.
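A quick way to check this yourself, sketched in Python (the language choice is an assumption; the question does not name one):

```python
import unicodedata

# The ASCII digits '0'..'9' sit at code points U+0030..U+0039.
digits = "0123456789"
code_points = [ord(ch) for ch in digits]

# Every ASCII byte value (0-127) decodes to the Unicode code point
# with the same number.
decoded = bytes(range(128)).decode("ascii")
same_numbers = [ord(ch) for ch in decoded] == list(range(128))

for ch in digits:
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")
print("ASCII matches Unicode 0-127:", same_numbers)
```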

Piskvor 2010-09-17 08:22:52

And note that Unicode contains a lot more digits than just 0-9.

Hans Kesting 2010-09-17 08:26:36

@Hans Kesting: Indeed, e.g. subscripts and superscripts: http://www.decodeunicode.org/en/superscripts_and_subscripts , ancient Greek numbers: http://www.decodeunicode.org/en/ancient_greek_numbers and others
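For illustration, a small Python sketch (the sample characters are my own picks, not taken from the linked charts) showing that such characters carry a digit value even though they are not the plain 0-9:

```python
import unicodedata

# 7, superscript seven, subscript seven, Arabic-Indic digit seven
samples = ["7", "\u2077", "\u2087", "\u0667"]

for ch in samples:
    name = unicodedata.name(ch)
    digit = unicodedata.digit(ch)            # digit value of the character
    decimal = unicodedata.decimal(ch, None)  # None if not a decimal digit
    print(f"U+{ord(ch):04X} {name}: digit={digit}, decimal={decimal}")

# Superscripts count as digits but not as decimal digits.
superscript_is_digit = "\u2077".isdigit()
superscript_is_decimal = "\u2077".isdecimal()
```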

Piskvor 2010-09-17 08:30:39

Answer 2

+1 A:

The Unicode points below 128 are exactly the same as ASCII so, yes, they're at U+0030 through U+0039 inclusive.

paxdiablo 2010-09-17 08:23:43

Answer 3

+1 A:

Yes I think so: Information Taken From Here

Code    Chr Hex Name
U+0030  0   30  DIGIT ZERO
U+0031  1   31  DIGIT ONE
U+0032  2   32  DIGIT TWO
U+0033  3   33  DIGIT THREE
U+0034  4   34  DIGIT FOUR
U+0035  5   35  DIGIT FIVE
U+0036  6   36  DIGIT SIX
U+0037  7   37  DIGIT SEVEN
U+0038  8   38  DIGIT EIGHT
U+0039  9   39  DIGIT NINE

James 2010-09-17 08:24:27

Answer 4

+1 A:

You can answer that question yourself: if they weren’t part of Unicode, this would rather drastically reduce the usefulness of Unicode, don’t you think?

Basically, any text that needs to use numbers couldn’t be represented using Unicode code points. (This is assuming that you don’t switch to and fro between different character encodings in one text: I don’t know a single software / programming language that supports this, and for good reason.)

If such questions crop up, you badly need to read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky. Seriously. Go read it.

Konrad Rudolph 2010-09-17 08:26:01

Well, as many languages use Arabic numerals (e.g. Russian), I was not sure if they are not taken from ASCII.

Petr 2010-09-17 08:27:51

@Petr: As I said, you cannot switch encoding in mid-text! “taken from ASCII” thus makes no sense. The whole text, every character, must be representable in Unicode.

Konrad Rudolph 2010-09-17 08:57:31

You're actually very likely to be using software that allows you to "switch to and fro between different character encodings in one text" *right now* - ISO 2022 is basically a meta-encoding that allows you to switch between sub-encodings via escape sequences, and it's supported by all common web browsers.
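As a rough illustration, Python's standard `iso2022_jp` codec (one member of the ISO 2022 family, chosen here only as an example) shows the escape sequences that switch between sub-encodings inside a single byte stream:

```python
text = "abc 日本"
encoded = text.encode("iso2022_jp")
print(encoded)

# ESC $ B switches to JIS X 0208, ESC ( B switches back to ASCII.
switches_to_jis = b"\x1b$B" in encoded
switches_to_ascii = b"\x1b(B" in encoded

round_trip = encoded.decode("iso2022_jp")
```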

Michael Borgwardt 2010-09-17 09:19:47

@Konrad Rudolph: I'd say that "taken from ASCII" here would mean "incorporated into Unicode during its design phase" rather than "switching encoding in mid-text"

Piskvor 2010-09-17 09:35:40

@Michael: Interesting, never heard of that.

Konrad Rudolph 2010-09-17 10:01:41

@Piskvor: But that interpretation makes even less sense.

Konrad Rudolph 2010-09-17 10:02:45

@Konrad Rudolph: What do you mean? Unicode codepoints 0000-007F correspond to ASCII 0-127; it can be said that they were "taken from ASCII".

Piskvor 2010-09-17 11:02:45

@Piskvor: Of course. But what then is Petr’s comment supposed to mean, in particular as a response to my text? I don’t see any logical connection. Hence my different reading of his comment.

Konrad Rudolph 2010-09-17 11:39:40

Answer 5

+5 A:

As already stated, Indo-Arabic numerals (0,1,..,9) are included in Unicode, inherited from ASCII. If you're talking about representation of numbers in other languages, the answer is still yes, they are also part of Unicode.

//numbers (0-9) in Malayalam (language spoken in Kerala, India)
൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯  
//numbers (0-9) in Devanagari, the script used for Hindi (an official language of India)
० १ २ ३ ४ ५ ६ ७ ८ ९
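A minimal Python sketch (my own illustration, not part of the original answer) confirming that these characters are real decimal digits with the values 0-9:

```python
import unicodedata

# Build each run of ten digits from its first code point.
malayalam = "".join(chr(0x0D66 + i) for i in range(10))   # Malayalam 0-9
devanagari = "".join(chr(0x0966 + i) for i in range(10))  # Devanagari 0-9

malayalam_values = [unicodedata.decimal(ch) for ch in malayalam]
devanagari_values = [unicodedata.decimal(ch) for ch in devanagari]
print(malayalam, malayalam_values)
print(devanagari, devanagari_values)

# int() accepts decimal digits from any script.
parsed = int(devanagari[1] + devanagari[2] + devanagari[3])
print(parsed)
```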

In regex engines that support Unicode property classes, you can use \p{N} or \p{Number} to match any kind of numeric character in any script.
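Support for \p{N} depends on the engine. Python's built-in `re` module, for example, does not accept `\p{...}` classes, so the sketch below approximates the same test with `unicodedata.category` (the helper name `numeric_chars` is hypothetical):

```python
import unicodedata

def numeric_chars(text):
    """Return the characters whose general category is Nd, Nl or No."""
    return [ch for ch in text if unicodedata.category(ch).startswith("N")]

# ASCII one, Devanagari two, Roman numeral eight, vulgar fraction one half
sample = "a1 \u0968 \u2167 \u00bd x"
found = numeric_chars(sample)
print(found)
```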

This document (Page-3) describes the Unicode code points for Malayalam digits.

Amarghosh 2010-09-17 08:35:11

Answer 6

+1 A:

In short: yes, of course. Unicode has three general categories containing various representations of digits and numbers:

Number, Decimal Digit (Nd) – e.g. Arabic, Thai, Devanagari digits;
Number, Letter (Nl) – e.g. Roman numerals;
Number, Other (No) – e.g. fractions.
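To see which category a given character falls into, here is a short Python sketch using the standard `unicodedata` module (the sample characters are my own choice):

```python
import unicodedata

examples = [
    "5",       # ASCII digit five
    "\u0e55",  # Thai digit five
    "\u096b",  # Devanagari digit five
    "\u2167",  # Roman numeral eight
    "\u00bd",  # vulgar fraction one half
]

categories = {ch: unicodedata.category(ch) for ch in examples}
values = {ch: unicodedata.numeric(ch) for ch in examples}

for ch in examples:
    print(f"U+{ord(ch):04X} {categories[ch]} {values[ch]} {unicodedata.name(ch)}")
```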

Bolo 2010-09-17 09:13:58


Is digit (number) part of unicode?
