Unicode versions in .NET


The documentation of CharUnicodeInfo.GetUnicodeCategory says:

Note that CharUnicodeInfo.GetUnicodeCategory does not always return the same UnicodeCategory value as the Char.GetUnicodeCategory method when passed a particular character as a parameter.

The CharUnicodeInfo.GetUnicodeCategory method is designed to reflect the current version of the Unicode standard. In contrast, although the Char.GetUnicodeCategory method usually reflects the current version of the Unicode standard, it might return a character's category based on a previous version of the standard, or it might return a category that differs from the current standard to preserve backward compatibility.

So, which version of the Unicode standard is reflected by CharUnicodeInfo.GetUnicodeCategory and Char.GetUnicodeCategory in which version of the .NET Framework?

This page has a wiki comment by Shawn Steele from Microsoft, which I think should explain why using CharUnicodeInfo is preferred.

shahkalpesh 2009-07-16 02:45:22

That's not the question.

dtb 2009-07-16 02:50:21

But that's what it leads to. How does the Unicode version matter if the method cannot work correctly? Did you try it with an example (non-English) character to find the difference instead?

shahkalpesh 2009-07-16 03:03:01

For example, the character `'\u0C58'` (http://www.fileformat.info/info/unicode/char/0c58/index.htm) was added in Unicode version 5.1.0 with category *Letter, Other*. "The `CharUnicodeInfo.GetUnicodeCategory` method is designed to reflect the current version of the Unicode standard." But `CharUnicodeInfo.GetUnicodeCategory` returns `UnicodeCategory.OtherNotAssigned`. So it **does not** reflect the current Unicode version 5.1.0. Which version **does** it reflect?

dtb 2009-07-16 16:05:04
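
The check described in the comment above can be reproduced with a small console program. This is only a sketch: the class name is made up for illustration, and the categories printed depend entirely on which runtime executes it, so no expected output is shown.

```csharp
using System;
using System.Globalization;

// Hypothetical class name, used only for this illustration.
class UnicodeCategoryCheck
{
    static void Main()
    {
        // U+0C58 was added in Unicode 5.1.0 with category "Letter, Other".
        char c = '\u0C58';

        // Ask both APIs for the category of the same character.
        UnicodeCategory viaCharUnicodeInfo = CharUnicodeInfo.GetUnicodeCategory(c);
        UnicodeCategory viaChar = char.GetUnicodeCategory(c);

        Console.WriteLine("Runtime version: {0}", Environment.Version);
        Console.WriteLine("CharUnicodeInfo: {0}", viaCharUnicodeInfo);
        Console.WriteLine("Char:            {0}", viaChar);

        // OtherNotAssigned suggests the runtime's character data
        // predates the Unicode version that introduced the character.
        if (viaCharUnicodeInfo == UnicodeCategory.OtherNotAssigned)
        {
            Console.WriteLine("U+0C58 is unassigned in this runtime's character data.");
        }
    }
}
```

Running the same program against different framework versions, with characters introduced in different Unicode versions, is one way to bracket the Unicode version a given runtime reflects.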

What do you expect it to return (if it returns OtherNotAssigned)?

shahkalpesh 2009-07-16 16:38:16

`OtherLetter` obviously.

dtb 2009-07-16 17:19:50

I found this page, which says .NET Framework 1.1/2.0 supports Unicode version 3.1 (link: http://www.eggheadcafe.com/forumarchives/NETcsharp/Aug2005/post23858463.asp). Pardon my ignorance on this topic.

shahkalpesh 2009-07-17 06:09:19

The link in the answer of `devio` says Whidbey (i.e. Visual Studio 2005) is *4.1*. VS2005 was released alongside .NET Framework 2.0. So, one of the two (3.1 or 4.1) must be wrong...

dtb 2009-07-17 11:22:56

+1 A:

As far as I can tell, the Unicode version isn't stored. The character lookup is implemented by storing the character info in an embedded resource called "charinfo.nlp" in mscorlib.dll, which is used internally as a lookup table. There is a "version" property in the header of this lookup table data, but it is "0" in the binary data (offset 0x20), so I'm not sure what it is a version of, or whether it's just not implemented.

codekaizen 2009-07-16 23:47:26
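
The inspection described in this answer can be sketched roughly as follows. The resource name "charinfo.nlp" and the offset 0x20 come from the answer; the class name is hypothetical, and reading two bytes at that offset is an assumption, since the answer does not state the width of the field. On runtimes that do not embed this resource, the lookup simply returns null.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical class name, used only for this illustration.
class CharInfoProbe
{
    static void Main()
    {
        // The assembly that defines System.Object (mscorlib.dll on the .NET Framework).
        Assembly coreLib = typeof(object).Assembly;

        // List every embedded resource so the lookup table can be spotted by name.
        foreach (string name in coreLib.GetManifestResourceNames())
        {
            Console.WriteLine(name);
        }

        using (Stream s = coreLib.GetManifestResourceStream("charinfo.nlp"))
        {
            if (s == null)
            {
                Console.WriteLine("charinfo.nlp is not embedded in this runtime.");
                return;
            }
            if (s.Length < 0x22)
            {
                Console.WriteLine("Resource is too short to contain offset 0x20.");
                return;
            }

            // Assumption: print two raw bytes at offset 0x20 without interpreting them.
            s.Seek(0x20, SeekOrigin.Begin);
            int first = s.ReadByte();
            int second = s.ReadByte();
            Console.WriteLine("Bytes at 0x20: {0:X2} {1:X2}", first, second);
        }
    }
}
```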

+1 A:

As Michael Kaplan states:

The version released by The Unicode Consortium.

Because there really is no definitive answer to this very non-specific question. The answer always depends entirely on the [usually one] specific issue that the person asking is looking for the answer to.

So the polite answer in the end is IT DEPENDS ON WHAT YOU MEAN. CAN YOU ELABORATE A BIT?

devio 2009-07-16 23:49:40

*"Unicode properties ... depends on product version. Though Whidbey is 4.1 not 3.2 and Vista hasn't shipped yet but the latest CTP is 4.1."* Thanks for that link, that's what I'm looking for.

dtb 2009-07-17 11:15:17

While I'm still looking for the Unicode version numbers that previous .NET Framework versions conform to, the documentation for the String class on MSDN states the Unicode version that the .NET Framework 4 conforms to:

In the .NET Framework 4, sorting, casing, normalization, and Unicode character information has been modified so that it is synchronized with Windows 7 and conforms to the Unicode 5.1 standard.

dtb 2010-05-06 00:09:28
