I'm writing some fairly string-manipulation-intensive code in C#/.NET and got curious about a couple of Joel Spolsky articles I remember reading a while back:
http://www.joelonsoftware.com/articles/fog0000000319.html
http://www.joelonsoftware.com/articles/Unicode.html
So, how does .NET store strings internally? Two bytes per char? There ARE some Unicode chars^H^H^H^H^H code points that need more than that. And how is the length stored?
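To make it concrete, here's roughly the kind of check I have in mind (just a rough sketch; the emoji literal is an arbitrary non-BMP example):

```csharp
using System;
using System.Globalization;

class Program
{
    static void Main()
    {
        // U+1F600 (grinning face) lies outside the Basic Multilingual Plane,
        // so in UTF-16 it takes a surrogate pair: two 16-bit code units.
        string s = "\U0001F600";

        Console.WriteLine(s.Length);                                 // 2  -- counts UTF-16 code units (char), not code points
        Console.WriteLine(char.IsSurrogatePair(s, 0));               // True
        Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X"));  // 1F600 -- the actual code point
        Console.WriteLine(new StringInfo(s).LengthInTextElements);   // 1  -- counts "text elements"
    }
}
```

In other words: is Length counting characters in any meaningful sense, or just the number of 2-byte units in the buffer?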