ansaurus

Question

Space in a .NET string returned by string.Format does not match space declared in source code - multiple representations?

Answer 1

+10 A:

I suspect your current culture is using an interesting "thousands" separator - U+00A0, which is the non-breaking space character. That's not an entirely unreasonable thousands separator, to be honest... it means you shouldn't get text like this displayed:

The size of the file is 1
023 bytes.

Instead you'd get

The size of the file is
1 023 bytes.

On my box, I get "1,023" instead. Do you want your FormatSize method to use the current culture, or a specific one? If it's the current culture, you should probably make your unit test specify the culture. I have a couple of wrapper methods I use for this:

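// Needs: using System; using System.Globalization; using System.Threading;

// Runs the action with the invariant culture as the thread's current culture.
internal static void WithInvariantCulture(Action action)
{
    WithCulture(CultureInfo.InvariantCulture, action);
}

// Temporarily switches the current thread's culture and restores the
// original afterwards, even if the action throws.
internal static void WithCulture(CultureInfo culture, Action action)
{
    CultureInfo original = Thread.CurrentThread.CurrentCulture;
    try
    {
        Thread.CurrentThread.CurrentCulture = culture;
        action();
    }
    finally
    {
        Thread.CurrentThread.CurrentCulture = original;
    }
}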

so I can run:

WithInvariantCulture(() =>
{
    // Body of test
});

etc.

If you want to test for the exact string you're getting, you can use:

Assert.AreEqual("1\u00A0023 Bytes", size1023);

Jon Skeet 2009-09-25 08:26:38

Thanks Jon, great explanation!

Marek 2009-09-25 08:43:22

Answer 2

+4 A:

Unicode code point 160 (U+00A0) is not represented in UTF-8 by the single byte 160, but by two bytes: 194 and 160 (0xC2 0xA0).

In fact, any Unicode code point beyond 127 is represented by more than one byte in UTF-8.

And I guess that your CultureInfo uses a non-breaking space (160) as a thousands grouping separator, and not a simple space (32) like you type yourself.
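The byte values mentioned above can be checked directly. The following is a minimal sketch; the class name NbspBytes and the helper Utf8 are made up for illustration, while Encoding.UTF8.GetBytes and BitConverter.ToString are standard .NET calls.

```csharp
using System;
using System.Text;

public static class NbspBytes
{
    // Hypothetical helper: returns the UTF-8 encoding of the given text.
    public static byte[] Utf8(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    public static void Main()
    {
        // U+00A0 (non-breaking space) is above 127, so UTF-8 needs two bytes.
        Console.WriteLine(BitConverter.ToString(Utf8("\u00A0"))); // C2-A0 (194, 160)

        // U+0020 (ordinary space) is below 128, so one byte is enough.
        Console.WriteLine(BitConverter.ToString(Utf8(" ")));      // 20 (32)
    }
}
```

Seeing 194, 160 in a byte dump where you expected 32 is therefore a sign that the string contains a non-breaking space, not that the encoding went wrong.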

Ruben 2009-09-25 08:27:52

Answer 3

+2 A:

194, 160 is the UTF-8 encoding of code point 160 (U+00A0), the non-breaking space, written &nbsp; in HTML.

That makes sense, you don't want a single number to be considered several words.

In short, your test revealed a flawed assumption - great! However, in terms of a unit test, your test has issues; you should always include a CultureInfo object when converting to and from strings - otherwise your unit tests may fail depending on the logged-in user's culture settings. You expect a particular form of string formatting - make sure you explicitly state which CultureInfo you're expecting.
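As a rough illustration of stating the culture explicitly, the sketch below passes an IFormatProvider into the formatting call. FormatSize here is a hypothetical stand-in for the method under test, and the "{0:N0} bytes" format string is an assumption; the original method's signature and format are not shown in this thread.

```csharp
using System;
using System.Globalization;

public static class ExplicitCultureFormat
{
    // Hypothetical stand-in for the method under test: the caller decides
    // which culture controls the group separator.
    public static string FormatSize(long bytes, IFormatProvider provider)
    {
        return string.Format(provider, "{0:N0} bytes", bytes);
    }

    public static void Main()
    {
        // The invariant culture always groups with a comma, regardless of
        // the logged-in user's regional settings.
        Console.WriteLine(FormatSize(1023, CultureInfo.InvariantCulture)); // 1,023 bytes
    }
}
```

A test written against this overload gives the same result on every machine, because nothing depends on Thread.CurrentThread.CurrentCulture.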

Eamon Nerbonne 2009-09-25 08:30:17

Thanks for your comment. The unit test is actually only part of regression testing before refactoring, and I am including it here only to illustrate the problem; it is not an actual production unit test :)

Marek 2009-09-25 08:45:19

Answer 4

+1 A:

160 is a non-breaking space, which sort of makes sense, because you wouldn't want your number to be split between rows. But 194... Oh yeah. UTF-8 uses two bytes for it.

J. Steen 2009-09-25 08:31:27

Answer 5

A:

First of all, all strings in .NET are Unicode (stored as UTF-16), so getting UTF-8 bytes is not needed to see which characters a string contains. Second, when comparing strings you should specify culture info, and when using string.Format you should pass an IFormatProvider. This way you control which characters these functions produce.
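Since a .NET string can be inspected character by character, one way to tell the two kinds of space apart without going through an encoding is to print each code unit. This is only a sketch; the class name CodeUnitDump and the method Describe are invented for the example.

```csharp
using System;
using System.Linq;

public static class CodeUnitDump
{
    // Hypothetical helper: lists each UTF-16 code unit of the text as U+XXXX.
    public static string Describe(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // Formatted output containing a non-breaking space as group separator.
        Console.WriteLine(Describe("1\u00A0023")); // U+0031 U+00A0 U+0030 U+0032 U+0033

        // The same text typed with an ordinary space in source code.
        Console.WriteLine(Describe("1 023"));      // U+0031 U+0020 U+0030 U+0032 U+0033
    }
}
```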

Jonathan van de Veen 2009-09-25 08:31:47

Answer 6

+2 A:

Maybe you could change the test string in the Assert.Equal method to use CultureInfo.CurrentCulture.NumberFormat.NumberGroupSeparator instead of a single space character?
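A sketch of that idea follows. To keep the result the same on any machine, it uses a NumberFormatInfo cloned from the invariant culture with the separator set to U+00A0 instead of reading CultureInfo.CurrentCulture; the class and method names are hypothetical, and the "{0:N0} Bytes" format string is an assumption about the code under test.

```csharp
using System;
using System.Globalization;

public static class GroupSeparatorExpectation
{
    // Builds the expected text from the provider's own group separator
    // instead of a hard-coded space.
    public static string Expected(NumberFormatInfo format)
    {
        return "1" + format.NumberGroupSeparator + "023 Bytes";
    }

    // Assumed shape of the formatting call under test.
    public static string Actual(NumberFormatInfo format)
    {
        return string.Format(format, "{0:N0} Bytes", 1023);
    }

    // Invariant number format with a non-breaking space as group separator.
    public static NumberFormatInfo WithNbspSeparator()
    {
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NumberGroupSeparator = "\u00A0";
        return format;
    }

    public static void Main()
    {
        NumberFormatInfo format = WithNbspSeparator();
        Console.WriteLine(Actual(format) == Expected(format)); // True
    }
}
```

Note that building the expected string this way assumes the culture groups digits in threes; a culture with different group sizes would need more than a swapped separator.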

Konamiman 2009-09-25 08:31:49
