I have the following test, which seems to produce identical strings, but Assert.AreEqual fails.

[TestMethod]
public void Decompressed_test_should_equal_to_text_before_compression()
{
    TextCompressor compressor = new TextCompressor();
    Random r = new Random((int)DateTime.Now.Ticks);

    for (int i = 500; i < 1500; i++)
    {
        char[] testArray = new char[i];

        for (int j = 0; j < i; j++)
        {                    
            char randomChar = (char)(r.Next(256, 65536));
            testArray[j] = randomChar;
        }
        string testString = new String(testArray);
        string compressed = compressor.Compress(testString);
        string decompressed = compressor.Decompress(compressed);

        Assert.AreEqual(testString.Length, decompressed.Length);
        Assert.AreEqual(testString, decompressed, false, CultureInfo.InvariantCulture);
    }
}

compressor.Compress and compressor.Decompress do the compression and decompression with GZipStream.

It passes if I try (65, 90) instead of (256, 65536), so I am guessing it has something to do with Unicode. I tried CurrentCulture and no culture at all instead of InvariantCulture, and it still fails. But the resulting strings appear to be the same:

Assert.AreEqual failed.

Expected:

<☔ฺ疉鎷얚�᧏跨꿌沩얫嘹֨ز항們嵜浮䑹ﴰ靄斳薃픢萁⯬쫎ʛ�⫕蝺ꄗ穌넢뇌䶆멊큀퉆䐫̥괊⑆놸僥̅ᵀ㣚ꢅ뺓䇚녚伀讍홬䈕�캾撏Ჴ孢黮摠뮡䌦윃ᬳ狚䆙툾훶䏤ꛈṻ⟧㉖鮸蒵萗냤퇅서㪨瀲鰪残䓴ﯘ넃櫜㑦䢻쮓죣䕱䶘㴝姳뿝嘼ᷨ㗬꺬櫣涷꠶浒껅က㷕䩉毎覛�⧹䮯嬇힚艐Ὑ쇕횻鸙蹻硐���䈆쓖⸛錼鰙ኰ乒֐⺴썓힠䵓ꅄⵈ桃怅㾈枟⏠ﻪ폫ﺍ琖ퟰ乼�쩐鑈푷᫇﯎蕱늛�쭡�䙠ⲓᒇꪮ툅⃑ꦴ돻♹ᢋ麝熪뚭Ћ䌚�娯钮⡃㪿ᅰ⤩㥍車䎘磛蚾ㅸ擫떦蝳分鰽䠺ꭍ튘폻⥽ⳉ历⹼驿똮�⯴⋟Ḋ᛼룴꣜墭䐣앾郢�ᵸᮄ杗奪騑硼佑烑鄗䳘핬溴墽炁ࣘヲ栥풼ಃ斗狹就쵎⃺嬒瀃碂밎崹䎐貇஛汫踖�뢸숥퍞르뗿䭯䖝䱅�䵱꽔븽䢴ꁅ⟼�蒠癸ꩽ靔临䚝﹗⩏￸鍁Ꮨ䷇쁐쨒ʊ쪦鄭借滋铆ᮉ嚃ᩨ⶝ိ펇ꮼ뇄』ᰉ㕾枒鯅蛺䠿櫄築픆车똅렬㈆ﹼἋ荞괋랆偦뤰䝷핸⹝屑素蝨怀猔勛碉퀪睹�Ⓥ䍙ಗ䤮뾿谢ꁼ戻ﮖᆯ콧偪ﺯ븭碇쮢籍⁜왋壝駡暷샖଄ࣵ艫᜝䃴厫ᢉ慨䁆ꂴ೉溘欋옭螶䦗跠﨔膉痹邘⋫吪멚埣ꯕ扌옘广犵肖街�㶕畅몡ↇ꠫襤픧ၥ帻놤ਰ惘똞颤糴쫼鿋䬝穫⺁峁踷锝副鰀嗊⹀谲遲�䩢푑팾��糔뭯዇ࣷ䷴䬾갭ⶵ᾿틩魨㵻恬҅པᣄⲪ豩뛌꛵㥨몙〼△⏮큤�亃ꢡ웼ఐ칇뻻펂㢓吋䂃䨠䕱>.

Actual:

<☔ฺ疉鎷얚�᧏跨꿌沩얫嘹֨ز항們嵜浮䑹ﴰ靄斳薃픢萁⯬쫎ʛ�⫕蝺ꄗ穌넢뇌䶆멊큀퉆䐫̥괊⑆놸僥̅ᵀ㣚ꢅ뺓䇚녚伀讍홬䈕�캾撏Ჴ孢黮摠뮡䌦윃ᬳ狚䆙툾훶䏤ꛈṻ⟧㉖鮸蒵萗냤퇅서㪨瀲鰪残䓴ﯘ넃櫜㑦䢻쮓죣䕱䶘㴝姳뿝嘼ᷨ㗬꺬櫣涷꠶浒껅က㷕䩉毎覛�⧹䮯嬇힚艐Ὑ쇕횻鸙蹻硐���䈆쓖⸛錼鰙ኰ乒֐⺴썓힠䵓ꅄⵈ桃怅㾈枟⏠ﻪ폫ﺍ琖ퟰ乼�쩐鑈푷᫇﯎蕱늛�쭡�䙠ⲓᒇꪮ툅⃑ꦴ돻♹ᢋ麝熪뚭Ћ䌚�娯钮⡃㪿ᅰ⤩㥍車䎘磛蚾ㅸ擫떦蝳分鰽䠺ꭍ튘폻⥽ⳉ历⹼驿똮�⯴⋟Ḋ᛼룴꣜墭䐣앾郢�ᵸᮄ杗奪騑硼佑烑鄗䳘핬溴墽炁ࣘヲ栥풼ಃ斗狹就쵎⃺嬒瀃碂밎崹䎐貇஛汫踖�뢸숥퍞르뗿䭯䖝䱅�䵱꽔븽䢴ꁅ⟼�蒠癸ꩽ靔临䚝﹗⩏￸鍁Ꮨ䷇쁐쨒ʊ쪦鄭借滋铆ᮉ嚃ᩨ⶝ိ펇ꮼ뇄』ᰉ㕾枒鯅蛺䠿櫄築픆车똅렬㈆ﹼἋ荞괋랆偦뤰䝷핸⹝屑素蝨怀猔勛碉퀪睹�Ⓥ䍙ಗ䤮뾿谢ꁼ戻ﮖᆯ콧偪ﺯ븭碇쮢籍⁜왋壝駡暷샖଄ࣵ艫᜝䃴厫ᢉ慨䁆ꂴ೉溘欋옭螶䦗跠﨔膉痹邘⋫吪멚埣ꯕ扌옘广犵肖街�㶕畅몡ↇ꠫襤픧ၥ帻놤ਰ惘똞颤糴쫼鿋䬝穫⺁峁踷锝副鰀嗊⹀谲遲�䩢푑팾��糔뭯዇ࣷ䷴䬾갭ⶵ᾿틩魨㵻恬҅པᣄⲪ豩뛌꛵㥨몙〼△⏮큤�亃ꢡ웼ఐ칇뻻펂㢓吋䂃䨠䕱>.

What am I missing?

+1  A: 

Use byte not char.

Your Compress/Decompress methods should take a byte[] array, and whatever calls them should read your Unicode data and translate it before calling them.

You are aware that .NET 2.0 onwards contains the GZipStream class?
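
For what it's worth, here is a rough sketch of what byte-based methods on top of GZipStream could look like. The class name `ByteCompressor` is made up for illustration, and this is not the asker's `TextCompressor` (that code is not shown), just one way to keep the text encoding outside the compression step.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not the TextCompressor from the question.
public static class ByteCompressor
{
    // Compress raw bytes with GZip and return the compressed bytes.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // The GZipStream must be disposed before reading the result,
            // otherwise the final block may not have been flushed yet.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Decompress GZip bytes back into the original bytes.
    public static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The caller would then pick the encoding explicitly, for example `Encoding.UTF8.GetBytes(text)` before `Compress` and `Encoding.UTF8.GetString(bytes)` after `Decompress`.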

Mitch Wheat
As you will notice, I am using char array only for initializing the string. Compress and Decompress methods never work with chars. They use byte arrays internally.
Serhat Özgel
Text is char[], not byte[]. Byte[] should only be used *internally*. So I disagree that it should be used in the test method.
bzlm
@buyutec : As we can't see your implementation of those methods, it's a bit hard to tell that they use byte arrays internally...
Mitch Wheat
@Mitch Wheat You are right, that's my bad. But since it is legacy code and is a bit messy, I did not want to include it. Turns out the problem was not the compress and decompress methods btw.
Serhat Özgel
+2  A: 

(char)(r.Next(256, 65536)) can produce invalid characters, for example unpaired surrogates (U+D800 to U+DFFF), so you can't use it to create test content. If you want to generate sample text from all Unicode ranges, you have to be Unicode aware when you create it, and not just cast a random number to char. (I think you hit upon this when you stated in the question that it worked when you supplied a narrower range for the random number.)
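
To make that concrete: the range 256 to 65535 includes the surrogate code units U+D800 to U+DFFF, which are only valid in pairs. A small sketch, assuming the compressor converts the string to bytes with something like UTF-8 (the question does not show this, so treat it as a guess). `TestText`, `RoundTrip` and `RandomNonSurrogateString` are names invented for this example.

```csharp
using System;
using System.Text;

// Hypothetical helpers for illustration only.
public static class TestText
{
    // Encode to UTF-8 bytes and decode again. This stands in for whatever
    // string-to-byte conversion the compressor does internally (an assumption,
    // since that code is not shown in the question).
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    // Build a random string of the given length from U+0100..U+FFFF,
    // skipping surrogate code units so every char is valid on its own.
    public static string RandomNonSurrogateString(Random r, int length)
    {
        char[] chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            char c;
            do
            {
                c = (char)r.Next(256, 65536);
            } while (char.IsSurrogate(c));
            chars[i] = c;
        }
        return new string(chars);
    }
}
```

With these, `TestText.RoundTrip("a\uD800b")` does not give back the original string, because the lone surrogate cannot be encoded and gets substituted (typically with U+FFFD, which would also explain why the lengths match and the two dumps look identical). A string from `RandomNonSurrogateString` should round-trip unchanged.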

bzlm
You are right, when I used some hand crafted characters instead, it worked. Thanks.
Serhat Özgel
I'm not convinced !!! "(char)(r.Next(256, 65536)) can produce invalid characters". How is that? You can cast any int in this range to a Unicode char as the *char* docs say (U+0000 to U+ffff -> Unicode 16-bit character)
bruno conde
+1  A: 

Made some experiments:

string testString = new String(testArray);
string anotherString = new String(testArray);
Assert.AreEqual(testString.Length, anotherString.Length);
Assert.AreEqual(testString, anotherString, false, CultureInfo.InvariantCulture);

This is without compression. It works fine.

I suggest you change your test to this:

for (int i = 256; i < 65536; i++)
{
  string testString = new String((char)(i), 2);

  string compressed = compressor.Compress(testString);
  string decompressed = compressor.Decompress(compressed);

  Assert.AreEqual(testString.Length, decompressed.Length);
  Assert.AreEqual(testString, decompressed, false, CultureInfo.InvariantCulture);
}

This tests exactly one character value at a time, so you don't have random values (no "sometimes-works" problem), and you'll see whether a certain kind of character is not working.
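
A variation on the loop above, in case it helps: instead of stopping at the first failed assert, collect every failing value so the pattern is visible in one run. This is only a sketch. `RoundTripScan` and `FindFailures` are made-up names, the round trip is passed in as a delegate because the real compressor code is not shown, and it compares ordinally rather than with a culture.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class RoundTripScan
{
    // Runs the same two-character string per code unit as the loop above
    // and returns every value in 256..65535 that does not survive the
    // supplied round trip.
    public static List<int> FindFailures(Func<string, string> roundTrip)
    {
        var failures = new List<int>();
        for (int i = 256; i < 65536; i++)
        {
            string testString = new string((char)i, 2);
            string result = roundTrip(testString);
            if (!string.Equals(testString, result, StringComparison.Ordinal))
            {
                failures.Add(i);
            }
        }
        return failures;
    }
}
```

In the test it could be called as `RoundTripScan.FindFailures(s => compressor.Decompress(compressor.Compress(s)))`, and the returned values can then be printed in hex to see which ranges are affected.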

Stefan Steinegger