So I was beating my head against a wall for a while before it dawned on me. I have some code that saves a list of names into a text file, like this:

System.IO.File.WriteAllLines(dlg.FileName, this.characterNameMasterList.Distinct().ToArray());

The character names can contain special characters. These names come from the WoW Armory at www.wowarmory.com. There are about 26,000 names saved in the .txt file.

The names get saved to the .txt file just fine. I wrote another application that reads these names from that .txt file using this code:

// download the names from the db
var webNames = this.DownloadNames("character");

// filter names and get ones that need to be added to the db
var localNames = new List<string>(System.IO.File.ReadAllLines(dlg.FileName));
foreach (var name in webNames)
{
    if (localNames.Contains(name.Trim())) localNames.Remove(name);
}

return localNames;

The code downloads a list of names from my website that are already in the db. It then reads the local .txt file and singles out every name that is not yet in the db so it can be added later. The names read from the .txt file also come through fine, with no problems.

The problem comes in when removing names from the localNames list. localNames is a List<string>. As soon as localNames.Remove(name) gets called, any names in the list that contain special characters get corrupted and are converted into ? characters. See the screen cap: http://yfrog.com/12badcharsp

So I tried doing it another way, using:

// download the names from web that are already in the db
var webNames = this.DownloadNames("character");

// filter names and get ones that need to be added to the db
var localNames = new List<string>(System.IO.File.ReadAllLines(dlg.FileName));
int index = 0;
while (index < webNames.Count)
{
    var name = webNames[index++];

    var pos = localNames.IndexOf(name.Trim());
    if (pos != -1) localNames.RemoveAt(pos);
}

return localNames;

But using localNames.RemoveAt also corrupts the items in the list, converting special characters into ?.

So is this a known bug with the List.Remove methods? Does anyone know? Has anyone else had this problem? I also used .NET Reflector to disassemble and inspect the List.Remove and List.RemoveAt code, and it appears to be calling some external Copy function.

Aside from the fact that this is probably not the best way to get a unique list of items from two lists, am I missing something, or is there something I should be aware of when using the List.Remove methods?

I am running Windows 7 with VS2010, and my app targets .NET 4 (not the Client Profile).

+4  A: 

Try forcing UTF-8 when you retrieve the names, when you save the names and when you read the names.

This might solve your problem.

Edit: this suggestion may seem vague, but you simply need to ensure that your website is serving UTF-8 (most likely the case) and pass Encoding.UTF8 in all of your File operations. You will notice that the File methods that read or write text have overloads that accept an Encoding.
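
As a rough illustration of the file side of this suggestion, here is a minimal sketch that writes the distinct names with an explicit Encoding.UTF8 and reads them back the same way. The class name, the SaveAndReload helper, the temp file, and the sample names are all made up for the example. The download side is not shown because it depends on how DownloadNames is implemented, which the question does not include.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class Utf8RoundTrip
{
    // Hypothetical helper: writes the distinct names as UTF-8,
    // then reads the file back using the same encoding.
    public static string[] SaveAndReload(string path, string[] names)
    {
        File.WriteAllLines(path, names.Distinct().ToArray(), Encoding.UTF8);
        return File.ReadAllLines(path, Encoding.UTF8);
    }

    static void Main()
    {
        // Sample data only; \u escapes keep the source file itself ASCII.
        string[] names = { "\u00C6rwyn", "Thrall", "\u00C6rwyn", "S\u00F8ren" };
        string path = Path.GetTempFileName();

        string[] reloaded = SaveAndReload(path, names);

        // Reports whether the reloaded names match the distinct input names.
        Console.WriteLine(reloaded.SequenceEqual(names.Distinct()));
        File.Delete(path);
    }
}
```

If the names survive this round trip unchanged, the file operations are probably not where the characters are being lost.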

Sky Sanders
+2  A: 

99 out of 100 times you think there's a bug in the compiler or the framework... there's not. It's usually just your understanding of the problem that is faulty. That's not to say the framework is without bugs, but most of the time what you think is a bug really isn't.

Based on your description, it seems more likely that the tool you're using to view the data is giving you the wrong view, not that the actual characters have been converted to question marks. Did you examine the hex codes to see whether they are equivalent to the encoding's question mark character? It doesn't seem like you did.
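
One way to do that check from code is to print the UTF-16 code units of a suspect string. This is only a sketch; the CodeUnitDump class and ToHex helper are names invented for the example. A genuine question mark shows up as 003F and the Unicode replacement character as FFFD, while a character that is intact but merely displayed badly keeps its own value (for example 00F8 for "ø").

```csharp
using System;
using System.Linq;

static class CodeUnitDump
{
    // Hypothetical helper: formats each UTF-16 code unit of s as four hex digits.
    public static string ToHex(string s)
    {
        return string.Join(" ", s.Select(c => ((int)c).ToString("X4")).ToArray());
    }

    static void Main()
    {
        Console.WriteLine(ToHex("S\u00F8ren")); // intact name
        Console.WriteLine(ToHex("S?ren"));      // contains a literal question mark
        Console.WriteLine(ToHex("S\uFFFDren")); // contains the replacement character
    }
}
```

Running ToHex on a name before and after the Remove call would show whether the string itself changed or only the way it is displayed.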

Mystere Man
A: 

It turned out that the problem was bad names in the *.txt file. How those bad characters ended up in the names in that list is another matter. The only other remote possibility I can think of is that it happened because I was using the Distinct method to filter out duplicates.

This problem had me confused for some time because the data (i.e. the character names) was coming from the wowarmory.com site, which uses UTF-8 encoding. Because it was coming from that site, I assumed I could trust it. Blizz restricts what characters you can use in a character's name.

I am still trying to nail down this issue in my code. But with so much data and only a very few entries in that data being corrupted, it's a real pain to track down the problem, especially when the code works perfectly on over 99% of the entries.
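
For narrowing down the few bad entries, a small scan along these lines might help. It is a sketch built on an assumption: that a literal ? or the Unicode replacement character (U+FFFD) never appears in a valid name, so any line containing one is suspect. The SuspectNameScan class and FindSuspectLines helper are made up for the example.

```csharp
using System;
using System.Collections.Generic;

static class SuspectNameScan
{
    // Hypothetical helper: returns the 1-based numbers of lines that contain
    // a literal '?' or U+FFFD (assumed here to never occur in a valid name).
    public static List<int> FindSuspectLines(IEnumerable<string> lines)
    {
        var hits = new List<int>();
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++;
            if (line.IndexOf('?') >= 0 || line.IndexOf('\uFFFD') >= 0)
                hits.Add(lineNumber);
        }
        return hits;
    }

    static void Main()
    {
        // Sample data only; in practice the lines would come from
        // System.IO.File.ReadAllLines on the saved .txt file.
        string[] sample = { "Thrall", "S?ren", "\u00C6rwyn", "Bad\uFFFDName" };
        foreach (int n in FindSuspectLines(sample))
            Console.WriteLine("Suspect entry on line " + n);
    }
}
```

Running the same scan on the list right after download and again right after saving would show at which stage the bad entries first appear.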

Dean Lunz