I have two computers, both running WinXP SP2 (I don't really know how similar they are beyond that). I am running MS Visual C# 2008 Express Edition on both, and that's what I'm currently using to program.

I made an application that loads in an XML file and displays the contents in a DataGridView.

The first line of my xml file is:

<?xml version="1.0" encoding="utf-8"?>

...and really... it's utf-8 (at least according to MS VS C# when I just open the file there).

I compile the code and run it on one computer, and the contents of my DataGridView appear normal. No funny characters. I compile the code and run it on the other computer (or just take the published version from computer #1 and install it on computer #2; I tried it both ways), and in the DataGridView, wherever there are line breaks/new lines in the xml file, I see funny square characters.

I'm a novice to encoding... so the only thing I really tried to troubleshoot was to use that same program to write the contents of my xml to a new xml file (but I'm actually writing it to a text file, with the xml tags in it) since the default writing to a text file seems to be utf-8. Then I read this new file back in to my program. I get the same results.

I don't know what else to do or how to troubleshoot this or what I might fundamentally be doing wrong in the first place.

-Adeena

+4  A: 

This doesn't have to do with UTF-8 or character encodings; the problem has to do with line endings. In Windows, each line of a text file ends in two characters: carriage return (CR) followed by line feed (LF, also called newline), which are code points U+000D and U+000A respectively. In ASCII and UTF-8, these are encoded as the two bytes 0D 0A. Most non-Windows systems, including Linux and Mac OS X, on the other hand, use just a line feed to signal end-of-line, so it's not uncommon to see line ending problems when transferring text files between Windows and non-Windows systems.

However, since you're using just Windows on both systems, this is more of a mystery. One application is correctly interpreting the CRLF combination as a newline, but the other application is confused by the CR. Carriage returns are not printable characters, so it replaces the CR with a placeholder box, which is what you see; it then correctly interprets the line feed as the end-of-line.
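To confirm that a stray CR is really what the grid is showing, it can help to make the control characters visible. Here is a minimal sketch of that idea; the class and method names are made up for illustration, and you would pass in whatever string you are about to put into the table:

```csharp
using System;
using System.Text;

static class LineEndingInspector
{
    // Returns a copy of the text in which every control character
    // (CR, LF, tab, ...) is spelled out as <U+XXXX> so it becomes visible.
    public static string ShowControlChars(string text)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsControl(c))
                sb.AppendFormat("<U+{0:X4}>", (int)c);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // prints: first line<U+000D><U+000A>second line
        Console.WriteLine(ShowControlChars("first line\r\nsecond line"));
    }
}
```

If the output shows `<U+000D>` in the values coming out of the XML on both machines, the data is identical and the difference lies in how each machine renders it.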

Adam Rosenfield
I understand exactly what you're saying... just not sure how to figure out what's different between my two computers and what I should be doing to make sure each computer interprets it correctly.
adeena
If transferring the file causes the problem you should be able to see the changes (e.g. changes in file size, different MD5 checksums, difference when viewing with a hex viewer/editor).
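A rough sketch of that check, assuming you run it against the same file on each machine and compare the output (the class and method names are hypothetical):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class FileFingerprint
{
    // MD5 of the raw bytes as an upper-case hex string.
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(data)).Replace("-", "");
        }
    }

    // Counts how often a given byte value occurs in the data.
    public static int CountByte(byte[] data, byte value)
    {
        int count = 0;
        foreach (byte b in data)
        {
            if (b == value)
                count++;
        }
        return count;
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: FileFingerprint <path>");
            return;
        }
        byte[] bytes = File.ReadAllBytes(args[0]);
        Console.WriteLine("size:     {0} bytes", bytes.Length);
        Console.WriteLine("MD5:      {0}", Md5Hex(bytes));
        Console.WriteLine("CR bytes: {0}", CountByte(bytes, 0x0D));
        Console.WriteLine("LF bytes: {0}", CountByte(bytes, 0x0A));
    }
}
```

Identical size and MD5 on both machines would mean the file itself did not change in transit; differing CR and LF counts would point at mixed line endings.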
mweerden
+2  A: 

The square usually appears when you use different types of newlines.

  • Linux - (0A) LF
  • Win - (0D0A) CRLF
  • Mac (classic, pre-OS X) - (0D) CR

The app was probably created using one type, and the running app is expecting another.


Check out Environment.NewLine

And, you might try this: (no guarantees -- I don't write much C#)

strInput = Regex.Replace(strInput, "\\r\\n|\\r|\\n", Environment.NewLine);
Jonathan Lonowski
But they are both Win machines... ???
adeena
Sorry. Those are only "most-common" -- they're not required. So, they can still be mixed/switched when saving.
Jonathan Lonowski
I agree. The xml was created with a different line-ending sequence. \r\n is the Windows default. Linux, for example, uses only \n ...
Igor Zelaya
A: 

Update: I've been fiddling with all the ideas in the answers and comments and no dice. I've been reading in my xml document as an XDocument and writing out the encoding to verify that the software thinks it's utf-8. It does.

...and when I write out my data back to a text file, the default is utf-8, but I was also able to select that and made it so.

I also tried the RegEx replacement idea.

No matter what I tried, I still see the same character issues.

It seems that the only place this is an issue is within the DataGridView display. My xml doc as a text file looks fine. It behaves in other applications the way it is meant to as an xml document. And when I use the data from the DataGridView in another Windows form (I copy the contents and put them in a TextBox), there are no problems.

Maybe my xml doc really is indeed just fine and really is utf-8.

Maybe I should be focusing more on the DataGridView display. anyone have any ideas here?

I checked on a third windows computer (although this one runs Vista) and my application runs fine... no weird boxes. It's just the one computer where it doesn't...

Is there anything about my computer... are there any MS Visual Runtime settings or something that might be different between computers?

-Adeena

adeena
This has nothing to do with character encodings or UTF-8, as I've already mentioned. How are you loading the XML document, and how are you loading the document into the DataGridView?
Adam Rosenfield
A: 

@ Adam: Sorry! Missed your earlier statement.

To load the document into the program and display it in the DataGridView, I am currently doing the following (I say "currently" because I tried other things, like using XDocument instead of XElement):

XElement xe1 = XElement.Load(filePath);

DataTable myTable = new DataTable();
myTable = mkTable();   // calls a function that makes the table
var _categories = (from p1 in xe1.Descendants("category") select p1);
int numCat = _categories.Count();
int i = 0;

while (i < numCat)
{
    DataRow newrow;
    newrow = myTable.NewRow();

    if (_categories.ElementAt(i).Parent.Name == "topic")
    {
        string att1 = _categories.ElementAt(i).Parent.Attribute("name").Value.ToString();
        newrow["topic"] = att1.ToString();
    }
    // repeat the above for the different things in my document
    myTable.Rows.Add(newrow);

    i++;
}
myDataSet.Merge(myTable);
bindingSourceIn.DataSource = myDataSet;
myDataGridView.DataSource = bindingSourceIn;
myDataGridView.DataMember = "xmlthing";

(obviously things are a little abbreviated here... i.e., my bindingsource/datagridview etc is declared elsewhere.... but hopefully this is enough to make sense)

-Adeena

adeena
+1  A: 

I'm not sure of the cause of your problem, but one solution would be to just strip out the carriage returns from your strings. For every string you add, just call TrimEnd(null) on it to remove trailing whitespace:

newrow["topic"] = att1.ToString().TrimEnd(null);

If your strings might end in other whitespace (e.g. spaces or tabs) and you want to keep those, then just pass an array containing only the carriage return character to TrimEnd:

newrow["topic"] = att1.ToString().TrimEnd(new Char[]{'\r'});

Disclaimer: I am not a C# programmer; the second statement may be syntactically incorrect
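For what it's worth, here is a small standalone sketch of how the two variants differ. The class and method names are invented for the example; calling TrimEnd() with no arguments behaves the same as passing null, i.e. it strips all trailing whitespace.

```csharp
using System;

static class TrimDemo
{
    // Removes all trailing whitespace (spaces, tabs, CR, LF, ...).
    public static string TrimAllTrailingWhitespace(string s)
    {
        return s.TrimEnd();
    }

    // Removes only trailing carriage returns and leaves other whitespace alone.
    public static string TrimTrailingCarriageReturns(string s)
    {
        return s.TrimEnd(new char[] { '\r' });
    }

    static void Main()
    {
        string raw = "some topic \r";

        // prints: [some topic]
        Console.WriteLine("[" + TrimAllTrailingWhitespace(raw) + "]");

        // prints: [some topic ]
        Console.WriteLine("[" + TrimTrailingCarriageReturns(raw) + "]");
    }
}
```

Note that TrimEnd only looks at the end of the string: a value ending in "\r\n" is left untouched by the second variant because the last character is the line feed, and carriage returns in the middle of a value are never removed by either one.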

Adam Rosenfield
I think you're on to something with the trims... that's working... Thanks!!!
adeena
It works... and I'm also using string.ToString().Replace("\r",""). The only issue there is that I can double-click an item in the DataGridView and edit it, and if I enter a carriage return, it shows up again when I go back to the DataGridView. <sigh>
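One possible way around the editing problem, sketched under the assumption that the columns hold plain strings: run the same Replace on whatever the user typed before it is stored. The helper below is hypothetical, and the event wiring in the trailing comment is an untested idea based on the DataGridView.CellParsing event, not something I have verified in this app.

```csharp
using System;

static class CellTextCleaner
{
    // Removes every carriage return (U+000D) from string values and
    // passes any other value (null, numbers, ...) through unchanged.
    public static object StripCarriageReturns(object value)
    {
        string text = value as string;
        if (text == null)
            return value;
        return text.Replace("\r", "");
    }

    static void Main()
    {
        object cleaned = StripCarriageReturns("line one\r\nline two");
        Console.WriteLine(((string)cleaned).Contains("\r")); // prints: False
    }
}

// Hypothetical wiring inside the form (untested sketch, string columns assumed):
//
// myDataGridView.CellParsing +=
//     delegate(object sender, DataGridViewCellParsingEventArgs e)
//     {
//         e.Value = CellTextCleaner.StripCarriageReturns(e.Value);
//         e.ParsingApplied = true;
//     };
```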
adeena