To expand on an answer someone else gave:
There are two possibilities:
- The file is really encoded as UTF-8, but is being interpreted by your XML parser as ISO-8859-1.
- The file is really encoded as ISO-8859-1, but is being interpreted by your XML parser as UTF-8.
To determine which is which, look at what happens with the é in Sébastien. There are two possibilities I can imagine:
- "é" becomes two different characters, probably "Ã©"
- "é" becomes a single nonsense character or "?", and possibly the "b" is also missing from the name Sébastien.
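Both symptoms are easy to reproduce. The following is a minimal sketch in Python, used here only for illustration and not the .NET code from the question; it assumes a lenient decoder that substitutes a replacement character for invalid bytes, which is what produces the "nonsense character" in the second case.

```python
# The name from the question, containing one non-ASCII character.
name = "Sébastien"

# Case 1: UTF-8 bytes misread as ISO-8859-1.
# "é" is two bytes in UTF-8 (0xC3 0xA9), and ISO-8859-1 maps each byte
# to its own character, so one letter turns into two.
case1 = name.encode("utf-8").decode("iso-8859-1")

# Case 2: ISO-8859-1 bytes misread as UTF-8.
# "é" is the single byte 0xE9, which is not valid UTF-8 when followed by "b",
# so the decoder substitutes U+FFFD. Python keeps the following "b";
# some other decoders consume it as part of the broken sequence.
case2 = name.encode("iso-8859-1").decode("utf-8", errors="replace")

print(case1)        # SÃ©bastien
print(repr(case2))  # 'S\ufffdbastien'
```

Comparing your own output against these two strings should tell you which direction the mismatch runs.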
In the first case, your file is not what you think it is. (It is getting to your program as UTF-8 data, but your program is trying to interpret it as ISO-8859-1.) Look at the XML file with a hex editor or something else that can show you what the bytes on the disk are. An é stored as the single byte E9 means ISO-8859-1; stored as the two bytes C3 A9 it means UTF-8.
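If no hex editor is at hand, a few lines of script can do the same check. This is a rough sketch in Python, not part of the original setup; `describe_bytes` is a hypothetical helper, and its verdict is only a heuristic, since any byte sequence is technically valid ISO-8859-1.

```python
def describe_bytes(data: bytes) -> str:
    """Guess how raw bytes are encoded, based on whether they are valid UTF-8."""
    if all(b < 0x80 for b in data):
        return "ascii"  # no accented characters, so nothing to distinguish
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8 (possibly iso-8859-1)"

# What the two candidate encodings look like as hex bytes.
utf8_hex = "Sébastien".encode("utf-8").hex(" ")
latin1_hex = "Sébastien".encode("iso-8859-1").hex(" ")

print(utf8_hex)    # 53 c3 a9 62 61 73 74 69 65 6e
print(latin1_hex)  # 53 e9 62 61 73 74 69 65 6e
```

For a real file, the bytes would come from reading it in binary mode, for example `open(path, "rb").read()`, and passing the result to `describe_bytes`.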
In the second case, I'd check how the HTTP server on localhost is serving this file. (Your program is getting bytes in ISO-8859-1 format, but is interpreting them as UTF-8.) The easiest way to do that on Windows is to open up a cmd prompt and run the command:
telnet localhost 80
When that pops up a window, type the following line (or cut-and-paste from Stack Overflow) and press Enter twice. Warning: You won't be able to see what you're typing, and capitalization is important.
GET /Test/person.xml HTTP/1.0
In the response, look for a line beginning with Content-Type. That will tell you how the local web server is serving up the file.
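The same check can be scripted instead of typed into telnet. Below is a hedged sketch in Python using only the standard library; `fetch_content_type` and `charset_of` are hypothetical helper names, and the host, port, and path defaults simply mirror the telnet example above.

```python
import http.client
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None when absent


def fetch_content_type(host: str = "localhost", port: int = 80,
                       path: str = "/Test/person.xml") -> Optional[str]:
    """Request the file and return the server's Content-Type header."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().getheader("Content-Type")
    finally:
        conn.close()


# Parsing two sample header values (no network access needed for this part).
print(charset_of("text/xml; charset=ISO-8859-1"))  # iso-8859-1
print(charset_of("text/xml"))                      # None
```

Calling `fetch_content_type()` while the local server is running returns the header to inspect. A missing charset means the server is not declaring an encoding at all, so the client falls back to whatever default it has.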
Update: Having looked at your file, it really is ISO-8859-1, so what I would suggest is setting the Encoding property of your WebClient instance like so before you tell it to download the file:
client.Encoding = System.Text.Encoding.GetEncoding("iso-8859-1")
Alternatively, you could use the DownloadData methods instead of the DownloadString methods, and then parse the bytes into an XML document. The problem currently is that by the time the XML parser gets the file contents, the bytes have already been interpreted as a string, so it's too late to change the encoding there.
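The difference between handing the parser bytes and handing it an already-decoded string can be shown in a few lines. This sketch is in Python rather than .NET, purely to illustrate the principle; the XML content is a made-up stand-in for person.xml, whose real structure is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for person.xml, stored as ISO-8859-1 bytes.
xml_text = ('<?xml version="1.0" encoding="iso-8859-1"?>'
            '<person><name>Sébastien</name></person>')
xml_bytes = xml_text.encode("iso-8859-1")

# Given raw bytes, the parser reads the encoding declaration itself
# and decodes the content correctly.
root = ET.fromstring(xml_bytes)
name_from_bytes = root.findtext("name")

# Decoding to a string first with the wrong encoding destroys the "é"
# before any XML parser gets a chance to look at the declaration.
damaged = xml_bytes.decode("utf-8", errors="replace")

print(name_from_bytes)         # Sébastien
print("Sébastien" in damaged)  # False
```

This is why fetching the raw bytes and leaving the decoding to the parser sidesteps the problem without hard-coding an encoding in the client.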