I have an atom feed on a wordpress blog here: http://blogs.legalview.info/auto-accidents/feed/atom

When I download the text of the file and display it on my site, I get strange characters like the accented 'A' here:

Recent studies are showing that car accidentÂ -related fatalities have declined almost 10% since 2008. The reason for this

I am using the following code in my C# web application to download the feed:

        WebClient client = new WebClient();
        client.Headers.Add(@"Accept-Language: en-US,en          
                           Accept-Charset: utf-8");
        string xml_text = client.DownloadString(_atom_url);

xml_text.Contains("Â") returns true, but when I download the feed in my browser, no such Â appears. I'm pretty sure this is a character set issue, but I can't figure out why. By examining client.ResponseHeaders, I can see that the feed is in fact being served as UTF-8, and the response from my .NET site is UTF-8 as well, so I can't see where the stray character comes from.

A:

I get ...fatalities when I force my browser to interpret the feed as ISO-8859-1 instead of UTF-8 (which definitely is the correct character set for the feed.)

I'm pretty sure either your WebClient somehow defaults to ISO-8859-1, or the output encoding on your site is ISO-8859-1, which obviously garbles the UTF-8 input.

Maybe start checking your site's output first. If that definitely is UTF-8, take a look at the WebClient.
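
The garbling described above can be reproduced without any network access. The following is a minimal sketch, not the asker's code: the class and method names are hypothetical, and it assumes the offending character in the feed is a non-breaking space (U+00A0), which the source does not confirm. Any character whose UTF-8 encoding begins with the byte 0xC2 would produce the same stray Â.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the original question or answer.
public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        // Assumption: the feed contains a non-breaking space (U+00A0) here.
        string original = "accident\u00A0-related";
        string garbled = MisreadUtf8AsLatin1(original);

        // U+00A0 is the two bytes C2 A0 in UTF-8; read as ISO-8859-1 they
        // become U+00C2 (the accented A) followed by U+00A0.
        Console.WriteLine(garbled.Contains("\u00C2")); // True
        Console.WriteLine(garbled.Length - original.Length); // 1
    }
}
```

If the text on the site shows exactly this pattern, the bytes were decoded with a single-byte encoding somewhere between the download and the output.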

Pekka
This got me on the right track. `client.Encoding = Encoding.UTF8;` fixed it. `client.Headers.Add(@"Accept-Language: en-US,en Accept-Charset: utf-8");` was unnecessary and insufficient.
Tristan Havelick
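
For reference, the fix from the comment above might look like the sketch below when put together. The `FeedDownloader` class, its method names, and the `atomUrl` parameter are hypothetical names for illustration; only `client.Encoding = Encoding.UTF8;` comes from the thread. `WebClient.Encoding` defaults to the system default encoding (`Encoding.Default`), which is why setting it explicitly can matter.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper; the names are illustrative, not from the original post.
public static class FeedDownloader
{
    // Creates a WebClient that converts downloaded bytes to strings as UTF-8
    // instead of relying on the system default encoding.
    public static WebClient CreateUtf8Client()
    {
        var client = new WebClient();
        client.Encoding = Encoding.UTF8;
        return client;
    }

    // Downloads the feed text; atomUrl is assumed to point at a UTF-8 feed.
    public static string DownloadFeed(string atomUrl)
    {
        using (WebClient client = CreateUtf8Client())
        {
            return client.DownloadString(atomUrl);
        }
    }
}
```

A call such as `FeedDownloader.DownloadFeed(_atom_url)` would then replace the original three lines. The `Accept-Charset` request header is left out because, as the comment notes, it did not affect how the response was decoded.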