I'm building an automated RSS feed in ASP.NET and occurrences of apostrophes and hyphens are rendering very strangely:

"Here's a test" is rendering as "Here’s a test"

I managed to work around a similar problem with the pound sign (£) by escaping the ampersand and building the HTML escape for £ manually, as shown in the extract below:

sArticleSummary = sArticleSummary.Replace("£", "&amp;pound;")

But the following attempt fails to resolve the apostrophe issue; we still get ’ on the screen.

sArticleSummary = sArticleSummary.Replace("’", "’")

For all intents and purposes, the string in the database (SQL Server 2005) appears to be plain text. Can anyone advise why what seem to be plain-text strings keep coming out in this manner? Any ideas on how to resolve the apostrophe issue would be appreciated.

Thanks for your help.

[EDIT]

Further to Vladimir's help, it now looks as though the problem is that, somewhere between the database and the point where it is loaded into the string variable, the data is being converted from an apostrophe to ’. Has anyone seen this happen before, or have any pointers?

Thanks

A: 

I would just put "Here's a test" into a CDATA section. It's easy and it works.

<![CDATA[Here's a test]]>
Vladimir Kocjancic
Thanks - just tested, and it must be that the input string is populated with ’ rather than the apostrophe, because your example works well when hardcoded, but as soon as the variable is used it still renders as ’.
Chris
A: 

It transpires that the data (whilst showing as plain text in SQL Server) is actually carrying some MS Word special characters.
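
One way to confirm hidden characters like these is to list every character outside the ASCII range together with its code point. The following is only a minimal sketch of that check; `NonAsciiFinder` and `Describe` are hypothetical names introduced for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class NonAsciiFinder
{
    // Hypothetical helper: returns one entry per non-ASCII character,
    // giving its position and its Unicode code point in hex.
    public static List<string> Describe(string text)
    {
        var found = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c > 127)
            {
                found.Add(string.Format("index {0}: U+{1:X4}", i, (int)c));
            }
        }
        return found;
    }
}
```

For the input "Here\u2019s a test" this returns a single entry, "index 4: U+2019" (the right single quotation mark that Word substitutes for an apostrophe). A string containing only the plain apostrophe (U+0027) returns an empty list.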

Chris
You will be better off handling the data correctly instead of trying to patch it. A huge number of characters could cause issues when imported from a Word document, and you'd have to handle all of them.
devstuff
Chris
FYI: most of the decent JavaScript-based WYSIWYG editors have a "Paste from Word" function to de-Word-ify the content, so they may suit your input scenario (or lift the algorithm to do it server-side).
devstuff
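
As a rough illustration of the server-side "de-Word-ify" idea, the sketch below maps a handful of common Word punctuation characters to ASCII equivalents. `WordPunctuation` and `ToPlainAscii` are hypothetical names, and the character list is an assumption that is far from complete, which is exactly the limitation raised above; storing and emitting the text as Unicode end to end is likely the more robust fix.

```csharp
using System.Text;

public static class WordPunctuation
{
    // Hypothetical helper: replaces a few "smart" punctuation characters
    // with plain ASCII. Every other character is copied unchanged.
    public static string ToPlainAscii(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '\u2018': // left single quotation mark
                case '\u2019': // right single quotation mark
                    sb.Append('\'');
                    break;
                case '\u201C': // left double quotation mark
                case '\u201D': // right double quotation mark
                    sb.Append('"');
                    break;
                case '\u2013': // en dash
                case '\u2014': // em dash
                    sb.Append('-');
                    break;
                case '\u2026': // horizontal ellipsis
                    sb.Append("...");
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

Calling `WordPunctuation.ToPlainAscii("Here\u2019s a test")` returns "Here's a test" with a plain apostrophe.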
+1  A: 

I would guess that the column in your SQL 2005 database is defined as varchar(N), char(N) or text. If so, the conversion is due to the database driver using a different code page from the one set in the database.

I would recommend changing this column (and any others that may contain non-ASCII data) to nvarchar(N), nchar(N) or nvarchar(max) respectively, which can then contain any Unicode code point, not just those defined by the code page.

All of my databases now use nvarchar/nchar exclusively to avoid these types of encoding issues. The Unicode fields use twice as much storage space, but there'll be very little performance difference if you use this technique (the SQL engine uses Unicode internally).

devstuff
We'll give this a go - the columns are indeed varchar.
Chris
A: 

Assuming you get Unicode characters from the database, the easiest way is to let System.Xml.dll take care of the conversion for you by building the RSS feed with an XmlDocument object. (I'm not sure about the elements found in an RSS feed.)

  XmlDocument rss = new XmlDocument();
  rss.LoadXml("<?xml version='1.0'?><rss />");
  XmlElement element = rss.DocumentElement.AppendChild(rss.CreateElement("item")) as XmlElement;
  element.InnerText = sArticleSummary;

or with LINQ to XML (System.Xml.Linq):

  XDocument rss = new XDocument(
   new XElement("rss",
    new XElement("item", sArticleSummary)
   )
  );
Torbjörn Hansson
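
To illustrate why building the feed through the XML API removes the need for manual Replace calls, here is a small sketch showing that markup characters are escaped on serialization while non-ASCII characters pass through as-is. `RssItemDemo` and `BuildItem` are hypothetical names, and a single `item` element is not a complete RSS feed.

```csharp
using System.Xml.Linq;

public static class RssItemDemo
{
    // Hypothetical helper: wraps the raw summary text in an <item> element.
    // The raw string goes in unescaped; LINQ to XML escapes markup
    // characters such as '&' when the element is serialized.
    public static string BuildItem(string summary)
    {
        return new XElement("item", summary).ToString();
    }
}
```

For example, `RssItemDemo.BuildItem("Fish & chips for \u00A35")` returns `<item>Fish &amp; chips for £5</item>`: the ampersand is escaped and the pound sign is left alone. The bytes finally sent to the client still need to match the encoding declared in the feed.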