Take a look at A List Apart, as I suggested in the HTML Apostrophe question.
The em dash (—) is represented by &#8212; (or by the named entity &mdash;).
As this character is not an ASCII character, how do I encode it?
It's not an ASCII character, but it is a Unicode character, U+2014. If your page output is going to be UTF-8, which in this day and age it really should be, you don't need to HTML-encode it, just output the character directly.
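To illustrate why UTF-8 output needs no HTML-encoding, here is a minimal C# sketch that prints the bytes the em dash turns into under two encodings. The class and method names (EmDashBytes, ToHex) are made up for this example and are not part of any library.

```csharp
using System;
using System.Text;

static class EmDashBytes
{
    // Returns the bytes of `text` in the given encoding as a hex string, e.g. "E2-80-94".
    public static string ToHex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        string emDash = "\u2014";

        // UTF-8 represents U+2014 as three bytes.
        Console.WriteLine(ToHex(emDash, Encoding.UTF8));   // E2-80-94

        // ASCII cannot represent it, so the default fallback substitutes '?' (0x3F).
        Console.WriteLine(ToHex(emDash, Encoding.ASCII));  // 3F
    }
}
```

As long as the page is actually served as UTF-8 and declared as such, the browser decodes those three bytes back to the em dash without any entity being involved.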
Are there other characters which are likely to give me problems?
What problems exactly is it giving you? If you can't output '—', you probably can't output any other non-ASCII Unicode character either, and there are thousands of those.
Replace "\u2014" with "&#x2014;" if you really must, but with today's Unicode-aware tools there should be no need to go around replacing every non-ASCII Unicode character with markup.
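If the output really does have to stay pure ASCII, the replacement can be generalised rather than done one character at a time. The following is only a sketch under that assumption; NumericEscaper and EscapeNonAscii are hypothetical names, and it does not escape markup characters such as < or &, so it would be applied after the usual HtmlEncode call.

```csharp
using System;
using System.Text;

static class NumericEscaper
{
    // Replaces every character above U+007F with a hexadecimal
    // numeric character reference; ASCII is copied through unchanged.
    public static string EscapeNonAscii(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c <= 127)
            {
                sb.Append(c);
                continue;
            }

            int codePoint = c;
            if (char.IsSurrogatePair(text, i))
            {
                // Characters outside the BMP occupy two UTF-16 code units.
                codePoint = char.ConvertToUtf32(text, i);
                i++;
            }
            // Note: an unpaired surrogate is emitted as-is here; a stricter
            // version might reject or replace it.
            sb.Append("&#x").Append(codePoint.ToString("X")).Append(';');
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(EscapeNonAscii("a\u2014b")); // a&#x2014;b
    }
}
```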
Bobince's answer gives a solution to what seems to be your main concern: replacing your use of HtmlDecode with a more straightforward declaration of the character to replace.
Rewrite
sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")
as
sWebsiteText.Replace("\u2013", "&#8211;")
('\u2014' (dec 8212) is em dash, '\u2013' (dec 8211) is en dash.)
For readability, it may be considered better to use "&#x2013;" rather than "&#8211;", since the .NET declaration of the character ("\u2013") is in hex too. But, as decimal notation seems more common in HTML, I personally would prefer using "&#8211;".
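The hex and decimal forms are interchangeable, which a quick check can confirm. This sketch uses System.Net.WebUtility.HtmlDecode (available from .NET 4.0 on) rather than HttpUtility, purely so that it runs without a reference to System.Web; the class name DashReferences is made up for the example.

```csharp
using System;
using System.Net;

static class DashReferences
{
    static void Main()
    {
        // Both numeric character references decode to the same en dash.
        string fromHex = WebUtility.HtmlDecode("&#x2013;");
        string fromDecimal = WebUtility.HtmlDecode("&#8211;");
        Console.WriteLine(fromHex == "\u2013");      // True
        Console.WriteLine(fromDecimal == "\u2013");  // True

        // The \u escape is hexadecimal; casting to int shows the decimal value.
        Console.WriteLine((int)'\u2013');            // 8211
        Console.WriteLine((int)'\u2014');            // 8212
    }
}
```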
For reuse, you should probably write your own HtmlEncode function, declared in a custom HttpUtility class, so that you can call it from anywhere else in your site without duplicating it.
(Have something like the following; sorry, I wrote it in C#, forgetting your examples were in VB:
using System.Text;
using System.Web;

/// <summary>
/// Supplies some custom processing to some HttpUtility functions.
/// </summary>
public static class CustomHttpUtility
{
    /// <summary>
    /// Html encodes a string.
    /// </summary>
    /// <param name="input">string to be encoded.</param>
    /// <returns>A html encoded string.</returns>
    public static string HtmlEncode(string input)
    {
        if (input == null)
            return null;

        // Let the framework handle the standard markup characters first.
        StringBuilder encodedString = new StringBuilder(
            HttpUtility.HtmlEncode(input));

        // En dash (U+2013), which HttpUtility.HtmlEncode leaves untouched.
        encodedString.Replace("\u2013", "&#8211;");
        // Add other missing replacements here, such as the em dash (U+2014).
        encodedString.Replace("\u2014", "&#8212;");
        //...
        return encodedString.ToString();
    }
}
Then replace
sWebsiteText = _
    "<![CDATA[" & _
    HttpUtility.HtmlEncode(sSomeText) & _
    "]]>"
'This is the bit which seems "hacky"'
sWebsiteText = _
    sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")
With:
sWebsiteText = _
    "<![CDATA[" & _
    CustomHttpUtility.HtmlEncode(sSomeText) & _
    "]]>"
)