I just tried my XElement.Parse solution. I created an extension method on the string class so I can reuse the code easily:
public static bool ContainsXHTML(this string input)
{
try
{
XElement x = XElement.Parse("<wrapper>" + input + "</wrapper>");
return !(x.DescendantNodes().Count() == 1 && x.DescendantNodes().First().NodeType == XmlNodeType.Text);
}
catch (XmlException ex)
{
return true;
}
}
One problem I found was that plain text ampersand and less than characters cause an XmlException and indicate that the field contains HTML (which is wrong). To fix this, the input string passed in first needs to have the ampersands and less than characters converted to their equivalent XHTML entities. I wrote another extension method to do that:
public static string ConvertXHTMLEntities(this string input)
{
// Convert all ampersands to the ampersand entity.
string output = input;
output = output.Replace("&", "amp_token");
output = output.Replace("&", "&");
output = output.Replace("amp_token", "&");
// Convert less than to the less than entity (without messing up tags).
output = output.Replace("< ", "< ");
return output;
}
Now I can take a user submitted string and check that it doesn't contain HTML using the following code:
bool ContainsHTML = UserEnteredString.ConvertXHTMLEntities().ContainsXHTML();
I'm not sure if this is bullet proof, but I think it's good enough for my situation.