Hi buddy, simple as:
public static string StripTags2(string html)
{
    // Replace the angle brackets with their HTML entities so the tags are displayed as text
    return html.Replace("<", "&lt;").Replace(">", "&gt;");
}
This escapes every "<" and ">" in the string. Is this what you want?
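Here is a slightly more defensive sketch of the same idea. It also replaces `&` first, so text that already contains something like `&lt;` is shown literally instead of being turned back into `<` by the browser. The class and method names (`HtmlEscaper`, `EscapeTags`) are made up for illustration.

```csharp
using System;

public static class HtmlEscaper
{
    // Hypothetical helper: escapes '&' first, then the angle brackets,
    // so existing entities in the input are displayed as typed.
    public static string EscapeTags(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        return html.Replace("&", "&amp;")
                   .Replace("<", "&lt;")
                   .Replace(">", "&gt;");
    }
}
```

For example, `HtmlEscaper.EscapeTags("<b>hi</b>")` returns `&lt;b&gt;hi&lt;/b&gt;`.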
If you have data that has HTML tags and you want to display it so that a person can SEE the tags, use HttpServerUtility.HtmlEncode.
If you have data that has HTML tags in it and you want the user to see the tags rendered, then display the text as is. If the text represents an entire web page, use an IFRAME for it.
If you have data that has HTML tags and you want to strip out the tags and just display the unformatted text, use a regular expression.
If you are talking about tag stripping, it is relatively straightforward if you don't have to worry about things like <script> tags. If all you need to do is display the text without the tags, you can accomplish that with a regular expression:
<[^>]*>
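Applied with .NET's `Regex.Replace`, the pattern above might look like the following minimal sketch (`TagStripper` and `StripTags` are hypothetical names). Note that it removes the `<script>` tags themselves but leaves the script body in the output, which is the limitation described next.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Matches '<', then any run of characters other than '>', then '>'.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Removes every tag but keeps the text between tags (script bodies included).
    public static string StripTags(string html) => TagPattern.Replace(html, string.Empty);
}
```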
If you do have to worry about <script> tags and the like, then you'll need something a bit more powerful than regular expressions, because you need to track state: something more like a context-free grammar (CFG). That said, you might be able to get by with 'Left To Right' or non-greedy matching.
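One way to try the non-greedy route, sketched under the assumption that only `<script>` and `<style>` blocks need special handling: delete each whole block with a lazy `.*?` match first, then strip the remaining tags. This is still not a parser; a `</script>` inside a JavaScript string or an HTML comment will fool it. `ScriptAwareStripper` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class ScriptAwareStripper
{
    // Opening <script> or <style> tag, then the shortest run of anything (".*?")
    // up to the matching closing tag (\1 refers back to the captured tag name).
    // Singleline lets '.' match newlines inside multi-line scripts.
    private static readonly Regex BlockPattern = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining tag.
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    public static string StripTags(string html)
    {
        // Remove whole script/style blocks first, so their contents are dropped too.
        string withoutBlocks = BlockPattern.Replace(html, string.Empty);

        // Then remove the ordinary tags, keeping the text between them.
        return TagPattern.Replace(withoutBlocks, string.Empty);
    }
}
```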
If you can use regular expressions, there are many web pages out there with good info:
If you need the more complex behaviour of a CFG, I would suggest using a third-party tool; unfortunately, I don't know of a good one to recommend.
Depends on what you mean by "html." The most complex case would be complete web pages. That's also the easiest to handle, since you can use a text-mode web browser. See the Wikipedia article listing web browsers, including text mode browsers. Lynx is probably the best known, but one of the others may be better for your needs.
HttpUtility.HtmlEncode() is meant to handle encoding HTML tags as strings. It takes care of all the heavy lifting for you. From the MSDN documentation:
If characters such as blanks and punctuation are passed in an HTTP stream, they might be misinterpreted at the receiving end. HTML encoding converts characters that are not allowed in HTML into character-entity equivalents; HTML decoding reverses the encoding. For example, when embedded in a block of text, the characters < and > are encoded as &lt; and &gt; for HTTP transmission.
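To see the encoding the quote describes, here is a small sketch around `HttpUtility.HtmlEncode` and `HttpUtility.HtmlDecode` (namespace `System.Web`). On .NET Framework this needs a reference to System.Web.dll; on .NET Core and .NET 5+ it is, as far as I know, available without extra references. The wrapper class and method names are hypothetical.

```csharp
using System.Web;

public static class HtmlEncodingDemo
{
    // Turns markup into text a browser displays literally, e.g. "<b>" becomes "&lt;b&gt;".
    public static string ShowTagsAsText(string html) => HttpUtility.HtmlEncode(html);

    // Reverses the encoding, turning entities back into characters.
    public static string Restore(string encoded) => HttpUtility.HtmlDecode(encoded);
}
```

Encoding and then decoding should give back the original string, which is the "decoding reverses the encoding" part of the quote.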
Use the HttpUtility.HtmlEncode() method, detailed here:
public static void HtmlEncode(
    string s,
    TextWriter output
)
Usage:
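// Requires "using System.IO;". Server is the HttpServerUtility instance available in an ASP.NET page or handler.
String TestString = "This is a <Test String>.";
StringWriter writer = new StringWriter();
Server.HtmlEncode(TestString, writer);
String EncodedString = writer.ToString();
// EncodedString is now "This is a &lt;Test String&gt;."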
I hope that helps.
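Outside a page or handler there is no `Server` property, but the same `TextWriter` overload is exposed as the static `HttpUtility.HtmlEncode(string, TextWriter)` shown in the signature above. A minimal sketch, with hypothetical class and method names:

```csharp
using System.IO;
using System.Web;

public static class WriterEncodeDemo
{
    // Encodes into a StringWriter via the TextWriter overload and returns the result.
    public static string EncodeToString(string input)
    {
        using (var writer = new StringWriter())
        {
            HttpUtility.HtmlEncode(input, writer);
            return writer.ToString();
        }
    }
}
```

The writer overload is mainly useful when the output already goes to a stream or response writer; for a simple string result, the `HtmlEncode(string)` overload does the same job with less ceremony.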
The free and open source HtmlAgilityPack has a method:
var plainText = ConvertToPlainText(html);
Feed it an HTML string like
<b>hello world!</b><br /><i>it is me!</i>
And you'll get a plain text result like:
hello world!
it is me!
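If adding HtmlAgilityPack is not an option, a rough approximation of the output shown above is sketched below. It swaps in plain regular expressions rather than a real HTML parser, so it inherits the `<script>` caveats discussed earlier. It only turns `<br>` variants into newlines, strips the remaining tags, and decodes entities with `System.Net.WebUtility.HtmlDecode`. `PlainTextSketch` and `ToPlainText` are hypothetical names.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class PlainTextSketch
{
    // <br>, <br/> and <br /> in any letter case.
    private static readonly Regex LineBreak = new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    // Any other tag.
    private static readonly Regex AnyTag = new Regex("<[^>]*>");

    public static string ToPlainText(string html)
    {
        string text = LineBreak.Replace(html, "\n");   // line breaks become newlines
        text = AnyTag.Replace(text, string.Empty);      // drop the remaining tags
        return WebUtility.HtmlDecode(text);             // e.g. "&amp;" becomes "&"
    }
}
```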