You can start by taking a look at PHP's strip_tags function.
Konamiman
2009-11-24 08:23:09
What about the HTML Agility Pack?
A similar thread is available on Stack Overflow.
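Try this function (the name is just a placeholder):
Function StripHtmlTags(strHtmlString As String) As String
    Dim pattern As String = "<(.|\n)*?>"
    Return System.Text.RegularExpressions.Regex.Replace(strHtmlString, pattern, String.Empty).Trim()
End Function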
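For comparison, here is a rough Python sketch of the same regex idea. The function name strip_tags_regex is made up for illustration, and the pattern is copied unchanged from the snippet above. It is probably fine for simple markup, but it treats the first ">" after a "<" as the end of the tag, so a ">" inside an attribute value leaks through.

```python
import re

# Same non-greedy pattern as the VB.NET snippet: "<", anything, first ">".
TAG_PATTERN = re.compile(r"<(.|\n)*?>")


def strip_tags_regex(html: str) -> str:
    """Remove anything that looks like a tag and trim surrounding whitespace.

    Hypothetical helper for illustration only; it does not understand HTML,
    it just deletes "<...>" runs.
    """
    return TAG_PATTERN.sub("", html).strip()


if __name__ == "__main__":
    print(strip_tags_regex("<p>Hello <b>world</b></p>"))
    # A ">" inside a quoted attribute ends the match too early,
    # so part of the attribute is left in the output.
    print(strip_tags_regex('<a title="a>b">x</a>'))
```

If the input is arbitrary pages rather than small fragments, a real parser is likely the safer choice.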
I want to strip all tags and remove the [show]/[hide] elements from Wikipedia pages. Alternatively, is there a website that presents pages in a more readable format?
You should take a look at DBpedia: it is Wikipedia, but just the data.
You could use an HTML parser, for example BeautifulSoup (Python) or Simple HTML DOM (PHP). Or you could try using an XML parser.
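To make the parser suggestion concrete, here is a minimal sketch in Python. It uses the standard library's html.parser instead of BeautifulSoup, purely so that it runs without installing anything. The names TextExtractor and strip_tags_parsed are invented for this example, and skipping script and style content is an assumption about what "readable text" should mean.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}  # assumption: this content is never wanted

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the pieces and collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def strip_tags_parsed(html: str) -> str:
    """Return the visible text of an HTML fragment (illustrative helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    print(strip_tags_parsed('<p>Hello <a title="a>b">world</a> &amp; co.</p>'))
```

Because the parser tracks quoted attribute values and decodes entities such as &amp;, it should cope with inputs that trip up a plain regex. Removing page-specific elements like the [show]/[hide] toggles would still need extra filtering on top of this.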