
You can start by taking a look at the strip_tags function.

Looks cool. Is there something in C#, or some sort of web service, too? I don't want to direct each page request to my web servers.
Priyank Bolia

What about HtmlAgilityPack?


A similar thread is available on Stack Overflow: "Is there a Wikipedia API?"

Try this function.

Dim pattern As String = "<(.|\n)*?>"
Return System.Text.RegularExpressions.Regex.Replace(strHtmlString, pattern, String.Empty).Trim()
Bad choice; regex should not be used for HTML parsing. There are a lot of questions and articles online that explain why, for example http://www.codinghorror.com/blog/archives/001311.html
Priyank Bolia
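
To make the objection above concrete, here is a small Python sketch (Python is used only for illustration, and `strip_with_regex` is a hypothetical name). It applies the same pattern as the VB.NET snippet and shows that it works on simple markup but leaves debris behind when an attribute value contains `>`.

```python
import re

# Same pattern as the VB.NET snippet: lazily match from "<" to the next ">".
TAG_PATTERN = re.compile(r"<(.|\n)*?>")


def strip_with_regex(html: str) -> str:
    """Remove everything the pattern considers a tag, then trim whitespace."""
    return TAG_PATTERN.sub("", html).strip()


# Simple markup is handled as expected.
print(strip_with_regex("<p>Hello <b>world</b></p>"))   # Hello world

# A ">" inside a quoted attribute ends the match too early,
# so part of the tag leaks into the output.
print(strip_with_regex('<img alt="a > b">text'))       # b">text
```

The second case is valid HTML, which is why a real parser is usually the safer choice for pages you do not control.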
That would create a problem of its own: how to build a web page from the XML. I would then have to write even more code to generate the HTML from the parsed XML.
Priyank Bolia

I want to strip all tags and remove the [show]/[hide] links from Wikipedia pages. Or is there some website that serves the pages in a more readable format?

You should take a look at DBpedia: it is Wikipedia, but just the data.


That doesn't look like the right thing. It is more like a semantic web page: it just has the headings, the links, and meta information about the articles. I don't need the meta or semantic information; I need a very simple web page, similar to a text file, without many tags except images, paragraphs, etc.
Priyank Bolia

You could use an HTML parser, for example BeautifulSoup (Python) or Simple HTML DOM. Or you could try using an XML parser.
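
As a rough sketch of the parser approach, the following uses Python's standard-library `html.parser` rather than BeautifulSoup or Simple HTML DOM, purely so that it runs without extra packages. The names `TagStripper`, `simplify_html`, `KEEP` and `SKIP_CONTENT` are hypothetical, and the choice to keep only `<p>` and `<img>` is an assumption based on the request for a page "without many tags except images, paragraphs". It does not handle Wikipedia's [show]/[hide] widgets specifically.

```python
import html
from html.parser import HTMLParser

# Assumed whitelist: tags that survive in the output.
KEEP = {"p", "img"}
# Elements whose text content is discarded entirely.
SKIP_CONTENT = {"script", "style"}


class TagStripper(HTMLParser):
    """Collects text plus a small whitelist of tags; drops everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag == "img":
            # Keep only the src attribute of images.
            src = dict(attrs).get("src") or ""
            self.parts.append('<img src="%s">' % html.escape(src))
        elif tag in KEEP:
            self.parts.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in KEEP and tag != "img":
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so the output is still valid HTML.
            self.parts.append(html.escape(data, quote=False))


def simplify_html(source: str) -> str:
    parser = TagStripper()
    parser.feed(source)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


page = ('<div><p>Hello <b>world</b></p><img src="a.png" class="x">'
        '<script>var x=1;</script></div>')
print(simplify_html(page))  # <p>Hello world</p><img src="a.png">
```

Because the parser tracks quoted attribute values, input such as `<img alt="a > b" src="x.png">` is treated as one tag instead of being cut at the first `>`.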

I think Simple HTML DOM looks the best: easy and extensible.
Priyank Bolia