ansaurus

Question

character count minus HTML characters C#

Answer 1

A:

Check this page out: http://stackoverflow.com/questions/787932/using-c-regular-expressions-to-remove-html-tags

GôTô 2010-10-08 14:50:08

Answer 2

+2 A:

Use the right tool for the problem.

HTML is not a simple format to parse. I would advise using a proven, existing parser rather than rolling your own. If you know that you will only ever parse XHTML, you could use an XML parser instead.

These are the only reliable ways to perform operations on HTML that will preserve the semantic representation.

Don't try to use regular expressions. HTML is not a regular language and you can only cause yourself grief and misery going in that direction.
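When the input is known to be XHTML, the XML-parser route can be sketched with LINQ to XML (`System.Xml.Linq`). This is a minimal sketch, assuming the snippet is well-formed XML. The class and method names (`XhtmlTextCounter`, `CountTextCharacters`) are hypothetical.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: counts text characters in an XHTML snippet by letting
// an XML parser handle tags and entities. Assumes well-formed XHTML input.
public static class XhtmlTextCounter
{
    public static int CountTextCharacters(string xhtmlSnippet)
    {
        // A snippet may have several top-level elements, so wrap it in a
        // synthetic root to make it a single well-formed XML element.
        XElement root = XElement.Parse("<root>" + xhtmlSnippet + "</root>");

        // XElement.Value concatenates the text of all descendant text nodes,
        // with entities such as &amp; already decoded to one character.
        return root.Value.Length;
    }
}
```

For example, `XhtmlTextCounter.CountTextCharacters("<p>Hello <b>world</b></p>")` returns 11. Input that is valid only as HTML, such as an unclosed `<br>` or the `&nbsp;` entity, makes `XElement.Parse` throw an `XmlException`. That is why this route only fits XHTML.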

LBushkin 2010-10-08 14:53:22

Thanks for your advice. I looked at the parser and it doesn't look trivial to use. The only thing I have against the parser is that I don't want to parse a whole HTML document... just a snippet that will be added to the page dynamically.

Hristo 2010-10-08 15:30:27

@Hristo: Take a look at the `DocumentElement.SelectNodes` method. You should be able to select all nodes of all types, and then use the `InnerText` property to count the number of non-HTML characters.
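The `SelectNodes`/`InnerText` idea can be sketched with `System.Xml.XmlDocument`, which has a `DocumentElement` property. The comment does not say which library it means. In the HTML Agility Pack, the corresponding root is `HtmlDocument.DocumentNode`. This sketch assumes well-formed XHTML input, and the class name is hypothetical. It selects only text nodes, because an element's `InnerText` already includes all of its descendants' text.

```csharp
using System.Xml;

// Hypothetical helper illustrating DocumentElement.SelectNodes + InnerText.
// Assumption: the snippet is well-formed XHTML.
public static class SelectNodesCounter
{
    public static int CountTextCharacters(string xhtmlSnippet)
    {
        var doc = new XmlDocument();
        // Wrap the fragment so it has exactly one root element.
        doc.LoadXml("<root>" + xhtmlSnippet + "</root>");

        // Select only text nodes: summing InnerText over every element would
        // count nested text once per enclosing element.
        XmlNodeList textNodes = doc.DocumentElement.SelectNodes("//text()");

        int count = 0;
        foreach (XmlNode node in textNodes)
        {
            count += node.InnerText.Length;
        }
        return count;
    }
}
```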

LBushkin 2010-10-08 15:41:23

@LBushkin... before I can use any of the HTML Agility Pack features such as `DocumentElement.SelectNodes`, I need to get it working with Microsoft Visual Web Developer. Do you have any suggestions on how to get it "installed"?

Hristo 2010-10-11 14:55:36

Answer 3

A:

You can use a regular expression to strip the HTML tags into another string and then count the characters that remain. Check out: http://stackoverflow.com/questions/787932/using-c-regular-expressions-to-remove-html-tags
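A minimal sketch of the strip-then-count approach follows, subject to the caveats about regular expressions raised in Answer 2. The pattern `<[^>]*>` is a common naive choice, not taken from the linked page. The entity decoding with `System.Net.WebUtility.HtmlDecode` goes beyond what this answer describes and is included so that `&amp;` counts as one character. The class name is hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical, naive helper: NOT an HTML parser. A '>' inside an attribute
// value, comments containing tags, and <script>/<style> contents will all
// be miscounted.
public static class RegexTextCounter
{
    // Matches '<', then any run of characters other than '>', then '>'.
    private static readonly Regex TagPattern =
        new Regex("<[^>]*>", RegexOptions.Compiled);

    public static int CountTextCharacters(string html)
    {
        // Remove everything that looks like a tag.
        string withoutTags = TagPattern.Replace(html, string.Empty);

        // Decode entities such as &amp; or &nbsp; so each counts as one character.
        string text = WebUtility.HtmlDecode(withoutTags);
        return text.Length;
    }
}
```

Unlike the XML-based route, this accepts plain HTML. For example, `"<p>a&nbsp;b</p>"` counts as 3 characters. The trade-off is that malformed or unusual markup is silently miscounted instead of being rejected.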

Gmoliv 2010-10-08 14:53:51
