ansaurus

Question

Regexp that matches all the text content of an HTML input

Answer 1

+1 A:

Perhaps a regular expression is not the best choice for this job (I will spare you the obligatory tirade).

I would recommend that you look into an HTML parsing library to help you here, something like Html Agility Pack.
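For illustration, here is a minimal sketch of what that could look like with Html Agility Pack. It assumes the HtmlAgilityPack NuGet package is installed; the class name `HtmlTextExtractor` and the method `ExtractText` are hypothetical names made up for this example, and dropping `script` and `style` elements is an assumption about what "text content" should mean here.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class HtmlTextExtractor
{
    // Hypothetical helper: returns the text content of an HTML fragment.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: script and style bodies are not wanted in the output.
        // SelectNodes returns null when nothing matches, so guard before iterating.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            foreach (var node in hidden)
            {
                node.Remove();
            }
        }

        // InnerText concatenates the remaining text nodes;
        // DeEntitize decodes entities such as &amp;.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

Calling `HtmlTextExtractor.ExtractText(html)` on the page source should then give you the text without the markup, and the parser, not your pattern, deals with malformed tags and odd attribute values.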

Andrew Hare 2009-12-06 15:06:49

Answer 2

+2 A:

I would recommend using an HTML parser rather than relying on a regex. Parsing HTML with regex is generally a no-no: such patterns are nearly impossible to get right for all cases. There are many questions here on SO that arrive at the same conclusion.

EDIT: Looks like a couple of us had the same idea... Also, here is a question that discusses more parsers.

jheddings 2009-12-06 15:08:53

Answer 3

+1 A:

As people said, regex is not the most recommended way, but if you decide that regex is the way to go, this should get you started:

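using System.Text.RegularExpressions;
// Matches any opening or closing tag; str holds the HTML input.
string pattern = @"(<(/?[^>]+)>)";
string strippedString = Regex.Replace(str, pattern, string.Empty);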
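To see where this starting point holds up and where it does not, here is a small sketch that wraps the pattern above in a helper and can be tried on a few inputs. The class name `TagStripDemo` and the method `Strip` are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Same pattern as in the answer above.
    private const string Pattern = @"(<(/?[^>]+)>)";

    // Removes everything the pattern treats as a tag and keeps the rest.
    public static string Strip(string html)
    {
        return Regex.Replace(html, Pattern, string.Empty);
    }
}
```

With simple markup such as `<p>Hello <b>world</b></p>` the result is `Hello world`. The pattern stops at the first `>`, though, so a `>` inside an attribute value (for example `<a title="a > b">link</a>`) leaves part of the tag behind, and the body of a `script` element is kept as if it were ordinary text. Those are the kinds of cases the other answers are warning about.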

Elad 2009-12-06 15:12:17

Answer 4

A:

Hi,

Not sure if this helps, but I have the ability to translate articles on my site into a reader's preferred language. I did this using the Bing translation widget, so I don't do any parsing of HTML; it's all done for me.

Alan 2009-12-06 15:17:22
