ansaurus

Question

regex expression

Answer 1

A:

Obligatory "don't use regex to parse HTML" warning:

Using regex to parse HTML has been covered at length on SO. Please read the following post:

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Would it be possible to convert your HTML to XHTML and parse it using XPath?

A tool like HTML Tidy or an SGML parser can do this conversion. Then you could use XPath to extract the desired data, for example: //entry/link
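As a rough illustration of the XPath step, here is a minimal Python sketch. It assumes the markup has already been converted to well-formed XHTML/XML; the sample document, the `href` attribute, and the `extract_links` helper are made up for the example and are not taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: well-formed markup, as it might look
# after an HTML-to-XHTML conversion step.
SAMPLE = """<feed>
  <entry>
    <title>First</title>
    <link href="http://example.com/1"/>
  </entry>
  <entry>
    <title>Second</title>
    <link href="http://example.com/2"/>
  </entry>
</feed>"""


def extract_links(xml_text):
    """Return the href of every <link> that is a direct child of an <entry>."""
    root = ET.fromstring(xml_text)
    # ElementTree supports only a subset of XPath; ".//entry/link" is
    # its spelling of the //entry/link expression.
    return [link.get("href") for link in root.findall(".//entry/link")]


if __name__ == "__main__":
    for href in extract_links(SAMPLE):
        print(href)
```

If the converted document declares an XML namespace (XHTML output normally does), the element names in the path need to be namespace-qualified as well.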

Abe Miessler 2010-10-19 16:59:22

Answer 2

+4 A:

I would use this software to help with your regexes.

Free RegExBuilder software.

Jimmie Clark 2010-10-19 17:00:31

Answer 3

+1 A:

The best way to do this in .NET is via the HTML Agility Pack. Using regular expressions on HTML is not usually a good idea.

The exceptions are situations where you can make certain assumptions about the structure of the HTML, such as one-off jobs (where you can study the actual input for your program) or when the HTML is generated by a trusted source. For example, can you assume that the HTML is well-formed or that tags will not be nested beyond a certain depth? (Note that neither of those assumptions by itself is good enough to build an expression that won't fall down given some edge case or other.)

If you meet these criteria, we need to know exactly what assumptions you are allowed to make before we can write an accurate expression.
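To make the point about assumptions concrete, here is a small Python sketch of a deliberately narrow expression. The target (the `href` of a `link` tag) and the `find_link_hrefs` helper are hypothetical, since the question does not say what should be matched. The pattern assumes double-quoted attribute values and no `>` inside attribute values, and it knows nothing about comments or scripts.

```python
import re

# Assumptions baked into this pattern (all hypothetical):
#   - href values are always double-quoted
#   - no ">" appears inside any attribute value of the tag
#   - the tag never occurs inside a comment, script, or CDATA section
LINK_HREF = re.compile(r'<link\b[^>]*?\bhref="([^"]*)"', re.IGNORECASE)


def find_link_hrefs(html):
    """Return every href value found on a <link ...> tag."""
    return LINK_HREF.findall(html)


if __name__ == "__main__":
    good = '<entry><link rel="alternate" href="http://example.com/a"/></entry>'
    print(find_link_hrefs(good))

    # Edge case: the tag is commented out, but the regex still reports it.
    commented = '<!-- <link href="http://example.com/old"/> -->'
    print(find_link_hrefs(commented))
```

The second call shows the kind of edge case mentioned above: a commented-out tag is still matched, which a real HTML parser would not do.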

Joel Coehoorn 2010-10-19 17:06:31
