ansaurus

Question

Answer 1

+1 A:

I don't know what system you are using, but it can be done to a certain extent. Look at this online flex-based application. Check out the Published > XML regex examples. You will get an idea.

dirkgently 2009-02-23 08:58:04

Can't find an example that helps me with the problem, but it's a great resource! I'm using ASP.NET regex.

miccet 2009-02-23 09:12:20

Answer 2

+6 A:

There are many similar questions on SO:

etc. The general agreement is that it's best not to parse HTML with regular expressions, and to do it properly instead by applying a DOM parser and traversing the DOM tree.

David Hanak 2009-02-23 09:08:31
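For illustration, here is a minimal sketch of the parser-based approach in Python. It uses the standard library's `html.parser`, which is an event-driven parser rather than a full DOM, so it only approximates the "traverse the DOM tree" advice above. The names `TagNameCollector`, `tag_names` and `sample` are made up for this example.

```python
from html.parser import HTMLParser


class TagNameCollector(HTMLParser):
    """Collects the name of every opening tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        # Self-closing tags such as <br/> also end up here by default.
        self.names.append(tag)


def tag_names(html):
    """Return the opening tag names found in the given HTML string."""
    collector = TagNameCollector()
    collector.feed(html)
    collector.close()
    return collector.names


# Hypothetical sample input, not taken from the question.
sample = '<div class="a"><p>Hello <b>world</b></p><br/></div>'
print(tag_names(sample))  # ['div', 'p', 'b', 'br']
```

The parser, not the pattern, decides where a tag starts and ends, which is the main reason it tends to cope better with attributes, comments and sloppy markup.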

You might want to change that link text from the URL to the question text so it's more readable.

cletus 2009-02-23 09:14:28

Yea, I have seen them. I'm not really worried about best practice here though since it's not gonna end up in an application anyway. The biggest problem I see with what I want is to match the first char "<" but not include it in the match, if that makes sense.

miccet 2009-02-23 09:16:55

@miccet: use parentheses to group the stuff you are interested in.

dirkgently 2009-02-23 09:23:54
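The capture-group suggestion above could look roughly like this. It is sketched in Python rather than ASP.NET, and the pattern `<(\w+)` is an assumption about what was meant: the `<` is part of the overall match, but only the parenthesised tag name is captured.

```python
import re

# Assumed pattern: "<" is matched but sits outside the group,
# so the group holds the tag name only.
CAPTURED_NAME = re.compile(r"<(\w+)")

# Hypothetical sample input.
sample = "<div><p>Hello</p><br/></div>"

first = CAPTURED_NAME.search(sample)
print(first.group(0))  # '<div'  (the whole match, including "<")
print(first.group(1))  # 'div'   (the captured group only)

# With exactly one group, findall returns the group contents.
names = CAPTURED_NAME.findall(sample)
print(names)  # ['div', 'p', 'br']
```

Closing tags are skipped here because the character after `<` is `/`, which `\w` does not match.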

@cletus: I might, but I'm a lazy bastard. Besides, it's not really the title that matters, given that they are all related to the same problem.

David Hanak 2009-02-23 09:49:02

Answer 3

+3 A:

It's virtually impossible to regex HTML once you start considering all the special cases and malformed HTML that browsers sometimes happily parse anyway. That said, I thought it might be fun to get the names without using capture groups, and thus I present to you the following solution:

(?<=<)\w+(?=[^<]*?>)

For the record I hold little faith in it being at all useful in any but the most trivial of cases.

Kit Sunde 2009-02-23 09:16:38
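As a quick way to try the lookaround pattern above, here is a small Python sketch. Python's `re` module accepts the same lookbehind and lookahead syntax for this pattern, though .NET behaviour is not verified here. The names `LOOKAROUND_NAME`, `sample` and `commented` are made up for the example.

```python
import re

# The pattern from the answer: (?<=<) requires a preceding "<" without
# consuming it, and (?=[^<]*?>) requires a closing ">" before any new "<".
LOOKAROUND_NAME = re.compile(r"(?<=<)\w+(?=[^<]*?>)")

# Hypothetical sample input.
sample = '<div class="a"><p>Hello</p><br/></div>'
matches = LOOKAROUND_NAME.findall(sample)
print(matches)  # ['div', 'p', 'br']

# One of the non-trivial cases: a tag inside an HTML comment is still reported.
commented = "<!-- <b>old</b> -->"
false_positive = LOOKAROUND_NAME.findall(commented)
print(false_positive)  # ['b']
```

Because neither lookaround consumes characters, each match is the bare tag name and no group extraction is needed.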

It's just made for an example anyway, and doesn't need to be bulletproof. This worked perfectly, and I see how the exclude function works. Thanks a bunch.

miccet 2009-02-23 09:19:19

-1 Wrong on so many levels.

cletus 2009-02-23 09:32:17

@cletus: On what level is this wrong that he did not already cover?

Ant P. 2009-02-23 09:36:20

Regex to read out HTML tags
