I'm trying to write a regex that finds all HTML tags, where each opening tag and its closing tag must have the same name. Here's what I mean (yes, I only want tag names of at most 3 letters):

preg_match_all("/\<[a-z]{1,3}\>(.*?)\<\/[a-z]{1,3}\>/", $string, $matches);

I want the two `[a-z]{1,3}` parts to match the same text, so that it doesn't pair `<b>` with `</i>`, etc. Thanks. Let me know if you need further explanation.

+1  A: 

Don't parse HTML with regex. Use PHP Tidy instead.

Vivin Paliath
I'm not really parsing HTML; it's just the closest example and the easiest way to show what I'm trying to do.
David
So you're parsing XML? :P Sorry, whenever I see `regex` and HTML I laugh.
Nick T
It doesn't matter if you're parsing HTML/XML or if you're checking for specific closing-tags. HTML and Regex go together like gasoline and milk. i.e., not recommended. :)
Vivin Paliath
@David: If it's so much *like* HTML, could you just use an *ML parser anyway?
Nick T
@David what Nick said. Tidy works for XML/HTML/*ML.
Vivin Paliath
A: 

In addition to what Vivin Paliath said, you can try PHP 5's DOMDocument with XPath:

http://php.net/manual/en/class.domdocument.php
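
A minimal sketch of the DOMDocument plus XPath approach, assuming a made-up HTML fragment (the contents of `$html` and the choice of `<b>` and `<i>` are illustrative, not from the question). The parser pairs opening and closing tags itself, so no pattern matching on tag names is needed.

```php
<?php
// Hypothetical sample markup; not from the original question.
$html = '<p><b>bold</b> and <i>italic</i> and <b>more</b></p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// loadHTML() accepts fragments and wraps them in <html><body> itself.
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$found = [];
// Select every <b> and <i> element, in document order.
foreach ($xpath->query('//b | //i') as $node) {
    $found[] = [$node->nodeName, $node->textContent];
}
print_r($found);
```

Each entry in `$found` holds the element name and its text content, so nested or repeated elements are reported as separate nodes rather than as one overlapping match.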

jakenoble
+1  A: 

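You really shouldn't parse *ML with regex, because nested elements cause problems, but if it helps, a backreference to the captured tag name does what you ask. The pattern is single-quoted because in a double-quoted PHP string `\1` is read as an octal escape before PCRE ever sees it:

preg_match_all('/<([a-z]{1,3})>(.*?)<\/\1>/', $string, $matches);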
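
A short sketch of how a backreference pattern of this kind behaves, assuming a made-up input (the contents of `$string` are hypothetical). The pattern is written in single quotes so that `\1` reaches PCRE unchanged, and `PREG_SET_ORDER` is used only to keep each match together with its captures.

```php
<?php
// Hypothetical sample input; not from the original question.
$string = '<b>bold</b> <i>italic</i> <b>mismatched</i>';

// Group 1 captures the tag name; \1 requires the closing tag to repeat it.
preg_match_all('/<([a-z]{1,3})>(.*?)<\/\1>/', $string, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // $m[1] is the tag name, $m[2] is the text between the tags.
    $pairs[] = [$m[1], $m[2]];
}
print_r($pairs);
```

Here the `<b>bold</b>` and `<i>italic</i>` pairs are found, while `<b>mismatched</i>` is skipped because no `</b>` follows it.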
stillstanding
Be aware that this won't handle tags that are enclosed in the same kind of tag. For example, given `<foo><foo></foo></foo>`, it will match `<foo><foo></foo>`.
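
The nesting problem can be reproduced with a small sketch like the following, which assumes the single-quoted backreference pattern and uses the nested input from this comment:

```php
<?php
// The nested input described in the comment.
$string = '<foo><foo></foo></foo>';

// The lazy (.*?) stops at the first closing tag whose name equals group 1.
preg_match('/<([a-z]{1,3})>(.*?)<\/\1>/', $string, $m);

echo $m[0], "\n"; // the overall match
echo $m[2], "\n"; // what (.*?) captured
```

The overall match is `<foo><foo></foo>` and the captured content is the inner `<foo>`, so the outer element's real closing tag is left unmatched.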
Alan Moore