ansaurus

Question

Answer 1

A:

Forget regexes; use this instead:

http://simplehtmldom.sourceforge.net/

fredley 2010-08-25 22:13:39

Suggested third party alternatives to [SimpleHtmlDom](http://simplehtmldom.sourceforge.net/) that actually use [DOM](http://php.net/manual/en/book.dom.php) instead of String Parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org).

Gordon 2010-08-25 23:00:32

Answer 2

+1 A:

You don't use regex. You use a real parser, because this stuff cannot be parsed with regular expressions. You'll never know if you've got all the corner cases quite right and then your regex has turned into a giant bloated monster and you'll wish you'd just taken fredley's advice and used a real parser.
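As a rough illustration of the "real parser" route, here is a minimal sketch that renames elements with PHP's bundled DOM extension. The function name `renameTags` and the sample markup are hypothetical and only there for illustration. It assumes ASCII input and that libxml's HTML parser tolerates the custom tag names, which it reports as errors but still parses.

```php
<?php
// Hypothetical helper: rename every <$from> element to <$to> via DOMDocument.
function renameTags(string $html, string $from, string $to): string
{
    // Unknown tags such as <my_tag> make libxml emit warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Assumption: ASCII input. Non-ASCII text would need an explicit charset hint.
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so snapshot it before editing.
    $targets = iterator_to_array($doc->getElementsByTagName($from));
    foreach ($targets as $old) {
        $new = $doc->createElement($to);
        // Copy attributes over unchanged.
        foreach ($old->attributes as $attr) {
            $new->setAttribute($attr->nodeName, $attr->nodeValue);
        }
        // Move (not copy) the children, so nested targets stay valid nodes.
        while ($old->firstChild !== null) {
            $new->appendChild($old->firstChild);
        }
        $old->parentNode->replaceChild($new, $old);
    }

    // Serialize only what was inside <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$sample = '<my_tag class="a"><my_tag>tra-la-la</my_tag></my_tag>'
        . '<my_tag_xyz>ooga-booga</my_tag_xyz>';
$out = renameTags($sample, 'my_tag', 'my_new_tag');
echo $out, "\n";
```

Because the parser builds a tree first, nested elements and attribute values containing `>` need no special handling here, which is the main argument for this approach over a pattern.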

For a humorous take, see this famous post.

Jack Kelly 2010-08-25 22:18:57

Answer 3

+1 A:

$result = preg_replace('#<my_tag\b([^>]*)>(.*?)</my_tag>#',
   '<my_new_tag$1>$2</my_new_tag>', $source);

The ([^>]*) captures anything after the tag name and before the closing >. Of course, > is legal inside HTML attribute values, so watch out for that (but I've never seen it in the wild). The \b prevents matches of tag names that happen to start with my_tag, preventing bogus matches like this:

<my_tag_xyz>ooga-booga</my_tag_xyz><my_tag>tra-la-la</my_tag>
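To see what the `\b` buys you, the sketch below runs the sample line above through the pattern twice, once with the word boundary and once without. The variable names are made up for the demonstration; the patterns and the replacement string are the ones from this answer.

```php
<?php
$source = '<my_tag_xyz>ooga-booga</my_tag_xyz><my_tag>tra-la-la</my_tag>';
$replacement = '<my_new_tag$1>$2</my_new_tag>';

// With \b: "my_tag" followed by "_xyz" has no word boundary, so it is skipped.
$withBoundary = preg_replace('#<my_tag\b([^>]*)>(.*?)</my_tag>#', $replacement, $source);

// Without \b: "_xyz" is swallowed by ([^>]*) and the match runs on to the
// first literal "</my_tag>", which belongs to a different element.
$withoutBoundary = preg_replace('#<my_tag([^>]*)>(.*?)</my_tag>#', $replacement, $source);

echo $withBoundary, "\n";
// <my_tag_xyz>ooga-booga</my_tag_xyz><my_new_tag>tra-la-la</my_new_tag>
echo $withoutBoundary, "\n";
// <my_new_tag_xyz>ooga-booga</my_tag_xyz><my_tag>tra-la-la</my_new_tag>
```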

But that will still break on <my_tag> elements wrapped in other <my_tag> elements, yielding results like this:

<my_tag><my_tag>tra-la-la</my_tag>

If you know you'll never need to match tags with other tags inside them, you can replace the (.*?) with ([^<>]++).
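A small sketch comparing the two capture groups on nested input may make the trade-off clearer. The sample string and variable names are hypothetical. Note that the stricter pattern only ever matches elements whose content is non-empty plain text.

```php
<?php
$nested = '<my_tag><my_tag>tra-la-la</my_tag></my_tag>';
$replacement = '<my_new_tag$1>$2</my_new_tag>';

// Lazy (.*?) pairs the OUTER opening tag with the INNER closing tag.
$lazy = preg_replace('#<my_tag\b([^>]*)>(.*?)</my_tag>#', $replacement, $nested);

// ([^<>]++) refuses any content containing a tag, so only the innermost
// element matches; the possessive ++ also rules out pointless backtracking.
$strict = preg_replace('#<my_tag\b([^>]*)>([^<>]++)</my_tag>#', $replacement, $nested);

echo $lazy, "\n";
// <my_new_tag><my_tag>tra-la-la</my_new_tag></my_tag>   (mismatched tags)
echo $strict, "\n";
// <my_tag><my_new_tag>tra-la-la</my_new_tag></my_tag>   (well-formed, outer untouched)
```

Neither result renames both levels in one pass, which is the kind of corner case that pushes people toward a parser.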

I get pretty tired of the glib "don't do that" answers too, but as you can see, there are good reasons behind them--I could come up with this many more without having to consult any references. When you ask "How do I do this?" with no background or qualification, we have no idea how much of this you already know.

Alan Moore 2010-08-26 04:06:16

Good answer: practical, shows why the standard response is "don't do that", points out the pitfalls of the proposed solution.

Jack Kelly 2010-08-30 03:27:47

Help with Regex in PHP