Hi,

I'm working on a regular expression to extract the tag name and attributes from an HTML element, but I'm having trouble matching the attributes: only the last attribute is stored in the matches array.

Here is the code:

<?php
    $subject = '<font face="arial" size="1" color="red">hello world!</font>';
    $find= '/<(?P<tag>\w+)\s+((?P<attr>\w+)=(?P<value>[^\s""\'>]+|"[^"]*"|\'[^\']*\')\s*)*\/?>/si';

    preg_match_all( $find, $subject, $matches );
?>

Can someone help me out?

Many thanks

A:

Some important points:

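  • You shouldn't use regex to parse HTML. PHP has dedicated HTML parsers, including the bundled DOM extension (DOMDocument).
  • When a capturing group is repeated within a single match, it keeps only the capture from its last iteration. That is why only the last attribute shows up in your matches.
    • One notable exception is .NET regex, which keeps every capture of a repeated group.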
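
Both points can be sketched with the markup from the question. This is a minimal illustration, not a definitive implementation: the helper name `element_attributes` is made up for this example, the duplicated `"` in the question's character class is dropped (it has no effect on matching), and the DOM extension is assumed to be enabled, as it is in most PHP builds.

```php
<?php
$subject = '<font face="arial" size="1" color="red">hello world!</font>';

// The pattern from the question: the named groups sit inside a repeated group.
$find = '/<(?P<tag>\w+)\s+((?P<attr>\w+)=(?P<value>[^\s"\'>]+|"[^"]*"|\'[^\']*\')\s*)*\/?>/si';
preg_match($find, $subject, $m);

// Every iteration of (...)* overwrites the previous capture, so only the
// last attribute survives in the named groups.
$lastAttr  = $m['attr'];   // 'color'
$lastValue = $m['value'];  // '"red"' (quotes included)

// Parser-based alternative (hypothetical helper): return the attributes of
// the first element with the given tag name as a name => value array.
function element_attributes(string $html, string $tagName): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $doc->getElementsByTagName($tagName)->item(0);
    if ($element === null) {
        return [];
    }

    $attributes = [];
    foreach ($element->attributes as $attribute) {
        $attributes[$attribute->nodeName] = $attribute->nodeValue;
    }
    return $attributes;
}

$attributes = element_attributes($subject, 'font');
// ['face' => 'arial', 'size' => '1', 'color' => 'red']
```

The parser also copes with cases a regex handles poorly, such as attributes without values or `>` characters inside quoted values.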

polygenelubricants
A better read on this is http://www.regular-expressions.info/captureall.html, which explains the difference between capturing a repeated group and repeating a capturing group.
polygenelubricants
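
If a regex has to be used anyway, the distinction from the linked article suggests a two-step workaround, sketched here under the assumption that the input is a single well-formed tag like the one in the question. The variable names and both patterns are illustrative and not taken from the original post. Step 1 captures the repeated group as a whole by wrapping the repetition in one capturing group. Step 2 runs a second pattern over that string to split it into name/value pairs.

```php
<?php
$subject = '<font face="arial" size="1" color="red">hello world!</font>';

// Step 1: capture the repeated group as a whole. The inner group is
// non-capturing; the named group "attrs" wraps the entire repetition.
$tagPattern = '/<(?P<tag>\w+)(?P<attrs>(?:\s+\w+=(?:"[^"]*"|\'[^\']*\'|[^\s"\'>]+))*)\s*\/?>/i';

// Step 2: match one name=value pair at a time. The branch-reset group (?|...)
// makes all three alternatives store the unquoted value in group 2.
$attrPattern = '/(\w+)=(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/';

$tag = null;
$attributes = [];

if (preg_match($tagPattern, $subject, $tagMatch)) {
    $tag = $tagMatch['tag'];

    // preg_match_all restarts the pattern for every pair, so nothing is overwritten.
    preg_match_all($attrPattern, $tagMatch['attrs'], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }
}

// $tag is 'font'
// $attributes is ['face' => 'arial', 'size' => '1', 'color' => 'red']
```

This keeps every attribute because each pair is a separate match of the second pattern instead of another iteration of one repeated group.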