Hi,

I have the following Regular Expression from this post (Regular expression for extracting tag attributes).

(\S+)=["\']?((?:.(?!["\']?\s+(?:\S+)=|[>"\']))+.)["\']?

I've written the following PHP code and it works nicely. For each attribute, preg_match_all() gives me the full match (id='gridview1') plus the two captured groups (id and gridview1).

$regexp = '/(\S+)=["\']?((?:.(?!["\']?\s+(?:\S+)=|[>"\']))+.)["\']?/';
$text = '<asp:gridview id=\'gridview1\' />';

$matches = null;
preg_match_all($regexp, $text, $matches);

print_r($matches);

How should the regular expression be changed to also return 'asp' and 'gridview'? Or 'Foo' and 'bAR' when I use:

<Foo:bAR />

+1  A: 

([a-zA-Z]+)\:([a-zA-Z]+) would work for something like Foo:bar

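<.*?([a-zA-Z]+).*?\:.*?([a-zA-Z]+).*?\/> would work for < Foo : BArrr /> (the + has to sit inside the parentheses; with ([a-zA-Z])+ each group only keeps the last letter it matched)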

Things can be optimized depending on your requirements and whether you know that a certain type of formatting is enforced.
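To make this concrete, here is a minimal PHP sketch of the tag-name part, kept separate from the attribute regex in the question. The function name extractTagName and the returned keys 'prefix' and 'name' are made up for illustration, and it assumes the prefix and the tag name contain only ASCII letters, as in the examples above.

```php
<?php
// Hypothetical helper (not from the original posts): pulls the prefix and
// the tag name out of the first "<prefix:name" found in $text.
// Assumption: both parts consist of ASCII letters only.
function extractTagName(string $text): ?array
{
    // '<', optional whitespace, prefix, ':', tag name;
    // whitespace is tolerated around the colon, as in "< Foo : bAR />".
    $pattern = '/<\s*([a-zA-Z]+)\s*:\s*([a-zA-Z]+)/';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null; // no prefixed tag found
    }

    return ['prefix' => $m[1], 'name' => $m[2]];
}

print_r(extractTagName('<asp:gridview id=\'gridview1\' />'));
print_r(extractTagName('<Foo:bAR />'));
```

Running the attribute regex from the question as a second pass over the same string keeps both patterns short, which is probably easier to maintain than one combined expression.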

Patrick Gryciuk
I'm gonna try tomorrow to see if you're right... After that I'll be working with my XML parser ;)
Ropstah