Using JavaScript RegEx.

How can I match a <p> element (including its attributes), but not <param> or other HTML elements whose names start with "P"?

+4  A: 

Try:

/(<p(?:\s+[^>]*)?>)/i

/
 (        #start capture group
  <p       #match '<p'
  (?:      #start non-capture group
    \s+     #match one or more white space characters
    [^>]*   #match zero or more characters that aren't >
  )?       #end non-capture group - make it optional
  >        #match '>'
 )        #end capture group
/i        #end regexp - make case insensitive
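
A quick way to check the pattern above is to run it against a few tags. This is only a sketch: the sample strings are made up for illustration and are not from the question.

```javascript
// The pattern from the answer above, unchanged.
const pTag = /(<p(?:\s+[^>]*)?>)/i;

// Hypothetical sample inputs, chosen only for illustration.
const samples = [
  '<p>',               // plain tag, no attributes
  '<P class="intro">', // upper case, with an attribute
  '<param name="x">',  // must NOT match
  '<pre>',             // must NOT match
  '<p\n id="a">'       // newline counts as white space for \s
];

// No "g" flag, so test() keeps no state between calls.
const results = samples.map((s) => pTag.test(s));
console.log(results); // [ true, true, false, false, true ]
```

The optional non-capture group is what lets a bare `<p>` through, while `<param>` and `<pre>` fail because the character after `p` is neither white space nor `>`.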
gnarf
The problem is that this doesn't match a simple `<p>`. -1
Boldewyn
I noticed that too - and fixed it ;)
gnarf
Yup, reverted the -1. +1 for comments.
Boldewyn
Same as Boldewyn, reverted +1 =)
Clement Herreman
A: 
/<(?:p|P)\s+/.exec(s);

It doesn't match the entire tag, though; doing that is quite complicated, considering that the tag-closing symbol > is allowed inside an attribute value.

erikkallen
Right, to avoid that, replace "<" by "&lt;", etc.
Clement Herreman
Same as gnarf's answer. Doesn't match a plain `<p>`.
Boldewyn
@Clement: How would you do this? Consider <p title="Greater than (>)">. How would you match the entire tag with a regular expression? (I realize it's possible; that's why I said "quite complicated", not "impossible".)
erikkallen
A: 

<(p|P)([\s].*)?>

seems to work well =). But you shouldn't use RegEx when you can use the DOM, or even XML/XPath/whatever.

Clement Herreman
The `.*` in your regexp is greedy and will continue matching beyond the end of the tag, up to the last available `>`. `[^>]*` or even `.*?` would be better.
gnarf
You're right, that is better indeed =)
Clement Herreman
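
The greedy-match problem raised in this thread can be seen in a short sketch. The input string is a made-up example, and the `bounded` and `lazy` patterns are just the two alternatives suggested in the comment, applied to the answer's regex.

```javascript
// Hypothetical input with more than one tag on the same line.
const html = '<p class="a">one</p><b>two</b>';

// Pattern from the answer: ".*" is greedy.
const greedy = /<(p|P)([\s].*)?>/;
// Suggested alternatives: a negated class, or a lazy quantifier.
const bounded = /<(p|P)(\s[^>]*)?>/;
const lazy = /<(p|P)([\s].*?)?>/;

const greedyMatch = html.match(greedy)[0];
const boundedMatch = html.match(bounded)[0];
const lazyMatch = html.match(lazy)[0];

console.log(greedyMatch);  // <p class="a">one</p><b>two</b>
console.log(boundedMatch); // <p class="a">
console.log(lazyMatch);    // <p class="a">
```

The greedy version runs to the last `>` on the line, while both alternatives stop at the first `>` after the tag name.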
+2  A: 
/<p\b[^>]*>/i

\b matches a word boundary; coming after the 'p' it means the next character (if there is a next character) is not a letter, digit or underscore.

Disclosure: [^>]* isn't really the correct way to match the rest of the tag, since attribute values can legally contain angle brackets. But it's probably good enough, and that's not what the question is about anyway.
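
As a rough illustration of both points, the sketch below runs the pattern on a few made-up tags, including the attribute-with-`>` case mentioned elsewhere in this thread. The `quoteAware` pattern is not from any answer here; it is one possible variant, shown only as an assumption of how quoted attribute values could be skipped.

```javascript
// The pattern from this answer, unchanged.
const pTag = /<p\b[^>]*>/i;

// \b after "p" rejects names that merely start with "p".
const plain = pTag.test('<p>');        // true
const attrs = pTag.test('<P id="x">'); // true
const param = pTag.test('<param>');    // false
const pre = pTag.test('<pre>');        // false

// The caveat: [^>]* stops at a ">" inside an attribute value.
const tricky = '<p title="Greater than (>)">';
const cutShort = tricky.match(pTag)[0];
console.log(cutShort); // <p title="Greater than (>

// Assumed variant (not from the thread): consume quoted values whole,
// so a ">" inside quotes does not end the match.
const quoteAware = /<p\b(?:"[^"]*"|'[^']*'|[^'">])*>/i;
const whole = tricky.match(quoteAware)[0];
console.log(whole); // <p title="Greater than (>)">
```

Even the quote-aware variant is still only an approximation of how HTML is actually parsed, so a DOM-based approach remains the safer choice when one is available.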

Alan Moore
A: 

Here is my try:

/\<P(\s+\w+=\"?[^\"\s\>]*\"?)*\>/gi
Mic