In the streams I am parsing, I need to extract a value from markup that follows this pattern:

<b>PaintTitle</b></td><td class=detail valign="top" align=left><div align=left><font size=small><b>The new great album by Pet Shop Boys</b>

How would I get the string "The new great album by Pet Shop Boys", given that <b>PaintTitle</b> is guaranteed to appear exactly once per album?

+1  A: 
(?:<b>PaintTitle<\/b>).*<b>(.*)<\/b>

Match group 1 is "The new great album by Pet Shop Boys" with that expression.
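
As a rough illustration of what "match group 1" means here, the following Python sketch runs the expression against the fragment from the question. Python is used only as a stand-in for whichever engine you have; the variable names are made up for the example, and the input is assumed to be a single line, as in the sample.

```python
import re

# Sample fragment copied from the question (assumed to be on one line).
sample = (
    '<b>PaintTitle</b></td><td class=detail valign="top" align=left>'
    '<div align=left><font size=small>'
    '<b>The new great album by Pet Shop Boys</b>'
)

# The expression from this answer, unchanged.
pattern = re.compile(r'(?:<b>PaintTitle<\/b>).*<b>(.*)<\/b>')
m = pattern.search(sample)

whole_match = m.group(0)  # group 0: everything the pattern matched
title = m.group(1)        # group 1: only the text inside the capturing parentheses

print(title)
```

Group 0 is always the entire match, so the title on its own has to be read from group 1.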

Ian C.
Thanks Ian, I will give it a try now.
Joan Venge
m.Groups[0].Value returns the whole matched string, not just "The new great album by Pet Shop Boys". Do you know why?
Joan Venge
m.Groups[1] also returns the wrong result. It returns text from much further ahead in the stream. Can we make it match the first <b> after the album title marker?
Joan Venge
@Joan Venge: I tested it in Perl against your string and it was okay for me. I'd need to see more of the stream to give you a better expression. I'll also echo what the comments to your question say: why do this with a regular expression? It's making things harder than they should be.
Ian C.
Thanks. The reason I use regex is that I just need a dumb string parser that gives me a list of values from a stream. Do you mean there are better ways to do this? It's nothing like a compiler, though.
Joan Venge
Replace each .* with the non-greedy version, .*?, and it might work better for you.
Ian C.
Thanks Ian. Now it works.
Joan Venge
+1  A: 

If you insist on using regex, you can try this instead:

(?:<b>PaintTitle<\/b>).*?<b>(.*?)<\/b>
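
To show why the non-greedy quantifiers matter, here is a small Python sketch comparing the two expressions on a stream that holds more than one record. The stream is hypothetical: the second record and its title "Another album" are invented for illustration, and the markup is shortened from the question's sample.

```python
import re

# Hypothetical stream with two records; the second title is made up.
stream = (
    '<b>PaintTitle</b></td><td class=detail>'
    '<b>The new great album by Pet Shop Boys</b>'
    '</td></tr><tr><td>'
    '<b>PaintTitle</b></td><td class=detail>'
    '<b>Another album</b>'
)

greedy = re.compile(r'(?:<b>PaintTitle<\/b>).*<b>(.*)<\/b>')
lazy = re.compile(r'(?:<b>PaintTitle<\/b>).*?<b>(.*?)<\/b>')

# Greedy: .* runs to the end of the input and backtracks to the LAST <b>,
# so the capture comes from far ahead in the stream.
greedy_title = greedy.search(stream).group(1)

# Non-greedy: .*? stops at the FIRST <b> after each marker,
# so every record yields its own title.
lazy_titles = lazy.findall(stream)

print(greedy_title)
print(lazy_titles)
```

Note that `.` does not match a newline by default in most engines, so if the stream spans several lines you would probably need the engine's "dot matches newline" option (`re.DOTALL` in Python, `RegexOptions.Singleline` in .NET).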
polygenelubricants
Thanks, this worked great.
Joan Venge