Can anybody tell me how to identify the middle part interestedInThis and backreference the prefix: fontsize=12 and postfix: fontstyle=bold as ${1} and ${2}?

I'm dealing with this string:

<fontsize=12 interestedInThis fontstyle=bold>

Addendum: Sorry, I was not precise enough. Here are the specifics:

  • prefix and postfix could be absent
  • prefix and postfix can be any string, not necessarily fontsize and fontstyle respectively
  • I know exactly what I am looking for, namely interestedInThis, and it will be separated from any prefix and postfix by whitespace.
+3  A: 
<([^>]*)interestedInThis([^>]*)>
chaos
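
A minimal Perl sketch of how the pattern above could be applied, assuming the tag is held in a scalar and that the separating whitespace should be trimmed from the captures afterwards. The helper name split_tag is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (prefix, postfix) for a tag containing
# interestedInThis, or an empty list if the tag does not match.
sub split_tag {
    my ($tag) = @_;
    my ($prefix, $postfix) = $tag =~ /<([^>]*)interestedInThis([^>]*)>/
        or return;
    # The captures still carry the separating whitespace; trim it.
    s/^\s+|\s+$//g for $prefix, $postfix;
    return ($prefix, $postfix);
}

my ($pre, $post) = split_tag('<fontsize=12 interestedInThis fontstyle=bold>');
print "prefix='$pre' postfix='$post'\n";
```

An absent prefix or postfix comes back as an empty string here. Note that the pattern itself does not insist on whitespace around interestedInThis, so add \s+ if that matters.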
Add a ? to the .* or else it will be greedy and match all the rest of the string: <(fontsize=\w+)\s+(.*?)\s+(fontstyle=\w+)>
Rob K
Or, better, change the .* to \S* (or \S+ since I presume it shouldn't match zero characters). While non-greedy * is useful, it's always better to specify what you actually want, and what you want here is non-whitespace characters (\S), not anything-but-newline characters (.).
Dave Sherohman
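
To illustrate the greediness point from these comments, here is a small sketch. The input line with two tags is made up for illustration and is not from the question.

```perl
use strict;
use warnings;

# Made-up input with two tags on one line, to expose greedy matching.
my $line = '<fontsize=12 first fontstyle=bold> <fontsize=14 second fontstyle=italic>';

# Greedy .* runs on to the last place where the rest can still match.
my ($greedy) = $line =~ /<fontsize=\w+\s+(.*)\s+fontstyle=\w+>/;

# Non-greedy .*? stops at the first place where the rest can match.
my ($lazy) = $line =~ /<fontsize=\w+\s+(.*?)\s+fontstyle=\w+>/;

# \S+ says what is actually wanted: a run of non-whitespace characters.
my ($strict) = $line =~ /<fontsize=\w+\s+(\S+)\s+fontstyle=\w+>/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "strict: $strict\n";
```

The greedy capture swallows everything up to the second tag's fontstyle, while the other two stop at "first".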
A: 

For your example, this could work

(<fontsize=\d+) (\w+) (fontstyle=bold>)

Unfortunately, Perl doesn't seem to support named backreferences, so I think you are stuck with <fontsize=12 in $1, interestedInThis in $2, and fontstyle=bold> in $3.

regards, Lieven

Lieven
Perl 5.10 has named backreferences. http://perldoc.perl.org/perlreref.html
Dave Sherohman
Dave, time to update my RegexBuddy then. Thank you for letting me know.
Lieven
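
For reference, a short sketch of the named captures mentioned in the comment above, assuming Perl 5.10 or later. The group names prefix, middle and postfix are my own choice, not anything from the question.

```perl
use 5.010;   # named captures need Perl 5.10 or later
use strict;
use warnings;

my $tag = '<fontsize=12 interestedInThis fontstyle=bold>';

my %parts;
# (?<name>...) defines a named group; after a match the values are in %+.
if ($tag =~ /<(?<prefix>fontsize=\d+)\s+(?<middle>\w+)\s+(?<postfix>fontstyle=\w+)>/) {
    %parts = %+;   # copy, because %+ is reset by the next successful match
}

print "$_: $parts{$_}\n" for sort keys %parts;
```

The numbered variables $1, $2 and $3 are still set as well, so the names are purely a readability aid.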
A: 

Basically

(<fontsize=12) (\S*) (fontstyle=bold>)

But, will the attribute values change? And, do you have to account for variable whitespace? If so, the above mutates into:

(<fontsize=\d+)\s+(\S*)\s+(fontstyle=.*>)

Also, in the above, by using \S, interestedInThis can contain anything that is not whitespace. If there is whitespace there too, for example interestedInThis is actually something like class="x" id="y", then maybe:

(<fontsize=\d+)(.*)(fontstyle=.*>)

Note that $2 is interestedInThis, and $1 and $3 are your end pieces.

alphadogg
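
As a sketch of how the end pieces might then be used in a substitution, assuming the goal is to keep them and swap out the middle (the word "replacement" is just a made-up value):

```perl
use strict;
use warnings;

# Extra spaces in the input show that \s+ copes with variable whitespace.
my $tag = '<fontsize=12   interestedInThis fontstyle=bold>';

# Copy $tag, then edit the copy: keep ${1} and ${3}, replace the middle.
(my $changed = $tag) =~
    s/(<fontsize=\d+)\s+(\S*)\s+(fontstyle=.*>)/${1} replacement ${3}/;

print "$changed\n";
```

The braces in ${1} are optional here, but they help when a capture is followed directly by a digit or a word character.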
A: 

Try this:

my $result = m/(.*)(interestedInThis)(.*)/;

Now:

  • $result is true if the match against $_ succeeded.
  • interestedInThis is in $2, though you already know what it is.
  • prefix (EVERYTHING before "interestedInThis", including any "<" and whitespace) is in $1.
  • postfix (EVERYTHING after "interestedInThis", including any whitespace and ">") is in $3.
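
A small sketch of that approach in a loop, using the question's spelling interestedInThis. The second input string is made up to show a non-match.

```perl
use strict;
use warnings;

my @found;
for ('<fontsize=12 interestedInThis fontstyle=bold>', '<nothing here>') {
    # m// without =~ matches against $_, which is the loop variable here.
    if (m/(.*)(interestedInThis)(.*)/) {
        push @found, [$1, $3];
        print "prefix='$1' postfix='$3'\n";
    }
}
```

Because the pattern has no angle brackets or whitespace handling, the captures include the "<", the ">" and the separating spaces.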
A: 

I think this is what you want:

<(.* )?interestedInThis( .*)?>

It will return the prefix and postfix if they're there, but will still match if only one or neither is present.

It does have a minor problem in that the spaces will be included in the captured groups, but they should be easy to remove after the match.

Alternatively, you could use lookahead / lookbehind to try to filter the spaces out as part of the match:

<(.*(?= ))? ?interestedInThis ?((?<= ).*)?>
Whatsit
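
A rough sketch that runs the lookaround variant over the four cases from the question's addendum (both parts, prefix only, postfix only, neither). It uses the question's spelling interestedInThis, and the '(absent)' marker is just an illustrative choice.

```perl
use strict;
use warnings;

my $re = qr/<(.*(?= ))? ?interestedInThis ?((?<= ).*)?>/;

my %seen;
for my $tag ('<fontsize=12 interestedInThis fontstyle=bold>',
             '<fontsize=12 interestedInThis>',
             '<interestedInThis fontstyle=bold>',
             '<interestedInThis>') {
    next unless $tag =~ $re;
    # An optional group that did not take part in the match is undef.
    my ($pre, $post) = map { defined $_ ? $_ : '(absent)' } ($1, $2);
    $seen{$tag} = [$pre, $post];
    print "$tag => prefix=$pre, postfix=$post\n";
}
```

Checking defined() on the captures is what distinguishes a missing prefix or postfix from an empty one.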