Here's an interesting problem. Is it possible to split a string on the last match of a regex only?

Consider the following column headings from my data file (in the file they all sit on one line, separated by tabs):

Frequency Min
Frequency Avg
Frequency Max
Voltage L1 Min
Voltage L1 Avg
Voltage L1 Max
Active Power L1 Min
Active Power L1 Avg
Active Power L1 Max

At present, the data for each column is pushed onto an array keyed by the full heading (e.g. @{ $data{'Frequency Min'} }, @{ $data{'Active Power L1 Avg'} }). It would be nice to create sub-hashes keyed on the Min, Max and Avg keywords instead (e.g. @{ $data{Frequency}{Min} }, @{ $data{'Active Power L1'}{Avg} }), which is why I want to split each heading on its last whitespace.

What makes this harder is that the label part can itself contain any number of spaces before the final one, which is the only one I want to split on.

I've thought of reversing the string, splitting it once and then reversing both pieces back, but that's too messy for my liking. Is there a neater way to do this?
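For comparison, here is a minimal sketch of the reverse-and-split idea, assuming the headings have no leading or trailing whitespace. The helper name split_last_ws_reversed is made up for this example. It works, but the double reversal (and the need for scalar reverse) shows why it feels clumsy.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the heading, split once on the FIRST run of
# whitespace (which was the LAST run in the original), then reverse each piece back.
sub split_last_ws_reversed {
    my ($heading) = @_;
    my ($rtype, $rlabel) = split /\s+/, scalar reverse($heading), 2;
    # scalar is needed: in list context reverse() would reorder a list, not a string
    return (scalar reverse($rlabel), scalar reverse($rtype));
}

my ($label, $type) = split_last_ws_reversed('Active Power L1 Avg');
print "$label | $type\n";    # Active Power L1 | Avg
```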

+7  A: 

You can use a pure regex match rather than using split:

my ($label, $type) = /(.*)\s+(\S+)/;
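Applied to the headings from the question, the match can feed the nested hash directly. A minimal sketch; the single tab-separated data row is invented for illustration, and a real script would repeat the push for every row of the file:

```perl
use strict;
use warnings;

# Headings from the question, tab-separated on one line as in the data file.
my $header_line = join "\t",
    'Frequency Min',       'Frequency Avg',       'Frequency Max',
    'Voltage L1 Min',      'Voltage L1 Avg',      'Voltage L1 Max',
    'Active Power L1 Min', 'Active Power L1 Avg', 'Active Power L1 Max';

# Hypothetical data row, one value per column, for illustration only.
my $data_line = join "\t", 49, 50, 51, 229, 230, 231, 10, 15, 20;

my @headings = split /\t/, $header_line;
my @values   = split /\t/, $data_line;
my %data;

for my $i (0 .. $#headings) {
    # Greedy (.*) grabs as much as possible, so (\S+) is left with the last word.
    my ($label, $type) = $headings[$i] =~ /(.*)\s+(\S+)/;
    push @{ $data{$label}{$type} }, $values[$i];
}

print "$_: ", join(', ', sort keys %{ $data{$_} }), "\n" for sort keys %data;
```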
Chris Jester-Young
Could you explain what the (.*) does within the regex?
Zaid
The .* matches everything ("any number of any character"). It's greedy, and will match as much as possible. The \S+ matches all the non-space characters ("at least one non-space character"); also being greedy, this means that in practice, the \S+ will match the last token only, because the .* will have taken all the preceding tokens.
Chris Jester-Young
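A small check of the greedy behaviour described in the comment, contrasting it with a non-greedy (.*?), which would split on the first space instead:

```perl
use strict;
use warnings;

my $heading = 'Active Power L1 Avg';

# Greedy: (.*) takes as much as it can, leaving only the last word for (\S+).
my ($g_label, $g_type) = $heading =~ /(.*)\s+(\S+)/;

# Non-greedy: (.*?) takes as little as it can, so the match stops at the FIRST space.
my ($n_label, $n_type) = $heading =~ /(.*?)\s+(\S+)/;

print "greedy:     '$g_label' / '$g_type'\n";   # 'Active Power L1' / 'Avg'
print "non-greedy: '$n_label' / '$n_type'\n";   # 'Active' / 'Power'
```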
.* matches any string of characters whatsoever, and the parens around it make that part of the expression into the first member of the list that is returned by the match expression.
Matt Kane
(Remember that \S doesn't match spaces, and so the \S+ can't span more than one token.)
Chris Jester-Young
`.` matches any character (except newline), `*` is a quantifier that says to match the preceding atomic unit zero or more times. The preceding atomic unit in this case is `.`, so `.*` means match anything zero or more times (without making the match fail).
Chas. Owens
Thanks for the explanations. Sometimes the ActivePerl User Guide just isn't clear enough...
Zaid
+1  A: 

Split only on spaces that have no other space after them?

($subname, $subtype) = split / (?!.*? )/, $heading, 2;
ysth
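Run over a few of the headings, the lookahead split gives label/type pairs. This sketch assumes the words are separated by single literal spaces, since that is what the pattern matches:

```perl
use strict;
use warnings;

# Split on a space only if no other space follows it anywhere later in the
# string, i.e. on the last space. The limit of 2 caps the result at two fields.
my %parts;
for my $heading ('Frequency Min', 'Voltage L1 Avg', 'Active Power L1 Max') {
    my ($subname, $subtype) = split / (?!.*? )/, $heading, 2;
    $parts{$heading} = [$subname, $subtype];
    print "$subname | $subtype\n";
}
```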
What does that second '?' achieve exactly?
Zaid
It has no effect on the outcome; it just lets the regex engine know that it can stop looking at the first space it finds. Without it, the .* matches as much as possible, which would be up until the last space character.
ysth
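A quick sketch confirming that the lazy and greedy forms of the lookahead produce the same split on the question's headings. The lookahead only asks whether another space exists later, so either form gives the same yes/no answer:

```perl
use strict;
use warnings;

my @headings = ('Frequency Min', 'Voltage L1 Avg', 'Active Power L1 Max');

my $same = 1;
for my $h (@headings) {
    my @lazy   = split / (?!.*? )/, $h, 2;   # can stop at the next space it finds
    my @greedy = split / (?!.* )/,  $h, 2;   # runs to the end, then backtracks
    $same = 0 unless join("\0", @lazy) eq join("\0", @greedy);
}
print $same ? "identical\n" : "different\n";
```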
A: 

Assuming you know all the types:

my ($label, $type) = /(.*?)\s*(min|avg|max)$/i;
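A sketch of the known-types match on a few headings. The label group is written lazily here, (.*?), because with a greedy (.*) and an optional \s* the space before the keyword stays on the label, as the last lines show:

```perl
use strict;
use warnings;

my %split;
for my $heading ('Frequency Min', 'Voltage L1 Avg', 'Active Power L1 Max') {
    # Lazy (.*?) lets the optional \s* absorb the separating space.
    my ($label, $type) = $heading =~ /(.*?)\s*(min|avg|max)$/i;
    $split{$heading} = [$label, $type];
    print "'$label' / '$type'\n";
}

# For comparison: a greedy (.*) keeps the trailing space on the label.
my ($greedy_label) = 'Frequency Min' =~ /(.*)\s*(min|avg|max)$/i;
print "greedy label: '$greedy_label'\n";   # 'Frequency '
```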