
The project I recently joined splits logic between code and database elements. Business logic such as XPaths, regular expressions and function names is stored in the database, while general code (reading files, creating XML from XPaths, etc.) lives in the code base.

Most (if not all) of the methods that use regular expressions are structured thus:

if ( $entry =~ /$regex/ ) { $req_value = $1; }

This means that only $1 is available, so every regex has to be written so that the desired result ends up in $1.
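The sketch below makes the constraint concrete. It wraps the one-liner above in a sub called extract_value. That sub is hypothetical and not part of the project's code. The sample line is an abbreviated version of the first entry string below. Any capture after $1 is simply discarded.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the existing one-liner: the database
# supplies $regex, and only $1 ever reaches the caller.
sub extract_value {
    my ($entry, $regex) = @_;
    my $req_value;
    if ( $entry =~ /$regex/ ) { $req_value = $1; }
    return $req_value;
}

# Abbreviated sample line (email and gcc parts dropped).
my $line = 'Linux version 2.6.9-78.1.6.ELsmp #1 SMP Wed Sep 24 05:41:12 EDT 2008';

# One capture: works as intended.
my $version = extract_value($line, qr/version (\S+)/);

# Two captures: "SMP" lands in $2 and is lost; only the version comes back.
my $both = extract_value($line, qr/version (\S+).*(SMP)/);
```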

The issue:

The result for the following strings should be either

'2.6.9-78.1.6.ELsmp (SMP)' or '2.6.9-78.1.6.ELsmp'

depending on the existence of SMP. $1 does not suffice for $entry[0].

$entry[0] = qq|Linux version 2.6.9-78.1.6.ELsmp ([email protected]) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Sep 24 05:41:12 EDT 2008|;
$entry[1] = qq|Linux version 2.6.9-78.0.5.ELsmp ([email protected]) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 Wed Sep 24 05:41:12 EDT 2008|;

Hence my solution:

my $mutable = '';
my $regex = qr/((\d.*?)\s+(?:.*)?(SMP)((?{$mutable="$2 ($3)"}))|(\d.*?))\s+/;
if ($entry[$i] =~ /$regex/) {
    $req_value = $1; 
    $req_value = $mutable if ($mutable ne '');
    $mutable = '';
}

Unfortunately, the solution depends on a variable ($mutable) that is declared in the code but referenced from a regex stored in the database, and that makes it unacceptable.

My questions are:

  1. How can I clean up the above solution to make it acceptable with the structure available?

    or

  2. How can I use a substitution regex with the structure 'if ($entry =~ /$regex/)'?

Thanks.

A: 

I don't fully understand your constraints. Are you limited to supplying a single regex that will always be processed using the code in your first excerpt? If so, you cannot do what you are trying to do. You are trying to extract two separate parts of the entry string, and you simply can't return two values in a single scalar unless you can add code to concatenate them.

Can you add Perl code at all? For example, can you define the logic to be:

if ( $entry =~ /$regex/ ) { $req_value = "$1 $2"; }

where $regex = qr/(\d.*?)\s+(?:.*)?(SMP)/ ?

Barring the ability to define some new Perl code, you can't accomplish this.
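If new Perl code is allowed, the two-capture idea could also produce the parenthesized form. The sketch below does that under a few assumptions. It uses abbreviated versions of the entry strings above, with the email and gcc parts dropped. It also swaps in a regex whose SMP group is optional, so one piece of code handles both cases.

```perl
use strict;
use warnings;

my @entry = (
    'Linux version 2.6.9-78.1.6.ELsmp #1 SMP Wed Sep 24 05:41:12 EDT 2008',
    'Linux version 2.6.9-78.0.5.ELsmp #1 Wed Sep 24 05:41:12 EDT 2008',
);

# Version in $1; the SMP group is optional, so $2 is undef when absent.
my $regex = qr/(\d\S*)(?:.*\b(SMP)\b)?/;

my @req_values;
for my $entry (@entry) {
    my $req_value;
    # Double quotes are required: '$1 ($2)' would be taken literally.
    if ( $entry =~ /$regex/ ) {
        $req_value = defined $2 ? "$1 ($2)" : $1;
    }
    push @req_values, $req_value;
}
```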

Regarding part two, substitutions: I interpret your question as asking whether you can compile both the PATTERN and REPLACEMENT parts of s/PATTERN/REPLACEMENT/ into a single qr//. If so, you cannot. qr// compiles only a matching pattern, and a qr// variable can be used only in the PATTERN portion of a substitution. In other words, to use s/// you need to write Perl code that runs s///. I'm guessing that if you could write new Perl code, you'd use the above solution.
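Here is a sketch of what "Perl code that runs s///" could look like. The pattern comes from a qr// object. The replacement, including the conditional "(SMP)" suffix, is written in the program and evaluated with the /e modifier. The regex and the abbreviated entry strings are assumptions made for illustration.

```perl
use strict;
use warnings;

my @entry = (
    'Linux version 2.6.9-78.1.6.ELsmp #1 SMP Wed Sep 24 05:41:12 EDT 2008',
    'Linux version 2.6.9-78.0.5.ELsmp #1 Wed Sep 24 05:41:12 EDT 2008',
);

# Only the PATTERN half of s/// can come from a qr// object.
# It matches the whole line so the substitution replaces all of it.
my $pattern = qr/^.*?(\d\S*)(?:.*\b(SMP)\b)?.*$/;

my @req_values;
for my $entry (@entry) {
    # The REPLACEMENT half is Perl code in the program; /e evaluates it as
    # an expression, so " (SMP)" is appended only when $2 is defined.
    (my $req_value = $entry) =~ s/$pattern/defined $2 ? "$1 ($2)" : $1/e;
    push @req_values, $req_value;
}
```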

One more thought: in your current architecture, can you define fields in terms of other fields? In other words, could you extract the version string with one regex, the SMP string with another regex, and define a third field that combines the two?
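The fields-from-fields idea could be sketched as follows. Everything named here is hypothetical: the field names (kernel_version, kernel_smp), the %field_regex hash, and the kernel_field combiner. Each field regex still only has to deliver $1, so the existing one-liner is reused unchanged.

```perl
use strict;
use warnings;

# Hypothetical field definitions; each is a $1-only regex, as the
# existing framework requires.
my %field_regex = (
    kernel_version => qr/version (\S+)/,
    kernel_smp     => qr/\b(SMP)\b/,
);

# The existing one-liner, wrapped in a sub for reuse.
sub extract_value {
    my ($entry, $regex) = @_;
    my $req_value;
    if ( $entry =~ /$regex/ ) { $req_value = $1; }
    return $req_value;
}

# Hypothetical derived field built from the two extracted fields.
sub kernel_field {
    my ($entry) = @_;
    my $version = extract_value($entry, $field_regex{kernel_version});
    my $smp     = extract_value($entry, $field_regex{kernel_smp});
    return defined $smp ? "$version ($smp)" : $version;
}

my @entry = (
    'Linux version 2.6.9-78.1.6.ELsmp #1 SMP Wed Sep 24 05:41:12 EDT 2008',
    'Linux version 2.6.9-78.0.5.ELsmp #1 Wed Sep 24 05:41:12 EDT 2008',
);
my @req_values = map { kernel_field($_) } @entry;
```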

@Jason Clark: I can add code, but that means not using this method for this particular element. { $req_value = "$1 $2"; } won't work because other regexes expect only $1. The second part of my regex takes care of the case where SMP does not exist. The idea of defining fields in terms of other fields could work, but it is unwieldy and could also confuse other developers after me, don't you think? You're right about s///. I tried splitting a (regex) string into two parts, but Perl barfed, because qr// must hold a valid regex, which the second part cannot be, not to mention the switch and modifiers. Ugh!
Dee
@ all readers: If something cannot be done, then it simply cannot be done. I just thought there may be a solution, but it may be there's none, given the architecture?
Dee
A: 

You're stuck unless you can talk the folks who control the code you're using into generalizing it somehow. The good news is that you need only a bit more, perhaps:

if (my @fields = $entry =~ /$regex/) {
  $req_value = join " " => grep defined($_), @fields;
}

This works because a successful regular-expression match in list context returns all captured substrings, i.e., $1, $2, $3, and so on as appropriate (or (1) if the pattern has no capturing groups), while a failed match returns an empty list.
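Here is a small sketch of those list-context semantics, using an abbreviated non-SMP entry. It also shows why `if (my @fields = ...)` is a safe test. A list assignment in scalar context returns the number of elements on the right-hand side, and that number is 0 only when the match fails.

```perl
use strict;
use warnings;

my $s = 'Linux version 2.6.9-78.0.5.ELsmp #1 Wed';

# Two groups; the optional SMP group did not take part, so it yields undef.
my @with_groups = $s =~ /(\d\S*)(?:.*(SMP))?/;

# No capturing groups: a successful match returns (1).
my @no_groups = $s =~ /version/;

# Failed match ("ELsmp" is lowercase): an empty list.
my @no_match = $s =~ /SMP/;
```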

With a single pattern,

qr/(\d+(?:[-.]\w+)*)(?:.*(SMP))?/

the code above yields 2.6.9-78.1.6.ELsmp SMP and 2.6.9-78.0.5.ELsmp in $req_value. The grep defined($_) filters out captures for subpatterns not taken. Without it, the non-SMP case triggers "Use of uninitialized value" warnings (under use warnings) and leaves a trailing space in $req_value.
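The generalization and the pattern above can be checked together with a minimal sketch like this one. It runs over abbreviated versions of the two entry strings from the question, with the email and gcc parts dropped.

```perl
use strict;
use warnings;

my @entry = (
    'Linux version 2.6.9-78.1.6.ELsmp #1 SMP Wed Sep 24 05:41:12 EDT 2008',
    'Linux version 2.6.9-78.0.5.ELsmp #1 Wed Sep 24 05:41:12 EDT 2008',
);

my $regex = qr/(\d+(?:[-.]\w+)*)(?:.*(SMP))?/;

my @req_values;
for my $entry (@entry) {
    my $req_value;
    # All captures come back in list context; undef ones are dropped.
    if (my @fields = $entry =~ /$regex/) {
        $req_value = join " " => grep defined($_), @fields;
    }
    push @req_values, $req_value;
}
```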

The downside is that every regular expression would need to be reviewed to be sure that all of its capturing groups really ought to go into $req_value. For example, say someone is using the pattern

qr/(XYZ) OS (version \d+|v-\d+)/

As it is now, only XYZ would go into $req_value, but using the above generalization would also include the version number. If that's undesired, the regular expression should be

qr/(XYZ) OS (?:version \d+|v-\d+)/

because (?:...) does not capture (that is, it does not produce a $2 for the pattern above): it's for grouping only.
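Here is a quick way to see the difference. It uses a made-up sample string for the hypothetical XYZ OS and runs it through the generalized join.

```perl
use strict;
use warnings;

my $s = 'XYZ OS version 5';    # made-up sample string

my @capturing     = $s =~ /(XYZ) OS (version \d+|v-\d+)/;
my @non_capturing = $s =~ /(XYZ) OS (?:version \d+|v-\d+)/;

# What the generalized code would store in $req_value for each pattern.
my $joined_cap = join " " => grep defined($_), @capturing;
my $joined_non = join " " => grep defined($_), @non_capturing;
```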

Greg Bacon
You and Jason Clark are right. I think the best solution is to write a new method; the answer, it seems, cannot be found in this one. But in your answer above, how would you deliver the parenthesized SMP, '(SMP)'? You still need some further processing after obtaining the matches.
Dee
The generalization would handle it. I added clarification in the updated answer.
Greg Bacon
This is the sort of thing I might solve with beer and pizza for the right people. Seriously.
brian d foy
@ gbacon: I like it, TMTOWTDI.
Dee
@ brian d foy: May I declare myself one of the right people? I AM one of the right people! I have pizza and beer. How can I get it to you? Are you currently in Europe perchance? :o)
Dee
Not for brian: for the people in charge of the inflexible code you'd like changed!
Greg Bacon
Ah gbacon! With this comment, I can't refer them to this discussion for their benefit.
Dee
@ gbacon: You still left out the parentheses around SMP. That has to require processing outside the regular expression, hasn't it? In any case, I found a number of regexes in the database that would need changing for this solution, so I don't think they'll go for it. I'm also exploring brian's suggestions of moving the (?{...}) processing and checking Regexp::Grammars. I will report back when I have a solution. Bless...
Dee
A: 

As of 5.10.0, (?|pattern) is available to let alternatives share the same capture numbering. As you pointed out, you're still using 5.8, so this may not be directly useful, but perhaps it is further incentive for your project to start moving to a modern Perl.
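For illustration only, here is a sketch of the question's alternation rewritten with a branch-reset group. It needs Perl 5.10 or later. With the branch reset, the version lands in $1 whichever branch matches. "SMP" still ends up in $2, though, so on its own this does not put "(SMP)" into $1. The entry strings are abbreviated.

```perl
use strict;
use warnings;
use 5.010;    # (?|...) requires Perl 5.10 or later

my @entry = (
    'Linux version 2.6.9-78.1.6.ELsmp #1 SMP Wed Sep 24 05:41:12 EDT 2008',
    'Linux version 2.6.9-78.0.5.ELsmp #1 Wed Sep 24 05:41:12 EDT 2008',
);

# Both alternatives number their groups from 1, so the version is
# always $1; SMP (when present) is $2.
my $regex = qr/(?|(\d\S*).*\b(SMP)\b|(\d\S*))/;

my @versions;
for my $entry (@entry) {
    push @versions, $1 if $entry =~ $regex;
}
```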

masto
Agreed, but the client is a storage company that delivers hardware with certain configurations, so I can't really affect this. Named backreferences in 5.10 would have solved this problem instantly. :-)
Dee