I have a string:

123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123

Now I need to check that the FOO1 keyword also appears after the e_ prefix inside its brackets. That is, a situation like this must not occur:

123 + FOO1[ccc + e_FOK1 ...]

My question is simple: how can I tell Perl to capture a word such as FOO1?

I thought of searching between two characters, " " and "[",

and then checking that the same word is spelled correctly after " e_" inside the "[..]".

How can I do this recursively?

+2  A: 

You need to write a parser for your mini-language: See Parse::RecDescent. The calculator demo would be a good starting place.

#!/usr/bin/perl

use strict;
use warnings;

my ($expr) = @ARGV;

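# Split into single characters; the trailing space makes sure a word that
# ends the string is still checked.
my @tokens = ( split( //, $expr ), q{ } );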

my ($word, $inside) = (q{}, 0);

for my $token (@tokens) {
    $token =~ /\A\w\z/ and do { $word .= $token; next };

    if ( $inside ) {
        if ( $word =~ /FOO1/ ) {
            $word eq 'e_FOO1'
                or die "No FOO1 w/o e_ prefix allowed!\n"
        }
    }
    else {
        $word !~ /FOO1/
            or die "No FOO1 allowed!\n";
    }

    $token eq '[' and ++$inside;
    $token eq ']' and --$inside;
    $word = '';
}

C:\Temp> t.pl "123 + MOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123"
C:\Temp> t.pl "123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123"
No FOO1 allowed!
C:\Temp> t.pl "123 + MOO1[ccc + FOO1 + ddd + FOO2[b_FOO2]] = 123"
No FOO1 w/o e_ prefix allowed!

See also the FAQ "Can I use Perl regular expressions to match balanced text?"
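For nested brackets, one possible approach is to track the enclosing keywords on a stack instead of using a single regex. The sketch below is only an illustration: check_nested() is a made-up helper name, and it assumes that every e_ name must match the keyword of the innermost enclosing NAME[...] group, which may not be exactly the rule you need.

```perl
use strict;
use warnings;

# check_nested() is a hypothetical helper, not part of any module.
# Assumption: every e_ name must match the keyword of the innermost
# enclosing NAME[...] group.
sub check_nested {
    my ($text) = @_;
    my ( @stack, @errors );
    my $last_word = q{};

    # Walk the string one word or one bracket at a time.
    while ( $text =~ /(\w+)|([\[\]])/g ) {
        if ( defined $1 ) {
            my $word = $1;
            if ( @stack && $word =~ /\Ae_/ && $word ne "e_$stack[-1]" ) {
                push @errors,
                    sprintf '%s found inside %s[...]', $word, $stack[-1];
            }
            $last_word = $word;
        }
        elsif ( $2 eq '[' ) {
            push @stack, $last_word;    # entering NAME[ ... ]
        }
        else {
            pop @stack;                 # leaving the innermost group
        }
    }
    return @errors;
}

print "$_\n"
    for check_nested('123 + FOO1[ccc + e_FOK1 + ddd + FOO2[e_FOO2]] = 123');
# prints: e_FOK1 found inside FOO1[...]
```

Because the stack grows and shrinks with the brackets, the same check applies at any nesting depth. Unbalanced brackets are not reported here.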

Sinan Ünür
A parser could be one answer. But isn't there a simple way in Perl to capture a string between two characters?
YoDar
YoDar: according to your description, what you want is not just "between 2 characters" but "inside brackets, with further nesting". Writing a small parser is exactly what is needed.
Svante
A: 

If your situation is more complex than you've described, this code won't work (for example, it does nothing to ensure that your left and right brackets balance each other). However, the code does illustrate how to use back-references (see \1 below), which might get you on the right track.

use strict;
use warnings;

while (<DATA>){
    warn "Bad line: $_" unless / (\w+) \[ .* e_\1 .* \] /x;
}

__DATA__
123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123
123 + FOO1[ccc + e_FOOx + ddd + FOO2[b_FOO2]] = 123
FM
I added a condition, and it almost fits my case! Thanks, FM, for your simple question. if ($_ =~ m/e_/) { warn "Bad line (e_): $_" unless / (\w+) \[ .* (e_\1) .* \] /x; }
YoDar
Hey FM, do you know how to do that recursively?
YoDar
@YoDar I'm not certain I understand your requirements well enough to offer a useful answer. Perhaps you should post another question on StackOverflow.
FM
Thanks anyway :)
YoDar
+1  A: 

Based on some of your comments, I'm going to assume that your question is: "between the '[' and ']' brackets, ensure that any 'e_' symbol is 'e_FOO' and not something else".

(Edit: okay, it appears that you need the "FOO" keyword to also appear just before the square brackets. I'll revise the regex accordingly.)

use feature 'say';      # say() needs Perl 5.10 or later

if ($line =~ /
              ([A-Z]+[0-9]*)  # match a keyword in caps with optional digits
                              # (FOO1), saved so \1 can refer back to it
              \[        # match the first [
                [^\]]*  # some number of any character that isn't ]
                e_      # a ha, here's our e_
                \1      # and here's our keyword that we matched earlier
                [^\]]*  # some more of any character that isn't ]
              \]        # here's our closing ]
             /x)
{
     say "Good data";
}
else
{
     say "Bad data";
}
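To try the idea on a few inputs, the pattern can be precompiled with qr// and wrapped in a small sub. This is only a sketch: is_good() and $group_re are names made up here, and the keyword shape (capitals followed by optional digits, as in FOO1) is an assumption based on the sample string. It only confirms that at least one NAME[... e_NAME ...] group exists on the line.

```perl
use strict;
use warnings;

# Assumption: a keyword is capitals optionally followed by digits (FOO1, FOO2).
my $group_re = qr/
    \b ([A-Z]+[0-9]*)   # whole keyword before the bracket, captured as group 1
    \[                  # opening bracket
    [^\]]*              # anything except a closing bracket
    e_ \1               # e_ followed by the same keyword
    \b                  # keyword ends here, so e_FOO10 does not pass for FOO1
/x;

# is_good() is a hypothetical helper name.
sub is_good {
    my ($line) = @_;
    return $line =~ $group_re ? 1 : 0;
}

for my $line (
    '123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123',
    '123 + FOO1[ccc + e_FOK1 + ddd + FOO2[b_FOO2]] = 123',
) {
    print is_good($line) ? "Good data\n" : "Bad data\n";
}
# prints "Good data", then "Bad data"
```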

But please, start reading some of the tutorials in perldoc perlre.

Ether
Thanks, I'll read perldoc perlre
YoDar
A: 

Since you said "I need to confirm that FOO1 is followed the "e_" string that inside its brackets", you just need to check for e_FOO1, right? There is no need for an overly complicated regex.

my $str="123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123";
my $s = index($str,"[");
my $e = index($str,"]");
my $f = index($str,"e_FOO1");
if ( $f >=$s and $f <= $e ){
    print "found \n";
}
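If the keyword is not known in advance, the same index() idea can be stretched a little by reading the keyword from the text right before the first "[". This is a rough sketch under stated assumptions: first_group_ok() is a made-up name, only the first e_ after the first bracket is checked, and plain substring matching means an unrelated word containing "e_" (such as name_x) would be picked up too.

```perl
use strict;
use warnings;

# first_group_ok() is a hypothetical helper name.
# Assumption: only the outermost NAME[ ... ] span on the line is inspected.
sub first_group_ok {
    my ($str) = @_;
    my $open  = index $str, '[';
    my $close = rindex $str, ']';
    return 0 if $open < 0 || $close < $open;

    # The keyword is the run of word characters right before the bracket.
    my ($name) = substr( $str, 0, $open ) =~ /(\w+)\z/;
    return 0 unless defined $name;

    # Look for the first e_ after the opening bracket.
    my $pos = index $str, 'e_', $open;
    return 0 if $pos < 0 || $pos > $close;

    # Compare the text at that position with e_ plus the keyword.
    return substr( $str, $pos, 2 + length $name ) eq "e_$name" ? 1 : 0;
}

my $str = "123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123";
print "found \n" if first_group_ok($str);
```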
ghostdog74
FOO1 could be anything :)
YoDar
Well, if it's "anything", then since "e_" is one of the search criteria, just use "e_" in index() instead of "e_FOO1".
ghostdog74