Hi,
I need to find the complement of this:

$_ = 'aaaaabaaabaaabacaaaa';

while( /([a][a][a][a])/gc){
    next if pos()%4 != 0;
    my $b_pos = (pos()/4)-1;
    print " aaaa at :$b_pos\n";
}

That is, a sequence of 4 characters that is not 'aaaa'.
The following doesn't work:

$_ = 'aaaaabaaabaaabacaaaa';

while( /([^a][^a][^a][^a])/gc){
    my $b_pos = (pos()/4)-1;
    print "not aaaa at :$b_pos\n";
}

Of course, I can do this:

$_ = 'aaaaabaaabaaabacaaaa';

while( /(....)/gc){
    next if $1 eq 'aaaa';
    my $b_pos = (pos()/4)-1;
    print "$1 at :$b_pos\n";
}

Isn't there a more direct way?

To clarify the expected result: I need to find all 4-letter groups that are not 'aaaa', as well as their positions.
The 1st code outputs:

 aaaa at :0
 aaaa at :4

The 2nd code should output:

not aaaa at :1
not aaaa at :2
not aaaa at :3

The 3rd code's output is what I'm looking for:

abaa at :1
abaa at :2
abac at :3

I understand I haven't been clear enough; please accept my apologies.
What I'm trying to achieve is to divide a string into groups of 4 letters and get the value and position of the groups that don't match the pattern.

My third code gives me the expected result: it reads the string 4 letters at a time and processes those that aren't 'aaaa'.
I also found out, thanks to all of your suggestions, that my first code didn't work as expected: it should skip a match if pos()%4 != 0, because that means the pattern spans two groups of 4. I corrected the code.

Against all expectations, mine and others', the following doesn't output anything at all:

/[^a]{4}/

I should probably stick with my 3rd code.

A:
/(?!aaaa)/

This is a negative lookahead, which matches (with zero width) at the first position where the pattern aaaa does not match.
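To see what the bare lookahead actually matches, here is a small sketch; the sub name first_non_aaaa_offset is made up for illustration. Because (?!aaaa) is zero-width, a successful match consumes nothing, and @- (here $-[0]) reports the offset where it matched. On the sample string that offset is 2, which is not on a 4-character block boundary.

```perl
use strict;
use warnings;

# Hypothetical helper: offset of the first position where 'aaaa' does not start.
sub first_non_aaaa_offset {
    my ($str) = @_;
    # A negative lookahead always succeeds somewhere (at the latest at the
    # end of the string), and it is zero-width, so the match is empty.
    $str =~ /(?!aaaa)/;
    return $-[0];
}

my $offset = first_non_aaaa_offset('aaaaabaaabaaabacaaaa');
print "empty match at offset $offset\n";    # prints "empty match at offset 2"
```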

Alternatively,

/[^a]{4}/

will match 4 consecutive characters, none of which is an a.
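A quick sketch of why /[^a]{4}/ prints nothing for the sample input: it needs four consecutive characters that are all not a, and the sample never has more than one non-a character in a row. The second test string 'aaaaxyzwaaaa' and the sub name non_a_runs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: every run of 4 consecutive non-'a' characters.
sub non_a_runs {
    my ($str) = @_;
    return $str =~ /([^a]{4})/g;    # list context: all captured runs
}

for my $s ('aaaaabaaabaaabacaaaa', 'aaaaxyzwaaaa') {
    my @runs = non_a_runs($s);
    print "$s: ", (@runs ? "@runs" : 'no match'), "\n";
}
```

The first string reports "no match" and the second reports xyzw, so the pattern does what it says; it just isn't the complement of /aaaa/ the question is after.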

Amber
That's what I thought at first, but neither of these gives me the expected result.
kaklon
Perhaps show us what you're actually doing?
Amber
@kaklon this is the correct answer to the question as currently worded. If this is not what you want, maybe you should explain a bit more. Add some sample strings from both sides: what should and should not match.
Amarghosh
A: 

How about this:

/[^a]{4}/
Curd
A: 

The complemented binding:

$string !~ /pattern/;
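Applied to this question, the complemented binding !~ can be used on each 4-character block in turn. The sketch below walks the string with substr in steps of 4; the sub name blocks_not_matching is hypothetical, and the block size of 4 is hard-coded as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: [block, block number] for every aligned 4-character
# block that does NOT match $re (a trailing partial block is ignored).
sub blocks_not_matching {
    my ($str, $re) = @_;
    my @hits;
    for (my $i = 0; $i + 4 <= length $str; $i += 4) {
        my $block = substr $str, $i, 4;
        push @hits, [ $block, $i / 4 ] if $block !~ $re;
    }
    return @hits;
}

for my $hit (blocks_not_matching('aaaaabaaabaaabacaaaa', qr/aaaa/)) {
    print "$hit->[0] at :$hit->[1]\n";
}
```

Passing the pattern as a qr// object keeps the helper reusable for patterns other than aaaa.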
msw
A: 

Try this:

/(?:(?!aaaa)[a-z]){4}/g

Before each character is matched, the lookahead ensures that the four characters starting at that position are not aaaa.
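One way to check what this pattern returns on the sample string is to print each match together with its starting offset ($-[0]). Since the regex engine may start a match at any offset, the matches are not necessarily aligned on 4-character blocks; with this input they begin at offsets 2, 6 and 10. The sub name lookahead_matches is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: [match, start offset] for every match of the pattern.
sub lookahead_matches {
    my ($str) = @_;
    my @hits;
    while ($str =~ /((?:(?!aaaa)[a-z]){4})/g) {
        push @hits, [ $1, $-[0] ];    # $-[0]: offset where this match began
    }
    return @hits;
}

for my $hit (lookahead_matches('aaaaabaaabaaabacaaaa')) {
    print "$hit->[0] at offset $hit->[1]\n";
}
```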

Alan Moore
A:

EDIT: After some more fiddling and thought, I found the proper solution; I'll leave the previous answer below for reference.

It seems /aaaa(?!aaaa)....|(?!aaaa)..../gc is the complement of /aaaa/ for your purposes:

$_ = 'aaaaabaaabaaabacaaaa';
while( /aaaa(?!aaaa)....|(?!aaaa)..../gc ){
    my $b_pos = (pos()/4)-1;
    print substr($_,$b_pos*4,4)." at :$b_pos\n";
}

This gives:

abaa at :1
abaa at :2
abac at :3

Previous answer

The negative lookahead does not respect "block" iteration, even on your small sample input:

use POSIX qw(floor);
$_ = 'aaaaabaaabaaabacaaaa';
while( /(?!aaaa)..../gc ){
    my $b_pos = floor(pos()/4);
    print " !aaaa at :$b_pos str:".substr($_,$b_pos*4,4);
    print " c_pos:".(pos()-4)." str:".substr($_,(pos()-4),4)."\n";
}

With output:

 !aaaa at :1 str:abaa c_pos:2 str:aaab
 !aaaa at :2 str:abaa c_pos:6 str:aaab
 !aaaa at :3 str:abac c_pos:10 str:aaab
 !aaaa at :4 str:aaaa c_pos:14 str:acaa

This is because the lookahead is evaluated at every character position, not in blocks of 4. For example, with aaaabaaa it first checks aaaa, which the lookahead rejects, then moves one character forward to aaab, which is not aaaa and is therefore consumed, rather than the block baaa that one would probably want.
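If you would rather stay with a single regex, one option (not taken from the answers here, so treat it as a sketch) is to anchor each attempt with \G, so matching resumes exactly where the previous match ended, and to skip whole aaaa blocks with (?:aaaa)*. Every successful match then consumes a multiple of 4 characters from an aligned start, so the lookahead is only ever tested on block boundaries. The sub name non_aaaa_blocks is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: [block, block number] for each aligned 4-character
# block that is not 'aaaa'.
#   \G          resume exactly where the previous match ended
#   (?:aaaa)*   consume whole 'aaaa' blocks
#   (?!aaaa)    the next block must not be 'aaaa'...
#   (....)      ...and is captured (4 characters, so pos() stays aligned)
sub non_aaaa_blocks {
    my ($str) = @_;
    my @found;
    while ($str =~ /\G(?:aaaa)*(?!aaaa)(....)/g) {
        push @found, [ $1, pos($str) / 4 - 1 ];
    }
    return @found;
}

for my $hit (non_aaaa_blocks('aaaaabaaabaaabacaaaa')) {
    print "$hit->[0] at :$hit->[1]\n";
}
```

Because of \G, a failed match cannot slide forward to a misaligned offset; the loop simply stops once no aligned non-aaaa block remains.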

However, judicious use of map, grep and split solves the problem:

my $c = 0;
print "!aaaa at positions: ", 
      join ",", map { $$_[1] } 
                    grep { $$_[0] !~ /aaaa/ } 
                         map { [$_, $c++ ] } 
                             grep /./, split /(.{4})/, $_;
print "\n";

results in:

!aaaa at positions: 1,2,3

Explanation:

  1. split /(.{4})/, $_ splits the input into blocks of 4 characters; because the pattern has a capture group, split returns each captured block.
  2. However, a capture in split also returns the (empty) fields between the blocks, so we eliminate them using grep /./
  3. Next, we create [block, block number] pairs (hence the $c initialized to 0).
  4. Then we keep only the elements that do not match 'aaaa'.
  5. Finally, we map each remaining pair to its block number.
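As an alternative to split with a capture group (and the grep /./ needed to drop the empty fields), unpack with the repeated group template '(a4)*' returns the 4-character blocks directly (group templates need Perl 5.8 or later). This sketch assumes the string length is a multiple of 4, uses ne 'aaaa' in place of !~ /aaaa/ (equivalent for 4-character blocks), and the sub name non_aaaa_positions is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: block numbers of the 4-character blocks that are not
# 'aaaa'. Assumes length($str) is a multiple of 4.
sub non_aaaa_positions {
    my ($str) = @_;
    my @blocks = unpack '(a4)*', $str;    # 'aaaa', 'abaa', 'abaa', ...
    # Grepping over the indices keeps each block's number without a counter.
    return grep { $blocks[$_] ne 'aaaa' } 0 .. $#blocks;
}

print "!aaaa at positions: ", join(',', non_aaaa_positions('aaaaabaaabaaabacaaaa')), "\n";
```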

To match your exact output:

my $c = 0;
print join "\n",
        map { $$_[0]." at :".$$_[1] }
            grep { $$_[0] !~ /aaaa/ }
                map { [$_, $c++ ] }
                    grep /./, split /(.{4})/, $_;
print "\n";
njsf