ansaurus

Question

How can I find the first occurrence of a pattern in a string from some starting position?

Answer 1

+12 A:

$string=~/^   # from the start of the string
            (?:.{$p0}) # skip (don't capture) "$p0" occurrences of any character
            (?:...)*?  # skip 3 characters at a time,
                       # as few times as possible (non-greedy)
            (aaa|bbb|ccc) # capture aaa or bbb or ccc as $1
         /x;

(Assuming p0 is 0-based).

Of course, it's probably more efficient to use substr on the string to skip forward:

substr($string, $p0)=~/^(?:...)*?(aaa|bbb|ccc)/;

Mike G. 2008-09-23 09:44:27

will $1 contain the count of triplets?

slashmais 2008-09-23 09:47:09

(pls ignore my comment above) good one, but I need the count of triplets skipped - or maybe now that I know the first one, I can find it's pos and just use the difference from p0 as the count.

slashmais 2008-09-23 09:59:01

No, $1 will contain the matched triplet (aaa or bbb or ccc). But you can use $-[1] to access its starting position

moritz 2008-09-23 10:00:33

If you need the count of starting triplets, just remove the ?: from the triplets parens to make them capturing. Then length($1)/3 is the number of skipped triplets. But brian d foy's answer looks better...

Mike G. 2008-09-23 21:03:56

Answer 2

+9 A:

You can't really count with regexes, but you can do something like this:

pos $string = $start_from;
$string =~ m/\G         # anchor to previous pos()
            ((?:...)*?) # capture everything up to the match
            (aaa|bbb|ccc)
            /xs  or die "No match"
my $result = length($1) / 3;

But I think it's a bit faster to use substr() and unpack() to split into triple and walk the triples in a for-loop.

(edit: it's length(), not lenght() ;-)

moritz 2008-09-23 09:56:28

Answer 3

+11 A:

Moritz says this might be faster than a regex. Even if it's a little slower, it's easier to understand at 5 am. :)

             #0123456789.123456789.123456789.  
my $string = "alsdhfaaasccclaaaagalkfgblkgbklfs";  
my $pos    = 9;  
my $length = 3;  
my $regex  = qr/^(aaa|bbb|ccc)/;

while( $pos < length $string )    
    {  
    print "Checking $pos\n";  

    if( substr( $string, $pos, $length ) =~ /$regex/ )
        {
        print "Found $1 at $pos\n";
        last;
        }

    $pos += $length;
    }

brian d foy 2008-09-23 10:19:50

yes - I should write KISS on a poster stick it on the wall above my monitor! Thanx.

slashmais 2008-09-23 10:56:53

Answer 4

A:

The main part of this is split /(...)/. But at the end of this, you'll have your positions and occurrence data.

my @expected_triplets = qw<aaa bbb ccc>;
my $data_string      
    = 'fjeidoaaaivtrxxcccfznaaauitbbbfzjasdjfncccftjtjqznnjgjaaajeitjgbbblafjan'
    ;
my $place          = 0;
my @triplets       = grep { length } split /(...)/, $data_string;
my %occurrence_for = map { $_, [] } @expected_triplets;
foreach my $i ( 0..@triplets ) {
    my $triplet = $triplets[$i];
    push( @{$occurrence_for{$triplet}}, $i ) if exists $occurrence_for{$triplet};
}

Or for simple counting by regex (it uses Experimental (??{}))

my ( $count, %count );
my $data_string      
    = 'fjeidoaaaivtrxxcccfznaaauitbbbfzjasdjfncccftjtjqznnjgjaaajeitjgbbblafjan'
    ;
$data_string =~ m/(aaa|bbb|ccc)(??{ $count++; $count{$^N}++ })/g;

Axeman 2008-09-23 18:24:27

Answer 5

A:

If speed is a serious concern, you can, depending on what the 3 strings are, get really fancy by creating a tree (e.g. Aho-Corasick algorithm or similar).

A map for every possible state is possible, e.g. state[0]['a'] = 0 if no strings begin with 'a'.

Brian 2008-11-07 21:16:04

ansaurus

tags:

views:

answers:

How can I find the first occurrence of a pattern in a string from some starting position?

related questions