ansaurus

Question

How do I find the index location of a substring matched with a regex in Perl?

Answer 1

A:

You could split your string on the regex and print the length of the first array element, provided the array has more than one element. A simple example:

my $test = "123;456";
my @help = split(/;/, $test);
# More than one element means the separator was found.
if ($#help > 0) {
    print "Index is: " . length($help[0]) . "\n";
}

Edit: This fits your simple example, but not your full text. If the regex gets more complex, the length of the matched separator is no longer fixed. You then also need the index at which the second array element starts in order to work out how long the match was.
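A minimal sketch of the split idea that also copes with a match at the very start or end of the string. The helper name index_via_split is made up for illustration, and the sketch assumes the pattern can never match the empty string. It reports only where the match starts, not how long it is.

```perl
use strict;
use warnings;

# Hypothetical helper: start offset of the first match of $re in $str,
# or undef when $re does not match at all.
# Assumes $re cannot match the empty string.
sub index_via_split {
    my ($str, $re) = @_;
    # A LIMIT of -1 keeps trailing empty fields, so a match at the very
    # end of the string still produces a second element.
    my @parts = split /$re/, $str, -1;
    # A single element (or none) means the pattern never matched.
    return undef unless @parts >= 2;
    # Everything before the first match is the first field, so its
    # length is the index of the match.
    return length $parts[0];
}

for my $s ('123;456', ';456', '123;', '123456') {
    my $i = index_via_split($s, qr/;/);
    print "$s: ", defined $i ? $i : 'no match', "\n";
}
```

The loop prints 3, 0, 3 and "no match" for the four sample strings.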

weismat 2009-05-07 04:34:52

I don't think that will quite work. In this case you are expecting that my regex will match at the beginning of the string.

2009-05-07 04:38:09

This is an additional case which is not covered correctly. The correct condition would then be that the first element of the array differs from the original string, and the correct index is the length of that first array element.

weismat 2009-05-07 06:14:43

Answer 2

+7 A:

I believe you are looking for pos:

#!/usr/bin/perl

use strict;
use warnings;

my $sequence;
while (my $line = <DATA>) {
    if ($line =~ /(HDWFLSFKD)/g) {
        print "its found index location: ",
            pos($line) - length($1), "-", pos($line), "\n";
    } else {
        $sequence .= $line;
        print "came in else\n";
    }
}

__DATA__
MLTSHQKKF*HDWFLSFKD*SNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSK*HDWFLSFKD*QNHSIKDIFNRFNHYIYNDLGIRTIA

You can also use the @- and @+ variables:

#!/usr/bin/perl

use strict;
use warnings;

my $sequence;
while (my $line = <DATA>) {
        if ($line=~ /HDWFLSFKD/){
                print "its found index location: $-[0]-$+[0]\n";
        } else {
                $sequence .= $line;
                print "came in else\n";
        }
}

__DATA__
MLTSHQKKF*HDWFLSFKD*SNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSK*HDWFLSFKD*QNHSIKDIFNRFNHYIYNDL

Chas. Owens 2009-05-07 04:41:05

Are the pos and @position functions the same?

2009-05-07 05:13:52

What does $1 do?

2009-05-07 05:32:13

$1 holds the first capture. I am using it because I do not want to hardcode the length of the search string. The pos function returns the offset at which the last m//g match on a given string left off. The @- and @+ arrays hold the start and end offsets of the last successful match: $-[0] and $+[0] are the beginning and end of the whole match, $-[1] and $+[1] are the beginning and end of the first capture, $-[2] and $+[2] are those of the second capture, and so on.
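To make the relationship between pos, @- and @+ concrete, here is a minimal sketch. The helper name match_offsets and the two-group pattern are made up for illustration, and the sequence is the first sample line with the asterisks dropped, on the assumption that they are only highlighting.

```perl
use strict;
use warnings;

# Hypothetical helper: [start, end] offsets for the whole match (index 0)
# and for every capture group; returns nothing when there is no match.
sub match_offsets {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    # $#+ is the index of the last group in the pattern.
    return map { [ $-[$_], $+[$_] ] } 0 .. $#+;
}

my $seq = 'MLTSHQKKFHDWFLSFKDSNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA';
my @spans = match_offsets($seq, qr/(HDWF)(LSFKD)/);
printf "group %d: %d-%d\n", $_, @{ $spans[$_] } for 0 .. $#spans;

# After a successful m//g match in scalar context, pos() is the end
# offset of the whole match, i.e. the same number as $+[0].
if ($seq =~ /HDWFLSFKD/g) {
    print "pos agrees with \$+[0]\n" if pos($seq) == $+[0];
}
```

This should print 9-18 for group 0, 9-13 for group 1 and 13-18 for group 2. So pos only gives you the end of the match and only with /g, while @- and @+ give both ends for the match and for each capture without needing /g.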

Chas. Owens 2009-05-08 02:17:20

If I'm reading the problem right, you want the regex /H(DWFLSFKD)/, because he wants the position between the D's. I take it the asterisks are there for highlighting rather than literal data.
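A short sketch of that suggestion, using index 1 of @- and @+ so that the leading H is left out of the reported range. The helper name capture_span is hypothetical, and the sample line has the asterisks removed, following the assumption that they are not literal data.

```perl
use strict;
use warnings;

# Hypothetical helper: start and end offsets of the first capture group
# in /H(DWFLSFKD)/, or an empty list when the line does not match.
sub capture_span {
    my ($line) = @_;
    return unless $line =~ /H(DWFLSFKD)/;
    # $-[1] and $+[1] describe capture 1 only, not the whole match.
    return ($-[1], $+[1]);
}

my $line = 'MLTSHQKKFHDWFLSFKDSNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA';
my ($start, $end) = capture_span($line);
print "capture spans $start-$end\n" if defined $start;
```

For this line the capture runs from offset 10 to offset 18, one character later than the whole match, which starts at the H at offset 9.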

brian d foy 2009-05-09 20:21:30
