I am iterating through a file and on each line I am looking for a regex. If the regex is found I just want to print "it's found" and then the index location of where it was found in that line.

Example:

looking for: 'HDWFLSFKD' need index between two Ds
line: MLTSHQKKF*HDWFLSFKD*SNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
output: 'its found' index location: 10-17

The 'looking for' pattern above is quite simple, but I am planning to use complex regex statements there.
So basically I just want to know: if a regex is found in a string, how do we get the index location of the match?

Here is the code I have so far:

foreach my $line (@file_data)
{
        if ($line=~ /HDWFLSFKD/){
            print "it's found\n"; 
            print "but at what index are the two Ds";
          }   
        else {
            $sequence.=$line;
            print "came in else\n";
        }
}
A: 

You could split your string with the regex and output the length of the first array element, if there is more than one element in the array. A simple sample:

my $test="123;456";
my @help=split(';', $test);
if ($#help>0) {
    print "Index is:".length($help[0]);
}

Edit: This fits your simple example, but not your full problem. Once the regex gets more complex, the length of the matched separator is no longer fixed, so you also need the index where the second array element starts in order to work out how long the match was.
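
One way to handle a variable-length match with split is to wrap the pattern in capturing parentheses, so that split returns the matched separator as well, and to pass a limit of 2 so it stops after the first match. The following is only a sketch of that idea; the helper name match_span_via_split is made up for illustration, and it does not try to handle zero-width matches.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, end) offsets of the first match,
# or an empty list when the pattern does not match at all.
sub match_span_via_split {
    my ($string, $regex) = @_;

    # The outer capturing parentheses make split return the separator too;
    # the limit of 2 stops splitting after the first match.
    my @parts = split /($regex)/, $string, 2;
    return if @parts < 2;    # no separator returned means no match

    my $start = length $parts[0];    # text before the match
    return ($start, $start + length $parts[1]);
}

my ($start, $end) = match_span_via_split('123;456', qr/;/);
print "Index is: $start-$end\n";    # prints "Index is: 3-4"
```

The end offset here is the position just after the match, which is the same convention that pos and $+[0] use.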

weismat
I don't think that will quite work. In this case you are expecting that my regex will match at the beginning of the string.
This is an additional case that is not covered correctly. The correct condition would then be that the first element of the array differs from the original string, and the correct index is the difference in length between the first array element and the original string.
weismat
+7  A: 

I believe you are looking for pos:

#!/usr/bin/perl

use strict;
use warnings;

my $sequence;
while (my $line = <DATA>) {
    if ($line =~ /(HDWFLSFKD)/g) {
        print "its found index location: ",
            pos($line) - length($1), "-", pos($line), "\n";
    } else {
        $sequence .= $line;
        print "came in else\n";
    }
}

__DATA__
MLTSHQKKF*HDWFLSFKD*SNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSK*HDWFLSFKD*QNHSIKDIFNRFNHYIYNDLGIRTIA
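
If a line can contain the pattern more than once, the same pos idea extends to a while loop, because each successful /g match in scalar context resumes where the previous one left off. This is a small sketch rather than part of the answer above; the sample string is invented for illustration and leaves out the asterisks, on the assumption that they were only highlighting.

```perl
use strict;
use warnings;

# Made-up line containing the pattern twice, without the asterisks.
my $line = 'MLTSHQKKFHDWFLSFKDSNNYNSKHDWFLSFKDQNHS';

my @spans;
while ($line =~ /(HDWFLSFKD)/g) {
    # pos() is the offset just past the match that was found last.
    my $end = pos($line);
    push @spans, [ $end - length($1), $end ];
}

print "found at $_->[0]-$_->[1]\n" for @spans;
# prints:
# found at 9-18
# found at 25-34
```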

You can also use the @- and @+ variables:

#!/usr/bin/perl

use strict;
use warnings;

my $sequence;
while (my $line = <DATA>) {
        if ($line=~ /HDWFLSFKD/){
                print "its found index location: $-[0]-$+[0]\n";
        } else {
                $sequence .= $line;
                print "came in else\n";
        }
}

__DATA__
MLTSHQKKF*HDWFLSFKD*SNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSK*HDWFLSFKD*QNHSIKDIFNRFNHYIYNDL
Chas. Owens
Are the pos and @position functions the same?
What does $1 do?
$1 holds the first capture. I am using it because I do not want to hardcode the length of the search string. The pos function tells you where the last match left off in a given string. The @- and @+ arrays are set to the beginning and ending offsets of matches respectively. $-[0], $+[0] are the beginning and end of the whole match, $-[1], $+[1] are the beginning and end of the first capture, $-[2], $+[2] are the beginning and end of the second capture, and so on.
Chas. Owens
If I'm reading the problem right, you want the regex /H(DWFLSFKD)/ because he wants the position between the D's. I take it the asterisks are there for highlighting rather than literal data.
brian d foy
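
Putting the last two comments together, a capture group plus $-[1] and $+[1] gives the positions of the two Ds directly. This is a sketch under the assumption stated above that the asterisks are highlighting and not part of the data; the variable names $first_d and $last_d are only illustrative.

```perl
use strict;
use warnings;

# Sample line from the question, with the highlighting asterisks removed.
my $line = 'MLTSHQKKFHDWFLSFKDSNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA';

my ($first_d, $last_d);
if ($line =~ /H(DWFLSFKD)/) {
    # $-[1] is the offset where capture 1 starts; $+[1] is the offset just
    # past its end, so subtract 1 to get the index of the last D.
    ($first_d, $last_d) = ($-[1], $+[1] - 1);
    print "its found index location: $first_d-$last_d\n";    # 10-17
}
```

Because the offsets come from @- and @+, nothing about the length of the pattern is hardcoded, so the same code keeps working when the regex inside the parentheses becomes more complex.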