I have lines of text from which I want to extract a single word that has a number appended to it. The number may be different on each line:

This is string1 this is string
This is string11 
This is string6 and it is in this line

I want to parse this file and get the values of "stringXXX", where XXX ranges from 0 to 100.

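# suppose Abc.txt contains the above lines
open(my $fh, '<', 'Abc.txt') or die "Cannot open Abc.txt: $!";
my @abcFile = <$fh>;
close($fh);

foreach my $pattern (@abcFile) {
    if ($pattern =~ /string\d{1,3}/) {
        print $pattern;    # prints the whole line, not just the match
    }
}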

The above prints the whole line; I want only stringXXX.

+11  A: 

You need to capture it:

while ($pattern =~/(string(100|\d{1,2}))/g) {
    print $1;
}

Explanation:

  • The parentheses capture whatever they enclose into $1. With more than one set of parentheses, the first captures into $1, the second into $2, and so on. In this case $2 holds the number itself.
  • \d{1,2} matches 1 or 2 digits, which covers 0 through 99. The explicit 100 alternative covers 100, since it's the only 3-digit number you want to match.
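
A minimal sketch of how the two capture groups behave, and why the alternation needs to try 100 before the one-or-two-digit case. The helper name `extract_tokens` and the sample input are made up for illustration; they are not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: return a [whole token, number] pair for every match.
sub extract_tokens {
    my ($text, $regex) = @_;
    my @found;
    while ($text =~ /$regex/g) {
        push @found, [$1, $2];    # $1 = outer group, $2 = inner group
    }
    return @found;
}

my $line = 'string7 and string100';

# 100 listed first: the three-digit case is tried before \d{1,2}.
for my $pair (extract_tokens($line, qr/(string(100|\d{1,2}))/)) {
    print "$pair->[0] (number: $pair->[1])\n";
}

# \d{1,2} listed first: 'string100' is cut short to 'string10'.
for my $pair (extract_tokens($line, qr/(string(\d{1,2}|100))/)) {
    print "$pair->[0] (number: $pair->[1])\n";
}
```

Perl tries alternatives from left to right and keeps the first one that lets the overall match succeed, so the order inside the parentheses changes the result.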

edit: fixed the order of the numbers that are captured.

Nathan Fellman
thanks that helped :-)
gagneet
if // -> while //g
J.F. Sebastian
Thanks @J.F. I updated the answer
Nathan Fellman
(\d{1,2}) captures between 1 and *2* digits.
J.F. Sebastian
Your regexp captures '10' for 'string100'. It should be (100|\d{1,2}) to capture 100.
J.F. Sebastian
A: 

Just change print $pattern to print $&, which already holds the text matched by the last successful match.
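
As a rough sketch of this suggestion: after a successful match, `$&` holds the text that the whole pattern matched, so no capturing parentheses are needed. The helper name `matched_text`, the sample line, and the `\d{1,3}` pattern are assumptions for illustration. Note that `$&` only reflects the most recent successful match, so it has to be read right after the match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by the whole pattern, or undef.
sub matched_text {
    my ($text) = @_;
    return $text =~ /string\d{1,3}/ ? $& : undef;
}

my $line = 'This is string6 and it is in this line';
my $hit  = matched_text($line);
print "$hit\n" if defined $hit;    # the matched word only, not the whole line
```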

ididak
The problem is that the capturing is done wrong.
Nathan Fellman
+4  A: 

Abc.pl:

#!/usr/bin/perl -w
while (<>) {
    while (/(string(\d{1,3}))/g) {
        print "$1\n" if $2 <= 100;
    }
}

Example:

$ cat Abc.txt 
This is string1 this is string
This is string11 
This is string6 and it is in this line
string1 asdfa string2
string101 string3 string100 string1000
string9999 string001 string0001

$ perl Abc.pl Abc.txt
string1
string11
string6
string1
string2
string3
string100
string100
string001
string000

$ perl -nE 'say $1 while /(string(?:100|\d{1,2}(?!\d)))/g' Abc.txt
string1
string11
string6
string1
string2
string3
string100
string100

Note the difference between the outputs. What is preferable depends on your needs.

J.F. Sebastian
A: 

Don't overspecify. To capture the numeric portion, just use (\d+). This will capture a number of any length, so that some day when the monkeys who are providing you with this file decide to expand their range up to 999, you will be covered. It also takes less thought, both now when you are writing the code and later when you are maintaining it.

Be strict in what you emit, but be liberal in what you accept.
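
One way to combine this advice with the 0 to 100 requirement from the question is to match any run of digits and apply the range check in ordinary code. This is only a sketch; the helper name `strings_up_to` and its `$max` parameter are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every "stringN" token whose number is <= $max.
sub strings_up_to {
    my ($text, $max) = @_;
    my @keep;
    while ($text =~ /(string(\d+))/g) {
        push @keep, $1 if $2 <= $max;    # range check lives outside the regex
    }
    return @keep;
}

print "$_\n" for strings_up_to('string3 string100 string1000', 100);
```

Because `\d+` consumes the full number, `string1000` is rejected as a whole instead of being truncated to `string100`, and widening the range later means changing one argument instead of the pattern.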

skiphoppy
It actually depends on the spec you're given. If you're writing a throwaway script to capture only these numbers, you don't want to use (\d+).
Nathan Fellman
I can't figure it out, Nathan ... why not? If I'm just writing a throwaway script, I don't want to invest extra time to make the regex more complicated than that.
skiphoppy