I am trying to put the stuff within parentheses into the value of a src attribute in an img tag:

while(<TOCFILE>)
{
    $toc_line = $_;
    $toc_line =~ s/<inlineFig.*?(\.\.\/pics\/ch09_inline99_*?\.jpg)*?<\/inlineFig>/<img src="${1}" alt="" \/\>/g;
    $new_toc_file .= $toc_line;
}

So I expected to see tags like this in the output:

<img src="../pics/ch09_inline99_00" alt="" />

But instead I'm getting:

<img src="" alt="" />
+11  A: 

There's an error in your regex: this part of the pattern will never match your data:

inline99_*?\.jpg
        ^^^

I think you forgot \d in front of the star, judging by the example data you are trying to match.

You're also not requiring the group to match, because you put a *? after the capture group, which makes it optional. The overall match still succeeds with zero repetitions of the group, so $1 is left empty. That's what you get in src: nothing.
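A minimal sketch of the corrected substitution, assuming the input lines look like the sample tag below (the sample line is an assumption based on the expected output in the question). It adds \d+ for the digits and drops the quantifier after the capture group, so the group has to match:

```perl
use strict;
use warnings;

# Hypothetical sample line; the real TOC file contents are not shown in the question.
my $toc_line = '<inlineFig>../pics/ch09_inline99_00.jpg</inlineFig>';

# \d+ matches the digits before ".jpg". The group has no quantifier,
# so the substitution only succeeds when the path is actually captured.
$toc_line =~ s{<inlineFig.*?(\.\./pics/ch09_inline99_\d+\.jpg)</inlineFig>}
              {<img src="$1" alt="" />}g;

print "$toc_line\n";
```

Note that the captured text includes the .jpg extension, so it ends up in the src attribute as well.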

Besides:

($PATTERN)*?

will only capture the last thing it matched. That probably isn't what you want, either. For example:

$_ = 'one two three';
s/(\w+\s*)*/$1/;
print;

prints "three".
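If every repetition is needed rather than just the last one, one common alternative is a global match in list context. This is only a sketch on the same toy string, with variable names chosen for illustration:

```perl
use strict;
use warnings;

my $text = 'one two three';

# A quantified group keeps only its final repetition in $1.
my ($last) = $text =~ /^(\w+\s*)*$/;

# A global match in list context returns one capture per match instead.
my @words = $text =~ /(\w+)/g;

print "last: $last\n";
print "all: @words\n";
```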

bart
Indeed. Almost every time I have this problem, it's a problem with my pattern.
brian d foy
+3  A: 

1) It would help to see some examples of what you are parsing.

2) If you put the "x" flag on the end of the expression, you can use whitespace and comments inside the pattern (not the replacement) to make it more understandable.

3) Also, by breaking it down, you'll notice that the second part of the stuff inside ( ) was missing the match for the numbers. Instead it looked for 0 or more '_' and gave up when it reached the digits, so the group never matched.

while(<TOCFILE>)
{
    $toc_line = $_;
    $toc_line =~
      s/                                # replace the following

         <inlineFig                     # match this text
         .*?                            # then any characters until the next sequence matches
         (                              # capture the match as group 1
            \.\.\/pics\/ch09_inline99_  # the literal path prefix
            \d*?\.jpg                   # followed by 0 or more digits, then .jpg
         )*?                            # optional group, as few repetitions as possible
         <\/inlineFig>                  # match this text

       /<img src="${1}" alt="" \/>/xg;  # x ignores whitespace and comments in the pattern only,
                                        # so the replacement has to stay on one line
    $new_toc_file .= $toc_line;
}

4) As mentioned, ()*? only leaves the last repetition in $1, but this shouldn't be a problem if your input always follows the same format.

Ape-inago
+1  A: 

Fix your pattern, as bart suggested, and consider using the "topic" variable $_ instead of explicitly assigning the data read from the filehandle to another variable.

#!/usr/bin/perl

use warnings;
use strict;

my $new_toc_file;

{
    # localizing $_ protects any existing value in the global $_
    # you should localize $_ even if you choose to assign it to a variable

    local $_;

    while(<DATA>) { 
        # in the absence of the bind operator =~, s/// operates against $_
        s!<inlineFig.*?(\.\./pics/ch09_inline99_.*?\.jpg)</inlineFig>!<img src="$1" alt="" />!g;
        $new_toc_file .= $_;
    }
}

print $new_toc_file, "\n";

__END__
<inlineFig>../pics/ch09_inline99_00.jpg</inlineFig>
converter42
Good coding practice is usually to use a descriptive variable, perhaps while( my $topic = <DATA> ) { }, which avoids the chance of overwriting $_ altogether.
Ape-inago
With a substitution, you'll often want to leave the original data as is and modify a copy. In a short example it seems odd, but this is only a short example rather than a useful program.
brian d foy
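
The copy-then-modify approach mentioned in the last comment can be sketched like this. The input line is hypothetical, in the same shape as the sample data above, and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical input line, in the same shape as the sample data above.
my $original = '<inlineFig>../pics/ch09_inline99_00.jpg</inlineFig>';

# Copy first, then bind the substitution to the copy; $original is untouched.
(my $converted = $original) =~
    s{<inlineFig.*?(\.\./pics/ch09_inline99_\d+\.jpg)</inlineFig>}{<img src="$1" alt="" />}g;

print "$original\n";
print "$converted\n";
```

Perl 5.14 and later also offer the /r flag, which returns the modified copy instead of changing the bound variable.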