Hi, I am trying to construct a regular expression to search and replace text in a file. The following is the script.

#!use/bin/perl
use strict;
use warnings;
my $line = $ARGV[0];
my $find = "[^a-zA-Z0-9]+seqfile[^a-zA-Z0-9]+=[^a-zA-Z0-9]+[a-z]+..";
my $replace = "done";
open (FILE, ">>/home/user/Desktop/test") || die "cant open file \n";
my @body = <FILE>;
foreach $line (@body) {
    if (my $line =~ s/$find/$replace/g){
        print FILE $line;
    }
    else {
        print "did not replace \n\n";
    }
}
close(FILE);
print "reached here\n";
exit;

The sample test file I am running my program against consists of a few lines of text. The string I want to replace is on the first line: " tobereplaced = file.aa ". I had to use a caret (^) to match characters other than letters/digits because the regex for whitespace, "\s", is not accepted on my system. I know the program is executed because it prints 'reached here'. Can anyone suggest:

  1. Why is my program unable to find the string using the regex I specify?
  2. Why does my system not recognize '\s' and give the error "Unrecognized escape \s passed through at test"?
  3. Also, can anyone suggest a good source for studying regexes?

Thanks

+2  A: 

\s is not being accepted because you are using a double-quoted string. Perl tries to interpret \s as a string escape, does not recognize it, and (with warnings on) reports "Unrecognized escape \s passed through". Any of the following will make it work:

  • "\\s+seqfile\\s+=\\s+[a-z]+.."
  • '\s+seqfile\s+=\s+[a-z]+..'
  • qr/\s+seqfile\s+=\s+[a-z]+../
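As a rough check that the three spellings behave the same, here is a minimal sketch. The sample line is a hypothetical input, not taken from the asker's file. Note that the trailing `..` consumes only two characters, so the last "a" of "file.aa" is left behind after the replacement.

```perl
use strict;
use warnings;

# Three ways to write the same pattern. A bare "\s" in double quotes would
# lose its backslash (and warn), so the double-quoted form doubles it.
my $from_double = "\\s+seqfile\\s+=\\s+[a-z]+..";
my $from_single = '\s+seqfile\s+=\s+[a-z]+..';
my $compiled    = qr/\s+seqfile\s+=\s+[a-z]+../;

my $sample = " seqfile = file.aa ";   # hypothetical input line

my @results;
for my $pattern ($from_double, $from_single, $compiled) {
    (my $copy = $sample) =~ s/$pattern/done/g;   # substitute on a copy
    push @results, $copy;
    print "[$copy]\n";
}
```

All three iterations should print the same line, which is the point of the comparison.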

The last one is the preferred form because it creates a compiled regex, which will be faster than interpolating a normal string into a pattern. The compiled regex will stringify if you use it in a context that doesn't expect a regex, so you can say

print "$find\n";

and get back (?-xism:\s+seqfile\s+=\s+[a-z]+..) (on Perl 5.14 and later the prefix is printed as (?^: instead).
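A small sketch of that stringification, for anyone who wants to try it. The variable names and the sample text are illustrative only. No expected output is shown for the stringified form because the exact wrapper depends on the Perl version.

```perl
use strict;
use warnings;

my $find = qr/\s+seqfile\s+=\s+[a-z]+../;

my $kind      = ref $find;   # "Regexp" for a qr// object
my $as_string = "$find";     # stringified form, wrapped in a (?...:...) group

print "$kind\n";
print "$as_string\n";

# The stringified form can be interpolated back into a match.
my $matched = (" seqfile = file.aa" =~ /$as_string/) ? 1 : 0;
print "matched: $matched\n";
```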

Also, if you are going to negate a character class you must put the caret inside the character class: [^a-zA-Z0-9] means "not alphanumeric" (for ASCII at least), but ^[a-zA-Z0-9] means "match an alphanumeric at the start of the string" (or at the start of a line if the /m option is set).
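To make the difference concrete, here is a minimal sketch run against a hypothetical string that starts with two spaces. The negated class matches the spaces, while the anchored class fails because the first character is not alphanumeric.

```perl
use strict;
use warnings;

my $text = "  seqfile";   # hypothetical input: two leading spaces

# Caret inside the brackets: one or more non-alphanumerics, then "seqfile".
my $negated  = ($text =~ /[^a-zA-Z0-9]+seqfile/) ? 1 : 0;

# Caret outside the brackets: alphanumerics anchored at the start of the string.
my $anchored = ($text =~ /^[a-zA-Z0-9]+seqfile/) ? 1 : 0;

print "negated class matched:  $negated\n";
print "anchored class matched: $anchored\n";
```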

Also, when a file is opened in >> mode you cannot read from it. I have changed your code to read from STDIN (or from files named on the command line) and write to STDOUT. This is a standard Perl technique called filtering. It allows you to build pipelines of programs. You can run the script like this

./script.pl inputfile > outputfile

or this

cat inputfile | ./script.pl > outputfile

Here is the script

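#!/usr/bin/perl

use strict;
use warnings;

# /x lets the pattern contain whitespace for readability
my $find    = qr{ \s+ seqfile \s+ = \s+ [a-z]+ .. }x;
my $replace = "done";

# read each line from STDIN or the files named on the command line
while (<>) {
    s/$find/$replace/g;
    print;    # print every line, changed or not
}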

It can also be boiled down to a one-liner:

perl -pe 's/\s+seqfile\s+=\s+[a-z]+../done/g' inputfile

Good sources for studying regexes would be:

Chas. Owens
As I mentioned, whenever I use \s in my script it gives me an error - "Unrecognized escape \s passed through file". So I am guessing it is not recognized?
shubster
This is because you are using a double quoted string instead of a regex quote-like operator (http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators). You could say \\s if you really want to use a string instead of qr//, but qr// is better for many reasons.
Chas. Owens
If I want to overwrite a file (open(FILE,'>testfile')) by searching for exactly 6 spaces followed by a word etc., will the following regex work? $find = '{\s,6}seqfile\s=\snew1.aa' and replace it with $replace = '{\s,6}seqfile\s=\snew2.aa'. Thanks.
shubster
A: 

You've opened a file in append mode and then tried to both read from it and write to it. It is possible to both read and write a file, but you need to use a different mode. Unless you want to replace exactly the same number of characters, though, you are going to have to read from one file and write everything (both changed and unchanged parts) out to a second file.
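A minimal sketch of the read-one-file, write-another approach, assuming a scratch directory and made-up file names (test and test.new are hypothetical, as is the sample content). The final rename is one possible way to put the result back under the original name. It is an addition here, not something the answer prescribes.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical paths inside a scratch directory that is removed on exit.
my $dir      = tempdir(CLEANUP => 1);
my $in_path  = "$dir/test";
my $out_path = "$dir/test.new";

# Create a small sample input file.
open(my $seed, '>', $in_path) or die "can't write $in_path: $!";
print {$seed} "   seqfile = file.aa\n", "another line\n";
close $seed or die "can't close $in_path: $!";

my $find    = qr/\s+seqfile\s+=\s+[a-z]+../;
my $replace = "done";

# Read from one handle, write every line (changed or not) to the other.
open(my $in,  '<', $in_path)  or die "can't read $in_path: $!";
open(my $out, '>', $out_path) or die "can't write $out_path: $!";
while (my $line = <$in>) {
    $line =~ s/$find/$replace/g;
    print {$out} $line;
}
close $in;
close $out or die "can't close $out_path: $!";

# Replace the original with the rewritten copy.
rename $out_path, $in_path or die "can't rename $out_path: $!";
```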

ysth