Hello,

The main problem I'm having is that my script runs, opens the text file, finds the string, and copies it to a new file, but sometimes it doesn't copy the whole line. It gets cut off at different points in the line. I believe it is a problem with my regex.

A line of txt may look like this:

E03020039: Unable to load C:\Documents and Settings\rja07\Desktop\DSMProduct\project\Database\Schema\Source\MDB_data_type.dsm into \DSM R17\projects\Databases\Schema\Source\MDB_data_type.dsm . Text file contains invalid characters .

However, when the Perl script runs it sometimes only copies up until the words "text file" or "text file contains", and the last part of the line is cut off. I need the complete line. This is what I have so far:

if ($error =~ /E03020039/)
{
    print $error;
    open (MF, '>>G:/perl/error.txt');
    print MF $error;
    $count ++;
    }

This is all inside a foreach loop which scans each line of the file.

I tried:

if ($error =~ /E03020039/&&/characters\s\.\n/)

but that doesn't help me at all. Thanks for any help.

A: 

If you are using a match pattern (// is the same as m//), the =~ operator should not modify the error string.

Are you 100% confident you aren't mangling it prior to the regex check? I'd stick a print line prior to the match and ensure you're accurately duplicating the input.

Are you 100% confident that you aren't running into IO buffering issues? Typically perl file IO is buffered, so if you're expecting to see the full, last, line of the logfile via tail -f or something you may be disappointed until the program exits.

See http://www.rocketaware.com/perl/perlfaq5/How_do_I_flush_unbuffer_a_fileha.htm for some options for how to enable auto-flushing for your file handle.
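As a minimal sketch of the auto-flushing idea, assuming the buffering theory is worth testing at all: the autoflush method from IO::Handle turns on flushing for one specific handle. Note that $| only applies to the currently selected output handle (STDOUT by default), so setting it alone would not affect a separate file handle. The temporary file below is a hypothetical stand-in for G:/perl/error.txt, and the sample line is made up.

```perl
use strict;
use warnings;
use IO::Handle;                 # provides the autoflush method on handles
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for G:/perl/error.txt.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
close $tmp_fh or die "Unable to close $path - $!\n";

open( my $out, '>>', $path ) or die "Unable to open $path - $!\n";
$out->autoflush(1);             # flush after every print to this handle

print {$out} "E03020039: sample line . Text file contains invalid characters .\n";

# Read the file back while $out is still open; with autoflush on,
# the complete line has already been written out.
open( my $in, '<', $path ) or die "Unable to open $path - $!\n";
my $seen = <$in>;
close $in or die "Unable to close $path - $!\n";

print "Read back: $seen";
close $out or die "Unable to close $path - $!\n";
```

Without the autoflush call, the read-back could come up empty until $out is closed, which is the kind of symptom that looks like a truncated last line.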

easel
not really looking into modifying the original error.txt file, the file itself contains over 10.000 lines of code and only a few of them are error lines
$|++ will turn off buffering.
MadCoder
+2  A: 

I don't think your regex has anything to do with this. Are you at least getting all the right lines in your new file, even if they are truncated?

I think you need to go through the normal debugging steps:

  • Can you show us a complete but minimal program that demonstrates the error? The problem might be somewhere else.

  • What is in $error? Does it have all of the line when you print it to stdout? If not, work backward until you find the point where the text goes missing. Print its value before and after the suspect operations and work backward until you find the problem.

  • Are you sure all of that text is on one line, or there aren't any extra weird characters in the file? What does $error have in it on the next line read?

  • What happens if you print everything to the new file (i.e. match all lines)? Does all the text end up in the new file?

  • Are the lines always truncated at the same point?
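One way to act on the "weird characters" question above is to print each line with its non-printable characters made visible. This is only a sketch: show_hidden is a hypothetical helper name, and the sample string (with an embedded control character and a Windows line ending) is made-up data, not taken from the real log.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside printable ASCII
# with a visible \x{..} escape so hidden bytes show up in the output.
sub show_hidden {
    my ($line) = @_;
    ( my $copy = $line ) =~ s/([^\x20-\x7E])/sprintf '\\x{%02X}', ord $1/ge;
    return $copy;
}

# Made-up sample: a control character in the middle and a CRLF at the end.
my $error = "Text file\x1A contains invalid characters .\r\n";
my $shown = show_hidden($error);

print length($error), " chars: [$shown]\n";
```

Calling show_hidden($error) just before the regex check and again just before the print to the output file would show whether the text is already short, or contains something unexpected, at either point.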

brian d foy
+7  A: 

While we wait for the information brian d foy suggested you provide, here are a few possible things you should check.

Why?

Well, looking at the code snippet you posted, style-wise at least, you appear to be using some more traditional Perlisms, instead of modern improved ones, and doing things the modern way will generally make your life easier.

Are You using Strictures?

use strict; 
use warnings;

These 2 lines at the top of your code can help point out many silly mistakes.

If you can't afford to turn them on everywhere because you have too many errors, you can enable them within a scope, i.e.:

 blah;  #no strict or warnings

 {   # scope 

     use strict; 
     use warnings; 
     code(); # with strict and warnings

 }

 blah; # no strict or warnings

Use lexical file-handles

Bare filehandles are untidy because they're globally unique, and that can get a bit messy.

{  #scope

  open my $fh , '>' , 'bar.txt'; 
  print $fh "Hello\n";

}  # file cleaned up and closed by perl!

Use 3-Arg open where possible

Good:

open my $fh, '>', 'bar.txt'; 
open my $otherfh, '<', 'foo.txt'; 
open my $iofh , '-|' , 'ls', '-la' ;

Not Recommended:

open my $fh, '>bar.txt'; 
open my $otherfh , '<foo.txt'; 
open my $iofh , 'ls -la |';

See perldoc -f open for details

Check to see if Opens actually worked or not

Generally, if open fails for any reason, it just returns false, and the default behavior is to keep on trucking, which can be a bit weird.

There are several ways to handle this:

Option 1:

 use Carp(); 
 open my $fh , '>', $filename  or Carp::croak("Oh no! can't open $filename: $!");

Option 2:

 use autodie;
 open my $fh , '>', $filename;

As For that second regex

That's probably not doing what you think it's doing.

 if ($error =~ /E03020039/&&/characters\s\.\n/)

Is fundamentally the same as

 if (  
         ( $error =~ /E03020039/ ) 
     &&  ( $_     =~ /characters\s\.\n/ ) 
 )

Which is probably not what you intended.

I think you meant:

 if (  
          ( $error =~ /E03020039/ ) 
      &&  ( $error =~ /characters\s\.\n/) 
 )
Kent Fredric
Mind reader. :) ++
daotoad
Is there a particular reason you don't import croak? When I use croak, I always import it. Most of the code that I have seen that uses a fully qualified Carp::croak, is using the 'feature' that carp is/was loaded with warnings, to avoid explicitly doing a 'use Carp;'.
daotoad
I tend to "use Carp ()" which imports nothing and/or "use namespace::clean" prior to things, and then do everything explicitly. It makes for longer code, but makes for quicker mental-backtracing through code.
Kent Fredric
namespace::clean looks very interesting. Thanks for the pointer!
daotoad
A: 

I see a couple of things that stand out immediately:

  1. You are using a global filehandle and not closing it when done.
  2. You are using a two argument open (this isn't causing your issue, but it is best to avoid).
  3. Your altered regex does not do anything like you seem to think it does.

For 1 and 2:

# For loop around this:
if ($error =~ /E03020039/) {
    print $error;

    open(my $mf, '>>', 'G:/perl/error.txt') 
        or die "Unable to open error file - $!\n";

    print $mf $error;
    $count ++;

    close $mf
        or die "Unable to close error file - $!\n";
}

By using a lexical handle you prevent any other code from touching your handle unless it has been passed explicitly. By closing the handle, you flush the handle's buffers. By checking for errors when opening and closing the handle, you prevent uncaught errors from leading to lost data.

You may wish to move the open and close outside your for loop:

my $count = 0;
open( my $mh, '>>', 'errorlog.log' ) or die "oops $!\n";
while ( my $error = <$log_h> ) {    # $log_h: the input log handle, opened earlier

    if ( $error =~ /E23323232323/ ) {
         print $mh $error;
         print $error;
         $count++;
    } 

}
close $mh or die "oops $!\n";

Your code was reopening the same file into a global filehandle on every match. This could easily be the cause of the problems you are seeing. It might not be. Does the correct information for $error print to STDOUT?

Regarding issue 3, $error =~ /E03020039/&&/characters\s\.\n/ is equivalent to:

($error =~ /E03020039/) && ($_ =~ /characters\s\.\n/)

If you had warnings enabled you would (probably) have gotten the "Use of uninitialized value in pattern match (m//)" warning. It may have been surprising, but it would have been a clue that something was wrong.

I believe you wanted something like:

$error =~ /E03020039.*?characters\s\.$/

But there is no reason to extend the match, since you are not capturing any part of the match. It will have no effect on the value in $error or what will be written to the file.
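A small sketch of that point, using a shortened, made-up error line (foo.dsm is a hypothetical file name): both the short and the extended pattern match, and the string is identical before and after the matches.

```perl
use strict;
use warnings;

# Shortened, made-up version of the error line from the question.
my $error  = "E03020039: Unable to load foo.dsm . Text file contains invalid characters .\n";
my $before = $error;

# A plain match only tests the string; it never changes it.
my $short = ( $error =~ /E03020039/ )                   ? 1 : 0;
my $long  = ( $error =~ /E03020039.*?characters\s\.$/ ) ? 1 : 0;
my $same  = ( $error eq $before )                       ? 1 : 0;

print "short match: $short, long match: $long, unchanged: $same\n";
```

Since neither match alters $error, whatever is cutting the line short has to happen before the match or while writing to the file.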

Unless you have a specific reason not to, always start your Perl programs with these two pragmas:

use strict;
use warnings;

Even if you have a good reason not to use them, it is nearly always best to disable these pragmas only over a limited scope:

use strict;
use warnings;

{    no warnings 'uninitialized';
     no strict 'vars';
     print "$foo\n";
}
daotoad
A: 

Your regex is fine.

There can be 2 other issues:

  1. Your outer foreach loop has some error.
  2. You append to error.txt using open (MF, '>>G:/perl/error.txt');. So if you have multiple instances of this script running in parallel, that may cause problems with the output if all of them try to write to the file at the same time.
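If parallel writers do turn out to be the issue, one common approach is an advisory lock around each append. The following is a sketch only: append_locked is a hypothetical helper, the temporary file stands in for G:/perl/error.txt, and it assumes every writer to the file uses the same locking.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use File::Temp qw(tempfile);

# Hypothetical helper: append one line while holding an exclusive lock.
sub append_locked {
    my ( $path, $line ) = @_;
    open( my $fh, '>>', $path ) or die "Unable to open $path - $!\n";
    flock( $fh, LOCK_EX )       or die "Unable to lock $path - $!\n";
    # Another process may have appended while we waited for the lock.
    seek( $fh, 0, SEEK_END )    or die "Unable to seek in $path - $!\n";
    print {$fh} $line           or die "Unable to write to $path - $!\n";
    # Closing flushes the buffer and releases the lock.
    close $fh                   or die "Unable to close $path - $!\n";
    return;
}

# Hypothetical scratch file standing in for G:/perl/error.txt.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
close $tmp_fh or die "Unable to close $path - $!\n";

append_locked( $path, "E03020039: first\n" );
append_locked( $path, "E03020039: second\n" );

open( my $in, '<', $path ) or die "Unable to open $path - $!\n";
my @lines = <$in>;
close $in or die "Unable to close $path - $!\n";

print scalar(@lines), " lines written\n";
```

The lock is advisory, so it only helps if all instances of the script go through the same locking code.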

Alternatively you can use this simple Perl one-liner which will achieve what you wish to do:

perl -nle 'print if /E03020039/' inputFile.txt >> G:/perl/error.txt
Nikhil
A: 

If the intention is simply to get the job done - rather than to learn how to program in Perl - then use 'grep' to find the lines. That also assumes you aren't doing anything else in the script. If the intention is to learn about Perl, then you would ignore this advice and pay heed to the other answers.

Jonathan Leffler
A: 

Thanks to everyone who replied. I should have let everyone know that I'm a beginner, self-taught Perl learner. Some of the lingo and rules are very new to me, and I'm still trying to fix my little problem.

The reason I'm not using strictures is that I had never read about them until now. I will add them to my code and read up on them to fully understand their purpose.