Dear all, I have data that looks like the sample below; the actual file is thousands of lines long.

 Event_time                 Cease_time                
 Object_of_reference                                                                                                                                                                                                                                             
 -------------------------- --------------------------
 ----------------------------------------------------------------------------------                       

    Apr  5 2010  5:54PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900                                                                                                                                             
    Apr  5 2010  5:55PM        Apr  5 2010  6:43PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900                                                                                                                                             
    Apr  5 2010  5:58PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                                    
    Apr  5 2010  5:58PM        Apr  5 2010  6:01PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                                    
    Apr  5 2010  6:01PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                                    
    Apr  5 2010  6:03PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900                                                                                                                                               
    Apr  5 2010  6:03PM        Apr  5 2010  6:04PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900                                                                                                                                               
    Apr  5 2010  6:04PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900                                                                                                                                               
    Apr  5 2010  6:03PM        Apr  5 2010  6:03PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                                    
    Apr  5 2010  6:03PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                                    
    Apr  5 2010  6:03PM        Apr  5 2010  7:01PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA             

As you can see, each file has a header that describes the fields (event start time, event cease time, affected element), followed by a row of dashes. My issue is that the data contains a number of entries where the cease time is NULL, i.e. the event is still active. All such entries must go: for each element whose alarm cease time is NULL, the start time, the cease time (in this case NULL) and the element itself must be deleted from the file. In the remaining data, all the text from the word SubNetwork up to BtsSiteMgr= must also go, along with the headers and the dashes.
The final output should look like this:

    Apr  5 2010  5:55PM        Apr  5 2010  6:43PM
 LUGALAMBO_900                                                                                                                                                                                                                                                                                        
    Apr  5 2010  5:58PM        Apr  5 2010  6:01PM
 BULAGA                                                                                                                                                                                                                                                                                                                                                                            
    Apr  5 2010  6:03PM        Apr  5 2010  6:04PM
 KAPKWAI_900                                                                                                                                                                                                                                                                                       
    Apr  5 2010  6:03PM        Apr  5 2010  6:03PM
 BULAGA                                                                                                                                                                                                               
    Apr  5 2010  6:03PM        Apr  5 2010  7:01PM
 BULAGA                                                                                       

Below is a Perl script that I have written. It takes care of the headers, the dashes and the lines containing NULL, but I have failed to delete the two lines that follow each NULL line, so it does not yet produce the output above.

#!/usr/bin/perl
use strict;
use warnings;
$^I=".bak" #Backup the file before messing it up.
open (DATAIN,"<george_perl.txt")|| die("can't open datafile: $!"); # Read in the data
open (DATAOUT,">gen_results.txt")|| die("can't open datafile: $!"); #Prepare for the  writing
while (<DATAIN>) {
s/Event_time//g;
s/Cease_time//g;
s/Object_of_reference//g;
s/\-//g; #Preceding 4 statements are for cleaning out the headers
my $theline=$_;
if ($theline =~ /NULL/){
 next;
 next if $theline =~ /SubN/;
 }
 else{
   print DATAOUT $theline;
  }
 }
   close DATAIN;
   close DATAOUT;     

Kindly point out any modifications I need to make to the script so that it produces the required output. I will be very glad for your help. Kind regards, George.

A: 
s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg;

should filter out the lines that end in NULL plus the two following lines.
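
Note that the substitution spans three lines, so it needs the whole file in one string; it cannot match inside a line-by-line `while (<DATAIN>)` loop. A minimal sketch of applying it that way follows. The helper name `drop_null_records` and the in-memory sample are illustrative assumptions, not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: remove every 3-line record whose first line ends in NULL.
sub drop_null_records {
    my ($text) = @_;
    # /m lets ^ match at the start of each line; /g repeats over the whole string.
    $text =~ s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg;
    return $text;
}

# Small in-memory sample standing in for the slurped file contents.
my $sample = join '',
    "    Apr  5 2010  5:54PM                       NULL\n",
    " SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=\n",
    " BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n",
    "    Apr  5 2010  5:55PM        Apr  5 2010  6:43PM\n",
    " SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=\n",
    " BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n";

# Only the second record (the one with a real cease time) is printed.
print drop_null_records($sample);
```

With a real file you would slurp the contents first, for example with `my $text = do { local $/; <$fh> };`, and pass `$text` to the helper.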

Tim Pietzcker
Hi Tim, I have tested it out by incorporating it into my script. It doesn't remove the NULL values; they are still there in the output. Please modify. Thanks for the effort. GL.
@george-lule: "Please modify?" You have to be joking! What a great way to ask for help. Run `perldoc perlre` and figure it out. Otherwise, I suggest you try APL. Or Ada. Or Pascal. Or Brainfuck.
xcramps
Changing the `/mg` to `/sg` should do the trick
Zaid
@Zaid: `/m` is intentional - I want `^` to match at the start of each line (instead of only at the start of the string); I *don't* want the dot to match newlines (which is what `/s` would do). @george-lule: This regex has to be applied to the entire input at once, not to single lines. It works on the test data you provided.
Tim Pietzcker
@Tim: My bad, I should've paid attention to the `\n`s in your regex.
Zaid
+1  A: 

Looks like a good candidate for a little input record separator ($/) trickery. The idea is to manipulate it so that it deals with one record at a time, rather than the default single line.

use strict;
use warnings;

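$^I = '.bak'; # Carried over from the question; it only affects in-place editing via <>, so it has no effect here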

open my $dataIn, '<', 'george_perl.txt' or die "Can't open data file: $!";
open my $dataOut, '>', 'gen_results.txt' or die "Can't open output file: $!";

{
    local $/ = "\n\t"; # Records have leading tabs

    while ( my $record = <$dataIn> ) {

        # Skip header & records that contain 'NULL'
        next if $record =~ /NULL|Event_time/;

        # Strip out the unwanted yik-yak
        $record =~ s/SubNetwork.*BtsSiteMgr=//s;

        # Print record to output file
        print $dataOut $record;
    }
}

close $dataIn;
close $dataOut;

Pay attention to the following:

  • use of the safer three-argument form of open (the two-argument form is what you've shown)
  • use of scalar variables rather than barewords for defining filehandles
  • use of the local keyword and extra curlies to modify the definition of $/ only when needed.
  • the trailing /s modifier in s/SubNetwork.*BtsSiteMgr=//s lets the dot match newlines too, so the match can span the two lines of the element.
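
To see the last two points in action, here is a small sketch that reads tab-led records from an in-memory string instead of a file. The sample data and the helper name `read_records` are made up for illustration, and it assumes each record really does begin with a tab:

```perl
use strict;
use warnings;

# Made-up sample: two records, each starting with a tab.
my $data = join '',
    "\tApr  5 2010  5:58PM                       NULL\n",
    " SubNetwork=ONRM_RootMo,BssFunction=\n",
    " BSS_ManagedFunction,BtsSiteMgr=BULAGA\n",
    "\tApr  5 2010  5:58PM        Apr  5 2010  6:01PM\n",
    " SubNetwork=ONRM_RootMo,BssFunction=\n",
    " BSS_ManagedFunction,BtsSiteMgr=BULAGA\n";

# Hypothetical helper: split text into records on newline-plus-tab.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory data: $!";
    local $/ = "\n\t";    # restored automatically when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @records = read_records($data);
print scalar(@records), " records\n";

for my $record (@records) {
    next if $record =~ /NULL/;

    # Without /s the dot stops at the newline, so the pattern cannot match.
    (my $without_s = $record) =~ s/SubNetwork.*BtsSiteMgr=//;
    # With /s the dot also matches newlines, so the two-line prefix is removed.
    (my $with_s = $record) =~ s/SubNetwork.*BtsSiteMgr=//s;

    print "without /s: ", ($without_s eq $record ? "unchanged" : "changed"), "\n";
    print "with /s:    ", ($with_s    eq $record ? "unchanged" : "changed"), "\n";
}
```

Because `$/` is localized inside the helper, any code that runs afterwards goes back to reading one line at a time.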
Zaid
Hi Zaid, your suggestion clears only the line containing the NULL value but leaves the rest of the entry associated with it (the two lines below it). These two lines must also go. As mentioned, when the cease time is NULL, the entire record, including the element, must go. Kindly modify. Thanks for the effort. GL.
@george-lule: Are those tabs or 3 spaces at the beginning of each record? If you could explain how your records are separated, I can tell you what the `$/` string should be.
Zaid
Hi Zaid, they are tabs. Regards, GL.
@george-lule: Updated. Changed the `"\n "` to `"\n\t"`
Zaid
Thanks Zaid, but the issue has been solved by FM above. Let me try out your code though, never know when it can come in handy ;-)
@george-lule: I just tested out my code and updated it accordingly. It should work just as you want it to.
Zaid
+1  A: 

Your data arrives in sets of 3 lines, so one approach is to organize the parsing that way:

use strict;
use warnings;

# Ignore header junk.    
while (<>){
    last unless /\S/;
}

until (eof) {
    # Read in a set of 3 lines.
    my @lines;
    push @lines, scalar <> for 1 .. 3;

    # Filter and clean.
    next if $lines[0] =~ /\sNULL\s/;
    $lines[2] =~ s/.+BtsSiteMgr=//;

    print @lines[0,2];
}
FM
Hi FM, I am getting a blank output after integrating your suggestion; the code is below. Where could I be going wrong?
#!/usr/bin/perl
use strict;
use warnings;
open (DATAIN,"<george_perl.txt")|| die("can't open datafile: $!");
open (DATAOUT,">gen_results.txt")|| die("can't open datafile: $!");
while (<DATAIN>){
    last unless /\S/;
}
until (eof) {
    my @lines;
    push @lines, scalar <> for 1 .. 3;
    next if $lines[0] =~ /\sNULL\s/;
    $lines[1] = '';
    $lines[2] =~ s/.+BtsSiteMgr=//;
    print DATAOUT @lines;
}
close DATAIN;
close DATAOUT;
@george-lule You are missing your `DATAIN` file handle in two places: `until (eof DATAIN)` and `... scalar <DATAIN> ...`.
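
A minimal sketch of the same 3-line approach with the filehandle named explicitly everywhere. It uses a lexical handle opened on an in-memory sample so that it runs on its own; the helper name `filter_records` and the sample data are assumptions for illustration, and with a real file you would open `george_perl.txt` instead:

```perl
use strict;
use warnings;

# Hypothetical helper: read 3-line records from an explicit filehandle,
# drop those whose cease time is NULL, and keep only the site name.
sub filter_records {
    my ($in) = @_;
    my $out = '';

    # Skip the header: everything up to and including the first blank line.
    while (my $line = <$in>) {
        last unless $line =~ /\S/;
    }

    until (eof $in) {                        # name the handle in eof ...
        my @lines = map { scalar <$in> } 1 .. 3;   # ... and in each read
        last if grep { !defined } @lines;    # incomplete trailing record
        next if $lines[0] =~ /\sNULL\s/;
        $lines[2] =~ s/.+BtsSiteMgr=//;
        $out .= join '', @lines[0, 2];
    }
    return $out;
}

# In-memory sample: header, blank line, one NULL record, one complete record.
my $sample = join "\n",
    ' Event_time                 Cease_time',
    ' Object_of_reference',
    ' -------------------------- --------------------------',
    '',
    '    Apr  5 2010  5:58PM                       NULL',
    ' SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=',
    ' BSS_ManagedFunction,BtsSiteMgr=BULAGA',
    '    Apr  5 2010  5:58PM        Apr  5 2010  6:01PM',
    ' SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=',
    ' BSS_ManagedFunction,BtsSiteMgr=BULAGA',
    '';

open my $in, '<', \$sample or die "Can't open in-memory sample: $!";
print filter_records($in);
close $in;
```

A bare `eof` or `<>` refers to the files named on the command line (or standard input), not to `DATAIN`, which is why the original loop found nothing to read.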
FM
Thanks FM. The header was still a problem, but everything is now working. Thanks a lot once again.