I have a file whose contents look like this:

123,1,ABC,DEF
123,1,ABC
345,4,TZY
456,3,XYZ
333,4,TTT,YYY
333,4,TTT

I want to ignore lines whose first field matches the first field of the previous or next line, i.e. the lines starting with 123 and 333.

The output needs to be:

345,4,TZY
456,3,XYZ

Any ideas on how to go about this?

A: 

Try the uniq utility:

uniq -w 3 your_file.txt

would do the trick; no need for Perl.

George
You need to add the -u flag to produce the desired output (see the question)
Hai Vu
Also, the -w argument to uniq may not be supported on every Unix (it is not on Solaris 10)
DVK
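
Putting the answer and the comments together, the corrected invocation would presumably be (GNU uniq only, since -w is non-standard, and assuming the leading field is always three characters wide):

uniq -w 3 -u your_file.txt

Here -w 3 compares only the first three characters of each line and -u keeps only lines whose comparison key is not repeated on an adjacent line, which gives the two lines in the expected output.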
+2  A: 

TMTOWTDI:

my $str = join '', <>;                      # slurp all input into one string
$str =~ s/^(\d+).+\n(\1.+\n)+//mg;          # delete runs of consecutive lines that share the same leading number
print $str;

EDIT: the first line can also be replaced with Randal L. Schwartz's slurp idiom:

my $str = do { local $/; <HANDLE> };    # undef $/ makes <HANDLE> read the whole file in one go
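
For the sample input, running the script (saved as, say, skip_dupes.pl; the name is just a placeholder) should print only the two lines whose leading number has no adjacent duplicate:

perl skip_dupes.pl your_file.txt
345,4,TZY
456,3,XYZ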
FM
This is **really** cool, but performance will degrade dramatically on a very large file, since you're slurping. I posted a solution below that uses constant memory (albeit much wordier and less cool).
DVK
Also, as per Perl best practices, you should use File::Slurp to slurp the file in the first place.
DVK
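
A minimal sketch of that suggestion, assuming File::Slurp is installed (its read_file does the slurping; the filename is only a stand-in):

use File::Slurp;
my $str = read_file('your_file.txt');   # reads the whole file into a scalar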
@DVK Yeah, definitely no good for big files. I was trying to come up with a slick map-grep-map solution to the problem, but it got ugly fast. Still mulling it over...
FM
+1  A: 

TMTOWTDI:

my $last_prefix = ""; 
my $last_line = ""; 
while (<>) { check_line($_); }
check_line("");    sub check_line {
    my $line = shift;
    my ($prefix) = ($line =~ /^([^,]*),/); 
    if (($prefix || "") ne $last_prefix ) {
        print $last_line;
        $last_line = $_;
    } else {
        $last_line = "";
    };
    $last_prefix = $prefix; 

}

This is wordy, but I suspect the performance might be better than the regexp solution on a very large file, since it processes one line at a time instead of slurping.

DVK