
I have a text string that contains some duplicate characters (FFGGHHJKL). I can make the characters unique with a positive lookahead:

$ perl -pe 's/(.)(?=.*?\1)//g'

For example, with "FFEEDDCCGG", the output is "FEDCG".
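Here is a minimal sketch of why the lookahead works, wrapped in a helper whose name (dedupe_keep_last) is hypothetical. The substitution deletes any character that appears again further to the right, so the last occurrence of each character is the one that survives. With "FFEEDDCCGG" that makes no visible difference, but with "ABCA" it does.

```perl
use strict;
use warnings;

# Hypothetical helper: delete each character that occurs again later on.
# (?=.*?\1) is a zero-width lookahead, so only the character captured by (.)
# is removed; the LAST copy of each character is the one that is kept.
sub dedupe_keep_last {
    my ($s) = @_;
    $s =~ s/(.)(?=.*?\1)//g;
    return $s;
}

print dedupe_keep_last("FFEEDDCCGG"), "\n";    # FEDCG
print dedupe_keep_last("ABCA"), "\n";          # BCA (not ABC)
```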

My question is how to make this work on numbers. For example, for 212 212 43 43 5689 6689 5689 71 81 the output should be 212 43 5689 6689 71 81. Also, how can I output only the duplicated values from a file with n rows, such as:

212 212 43 43 5689 6689 5689 71 81
66 66 67 68 69 69 69 71 71 52
..

Output:

212 212 43 43 5689 5689
66 66 69 69 69 71 71

How can I do this?
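For comparison, here is a minimal sketch (not from either answer) that extends the question's lookahead trick from single characters to whitespace-separated numbers; the helper name dedupe_numbers_keep_last is hypothetical. Like the character version, it keeps the last occurrence of each number, so 6689 ends up before 5689 and the result differs from the order requested above. That is one reason the answers below count fields in a hash instead.

```perl
use strict;
use warnings;

# Hypothetical helper: the question's lookahead idea applied to whole numbers.
# \b...\b keeps "5689" from matching inside "56890"; \s+ eats the separator.
# Like the single-character version, it keeps the LAST copy of each number.
sub dedupe_numbers_keep_last {
    my ($line) = @_;
    $line =~ s/\b(\d+)\s+(?=.*\b\1\b)//g;
    return $line;
}

print dedupe_numbers_keep_last("212 212 43 43 5689 6689 5689 71 81"), "\n";
# 212 43 6689 5689 71 81
```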

A:

The following is untested, but should print out only the duplicates.

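use strict;
use warnings;

my $line = "212 212 43 43 5689 6689 5689 71 81\n";
chomp $line;

my %seen;    # how many times each value has been seen
my @order;   # each repeated value, recorded once, at its second sighting
foreach my $elem (split ' ', $line) {    # ' ' also skips leading whitespace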
  ++$seen{$elem};
  push @order, $elem if $seen{$elem} == 2;
}

foreach my $elem (@order) {
  print "$elem " x $seen{$elem};
}
print "\n";

For removing duplicates, you can now:

print "$_ " for keys %seen;

But that doesn't preserve the order. You can do something similar to what I did for printing only the duplicates, or use a module such as Tie::Hash::Indexed (thanks, daxim) or Tie::IxHash.

tsee
Teaching an old dog a new trick: promote `Tie::Hash::Indexed` over `Tie::IxHash`.
daxim
Hi, thanks for the help :) I modified it a bit, and this is the final code (I hope someone else will also benefit):

#!/usr/bin/perl
use strict;
use warnings;

open(my $fh, '<', 'FILENAME') or die "Cannot open FILENAME: $!";
while (my $line = <$fh>) {
    chomp $line;
    my %seen;
    my @order;
    foreach my $elem (split ' ', $line) {
        ++$seen{$elem};
        push @order, $elem if $seen{$elem} == 2;
    }
    foreach my $elem (@order) {
        print "$elem " x $seen{$elem};
    }
    print "\n";
}
close($fh);

Thanks again, everyone.
manu
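The first answer suggests keeping the order when removing duplicates by doing something similar to its duplicate-printing loop. A minimal sketch of that idea, without a tied hash, follows; the helper name dedupe_keep_first is hypothetical. Because the fields are tested while walking them left to right, the first occurrence of each value is kept in its original position and no ordered-hash module is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: remove duplicates while keeping the FIRST occurrence
# of each field. %seen counts sightings as in the duplicate-printing loop;
# !$seen{$_}++ is true only the first time a value is seen.
sub dedupe_keep_first {
    my ($line) = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } split ' ', $line;
    return join ' ', @unique;
}

print dedupe_keep_first("212 212 43 43 5689 6689 5689 71 81"), "\n";
# 212 43 5689 6689 71 81
```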
A: 

For the first part:

$ cat prog.pl
#! /usr/bin/perl -lp

my %seen;
$_ = join " " => map $seen{$_}++ ? () : $_ => split;

$ echo 212 212 43 43 5689 6689 5689 71 81 | ./prog.pl
212 43 5689 6689 71 81
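As an alternative to the map expression above (not part of the original answer), newer versions of the core List::Util module provide uniq, which keeps the first occurrence of each value in order. This sketch assumes List::Util 1.45 or later, where uniq first appeared; older installations may need an upgrade. Values are compared as strings, so "43" and "043" would count as different.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);    # uniq() first appeared in List::Util 1.45

# uniq keeps the first occurrence of each value, in its original order.
my $line   = "212 212 43 43 5689 6689 5689 71 81";
my @unique = uniq split ' ', $line;
print join(' ', @unique), "\n";    # 212 43 5689 6689 71 81
```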

For the second part:

$ cat prog.pl
#! /usr/bin/perl -lp

my %dups;
my @nums = split;
++$dups{$_} for @nums;

$_ = join " " => grep $dups{$_} > 1 => @nums;

$ cat input
212 212 43 43 5689 6689 5689 71 81
66 66 67 68 69 69 69 71 71 52

$ ./prog.pl input
212 212 43 43 5689 5689
66 66 69 69 69 71 71
Greg Bacon
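The second script keeps every copy of a repeated value in its original position, while the loop in the first answer groups all copies of a value together at the point where it is first repeated. For "5 7 5 7" the -lp script prints "5 7 5 7", but the first answer prints "5 5 7 7". Below is a minimal sketch of the same counting logic as a plain subroutine (the name only_duplicates is hypothetical), usable without the -lp switches.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only values that occur more than once on the line,
# leaving every copy in its original position (same logic as the -lp script).
sub only_duplicates {
    my ($line) = @_;
    my @nums = split ' ', $line;    # ' ' also discards a trailing newline
    my %count;
    $count{$_}++ for @nums;
    return join ' ', grep { $count{$_} > 1 } @nums;
}

for my $line ("212 212 43 43 5689 6689 5689 71 81",
              "66 66 67 68 69 69 69 71 71 52",
              "5 7 5 7") {
    print only_duplicates($line), "\n";
}
# 212 212 43 43 5689 5689
# 66 66 69 69 69 71 71
# 5 7 5 7
```

To process a file instead, loop with while (my $line = <>) { ... }; because split ' ' ignores surrounding whitespace, no chomp is needed.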