Hi, I'm having problems removing duplicate lines in a file and replacing them with non-duplicate lines. Ideally, I would just like to append a running sequence number to each one, so that the duplicate lines can be told apart.

I was considering sed with some kind of wildcard (*):

sed -e "s/text_pattern/text_pattern*/g" my_file.txt

So that a new number is added to text_pattern every time it recurs. However, I haven't been able to find a proper solution in the man pages or on the internet. Does anybody have an idea of how to do something like this? Perhaps sed is not the best choice?

Thanks!

A: 

Awk seems more suitable for this task. I'm going to assume you don't really need a regex, but want to match the complete line with a fixed string. Then you can do this:

awk -v ln="text_pattern" '$0 == ln { $0 = $0 " " ++i };1' my_file.txt
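
As a quick check of what this does, here is a small sketch on made-up input. The temporary file and its contents are assumptions for illustration only, and the increment is parenthesized purely for readability.

```shell
# Hypothetical sample input: "text_pattern" appears three times.
tmp=$(mktemp)
printf '%s\n' text_pattern other text_pattern text_pattern > "$tmp"

# Append a running counter to every line that equals ln exactly;
# the trailing 1 prints every line, changed or not.
numbered=$(awk -v ln="text_pattern" '$0 == ln { $0 = $0 " " (++i) } 1' "$tmp")
printf '%s\n' "$numbered"
rm -f "$tmp"
```

With this input the three matching lines come out as "text_pattern 1", "text_pattern 2" and "text_pattern 3", while "other" is left alone.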
schot
A: 

I don't believe sed is the tool for this. If you need regular expressions for tasks like these, you could go with perl (which builds upon both awk and sed).

cat test | perl -e '$i = 1; while (<>) { chomp($_); if (s/pattern/pattern$i/) { $i++ }; print $_."\n"; }'

That is, for each line on stdin: remove the newline, then append the counter to the first occurrence of pattern if the line contains it, and in that case increase the counter by 1. Then print the line.
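
If perl is not at hand, roughly the same regex-based numbering can be sketched in awk. This is an alternative under assumptions, not the one-liner above: the sample lines are made up, and "pattern" stands in for whatever regex you actually need.

```shell
# Hypothetical sample input; "pattern" occurs on three of the four lines.
tmp=$(mktemp)
printf '%s\n' 'a pattern here' 'nothing' 'pattern' 'x pattern y' > "$tmp"

# On each matching line, replace the first match with itself ("&")
# followed by the next counter value; the trailing 1 prints every line.
result=$(awk '/pattern/ { sub(/pattern/, "&" (++n)) } 1' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Like the perl version without the g flag, only the first match on each line gets a number, so the output here is "a pattern1 here", "nothing", "pattern2", "x pattern3 y".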

EDIT: test is your input file.

steinar
A: 

uniq

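  • uniq -c input.txt shows you the frequency of occurrences (uniq only compares adjacent lines, so sort the input first if the duplicates are scattered).
  • uniq -u input.txt prints only the lines that are not repeated.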

awk

  • awk 'x[$0]++' input.txt prints the duplicate lines.
  • awk '!x[$0]++' input.txt deletes duplicate lines.

sed

  • sed '$!N; /^\(.*\)\n\1$/P; D' input.txt prints the duplicate lines (consecutive duplicates only).
  • sed '$!N; /^\(.*\)\n\1$/!P; D' input.txt deletes duplicate lines (consecutive duplicates only).
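
The difference between the adjacent-only tools and the awk idiom shows up when the duplicates are not next to each other. A minimal sketch, assuming a made-up five-line input:

```shell
# Hypothetical input where the repeated lines are not adjacent.
tmp=$(mktemp)
printf '%s\n' a b a c b > "$tmp"

# uniq only collapses neighbouring duplicates, so nothing is removed here.
uniq_out=$(uniq "$tmp")

# The awk idiom remembers every line seen so far: the first occurrence
# is kept in its original position and later repeats are dropped.
awk_out=$(awk '!x[$0]++' "$tmp")

# Sorting first makes duplicates adjacent, at the cost of the original order.
sorted_out=$(sort "$tmp" | uniq)

# Show each result on a single line for easy comparison.
printf 'uniq:        %s\n' "$(printf '%s\n' "$uniq_out" | paste -sd' ' -)"
printf 'awk:         %s\n' "$(printf '%s\n' "$awk_out" | paste -sd' ' -)"
printf 'sort | uniq: %s\n' "$(printf '%s\n' "$sorted_out" | paste -sd' ' -)"
rm -f "$tmp"
```

Here plain uniq returns all five lines unchanged, while both awk and sort | uniq return a, b, c. The sed one-liners behave like uniq in this respect.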
Babil