ansaurus

Question

Find duplicate lines and remove them using a regular expression with the replace feature

Answer 1

+6 A:

Regular-expressions.info has a page on Deleting Duplicate Lines From a File

Ben James 2009-10-15 16:13:55

Answer 2

+1 A:

See my request for more info; for now I'm answering the easy way.

If the order doesn't matter, just a

sort -u

will do the trick.
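
For readers without a Unix shell, the same idea can be sketched in Python. This is only an approximation of sort -u: the function name `unique_sorted` is made up for illustration, and Python compares strings by code point, whereas sort uses the current locale's collation.

```python
# Hypothetical helper approximating `sort -u`: drop duplicate lines
# and return the remaining ones in sorted order.
# Assumption: ordering is by Unicode code point, not by locale.
def unique_sorted(lines):
    return sorted(set(lines))

print(unique_sorted(["b", "a", "b", "c", "a"]))
```
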
If the order does matter but you don't mind running multiple passes (this is vim syntax), you can use:

%s/\(.*\)\(\_.*\)\(\1\)/\2\1/g

to preserve the last occurrence, or

%s/\(.*\)\(\_.*\)\(\1\)/\1\2/g

to preserve the first occurrence.
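
If a script is an option instead of an editor substitution, both variants can be done in one pass without a regex. A minimal sketch in Python, assuming the lines fit in memory; the names `keep_first` and `keep_last` are hypothetical, and this replaces the regex approach rather than implementing it:

```python
# dict.fromkeys keeps keys in insertion order (Python 3.7+),
# so it drops later repeats while preserving the original order.

def keep_first(lines):
    """Keep the first occurrence of each line."""
    return list(dict.fromkeys(lines))

def keep_last(lines):
    """Keep the last occurrence of each line."""
    # Walk the lines backwards, keep the first one seen, then restore order.
    return list(reversed(list(dict.fromkeys(reversed(lines)))))

lines = ["a", "b", "a", "c", "b"]
print(keep_first(lines))
print(keep_last(lines))
```
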

If you do mind running multiple passes, then it's more difficult, so before we work on that, please say so in the question!

EDIT: your edit wasn't very clear, but it looks like you just want single-pass removal of ADJACENT duplicate lines. Well, that's much easier!

A simple:

/(.*)\1*/\1/

(/\(.*\)\1*/\1/ in vim), i.e. searching for (.*)\1* and replacing it with just \1, will do the trick.

Davide 2009-10-15 16:46:42

`(.*)\1*` does not match duplicate lines because there's nothing in your regex that matches the line break between the line and its duplicate.

Jan Goyvaerts 2010-02-27 10:24:07
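
Following the comment above, a pattern that also matches the line break between a line and its copy does collapse adjacent duplicates. A minimal sketch in Python, assuming the whole text fits in memory; the name `dedupe_adjacent` and the sample text are made up for illustration:

```python
import re

# ^(.*)       captures one whole line
# (\r?\n\1)+  matches one or more following copies, each with its line break
# $           makes sure the last copy is a complete line, not just a prefix
ADJACENT_DUPES = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)

def dedupe_adjacent(text):
    """Collapse each run of identical adjacent lines into a single line."""
    return ADJACENT_DUPES.sub(r"\1", text)

sample = "a\na\nb\nb\nb\nc\na\n"
print(dedupe_adjacent(sample), end="")
```

Duplicates that are not adjacent (the final "a" in the sample) are left alone, so the input would need to be sorted first if those should be removed too.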

Answer 3

+2 A:

In RegexBuddy you can do this as follows:

On the Library tab, load the RegexBuddy.rbl library if not loaded by default.
In the lookup box, type "duplicate".
Click the Use button to load the "delete duplicate lines" regex.
On the GREP tab, specify the folder and file mask of the files you want to delete duplicates from.
In the drop-down menu of the GREP button, select Execute.

If you're only doing this on one file, you can use the Test tab instead of the GREP tab. Load the file on the Test tab, and then click the Replace button in the main toolbar.

Jan Goyvaerts 2010-02-27 10:16:45
