Is it possible to remove duplicate lines using only a regular expression, without any programming language?

Example input>>

11
22
22  <-must remove
33
44
44  <-must remove
55

Output>>

11
22
33
44
55
+6  A: 

Regular-expressions.info has a page on Deleting Duplicate Lines From a File

Ben James
+1  A: 

See my request for more info; I'm answering the easy way for now.

  1. If the order doesn't matter, just a

    sort -u

    will do the trick

  2. If the order does matter but you don't mind running multiple passes (this is vim syntax), you can use:

    %s/\(.*\)\(\_.*\)\(\1\)/\2\1/g

    to preserve the last occurrence, or

    %s/\(.*\)\(\_.*\)\(\1\)/\1\2/g

    to preserve the first occurrence.
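To make the "multiple passes" idea concrete, here is a minimal Python sketch in the same spirit. It is not a translation of the vim commands above: the pattern, the `LATER_DUPLICATE` name and the `drop_later_duplicates` helper are all assumptions made for illustration. It keeps the first occurrence of each line and simply re-runs the substitution until nothing changes.

```python
import re

# Hypothetical pattern (not the vim command above): one whole line, then any
# number of whole lines in between (matched lazily), then the same line again.
LATER_DUPLICATE = re.compile(r"^(.*)$((?:\n.*)*?)\n\1$", re.MULTILINE)


def drop_later_duplicates(text: str) -> str:
    """Keep the first occurrence of each line, preserving the original order."""
    while True:
        # One pass removes at most one later copy per non-overlapping match,
        # so the substitution is repeated until the text stops changing.
        reduced = LATER_DUPLICATE.sub(r"\1\2", text)
        if reduced == text:
            return reduced
        text = reduced


if __name__ == "__main__":
    print(drop_later_duplicates("a\nb\na\nb"))
```

With the input `a`, `b`, `a`, `b`, the first pass removes only the second `a`; a second pass is needed to remove the repeated `b`, which is why a single substitution is not enough here.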

If you do mind running multiple passes, then it's more difficult, so before we work on that, please say so in the question!

EDIT: your edit wasn't very clear, but it looks like you just want single-pass removal of duplicate ADJACENT lines! Well, that's much easier!

A simple:

/(.*)\1*/\1/

(`/\(.*\)\1*/\1/` in vim), i.e. searching for `(.*)\1*` and replacing it with just `\1`, will do the trick.

Davide
`(.*)\1*` does not match duplicate lines because there's nothing in your regex that matches the line break between the line and its duplicate.
Jan Goyvaerts
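As a rough illustration of the point in this comment, the sketch below uses a pattern that does include the line break between a line and its adjacent duplicates. Python is used only as a convenient way to try the regex out; the `ADJACENT_DUPLICATES` name, the `drop_adjacent_duplicates` helper and the exact pattern are assumptions, and any editor with backreferences and a "^ and $ match at line breaks" mode should behave similarly.

```python
import re

# Assumed pattern: a whole line, followed one or more times by a line break
# and the very same text, ending at a line boundary. The optional \r lets
# Windows-style line endings through as well.
ADJACENT_DUPLICATES = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)


def drop_adjacent_duplicates(text: str) -> str:
    """Collapse each run of identical adjacent lines into a single line."""
    return ADJACENT_DUPLICATES.sub(r"\1", text)


if __name__ == "__main__":
    sample = "11\n22\n22\n33\n44\n44\n55"
    print(drop_adjacent_duplicates(sample))
```

The trailing `$` matters: without it, a line such as `22` followed by `222` would be treated as a duplicate because the backreference would match only the start of the second line.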
+2  A: 

In RegexBuddy you can do this as follows:

  1. On the Library tab, load the RegexBuddy.rbl library if it is not loaded by default.
  2. In the lookup box, type "duplicate"
  3. Click the Use button to load the "delete duplicate lines" regex.
  4. On the GREP tab, specify the folder and file mask of the files you want to delete duplicates from.
  5. In the drop-down menu of the GREP button, select Execute.

If you're only doing this on one file, you can use the Test tab instead of the GREP tab. Load the file on the Test tab, and then click the Replace button in the main toolbar.

Jan Goyvaerts