How would you go about marking all of the lines in a buffer that are exact duplicates of other lines? By marking them, I mean highlighting them or adding a character or something. I want to retain the order of the lines in the buffer.

Before:

foo
bar
foo
baz

After:

foo*
bar
foo*
baz
+3  A: 

Run through the lines once, building a map from each line's text to the number of times it occurs. Then loop through the lines again and append your * to any line whose count in the map is greater than one.
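A minimal sketch of that two-pass idea in Vim script, in case it helps. The function and command names (MarkDuplicateLines) are made up for illustration, and it assumes a Vim recent enough to accept an empty string as a dictionary key, so that blank lines can be counted too.

```vim
" Sketch only: MarkDuplicateLines is a hypothetical name.
function! MarkDuplicateLines() range
  " First pass: count how often each line's text occurs in the range.
  let counts = {}
  for lnum in range(a:firstline, a:lastline)
    let text = getline(lnum)
    let counts[text] = get(counts, text, 0) + 1
  endfor
  " Second pass: append '*' to every line whose text was seen more than once.
  " Each line is read once, before it is changed, so the counts stay valid.
  for lnum in range(a:firstline, a:lastline)
    let text = getline(lnum)
    if counts[text] > 1
      call setline(lnum, text . '*')
    endif
  endfor
endfunction

" Defaults to the whole buffer; a range such as :10,20 also works.
command! -range=% MarkDuplicateLines <line1>,<line2>call MarkDuplicateLines()
```

Running :MarkDuplicateLines on the example buffer from the question should leave foo*, bar, foo*, baz, with the line order unchanged.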

Justin
Brilliant!
innaM
Any chance we could get some code?
technomalogical
+1  A: 

Try:

:%s:^\(.\+\)\n\1:\1*\r\1:

Hope this works.

Update: next try.

:%s:^\(.\+\)$\(\_.\+\)^\1$:\1\r\2\r\1*:
Zsolt Botykai
This will only detect adjacent duplicate lines, and will only mark the first copy, not the second.
rampion
You are right. I tried again.
Zsolt Botykai
+5  A: 

As an ex one-liner:

:syn clear Repeat | g/^\(.*\)\n\ze\%(.*\n\)*\1$/exe 'syn match Repeat "^' . escape(getline('.'), '".\^$*[]~') . '$"' | nohlsearch

This uses the Repeat group to highlight the repeated lines.

Breaking it down:

  • syn clear Repeat :: remove any previously found repeats
  • g/^\(.*\)\n\ze\%(.*\n\)*\1$/ :: for any line that is repeated later in the file
    • the regex
      • ^\(.*\)\n :: a full line
      • \ze :: end of match - the rest of the pattern must still match, but it is not included in the matched text (similar to a positive lookahead)
      • \%(.*\n\)* :: any number of full lines
      • \1$ :: a full line repeat of the matched full line
    • exe 'syn match Repeat "^' . escape(getline('.'), '".\^$*[]~') . '$"' :: add full lines that match this to the Repeat syntax group
      • exe :: execute the given string as an ex command
      • getline('.') :: the contents of the current line matched by g//
      • escape(..., '".\^$*[]~') :: escape the given characters with backslashes so the line text is matched literally
      • syn match Repeat "^...$" :: add the given string to the Repeat syntax group
  • nohlsearch :: remove highlighting from the search done for g//
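
To make the escape() step concrete, here is a small sketch of the pattern that gets built for one sample line. RepeatPattern is a hypothetical helper name used only for this illustration. The escape set here includes ~, which is also special in Vim's default 'magic' patterns.

```vim
" RepeatPattern is a hypothetical helper, not part of the one-liner.
function! RepeatPattern(text)
  " Backslash-escape the characters that are special in a 'magic' pattern,
  " plus the double quote that delimits the :syn match pattern.
  return '^' . escape(a:text, '".\^$*[]~') . '$'
endfunction

echo RepeatPattern('a.b*c')
" → ^a\.b\*c$

" The escaped pattern matches only the literal text:
echo 'a.b*c' =~# RepeatPattern('a.b*c')
" → 1
echo 'axbbc' =~# RepeatPattern('a.b*c')
" → 0

" Without escaping, the . and * act as regex operators and over-match:
echo 'axbbc' =~# '^a.b*c$'
" → 1
```

Without the escaping, a line such as a.b*c would also cause unrelated lines like axbbc to be highlighted.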

Justin's non-regex method is probably faster:

function! HighlightRepeats() range
  " First pass: count how many times each line's text occurs in the range.
  let lineCounts = {}
  for lineNum in range(a:firstline, a:lastline)
    let lineText = getline(lineNum)
    let lineCounts[lineText] = get(lineCounts, lineText, 0) + 1
  endfor
  " Second pass: add a syntax match for every text seen at least twice.
  syn clear Repeat
  for lineText in keys(lineCounts)
    if lineCounts[lineText] >= 2
      " ~ is escaped too, since it is special in 'magic' patterns.
      exe 'syn match Repeat "^' . escape(lineText, '".\^$*[]~') . '$"'
    endif
  endfor
endfunction

command! -range=% HighlightRepeats <line1>,<line2>call HighlightRepeats()
rampion
+2  A: 

Why not use:

V*

in normal mode.

It simply searches for all matches of the current line, which highlights them (if the 'hlsearch' option is enabled). You can then use

n

to navigate through the matches.

Lonecat
Visual mode doesn't support * by default. It's probably a function you have in your .vimrc. Something like this:

xno * :<c-u>cal<SID>VisualSearch()<cr>/<cr>
xno # :<c-u>cal<SID>VisualSearch()<cr>?<cr>
fun! s:VisualSearch()
  let old = @" | norm! gvy
  let @/ = '\V'.substitute(escape(@", '\'), '\n', '\\n', 'g')
  let @" = old
endf
Michael
The same code is also at http://pastebin.com/f2ee37c92
Michael
Yep you're right :)
Lonecat
It would only match one thing at a time, whereas I'd prefer to indicate all lines that are duplicates of other lines all at once. Nice function though, seems handy.
Brian Carper