ansaurus

Question

Unable to search repeated sentences in Vim

Answer 1

Did you mean: removing duplicate lines

ldigas 2009-04-19 11:29:31

I do not mean that exactly. I want to find lines that have the same first two words.

Masi 2009-04-19 11:32:38

Answer 2

I couldn't find a short, Vim-style solution for this, so here is a Vim script.

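" Escape a line so that it matches literally inside a very nomagic (\V)
" pattern, where only the backslash is special.
function! s:escape(line)
    return escape(a:line, '\')
endfunction

" Put every line whose key (a:pattern rewritten with a:extraction) occurs
" on more than one line into the search register, so n and N visit them.
function! s:highlight_similar(pattern, extraction)
    " Map each key to the escaped lines that produce it. A dictionary is
    " used instead of sorting, so lines with the same key need not be adjacent.
    let groups = {}

    for line in getline(1, '$')
        if line !~ a:pattern
            continue
        endif
        let part = substitute(line, a:pattern, a:extraction, '')
        if empty(part)
            continue
        endif
        if !has_key(groups, part)
            let groups[part] = []
        endif
        call add(groups[part], s:escape(line))
    endfor

    " Keep only the keys that occur on at least two lines.
    let matched_lines = []
    for lines in values(groups)
        if len(lines) > 1
            call extend(matched_lines, lines)
        endif
    endfor
    if empty(matched_lines)
        echo 'No similar lines found'
        return
    endif

    " \V: very nomagic; \^ and \$ anchor each alternative to a whole line.
    let @/ = '\V\^\(' . join(matched_lines, '\|') . '\)\$'
endfunction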

And the command definitions, which must go in the same file because <SID> refers to that script's functions:

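" A continuation line needs a space after the backslash; '\call' would be
" glued onto the command name as 'HighlightTwoWordscall'.
command! -nargs=0 HighlightTwoWords
      \ call <SID>highlight_similar('^\s*\(\w\+\)\s\+\(\w\+\).*$', '\1 \2')
" \{-} is a non-greedy *, so trailing whitespace is not part of the key.
command! -nargs=0 HighlightTwoRows
      \ call <SID>highlight_similar('^\s*\(.\{-}\)\s*$', '\1')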

After running :HighlightTwoWords, you can use n and N to move through the lines you are interested in, or run :set hlsearch (:set hls) to highlight those lines.

Mykola Golubyev 2009-04-19 11:52:59

I have two lines. Line 1 is: set history=1000 "comment, and line 2 is: set history=1000. It cannot find those duplicates.

Masi 2009-04-19 12:05:49

You might just need to change the \w's to \S (Vim's notation for any non-whitespace character; [^\s] does not mean that inside a Vim [] collection) if your 'words' can consist of anything except whitespace.

cmptrgeekken 2009-04-19 14:12:56

@cmptrgeekken: The first comment refers to the first version of the answer. The current version handles that case.

Mykola Golubyev 2009-04-19 16:22:48

Answer 3

Do you mean: find any line for which the first two words are identical?

Try this (in Vim's default magic syntax, the group parentheses and the + must be escaped):

/^\(\w\+\) \1

or more generally

/^\(\w\+\)\s\+\1
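In Vim's default (magic) syntax, bare ( ) and + are literal characters, so Perl-style groups have to be written \( \) and \+, or the whole pattern can be switched to very magic with a leading \v. A minimal sketch that checks the very-magic form with matchlist() and =~#; the variable name pat is only for illustration, and the trailing > (end of word) is an addition so that a line like "dog doggy" is not reported:

```vim
" \v (very magic) lets ( ) + and > be written without backslashes.
" (\w+) captures the first word, \s+ skips the gap, \1 repeats the
" captured word, and > (end of word) rejects prefixes like 'dog doggy'.
let pat = '\v^(\w+)\s+\1>'

" matchlist() returns the whole match followed by the nine submatches.
echo matchlist('dog dog this line', pat)
" Prints: ['dog dog', 'dog', '', '', '', '', '', '', '', '']

" A prefix of the first word does not count as a repeat.
echo 'dog doggy not this line' =~# pat
" Prints: 0
```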

le dorfier 2009-04-19 14:15:18

He meant lines that are identical in their first two words.

Mykola Golubyev 2009-04-19 16:22:04

Answer 4

You're being a little ambiguous.

If you want to find lines where the second word is a repeat of the first word, e.g.
```
  dog dog this line
  dog cat not this line
  cat cat this line
  cat dog not this line
```
Then use the following regex:
```
  /^\s*\(\w\+\)\s\+\1
```
The first word is captured by the \(\w\+\), and matched again by the backreference \1.
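Before searching, you can try the pattern on the sample lines with filter(). A small sketch; samples and pat are illustrative names, and the pattern is written with \( \) as Vim's default magic syntax requires:

```vim
" The four sample lines from the example above.
let samples = ['dog dog this line', 'dog cat not this line',
      \ 'cat cat this line', 'cat dog not this line']

" ^\s* allows leading blanks, \(\w\+\) captures the first word,
" \s\+ skips the gap, and \1 requires the same word again.
let pat = '^\s*\(\w\+\)\s\+\1'

" Keep only the lines where the second word repeats the first.
echo filter(copy(samples), 'v:val =~# pat')
" Prints: ['dog dog this line', 'cat cat this line']
```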

If you want to group lines by their first two words, e.g.

  a a is the first line in group 'a a'
  a b is the first line in group 'a b'
  a b is the second line in group 'a b'
  -------------nogroup---------------
  a b is the third line in group 'a b'
  b b is the first line in group 'b b'
  a b is the fourth line in group 'a b'
       b b is the second line in group 'b b'

then :sort is your friend. However, if you just run :sort, you'll get this:

       b b is the second line in group 'b b'
  -------------nogroup---------------
  a a is the first line in group 'a a'
  a b is the first line in group 'a b'
  a b is the fourth line in group 'a b'
  a b is the second line in group 'a b'
  a b is the third line in group 'a b'
  b b is the first line in group 'b b'

Note how the fourth line in group 'a b' is placed right after the first one (alphabetically, "fourth" comes before "second" and "third"), and the second line in group 'b b' is placed at the very top because of its leading spaces. This is because :sort, by default, compares entire lines. To sort only by the first two words, and otherwise preserve the original order, use :sort /^\s*\zs\w\+\s\+\w\+/ r, which gives:

  -------------nogroup---------------
  a a is the first line in group 'a a'
  a b is the first line in group 'a b'
  a b is the second line in group 'a b'
  a b is the third line in group 'a b'
  a b is the fourth line in group 'a b'
  b b is the first line in group 'b b'
       b b is the second line in group 'b b'

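The ^\s*\zs skips leading whitespace (\zs marks where the match, and therefore the sort key, begins), and \w\+\s\+\w\+ makes the first two words the sort key. The r flag tells :sort to sort on the text the pattern matches, instead of on the text after the match. Lines that don't match at all, such as the nogroup line, keep their original order and end up before the sorted lines. For more, see :help :sort.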
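To reproduce this example without touching a real file, you can run the same :sort on a throwaway buffer. A sketch, assuming Vim 7 or later (where :sort exists); the sample lines are the ones above without their display indentation, sorted_lines is an illustrative variable name, and the final :bwipeout! simply discards the scratch buffer:

```vim
" Open an empty scratch buffer that is never written to disk.
new
setlocal buftype=nofile

" The example lines (the last one keeps its leading spaces).
call setline(1, [
      \ "a a is the first line in group 'a a'",
      \ "a b is the first line in group 'a b'",
      \ "a b is the second line in group 'a b'",
      \ "-------------nogroup---------------",
      \ "a b is the third line in group 'a b'",
      \ "b b is the first line in group 'b b'",
      \ "a b is the fourth line in group 'a b'",
      \ "     b b is the second line in group 'b b'"])

" Sort on the first two words only. Lines with equal keys keep their
" relative order, and non-matching lines stay in front in original order.
sort /^\s*\zs\w\+\s\+\w\+/ r

let sorted_lines = getline(1, '$')
echo join(sorted_lines, "\n")

" Throw the scratch buffer away.
bwipeout!
```

The result should match the sorted listing shown above, with the nogroup line first and each group in its original order.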

If you want to see what changed, there are two tactics that I can think of to help you:

You could save your file, sort the lines, save a copy under a different name, and then use Vim's built-in diff mode to compare the two:

:w                                 "save your file
:sort /^\s*\zs\w\+\s\+\w\+/ r      "sort it by the first two words
:w %.sorted                        "save the sorted version in a new file with a .sorted extension
:undo                              "undo your changes to the original
:vs %.sorted                       "open the sorted version in a new window
:windo diffthis                    "compare the two versions

However, this may not give you very useful feedback.

What might give you more useful feedback is to instead insert the line numbers before the sort, so you can see what line numbers in your original file went where. To do this, try the following:
```
:%s/^/\=line('.') . ' '
:sort /^\d\+\s*\zs\w\+\s\+\w\+/ r
```
The %s/^/\=line('.') . ' ' inserts the line number at the beginning of each line. Then the sort (slightly modified to ignore the line number) does its thing. For the above example, this produces:
```
4 -------------nogroup---------------
1 a a is the first line in group 'a a'
2 a b is the first line in group 'a b'
3 a b is the second line in group 'a b'
5 a b is the third line in group 'a b'
7 a b is the fourth line in group 'a b'
6 b b is the first line in group 'b b'
8      b b is the second line in group 'b b'
```
So now you know what line came from where.

rampion 2009-04-19 14:25:20

Thank you for your answer! The match :sort /^\s*\zs\w\+\s\+\w\+/ r is nearly perfect. Do you know how to ignore matches that start with if and end with endif?

Masi 2009-04-19 16:31:00

The comments in my vimrc at http://dpaste.com/35562/ seem to be the problem. I need to somehow keep them linked to the code they belong to.

Masi 2009-04-19 16:37:58
