I often use the find command to search through source code, delete files, and so on. Annoyingly, because Subversion stores a duplicate of each file in its .svn/text-base/ directories, my simple searches return lots of duplicate results. For example, I want to recursively search for uint in multiple messages.h and messages.cpp files:

# find -name 'messages.*' -exec grep -Iw uint {} +
./messages.cpp:            Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./messages.cpp:    Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./messages.cpp:                Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./messages.cpp:            Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./messages.cpp:            Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./messages.cpp:        Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./messages.cpp:        for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./.svn/text-base/messages.cpp.svn-base:            Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./.svn/text-base/messages.cpp.svn-base:    Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base:                Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./.svn/text-base/messages.cpp.svn-base:            Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./.svn/text-base/messages.cpp.svn-base:            Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base:        Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base:        for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./virus/messages.cpp:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/messages.cpp:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/messages.h:    void _progress(const std::string &fileName, uint scanCount);
./virus/messages.h:    ProgressMessage(const std::string &fileName, uint scanCount);
./virus/messages.h:    uint        _scanCount;
./virus/.svn/text-base/messages.cpp.svn-base:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/.svn/text-base/messages.cpp.svn-base:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/.svn/text-base/messages.h.svn-base:    void _progress(const std::string &fileName, uint scanCount);
./virus/.svn/text-base/messages.h.svn-base:    ProgressMessage(const std::string &fileName, uint scanCount);
./virus/.svn/text-base/messages.h.svn-base:    uint        _scanCount;

How can I tell find to ignore the .svn directories?

+5  A: 

Create a script called ~/bin/svnfind:

#!/bin/bash
#
# Attempts to behave identically to a plain `find' command while ignoring .svn/
# directories.

OPTIONS=()
PATHS=()
EXPR=()

while [[ $1 =~ ^-[HLP]+ ]]; do
    OPTIONS+=("$1")
    shift
done

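# Keep the regex in a variable: a quoted right-hand side of =~ is matched
# literally in bash 3.2 and later.
PATH_END='^[-(),!]'
while [[ $# -gt 0 && ! $1 =~ $PATH_END ]]; do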
    PATHS+=("$1")
    shift
done

# If user's expression contains no action then we'll add the normally-implied
# `-print'.
ACTION=-print

while [[ $# -gt 0 ]]; do
    case "$1" in
       -delete|-exec|-execdir|-fls|-fprint|-fprint0|-fprintf|-ok|-print|-okdir|-print0|-printf|-prune|-quit|-ls)
            ACTION=;;
    esac

    EXPR+=("$1")
    shift
done

if [[ ${#EXPR[@]} -eq 0 ]]; then
    EXPR=(-true)
fi

find "${OPTIONS[@]}" "${PATHS[@]}" -name .svn -type d -prune -o '(' "${EXPR[@]}" ')' $ACTION

This script behaves like a plain find command, except that it prunes .svn directories.

Example:

# svnfind -name 'messages.*' -exec grep -Iw uint {} +
./messages.cpp:            Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./messages.cpp:    Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./messages.cpp:                Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./messages.cpp:            Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./messages.cpp:            Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./messages.cpp:        Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./messages.cpp:        for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./virus/messages.cpp:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/messages.cpp:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/messages.h:    void _progress(const std::string &fileName, uint scanCount);
./virus/messages.h:    ProgressMessage(const std::string &fileName, uint scanCount);
./virus/messages.h:    uint        _scanCount;
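
To make the rewriting concrete, here is a minimal sketch of the command the script is meant to build for the call svnfind . -name 'messages.*' -exec grep -Iw uint {} +. The function name svnfind_expanded is hypothetical and exists only so the command can be run on its own. Because the expression already contains -exec, no trailing -print is appended.

```shell
# Hypothetical name; wraps the find command that svnfind is meant to assemble for
#   svnfind . -name 'messages.*' -exec grep -Iw uint {} +
svnfind_expanded() {
    # OPTIONS is empty, PATHS is (.), and ACTION is empty because -exec is an action.
    find . -name .svn -type d -prune -o \
        '(' -name 'messages.*' -exec grep -Iw uint {} + ')'
}
```

The parentheses keep the user's expression together, so an -o inside it cannot accidentally pair with the -prune clause.
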
John Kugelman
+20  A: 

For searching, can I suggest you look at ack? It's a source-code-aware find, and as such it automatically ignores many file types, including source code repository metadata such as the .svn directories above.

Brian Agnew
+1 `ack` is your friend, doing `grep` and `find` at once.
MikeSep
Oooooooooooooh!
John Kugelman
I like `ack` very much, but I have found it to be substantially slower than `find -type f -name "*.[ch]" | xargs grep` when dealing with a large codebase.
John Ledbetter
Try findrepo for speed. http://www.pixelbeat.org/scripts/findrepo
pixelbeat
John, I'm the author of ack, and if you can give me details of the speed problems of ack vs. grep, I'd appreciate it. They've been completely comparable in all the cases I've found. Either let me know at http://github.com/petdance/ack/issues or email me at andy at petdance.com. Thanks.
Andy Lester
+16  A: 

As follows:

find . -path '*/.svn*' -prune -o -print

Or, alternatively, match on the directory name rather than on a path pattern:

find . -name .svn -a -type d -prune -o -print
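
A small sketch of why the explicit -print matters, assuming a start directory passed as the first argument. Both function names are hypothetical. Without an explicit action, find wraps the whole expression in an implied -print, and the pruned .svn directory itself then shows up in the output.

```shell
# Hypothetical helpers illustrating explicit vs. implied -print with -prune.

# Explicit -print on the right-hand side of -o: .svn is pruned and never printed.
with_explicit_print() {
    find "$1" -type d -name .svn -prune -o -print
}

# No explicit action: find behaves like '( ... ) -print', so the pruned
# .svn directory (for which -prune returned true) is printed as well.
with_implicit_print() {
    find "$1" -type d -name .svn -prune -o -type f
}
```

In both cases nothing below .svn is visited; the difference is only whether the .svn entry itself is listed.
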
Kaleb Pederson
@Kaleb: Hi. I suggest **`find . -type d -name .svn -prune -o -print`** because it is a little bit faster. According to the [POSIX standard](http://www.opengroup.org/onlinepubs/9699919799/utilities/find.html), the expressions are evaluated one by one, in the order specified. If the first expression in `-a` is `false`, the second expression will not be evaluated (also called [short-circuit and evaluation](http://en.wikipedia.org/wiki/Short-circuit_evaluation)).
Siu Ching Pong - Asuka Kenji
@Kaleb: As comparing the **file type** (equivalent to testing whether a bit is set in an integer) is **faster** than comparing the **filename** (equivalent to a string comparison, which is O(n)), putting `-type d` before `-name .svn` is theoretically more efficient. However, it is usually insignificant except if you have a very very big directory tree.
Siu Ching Pong - Asuka Kenji
@Siu - Good point. Similarly, if you have any check that can be quickly performed (e.g. O(1)) and will avoid many additional checks, it's a good idea to place that check first.
Kaleb Pederson
+2  A: 

With GNU find:

find . ! -regex ".*[/]\.svn[/]?.*"
ghostdog74
+1  A: 

Try findrepo, which is a simple wrapper around find/grep and much faster than ack. You would use it in this case like:

findrepo uint 'messages.*'
pixelbeat
+1  A: 

find ... | grep -v .svn

me
You have to escape `.` in the `.svn` regexp.
vladr
+3  A: 

Why don't you pipe your command through grep, which is easy to understand:

your find command| grep -v '\.svn'
Vijay Sarathi
You have to escape `.` in the `.svn` regexp.
vladr
@Vlad Are you sure?
yclian
@Yclian without the shadow of a doubt; if you don't, directories called 'tsvn', '1svn', 'asvn' etc. will also be ignored since '.' is a regexp wildcard: 'match any character'.
vladr
Alright, I thought it would only happen for the case of -E and -G. I just tested, my bad. :(
yclian
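
The point about the unescaped dot can be checked with a quick sketch. The three paths below are made up for illustration; only the two grep patterns come from the discussion.

```shell
# Hypothetical sample paths: one real .svn entry, one directory merely ending in "svn".
paths='./tsvn/notes.txt
./.svn/entries
./src/main.c'

# Unescaped: "." matches any character, so "tsvn" is filtered out too.
unescaped=$(printf '%s\n' "$paths" | grep -v '.svn')

# Escaped: only a literal ".svn" is filtered out.
escaped=$(printf '%s\n' "$paths" | grep -v '\.svn')

printf 'unescaped:\n%s\nescaped:\n%s\n' "$unescaped" "$escaped"
```

With the unescaped pattern only ./src/main.c survives; with the escaped one ./tsvn/notes.txt is kept as well.
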
+1  A: 

wcfind is a find wrapper script that I use to automagically remove .svn directories.

dave
A: 

Just thought I'd add a simple alternative to Kaleb's and others' posts (which detailed the use of find's -prune option, ack, findrepo, etc.). It is particularly applicable to the usage you have described in the question (and any similar usage):

  1. For performance, you should always try to use find ... -exec grep ... + (thanks Kenji for pointing this out) or find ... | xargs egrep ... (portable) or find ... -print0 | xargs -0 egrep ... (GNU; works on filenames containing spaces) instead of find ... -exec grep ... \;.

    The find ... -exec ... + and find | xargs forms do not fork egrep for each file, but rather for a bunch of files at a time, resulting in much faster execution.

  2. When using the find | xargs form you can also use grep to easily and quickly prune .svn (or any directory or regular expression), i.e. find ... | grep -v '/\.svn' | xargs egrep ... (useful when you need something quick and can't be bothered to remember how to set up find's -prune logic). With -print0 the list is NUL-separated, so grep needs GNU's -z option to filter it entry by entry: find ... -print0 | grep -zv '/\.svn' | xargs -0 egrep ...

    The find | grep | xargs approach is similar to GNU find's -regex option (see ghostdog74's post), but is more portable (will also work on platforms where GNU find is not available.)
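
A rough way to see the batching from point 1, assuming a directory with a handful of regular files. The helper names are hypothetical; each one counts how many times find starts the command given to -exec.

```shell
# Hypothetical helpers: the inner sh prints "run" once per process started,
# so counting those lines counts the number of forks.

# One process per file: -exec ... {} \;
runs_per_file() {
    find "$1" -type f -exec sh -c 'echo run' sh {} \; | grep -c run
}

# One process for a whole batch of files: -exec ... {} +
runs_batched() {
    find "$1" -type f -exec sh -c 'echo run' sh {} + | grep -c run
}
```

For a small directory the batched form starts the command once, while the per-file form starts it once for every file found.
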

Cheers, V.

vladr
@Vlad: Please notice that there are two forms for the `-exec` switch in `find`: one is ending with `;` and the other is ending with `+`. The one ending with `+` replaces `{}` by a list of all matching files. Besides, your regex **`'/\.svn'`** matches file names like **`'.svn.txt'`** too. Please refer to my comments to the question for more information.
Siu Ching Pong - Asuka Kenji
@Vlad: [Here](http://www.opengroup.org/onlinepubs/9699919799/utilities/find.html) is the POSIX standard for the **`find`** utility. Please see the **`-exec`** part :-).
Siu Ching Pong - Asuka Kenji
@Kenji, thank you for the link! :)
vladr
+2  A: 

I use grep for this purpose. Put this in your ~/.bashrc:

export GREP_OPTIONS="--binary-files=without-match --color=auto --devices=skip --exclude-dir=CVS --exclude-dir=.libs --exclude-dir=.deps --exclude-dir=.svn"

grep automatically uses these options on every invocation.
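
Newer GNU grep releases treat GREP_OPTIONS as obsolescent, so on those systems a shell function is one possible substitute. This is only a sketch: the name sgrep is hypothetical, and the options are the ones listed above.

```shell
# Hypothetical wrapper name; passes the same default options explicitly
# instead of relying on the GREP_OPTIONS environment variable.
sgrep() {
    grep --binary-files=without-match --color=auto --devices=skip \
         --exclude-dir=CVS --exclude-dir=.libs --exclude-dir=.deps \
         --exclude-dir=.svn "$@"
}
```

A recursive search such as sgrep -rw uint . then skips the .svn directories without touching the environment of other programs that call grep.
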

Ronny
A: 

Why not just:

find . -not -iwholename '*.svn*'

The -not predicate excludes every path that has .svn anywhere in it.

So in your case it would be

find -not -iwholename '*.svn*' -name 'messages.*' -exec grep -Iw uint {} +
whaley