views:

802

answers:

5

How can I return a list of files that are named duplicates, i.e. files that have the same name but in different case and that exist in the same directory?

I don't care about the contents of the files. I just need to know the location and name of any files that have a duplicate of the same name.

Example duplicates:

/www/images/taxi.jpg
/www/images/Taxi.jpg

Ideally I need to search all files recursively from a base directory. In the above example it was /www/.

+15  A: 

Try:

ls -1 | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "

Simple, really :-) Aren't pipelines wonderful beasts?

The ls -1 gives you the files one per line, the tr '[A-Z]' '[a-z]' converts all uppercase to lowercase, the sort sorts them (surprisingly enough), uniq -c collapses adjacent duplicate lines whilst giving you a count, and finally the grep -v " 1 " strips out those lines where the count was one.

When I run this in a directory with one "duplicate" (I copied qq to qQ), I get:

2 qq

For the "this directory and every subdirectory" version, just replace ls -1 with find . or find DIRNAME if you want a specific directory starting point (DIRNAME is the directory name you want to use).

This returns (for me):

2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3
2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3/%gconf.xml
2 ./.gnome2/accels/blackjack
2 ./qq

which are caused by:

pax> ls -1d .gnome2/accels/[bB]* .gconf/system/gstreamer/0.10/audio/profiles/[mM]* [qQ]?
.gconf/system/gstreamer/0.10/audio/profiles/mp3
.gconf/system/gstreamer/0.10/audio/profiles/MP3
.gnome2/accels/blackjack
.gnome2/accels/Blackjack
qq
qQ

Update:

Actually, on further reflection, the tr will lowercase all components of the path so that both of

/a/b/c
/a/B/c

will be considered duplicates even though they're in different directories.

If you only want duplicates within a single directory to show as a match, you can use the (rather monstrous):

perl -ne '
    chomp;
    @flds = split (/\//);        # split the path on "/"
    $lstf = $flds[-1];           # grab the last component (the filename)
    $lstf =~ tr/A-Z/a-z/;        # lowercase only that filename
    for ($i = 0; $i < $#flds; $i++) {
        print "$flds[$i]/";      # re-emit the directory part unchanged
    };
    print "$lstf\n";'

in place of:

tr '[A-Z]' '[a-z]'

What it does is to only lowercase the final portion of the pathname rather than the whole thing. In addition, if you only want regular files (no directories, FIFOs and so forth), use find -type f to restrict what's returned.
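
Putting those pieces together, the whole thing would look roughly like this (a sketch only; the '...' stands for the Perl snippet above):

find . -type f | perl -ne '...' | sort | uniq -c | grep -v " 1 "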

paxdiablo
Wow. That is one impressive command. Can't imagine ever being able to do that in Windows. I love *nix. Thank you so much.
Camsoft
You can do that on Windows just fine. Get yourself a copy of Cygwin or MinGW and enjoy :-)
paxdiablo
But you can't do it out of the box.
Camsoft
+1 for a good breakdown.
kjfletch
@paxdiablo could you edit your post as I accidentally removed my vote and now it won't let me re-vote unless the post is edited.
Camsoft
great answer, but 1 tiny suggestion for optimization: I think you don't need "-1" on ls if you're redirecting into a pipe.
Carl Smotricz
Actually, @Carl, I didn't realise that. I'm sure that never used to be the case with the older UNIXes; they would faithfully output the same to a pipe as to a tty. I suspect this changed in Linux. That could actually be rather annoying if you didn't expect it.
paxdiablo
On the off chance that multiple matching files have filenames containing " 1 ", you should use `grep -v "^ *1 "` (with the trailing space, so counts of 10, 11 and so on aren't stripped as well).
Dennis Williamson
People that put spaces in their file names should be beaten to death with a wet celery stick (to make the pain last longer). :-)
paxdiablo
Has anyone tried this using Cygwin? I'm not having luck with it, but I have a Windows box with a large directory with many, many subdirectories and need to search for dups... I loaded up Cygwin thinking Linux/Unix scripting would be more beneficial than bat scripting.
ProfessionalAmateur
@ProfessionalAmateur, What do you mean by not working? Is it generating an error? Taking too long? Something else? And which part? I ask because most of my scripts are actually _done_ in CygWin (and tested later under Ubuntu if necessary).
paxdiablo
@paxdiablo - Sorry, I should have clarified. It's not returning any results. I hit enter and it just goes to the next prompt line; it doesn't find any dups (I even manually created a dup just as a test).
ProfessionalAmateur
+10  A: 

I'm sorry, I don't have enough reputation to leave a comment yet.

The other answer is great, but instead of the "rather monstrous" Perl script I suggest:

perl -pe 's!([^/]+)$!lc $1!e'

This will lowercase just the filename part of the path.

Edit: In fact the entire problem can be solved with:

find | perl -ne 's!([^/]+)$!lc $1!e; print if 1 == $seen{$_}++'
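
Run against the directory tree from the question, it would print the lowercased path of whichever duplicate it meets second (illustrative output only, not captured from a real run):

find /www | perl -ne 's!([^/]+)$!lc $1!e; print if 1 == $seen{$_}++'
/www/images/taxi.jpg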
Christoffer Hammarström
+1. I would strongly suggest accepting *this one* as the answer (instead of my currently accepted answer). It's far more elegant. My final version's a monstrosity since I came at it from a pipeline viewpoint and had to add perl to get around a problem with tr. This answer is proof positive that you can often get far better solutions by stepping back and starting again.
paxdiablo
And use "find -type f" if you want it restricted to regular files(no directories).
paxdiablo
A: 

I believe

ls | sort -f | uniq -i -d

is simpler, faster, and will give the same result.

mpez0
Yes, for the current directory. But how about subdirectories? Note that you can only ignore case for the basename, not the entire path.
Christoffer Hammarström
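
To run the same check per directory and so address the subdirectory limitation, one rough sketch (the find/sh -c wrapper here is an addition of mine, not part of either answer; it prints only the clashing names, not which directory they're in):

find . -type d -exec sh -c 'ls "$1" | sort -f | uniq -i -d' _ {} \;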
A: 

Use Duplicate Finder 2009 to find and remove duplicate files... Fast and easy-to-use software. http://www.duplicate-finder-pro.com

asmrk
Mate, I'm not going to downvote this since you're relatively new here, but did you even read the question? That product you link to doesn't even _have_ a Linux version.
paxdiablo
A: 

Duplicate files are copies of original files that unnecessarily occupy precious hard disk space, but we should be careful while deleting these files as doing so may lead to human error. I can suggest one duplicate file remover: ASO's (Advanced System Optimizer) duplicate file remover. It will scan files based on the criteria or extensions chosen by the user. Advanced System Optimizer is a Windows optimization tool. Try downloading it from download.com or CNET; use http://download.cnet.com/Advanced-System-Optimizer/3000-2094_4-10147659.html?tag=mncol to download.

You can find duplicates on your network as well using this tool.

lombardo.g