views:

257

answers:

6

On a Linux machine I would like to traverse a folder hierarchy and get a list of all of the distinct file extensions within it.

What would be the best way to achieve this from a shell?

+2  A: 

Try this (not sure if it's the best way, but it works):

find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u

It work as following:

  • Find all files from current folder
  • Prints extension of files if any
  • Make a unique sorted list
Ivan Nevostruev
+1  A: 

Reursive version:

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u

If you want totals (how may times the extension was seen):

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort | uniq -c | sort -rn

Non-recursive (single folder):

for f in *.*; do printf "%s\n" "${f##*.}"; done | sort -u

I've based this upon this forum post, credit should go there.

ChristopheD
you are going to execute bash for each file name you find??
Good point, changed the solution...
ChristopheD
+2  A: 

Find everythin with a dot and show only the suffix.

find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u

if you know all suffix have 3 characters then

find . -type f -name "*.???" | awk -F. '{print $NF}' | sort -u

or with sed shows all suffixes with one to four characters. Change {1,4} to the range of characters you are expecting in the suffix.

find . -type f | sed -n 's/.*\.\(.\{1,4\}\)$/\1/p'| sort -u
No need for the pipe to 'sort', awk can do it all: find . -type f -name "*.*" | awk -F. '!a[$NF]++{print $NF}'
SiegeX
And it's output is also uniq! Nice!
+1  A: 

Since there's already another solution which uses Perl:

If you have Python installed you could also do (from the shell):

python -c "import os;e=set();[[e.add(os.path.splitext(f)[-1]) for f in fn]for _,_,fn in os.walk('/home')];print '\n'.join(e)"
ChristopheD
+1  A: 

None of the replies so far deal with filenames with newlines properly (except for ChristopheD's, which just came in as I was typing this). The following is not a shell one-liner, but works, and is reasonably fast.

import os, sys

def names(roots):
    for root in roots:
        for a, b, basenames in os.walk(root):
            for basename in basenames:
                yield basename

sufs = set(os.path.splitext(x)[1] for x in names(sys.argv[1:]))
for suf in sufs:
    if suf:
        print suf
Lars Wirzenius
+2  A: 

Powershell: dir -recurse | select-object extension -unique

thanks to http://kevin-berridge.blogspot.com/2007/11/windows-powershell.html

Simon R
Hey that's pretty cool, and very readable.
GloryFish
Thanks for linking to my blog!
Kevin Berridge