I have a directory with a large number of sub-directories, like this: directory1 (sub-dir1, sub-dir2, sub-dir3, sub-dir4, sub-dir5, and so on; hundreds of them).

How do I find the average size of the sub-directories? And how do I find the maximum size among them?

All using Unix commands...

Thanks.

A: 

I once had an issue with ext3, which only allows 31998 sub directories per directory. Ext4 allows ~64k.

Willi
I have no more than 1,000 sub-directories under a directory, so this is not an issue.
Mircea
+1  A: 

If you only have directories and not files in directory1, then the following two "commands" should give you the size (in bytes) and name of the largest directory and the average of their sizes (in bytes), respectively.

$ du -sb directory1/* | sort -n | tail -n 1
$ du -sb directory1/* | awk ' { sum+=$1; ++n } END { print sum/n } '

If there are also ordinary files in directory1, the examples above will count them as well. If ordinary files should not be counted, the following might be more appropriate.

$ find directory1/ -mindepth 1 -maxdepth 1 -type d -exec du -sb {} \; | sort -n | tail -n 1
$ find directory1/ -mindepth 1 -maxdepth 1 -type d -exec du -sb {} \; | awk ' { sum+=$1; ++n } END { print sum/n } '
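Both figures can also be obtained in a single pass instead of running `du` twice. The following is only a sketch: `summarize_sizes` is a made-up helper name, and it assumes tab-separated `du` output and directory names that contain no tabs or newlines.

```shell
# Hypothetical helper: reads "size<TAB>name" lines (as printed by du -s)
# on stdin and prints the largest entry and the mean size.
summarize_sizes() {
    awk -F'\t' '
        # Force numeric comparison so that 9 is not treated as larger than 100.
        NR == 1 || $1 + 0 > max { max = $1 + 0; size = $1; name = $2 }
        { sum += $1 }
        END {
            if (NR == 0) { print "no input"; exit 1 }   # avoid dividing by zero
            printf "max: %s %s\n", size, name
            printf "avg: %.2f\n", sum / NR
        }'
}
```

It would be fed from the same pipeline as above, for example: find directory1/ -mindepth 1 -maxdepth 1 -type d -exec du -sb {} \; | summarize_sizes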
Mikael Auno
Does the second command give me the average size in KB? (I understand the first one gives it in bytes, right?)
Mircea
@Mircea All sizes are in bytes, that's what the `-b` flag to `du` does. If you instead want the sizes in KiB, replace `-b` with `-k`.
Mikael Auno
Well, for the first command (using it with -b) I get a max size of 9893072, which I guess is 9893 kilobytes (not kilobits, right?). If I run it with -k I get 992 (kilobytes or kilobits?). Which one is correct? The second command gives me 960 (the average) when I run it with -b. 960 bytes? That's not possible; I know all the sub-directories inside hold more than 960 bytes.
Mircea
And how can I eliminate certain sub-directories from being counted?
Mircea
@Mircea I made a slight mistake. `sort` should have `-n` added (as I have done now in the answer above) to make it sort numerically instead of lexicographically. Also, note that `-k` gives sizes in KiB, not in kB. One KiB is 1024 bytes, one kB is 1000 bytes.
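To see why the missing `-n` matters, here is a small sketch with three made-up `du`-style lines (the sizes and names are invented for illustration):

```shell
# Three made-up du-style lines: size, a tab, then a name.
sizes=$(printf '9\tsmall\n100\tlarge\n25\tmedium\n')

# Plain sort compares the lines as text, so "9..." ends up after "100...".
printf '%s\n' "$sizes" | sort | tail -n 1      # prints the "small" line

# With -n the leading numbers are compared numerically.
printf '%s\n' "$sizes" | sort -n | tail -n 1   # prints the "large" line
```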
Mikael Auno
@Mircea You can add arguments to `find` to filter out the specific subdirectories you want. See the man page for `find(1)` for more details on that.
Mikael Auno
I thought 1 byte = 8 bits, isn't it?
Mircea
@Mircea 1 byte = 8 bits, 1 KiB (kibibyte, what one normally means when saying kilobyte) = 1024 bytes, 1 kB (kilobyte) = 1000 bytes.
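As a quick check of those factors, the 9893072-byte figure quoted earlier can be converted like this. This is just a sketch, and `to_units` is a made-up function name:

```shell
# Hypothetical helper: print a byte count in KiB (factor 1024) and kB (factor 1000).
to_units() {
    awk -v b="$1" 'BEGIN {
        printf "%.1f KiB\n", b / 1024
        printf "%.1f kB\n",  b / 1000
    }'
}

to_units 9893072
# prints:
# 9661.2 KiB
# 9893.1 kB
```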
Mikael Auno
I looked at how to use find to exclude certain sub-directories, but I don't see how to combine it with the first command: du -sb directory/* | sort -n | tail -n 1
Mircea
@Mircea You put it together with my last example, the one actually using `find`. If you, for example, only want subdirectories that have the word "foo" in their names, then add `-name "*foo*"` after `-type d` (which specifies that it should be a directory).
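The opposite case, leaving out specific subdirectories, can be sketched by negating `-name` with `!`. The directory names below are made up and the tree is a throwaway one, so the example can be run anywhere. It assumes a `find` that supports `-mindepth` and `-maxdepth`, as GNU find does.

```shell
# Build a throwaway tree so the sketch is self-contained (names are made up).
top=$(mktemp -d)
mkdir "$top/sub-dir1" "$top/sub-dir2" "$top/keep-a" "$top/keep-b"

# "!" negates the test that follows it, so these two names are skipped.
kept=$(find "$top" -mindepth 1 -maxdepth 1 -type d \
        ! -name 'sub-dir1' ! -name 'sub-dir2' -exec du -sk {} \; | sort -n)
printf '%s\n' "$kept"

rm -rf "$top"
```

Because `-name 'sub-dir1'` matches the whole name, a directory called sub-dir10 would still be counted, whereas filtering the output with `grep -v 'sub-dir1'` would drop it too.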
Mikael Auno
Ok, for the first command I used this (I don't know how efficient it is, but it seems to do the job): `du -sk directory1/* | sort -n | grep -v 'sub-dir1' | grep -v 'sub-dir2' | grep -v 'sub-dir3' | tail -n 1`. For the second command I used this: `du -sk directory1/* | grep -v 'sub-dir1' | grep -v 'sub-dir2' | grep -v 'sub-dir3' | awk ' { sum+=$1; ++n } END { print sum/n; print n } '`. The result is 1779.92 and 957... I suppose 957 is the actual result, right?
Mircea
@Mircea I see now that I made another mistake: the second line of output from `awk` was just for my testing when I wrote the line. It simply outputs the number of entries accounted for in the average (remove the `print n` part to get rid of it). The average size is thus 1780 KiB or 1.74 MiB (if you meant that 1779.92 and 957 are both output by the second command).
Mikael Auno
Ok, thank you, Mikael!
Mircea
@Mircea A good way to say thank you on Stack Overflow is to accept the answer that you feel answers your question.
Mikael Auno
A: 

To get the largest size in KB (use -b instead of -k for bytes):

du -sk */ | sort -n | tail -n 1

To get the average size (KB):

du -sk */|awk '{s+=$1}END{print s/NR}'
ghostdog74
Doesn't awk kind of defeat the purpose of only using Unix commands? I mean, sure, it is one, but so is Perl.
swampsjohn
I don't understand what you are saying. Any tool or program that can be executed on the command line is technically a "Unix command". However, in the traditional sense, these commands (e.g. du, awk) were already part of Unix before languages like Perl and Python existed. To this day I still distinguish them as Unix commands (as in those in the Single UNIX Specification), as opposed to calling Perl or Python "Unix commands".
ghostdog74
But awk IS a programming language. It's Turing-complete.
swampsjohn
Yes, so? I still don't get your point. awk is a programming language, but it doesn't have libraries or modules to find disk usage (which is what du does).
ghostdog74