I'd like to profile some VCS software, and to do so I want to generate a set of random files in randomly arranged directories. I'm writing the script in Python, and my question, briefly, is: how do I generate a random directory tree with a given average number of sub-directories per directory and some broad distribution of files per directory?
Clarification: I'm not comparing different VCS repo formats (eg. SVN vs Git vs Hg), but profiling software that deals with SVN (and eventually other) working copies and repos.
The constraints I'd like are to specify the total number of files (call it 'N', probably ~10k-100k) and the maximum depth of the directory structure ('L', probably 2-10). I don't care how many directories are generated at each level, and I don't want to end up with 1 file per dir, or 100k all in one dir.
The distribution is something I'm not sure about, since I don't know whether VCSs (SVN in particular) perform better or worse with a very uniform structure or a very skewed one. Nonetheless, it would be nice to have an algorithm that didn't "even out" for large numbers.
My first thoughts were: generate the directory tree by some method, then uniformly populate the tree with files (treating each dir equally, with no regard to nesting). My back-of-the-envelope calcs tell me that if there are 'L' levels with 'D' subdirs per dir, and about sqrt(N) files per dir, then there will be about D^L dirs, so N ≈ sqrt(N)·D^L, which gives D ≈ N^(1/(2L)). So now I have an approximate value for 'D' — how can I generate the tree, and how do I populate the files?
I'd be grateful just for some pointers to good resources on algorithms I could use. My searching has only turned up pretty applets and Flash demos.