I have a dataset contained in a directory that has about 30,000 sub-directories. Each of these directories contains a text file and another sub-directory, which in turn contains some number of text files (ranging from zero to hundreds). Many of my colleagues use this dataset, but as it stands it takes at least 6 hours to transfer the dataset from one of the computers/hard disks in the lab to another - not because of the size of the dataset, but because of the cumbersome format in which it is stored. I would like to create an archive (such as .tar.gz) to store these data so that they can be quickly transferred between computers. Has anyone worked with something like this before who can tell me the fastest, best way to do it? I am thinking that a shell script might be quicker than creating the archive by hand.
A:
Suggestion: NFS mount the directory. Then either a Windows box or a Unix box can access it.
Comment: Directory structures like that are hard on a filesystem's inodes, and they increase search times as well.
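If the NFS route suggested above is taken, a minimal sketch of mounting the dataset on a Unix client follows; the server name, export path, and mount point are hypothetical, and it assumes the directory is already exported (e.g. listed in /etc/exports on the server) and that an NFS client is installed:

# on the client, mount the lab server's export read-only (hypothetical names)
sudo mkdir -p /mnt/dataset
sudo mount -t nfs -o ro labserver:/data/dataset /mnt/dataset
# when finished:
sudo umount /mnt/dataset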
Answer: This will work on any POSIX-compliant Unix box, and it assumes there is just one base directory for your repository:
cd /path/to/archive; tar cvf myarchive.tar ./archive_dir; gzip myarchive.tar
This creates a relative-path tar archive - meaning you can unpack it into whatever directory you like, instead of having it extract to an absolute path off the root.
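As a usage note, if GNU tar is available (an assumption; the z option is not required by POSIX), the archive and compression steps above can be combined into a single pass, and unpacking on the destination machine works the same way. The paths below reuse the ones from the answer; the destination directory is just a placeholder and must already exist:

cd /path/to/archive
tar czf myarchive.tar.gz ./archive_dir    # create and gzip in one pass
# on the receiving machine, unpack into any existing directory:
tar xzf myarchive.tar.gz -C /some/destination

Either way, the win is the same: one large compressed archive copies as a single sequential file, which avoids the per-file overhead of moving 30,000+ small files and directories and is what makes the current transfers so slow.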
jim mcnamara
2010-06-30 19:38:18
Awesome, thank you
WordWalk
2010-06-30 21:03:11