I have a dataset contained in a directory that has about 30,000 sub-directories. Each of these directories contains a text file and another sub-directory, which in turn holds some number of text files (from zero up to hundreds). Many of my colleagues use this dataset, but as it stands it takes at least 6 hours to transfer it from one of the computers/hard disks in the lab to another - not because of the size of the dataset, but because of the cumbersome format in which it is stored. I would like to create an archive (such as a .tar.gz) to store these data so that they can be quickly transferred between computers. Has anyone worked with something like this before who can tell me the fastest, best way to do it? I am thinking that a shell script might be quicker than creating the archive by hand.

A: 

Suggestion: NFS-mount the directory. Then either a Windows box or a Unix box can access it.
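For example, a minimal sketch of the NFS approach on a Linux server, assuming the dataset lives in /data/dataset and the server is reachable as labserver (both names are placeholders):

# on the server: export the directory read-only and reload the export table
echo '/data/dataset  *(ro,sync)' >> /etc/exports
exportfs -ra

# on each client: mount the exported directory
mkdir -p /mnt/dataset
mount -t nfs labserver:/data/dataset /mnt/dataset

This makes the data reachable without copying it, though reading that many small files over NFS will still be slow.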

Comment: directory structures like that are hard on a filesystem's inodes, and they increase search times as well.
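If you want to see how heavy the tree really is, a quick check (the paths are placeholders):

# inode usage on the filesystem holding the dataset
df -i /path/to/archive

# number of files and directories in the tree itself
find /path/to/archive/archive_dir | wc -l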

Answer: This will work on any POSIX-compliant Unix box, and it assumes there is just one base directory for your repository:

cd /path/to/archive; tar cvf myarchive.tar ./archive_dir; gzip myarchive.tar

This creates a relative-path tar archive, meaning you can unpack it under whatever directory you like instead of from the root.
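For reference, a sketch of the same idea done in a single step and shipped to another machine; GNU and BSD tar can gzip on the fly with the z flag, and otherbox and /some/destination are placeholders:

# create and compress the archive in one pass
cd /path/to/archive
tar czvf myarchive.tar.gz ./archive_dir

# copy the single archive file and unpack it on the receiving machine
scp myarchive.tar.gz otherbox:/some/destination/
ssh otherbox 'cd /some/destination && tar xzvf myarchive.tar.gz'

Transferring one large .tar.gz avoids the per-file overhead that makes copying 30,000 directories so slow.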

jim mcnamara
Awesome, thank you
WordWalk