I have a large number of small files with sequential filenames, and I want to combine them into a single file. What is the fastest way to do this?

For example:

1.tgz.1 1.tgz.2 1.tgz.3  =>  1.tgz

+8  A: 

You could concatenate the files from the shell.

In Windows (/b for binary mode):

copy /b   1.tgz.1 + 1.tgz.2 + 1.tgz.3   1.tgz

In Unix/Linux:

cat   1.tgz.1 1.tgz.2 1.tgz.3   > 1.tgz
Zach Scrivena
That's taking quite some time... can it be optimized further?
I think cat is about the quickest operation you're going to get!
David Grant
Even if it could be optimized, I suspect the gain would be negligible compared to the time gunzip and tar spend on the big file.
mouviciel
On Unix/Linux, you can save some disk space by piping cat straight into tar, so the joined 1.tgz is never written to disk: cat 1.tgz.1 1.tgz.2 1.tgz.3 | tar xzf -
mouviciel
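To check that joining the parts with cat is lossless, here is a minimal sketch, assuming a POSIX-like shell with tar, gzip, seq, head and tail available. All of the data is hypothetical demo data: it builds a small archive, cuts it into three parts named like the ones in the question, joins them with the cat command from the first answer, and verifies the result.

```shell
#!/bin/sh
# Hypothetical demo: build a small archive, cut it into three parts,
# join them with cat, and verify the joined file.
set -e
cd "$(mktemp -d)"

mkdir data
printf 'hello\n' > data/a.txt
seq 1 2000 > data/b.txt
tar czf orig.tgz data

# Cut orig.tgz into three byte ranges: 1.tgz.1, 1.tgz.2, 1.tgz.3.
size=$(wc -c < orig.tgz)
third=$((size / 3))
head -c "$third" orig.tgz > 1.tgz.1
tail -c +"$((third + 1))" orig.tgz | head -c "$third" > 1.tgz.2
tail -c +"$((2 * third + 1))" orig.tgz > 1.tgz.3

# Join them exactly as in the first answer.
cat 1.tgz.1 1.tgz.2 1.tgz.3 > 1.tgz

cmp orig.tgz 1.tgz          # byte-for-byte identical to the original
gzip -t 1.tgz               # the gzip stream decompresses without errors
tar tzf 1.tgz > /dev/null   # tar can read the archive listing
```

gzip -t only checks the compressed stream; the tar tzf line additionally confirms that the tar structure inside it is readable.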
A: 

This is bash (your shell may vary):

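for n in *.tgz.* ; do cat "$n" >> "${n/tgz.*/tgz}" ; done

Note that the glob expands in lexical order, so this only joins the parts correctly when each archive has at most nine parts (1.tgz.10 sorts before 1.tgz.2).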
Brent.Longborough
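A sketch of a variant of the loop above that keeps its "one output file per archive prefix" behaviour but appends each archive's parts in numeric order. It assumes every archive has a part numbered .1 and that the numbering has no gaps; the demo files are hypothetical.

```shell
#!/bin/sh
# Append each archive's parts in numeric order, so that 1.tgz.10
# lands after 1.tgz.9 rather than after 1.tgz.1.
set -e
cd "$(mktemp -d)"

# Hypothetical demo data: two archives, the first split into 11 parts.
i=1
while [ "$i" -le 11 ]; do printf '%s\n' "$i" > "1.tgz.$i"; i=$((i + 1)); done
printf 'a\n' > 2.tgz.1
printf 'b\n' > 2.tgz.2

for first in *.tgz.1; do
  [ -e "$first" ] || continue   # no match: the glob stays literal, skip it
  base=${first%.1}              # 1.tgz.1 -> 1.tgz
  : > "$base"                   # start from an empty output file
  i=1
  while [ -e "$base.$i" ]; do   # stop at the first missing part number
    cat "$base.$i" >> "$base"
    i=$((i + 1))
  done
done
```

Truncating the output first means rerunning the script does not append a second copy, unlike the >> loop above.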
A: 

You might get better performance using dd with a large block size:

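for n in *.tgz.* ; do
  dd if="$n" of="somefile.tgz" bs=4M conv=notrunc oflag=append
done

oflag=append and the 4M size suffix are GNU dd features, and the same lexical-ordering caveat applies to the glob.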
codelogic
+2  A: 

If it's a large number of small files, you don't want to be typing out a huge number of arguments by hand.

Since most Unix shells sort wildcard expansions lexically (so 1.tgz.10 would come before 1.tgz.2), you should use:

cat 1.tgz.? 1.tgz.?? 1.tgz.??? >1.tgz

That assumes there are between 100 and 999 files inclusive; adjust the patterns to handle more or fewer (e.g., add 1.tgz.???? if there are between 1,000 and 9,999 inclusive). You're not going to get much better performance, since the bottleneck is disk I/O, which is far slower than the work cat does on the CPU.
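An alternative sketch that avoids both the per-width patterns and very long command lines: generate the part names in numeric order and let xargs batch them into as few cat runs as fit, all writing to one output. It assumes the parts are numbered from 1 with no gaps and that the prefix contains no whitespace or quote characters, since xargs splits its input on those. The demo data is hypothetical.

```shell
#!/bin/sh
# Join PREFIX.1 ... PREFIX.N in numeric order without relying on glob sorting.
set -e
cd "$(mktemp -d)"

# Hypothetical demo data: 120 tiny parts, enough to need three pattern widths.
i=1
while [ "$i" -le 120 ]; do
  printf '%s\n' "$i" > "1.tgz.$i"
  i=$((i + 1))
done

prefix=1.tgz
i=1
while [ -e "$prefix.$i" ]; do
  printf '%s\n' "$prefix.$i"   # emit part names in numeric order
  i=$((i + 1))
done | xargs cat > "$prefix"   # one redirect; xargs may run cat several times
```

Because the redirect applies to xargs as a whole, every cat invocation it starts appends to the same output in order.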

The only other possibilities I can think of are:

  • Create 1.tgz on a separate physical disk, so that reading the parts and writing the output can proceed in parallel instead of competing for the same drive.
  • Run as root and use nice to raise your priority (see man nice for details). This gives the process a larger share of the CPU, but again, if you're bound by disk I/O, that won't help much.
paxdiablo