+1  Q: 

Unix cat command

I am joining about 20 files with a total size of 40Gb using the following command.

cat hda1.ntfs-ptcl-img.gz.* > hda1.ntfs-ptcl-img.gz

Just wondering how long this process should usually take as it has been running for some time now.

Thanks.

A: 

In a different terminal, check the size of the output file using "ls -al", that way you'll know how far along you are.
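To turn that file size into a rough percentage, one option is to compare it with the combined size of the input parts. The following is only a minimal sketch: the function names total_bytes and join_progress are made up for illustration, and it assumes the parts are regular files whose sizes add up to the final output size.

```shell
# Sum the sizes (in bytes) of all files given as arguments.
total_bytes() {
  total=0
  for f in "$@"; do
    size=$(wc -c < "$f")
    total=$((total + size))
  done
  echo "$total"
}

# Print how far along the join is, as a whole-number percentage.
# $1 is the output file; the remaining arguments are the input parts.
join_progress() {
  out=$1
  shift
  expected=$(total_bytes "$@")
  written=$(( $(wc -c < "$out") ))
  if [ "$expected" -eq 0 ]; then
    echo "no input bytes found"
    return 1
  fi
  echo "$written of $expected bytes ($((written * 100 / expected))%)"
}
```

With the file names from the question, the call would look like: join_progress hda1.ntfs-ptcl-img.gz hda1.ntfs-ptcl-img.gz.*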

For a progress bar, try "pv" (pipe viewer): http://www.catonmat.net/blog/unix-utilities-pipe-viewer/

dmazzoni
This is on a live cd of clonezilla so there is only one terminal. Great advice though, wouldn't have thought of that.
Daniel
A: 

You can write a slightly more elaborate loop that shows which file is currently being processed. It will not tell you how long the whole process takes, but it is more verbose than your single command.

for f in hda1.ntfs-ptcl-img.gz.*; do
     echo "$f"
     cat "$f" >> hda1.ntfs-ptcl-img.gz
done
kogut
This should be >> rather than > if you want them all to be appended to the output file.
dmazzoni
Yes you were right. Thank you. I fixed it.
kogut
So using > won't work at all?
Daniel
Right, it won't work inside the loop. Use >> instead.
kogut
Well, it will if the number of files == 1 ;-) But if you already started it with `>`, be sure to remove the output file before restarting with `>>`.
Michael Krelin - hacker
`>>` means "append to", so if the file exists, the extra output from this command will be appended to the end. Hence, as Michael said, if the file already exists (perhaps because you started this earlier and then cancelled it), make sure you remove it first. `>` means "send output to": if the file already exists, it is first truncated to zero length; otherwise a new empty file is created. If you do this inside a for loop, each pass through the loop will destroy what you put in the file last time through.
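A small sketch of the difference, using two hypothetical helper functions (join_with_truncate and join_with_append are names invented here, not standard commands):

```shell
# Join parts with '>' inside the loop: each pass truncates the output,
# so only the last part survives.
join_with_truncate() {
  out=$1
  shift
  for part in "$@"; do
    cat "$part" > "$out"
  done
}

# Join parts with '>>': empty the output once, then append each part.
join_with_append() {
  out=$1
  shift
  : > "$out"            # discard leftovers from an earlier, cancelled run
  for part in "$@"; do
    cat "$part" >> "$out"
  done
}
```

Emptying the output once before the loop has the same effect as removing a stale file by hand before restarting.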
James Polley
+7  A: 

Press Ctrl+Z to pause the job, then use the 'bg' command to run it in the background. Then you can use 'ls -l' to see the size of the output file, or 'ls -l /proc/*/fd | grep hda1' to show you which file is being processed.
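If the 'ls -l /proc/*/fd' output is too noisy, the same lookup can be wrapped in a small function. This is only a sketch under some assumptions: it is Linux-only (it relies on the /proc/PID/fd symlinks), it needs readlink to be available, and open_files_matching is a name invented for this example.

```shell
# Print "PID path" for every open file descriptor whose target path
# contains the given string. Descriptors we are not allowed to read
# (other users' processes) are silently skipped.
open_files_matching() {
  pattern=$1
  for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
      *"$pattern"*)
        pid=${fd#/proc/}          # strip the leading /proc/
        echo "${pid%%/*} $target" # keep only the PID before the next slash
        ;;
    esac
  done
}
```

Running open_files_matching hda1 while the cat job is in the background should list both the part currently being read and the output file being written.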

Roger Lipscombe
Thanks, that helped heaps. Will be using this command a lot more often.
Daniel
+4  A: 

Just wondering how long this process should usually take

That's impossible to answer on the information provided here: it depends on where you're reading from, where you're writing to, etc. If you're both reading from and writing to a local disk, there's going to be some contention. If you're reading or writing across a network, it could be even slower, depending on what your network speed is.

To get some more information out of the process, you could break this from a single cat command into a for loop:

for file in hda1.ntfs-ptcl-img.gz.*
do
  echo "Starting $file at $(date)"
  cat "$file" >> hda1.ntfs-ptcl-img.gz
done

Or, you can use the pv (pipe viewer) utility to get some more information out of your pipeline. From man pv:

pv allows a user to see the progress of data through a pipeline, by giving information such as time elapsed, percentage completed (with progress bar), current throughput rate, total data transferred, and ETA.

To use it, insert it in a pipeline between two processes, with the appropriate options. Its standard input will be passed through to its standard output and progress will be shown on standard error.

pv will copy each supplied FILE in turn to standard output (- means standard input), or if no FILEs are specified just standard input is copied. This is the same behaviour as cat(1).

So, just replace cat in your commandline with pv:

pv hda1.ntfs-ptcl-img.gz.* > hda1.ntfs-ptcl-img.gz

Since you've already started this off though, hints for what to do next time aren't particularly helpful. Instead, you can put the running job in the background (press Ctrl+Z, then run bg at the prompt to make the job keep going in the background). If you're lucky, your livecd will have watch, so you can run watch ls -lh hda1.ntfs-ptcl-img.gz. By default this runs ls every two seconds and updates the screen with the output, so you can watch the file growing over time.

If you don't have watch installed, use poor man's watch:

while true
do
  clear
  date
  ls -l hda1.ntfs-ptcl-img.gz
  sleep 3
done

You'll still have to figure out for yourself how quickly bytes are being written (and therefore how much time you have left).
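That arithmetic can be scripted too. A rough sketch, assuming you know the expected final size in bytes (for a plain concatenation, the sum of the part sizes); eta_seconds and report_eta are names made up for this example, and the estimate assumes the write rate stays roughly constant:

```shell
# Estimate seconds remaining from two size samples.
# Arguments: bytes at first sample, bytes at second sample,
# seconds between the samples, expected final size in bytes.
eta_seconds() {
  first=$1
  second=$2
  gap=$3
  final=$4
  delta=$((second - first))
  if [ "$delta" -le 0 ]; then
    echo "unknown"      # no growth observed, so there is no rate to extrapolate
    return
  fi
  echo $(( (final - second) * gap / delta ))
}

# Sample the output file twice and print the rate and a rough ETA.
# Arguments: output file, expected final size, seconds between samples (default 10).
report_eta() {
  out=$1
  expected=$2
  interval=${3:-10}
  before=$(( $(wc -c < "$out") ))
  sleep "$interval"
  after=$(( $(wc -c < "$out") ))
  rate=$(( (after - before) / interval ))
  echo "rate: $rate bytes/s, ETA in seconds: $(eta_seconds "$before" "$after" "$interval" "$expected")"
}
```

For example, if the file grew from 100 to 300 bytes in 10 seconds and should end at 1300 bytes, eta_seconds 100 300 10 1300 prints 50.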

James Polley
And by the time I finish writing this, others have pointed to pv (which may or may not exist on the livecd), and come up with similar for loops (albeit lacking the datestamp, which I like because I'm easily distracted and forget when I started things). Great minds think alike!
James Polley
A: 

The short answer is, probably an hour or so.

If you're doing it locally and you haven't gotten back to your prompt, it's still working. If you're doing it remotely, there's some chance (dependent on a bunch of factors like whether you're connected to a wireless router) your connection has timed out.

Like others have said, a lot of it's dependent on how many files are being joined, whether it's local or remote, what other processes are happening on your machine, what your processor, clock speed, and RAM are at, et cetera.

Easiest way to make sure it's doing something is to open another terminal window and run "ls -l /path/to/file.name" periodically and see if the output file is getting larger. You can also run "top -p PID" (replacing PID with the actual process ID) to monitor just that process using top, and as long as it's still running, it's still doing something.

gabrielk
A: 

Similar to Roger's answer, I will run watch -n 10 "ls -l", which invokes "ls -l" every 10 seconds and lets me watch my file grow. Or I'll use watch "du -sh".

I don't really know if that's a 'good' thing to do during a file operation (I guess it might slow things down a little bit?) but it works for me.

rascher