views: 508
answers: 6

Page 38 of the book Linux 101 Hacks suggests:

cat url-list.txt | xargs wget -c

I usually do:

for i in `cat url-list.txt`
   do
      wget -c $i
   done

Is there something, other than length, where the xargs technique is superior to the good old for-loop technique in bash?

Added

The C source code of xargs seems to have only one fork. In contrast, how many forks does the bash combo have? Please elaborate on the issue.

A: 

One advantage I can think of is that, if you have lots of files, it could be slightly faster since you don't have as much overhead from starting new processes.

I'm not really a bash expert though, so there could be other reasons it's better (or worse).
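
A rough way to see that overhead (a sketch, not a benchmark; /bin/echo stands in for wget so that every call really forks a process, and nums.txt is a made-up input file):

seq 1 10000 > nums.txt
# one process per batch of arguments:
time xargs /bin/echo < nums.txt > /dev/null
# one process per line:
time while read -r n; do /bin/echo "$n"; done < nums.txt > /dev/null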

Matthew Crumley
+3  A: 

xargs will combine several arguments into a single command line, so one process handles a whole batch; the for loop, by contrast, forks a new process for each iteration. Avoiding those per-iteration forks can sometimes be a significant speedup.
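
A quick way to see the batching (a sketch; -n caps the number of arguments per invocation, and /bin/echo forces a real process each time):

printf '%s\n' a b c d e f | xargs -n 3 /bin/echo
# prints "a b c" then "d e f": two processes instead of six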

Jim Lewis
You mean xargs is only for performance? Hard to believe. There must be something else to it.
Masi
It's also somewhat shorter to type. And it has a big label that says "don't panic".
kdgregory
And to be serious: process creation isn't very cheap. With today's fast processors it's not noticeable, but go back 10-15 years, and you can see why xargs was a preferred solution.
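A rough illustration (a sketch; /bin/true forks a new process each time, while the shell builtin true does not):

time for i in {1..1000}; do /bin/true; done   # roughly 1000 forks
time for i in {1..1000}; do true; done        # none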
kdgregory
"only for performance?" Don't forget that `xargs` also handles large numbers of arguments for commands that can't.
Dennis Williamson
+4  A: 

From the Rationale section of a UNIX man page for xargs. (Interestingly, this section appears neither in the OS X BSD version of xargs nor in the GNU version.)

The classic application of the xargs utility is in conjunction with the find utility to reduce the number of processes launched by a simplistic use of the find -exec combination. The xargs utility is also used to enforce an upper limit on memory required to launch a process. With this basis in mind, this volume of POSIX.1-2008 selected only the minimal features required.

In your follow-up, you ask how many forks the other version will have. Jim already answered this: one per iteration. How many iterations are there? It's impossible to give an exact number, but easy to answer the general question. How many lines are there in your url-list.txt file?
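
In other words, assuming one URL per line, the fork count of the loop version is simply the line count:

wc -l < url-list.txt   # one wget process per line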

There are some other considerations, though. xargs requires extra care with filenames that contain spaces or other troublesome characters, and find's -exec has a terminator (+) that batches the processing by itself. So not everyone prefers xargs, and it may not be best for every situation.
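
A sketch of both points, using find's classic pairing with xargs (the -print0/-0 pair is a GNU/BSD extension that keeps filenames with spaces intact):

# one rm process per file: many forks
find . -name '*.tmp' -exec rm {} \;
# batched by find itself, via the + terminator
find . -name '*.tmp' -exec rm {} +
# batched by xargs, NUL-delimited so spaces in names are safe
find . -name '*.tmp' -print0 | xargs -0 rm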


Telemachus
+2  A: 

Also consider:

xargs -I'{}' wget -c '{}' < url-list.txt

but wget provides an even better means to the same end:

wget -c -i url-list.txt

With respect to the xargs-versus-loop consideration, I prefer xargs when the meaning and implementation are relatively "simple" and "clear"; otherwise, I use loops.

nicerobot
+2  A: 

xargs will also allow you to process a huge list in batches; passing the entire list to a single command invocation is not possible, because the kernel limits the total length of a command line.
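
A sketch of that limit (values vary by system; getconf ARG_MAX is POSIX, and on many Linux boxes the limit is about 2 MB):

getconf ARG_MAX   # the kernel's limit on one command's argument length
# /bin/echo $(seq 1 2000000)   would fail with "Argument list too long"
seq 1 2000000 | xargs /bin/echo > /dev/null   # xargs splits into as many calls as needed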

xilun
A: 

Depending on your internet connection, you may want to use GNU Parallel (http://www.gnu.org/software/parallel/) to run the downloads in parallel.

cat url-list.txt | parallel wget -c
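
For comparison, GNU xargs can also run jobs in parallel (a sketch; -P sets the number of simultaneous processes and -n 1 passes one URL to each wget):

xargs -P 4 -n 1 wget -c < url-list.txt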
Ole Tange