I'm writing a simple program to run through a bunch of files in various directories on my system. It basically involves opening them up and checking for valid XML. One of the options of this program is to list bad XML files.

This leads me to my question: what is the best output format for use with xargs? I thought putting each entry on its own line would be good enough, but it gets confusing because the filenames all contain spaces.

So say my output is:

./dir name 1/file 1.xml
./dir name 2/file 2.xml
./dir name 3/file 3.xml

I tried the following command, but it keeps saying "No such file or directory".

./myprogram.py --list BADXML | xargs -d '\n' cat

So either I am misunderstanding how to use xargs, or I need to slightly change the format of my program's output. I am not sure which is the best (easiest to use) route to take here. I would hate to have to always type a mess of xargs options if I can avoid it.

+1  A: 

man xargs

--null

-0 Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally). Disables the end of file string, which is treated like any other argument. Useful when input items might contain white space, quote marks, or backslashes. The GNU find -print0 option produces input suitable for this mode.

Dyno Fu
That doesn't work either. Well, I suppose I could null-terminate my output. That doesn't make for a very pretty list, though, if I am not using xargs.
PKKid
Add a `-0` option to your program. `./myprogram.py -0 --list BADXML | xargs -0 cat`
Dennis Williamson
I am getting very close. I added a -0 option which outputs "\0".join(listItems) via Python. I am running the command "./myprogram.py -0 --list BADXML | xargs -0 ls -l". Everything works except the very last item, which reports "No such file or directory". If I change the Python output line to "\0".join(listItems) + "\0" to fix that last item, it works for the last item, but then I still get "No such file or directory" on "" (a blank line).
PKKid
Got it! -- I needed to use sys.stdout.write() when printing out the -0 option. Using print adds a newline char at the end of everything.
PKKid
@PKKid, if you are using Python, why don't you do the formatting in Python?
ghostdog74
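The fix described in the comments above can be sketched in Python roughly as follows. This is only a minimal illustration, not PKKid's actual code: the function names `format_paths` and `write_paths` and the `null_terminated` parameter are hypothetical. The point it shows is that every entry, including the last, gets exactly one terminator, and that the text is written with `write()` so no extra newline is appended.

```python
import sys


def format_paths(paths, null_terminated=False):
    """Return the listing as one string, ending every entry with a terminator.

    With null_terminated=True each entry ends in "\0" (for xargs -0);
    otherwise each entry ends in "\n" for human-readable output.
    """
    terminator = "\0" if null_terminated else "\n"
    return "".join(path + terminator for path in paths)


def write_paths(paths, null_terminated=False, stream=sys.stdout):
    # write() emits exactly the given text; print() would append a newline,
    # which xargs -0 would then treat as part of one more (bogus) item.
    stream.write(format_paths(paths, null_terminated))
```

Called with `null_terminated=True` when `-0` is given, this would let `./myprogram.py -0 --list BADXML | xargs -0 cat` receive each filename intact, while the default output stays one entry per line.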
A: 

You could ditch xargs, and use read:

./myprogram.py --list BADXML | while IFS= read -r line; do cat "$line"; done

Anything xargs can do, while-read loops can do better...

Postscript: Actually, maybe xargs is as essential a part of the Unix power toolkit as while-read loops; see my question "When should xargs be preferred over while-read-loops", whose answers gave a very strong efficiency case for xargs. The choice is between malleability/flexibility and speed.

Charles Stewart
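To make the efficiency point in the postscript concrete: a while-read loop starts one process per line, whereas xargs packs many arguments into each invocation. The sketch below mimics that batching in Python. It is a simplification under stated assumptions: the helper names `batch_commands` and `run_batched` are hypothetical, and the fixed `batch_size` stands in for the command-line length limits that the real xargs uses to decide how many arguments fit.

```python
import subprocess


def batch_commands(command, paths, batch_size=100):
    """Build argv lists that pass up to batch_size paths per invocation.

    command is a list such as ["cat"]; each path is kept as a single
    argument, so spaces in filenames are preserved.
    """
    return [
        command + paths[i:i + batch_size]
        for i in range(0, len(paths), batch_size)
    ]


def run_batched(command, paths, batch_size=100):
    # One process per batch, rather than one per file as in a while-read loop.
    for argv in batch_commands(command, paths, batch_size):
        subprocess.run(argv, check=False)
```

With three files and `batch_size=2`, `batch_commands(["cat"], files, 2)` yields two invocations instead of the three a per-line loop would make.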
A: 

With GNU Parallel (http://www.gnu.org/software/parallel/) you should be able to do it with no change to myprogram.py:

./myprogram.py --list BADXML | parallel cat

Added bonus: the cat will run in parallel and may thus be faster on multicore computers.

Ole Tange