I have these files in a directory: y y1 y2 y3

Running this command:
ls y* | xargs -i basename {}|xargs -i sed "s/{}//g"

produces this:
1
2
3

Can someone explain why?! I expected it to produce nothing - running sed four times, once for each file, and removing the file name each time. But actually it looks like it's applying sed with {} set to the first file, on a list of y1 y2 y3

This is Solaris 10

+1  A: 

The input to the xargs -i sed... command is:

y
y1
y2
y3

xargs will read the line y and execute sed s/y//g, which reads from standard input. Standard input is inherited, so sed shares the same pipe that xargs is reading from and may be able to read the remaining input:

y1
y2
y3

The command sed s/y//g will remove the y from each line:

1
2
3

However, if xargs consumes all the input before the first sed command is executed, then the sed command will have no input left to read and will do nothing.

mark4o
+2  A: 

When I try this on my Linux box, I get inconsistent results. Sometimes 123, sometimes (most of the time) 23, sometimes 12. This is a subtle buffering race condition between the rightmost xargs and any of the seds it spawns.

Dissecting the command line:

  • ls y* will output 4 lines, y, y1, y2 and y3; buffering not relevant
  • xargs -i basename {} will read them and launch, in a sequence, basename y, basename y1, basename y2, basename y3; output, same as input in our case, is line-buffered as each line comes from a different process.
  • xargs -i sed "s/{}//g", for each line X it reads (more on that later), launches sed "s/X//g"
  • each sed "s/X//g" filters out each X it sees in the lines it reads

Where it gets tricky: the last two commands read input from the same stream. That stream is produced by multiple different processes in a sequence. Depending on a multitude of factors (system load, scheduling), the output could come out in very different timing patterns.

Let's suppose they're all very fast. Then all four lines might be available for the right xargs to read in a single block. In that case, there would be no input left for any of the seds to read, hence no output at all.

On the other hand, if they were very slow, there might be only one line available for the right xargs on its first read attempt. That line would be "y". xargs would spawn the first sed as sed "s/y//g", which would consume all remaining input (y1, y2, y3), strip y's, and output 1, 2, 3. Here's the same explanation again, with more explicit sequencing.

  1. first basename writes "y".
  2. right xargs reads "y", spawns sed s/y//g. xargs now waits for sed to complete.
  3. second basename writes "y1"; sed reads "y1", writes "1"
  4. third basename writes "y2"; sed reads "y2", writes "2"
  5. fourth basename writes "y3"; sed reads "y3", writes "3"
  6. left xargs is done; sed reads EOF and stops
  7. right xargs tries to continue, reads EOF and stops
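
The two extremes can be reproduced deterministically without xargs. This is only a sketch: the shell's `read` stands in for the right xargs taking a single line, `cat >/dev/null` stands in for an xargs that drains the whole pipe before sed starts, and the function names are made up for illustration.

```shell
# Hypothetical helpers emulating the two extremes described above.

# "Slow" case: the first reader takes only the line "y"; sed inherits
# the same stdin and processes whatever is left in the pipe.
slow_case() {
    printf 'y\ny1\ny2\ny3\n' | {
        read -r first            # consumes just the first line
        sed "s/$first//g"        # reads the remaining lines from the same pipe
    }
}

# "Fast" case: the first reader drains the pipe, so sed sees EOF at once.
fast_case() {
    printf 'y\ny1\ny2\ny3\n' | {
        cat >/dev/null           # consumes every line
        sed 's/y//g'             # nothing left to read
    }
}

slow_case   # prints 1, 2, 3 on separate lines
fast_case   # prints nothing
```

Anything in between (the first reader grabbing two or three lines) gives the partial outputs seen above.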

Not sure about my 12 case. Possibly GNU xargs doesn't wait for its children to complete before it reads subsequent available input, and snatched the "y3" line from the first sed.

In any case, you just set up a pipeline with multiple concurrent readers on the same writer, which yields mostly nondeterministic results. To be avoided.

If you wanted to operate on each of the files, the race could be avoided by giving sed a filename to read (note the final {}):

ls y* | xargs -i basename {} | xargs -i sed "s/{}//g" {}

If what you wanted was a cross-product-type result (strip each file name from each file), you'd need to arrange to have the file list produced as many times as there are files. Plus one for xargs, if you still used that.
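
A minimal sketch of that cross-product variant, assuming the file names contain no whitespace and no characters special to sed. `strip_each` is a hypothetical helper; the names are passed as arguments rather than read from a shared pipe, so every sed gets its own copy of the list.

```shell
# Hypothetical helper: for each name, print the full list of names
# with that name removed from every line.
strip_each() {
    for name in "$@"; do
        # Each sed gets a fresh copy of the list on its own stdin.
        printf '%s\n' "$@" | sed "s/$name//g"
    done
}

strip_each y y1 y2 y3
```

With real files in the current directory it could be called as `strip_each y*`. Because no two readers share a stream, the output no longer depends on timing.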

Hope this helps.

JB
sed "s/{}//g" {} will run sed on the actual files, not just the filenames. Not sure that's what the OP wanted, though. But yes, you're right: sed gets chained, reading from the same file descriptor, and you'll definitely have a subtle race.
nos
good grief, that's even more confusing. I need to study this - your initial explanation is what I originally thought, but the race I don't quite get. NB the original intention of the coder (not me) was to produce the output as is (1 2 3). I just think it's happened by accident rather than design, and the code should certainly change to something more maintainable!
Joe Watkins