I have a script which calls two commands:

long_running_command | print_progress

The long_running_command prints progress, but I'm unhappy with how it looks, so I'm using print_progress to make it nicer (namely, I print the progress on a single line).

The problem: the pipe activates a 4K buffer, so the nice print program gets nothing ... nothing ... nothing ... then a whole lot at once ... :)

How can I disable the 4K buffer for the long_running_command (no, I don't have the source)?

A: 

I'm not sure, but by the sounds of things xargs might be what you need; read the man page.

xargs is a command on Unix and most Unix-like operating systems. It is useful when one wants to pass a large number of arguments to a command. Until Linux kernel 2.6.23, arbitrarily long lists of parameters could not be passed to a command [1], so xargs breaks the list of arguments into sublists small enough to be acceptable.

From Wikipedia

nullptr
+1  A: 

According to this, the pipe buffer size is set in the kernel, and altering it would require recompiling your kernel.

second
+2  A: 

I don't think the problem is with the pipe. It sounds like your long-running process is not flushing its own buffer frequently enough. Changing the pipe's buffer size would be a hack to get around it, but I don't think it's possible without rebuilding the kernel - something you wouldn't want to do as a hack, since it would probably adversely affect a lot of other processes.

anon
The root cause is that libc switches to 4k buffering if stdout is not a tty.
Aaron Digulla
That is very interesting, because pipes don't cause any buffering delay themselves. They provide buffering, but if you read from a pipe, you get whatever data is available; you don't have to wait for a buffer in the pipe to fill. So the culprit would be the stdio buffering in the application.
shodanex
+11  A: 

You can use the expect command unbuffer, e.g.

unbuffer long_running_command | print_progress

unbuffer connects to long_running_command via a pseudoterminal (pty), which makes the system treat it as an interactive process. It therefore avoids the 4 KiB stdio buffering in the pipeline that is the likely cause of the delay.

For longer pipelines, you may have to unbuffer each command (except the final one), e.g.

unbuffer x | unbuffer y | z
cheduardo
In fact, the use of a pty to connect to interactive processes is true of expect in general.
cheduardo
So fscking simple! I am always amazed when I discover this kind of utility. I knew expect, but not unbuffer.
shodanex
When pipelining calls to unbuffer, you should use the -p argument so that unbuffer reads from stdin.
Chris Conway
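Putting the answer and the -p comment together, a longer pipeline might look like this (a sketch: grep -v '^$' stands in for any intermediate filter, and unbuffer comes from the expect package):

```shell
# Each stage except the last gets its own pty; stages after the first
# use -p so that unbuffer also passes its stdin through to the command.
unbuffer long_running_command | unbuffer -p grep -v '^$' | print_progress
```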
+2  A: 

It used to be the case, and probably still is, that when standard output is written to a terminal, it is line-buffered by default - when a newline is written, the line is flushed to the terminal. When standard output is sent to a pipe, it is fully buffered - the data is only sent to the next process in the pipeline when the standard I/O buffer fills.
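The two modes are easy to observe; here is a minimal sketch, assuming python3 is available as a stand-in for any stdio-based program:

```shell
# Fully buffered: stdout is a pipe, so nothing reaches `cat` until the
# stdio buffer fills or the writer exits.
python3 -c 'import time
for i in range(3):
    print(i)
    time.sleep(0.1)' | cat

# Run the same one-liner without `| cat` on a terminal and each line
# appears immediately, because stdout is line-buffered for a tty.
```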

That's the source of the trouble. I'm not sure whether there is much you can do to fix it without modifying the program writing into the pipe. The program could use the setvbuf() function with the _IOLBF mode to unconditionally put stdout into line-buffered mode, but I don't see an easy way to enforce that on a program. Or the program could call fflush() at appropriate points (after each line of output), but the same comment applies.
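A related external approach, for dynamically linked programs that use stdio on GNU/Linux, is coreutils' stdbuf, which preloads a helper library to change the buffering mode before main() runs:

```shell
# -oL puts the command's stdout into line-buffered mode, much as if it
# had called setvbuf(stdout, NULL, _IOLBF, 0) itself.
stdbuf -oL long_running_command | print_progress
```

This only helps when the program relies on default stdio buffering; a program that calls setvbuf() itself, or uses raw write(), is unaffected.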

I suppose that if you replaced the pipe with a pseudo-terminal, then the standard I/O library would think the output was a terminal (because it is a type of terminal) and would line buffer automatically. That is a complex way of dealing with things, though.

Jonathan Leffler
+3  A: 

If the problem is libc modifying its buffering/flushing when output does not go to a terminal, you should try socat. It can create a bidirectional stream between almost any two kinds of I/O mechanism; one of those is a forked program speaking to a pseudo tty.

 socat EXEC:long_running_command,pty,ctty STDIO 

What it does is

  • create a pseudo tty
  • fork long_running_command with the slave side of the pty as stdin/stdout
  • establish a bidirectional stream between the master side of the pty and the second address (here it is STDIO)

If this gives you the same output as long_running_command, then you can continue with a pipe.
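If it does, the pipeline from the question becomes the same socat invocation with the pipe added (print_progress is the command from the question):

```shell
# long_running_command sees a tty on its stdout, so stdio stays
# line-buffered even though print_progress reads from a plain pipe.
socat EXEC:long_running_command,pty,ctty STDIO | print_progress
```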

Edit: Wow, I did not see the unbuffer answer! Well, socat is a great tool anyway, so I might just leave this answer.

shodanex
...and I didn't know about socat - looks kinda like netcat only perhaps more so. ;) Thanks and +1.
cheduardo
+1  A: 

This should also work:

long_running_command | grep --line-buffered "" | print_progress
Mark Ruvald
Unfortunately, this just fixes the output side of the `grep`; the input for `grep` still uses the normal 4KB buffer.
Aaron Digulla
This works well for my case, where I want to chain multiple grep (or other filter) commands - all but the last should be set to line-buffered writes, otherwise the output doesn't look interactive.
Guss
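The chained setup Guss describes might look like this (a sketch; the patterns are placeholders, and long_running_command / print_progress are the commands from the question):

```shell
# Every filter except the last flushes per line, so output stays
# interactive all the way down the pipeline.
long_running_command \
  | grep --line-buffered "phase" \
  | grep --line-buffered "percent" \
  | print_progress
```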