ansaurus

Question

Why doesn't more Java code use PipedInputStream / PipedOutputStream ?

Answer 1

+3 A:

In your example you're creating two threads to do the work that could be done by one. And introducing I/O delays into the mix.

Do you have a better example? Or did I just answer your question.

To pull some of the comments (at least my view of them) into the main response:

Concurrency introduces complexity into an application. Instead of dealing with a single linear flow of data, you now have to be concerned about sequencing of independent data flows. In some cases, the added complexity may be justified, particularly if you can leverage multiple cores/CPUs to do CPU-intensive work.
If you are in a situation where you can benefit from concurrent operations, there's usually a better way to coordinate the flow of data between threads. For example, passing objects between threads using a concurrent queue, rather than wrapping the piped streams in object streams.
Where a piped stream may be a good solution is when you have multiple threads performing text processing, a la a Unix pipeline (eg: grep | sort).

In the specific example, the piped stream allows use of an existing RequestEntity implementation class provided by HttpClient. I believe that a better solution is to create a new implementation class, as below, because the example is ultimately a sequential operation that cannot benefit from the complexity and overhead of a concurrent implementation. While I show the RequestEntity as an anonymous class, reusability would indicate that it should be a first-class class.

post.setRequestEntity(new RequestEntity()
{
    public long getContentLength()
    {
        return 0-1;
    }

    public String getContentType()
    {
        return "text/xml";
    }

    public boolean isRepeatable()
    {
        return false;
    }

    public void writeRequest(OutputStream out) throws IOException
    {
        output.setByteStream(out);
        serializer.write(doc, output);
    }
});

kdgregory 2009-01-27 16:48:13

That's an example I have handy. What IO delays are being introduced? PipedInputStreams and PipedOutputStreams are memory buffers.

Steven Huwig 2009-01-27 16:56:39

They may be memory buffers, but they use the underlying pipe implementation, which is a kernel I/O operation.

kdgregory 2009-01-27 16:59:55

Not according to the source they don't.

Steven Huwig 2009-01-27 17:02:22

As for your example: I haven't used HttpClient, but I would expect an alternate method to get access to the request body as an OutputStream. Perhaps not, although are you sure that PostMethod doesn't buffer its content in memory (in which case you don't gain anything)

kdgregory 2009-01-27 17:05:10

PostMethod can buffer or not, depending on whether the method has been configured to chunk the enclosed entity. By default it chunks when the content length is not set.It'd be more helpful if you assumed I had already read the APIs and source in question when you answer.

Steven Huwig 2009-01-27 17:08:52

Re Java piped streams: I learned something, and am somewhat disappointed. I always assumed that those classes used the pipe(2) syscall.

kdgregory 2009-01-27 17:09:01

Sorry, it's not by default but I elided post.setContentChunked(true); in my setup. This is not really relevant to my question though, it's only details about the specific example.

Steven Huwig 2009-01-27 17:11:51

Sorry to offend, but you asked the question "why isn't this more common," not "in this particular case, is there a reason this technique isn't used." And in the general case, you're creating a second thread to handle a sequential operation.

kdgregory 2009-01-27 17:12:55

It's not a sequential operation. You can serialize the XML at the same time you write it to the output stream.

Steven Huwig 2009-01-27 17:15:05

Actually, in this case it is a sequential operation. One thread is writing the XML to a stream, the other thread is writing a stream to a stream. In this specific case, there's one unnecessary stream. That's not to say that the technique is not useful in some cases (to be contd)

kdgregory 2009-01-27 17:18:40

The case in which piping from one thread to another is useful is when there is significant text-level processing that will happen in each stage (ie, something similar to a Unix pipeline). In that case, you can (1) logically partition the operations, and (2) benefit from multi-core architectures.

kdgregory 2009-01-27 17:21:04

However, unless you're doing text processing, there's probably a more efficient way to represent and pass the data (eg, ConcurrentLinkedQueue using defined objects).

kdgregory 2009-01-27 17:22:16

This "unnecessary stream" is for using the HttpClient API, which requires an InputStream for request entities.

Steven Huwig 2009-01-27 17:22:47

There is significant text-level processing that will happen in the InputStreamRequestEntity -- namely chunking.

Steven Huwig 2009-01-27 17:23:25

Why don't you do "text-level processing" in a FilterWriter or a FilterOutputStream ?

Dennis Cheung 2009-01-27 17:43:23

That code is part of the HttpClient API, which requires an InputStream.

Steven Huwig 2009-01-27 17:44:18

@kdgregory: your code appears to be an unnecessary class. Why is an unnecessary class preferable to concurrency?

Steven Huwig 2009-01-27 17:44:55

1 - because concurrency increases complexity, and 2 - because it is a piece of reusable functionality (one that should probably get back into HttpClient)

kdgregory 2009-01-27 19:54:29

You're right in this case. Thanks.

Steven Huwig 2009-01-27 20:09:46

Answer 2

+5 A:

From the Javadocs:

Typically, data is read from a PipedInputStream object by one thread and data is written to the corresponding PipedOutputStream by some other thread. Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread.

This may partially explain why it is not more commonly used.

I'd assume another reason is that many developers do not understand it's purpose / benefit.

matt b 2009-01-27 17:06:17

So maybe the real question is "why doesn't more Java code use concurrency?" ...

Steven Huwig 2009-01-27 17:12:51

Concurrency is heavily used in most java code.

iny 2009-01-27 17:34:23

Sadly concurrency is overused where it isn't needed, and underused where it was needed... oops! :)

John Gardner 2009-01-27 17:59:35

@iny, I'd argue that most developers aren't writing concurrent code. Maybe it's running in a concurrent environment, but I think that it is a minority of developers who deal every day with multithreading (and this is probably a good thing)

matt b 2009-01-27 18:27:52

Answer 3

A:

There just isn't that much need for those and there are also other ways to do things.

I guess your example would be better, if the need would be bigger.

iny 2009-01-27 17:36:44

Answer 4

+1 A:

I too only discovered the PipedInputStream/PipedOutputStream classes recently.

I am developing an Eclipse plug-in that needs to execute commands on a remote server via SSH. I am using JSch and the Channel API reads from an input stream and writes to an output stream. But I need to feed commands through the input stream and read the responses from an output stream. Thats where PipedInput/OutputStream comes in.

import java.io.PipedInputStream;
import java.io.PipedOutputStream;

import com.jcraft.jsch.Channel;

Channel channel;
PipedInputStream channelInputStream = new PipedInputStream();
PipedOutputStream channelOutputStream = new PipedOutputStream();

channel.setInputStream(new PipedInputStream(this.channelOutputStream));
channel.setOutputStream(new PipedOutputStream(this.channelInputStream));
channel.connect();

// Write to channelInputStream
// Read from channelInputStream

channel.disconnect();

bmatthews68 2009-01-27 17:48:44

Answer 5

A:

So, what's wrong with this idiom? If there's nothing wrong with this idiom, why haven't I seen it?

EDIT: to clarify, PipedInputStream and PipedOutputStream replace the boilerplate buffer-by-buffer copy that shows up everywhere, and they also allow you to process incoming data concurrently with writing out the processed data. They don't use OS pipes.

You have stated what it does but haven't stated why you are doing this.

If you believe that this will either reduce resources used (cpu/memory) or improve performance then it won't do either. However it will make your code more complex.

Basically you have a solution without a problem for which it solves.

Peter Lawrey 2009-01-27 20:02:27

In this particular case you are right -- there is another way to code it that avoids unbounded memory consumption.In the general case, however, it requires less code to avoid unbounded memory consumption than would equivalent buffer copying code.

Steven Huwig 2009-01-27 20:07:05

What do you see as an "unbounded memory consumption" I have been developing networking solutions for trading system for six years and I have never come across this problem.

Peter Lawrey 2009-01-29 07:23:21

Do your trading systems handle single messages with gigs of payload without running out of space? If so then they have bounded memory consumption; otherwise they have unbounded memory consumption.(Not that I would expect trading systems to do anything but reject messages over a certain size, but believe it or not that's not the case in every domain.)

Steven Huwig 2009-07-20 21:22:43

It is true that trading messages are typically small as latency is important. They can add up fairly quickly and we end up with 10s of gigs of data in memory. However, I am not sure how this is relevant. The solution posted will not help you deal with very large messages as far as I can see, in fact instead of having one copy of the message passed around you will end up with two copies (as the writer cannot complete serialization of a large message and discard the original until the reader has almost read/rebuilt the copy)

Peter Lawrey 2009-07-22 05:52:45

The reader can be streaming to the server as far as I can tell, with content-encoding: chunked. There doesn't need to be a second copy constructed (in this process, anyway).

Steven Huwig 2009-08-21 02:25:32

Answer 6

+1 A:

I tried using these classes a while back for something, I forget the details. But I did discover that their implementation is fatally flawed. I can't remember what it was but I have a sneaky memory that it may have been a race condition which meant that they occasionally deadlocked (And yes, of course I was using them in separately threads: they simply aren't usable in a single thread and weren't designed to be).

I might have a look at their source code andsee if I can see what the problem might have been.

Adrian Pronk 2009-01-27 20:27:45

I found that both ends need to be closed.

Steven Huwig 2009-01-27 21:49:29

Answer 7

+1 A:

Also, back to the original example: no, it does not exactly minimize memory usage either. DOM tree(s) get built, in-memory buffering done -- while that is better than full byte array replicas, it's not that much better. But buffering in this case will be slower; and an extra thread is also created -- you can not use PipedInput/OutputStream pair from within a single thread.

Sometimes PipedXxxStreams are useful, but the reason they are not used more is because quite often they are not the right solution. They are ok for inter-thread communication, and that's where I have used them for what that's worth. It's just that there aren't that many use cases for this, given how SOA pushes most such boundaries to be between services, instead of between threads.

StaxMan 2009-01-27 20:50:59

ansaurus

tags:

views:

answers:

Why doesn't more Java code use PipedInputStream / PipedOutputStream ?

related questions