We are working to reduce the latency and increase the performance of a Java process that consumes data (XML strings) from a socket via the readLine() method of the BufferedReader class. The data is delimited by the end-of-line separator (\n), and each line can be of variable length (6KBits - 32KBits). Our code looks like:

Socket sock = connection;
InputStream in = sock.getInputStream();
BufferedReader inputReader = new BufferedReader(new InputStreamReader(in));
...
do 
{
   String input = inputReader.readLine();
   // Executor call to parse the input in a separate thread
} while (true);

So I have a couple of questions:

  • Will the inputReader.readLine() method return as soon as it hits the \n character or will it wait till the buffer is full?
  • Is there a faster way of picking up data from the socket than using a BufferedReader?
  • What happens when the size of the input string is smaller than the size of the Socket's receive buffer?
  • What happens when the size of the input string is bigger than the size of the Socket's receive buffer?

I am getting to grips (slowly) with Java's IO libraries, so any pointers are much appreciated.

Thank you!

+1  A: 

The answer to your first question is yes and no. If the buffer already contains the line terminator, readLine() returns immediately. If it does not, the reader tries to fill the buffer, but not necessarily fully: each fill blocks only until some new data (at least one char) arrives or EOF is reached. readLine() repeats this until it sees a terminator or EOF.

One of the nice things about Java is that the libraries are open source, so if you have a full copy of the JDK you can look at the source yourself to answer these types of questions. I use Eclipse as my IDE, and by default if you place the cursor over a class name and press F3 it takes you to the source (this is how I obtained the answer above). The caveat is that, with the standard distribution, the source for some of the internal classes and native code is not available.

For your second question, I would say generally no, as the logic used by BufferedReader is generally the same logic any code would need to recreate to achieve the same task. The only thing that might slow BufferedReader down is that internally it uses a StringBuffer, which is synchronized, instead of the unsynchronized StringBuilder.
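A minimal sketch of the first point, not taken from the JDK source: it pushes one short line into a pipe that stays open and then calls readLine(). The class and method names are made up for illustration, and a PipedReader stands in for the socket. It assumes the line is shorter than the pipe's default 1024-char capacity, since writer and reader share one thread here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

public class ReadLineReturnsEarly {

    // Writes one terminated line into an open pipe and reads it back.
    // Assumption: line.length() is well under 1024 chars, so the single
    // thread never blocks on a full pipe.
    static String readOneLine(String line) throws IOException {
        PipedWriter writer = new PipedWriter();
        PipedReader pipe = new PipedReader(writer);
        // Default BufferedReader buffer is 8192 chars, far larger than the line.
        BufferedReader reader = new BufferedReader(pipe);

        writer.write(line + "\n");
        writer.flush();

        // The writer is still open and the buffer is nowhere near full,
        // yet readLine() returns because a terminator is already available.
        String result = reader.readLine();

        writer.close();
        reader.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readOneLine("<msg>hello</msg>"));
    }
}
```

If readLine() waited for a full buffer or for EOF, this call would hang, because neither ever happens before it is invoked.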

M. Jessup
+4  A: 

Will the inputReader.readLine() method return as soon as it hits the \n character or will it wait till the buffer is full?

  • It will return as soon as it gets a newline.

Is there a faster way of picking up data from the socket than using a BufferedReader?

  • BufferedReader entails some copying of the data. You could try the NIO APIs, which can avoid copying, but you might want to profile before spending any time on this to see whether the I/O really is the bottleneck. A simpler quick fix is to add a BufferedInputStream around the socket's input stream, so that each read does not hit the socket (it's not clear whether InputStreamReader does any buffering itself), e.g.

    new BufferedReader(new InputStreamReader(new BufferedInputStream(in)))
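A slightly fuller sketch of the same wrapping, with the buffer size and character set made explicit. The helper name, the 8192 size and the UTF-8 charset are assumptions for illustration; the question does not say which encoding the XML uses. A ByteArrayInputStream stands in for sock.getInputStream().

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class LineReaderFactory {

    // Wraps a raw stream in byte-level and char-level buffers of the given size.
    // Assumption: the peer sends UTF-8; substitute the real encoding if it differs.
    static BufferedReader open(InputStream in, int bufferSize) {
        return new BufferedReader(
                new InputStreamReader(
                        new BufferedInputStream(in, bufferSize),
                        StandardCharsets.UTF_8),
                bufferSize);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for sock.getInputStream().
        InputStream in = new ByteArrayInputStream(
                "<a/>\n<b/>\n".getBytes(StandardCharsets.UTF_8));
        try (BufferedReader reader = open(in, 8192)) {
            System.out.println(reader.readLine());
            System.out.println(reader.readLine());
        }
    }
}
```

Naming the charset avoids depending on the platform default, which can differ between machines.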

What happens when the size of the input string is smaller than the size of the Socket's receive buffer?

  • The BufferedReader will fetch all the data available. It will then scan this data for the newline. As a result, subsequent reads may find their data already in the BufferedReader.

What happens when the size of the input string is bigger than the size of the Socket's receive buffer?

  • The BufferedReader will read what is in the receive buffer and, as long as it has found neither a newline nor the end of the stream, it will continue to read data from the socket until it finds EOF or a newline. Subsequent reads may block until more data becomes available.

To sum up, BufferedReader blocks only when absolutely necessary.
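The last two cases can be imitated without a socket. The sketch below gives the BufferedReader a deliberately tiny internal buffer so that one line is longer than the buffer and the next is shorter. This models the reader's own buffer, not the socket's receive buffer, so treat it as an analogy; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LongLineDemo {

    // Reads every line from data using a BufferedReader whose internal
    // buffer holds only bufferChars characters.
    static List<String> readAll(String data, int bufferChars) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                     new BufferedReader(new StringReader(data), bufferChars)) {
            String line;
            // readLine() returns null once the end of the stream is reached.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // First line (20 chars) needs several refills of the 4-char buffer;
        // the second line (2 chars) fits in a single fill.
        System.out.println(readAll("0123456789abcdefghij\nxy\n", 4));
    }
}
```

Both lines come back whole: the reader keeps refilling and accumulating until it meets a terminator, whatever the buffer size.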

mdma
Thank you for your detailed answer.
Luhar
No worries. I hope you get the improved performance you're looking for with the suggested changes. If not, try profiling, and if still no luck, you can always post another question asking for help with improving the performance :-) Good luck!
mdma
+1  A: 

One of the advantages of the BufferedReader is that it provides a layer of separation (the buffer) between the input methods (read, readLine, etc.) you use and the actual socket reads, so you don't have to worry about all the cases like "most of the line is in the buffer, but you need to read another buffer to get the \n" etc.

Have you done performance measurement that indicates that using a BufferedReader is a performance issue for your application? If not, I would suggest that you start by choosing an input method which provides the functionality you want (line-based input terminated by \n's, from the sound of it), and worry about whether there's a "faster" way to do it only if you find the input method is a bottleneck.

If line-based input is really what you're after, you're going to end up using some kind of buffer like BufferedReader does, so why re-invent this wheel?
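As a rough starting point for that kind of measurement, here is a sketch that accumulates the wall-clock time spent inside readLine(). The class and method names are invented for illustration. On a real socket this figure includes time spent waiting for the peer to send data, so it is an upper bound on the reader's own cost rather than a precise profile.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineTimer {

    // Returns { number of lines read, total nanoseconds spent in readLine() }.
    static long[] timeReads(BufferedReader reader) throws IOException {
        long lines = 0;
        long nanos = 0;
        while (true) {
            long start = System.nanoTime();
            String line = reader.readLine();
            nanos += System.nanoTime() - start;
            if (line == null) {
                break; // end of stream
            }
            lines++;
        }
        return new long[] { lines, nanos };
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the socket-backed reader.
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\n"));
        long[] result = timeReads(reader);
        System.out.println(result[0] + " lines, " + result[1] + " ns in readLine()");
    }
}
```

Comparing this total against the time spent parsing each line gives a first indication of where the latency actually sits.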

David Gelhar
Thank you for your answer. We have done a significant amount of profiling on the application and we discovered that there can be a delay of a few milliseconds when processing tiny messages. Given the API documentation of the BufferedReader, it doesn't seem to make any sense! We have disabled the Nagle algorithm by setting the TcpNoDelay flag, and are looking at other alternatives.
Luhar
Interesting. The BufferedReader will certainly involve an extra copy of the data, but it's hard to see how that could take milliseconds...
David Gelhar