views: 495
answers: 4
Hello,

I'm working with an ASCII input/output stream over a Socket, and speed is critical. I've heard that using the right Java technique really makes a difference. I have a textbook that says using buffers is the best way, but it also suggests chaining with DataInputStreamReader.

For output I'm using a BufferedOutputStream with an OutputStreamWriter, which seems to be fine. But I am unsure what to use for the input stream. I'm working with newline-delimited data, so would Scanner be of any use? Speed is critical; I need to get the data off the network as fast as possible.

Thanks.

PH

A: 

A Scanner is for delimited text. You didn't say what your data looks like, so I can't comment on that.
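For reference, a Scanner pulls tokens out of delimited input. A minimal sketch, reading from a String here since we don't know what the real data looks like; over a socket you would construct the Scanner from the socket's input stream instead:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerDemo {
    // Split a delimited string into tokens using Scanner.
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        Scanner s = new Scanner(input).useDelimiter(delimiter);
        while (s.hasNext()) {
            result.add(s.next());
        }
        s.close();
        return result;
    }

    public static void main(String[] args) {
        // Over a socket: new Scanner(socket.getInputStream()).useDelimiter(...)
        System.out.println(tokens("alpha,beta,gamma", ","));
    }
}
```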

If you just want to read until each newline character, use

BufferedReader r = new BufferedReader(new InputStreamReader(socket.getInputStream()))

and

r.readLine()

When readLine() returns null, you will know you have exhausted the data in the stream.

As far as speed is concerned, they are both just reading data out of the stream. So assuming you don't need the extra functionality of a Scanner, I don't see any particular reason to use one.
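Putting those two fragments together, a self-contained sketch; a StringReader stands in for the socket's InputStreamReader so the example runs on its own:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadLines {
    // Reads lines until readLine() returns null, which marks end of stream.
    static int countLines(Reader source) throws IOException {
        BufferedReader r = new BufferedReader(source);
        int count = 0;
        String line;
        while ((line = r.readLine()) != null) {
            count++; // process the line here
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // With a socket you would pass new InputStreamReader(socket.getInputStream()).
        System.out.println(countLines(new StringReader("one\ntwo\nthree")));
    }
}
```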

danben
A: 

I would do something with a BufferedReader along the lines of:

Collection<String> lines = new ArrayList<String>();
BufferedReader reader = new BufferedReader( new InputStreamReader( socket.getInputStream()));
String line;
// ready() only says whether a read would block; null from readLine() marks end of stream
while((line = reader.readLine()) != null)
{
    lines.add(line);
}

myClass.processData(lines); //Process the data after it is off the network.

Depending on your situation you could have an additional thread that processes the items in 'lines' as it is being filled, but then you would need a different structure to back the collection: one that can be used concurrently.
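A sketch of that two-thread arrangement, using a LinkedBlockingQueue as the concurrent backing structure. The StringReader is a stand-in for the socket stream, and the poison-pill sentinel is one conventional way to signal end of input:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PipelinedReader {
    private static final String POISON = "\u0000EOF"; // sentinel marking end of input

    // A producer thread drains the source; the caller's thread consumes.
    static List<String> pipeline(Reader source) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(source)) {
                String line;
                while ((line = in.readLine()) != null) {
                    queue.put(line); // hand each line off as soon as it arrives
                }
                queue.put(POISON);
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        producer.start();

        List<String> processed = new ArrayList<>();
        String line;
        while (!POISON.equals(line = queue.take())) {
            processed.add(line); // real processing would happen here
        }
        producer.join();
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        // A StringReader stands in for new InputStreamReader(socket.getInputStream()).
        System.out.println(pipeline(new StringReader("a\nb\nc")));
    }
}
```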

instanceofTom
Using Vector is WRONG for performance; it is synchronized. Use List<String> instead. Also, this assumes that the entire input data set will reside in memory. A better way would be to process each line as it is read.
fuzzy lollipop
Changed it to an ArrayList, which is unsynchronized.
instanceofTom
A: 

If speed is absolutely critical, consider using NIO. Here's a code example posted for the exact same question.

http://lists.apple.com/archives/java-dev/2004/Apr/msg00051.html

EDIT: Here's another example

http://www.java2s.com/Code/Java/File-Input-Output/UseNIOtoreadatextfile.htm

EDIT 2: I wrote this microbenchmark to get you started on measuring the performance of various approaches. Some folks have commented that NIO will not perform faster because you will need to do more work to 'massage' the data into a usable form, so you can validate that based on whatever it is you're trying to do. When I ran this code on my machine, the NIO code was approximately 3 times faster with a 45 megabyte file, and 5 times faster with a 100 megabyte file.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Scanner;

public class TestStuff {

    public static void main(final String[] args)
            throws IOException, InterruptedException {

        final String file_path = "c:\\test-nio.txt";
        readFileUsingNIO(file_path);
        readFileUsingScanner(file_path);

    }

    private static void readFileUsingScanner(final String path_to_file)
            throws FileNotFoundException {
        Scanner s = null;

        final StringBuilder builder = new StringBuilder();
        try {
            System.out.println("Starting to read the file using SCANNER");
            final long start_time = System.currentTimeMillis();
            s = new Scanner(new BufferedReader(new FileReader(path_to_file)));
            while (s.hasNext()) {
                builder.append(s.next());
            }
            System.out.println("Finished!  Read took " + (System.currentTimeMillis() - start_time) + " ms");
        }
        finally {
            if (s != null) {
                s.close();
            }
        }

    }

    private static void readFileUsingNIO(final String path_to_file)
            throws IOException {
        FileInputStream fIn = null;
        FileChannel fChan = null;
        long fSize;
        ByteBuffer mBuf;

        final StringBuilder builder = new StringBuilder();
        try {
            System.out.println("Starting to read the file using NIO");
            final long start_time = System.currentTimeMillis();
fIn = new FileInputStream(path_to_file);
            fChan = fIn.getChannel();
            fSize = fChan.size();
            mBuf = ByteBuffer.allocate((int) fSize);
            fChan.read(mBuf);
            mBuf.rewind();
            for (int i = 0; i < fSize; i++) {
                //System.out.print((char) mBuf.get());
                builder.append((char) mBuf.get());
            }
            System.out.println("Finished!  Read took " + (System.currentTimeMillis() - start_time) + " ms");
        }
        catch (final IOException exc) {
            System.out.println(exc);
            System.exit(1);
        }
        finally {
            if (fChan != null) {
                fChan.close();
            }
            if (fIn != null) {
                fIn.close();
            }
        }

    }
}
Amir Afghani
My textbook says I will only get a performance boost if using many threads for different inputs. I only have one input, but the idea entices me for sure.
PH
your book is wrong if you are doing non-trivial processing of the input lines.
fuzzy lollipop
nio is critical for +scalability+, _not_ speed. nio can be slower for a dedicated thread.
james
@James, can you provide some references? Are there any micro benchmarks that indicate NIO is slower with a dedicated thread?
Amir Afghani
Amir: Even if NIO is at least in theory able to fill a direct ByteBuffer with data from the socket faster, the cost of accessing the data in the ByteBuffer vs. accessing a byte array (filled with data from an InputStream) would in most cases eat up the performance improvement.
jarnbjo
Also, the OP is talking about using a socket, not reading a file -- that's a different problem. NIO mapping is faster for file operations because it leverages the OS's VM system to map the file into user space and saves copying. That's not necessarily the case with a socket read.
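For reference, the memory-mapped file read being contrasted here looks roughly like this (files only; a socket cannot be mapped this way). The MappedRead class and the temp-file setup are illustrative, not from the thread:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedRead {
    // Maps a file into memory via the OS's VM system and decodes it as ASCII.
    static String readMapped(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path)) { // default open option is READ
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped-demo", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readMapped(tmp));
        Files.delete(tmp);
    }
}
```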
Will Hartung
+1  A: 

Just for laughs...

ServerSocket socket = new ServerSocket(2004, 10);
Socket connection = socket.accept();
InputStream in = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(in);
BufferedReader br = new BufferedReader(isr);
String line = null;
do {
    line = br.readLine();
} while (line != null && !"done".equals(line)); // null guard: stop if the client disconnects early

With LOOPBACK, i.e. just running to localhost with local processes, on my machine, and with a suitably "stupid" client.

Socket requestSocket = new Socket("localhost", 2004);
OutputStream out = requestSocket.getOutputStream();
PrintWriter pw = new PrintWriter(out);
String line = "...1000 characters long...";
for (int i = 0; i < 2000000 - 1; i++) {
    pw.println(line);
}
line = "done";
pw.println(line);
pw.flush();

You'll note that this sends 2M "1000 char" lines. It's simply a crude throughput test.

On my machine, loopback, I get ~190MB/sec transfer rate. Bytes, not bits. 190,000 lines/sec.

My point is that the "unsophisticated" way using bone-stock Java sockets is quite fast. This will saturate any common network connection (meaning the network will slow you down more than your I/O here will).

Likely "fast enough".

What kind of traffic are you expecting?

Will Hartung