I set up a server with a ServerSocket, connect to it with a client machine. They're directly networked through a switch and the ping time is <1ms.

Now, I try to push a "lot" of data from the client to the server through the socket's output stream. It takes 23 minutes to transfer 0.6 GB. I can push a much larger file in seconds via scp.

Any idea what I might be doing wrong? I'm basically just looping and calling writeInt on the socket. The slowdown is the same no matter where the data comes from, even if I'm just sending a constant integer and not reading from disk.

I tried setting the send and receive buffers on both sides to 4 MB, no dice. I use a buffered stream for the reader and writer, no dice.

Am I missing something?

EDIT: code

Here's where I make the socket

System.out.println("Connecting to " + hostname);

 serverAddr = InetAddress.getByName(hostname);

 // connect and wait for port assignment
 Socket initialSock = new Socket();
 initialSock.connect(new InetSocketAddress(serverAddr, LDAMaster.LDA_MASTER_PORT));
 int newPort = LDAHelper.readConnectionForwardPacket(new DataInputStream(initialSock.getInputStream()));
 initialSock.close();
 initialSock = null;

 System.out.println("Forwarded to " + newPort);

 // got my new port, connect to it
 sock = new Socket();
 sock.setReceiveBufferSize(RECEIVE_BUFFER_SIZE);
 sock.setSendBufferSize(SEND_BUFFER_SIZE);
 sock.connect(new InetSocketAddress(serverAddr, newPort));

 System.out.println("Connected to " + hostname + ":" + newPort + " with buffers snd=" + sock.getSendBufferSize() + " rcv=" + sock.getReceiveBufferSize());

 // get the MD5s
 try {
  byte[] dataMd5 = LDAHelper.md5File(dataFile),
      indexMd5 = LDAHelper.md5File(indexFile);

  long freeSpace = 90210; // ** TODO: actually set this **

  output = new DataOutputStream(new BufferedOutputStream(sock.getOutputStream()));
  input  = new DataInputStream(new BufferedInputStream(sock.getInputStream()));

Here's where I do the server-side connection:

 ServerSocket servSock = new ServerSocket();
 servSock.setSoTimeout(SO_TIMEOUT);
 servSock.setReuseAddress(true);
 servSock.bind(new InetSocketAddress(LDA_MASTER_PORT));

 int currPort = LDA_START_PORT;

 while (true) {
  try {
   Socket conn = servSock.accept();
   System.out.println("Got a connection.  Sending them to port " + currPort);
   clients.add(new MasterClientCommunicator(this, currPort));
   clients.get(clients.size()-1).start();

   Thread.sleep(500);

   LDAHelper.sendConnectionForwardPacket(new DataOutputStream(conn.getOutputStream()), currPort);

   currPort++;
  } catch (SocketTimeoutException e) {
   System.out.println("Done listening.  Dispatching instructions.");
   break;
  }
  catch (IOException e) {
   e.printStackTrace();
  }
  catch (Exception e) {
   e.printStackTrace();
  }
 }

Alright, here's where I'm shipping over ~0.6 GB of data.

public static void sendTermDeltaPacket(DataOutputStream out, TIntIntHashMap[] termDelta) throws IOException {
 long bytesTransferred = 0, numZeros = 0;

 long start = System.currentTimeMillis();

 out.write(PACKET_TERM_DELTA); // header  
 out.flush();
 for (int z=0; z < termDelta.length; z++) {
  out.writeInt(termDelta[z].size()); // # of elements for each term
  bytesTransferred += 4;
 }

 for (int z=0; z < termDelta.length; z++) {
  for (int i=0; i < termDelta[z].size(); i++) {
   out.writeInt(1);
   out.writeInt(1);
  }
 }

It seems pretty straightforward so far...

+1  A: 

You should download a good packet sniffer. I'm a huge fan of Wireshark personally and I end up using it every time I do some socket programming. Just keep in mind you've got to have the client and server running on different systems in order to pick up any packets.

Spencer Ruport
+6  A: 

Maybe you should try sending your data in chunks (frames) instead of writing each byte separately. And align your frames with the TCP packet size for best performance.

Midhat
The small size of each write is the root of the problem. As the link is relatively fast, it will send a packet after every write of an integer. Each packet is wrapped in an IP packet, and the TCP overhead is incurred every time. Sending about 1K at a time will speed things up nicely.
James Anderson
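
To make the point about write size concrete, here is a rough sketch that needs no network at all. It counts how many write calls reach the underlying stream when ints are written through a DataOutputStream, with and without a BufferedOutputStream in between. CountingOutputStream and countWrites are hypothetical names made up for this illustration; on a real socket each of those calls would be at least a system call, and possibly a separate packet.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class WriteCountDemo {

    /** Hypothetical stand-in for a socket stream: counts calls instead of sending. */
    static final class CountingOutputStream extends OutputStream {
        long calls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Writes `count` ints and returns how many write calls reached the sink. */
    static long countWrites(int count, boolean buffered) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        OutputStream target = buffered ? new BufferedOutputStream(sink, 8192) : sink;
        DataOutputStream out = new DataOutputStream(target);
        for (int i = 0; i < count; i++) {
            out.writeInt(i);
        }
        out.flush(); // push whatever is still sitting in the buffer
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered calls: " + countWrites(100_000, false));
        System.out.println("buffered calls:   " + countWrites(100_000, true));
    }
}
```

The unbuffered count is at least one call per int (the exact figure depends on the JDK version), while the buffered count is only a few dozen for the same data.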
A: 

Can you try doing this over loopback? It should then transfer the data in seconds.

If it takes minutes, there is something wrong with your application. If it is only slow when sending data over the network, it could be your network link that is slow.

My guess is that you have a 10 Mb/s network between your client and your server and this is why your transfer is going slowly. If this is the case, try using a DeflaterOutputStream and an InflaterInputStream for your connection.
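
A minimal sketch of the compression idea, using in-memory streams in place of the socket so it can be run on its own. The class and method names (CompressedStreamDemo, compressInts, decompressInts) are made up for illustration; on a real connection the DeflaterOutputStream would wrap sock.getOutputStream() and the InflaterInputStream would wrap the socket's input stream on the other side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class CompressedStreamDemo {

    /** Writes the ints through a deflating stream and returns the compressed bytes. */
    static byte[] compressInts(int[] values) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(sink))) {
            for (int v : values) {
                out.writeInt(v);
            }
        } // close() finishes the deflater so the trailing data is written
        return sink.toByteArray();
    }

    /** Reads `count` ints back out of the compressed bytes. */
    static int[] decompressInts(byte[] compressed, int count) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            int[] values = new int[count];
            for (int i = 0; i < count; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] data = new int[10_000];
        java.util.Arrays.fill(data, 1);
        byte[] compressed = compressInts(data);
        System.out.println(data.length * 4 + " bytes compressed to " + compressed.length);
    }
}
```

Compression only pays off when the link is the bottleneck and the data is repetitive; on a fast LAN it may just shift the cost to the CPU.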

Peter Lawrey
+1  A: 

How are you implementing the receiving end? Please post your receiving code as well.

Since TCP is a reliable protocol, it will take steps to make sure the client is able to receive all of the data sent by the sender. This means that if your client cannot get the data out of the data receive buffer in time, then the sending side will simply stop sending more data until the client has a chance to read all the bytes in the receiving buffer.

If your receiving side is reading data one byte at a time, then your sender will probably spend a lot of time waiting for the receive buffer to clear, hence the long transfer times. I'd suggest changing your receiving code to read as many bytes as possible in each read operation. See if that solves your problem.
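
As a sketch of what "as many bytes as possible per read" could look like on the receiving side, assuming the sender writes big-endian ints (which is what DataOutputStream.writeInt produces). BulkIntReader and sumInts are hypothetical names; the sum just stands in for whatever the receiver actually does with the values.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

final class BulkIntReader {

    /** Reads big-endian ints in large chunks until end of stream and returns their sum. */
    static long sumInts(InputStream in) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        ByteBuffer view = ByteBuffer.wrap(buffer); // big-endian by default
        int filled = 0; // bytes in the buffer not yet decoded (always 0..3 between reads)
        long sum = 0;
        int read;
        while ((read = in.read(buffer, filled, buffer.length - filled)) != -1) {
            filled += read;
            int usable = filled - (filled % 4); // only decode whole ints
            for (int pos = 0; pos < usable; pos += 4) {
                sum += view.getInt(pos);
            }
            // A read may end in the middle of an int: keep the leftover bytes.
            int leftover = filled - usable;
            System.arraycopy(buffer, usable, buffer, 0, leftover);
            filled = leftover;
        }
        if (filled != 0) {
            throw new EOFException("stream ended in the middle of an int");
        }
        return sum;
    }
}
```

Wrapping the socket's input stream in a BufferedInputStream and calling readInt on a DataInputStream gets a similar effect with less code; the explicit version just shows where the large reads happen.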

futureelite7
A: 

How is your heap size set? I had a similar problem recently with the socket transfer of large amounts of data and just by looking at JConsole I realized that the application was spending most of its time doing full GCs.

Try -Xmx1g
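
If you don't have JConsole at hand, a quick way to check the same thing from inside the program is to print the heap limit and the accumulated GC time before and after the transfer. This is only a sketch using the standard management beans; GcCheck and its method names are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcCheck {

    /** Total time in milliseconds the JVM has spent in garbage collection so far. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** The heap limit the JVM is running with (what -Xmx controls). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapBytes() / (1024 * 1024) + " MiB");
        System.out.println("GC time so far: " + totalGcMillis() + " ms");
    }
}
```

If the GC time grows by a large fraction of the wall-clock transfer time, the heap setting is worth a look; if it barely moves, the problem is elsewhere.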

oxbow_lakes
May I say that it *really* annoys me when people downvote a reasonable answer without leaving some reason in the comments section. Given that I had this exact problem, this is a perfectly reasonable suggestion!
oxbow_lakes
A: 

Things to try:

  • Is the CPU at 100% while the data is being sent? If so, use visualvm and do a CPU profiling to see where the time is spent
  • Use a SocketChannel from java.nio - these are generally faster since they can use native IO more easily - of course this only helps if your operation is CPU bound
  • If it's not CPU bound, there's something going wrong at the network level. Use a packet sniffer to analyze this.
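
For the SocketChannel suggestion, a minimal loopback sketch might look like the following. It sends a number of ints from one thread and counts the bytes received on another; the class name, the 64 KiB buffer size and the use of an ephemeral port are all assumptions made for the example, not taken from the question's code.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class ChannelTransferDemo {

    /** Sends `intCount` ints over loopback and returns the number of bytes received. */
    static long transfer(int intCount) throws IOException, InterruptedException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick a free port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();

            Thread sender = new Thread(() -> {
                try (SocketChannel channel = SocketChannel.open(address)) {
                    ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
                    int sent = 0;
                    while (sent < intCount) {
                        buffer.clear();
                        // Fill the buffer with as many ints as fit...
                        while (buffer.remaining() >= 4 && sent < intCount) {
                            buffer.putInt(1);
                            sent++;
                        }
                        // ...then hand the whole chunk to the channel.
                        buffer.flip();
                        while (buffer.hasRemaining()) {
                            channel.write(buffer);
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            sender.start();

            long received = 0;
            try (SocketChannel channel = server.accept()) {
                ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
                int n;
                while ((n = channel.read(buffer)) != -1) {
                    received += n;
                    buffer.clear(); // contents are discarded in this sketch
                }
            }
            sender.join();
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        long bytes = transfer(10_000_000);
        System.out.println(bytes + " bytes in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Note that most of the benefit here comes from writing large chunks, which plain streams can do too; the channel API mainly helps once the transfer is CPU bound.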
Michael Borgwardt
A: 

Hey, I figured I'd follow up for anyone that was interested.

Here's the bizarre moral of the story:

NEVER USE DataInputStream/DataOutputStream and sockets!!

If I wrap the socket in a BufferedOutputStream/BufferedInputStream, life is great. Writing to it raw is just fine.

But wrapping the socket in a DataInputStream/DataOutputStream, or even using DataOutputStream(BufferedOutputStream(sock.getOutputStream())), is EXTREMELY SLOW.

An explanation for that would be really interesting to me. But after swapping everything in and out, this is what's up. Try it yourself if you don't believe me.

Thanks for all the quick help, though.

DataInputStream doesn't buffer. If you write, say, 1 or some other very low number of bytes at a time to an unbuffered stream, performance will go down the drain. You also call get/setReceiveBufferSize. That's something you only do when you KNOW you are smarter than the TCP stack.
nos
+5  A: 

You do not want to write single bytes when you are transferring large amounts of data.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class Transfer {

    public static void main(String[] args) {
     final String largeFile = "/home/dr/test.dat"; // REPLACE
     final int BUFFER_SIZE = 65536;
     new Thread(new Runnable() {
      public void run() {
       try {
        ServerSocket serverSocket = new ServerSocket(12345);
        Socket clientSocket = serverSocket.accept();
        long startTime = System.currentTimeMillis();
        byte[] buffer = new byte[BUFFER_SIZE];
        int read;
        int totalRead = 0;
        InputStream clientInputStream = clientSocket.getInputStream();
        while ((read = clientInputStream.read(buffer)) != -1) {
         totalRead += read;
        }
        long endTime = System.currentTimeMillis();
        System.out.println(totalRead + " bytes read in " + (endTime - startTime) + " ms.");
       } catch (IOException e) {
        e.printStackTrace(); // report failures instead of silently swallowing them
       }
      }
     }).start();
     new Thread(new Runnable() {
      public void run() {
       try {
        Thread.sleep(1000);
        Socket socket = new Socket("localhost", 12345);
        FileInputStream fileInputStream = new FileInputStream(largeFile);
        OutputStream socketOutputStream = socket.getOutputStream();
        long startTime = System.currentTimeMillis();
        byte[] buffer = new byte[BUFFER_SIZE];
        int read;
        int readTotal = 0;
        while ((read = fileInputStream.read(buffer)) != -1) {
         socketOutputStream.write(buffer, 0, read);
         readTotal += read;
        }
        socketOutputStream.close();
        fileInputStream.close();
        socket.close();
        long endTime = System.currentTimeMillis();
        System.out.println(readTotal + " bytes written in " + (endTime - startTime) + " ms.");
       } catch (Exception e) {
        e.printStackTrace(); // report failures instead of silently swallowing them
       }
      }
     }).start();
    }
}

This copies 1 GiB of data in just over 19 seconds on my machine. The key here is using the InputStream.read and OutputStream.write methods that accept a byte array as parameter. The size of the buffer is not really important; it just should be a bit larger than, say, 5. Experiment with BUFFER_SIZE above to see how it affects the speed, but also keep in mind that it is probably different for every machine you run this program on. 64 KiB seems to be a good compromise.

Bombe
Awesome-- thank you so much. See my comment to ignasi35 below about what could be wrong with wrapping a socket's streams in DataxxxxStream. I have no idea why that's so slow, especially because transfers are on the order of seconds even using just a 4-byte buffer to push the integers vs DataxxxxxStream.yyyyInt(), which should (hopefully) be doing that 4-byte buffering behind the scenes, but apparently is doing something totally crazy instead
A: 

@Erik: using DataXxxputStream is not the problem here. The problem is that you were sending data in chunks that were too small. Using a buffer solved your problem because, even if you write bit by bit, the buffer collects the small writes and sends them in larger chunks. Bombe's solution is much nicer, more generic and faster.

ignasi35
Even wrapping the socket's output stream in a BufferedOutputStream did not help, though. Let's look at this: int --> DataOutputStream --> socket's output stream. An int (4 bytes) goes in, gets turned into individual bytes (probably as an array) and gets pushed to the socket. This is EXTREMELY slow: 23 min for 0.6Gb. Now we do: int --> 4-byte array --> socket's output stream. 0.6Gb in ~3 seconds. What could DataOutputStream be doing that's so different from what I'm doing there? What could it do that's more complicated than turning an int into an array and feeding it to the stream?
whoops, looks like StackOverflow eats newlines. that's not formatted how I'd like, but I hope you get the gist.
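
For reference, the two paths described in the comment above produce exactly the same bytes, which the sketch below checks (IntEncodingDemo and its methods are hypothetical names for this illustration). So the content on the wire is not the difference; what can differ is how many separate write calls reach the socket. As far as I know, writeInt in JDK 8 and earlier passes the four bytes to the wrapped stream one at a time, whereas a hand-built 4-byte array goes out in a single call.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class IntEncodingDemo {

    /** Manual big-endian encoding: high byte first, as in the "4-byte array" approach. */
    static byte[] encode(int v) {
        return new byte[] {
            (byte) (v >>> 24),
            (byte) (v >>> 16),
            (byte) (v >>> 8),
            (byte) v
        };
    }

    /** The same int as written by DataOutputStream.writeInt. */
    static byte[] viaDataOutputStream(int v) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        out.writeInt(v);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int value = 0x01020304;
        System.out.println(java.util.Arrays.toString(encode(value)));
        System.out.println(java.util.Arrays.toString(viaDataOutputStream(value)));
    }
}
```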
A: 

Use a ByteBuffer for sending the data.
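
One way this could look, as a sketch only: pack the int pairs into a heap ByteBuffer and hand the backing array to the stream in large chunks. ByteBufferSender, writePairs and the 64 KiB chunk size are assumptions for the example; the keys and values arrays stand in for whatever the real term-delta map holds.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

final class ByteBufferSender {

    /** Writes each (key, value) pair as two big-endian ints, in chunks of up to 64 KiB. */
    static void writePairs(OutputStream out, int[] keys, int[] values) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // heap buffer, so array() is available
        for (int i = 0; i < keys.length; i++) {
            if (buffer.remaining() < 8) {
                // Buffer is full: send everything collected so far in one call.
                out.write(buffer.array(), 0, buffer.position());
                buffer.clear();
            }
            buffer.putInt(keys[i]).putInt(values[i]);
        }
        out.write(buffer.array(), 0, buffer.position()); // send the final partial chunk
        out.flush();
    }
}
```

The receiver can still read the data with DataInputStream.readInt, since ByteBuffer uses the same big-endian byte order by default.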

Nasif
A: 

Any updates on this? I have also observed slow data transfer with DataOutputStream wrapped around a BufferedOutputStream on socket.getOutputStream.

rajesh