Hi All,

Is there a way to convert a byte array to a string other than using new String(bytearray)? The exact problem: I transmit a JSON-formatted string over the network through a UDP connection. At the other end, I receive it in a fixed-size byte array (because I don't know the message size in advance) and create a new string out of the byte array. If I do this, the whole buffer that I allocated is held unnecessarily.

To avoid this, I convert the byte array to a string, truncate the string after the last valid character, convert it back to a byte array and create a new string out of that. This uses only the required memory, but garbage collection becomes much more frequent because of the extra allocations. What is the best way to do this?

+1  A: 

Would something like:

String s = new String( bytearray, 0, lenOfValidData, "US-ASCII");

do what you want (change the charset to whatever encoding is appropriate)?
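A minimal sketch of that constructor applied to a fixed-size receive buffer. It assumes US-ASCII data and uses the `StandardCharsets.US_ASCII` constant (Java 7+) instead of the `"US-ASCII"` name, which avoids the checked `UnsupportedEncodingException`. The class `BoundedDecode` and method `decode` are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;

class BoundedDecode {
    // Decode only the first `len` bytes of a (possibly much larger) buffer.
    // For a single-byte charset such as US-ASCII the String holds exactly
    // `len` chars; the unused tail of the buffer is never copied.
    static String decode(byte[] buffer, int len) {
        return new String(buffer, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[16 * 1024]; // fixed-size receive buffer
        byte[] msg = "{\"id\":1}".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(msg, 0, buffer, 0, msg.length);
        String s = decode(buffer, msg.length);
        System.out.println(s + " (" + s.length() + " chars)"); // {"id":1} (8 chars)
    }
}
```

The buffer itself can be kept and reused for the next message; only the small String is newly allocated.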


Update:

Based on your comments, you might want to try:

socket.receive(packet);
String strPacket = new String( packet.getData(), 0, packet.getLength(), "US-ASCII");
receiver.onReceive( strPacket);

I'm not familiar enough with Java's datagram support to know if packet.getLength() returns the truncated length or the original length of the datagram (before truncation to fit in the receive buffer). It might be safer to create the string like so:

String strPacket = new String( packet.getData(), 
                               0, 
                               Math.min( packet.getLength(), packet.getData().length),
                               "US-ASCII");

Then again, it might be unnecessary.
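Whether packet.getLength() reports the size of the received datagram can be checked with a loopback round trip. This is a sketch that assumes the loopback interface is usable and every message fits in the 16 KB buffer; `UdpLengthDemo` and `roundTrip` are hypothetical names. It reuses one packet and one buffer for several messages of different sizes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class UdpLengthDemo {
    // Sends each message to a receiver socket over loopback and decodes
    // exactly packet.getLength() bytes from one reused 16 KB buffer.
    static String[] roundTrip(String... messages) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // fail rather than block forever
            byte[] buf = new byte[16 * 1024];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            String[] out = new String[messages.length];
            for (int i = 0; i < messages.length; i++) {
                byte[] data = messages[i].getBytes(StandardCharsets.US_ASCII);
                sender.send(new DatagramPacket(data, data.length,
                                               loopback, receiver.getLocalPort()));
                // Defensive: restore the full buffer size before each receive,
                // since the previous receive() set the packet's length field
                // to the previous message's size.
                packet.setLength(buf.length);
                receiver.receive(packet);
                // getLength() is the size of the datagram just received.
                out[i] = new String(packet.getData(), packet.getOffset(),
                                    packet.getLength(), StandardCharsets.US_ASCII);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : roundTrip("{\"n\":12345}", "{}", "{\"n\":1}")) {
            System.out.println(s + " (" + s.length() + " chars)");
        }
    }
}
```

Since getLength() never exceeds the buffer, the Math.min guard is not needed in this setup; resetting the length before each reuse is the part that is easy to forget.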

Michael Burr
The problem here is that the length of the string in the packet varies, so I don't know lenOfValidData. Is there a way to do this without knowing it? Moreover, using new String is causing a lot of GCs, as my strings are normally 8k-10k long.
rajaramyadhav
How do you determine the length of the valid data now (somehow after the 1st conversion to a string)? As far as avoiding `new String` - if you need the data in a string, that'll have to happen at some point (even if it's hidden inside of a method that returns a string object). What you should be able to do is avoid creation of intermediate String and/or byte array objects.
Michael Burr
Also - it seems to me that whatever is filling in the byte array from the UDP packet should let you know how much data it put into the buffer/array. You should use that information as the `lenOfValidData` argument.
Michael Burr
I know the last character of my String as it is json formatted. So I can use substring() method on String to get only the required characters.
rajaramyadhav
These are the two snippets I tried.
#1
socket.receive(packet);
String strPacket = new String(packet.getData());
receiver.onReceive(strPacket.substring(0, strPacket.indexOf('}') + 1));
This one holds on to unnecessary memory.
#2
socket.receive(packet);
String strPacket = new String(packet.getData());
receiver.onReceive(new String(strPacket.substring(0, strPacket.indexOf('}') + 1).getBytes()));
This one causes GC twice as frequently as the first one.
rajaramyadhav
As far as I know, socket.receive(packet) does not return the length of the received byte array. Is there a different method? I am using java.net.DatagramSocket.
rajaramyadhav
@rajaramyadhav - that's incorrect. See my answer.
Stephen C
UDP seems to be the wrong protocol to use if your packet size is 8k-10k. You pretty much lose all the advantages of UDP when the packets have to be fragmented.
ZZ Coder
socket.receive(packet);
String strPacket = new String( packet.getData(), 0, packet.getLength(), "US-ASCII");
receiver.onReceive( strPacket);
I tried this. It worked fine. Thanks for your suggestion.
rajaramyadhav
A: 

You could avoid the second String creation by using a StringBuilder. I imagine your data-receiving process looks like this:

  1. Get the (fixed size) byte array at client side.
  2. Create a StringBuilder object.
  3. Loop over the array as long as you read valid characters, appending them to the StringBuilder object.
  4. The byte array can be thrown away now. (I would rather keep it, though, for the next time you receive something over the network, in order to avoid unnecessary memory allocations.)
Edit

I followed TofuBeer's suggestion to use a StringBuilder instead of a StringBuffer.
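A minimal sketch of these steps, reusing a single StringBuilder across messages as TofuBeer suggests in the comments. It assumes a single-byte, ASCII-compatible encoding and that a zero byte (or the end of the buffer) marks the end of the valid data; `BuilderDecode` and `decode` are hypothetical names.

```java
class BuilderDecode {
    // One builder reused for every message: setLength(0) clears it
    // but keeps its internal char array for the next call.
    private final StringBuilder sb = new StringBuilder(16 * 1024);

    String decode(byte[] buf) {
        sb.setLength(0);
        // Stop at the first zero byte or at the end of the buffer.
        for (int i = 0; i < buf.length && buf[i] != 0; i++) {
            // Maps each byte to the char with the same value; correct
            // for ASCII (and ISO-8859-1) only, not for multi-byte encodings.
            sb.append((char) (buf[i] & 0xFF));
        }
        return sb.toString(); // copies exactly sb.length() chars into a new String
    }
}
```

As noted in the comments, the final toString() still copies the characters, so this saves the intermediate String but not the final copy.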

Dave
Use StringBuilder instead; it is faster since it is not synchronized. Also, you could create a single StringBuilder and set its length to zero at the end.
TofuBeer
Actually, if you look at how StringBuilder and StringBuffer manage their internal character array, you'll see that they will possibly over-allocate as well.
Stephen C
@Dave: The string is very long, probably 8k-10k characters. Is it a good idea to loop over it and build a StringBuilder? Kindly let me know if I have misunderstood.
rajaramyadhav
Also, the final `StringBuilder.toString()` call will entail copying the character array again.
Stephen C
@rajaramyadhav Since the only way to determine the length of your string is by looking at its characters, I can't imagine how you could avoid looping AND create a minimal-length string at the same time. Judging by Stephen C's answer, the String(byte[] bytes, int offset, int length, String charsetName) constructor doesn't seem to have any hidden allocations, so I suggest you try his approach.
Dave
+2  A: 

The simplest and most reliable way to do this is to use the length of the packet that you read from the UDP socket. The javadoc for DatagramSocket.receive(...) says this:

Receives a datagram packet from this socket. When this method returns, the DatagramPacket's buffer is filled with the data received. The datagram packet also contains the sender's IP address, and the port number on the sender's machine.

This method blocks until a datagram is received. The length field of the datagram packet object contains the length of the received message. If the message is longer than the packet's length, the message is truncated.

If you cannot do that, then the following will allocate a minimum sized String with no unnecessary allocation of temporaries.

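  byte[] buff = ... // read from socket.

  // Find the byte offset of the first 'non-character' in buff.
  // This version treats a zero byte as the end of the valid data;
  // change the test to suit your charset and application.
  int i = 0;
  while (i < buff.length && buff[i] != 0) {
      i++;
  }

  // Allocate a String covering only the valid bytes
  // (charsetName is the encoding the sender used, e.g. "US-ASCII")
  String res = new String(buff, 0, i, charsetName);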

Note that the criterion for determining a non-character is character set and application specific. But probably testing for a zero byte is sufficient.
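A self-contained version of the zero-byte scan, packaged as helper methods so it can be run directly. `ZeroTerminated`, `validLength` and `decode` are hypothetical names, and the sketch assumes the sender never puts a zero byte inside the real data.

```java
import java.nio.charset.Charset;

class ZeroTerminated {
    // Index of the first zero byte, or buf.length if there is none.
    static int validLength(byte[] buf) {
        int i = 0;
        while (i < buf.length && buf[i] != 0) {
            i++;
        }
        return i;
    }

    // Decodes only the bytes before the first zero byte. This is only
    // sensible for encodings such as US-ASCII or UTF-8, where a zero byte
    // appears only as the encoding of U+0000; in UTF-16, ordinary
    // characters like 'A' (0x00 0x41) contain zero bytes.
    static String decode(byte[] buf, Charset cs) {
        return new String(buf, 0, validLength(buf), cs);
    }
}
```

When the DatagramPacket is available, packet.getLength() gives the same boundary without any scan, so the zero-byte test is only a fallback.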

EDIT

What does the javadoc exactly mean by "The length of the new String is a function of the charset, and hence may not be equal to the length of the subarray."

It is pointing to the fact that for some character encodings (for example UTF-8, UTF-16, JIS, etc.) some characters are represented by two or more bytes. So for example, 10 bytes of UTF-8 might represent fewer than 10 characters.
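A small illustration of that javadoc sentence, assuming UTF-8 input: the character 'é' (U+00E9) encodes to two bytes, so six bytes decode to a five-character String. `CharsetLengths` and `utf8CharCount` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class CharsetLengths {
    // Number of chars produced by decoding all of `bytes` as UTF-8.
    static int utf8CharCount(byte[] bytes) {
        return new String(bytes, 0, bytes.length, StandardCharsets.UTF_8).length();
    }

    public static void main(String[] args) {
        byte[] b = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // "héllo"
        System.out.println(b.length + " bytes -> " + utf8CharCount(b) + " chars"); // 6 bytes -> 5 chars
    }
}
```

With US-ASCII, every character is one byte, so the byte count and the String length always match.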

Stephen C
+1 What exactly does the javadoc mean by "The length of the new String is a function of the charset, and hence may not be equal to the length of the subarray."? (That's from the javadoc for the constructor you suggested.)
Dave
thx for the clarification!
Dave
@Stephen
socket.receive(packet);
String strPacket = new String( packet.getData(), 0, packet.getLength(), "US-ASCII");
receiver.onReceive( strPacket);
This worked fine for me. Can you please explain when I should check for a zero byte?
rajaramyadhav