What is faster?
out.writeObject(someString) or out.writeUTF(someString)
No idea.
Time both of these and it'll tell you faster than we can.
for (int i = 0; i < 100000; i++) {
    out.writeObject(someString);
}

for (int i = 0; i < 100000; i++) {
    out.writeUTF(someString);
}
I would assume that the result may depend on the contents of someString. It wouldn't be an unreasonable result to find that writeUTF performance degrades as higher Unicode code points are used, since those characters are encoded as multiple bytes. Please note this is unproven and just an idle thought.
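That idle thought is easy to check directly: writeUTF uses modified UTF-8, where ASCII characters take one byte, code points up to U+07FF take two, and the rest of the BMP takes three. A quick sketch (the class and method names here are just for illustration) that measures the encoded size:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class UtfSize {
    // Number of bytes writeUTF produces, including its 2-byte length prefix
    static int encodedSize(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeUTF(s);
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodedSize("hello"));  // 5 ASCII chars -> 5 bytes + 2-byte prefix = 7
        System.out.println(encodedSize("héllo"));  // 'é' (U+00E9) takes 2 bytes -> 8
        System.out.println(encodedSize("日本語"));  // each char takes 3 bytes -> 11
    }
}
```

So a string full of high code points produces up to three times as much output, which plausibly costs more time to encode and write.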
I wrote a test case, and as expected (because Java natively uses UTF-16 characters) writeObject is faster. Another reason is because "Note that there is a significant difference between writing a String into the stream as primitive data or as an Object. A String instance written by writeObject is written into the stream as a String initially. Future writeObject() calls write references to the string into the stream." See the writeObject documentation.
EDIT: However, writeUnshared is still faster than writeUTF
100000 runs of writeObject: 464
100000 runs of writeUnshared: 5082
100000 runs of writeUTF: 7541
import java.io.*;

public class WriteString
{
    private static int RUNS = 100000;
    private static int STR_MULTIPLIER = 100;

    public static void main(String[] a) throws Throwable
    {
        StringBuilder builder = new StringBuilder(26 * STR_MULTIPLIER);
        for (int i = 0; i < STR_MULTIPLIER; i++)
        {
            builder.append("abcdefghijklmnopqrstuvwxyz");
        }
        String str = builder.toString();

        File f = new File("oos");
        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(f));

        long startObject = System.currentTimeMillis();
        for (int i = 0; i < RUNS; i++)
        {
            oos.writeObject(str);
            oos.flush();
        }
        long endObject = System.currentTimeMillis();
        System.out.println(RUNS + " runs of writeObject: " + (endObject - startObject));

        long startUnshared = System.currentTimeMillis();
        for (int i = 0; i < RUNS; i++)
        {
            oos.writeUnshared(str);
            oos.flush();
        }
        long endUnshared = System.currentTimeMillis();
        System.out.println(RUNS + " runs of writeUnshared: " + (endUnshared - startUnshared));

        long startUTF = System.currentTimeMillis();
        for (int i = 0; i < RUNS; i++)
        {
            oos.writeUTF(str);
            oos.flush();
        }
        long endUTF = System.currentTimeMillis();
        System.out.println(RUNS + " runs of writeUTF: " + (endUTF - startUTF));

        oos.close();
        f.delete();
    }
}
There are two things I want people to learn from this question: Java serialisation is slow - live with it. Microbenchmarks are worse than useless.

Microbenchmarks tend to be misleading. There are some things that are worth doing as a general idiom (for instance, hoisting strlen out of a loop in C), but optimisers have a habit of breaking microbenchmarks. Take your application and profile it under real load. If a piece of code isn't causing your program to slow down, don't bother to optimise it. Microbenchmarks will not help you find the places that do.
writeObject and writeUTF don't do the same thing. writeObject indicates what type of object it is going to write. Also writeObject just writes a back reference if the same object (string) has been written since the last reset. writeUnshared is closer to writeUTF.
So if you continue to write exactly the same long String writeObject should win because it just needs to write a back reference. Reducing serialised size may reduce file/network bandwidth or just memory, which may result in more significant performance improvements. For short strings, just writing out the data will be faster. writeUnshared should give almost writeUTF performance, but maintaining generality.
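The back-reference behaviour is easy to observe by writing the same String instance twice and watching how much the stream grows. A minimal sketch (class and method names are just for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class BackReferenceDemo {
    // Returns {bytes added by the first writeObject, bytes added by the second
    // writeObject of the same String instance}
    static int[] writeTwice(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        int header = bytes.size();          // stream header written by the constructor
        oos.writeObject(s);
        oos.flush();
        int afterFirst = bytes.size();      // full string data
        oos.writeObject(s);                 // same instance: only a back reference goes out
        oos.flush();
        int afterSecond = bytes.size();
        oos.close();
        return new int[] { afterFirst - header, afterSecond - afterFirst };
    }

    public static void main(String[] args) throws IOException {
        String s = "abcdefghijklmnopqrstuvwxyz".repeat(100);  // 2600 chars
        int[] sizes = writeTwice(s);
        System.out.println("first write added  " + sizes[0] + " bytes");
        System.out.println("second write added " + sizes[1] + " bytes");
    }
}
```

The first write emits the full 2600 bytes of character data (plus a small tag and length prefix); the second adds only a few bytes for the handle. That is also why the writeObject timing in the benchmark above is so much lower than the other two: after the first iteration, every loop pass writes just a back reference.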
Note that in all cases the data is written as (modified) UTF-8, not UTF-16. If you want UTF-16, String.toCharArray or similar will do.
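For example, DataOutputStream.writeChars writes each char of the string as a big-endian 16-bit value, which is effectively UTF-16 (without a byte-order mark). A small sketch, with illustrative names:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class Utf16Write {
    // Bytes produced by writeChars: each char as a big-endian 16-bit value
    static int utf16Size(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeChars(s);
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(utf16Size("hello"));  // 2 bytes per char -> 10
    }
}
```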
You will get better performance with DataOutputStream.writeUTF() than ObjectOutputStream.writeUTF().

You should be aware that writeUTF can only handle Strings whose modified-UTF-8 encoding is at most 65535 bytes, since the encoded length must fit in the 2-byte prefix...
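The limit is on the encoded byte length, not the character count, and crossing it makes writeUTF throw a UTFDataFormatException. A sketch demonstrating the boundary (class and method names are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

public class UtfLimit {
    // True if writeUTF accepts the string, false if it throws UTFDataFormatException
    static boolean canWriteUTF(String s) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // 65535 ASCII chars encode to exactly 65535 bytes: fits the 16-bit length prefix
        System.out.println(canWriteUTF("a".repeat(65535)));  // true
        // one more byte overflows the prefix, so writeUTF refuses
        System.out.println(canWriteUTF("a".repeat(65536)));  // false
    }
}
```

Note that for non-ASCII content the limit is reached sooner, since each character can take two or three encoded bytes.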