Hi

I am working in a highly distributed environment, with a lot of network access and a lot of DB access.

I have some classes that are sent over the network again and again, and are serialized and de-serialized each time.

Most of the classes are quite simple in nature, like:

class A {
    long a;
    long b;
}

And some are more complex (compound objects containing collections).

There are some people at the company I work for who claim that all the classes should implement Externalizable rather than Serializable, and that this would have a major impact on the performance of the application.

The impact on performance is very difficult to measure, though, since the application is so big, so distributed, and not fully ready; I can't really simulate a full load right now.

So maybe some of you know an interesting article that would shed some light on this, or maybe you can share some thoughts.

My basic intuition is that it would not make any noticeable difference when serializing and deserializing simple classes (like the one above) over the network/DB, say when the I/O part of the whole app is around 10% (I mean that 90% of the time the system is doing something other than I/O).

+2  A: 

My basic intuition is that it would not make any noticeable difference when serializing and deserializing simple classes (like the one above) over the network/DB, say when the I/O part of the whole app is around 10% (I mean that 90% of the time the system is doing something other than I/O).

Your intuition sounds reasonable. But what exactly is taking 10% of the time? Is it just the serialization / deserialization? Or does the 10% include the real (clock) time to do the I/O?

EDIT

If you have actual profiling measurements to back up your "10% to 15%" clock time doing serialization + deserialization + I/O, then logic tells you that the maximum performance improvement you can get will be less than that. If you can separate the I/O from the serialization / deserialization, you can refine that upper bound. My guess is that the actual improvement will be less than 5%.
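
To put rough numbers on that (the 50/50 split and the 2x speedup below are illustrative assumptions, not measurements): if 10% of wall-clock time goes to serialization + deserialization + I/O, serialization accounts for half of that, and an Externalizable implementation makes serialization twice as fast, then you remove about 2.5% of total time, for an overall speedup of roughly 1 / (1 - 0.025) ≈ 1.03. Even removing the entire 10% would only give 1 / 0.9 ≈ 1.11.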

I suggest that you create a small benchmark to send and receive one of your data types using serialization and externalization and see what percentage difference it actually makes.
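
For instance, a minimal single-JVM sketch along those lines might look like the following. No network is involved, and PlainA, ExternA and the iteration counts are placeholders standing in for your real class A, not code from your application:

import java.io.*;

public class SerializationBenchmark {

    // Stand-in for class A using default serialization.
    public static class PlainA implements Serializable {
        long a = 1;
        long b = 2;
    }

    // Stand-in for class A with hand-written externalization.
    public static class ExternA implements Externalizable {
        long a = 1;
        long b = 2;

        public ExternA() {} // Externalizable requires a public no-arg constructor

        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(a);
            out.writeLong(b);
        }

        public void readExternal(ObjectInput in) throws IOException {
            a = in.readLong();
            b = in.readLong();
        }
    }

    // Serializes and deserializes obj the given number of times, returning elapsed milliseconds.
    static long roundTrip(Object obj, int iterations) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buffer);
            out.writeObject(obj);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()));
            in.readObject();
            in.close();
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) throws Exception {
        roundTrip(new PlainA(), 10000);   // warm up the JIT before measuring
        roundTrip(new ExternA(), 10000);
        System.out.println("Serializable:   " + roundTrip(new PlainA(), 100000) + " ms");
        System.out.println("Externalizable: " + roundTrip(new ExternA(), 100000) + " ms");
    }
}

The percentage difference between the two printed numbers is the figure to weigh against how small a slice of your application serialization actually is.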

It must be said that there is a (relatively) significant overhead in generic serialization versus an optimally implemented Externalizable; a small demonstration follows the list below. A lot of this is due to the general properties of serialization:

  • There is the overhead of marshaling / unmarshaling the type descriptors for each class used in the object being transmitted.

  • There is the overhead of adding each marshaled object to a hash table so that the serialization faithfully records cycles, etc.
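
To get a concrete feel for the first point, here is a tiny sketch (the class name and field values are placeholders) that prints how many bytes ObjectOutputStream emits for an object whose actual field data is only 16 bytes:

import java.io.*;

public class OverheadDemo {

    // Placeholder class shaped like A: two longs, i.e. 16 bytes of payload.
    public static class Simple implements Serializable {
        long a = 1;
        long b = 2;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buffer);
        out.writeObject(new Simple());
        out.close();
        // Everything beyond 16 bytes is the stream header, the class descriptor
        // (class name, serialVersionUID, field names and types) and object handles.
        System.out.println("Stream size: " + buffer.size() + " bytes for 16 bytes of field data");
    }
}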

However, serialization / deserialization is only a small part of the total I/O overheads, and these are only a small part of your application.

Stephen C
100% is the overall clock time spent on the CPU, and 10%-15% of it is spent doing I/O.
Roman
+1  A: 

I would ask them to come up with some measurements to support their claims. Then everybody will have a basis for a rational discussion. At present you don't have one. Note that it is those making the claims who should produce the supporting evidence: don't get sucked into being responsible for proving them wrong.

EJP
I liked: "don't get sucked into being responsible for proving them wrong"
Roman
A: 

Java serialization is flexible and standard, but it's not designed to be fast, especially for simple objects. If you want speed I suggest you try Hessian or Protobuf; these can be 5x faster for simple objects. Or you can write a custom serializer, which can be as much as 10x faster.
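
To give an idea of what such a custom serializer could look like for a class as simple as A (just a sketch; both ends must agree on the field order, and there is no versioning or type checking), the fields can be written straight to a DataOutputStream with no per-class metadata at all:

import java.io.*;

// Hand-rolled wire format for a class like A: 16 bytes per instance, nothing else.
class ACodec {

    static void write(DataOutputStream out, long a, long b) throws IOException {
        out.writeLong(a);
        out.writeLong(b);
    }

    static long[] read(DataInputStream in) throws IOException {
        long a = in.readLong();
        long b = in.readLong();
        return new long[] { a, b };
    }
}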

Peter Lawrey
I know about Hessian, JSON, and Protobuf. I cannot use them here. I am writing objects to the cloud and I cannot provide a custom serializer, so my only options are Serializable and Externalizable.
Roman
+1  A: 

This is a pretty good website comparing many different Java serialization mechanisms.

http://github.com/eishay/jvm-serializers/wiki

Kevin Stembridge