I am writing a Java application that, among other things, needs to read a dictionary text file (one word per line) and store it in a HashSet. Each time I start the application, this same file (a 6 MB Unicode file) is read all over again.

That seemed expensive, so I decided to serialize the resulting HashSet and store it in a binary file. I expected my application to start faster after this. Instead it got slower: from ~2.5 seconds before to ~5 seconds after serialization.

Is this the expected result? I thought that in cases like this serialization should increase speed.

+3  A: 

It's not a question of one serialization mechanism or another, it's a question of the data structure you are serializing.

You have one very efficient, natural representation of these words: a simple list, in the text file. That's fast to read.

You have created a data structure to store them which is different: a hash table. It takes more memory to represent a hash table. However the benefit is that it's very fast to look for a word, compared to a simple list.

But that tradeoff means serialization gets slower as well, since the naive serialization of a hash table will serialize more data and be larger, and therefore slower.

I think you should stick with the simple reading of the text file.
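
For reference, a minimal sketch of the "simple reading" approach might look like the following. The class name `DictionaryLoader`, the UTF-8 encoding, and the skipping of blank lines are assumptions for illustration, not details taken from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: builds the word set directly from the text file.
class DictionaryLoader {

    // Reads one word per line into a HashSet.
    // UTF-8 is an assumption; use whatever encoding the dictionary file really has.
    static Set<String> load(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {   // skip blank lines (assumed unwanted)
                    words.add(line);
                }
            }
        }
        return words;
    }
}
```

Either way the hash table has to be built in memory, one element at a time. Reading the plain list just avoids the extra work of decoding a serialized object stream first.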

Sean Owen
That seems logical. Except the serialized binary file didn't get much larger than the original text file: from 6,536,068 to 6,879,332 bytes.
celicni
While it wasn't *longer*, it was *more complex*. That's what is slowing it down.
Lo'oris
OK, thanks. I'll take the advice and stick with simple reading.
celicni
+2  A: 

@Sean's answer is correct. Java serialization and deserialization carry significant performance overhead. If you need to make the dictionary loading faster (or ...), consider the following approaches:

  • Using the java.nio.* classes to read the file may speed things up.
  • If the application doesn't necessarily need the dictionary to be loaded instantly on startup, consider using a separate thread to do the dictionary loading asynchronously. The dictionary loading is no faster, but (for example) the application's GUI starts faster anyway.
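
A rough sketch of the second approach, assuming Java 8 or later. The class name `AsyncDictionary` and the choice of `CompletableFuture` are illustrative assumptions; a plain `Thread` or an `ExecutorService` would work just as well. Loading starts in the background as soon as the object is created, and the first lookup blocks only if loading has not finished yet.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical wrapper: loads the dictionary on a background thread.
class AsyncDictionary {

    private final CompletableFuture<Set<String>> words;

    AsyncDictionary(Path file) {
        // supplyAsync runs the loading off the calling thread,
        // so the constructor returns immediately.
        this.words = CompletableFuture.supplyAsync(() -> {
            try {
                // UTF-8 is an assumption about the file's encoding.
                Set<String> loaded = new HashSet<>(Files.readAllLines(file, StandardCharsets.UTF_8));
                return loaded;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Waits for loading to finish if necessary, then performs the lookup.
    boolean contains(String word) {
        return words.join().contains(word);
    }
}
```

Create the `AsyncDictionary` first thing at startup, then build the GUI. By the time the user triggers the first lookup, the set will often be ready. Avoid calling `contains` on the GUI thread before loading has finished, since `join()` blocks until the set is available.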
Stephen C
Using a separate thread will work for me. Thanks for the idea.
celicni