Hi!

I am intending to serialize and deserialize a hashmap whose key is a string.

From Josh Bloch's Effective Java, I understand the following (p. 222): "For example, consider the case of a hash table. The physical representation is a sequence of hash buckets containing key-value entries. Which bucket an entry is placed in is a function of the hash code of the key, which is not, in general, guaranteed to be the same from JVM implementation to JVM implementation. In fact, it isn't even guaranteed to be the same from run to run on the same JVM implementation. Therefore accepting the default serialized form for a hash table would constitute a serious bug. Serializing and deserializing the hash table could yield an object whose invariants were seriously corrupt."

My questions are: 1) In general, would overriding equals and hashCode in the map's key class resolve this issue, so that the map can be restored correctly?

2) If my key is a String, and the String class already overrides hashCode(), would I still have the problem described above? (I am seeing a bug that makes me think this is still a problem even though String overrides hashCode().)

3) Previously, I got around this issue by serializing an array of (key, value) entries and reconstructing the map when deserializing. Is there a better approach?

4) If the answer to questions 1 and 2 is that correctness still can't be guaranteed, could someone explain why? If the hash codes are the same, would the entries go to the same buckets across JVMs?
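For concreteness, the workaround described in 3) could look roughly like the sketch below. The names EntryListCodec and Pair are made up for illustration, and the map is assumed to be String to Integer; none of this comes from the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: writes a map as a flat array of pairs and rebuilds it on read.
class EntryListCodec {
    // Hypothetical holder for one key/value pair.
    static final class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final String key;
        final Integer value;

        Pair(String key, Integer value) {
            this.key = key;
            this.value = value;
        }
    }

    static byte[] write(Map<String, Integer> map) {
        List<Pair> pairs = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            pairs.add(new Pair(e.getKey(), e.getValue()));
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // Only the pairs are written; no bucket layout is involved.
            out.writeObject(pairs.toArray(new Pair[0]));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Map<String, Integer> read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            Pair[] pairs = (Pair[]) in.readObject();
            // put() rehashes every key in the reading JVM.
            Map<String, Integer> map = new HashMap<>();
            for (Pair p : pairs) {
                map.put(p.key, p.value);
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A round trip would then be EntryListCodec.read(EntryListCodec.write(map)).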

Thanks, Grace

+4  A: 

I'm 99% sure that the JVM's implementations of HashMap and HashSet handle this issue. They have custom serialization and deserialization handlers. I don't have Bloch's book in front of me now, but I believe he is explaining the challenge, not saying that you can't reliably serialize a java.util.HashMap in practice.

Yishai
A: 
erickson
What's your view on the post below, which suggests that something extra needs to be done to serialize a hashmap correctly? Do you believe he is using an incorrectly implemented hash table? Thanks. http://obscured.info/2007/02/15/serializable-override-hashcode/
Grace K
Yes, his hash table is broken, not his key. Or perhaps he simply misdiagnosed the problem altogether.
erickson
+2  A: 

The serialized form of java.util.HashMap doesn't include the buckets themselves, and the hash code is not part of the persisted state. From the javadocs:

Serial Data: The capacity of the HashMap (the length of the bucket array) is emitted (int), followed by the size of the HashMap (the number of key-value mappings), followed by the key (Object) and value (Object) for each key-value mapping represented by the HashMap. The key-value mappings are emitted in the order that they are returned by entrySet().iterator().

from http://java.sun.com/j2se/1.5.0/docs/api/serialized-form.html#java.util.HashMap

The persisted state basically comprises the keys and values and some housekeeping. When deserialized, the hashmap is completely rebuilt; the keys are rehashed and placed in appropriate buckets.
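The rebuild-on-read idea described above can be sketched with a toy table. This is not the JDK's HashMap source; TinyHashTable, its fixed capacity, and the roundTrip helper are hypothetical names chosen for illustration, and null keys are not supported.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy hash table with a custom serialized form.
class TinyHashTable<K, V> implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // transient: the bucket layout depends on hashCode(), so it is never written.
    private transient List<List<Entry<K, V>>> buckets;
    private transient int size;

    TinyHashTable() { init(16); }

    private void init(int capacity) {
        buckets = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) buckets.add(new ArrayList<>());
        size = 0;
    }

    private List<Entry<K, V>> bucketFor(Object key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void put(K key, V value) {
        List<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        bucket.add(new Entry<>(key, value));
        size++;
    }

    V get(Object key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    int size() { return size; }

    // Emit capacity, size, then each key and value; no hash codes, no buckets.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(buckets.size());
        out.writeInt(size);
        for (List<Entry<K, V>> bucket : buckets) {
            for (Entry<K, V> e : bucket) {
                out.writeObject(e.key);
                out.writeObject(e.value);
            }
        }
    }

    // Rebuild from scratch: put() rehashes each key in the reading JVM.
    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int capacity = in.readInt();
        int count = in.readInt();
        init(capacity);
        for (int i = 0; i < count; i++) {
            K key = (K) in.readObject();
            V value = (V) in.readObject();
            put(key, value);
        }
    }

    // Hypothetical helper: serialize to memory and read back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the buckets are transient and every entry goes back in through put(), the restored table is valid even if hashCode() returns different values in the reading JVM. This is the same custom-serialized-form approach Bloch's passage argues for.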

So, adding String keys should work just fine. I would guess your bug lies elsewhere.
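As a quick sanity check for the String-key case, a round trip through memory might look like this. It is a minimal sketch; the class name StringKeyRoundTrip and the Integer value type are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical demo class: serializes a HashMap with String keys and reads it back.
class StringKeyRoundTrip {
    @SuppressWarnings("unchecked")
    static HashMap<String, Integer> roundTrip(HashMap<String, Integer> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(map);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (HashMap<String, Integer>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("abc", 1);
        map.put("def", 2);
        HashMap<String, Integer> copy = roundTrip(map);
        System.out.println(copy.get("def"));   // prints 2
        System.out.println(copy.equals(map));  // prints true
    }
}
```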

EDIT: Here's a JUnit 4 test case that serializes and deserializes a map, and mimics VMs changing hashcodes. The test passes, despite the hashcodes being different after deserialization.

import org.junit.Assert;
import org.junit.Test;

import java.io.*;
import java.util.HashMap;

public class HashMapTest
{
    @Test
    public void testHashMapSerialization() throws IOException, ClassNotFoundException
    {
        HashMap map = new HashMap();
        map.put(new Key("abc"), 1);
        map.put(new Key("def"), 2);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ObjectOutputStream objOut = new ObjectOutputStream(out);
        objOut.writeObject(map);
        objOut.close();
        // xor is static, so it is not serialized; every Key now hashes
        // differently than it did when the map was written
        Key.xor = 0x7555AAAA;
        ObjectInputStream objIn = new ObjectInputStream(new ByteArrayInputStream(out.toByteArray()));
        HashMap actual = (HashMap) objIn.readObject();
        // now try to get a value
        Assert.assertEquals(2, actual.get(new Key("def")));
    }

    static class Key implements Serializable
    {
        private String  keyString;
        static int xor = 0;

        Key(String keyString)
        {
            this.keyString = keyString;
        }

        @Override
        public int hashCode()
        {
            return keyString.hashCode()^xor;
        }

        @Override
        public boolean equals(Object obj)
        {
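            // guard the cast so equals() returns false for null and other types
            if (!(obj instanceof Key)) return false;
            Key otherKey = (Key) obj;
            return keyString.equals(otherKey.keyString);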
        }
    }

}
mdma
What's your view on this post, which suggests that something extra needs to be done to serialize a hashmap correctly? (I feel that it aligns with the Josh Bloch paragraph quoted above.) Thanks. http://obscured.info/2007/02/15/serializable-override-hashcode/
Grace K
The article says to override hashCode/equals when implementing Serializable, but actually the rule is more general: override them if your object is going to be used as a key in a Map. See my edit for a test case demonstrating that serializing a map works even when hashcodes change.
mdma
A: 

If all else fails, can you serialize your Map using JSON or YAML or XML or something?
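One way to do this with only the standard library is java.util.Properties, which can write and read a simple XML format. The sketch below assumes both keys and values are non-null Strings; the class name XmlMapCodec is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: stores a String-to-String map as XML via java.util.Properties.
class XmlMapCodec {
    static byte[] toXml(Map<String, String> map) {
        Properties props = new Properties();
        props.putAll(map); // Properties rejects null keys and values
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.storeToXML(out, null); // null means no comment element
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static Map<String, String> fromXml(byte[] xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Copy into a fresh HashMap; keys are rehashed by put().
        Map<String, String> map = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            map.put(name, props.getProperty(name));
        }
        return map;
    }
}
```

A text format like this stores only keys and values, so nothing about the hash layout can leak into the file.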

skiphoppy
Thanks. I have used approach 3) from my post above, which works just fine. I just want to get a better understanding of my other questions.
Grace K