ansaurus

Question

Answer 1

A:

I could be way off here, but that looks about as good as you'll get. That's basically cookie cutter sync-map accessors there.

glowcoder 2010-08-09 20:02:09

Can you explain what do you meant by cookie cutter sync-map accessor there ?

Rachel 2010-08-09 20:05:44

Answer 2

+3 A:

Dumping all the synchronized keywords and defining dict as a ConcurrentHashMap might be worth trying.

BlairHippo 2010-08-09 20:02:18

Should `dict` be a private `final` type ? What would be the consequences of having it without `final` ?

Rachel 2010-08-09 20:04:35

It also wouldn't be thread-safe.

Tom Hawtin - tackline 2010-08-09 20:05:55

Is there any major flaw in code ?

Rachel 2010-08-09 20:08:10

@Tom: I believe John V.'s rewrite of addToDictionary corrects the only multi-threading vulnerability associated with simply dropping in CHM (though I welcome correction and request clarification if I'm wrong). The fact that this code doesn't have any mechanism for deleting entries simplifies things considerably.

BlairHippo 2010-08-09 20:14:44

Yes, although `translate` is inefficient and will break if, say, a remove method is added.

Tom Hawtin - tackline 2010-08-09 21:08:16

Answer 3

+6 A:

First thing you want to do is get rid of all synchronized key words.

The easiest way of doing that is be declaring dict to be a ConcurrentHashMap:

private final ConcurrentMap<String,String> dict = new ConcurrentHashMap<String,String>();

Doing this you can immidiatly remove the synchronized part of translate so it would look like:

 public String translate (String word) throws IllegalArgumentException { ..

The reason for this is the contract in which the CCHM holds regarding up to date reads.

Finally, add to dictionary can look like:

 public void addToDictionary (String word, String translation) throws IllegalArgumentException {
            if (dict.putIfAbsent(word,translation)!=null) {
                throw new IllegalArgumentException(word + " already exists.");
            }
        }

Also remove synchronized from getAllWords as well.

Edit: After thinking about tom's comment. Having a double look up in this 'exception case' probably isnt worth it. If the case didnt throw an exception then it would be appropriate.

John V. 2010-08-09 20:05:31

I'm confused about that `addToDictionary` implementation. You are optimising the path that throws an exception and doubling the cost of the happy path?

Tom Hawtin - tackline 2010-08-09 20:12:41

@John V. - What would be the unit test case for the problem statement ?

Rachel 2010-08-14 15:09:46

Answer 4

A:

if you have more reads than writes (that's usually the case) consider using ReadWriteLock this way readers are not blocking each other.

LiorH 2010-08-09 20:08:10

Answer 5

+1 A:

Constructing and throwing exception is slow, so don't do that.
Make sure you only use a single map operation in each method, rather than doubling the look up.
If significantly used concurrently, use ConcurrentHashMap instead of synchronized.

Note, the getAllWords method is not thread-safe, or at least, the Set it returns is not.

Tom Hawtin - tackline 2010-08-09 20:08:59

but if am using synchronized than getAllWords should be thread safe right ?

Rachel 2010-08-09 20:10:14

if am not constructing and throwing exceptions than how can I deal with situation where I want to throw exceptions ?

Rachel 2010-08-09 20:11:13

Also can you elaborate on your 2nd point, I am not getting it clearly ?

Rachel 2010-08-09 20:12:11

1. Exceptions are mostly used for exceptional conditions, so why should we care about their performance? That's the strangest java practice I've ever seen.

Nikita Rybak 2010-08-09 20:12:26

@Rachel In performance terms it depends how often you are going to throw an exception. If you want good performance and be throwing exceptions, you are going to have to drop one constraint or the other. As for point 2, you seem to be doing two lookup in to the hash map where one would do (you may need the extended operations in `ConcurrentHashMap` for this).

Tom Hawtin - tackline 2010-08-09 20:42:38

@Tom - Can you give an example for 2 point, extended operation in ConcurrentHashMap ?

Rachel 2010-08-09 21:09:03

@Rachel The `putIfAbsent` as used in John V's updated answer. `ConcurrentMap` also adds a `remove` and two `replace` methods. All the new methods add an atomic check and mutate.

Tom Hawtin - tackline 2010-08-09 21:47:46

@Tom - Why is getAllWords method not thread-safe, if am using synchronized keyword than it should internally thread-safe that method, is that correct or am I missing out anything here ?

Rachel 2010-08-09 23:46:50

@Rachel You are returning the `Set` outside of the method and hence outside of the `synchronized` block. So any operations performed on the set will not be within the `synchronized` block, and hence unsafe.

Tom Hawtin - tackline 2010-08-10 00:50:53

@Tom - I do not see than any value in having `getAllWords` as `synchronized` or is there a reason to have it `synchronized` ?

Rachel 2010-08-10 02:46:00

@Rachel The `keySet` method is not thread safe in `HashMap`. Just making the method `synchronized` does not solve the problem as to be useful the `Set` needs to be used.

Tom Hawtin - tackline 2010-08-10 06:12:44

The important point is that the `keySet` method returns a *`Set` view* of the map, not a newly created `Set`. So whatever you will do with the returned `Set` will also influence the map. And at this point, you can call any methods without synchronization.

Roland Illig 2010-08-21 05:26:05

Answer 6

A:

There are a lot of efficient ways to store dictionaries. Using heavyweight things like Java's default HashMap and String objects is not one of them.

So, sure, you can get rid of the synchronized keyword and try gain a tiny bit of speed left and right by working around Java idiosynchrasies.

Sure, a Map's contains is O(1)... But the map resizing when you put millions of String in it ain't O(1) ;)

Food for thought: determining if a word is present using, say, a Trie, will probably be faster than simply computing a String's hashcode (I'm not saying a trie is what you need is: all I'm saying is: there's more than "let's use a HashMap, it's O(1) so you can't beat that"-meets-the-eye.

And I can tell you that, say, Google's 'translate' and Google's 'find-as-you-type' are definitely not implemented by storing millions of Java String object in I-need-constant-resizing-and-I-resize-very-slowly Java HashMaps.

What are your requirements? How many words? How many languages to be supported?

NoozNooz42 2010-08-09 21:58:22

Java's String.hashCode() (at least in the Sun classes) is incredibly fast. So don't mock about that.

Roland Illig 2010-08-21 05:21:58

Answer 7

+1 A:

When you say improve the performance, do you have any idea about the usage statistics? How many writes to reads for instance, and how large the internal map is?

If the number of reads is proportionally high and the map is populated mostly at startup (and isn't enormous), a copy-on-write strategy might be your best bet. We have used (and maintain) a CopyOnWriteMap that has better performance for concurrent reads than ConcurrentHashMap (by about 10% in our tests).

Jed Wesley-Smith 2010-08-10 05:23:19

I forgot to say, the linked CopyOnWriteMap also provides a stable, unmodifiable keySet() view as well which would get around a potential bug in the getAllWords() implementations shown here – if someone tried to remove(String) from it for example, or iterated over it while it was concurrently modified.

Jed Wesley-Smith 2010-08-11 02:27:32

Answer 8

+1 A:

You should be using ConcurrentHashMap however with your current implementation its worth nothing that the getAllWords() has a thread safe copy on the data only inside the synchronized block, i.e. unless the caller also synchronizes the collction, its not thread safe. One way around this is to take a copy before returning (or use ConcurrentHashMap)

In the following example, the map is accessed once per method, rather than twice. (without synchronized)

public class SlowDictionary { 
    private final ConcurrentMap<String,String> dict = new ConcurentHashMap<String,String>(); 

    public String translate (String word) throws IllegalArgumentException { 
        String translation = dict.get(word);
        if (translation == null) 
            throw new IllegalArgumentException(word + " not found."); 
        return translation; 
    } 

    public void addToDictionary (String word, String translation) throws IllegalArgumentException { 
        if (dict.putIfAbsent(word, translation) != null) 
            throw new IllegalArgumentException(word + " already exists."); 
    } 

    public Set<String> getAllWords () {     
        return dict.keySet(); 
    } 
}

Peter Lawrey 2010-08-10 06:19:42

@Peter - How can I unit test this scenario ?

Rachel 2010-08-14 15:12:22

That depends on what you want to test for. What situation do you want to test?

Peter Lawrey 2010-08-17 18:18:30

ansaurus

tags:

views:

answers:

Java Performance using Concurrency

related questions