This question relates to Java collections - specifically Hashtable and Vector - but may also apply elsewhere.

I've read in many places how good it is to program to interfaces, and I agree 100%. The ability to program to a List interface, for instance, without regard for the underlying implementation is most certainly helpful for decoupling and testing purposes. With collections, I can see how an ArrayList and a LinkedList are applicable under different circumstances, given their differences in internal storage structure, random access times, etc. Yet these two implementations can be used under the same interface...which is great.

What I can't seem to place is how certain synchronized implementations (in particular Hashtable and Vector) fit in with these interfaces. To me, they don't seem to fit the model. Most of the underlying data structure implementations seem to vary in how the data is stored (LinkedList, Array, sorted tree, etc.), whereas synchronization deals with conditions (locking conditions) under which the data may be accessed. Let's look at an example where a method returns a Map collection:

public Map<String, String> getSomeData();

Let's assume that the application is not concerned at all with concurrency. In this case, we operate on whatever implementation the method returns via the interface...Everybody is happy. The world is stable.

However, what if the application now requires attention on the concurrency front? We now cannot operate without regard for the underlying implementation - Hashtable would be fine, but other implementations must be catered for. Let's consider 3 scenarios:

1) Enforce synchronization using synchronized blocks, etc. when adding to or removing from the collection. Wouldn't this, however, be overkill in the event that a synchronized implementation (Hashtable) gets returned?

2) Change the method signature to return Hashtable. This, however, tightly binds us to the Hashtable implementation, and as a result, the advantages of programming to an interface are thrown out the window.

3) Make use of the concurrent package and change the method signature to return an implementation of the ConcurrentMap interface. To me, this seems like the way forward.
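A minimal sketch of scenario 3, assuming the method can be changed to return ConcurrentMap. getSomeData() is the question's own hypothetical method; the class name ConcurrentMapSketch, the helper append, and the sample keys are made up for illustration. What the return type buys is that atomic compound operations such as putIfAbsent and merge become part of the contract. By contrast, a Hashtable only synchronizes each call individually, so a check-then-act sequence like containsKey followed by put would still need an external lock, which is why scenario 1 is not pure overkill for compound actions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentMapSketch {
    // Hypothetical version of getSomeData() whose return type advertises thread safety.
    static ConcurrentMap<String, String> getSomeData() {
        ConcurrentMap<String, String> data = new ConcurrentHashMap<>();
        data.put("greeting", "hello");
        return data;
    }

    // Appends a suffix to the value for key. On a ConcurrentHashMap, merge()
    // runs atomically, so many threads can call this without an external lock.
    static void append(ConcurrentMap<String, String> map, String key, String suffix) {
        map.merge(key, suffix, String::concat);
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, String> map = getSomeData();
        // putIfAbsent is an atomic check-then-act: no synchronized block needed.
        map.putIfAbsent("greeting", "ignored");
        map.putIfAbsent("farewell", "bye");

        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    append(map, "log", "x");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // prints "hello bye 4000": no appends were lost
        System.out.println(map.get("greeting") + " " + map.get("farewell") + " " + map.get("log").length());
    }
}
```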

Essentially, it just seems like certain synchronized implementations are a bit of a misfit within the collections framework in that, when programming to interfaces, the synchronization issue almost forces one to think about the underlying implementation.

Am I completely missing the point here?

Thanks.

A: 

Java's Vector and Hashtable predate the current concurrency package (java.util.concurrent), which was added in JDK 5. At the time Vector was written, people thought it was a good idea to make it synchronized; then they probably hit the performance wall in enterprise use. Concurrency certainly is one of those situations where code-to-interface modularity may not always work out.

eed3si9n
It's a bit more long-drawn-out than that: 1.2 introduced the collections framework, with java.util.Collections.synchronizedMap and friends, which were recommended for use instead of Hashtable. Then 1.5 brought in the whole concurrent package. But Hashtable is still littered around the standard libraries and probably will be forevermore... The "Java approach" seems to be that synchronization is an implementation detail, and requirements for it should be communicated through documentation, not through API declarations.
araqnid
Seconded. Both of those classes are historical, and I believe they date back to Java 1.0. I'm sure they date back to at least 1.2, but I can't find the older API docs online.
Dean J
@araqnid, the "Java approach" was shaped by years of hard lessons about performance, concurrency, etc. Especially once Java took off on the server side, the balance tipped toward dependency injection, coding to interfaces, and the "architecture astronaut" style, as opposed to a simple "new Vector();".
eed3si9n
A:

What you are struggling with is the fact that in a multi-threaded environment, a client cannot naively use an object that has mutable, shared state. The collection interface, by itself, tells you nothing about how the object can be used safely. Returning a ConcurrentMap helps give some additional information but only for that particular case.

Normally, you have to communicate the thread-safety issues separately, in documentation (e.g., Javadoc) or by using custom annotations as described in Java Concurrency in Practice. The client of the returned object will have to use its own locking mechanism or one that you provide. The interface is usually orthogonal to thread safety.
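As an illustration of a client using a locking mechanism that its provider documents: the Javadoc for Collections.synchronizedMap states that the user must manually synchronize on the returned map when traversing any of its collection views, yet nothing in the Map interface says so. The sketch below holds the wrapper's own lock for an iteration and for a check-then-act sequence; the class name ClientSideLocking and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ClientSideLocking {
    // Sums all values. The whole traversal must hold the map's lock, because
    // synchronizedMap only locks individual method calls, not the iteration.
    static int sumValues(Map<String, Integer> syncMap) {
        synchronized (syncMap) {
            int total = 0;
            for (int v : syncMap.values()) {
                total += v;
            }
            return total;
        }
    }

    // Check-then-act: without the synchronized block, another thread could
    // insert the key between containsKey() and put().
    static boolean putIfMissing(Map<String, Integer> syncMap, String key, int value) {
        synchronized (syncMap) {
            if (syncMap.containsKey(key)) {
                return false;
            }
            syncMap.put(key, value);
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = Collections.synchronizedMap(new HashMap<>());
        putIfMissing(scores, "a", 1);
        putIfMissing(scores, "b", 2);
        putIfMissing(scores, "a", 99); // ignored: key already present
        System.out.println(sumValues(scores)); // prints 3
    }
}
```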

It's not a problem if the client knows that all the implementations come from the java.util.concurrent package, but that information is not communicated by the interface itself.

David G
A:

1) Yes, it will be overkill
2) Correct, that should not be done
3) Depends on the situation.

The thing is, as you already know, programming to the interface describes what the application does (not how it does it; that's implementation).

Synchronization was dropped from the later implementations (remember, Vector and Hashtable predate Java 1.2; ArrayList and HashMap came later and are not synchronized, but all of them implement the List and Map interfaces respectively), because excessive synchronization carries a performance penalty. For instance, if you use a Vector in a single thread, you still pay for synchronization even though no other thread is involved.
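To make this concrete: a method that accepts a List cannot tell from its signature whether it received a Vector, whose methods are synchronized, or an ArrayList, which does no internal locking. In single-threaded code like the sketch below, the Vector still acquires its (uncontended) monitor on every add. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class SameInterfaceDifferentLocking {
    // The caller sees only List; nothing here says whether add() takes a lock
    // (Vector) or not (ArrayList).
    static int fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list.size();
    }

    public static void main(String[] args) {
        List<Integer> legacy = new Vector<>();    // every public method is synchronized
        List<Integer> modern = new ArrayList<>(); // no internal locking
        System.out.println(fill(legacy, 1_000) + " " + fill(modern, 1_000)); // prints "1000 1000"
    }
}
```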

Sharing a data structure between multiple threads is something that has to be considered when designing the application. That is where you pick the methods you will use and decide who is responsible for keeping the data state consistent.

Here's where you choose between options 1 and 3 that you mentioned. Will there be manual synchronization? Should we use a synchronized interface? Which Java version will we support? And so on.

For instance, if you pick 1, your design can also reject certain implementations (e.g., Vector).
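One way to carry out option 1 so that the concrete implementation stops mattering to callers is for the owning class to keep the collection private and guard every access with its own lock. This is a minimal sketch under that assumption; GuardedRegistry and its methods are hypothetical names, not something from the answer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: it owns a plain HashMap plus a private lock, so callers
// never see (or depend on) which Map implementation sits underneath.
class GuardedRegistry {
    private final Object lock = new Object();
    private final Map<String, String> data = new HashMap<>(); // guarded by lock

    void put(String key, String value) {
        synchronized (lock) {
            data.put(key, value);
        }
    }

    String get(String key) {
        synchronized (lock) {
            return data.get(key);
        }
    }

    // Compound operation made atomic by holding the lock for its whole duration.
    String getOrAdd(String key, String defaultValue) {
        synchronized (lock) {
            String existing = data.get(key);
            if (existing != null) {
                return existing;
            }
            data.put(key, defaultValue);
            return defaultValue;
        }
    }

    // Hands out a copy so callers can iterate without holding our lock.
    Map<String, String> snapshot() {
        synchronized (lock) {
            return new HashMap<>(data);
        }
    }

    public static void main(String[] args) {
        GuardedRegistry registry = new GuardedRegistry();
        registry.put("a", "1");
        System.out.println(registry.getOrAdd("a", "x") + registry.getOrAdd("b", "2")); // prints "12"
    }
}
```

Because the lock is private, no outside code can accidentally hold it or forget to take it; the trade-off is that callers only get the operations the class chooses to expose.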

Data synchronization does not happen by "luck"; you really have to design for it so that it works correctly and doesn't cause more problems than it solves.

During this design, you should pay attention to the options (the implementations) and/or the underlying infrastructure you'll use.

The easiest way to avoid excessive synchronization is to use immutable data and not share your data with other threads.
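A minimal sketch of the immutable-data advice, assuming Java 8 or later (Java 10+ also offers Map.copyOf for the same purpose): copy the data once and hand out an unmodifiable view of the copy. The class name ImmutableSnapshot is hypothetical. Once the frozen map has been safely published (for example, through a final field), any number of threads can read it without locking.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ImmutableSnapshot {
    // Copies the input, then wraps the copy: later changes to the source are
    // not visible, and writes through the returned map throw.
    static Map<String, String> freeze(Map<String, String> source) {
        return Collections.unmodifiableMap(new HashMap<>(source));
    }

    public static void main(String[] args) {
        Map<String, String> working = new HashMap<>();
        working.put("k", "v");
        Map<String, String> frozen = freeze(working);
        working.put("k", "changed");         // does not affect the frozen copy
        System.out.println(frozen.get("k")); // prints "v"
        try {
            frozen.put("k", "x");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");
        }
    }
}
```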

This is very similar to Martin Fowler's First Law of Distributed Object Design:

"Hence, we get to my First Law of Distributed Object Design: Don't distribute your objects."

Would the first law of multithreaded applications then be:

First law of multithreaded applications: don't share your data?

:)

Final note: the Collections class provides "synchronized" versions of some interfaces:

Synchronized List
Synchronized Map
Synchronized Set
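A short sketch of the three wrappers listed above (Collections.synchronizedList, synchronizedMap and synchronizedSet, all real static methods of java.util.Collections) being mutated by two threads at once. The class name and element counts are arbitrary choices for illustration. Each wrapper locks every individual call on the wrapper object, so no keep any other reference to the backing collection; iteration still needs manual locking, as the Javadoc for these methods notes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SynchronizedWrappers {
    static String demo() {
        // Only the wrappers are kept; the backing collections are never referenced directly.
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
        Set<Integer> set = Collections.synchronizedSet(new HashSet<>());

        // Two threads mutate all three wrappers concurrently; each add/put is
        // made atomic by the wrapper's internal lock, so no updates are lost.
        Runnable task = () -> {
            for (int i = 0; i < 500; i++) {
                list.add("item");     // 2 threads x 500 adds = 1000 elements
                set.add(i);           // both threads add 0..499, so 500 distinct values
                map.put("k" + i, i);  // same 500 keys from both threads
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return list.size() + " " + set.size() + " " + map.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "1000 500 500"
    }
}
```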

OscarRyz
A: 

Hashtable and Vector are very old, from JDK 1.0. They predate the standard collections from JDK 1.2, and their use in new code has been discouraged for a long time. Use HashMap and ArrayList wrapped by Collections.synchronizedMap() or Collections.synchronizedList() instead.

The API docs show which JDK version introduced a class or method under the "Since" tag.

starblue