When designing a collection class, is there any reason not to implement locking privately to make it thread safe? Or should I leave that responsibility up to the consumer of the collection?
I personally would leave it up to the consumers. It will make your collection class more generic.
For Java, you should leave unsynchronized for speed. Consumer of the collection can wrap in a synchronization wrapper if desired.
Just be clear in your documentation that your are not making it thread safe and leave it out, or if, for your application, you want it thread safe, make it thread safe and note that in your documentation for it. The only rule is to document it. Other than that, make your class for you and if other people want to use it, they can.
This would make it impossible to simultaneously access a collection from several threads even if you know that the element you touch is not used by anyone else.
An example would be a collection with an integer based index accessor. Each thread might know from its id which index values it can access without worrying about dirty reads/writes.
Another case where you would get an unnecessary performance hit would be when data is only read from the collection and not written to.
If I'm looking for a collection class and I need thread safe capabilities and your class doesn't have them, I'm immediately going to skip to the next offering out there to see what they provide. Your collection won't get any more of my attention.
Note the "If" at the beginning. Some customers will want it, some will not, and some won't care. If you're going to build a tool-kit for consumers, then why not offer both varieties? That way I can choose which one to use, but if I want thread-safe you still have my attention and I don't have to write it myself.
Collection classes need to be as fast as possible. Hence leave the locks out.
The calling code will know where the locks best lie the collection class doesn't. In the worst case scenario the app will have to add an additional lock meaning that two locks occur making it double the perf hit.
The primary reason not to make it thread safe is performance. Thread safe code can be 100s of times slower than non-safe code, so if you client doesn't want the feature, that's a pretty big waste.
I agree that leaving it up to the consumer is the right approach. If provides the consumer much more flexibility as to whether the Collection instance is synchronized on or a different object is synchronized on. For example, if you had two lists that both needed to be updated it might make sense to have them in a single sychronized block using a single lock.
If you make a collection class, do not make it thread safe. It's quite hard to do right (e.g. correct and fast), and the problems for your consumer when you do it wrong (heisenbugs) are difficult to debug.
In stead, implement one of the Collection APIs and use Collections.synchronizedCollection( yourCollectionInstance) to obtain a thread-safe implementation if they need it.
Just refer to the appropriate Collections.synchronizedXXX method in your class javadoc; it will make clear that you have considered thread-safety in your design and ensured the consumer has a thread-safe option at his disposal.
Note that if you attempt to make any class thread-safe you need to decide on common usage scenarios.
For instance, in the case of a collection, just making all the properties and methods individually thread-safe might not be good enough for a consumer, as reading first the count, and then looping, or similar, would not do much good if the count changed after reading it.
is there any reason not to implement locking privately to make it thread safe?
It depends. Is your goal to write a collection class which is accessed by multiple threads? If so, make it thread safe. If not, don't waste your time. This kind of thing is what people refer to when they talk about 'premature optimization'
Solve the problems that you have. Don't try to solve future problems that you think you may have some years in the future, because you can't see the future, and you'll invariably be wrong.
Note: You still need to write your code in a maintainable way, such that if you did need to come along and add locking to the collection, it wouldn't be terribly hard. My point is "don't implement features that you don't need and won't use"
Making the collection threadsafe is what killed Java's Vector and Hashtable classes. It is far easier for a client to wrap it in a threadsafe wrapper, as previously suggested, or to synchronize data access on the subset of methods, than to take a synchronization hit every time the class is accessed. Hardly anyone uses Vector or Hashtable, and if they do, they get laughed at, because their replacements (ArrayList and HashMap) are worlds faster. Which is unfortunate, as I (coming from a C++ background) much prefer the "Vector" name (STL), but ArrayList is here to stay.
Basically, design your collection as thread-safe, with locking implemented in two methods of your class: lock() and unlock(). Call them anywhere needed, but leave them empty. Then subclass your collection implementing the lock() and unlock() methods. Two classes for the price of one.
A really good reason to NOT make your collection thread-safe is for improved single-thread performance. Example: ArrayList over Vector. Deferring thread-safety to the caller allows the unsynchronized use case to optimize by avoiding locking.
A really good reason to make your collection thread-safe is for improved multi-threaded performance. Example: ConcurrentHashMap over HashMap. Because CHM internalizes the multi-threaded concerns, it can stripe locking for greater concurrent access more effectively than external synchronization.
Here's a good start.
But you will notice you lose one of the great features of collections - enumeration. You cannot thread-safe an enumerator, it's just not really feasible, unless you implement your own enumerator which holds an instance lock back to the collection itself. I would suspect this would cause major bottlenecks and potential deadlocks.
Thread safe collections can be deceiving. Jared Par posted a couple of interesting articles about thread safe collections:
The problem is there are several levels of thread safe collections. I find that when most people say thread safe collection what they really mean “a collection that will not be corrupted when modified and accessed from multiple threads”
...
But if building a data thread safe list is so easy, why doesn’t Microsoft add these standard collections in the framework?
Answer: ThreadSafeList is a virtually unusable class because the design leads you down the path to bad code.
The flaws in this design are not apparent until you examine how lists are commonly used. For example, take the following code which attempts to grab the first element out of the list if there is one.
static int GetFirstOrDefault(ThreadSafeList<int> list) {
if (list.Count > 0) {
return list[0];
}
return 0; }
This code is a classic race condition. Consider the case where there is only one > element in the list. If another thread removes that element in between the if statement and the return statement, the return statement will throw an exception because it’s trying to access an invalid index in the list. Even though ThreadSafeList is data thread safe, there is nothing guaranteeing the validity of a return value of one call across the next call to the same object
http://blogs.msdn.com/b/jaredpar/archive/2009/02/11/why-are-thread-safe-collections-so-hard.aspx
http://blogs.msdn.com/b/jaredpar/archive/2009/02/16/a-more-usable-thread-safe-collection.aspx