views:

1475

answers:

10

So I am thinking about building a hobby project, a one-off kind of thing, just to brush up on my programming/design.

It's basically a multithreaded web spider in which every thread updates the same data structure, a map of object -> int.

So it is definitely overkill to use a database for this, and the only thing I could think of is a thread-safe singleton used to contain my data structure. http://www.ibm.com/developerworks/java/library/j-dcl.html

Is there a different approach I should look into?

+4  A: 

Using lazy initialization for the database in a web crawler is probably not worthwhile. Lazy initialization adds complexity and an ongoing speed hit. One case where it is justified is when there is a good chance the data will never be needed. Also, in an interactive application, it can be used to reduce startup time and give the illusion of speed.

For a non-interactive application like a web-crawler, which will surely need its database to exist right away, lazy initialization is a poor fit.

On the other hand, a web-crawler is easily parallelizable, and will benefit greatly from being multi-threaded. Using it as an exercise to master the java.util.concurrent library would be extremely worthwhile. Specifically, look at ConcurrentHashMap and ConcurrentSkipListMap, which will allow multiple threads to read and update a shared map.
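As a rough illustration of the shared-map idea, here is a minimal sketch of several threads updating one ConcurrentHashMap. The class and method names (LinkCounter, record, countInParallel) are made up for this example, and it assumes Java 8 or later for merge():

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example class; the names are illustrative, not from the question.
class LinkCounter {
    // Shared by every worker thread; maps a URL to the number of times it was seen.
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    // merge() performs the read-modify-write atomically for a single key.
    void record(String url) {
        counts.merge(url, 1, Integer::sum);
    }

    int count(String url) {
        return counts.getOrDefault(url, 0);
    }

    // Runs `workers` tasks in parallel; each task records every URL once.
    static LinkCounter countInParallel(List<String> urls, int workers) {
        LinkCounter counter = new LinkCounter();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> urls.forEach(counter::record));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }
}
```

With four workers each recording the same URL once, the count for that URL ends up at four, with no explicit locking in the calling code.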

When you get rid of lazy initialization, the simplest Singleton pattern is something like this:

class Singleton {

  static final Singleton INSTANCE = new Singleton();

  private Singleton() { }

  ...

}

The keyword final is the key here. Even if you provide a static "getter" for the singleton rather than allowing direct field access, making the singleton final helps to ensure correctness and allows more aggressive optimization by the JIT compiler.

erickson
I don't think this is what he is asking about...
matt b
I based my answer on the fact that the article cited focuses on the use of double-checked locking to initialize the singleton lazily. This article would be a very bad guide to follow.
erickson
Ah, yes, that's correct. Double-checked locking is broken anyway.
matt b
A: 

The article you referenced only talks about making the creation of the singleton object, presumably a collection in this case, thread-safe. You also need a thread-safe collection so that the collection operations also work as expected. Make sure that the underlying collection in the singleton is synchronized, perhaps using a ConcurrentHashMap.

tvanfosson
This is only true if the collection is exposed outside of the singleton. If it is not, then there is no reason to worry about it.
Jeach
@Jeach -- not true. Either the collection needs to be synchronized or the methods that you expose that interact with it need to be synchronized. Precisely because you have a single instance, you need to take care to make sure only one thread is in a critical section (like updating the count related to an object in this case) at a time. It's much easier to just use a synchronized collection than to implement all the locking code in your methods.
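To make the second option concrete (synchronizing the exposed methods rather than the collection), here is a minimal sketch. HitRegistry and its methods are hypothetical names chosen for illustration; the point is only that the get-then-put sequence is the critical section:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical singleton; class and method names are illustrative only.
final class HitRegistry {
    static final HitRegistry INSTANCE = new HitRegistry();

    // Plain HashMap: safe only because every access goes through a synchronized method.
    private final Map<String, Integer> hits = new HashMap<>();

    private HitRegistry() { }

    // get-then-put is the critical section; synchronized keeps threads from interleaving in it.
    synchronized int increment(String key) {
        Integer old = hits.get(key);
        int updated = (old == null) ? 1 : old + 1;
        hits.put(key, updated);
        return updated;
    }

    synchronized int get(String key) {
        Integer value = hits.get(key);
        return (value == null) ? 0 : value;
    }
}
```

Without synchronized on increment(), two threads could both read the same old value and one update would be lost, even though the map is never exposed outside the singleton.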
tvanfosson
+1  A: 

If you look at the very bottom of that article, you'll see the suggestion to just use a static field. That would be my inclination: you don't really need lazy instantiation (so you don't need getInstance() to be both an accessor and a factory method). You just want to ensure that you have one and only one of these things. If you really need global access to one such thing, I'd use that code sample towards the very bottom:

class Singleton
{
  private Vector v;
  private boolean inUse;
  private static Singleton instance = new Singleton();

  private Singleton()
  {
    v = new Vector();
    inUse = true;
    //...
  }

  public static Singleton getInstance()
  {
    return instance;
  }
}

Note that the Singleton is now constructed during the initialization of static fields. This should work and not face the threading risks of potentially mis-synchronizing things.

All that said, perhaps what you really need is one of the thread-safe data structures available in the modern JDKs. For example, I'm a big fan of the ConcurrentHashMap: thread safety plus I don't have to write the code (FTW!).

Bob Cross
This pattern is known as a "static initializer".
matt b
Yes, I was trying to stick with the original vocabulary, though.
Bob Cross
Yes, this is probably the way to do it. But now I am thinking why not make a static member ConcurrentHashMap in my threaded spider class? I guess really my question is, this is one way to do it, is there a better way?
Dan.StackOverflow
A HashMap is very compelling. It's simple, fast, comes with the JDK, simple and simple ;-). For more specific guidance, I think we need more information on exactly what it is that you want to do (and that would likely be outside the scope of this question).
Bob Cross
+1  A: 

If your life depended on a few microseconds, then I would advise you to optimize your resource locking where it actually matters.

But the keyword here is hobby project!

Which means that if you synchronize the entire getInstance() method, you will be fine in 99.9% of all cases. I would NOT recommend doing it any other way.

Later, if you prove by means of profiling that the getInstance() synchronization is the bottleneck of your project, then you can move on and optimize the concurrency. But I really doubt it will cause you trouble.

Jeach!

Jeach
+2  A: 

Double-checked locking has been proven to be incorrect and flawed (at least in Java). Do a search or look at Wikipedia's entry for the exact reason.

First and foremost is program correctness. If your code is not thread-safe (in a multi-threaded environment) then it's broken. Correctness comes first before performance optimization.

To be correct you'll have to synchronize the whole getInstance method

public static synchronized Singleton getInstance() {
   if (instance==null) ...
}

or statically initialize it

private static final Singleton INSTANCE = new Singleton();
Steve Kuo
A: 

Check out this article: Implementing the Singleton Pattern in C#

public sealed class Singleton
{
    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            return Nested.instance;
        }
    }

    class Nested
    {
        // Explicit static constructor to tell C# compiler
        // not to mark type as beforefieldinit
        static Nested()
        {
        }

        internal static readonly Singleton instance = new Singleton();
    }
}
Tawani
A: 

How about:

public static Singleton getInstance() {
  if (instance == null) {
    synchronized (Singleton.class) {  // 'instance' must be declared volatile (Java 5+) for this to be safe
      if (instance == null) {
         instance = new Singleton();
      }
    }
  }

  return instance;
}
mamboking
On line 3, I think you mean Singleton.class. Your example might be clearer if you rename the reference from 'singleton' to 'instance'.
Outlaw Programmer
You are correct. Why don't these edit boxes have code completion :)
mamboking
Now THAT'S a million dollar idea! Maybe Firefox 4.0 ;-)
Outlaw Programmer
The double-checked lock pattern has been proven to be broken.
Steve Kuo
A: 

Try Bill Pugh's solution, the initialization-on-demand holder idiom. It is the most portable solution across different Java compilers and virtual machines, and it is thread-safe without requiring special language constructs (i.e. volatile and/or synchronized).

http://en.wikipedia.org/wiki/Singleton_pattern#The_solution_of_Bill_Pugh
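For reference, a minimal sketch of the holder idiom looks roughly like this. CrawlState is a hypothetical class name; the structure (a private nested class holding the static final instance) is the idiom itself:

```java
// Hypothetical class name; a minimal sketch of the initialization-on-demand holder idiom.
final class CrawlState {
    private CrawlState() { }

    // Holder is not initialized until getInstance() first touches Holder.INSTANCE.
    // The JVM guarantees that class initialization happens once and is thread-safe.
    private static final class Holder {
        static final CrawlState INSTANCE = new CrawlState();
    }

    static CrawlState getInstance() {
        return Holder.INSTANCE;
    }
}
```

This gives lazy creation without any volatile or synchronized in user code, because the locking is done by the class loader during initialization of Holder.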

Aries McRae
+1  A: 

Why don't you create a data structure and pass it to each of the threads as an injected dependency? That way you don't need a singleton. You still need to make the data structure thread-safe.
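A minimal sketch of that approach, assuming Java 8 or later for merge(). SpiderWorker and demo() are hypothetical names used only for illustration; each worker receives the shared map through its constructor instead of reaching for a global:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical worker; it is handed the shared map rather than looking up a singleton.
class SpiderWorker implements Runnable {
    private final ConcurrentMap<String, Integer> counts;
    private final String[] urls;

    SpiderWorker(ConcurrentMap<String, Integer> counts, String... urls) {
        this.counts = counts;
        this.urls = urls;
    }

    @Override
    public void run() {
        for (String url : urls) {
            counts.merge(url, 1, Integer::sum); // atomic per-key increment
        }
    }

    // Wires two workers to one map; the caller owns the map, so there is no global state.
    static ConcurrentMap<String, Integer> demo() {
        ConcurrentMap<String, Integer> shared = new ConcurrentHashMap<>();
        Thread a = new Thread(new SpiderWorker(shared, "x", "y"));
        Thread b = new Thread(new SpiderWorker(shared, "y"));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }
}
```

A side benefit of this design is testability: a test can hand a worker its own fresh map and inspect it afterwards.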

Peter Lawrey
A: 

As Joshua Bloch argues in his book "Effective Java, 2nd Edition", a single-element enum type is the best way to implement a singleton, and I agree.

public enum Singleton {
  INSTANCE;

  public void doSomething() { ... }
}
Alexandros