views:

368

answers:

5

Anyone know of an algorithm that will group pictures into events based on the date the picture was taken. Obviously I can group by the date, but I'd like something a little more sophisticated that would(might) be able to group pictures spanning multiple days based on the frequency over a certain timespan. Consider the following groupings:

  • 1/2/2009 15 photos
  • 1/3/2009 20 photos
  • 1/4/2009 13 photos
  • 1/5/2009 19 photos
  • 1/15/2009 5 photos

Potentially these would be grouped into two groups:

  1. 1/2/2009 -> 1/5/2009
  2. 1/15/2009

Obviously there will be some tolerance(s) that need to be established.

Is there any well established way of doing this, other then inventing my own top/down approach?

A: 

Just group the pictures that were taken on successive days (no days on which no pictures were taken) together.

tehvan
right-this would be the most obvious top/down approach.
Greg Dean
+2  A: 

You can apply pretty much any standard clustering technique to this, it's just a matter of defining your distance function correctly. When you are making your matrix of distances between your photos you should consider a combination of physical distance between locations - if you have it - and temporal distance between their creation timestamps. Normalise them and put them on separate dimensions and you may even just be able to take a regular euclidean distance.

Best of luck.

Simon
A: 

You might try to dynamically calculate tolerance based on how many or how big (absolute or %) clusters you want to create.

vartec
A: 

To get a useful clustering of pictures according to date you require the following:

1) The number of clusters should be variable and not fixed a priori to the clustering

2) The diameter of each cluster should not exceed a specific amount.

The clustering algorithm that best satisfies both requirements is the QT (quality threshold) clustering algorithm. From Wikipedia:

QT (quality threshold) clustering (Heyer, Kruglyak, Yooseph, 1999) is an alternative method of partitioning data, invented for gene clustering. It requires more computing power than k-means, but does not require specifying the number of clusters a priori, and always returns the same result when run several times.

Although it is mainly used for gene clustering I think it would fit in very well for what you need.

Il-Bhima
Any hierarchical agglomeration technique shares that property.
Simon
Why do you think QT clustering is better?
Greg Dean
hierarchical agglomeration technique will naively merge always the closest two point/cluster pairs at each iteration. Since you are not considering all clusters for each point you could end up with skewed clusters
Il-Bhima
w/ QT wont the first cluster always be the size of the predefined max diameter?
Greg Dean
The first cluster is by definition the cluster having the most points within the given diameter. Every cluster will have the predefined max diameter if there are enough points.
Il-Bhima
Right, which makes it ill suited for this application, unless of course you assume each event is the same duration of time (same diameter).
Greg Dean
@Il-Bhima. Naive implementations will indeed do that, but it is an easy problem to avoid.
Simon
A: 

Try to detect the Gaps instead of the Clusters.

The Unknown