Algorithm for clustering pictures based on date taken

views:

368

answers:

+2 Q:

Algorithm for clustering pictures based on date taken

Anyone know of an algorithm that will group pictures into events based on the date the picture was taken. Obviously I can group by the date, but I'd like something a little more sophisticated that would(might) be able to group pictures spanning multiple days based on the frequency over a certain timespan. Consider the following groupings:

1/2/2009 15 photos
1/3/2009 20 photos
1/4/2009 13 photos
1/5/2009 19 photos
1/15/2009 5 photos

Potentially these would be grouped into two groups:

1/2/2009 -> 1/5/2009
1/15/2009

Obviously there will be some tolerance(s) that need to be established.

Is there any well established way of doing this, other then inventing my own top/down approach?

Just group the pictures that were taken on successive days (no days on which no pictures were taken) together.

tehvan 2009-03-06 08:30:45

right-this would be the most obvious top/down approach.

Greg Dean 2009-03-06 08:33:00

+2 A:

You can apply pretty much any standard clustering technique to this, it's just a matter of defining your distance function correctly. When you are making your matrix of distances between your photos you should consider a combination of physical distance between locations - if you have it - and temporal distance between their creation timestamps. Normalise them and put them on separate dimensions and you may even just be able to take a regular euclidean distance.

Best of luck.

Simon 2009-03-06 08:31:46

You might try to dynamically calculate tolerance based on how many or how big (absolute or %) clusters you want to create.

vartec 2009-03-06 08:40:55

To get a useful clustering of pictures according to date you require the following:

1) The number of clusters should be variable and not fixed a priori to the clustering

2) The diameter of each cluster should not exceed a specific amount.

The clustering algorithm that best satisfies both requirements is the QT (quality threshold) clustering algorithm. From Wikipedia:

QT (quality threshold) clustering (Heyer, Kruglyak, Yooseph, 1999) is an alternative method of partitioning data, invented for gene clustering. It requires more computing power than k-means, but does not require specifying the number of clusters a priori, and always returns the same result when run several times.

Although it is mainly used for gene clustering I think it would fit in very well for what you need.

Il-Bhima 2009-03-06 08:45:10

Any hierarchical agglomeration technique shares that property.

Simon 2009-03-06 08:48:46

Why do you think QT clustering is better?

Greg Dean 2009-03-06 08:50:05

hierarchical agglomeration technique will naively merge always the closest two point/cluster pairs at each iteration. Since you are not considering all clusters for each point you could end up with skewed clusters

Il-Bhima 2009-03-06 09:22:39

w/ QT wont the first cluster always be the size of the predefined max diameter?

Greg Dean 2009-03-06 09:31:23

The first cluster is by definition the cluster having the most points within the given diameter. Every cluster will have the predefined max diameter if there are enough points.

Il-Bhima 2009-03-06 09:43:14

Right, which makes it ill suited for this application, unless of course you assume each event is the same duration of time (same diameter).

Greg Dean 2009-03-07 15:46:30

@Il-Bhima. Naive implementations will indeed do that, but it is an easy problem to avoid.

Simon 2009-03-09 20:26:38

Try to detect the Gaps instead of the Clusters.

The Unknown 2009-06-05 18:15:56

ansaurus

tags:

views:

answers:

Algorithm for clustering pictures based on date taken

related questions