views:

196

answers:

4

I need to implement some kind of metric space search in Postgres(*) (PL or PL/Python). So, I'm looking for good sources (or papers) with a very clear and crisp explanation of the machinery behind these ideas, in such way that I can implement it myself.

I would prefer clarity over efficiency.

(*) The need for that is described better here.

+2  A: 

Especially for geographical data, look at PostGIS first to see if you need to implement anything. If you do, start with the papers listed in the Wikipedia entry on GiST.

Looking at your link, it seems your metric space is strings with some sort of edit distance as the metric. A nice but oldish overview of some solutions is given by Navarro, Baeza-Yates, Sutinen, and Tarhio, IEEE Data Engineering Bulletin, 2001; the related papers on Citeseer could also be useful. Locality Sensitive Hashing is a newer technique that might be useful, but a lot of the papers are heavy on math.

Jouni K. Seppänen
A: 

Some techniques that involve space search that might help you are Hill-Climbing, Neural Network Training, Genetic Algorithm, and Particle Swarm.

You will also need to define a distance metric over your metric space. Have you done so?(& out of curiosity, what is it, if you have done so)

Paul Nathan
+1  A: 

BK-Trees are useful for indexing and searching anything that obeys the triangle inequality, metric spaces included. The canonical example is searching for strings within a given edit distance of a target. I wrote an article about that here.

Unfortunately, there's no built in support for this in Postgres. You could implement it yourself using GIST, but obviously that'll be a lot of work. I can't think of any way to implement it without writing your own indexes short of storing the tree in a table, which obviously isn't going to be very efficient.

Nick Johnson
+1  A: 

You can try http://sisap.org where many modern metric indexes are listed, including BK-trees. You can find code in C to try different alternatives.