views:

584

answers:

4

Hi all,

Looking for some advice here. Does anyone know a good place to start looking into matching algorithm in a n-dimensional space. For example, any dating site out there must be using some sort of algorithm to match 2 people. What I have read is that we can map characteristics of a person in a n-dimensional array with a point system for each characteristic. Once we have all (available) characteristics of a person, we can represent this person in a point within a n-dimensional array. Then, to match 2 person would be as simple as finding the shortest distance between 2 point in this n-dim array. Does anyone has any reference in implementation of these kind of algorithm? What's the best language to write these kind of stuff in?

A: 

What you are describing is clustering. This is a non-trivial analytics problem. I can't point you to a specific algorithm, but you're going to want to familiarize yourself with (at minimum) graph theory. This can be developed in any language you choose, but you have a lot of studying to do before you get to that point (I believe).

Sean Bright
A: 

First off, pick the language you are most familiar with. The algorithms for handling this are fairly straightforward, and should work in any modern language. (As long as there is some concept of array, and potentially a matrix library, you should be fine.) I've implemented many of these in C, C++, and C# before, but seen implementations in python, vb.net, etc.

Depending on what you are trying to do, there are a couple of options.

That being said, what you want to do depends on your goals. If you just want to find the best match, you can use simple distance calculations (ie: sqrt of sum of squares for each dimension/property in the n-dimensional array), optionally weight each properties distance, and use the closest point.

If you want to group together people, you'll want to look at clustering algorithms. For data like this, I would suspect that some form of K-Means clustering or fuzzy c-means clustering would work the best.

Reed Copsey
+2  A: 

If you want to find the closest match for one person, Bentley & Shamos published a multi-dimensional divide-and-conquer method: Divide-and-conquer in O(N log N) time: Divide-and-conquer in multidimensional space in Proceedings of the eighth annual ACM symposium on Theory of computing 1976. If you can't get a copy this may also be helpful.

However for your example application actually finding the nearest neighbour doesn't seem to be the biggest problem - much trickier is mapping inputs into dimensions. For example if one dimension is "likes animals", what value do you give to someone who likes dogs & cats but can't stand horses? What about someone who loves horses, thinks dogs are OK, is annoyed by cats and is ambivalent about goldfish?

danio
Good point on matching people into dimensions. Maybe something like a scale per dimension, ie: somebody who likes cats+dogs but hates horses would get +1/+1/-1, ie: +1 as a score in that dimension, or something along those lines.
Reed Copsey
@danio: You can always break a single "likes animals" dimension into separate "likes dogs", "likes cats", etc. dimensions.
j_random_hacker
A: 

The process you mention is known as k-nearest neighbor, with k=1. It's the most intuitive approach for finding similar vectors.

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm