suggestions for a people similarity algorithm | ansaurus

+4 Q:

suggestions for a people similarity algorithm

Hello all,

I'd like some suggestions for my "find similar people" algorithm :). I have a database that stores the following entities: person, article, and keyword. For each person I have a collection of keywords, each with the number of times the person mentioned it, compiled from the keywords of that person's articles. I need to find similar people by looking at their relevant keywords. The simple solution would be to take x keywords from a person y and find all people who have similar (not necessarily equal) scores for those keywords, but that doesn't seem like the best way. Thoughts?

Thanks!

+7 A:

It sounds like your case is close enough to the "similarity" queries of a normal information retrieval system that you could use the same vector space model.

For each person, count the number of occurrences of each keyword. Treat each keyword as a dimension, and the number of occurrences as the component of that person's vector along that dimension. Normally, every dimension is treated the same, but if you find that some keywords are better predictors of compatibility, you could scale the counts in those dimensions by some factor.
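As a rough sketch of that step, assuming each person's articles are available as plain lists of keyword strings (the function name `keyword_vector`, the `weights` parameter, and the sample data are all made up for illustration), the vectors could be built like this:

```python
from collections import Counter


def keyword_vector(article_keywords, weights=None):
    """Build a sparse keyword vector (dict: keyword -> score) for one person.

    article_keywords: list of keyword lists, one list per article
                      (a hypothetical shape; adapt it to your schema).
    weights: optional dict of keyword -> scale factor; keywords that are
             not listed keep the default factor of 1.0.
    """
    counts = Counter()
    for keywords in article_keywords:
        counts.update(keywords)  # one mention per keyword occurrence
    weights = weights or {}
    return {kw: n * weights.get(kw, 1.0) for kw, n in counts.items()}


# Hypothetical sample data: two articles written by one person.
alice_articles = [["java", "jvm", "gc"], ["java", "concurrency"]]

alice = keyword_vector(alice_articles)
alice_weighted = keyword_vector(alice_articles, weights={"concurrency": 2.0})
```

Storing only the keywords a person actually used keeps each vector small, since most people mention only a tiny fraction of all keywords.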

Then the dot product of two people's vectors gives you a score of how similar they are. Alternatively, you can supply your own keywords as a query vector and find the people whose interests are closest to it.
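A minimal sketch of the scoring, assuming each person's vector is a dict mapping keyword to count (the names `dot` and `most_similar` and the sample people are invented for the example):

```python
def dot(u, v):
    """Dot product of two sparse vectors stored as dicts (keyword -> count)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(x * v.get(k, 0) for k, x in u.items())


def most_similar(query, people, top_n=3):
    """Rank people by dot product with the query vector, highest first.

    People who share no keyword with the query (score 0) are dropped.
    """
    scores = [(name, dot(query, vec)) for name, vec in people.items()]
    scores = [(name, s) for name, s in scores if s > 0]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical sample data.
people = {
    "alice": {"java": 2, "jvm": 1, "gc": 1},
    "bob": {"java": 1, "gc": 3},
    "carol": {"cooking": 4},
}

# People similar to alice (she is left out of the candidates).
others = {name: vec for name, vec in people.items() if name != "alice"}
similar_to_alice = most_similar(people["alice"], others)

# Ad hoc query built from your own keywords.
cooks = most_similar({"cooking": 1}, people)
```

Note that a raw dot product favors people with many mentions overall. Dividing each score by the lengths of the two vectors (cosine similarity) is a common variation if that turns out to be a problem; it is not something the answer above prescribes.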

erickson 2010-08-30 16:40:24
