Badoo.com has 56,000,000 user profiles. Profiles can be searched by sex, age, hair color, zodiac sign, education and so on, plus distance from my hometown, online status and date of registration. So far this seems doable: even though it's quite a query on huge tables (56 million members), it can be cached in a general way.

The interesting part is that each user also has an individual "exclude list" (for every profile you look at, you can say that you don't want to meet this person). Plus, your friends don't show up either.

The second interesting part is the ORs in the query. You can search for someone who's a woman, 25-35, blonde OR brunette, non-smoker, hetero OR bisexual, Virgo OR Gemini OR Cancer, living within a 50 km radius of Paris, who is not your friend, not on your exclude list, and online now. Many ORs, a heavy query, sort options, and no way of caching or pre-calculating all this, yet the search returns 11,298 results in milliseconds.

How do they do this with 56 million records and 250K people using it at the same time? Full-text search indexes? Relational databases? Key-value stores? Does anyone have an idea about the concept or architecture?

A: 

Searches like this are most likely built on an inverted-index technology such as Lucene or Sphinx. If you are looking to build a solution, my recommendation would be Apache Solr (a search server built on Lucene). It is very popular, has an active OSS community, and is used by sites such as Netflix and CNET.
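To give a rough idea of why an inverted index copes well with this kind of query, here is a minimal in-memory sketch in Python. It is not how Badoo, Lucene or Sphinx actually implement it; the class name `InvertedIndex`, the field names and the sample profiles are all made up for illustration. The idea is that each (field, value) pair maps to the set of matching profile ids, so OR within a field is a set union, AND across fields is a set intersection, and the friends/exclude list is a set difference applied at the end.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index (hypothetical): (field, value) -> set of profile ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.all_ids = set()

    def add(self, profile_id, **fields):
        # Record the profile id under every (field, value) pair it has.
        self.all_ids.add(profile_id)
        for field, value in fields.items():
            self.postings[(field, value)].add(profile_id)

    def search(self, criteria, exclude=frozenset()):
        """criteria maps a field name to the list of accepted values.

        Values within one field are ORed (union), fields are ANDed
        (intersection), and excluded ids are removed last (difference).
        """
        result = set(self.all_ids)
        for field, values in criteria.items():
            matches = set()
            for value in values:
                matches |= self.postings.get((field, value), set())
            result &= matches
        return result - set(exclude)


# Made-up sample data, just to exercise the index.
index = InvertedIndex()
index.add(1, sex="f", hair="blonde", zodiac="virgo", online=True)
index.add(2, sex="f", hair="brunette", zodiac="gemini", online=True)
index.add(3, sex="f", hair="red", zodiac="cancer", online=True)
index.add(4, sex="m", hair="blonde", zodiac="virgo", online=True)
index.add(5, sex="f", hair="blonde", zodiac="cancer", online=False)

hits = index.search(
    {
        "sex": ["f"],
        "hair": ["blonde", "brunette"],
        "zodiac": ["virgo", "gemini", "cancer"],
        "online": [True],
    },
    exclude={2},  # ids of friends and of profiles on the exclude list
)
print(sorted(hits))
```

Real engines keep these posting lists sorted and compressed (or as bitsets) rather than as Python sets, and handle ranges such as age and geographic distance with dedicated index structures, but the per-user exclusion still comes down to removing a small set of ids from the result.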

Mikos