I want to build a large, scalable database of millions of high-dimensional vectors using LSH. Because all the data has to be held in RAM for fast querying, it must be distributed across multiple servers.

A naïve approach would be to spread the objects across the servers and send each query to every server. The server with the best answer probably holds the right object.
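To make the naïve approach concrete, here is a minimal sketch in which the "servers" are simulated as in-process shards. In a real deployment each shard would sit on its own machine and `shard_best` would be a network call. For brevity each shard runs a brute-force nearest-neighbour search; that stands in for whatever per-server LSH lookup is used. All function names are hypothetical.

```python
import math
import random

# Hypothetical simulation: each "server" is a list of (object_id, vector) pairs.
# In a real system every shard would live on a different machine.

def round_robin_shards(objects, num_shards):
    """Spread objects over shards without regard to similarity."""
    shards = [[] for _ in range(num_shards)]
    for i, obj in enumerate(objects):
        shards[i % num_shards].append(obj)
    return shards

def shard_best(shard, query):
    """Return (distance, object_id) of the closest vector in one shard.
    Brute force here; stands in for a per-server LSH lookup."""
    return min((math.dist(vec, query), oid) for oid, vec in shard)

def naive_query(shards, query):
    """Scatter-gather: send the query to every shard, keep the global best."""
    candidates = [shard_best(shard, query) for shard in shards if shard]
    return min(candidates)

if __name__ == "__main__":
    rng = random.Random(0)
    dim = 16
    objects = [(i, [rng.gauss(0, 1) for _ in range(dim)]) for i in range(1000)]
    shards = round_robin_shards(objects, num_shards=4)
    dist, oid = naive_query(shards, objects[42][1])
    print(oid, dist)  # 42 0.0 (the query is object 42's own vector)
```

The cost of this scheme is that every query touches every server, so query load grows with the number of machines even though only one of them holds the answer.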

I'm sure there must be a better solution, one in which similar objects are grouped together on the same server so that a query doesn't have to be sent to every node.
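One way to picture the grouping idea is to let the LSH signature itself decide which server stores an object, so a query only visits the server that owns its bucket. The sketch below assumes random-hyperplane LSH (cosine similarity), a single hash table, and placement by `signature mod num_shards`; these are illustrative choices, and all class and function names are hypothetical.

```python
import math
import random

# Hypothetical sketch: random-hyperplane LSH (cosine similarity) where the
# signature decides which shard stores an object. Single table, no multi-probe.

class HyperplaneLSH:
    def __init__(self, dim, num_bits, seed=0):
        rng = random.Random(seed)
        # One random Gaussian normal vector per signature bit.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]

    def signature(self, vec):
        """Bit tuple: which side of each hyperplane the vector lies on."""
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                     for plane in self.planes)

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

class ShardedLSHIndex:
    def __init__(self, lsh, num_shards):
        self.lsh = lsh
        # shards[i] maps signature -> list of (object_id, vector)
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, sig):
        # Deterministic placement: read the bit tuple as an integer.
        return int("".join(map(str, sig)), 2) % len(self.shards)

    def insert(self, oid, vec):
        sig = self.lsh.signature(vec)
        self.shards[self.shard_for(sig)].setdefault(sig, []).append((oid, vec))

    def query(self, vec):
        sig = self.lsh.signature(vec)
        # Only the shard owning this bucket is contacted.
        bucket = self.shards[self.shard_for(sig)].get(sig, [])
        if not bucket:
            return None
        return min((cosine_distance(v, vec), oid) for oid, v in bucket)

if __name__ == "__main__":
    rng = random.Random(0)
    dim = 16
    index = ShardedLSHIndex(HyperplaneLSH(dim, num_bits=8, seed=1), num_shards=4)
    objects = [(i, [rng.gauss(0, 1) for _ in range(dim)]) for i in range(1000)]
    for oid, vec in objects:
        index.insert(oid, vec)
    print(index.query(objects[42][1]))  # object 42, distance close to 0
```

The trade-off is recall: with one table, a near neighbour whose signature differs in even one bit lands in a different bucket, possibly on another server. Using several hash tables, or probing neighbouring buckets, recovers recall at the price of contacting more servers per query.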

What would be a good approach for distributed LSH tables? Maybe there are even some projects out there?

Thanks for any hints.