There is a portal with two billion registered users. If you store all 2 billion users in a conventional database, it will take more time to retrieve the data for a particular user when that user tries to log in. How do you handle this situation so that the user gets a response quickly?

A: 

I don't know if it's practical, but in theory you could use some sort of tree structure. If I remember my CS classes from a long time ago, lookups in a balanced tree are O(log n), so for a billion entries (roughly 2^30) you only ever need about 30 comparisons per lookup, and about 31 for two billion. That's the beauty of CS...

As for actually implementing a tree structure for that, I have no idea.
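
To make the arithmetic concrete, here is a rough sketch that counts worst-case lookup steps for two billion keys. The function names and the fanout of 256 keys per node are illustrative assumptions, not figures from any particular database, and `btree_levels` is a simplified capacity estimate rather than a real B-tree.

```python
def binary_search_steps(n: int) -> int:
    """Worst-case number of halvings needed to narrow n sorted keys down to one."""
    return (n - 1).bit_length()  # equals ceil(log2(n)) for n >= 1


def btree_levels(n: int, fanout: int) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers n keys.

    Simplification: assumes every node is full and holds `fanout` keys.
    """
    levels, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        levels += 1
    return levels


USERS = 2_000_000_000
print(binary_search_steps(USERS))  # 31
print(btree_levels(USERS, 256))    # 4, under the assumed fanout of 256
```

With a wide fanout the tree is only a handful of levels deep, which is why a disk-backed index needs far fewer page reads than a binary tree would need comparisons.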

hvgotcodes
There is a family of tree structures optimized for storing large amounts of data (i.e. too large to fit into main memory). They're called B-Trees. And now guess how pretty much all databases implement their indexes...
Michael Borgwardt
@Michael Borgwardt -- right. I didn't know if this is a practical "I need to do this" question or a theoretical question. Sounds like homework or an interview question to me...
hvgotcodes
+6  A: 

I don't see any particular reason why a conventional database on decent modern hardware couldn't retrieve log-on information pretty quickly, even if you have 2 billion records. It's just a simple indexed lookup after all (you did remember to index on user ID, right?)
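
As a small-scale illustration of the indexed lookup, here is a sketch using Python's built-in `sqlite3` module. The table layout, column names, and the index name `idx_users_username` are made up for the example, and an in-memory table of 10,000 rows says nothing about performance at 2 billion; it only shows that the login query is answered through the index rather than a full table scan.

```python
import sqlite3


def build_demo_db(n_users: int = 10_000) -> sqlite3.Connection:
    """Create an in-memory users table with a unique index on username."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, "
        "username TEXT NOT NULL, "
        "password_hash TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO users (username, password_hash) VALUES (?, ?)",
        ((f"user{i}", f"hash{i}") for i in range(n_users)),
    )
    # The index is what turns the login query into a tree lookup.
    conn.execute("CREATE UNIQUE INDEX idx_users_username ON users (username)")
    conn.commit()
    return conn


def find_user(conn: sqlite3.Connection, username: str):
    """Return (id, password_hash) for the user, or None if not found."""
    return conn.execute(
        "SELECT id, password_hash FROM users WHERE username = ?", (username,)
    ).fetchone()


def lookup_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's description of how it would run the login query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id, password_hash FROM users WHERE username = ?",
        ("user42",),
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text


conn = build_demo_db()
print(find_user(conn, "user42"))
print(lookup_plan(conn))
```

The plan text should mention `idx_users_username`; if the index is dropped, SQLite falls back to scanning the whole table.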

On a really big machine you might even fit most of it in RAM.

However, if you are really trying to engineer this for scale I'd look at something like Cassandra. This is a highly available, distributed NoSQL database, basically the same kind of architecture that Google, Facebook etc. would use.

mikera
A: 

If you have a portal with 2 billion users, logins are only a small fraction of all the queries that will be performed.
The problem here is not the time it takes for one login, but what happens if one percent of all users (20 million people) are active at the same time.
Luckily, two billion users do not fit on one continent, so you can use distributed database servers that each serve their own part of the world, and synchronize them in the background (in case somebody travels to another continent).
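
One way to picture the regional split is a small routing function that sends each user to a database server in their home region. This is only a sketch under stated assumptions: the region codes, host names, and the `shard_for` helper are all hypothetical, and simple modulo hashing like this would move many users to different servers whenever a region gains or loses a server.

```python
import hashlib

# Hypothetical regional database hosts; the names are placeholders.
REGION_SHARDS = {
    "eu": ["eu-db-0", "eu-db-1"],
    "na": ["na-db-0", "na-db-1"],
    "asia": ["asia-db-0", "asia-db-1", "asia-db-2"],
}


def shard_for(user_id: str, home_region: str) -> str:
    """Pick the server holding this user's record within their home region."""
    shards = REGION_SHARDS[home_region]
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


print(shard_for("alice", "eu"))
print(shard_for("bob", "asia"))
```

A user who travels would still be looked up through their home region unless the background synchronization has copied their record elsewhere.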

If you have the resources (time, money, staff), you can invent your own Bigtable-style database like Google did (with 2 billion users you probably have the money and staff), but I would stick with normal relational databases to implement this.

GvS