
Suppose user A follows 100 people; then I would need 100 join statements, which I think would be horrible for the database.

Or are there other ways to do this?

+6  A: 

Why would you need 100 Joins?

You would have a simple table "Follows" with your ID and the other person's ID in it...

Then you retrieve the "Tweets" by joining something like this:

SELECT TOP 100 tweet.* FROM tweet INNER JOIN Followers ON Followers.FollowerID = tweet.AuthorID WHERE Followers.MasterID = yourID

Now you just need decent caching, and make sure you use a non-locking query, and you have all the information... (Well, maybe add some user data into the mix.)
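
On SQL Server (which the "top 100" syntax suggests), one common way to get a non-locking read is the NOLOCK table hint; it allows dirty reads, so it trades consistency for throughput. A minimal sketch of the query above with the hint added, assuming a @yourID parameter and the tweet/Followers columns from the edit below:

    -- Sketch only: same join as above, read without taking shared locks (dirty reads possible).
    SELECT TOP 100 t.*
    FROM tweet AS t WITH (NOLOCK)
    INNER JOIN Followers AS f WITH (NOLOCK)
        ON f.FollowerID = t.AuthorID
    WHERE f.MasterID = @yourID
    ORDER BY t.ID DESC;  -- assumes the tweet ID roughly tracks recency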

Edit:

tweet

ID - tweet ID
AuthorID - ID of the poster

Followers

MasterID - (basically your ID)
FollowerID - (ID of the person following you)

The Followers table has a composite primary key based on MasterID and FollowerID. It should have 2 indexes - one on "MasterID, FollowerID" and one on "FollowerID, MasterID".
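
In DDL form, a sketch of that layout might look like the following (assuming SQL Server; the column types are guesses, since the answer doesn't give any):

    -- Sketch only: tables and indexes as described above; types are assumptions.
    CREATE TABLE tweet (
        ID       BIGINT IDENTITY(1,1) PRIMARY KEY,  -- tweet ID
        AuthorID BIGINT NOT NULL                    -- ID of the poster
        -- other tweet columns (text, timestamp, ...) omitted
    );

    CREATE TABLE Followers (
        MasterID   BIGINT NOT NULL,  -- basically your ID
        FollowerID BIGINT NOT NULL,  -- ID of the person following you
        CONSTRAINT PK_Followers PRIMARY KEY (MasterID, FollowerID)  -- composite key, covers (MasterID, FollowerID)
    );

    -- Second index, covering the reverse direction (FollowerID, MasterID):
    CREATE INDEX IX_Followers_Follower_Master ON Followers (FollowerID, MasterID);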

Heiko Hatzfeld
+1, it's worth noting that this is a classic N-to-M relation, and the standard solution is a link table like you describe.
Henk Holterman
What's the solution for an N*M*M relation then?
Shore
What's N-M-M? Please describe.

The above table can be used in both directions (see the sketch after these comments). Say column 1 is master and column 2 is follower, so a row 1-2 means user 2 follows user 1, and 2-1 means that user 1 follows user 2...
Heiko Hatzfeld
N-to-M-to-M
Shore
You "only" have a relation between 2 tables. So N-M-M would be implemented as N-M and M-M (which could also be called an extension table - assuming the 2 M's are always identical).
Heiko Hatzfeld
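
To make the "both directions" point concrete, a sketch against the Followers table above (with @userID as a placeholder):

    -- Sketch only: the same link table answers both questions.
    -- Who follows @userID?
    SELECT FollowerID FROM Followers WHERE MasterID = @userID;
    -- Whom does @userID follow?
    SELECT MasterID FROM Followers WHERE FollowerID = @userID;
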
+1  A: 

The real trick is to minimize your database usage (e.g., cache, cache, cache) and to understand usage patterns. In the specific case of Twitter, they use a bunch of different techniques, from queuing to an insane amount of in-memory caching to some really clever data-flow optimizations. Give Scaling Twitter: Making Twitter 10000 percent faster and the other associated articles a read. The answer to your question about how to implement "following" is to denormalize the data (precalculate and maintain join tables instead of performing joins on the fly) or not to use a database at all. <-- Make sure to read this!
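
As a hedged illustration of "precalculate and maintain join tables instead of performing joins on the fly" (the table and parameter names here are made up, not taken from the answer or the article): fan each tweet out into a per-reader timeline table at write time, so the read path becomes a single indexed lookup with no join.

    -- Sketch only: denormalized, precomputed home timeline (fan-out on write).
    CREATE TABLE Timeline (
        ReaderID BIGINT NOT NULL,  -- the user whose home timeline this row belongs to
        TweetID  BIGINT NOT NULL,
        CONSTRAINT PK_Timeline PRIMARY KEY (ReaderID, TweetID)
    );

    -- When @authorID posts tweet @tweetID, copy one row per follower:
    INSERT INTO Timeline (ReaderID, TweetID)
    SELECT FollowerID, @tweetID        -- everyone following @authorID, per the Followers schema above
    FROM Followers
    WHERE MasterID = @authorID;

    -- Reading the timeline is now a plain indexed scan, no join:
    SELECT TOP 100 TweetID
    FROM Timeline
    WHERE ReaderID = @userID
    ORDER BY TweetID DESC;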

D.Shawley
Interesting link but I'm -1ing this since it's *way too soon* to start worrying about denormalising for scalability reasons -- the OP doesn't even know what a normalised DB is! Do you really think performance is top priority here? Please put a disclaimer at the top saying "When your site has 1 million active users, you should maybe think about considering the following."
j_random_hacker
The OP did ask how an application like Twitter is implemented. I agree that it is overkill in this case. We have run into scaling problems with considerably fewer than one million users - 10,000 or so client requests per second can cause problems if your processing requires more than just fetching a few rows and returning them.
D.Shawley
Sure, 1 million users was just a wild guess. If you put your more realistic estimate in a disclaimer at the top I'll happily +1.
j_random_hacker