Does anyone have any information on the architecture of Twitter?

A few specific items I'm especially interested in:

  • I know that they use message queues. But what exactly do they use queues for?

  • Do they "duplicate" tweets? If so, how? For example, say a user has 10,000 followers and tweets "hello world". Does Twitter store "hello world" only once, so that each of those 10,000 followers reads that tweet from the same database table? Or does each follower have their own "tweets I'm following" data, with "hello world" duplicated 10,000 times, once for each follower?

  • Somewhat related to the point above: how do they shard their data? By tweet sender, by tweet follower, by tweet ID, by tweet datetime, or something else?

  • Do you know what technologies they use? I have read about MySQL, RoR, Starling, Scala, and memcached, but that was a while ago and the information wasn't very detailed. Any updated info or more details?
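
To make the "duplicate or not" question concrete, here is a rough sketch of the two options I have in mind. It is only an illustration of the question, not a claim about what Twitter actually does; the class names (FanOutOnRead, FanOutOnWrite) and the in-memory dicts standing in for database tables are my own assumptions.

```python
from collections import defaultdict


class FanOutOnRead:
    """Option 1 (hypothetical): store each tweet once, assemble timelines when read."""

    def __init__(self):
        self.tweets_by_author = defaultdict(list)  # author -> [(seq, author, text)]
        self.following = defaultdict(set)          # reader -> {authors}
        self._seq = 0                              # stand-in for a tweet ID / timestamp

    def follow(self, reader, author):
        self.following[reader].add(author)

    def post(self, author, text):
        # One write, no matter how many followers the author has.
        self._seq += 1
        self.tweets_by_author[author].append((self._seq, author, text))

    def timeline(self, reader):
        # The read does the work: gather tweets from every followed author.
        rows = [t for a in self.following[reader] for t in self.tweets_by_author[a]]
        return [(author, text) for _, author, text in sorted(rows, reverse=True)]


class FanOutOnWrite:
    """Option 2 (hypothetical): copy each tweet into every follower's own list."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> {readers}
        self.inbox = defaultdict(list)     # reader -> [(author, text)], oldest first

    def follow(self, reader, author):
        self.followers[author].add(reader)

    def post(self, author, text):
        # One write per follower: 10,000 followers means 10,000 copies.
        for reader in self.followers[author]:
            self.inbox[reader].append((author, text))

    def timeline(self, reader):
        # The read is cheap: the per-reader list already exists.
        return list(reversed(self.inbox[reader]))
```

Both classes return the same timeline for the same follows and posts; the difference is whether the cost lands on post() or on timeline(), which is what I would like to understand for a service of Twitter's size.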

+3  A: 

You can check out the code of identi.ca, a service similar to Twitter that is built on open source tools and open standards. Hope this helps.

JeremySpouken
+1  A: 

Not sure if this will help. Twitter has open-sourced a lot of the code used on the service; it is listed on the Twitter Open Source page, which might give you a hint or two about what is being done.

Also, Twitter has an Engineering Blog where they post about the technologies used at Twitter.

dkris