Does anyone have any information on the architecture of Twitter?
A few specific items I'm especially interested in:
I know that they use message queues. But what exactly do they use queues for?
Do they "duplicate" tweets? If so, how? For example, say a user has 10,000 followers and posts the tweet "hello world". Does Twitter store "hello world" only once, so that each of those 10,000 followers reads the tweet from the same database table, or does each follower have their own "tweets I'm following" data, so that "hello world" is duplicated 10,000 times, once per follower?
Somewhat related to the point above: how do they shard their data? By tweet sender, by follower, by tweet ID, by tweet datetime, or something else?
Do you know what technologies they use? I read about MySQL, RoR, Starling, Scala, and memcached, but that was a while ago and the information wasn't very detailed. Any updated info or more details?
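
To make the duplication question concrete, here is a minimal Python sketch of the two options I have in mind. All class and method names are made up for illustration, and the in-memory dicts only stand in for database tables. I am not claiming Twitter works either way; this just shows the trade-off I'm asking about.

```python
from collections import defaultdict


class StoreOnce:
    """Option A (hypothetical): each tweet is stored once, under its author.
    A reader's timeline is assembled at read time from everyone they follow."""

    def __init__(self):
        self.tweets_by_author = defaultdict(list)  # author -> [(seq, text)]
        self.following = defaultdict(set)          # reader -> {authors}
        self.seq = 0

    def follow(self, reader, author):
        self.following[reader].add(author)

    def post(self, author, text):
        self.seq += 1
        self.tweets_by_author[author].append((self.seq, text))
        return 1  # number of rows written

    def timeline(self, reader):
        rows = [
            (seq, author, text)
            for author in self.following[reader]
            for seq, text in self.tweets_by_author[author]
        ]
        # Newest first.
        return [(author, text) for _, author, text in sorted(rows, reverse=True)]


class CopyPerFollower:
    """Option B (hypothetical): each tweet is copied into every follower's
    own "tweets I'm following" list at write time."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> {readers}
        self.inbox = defaultdict(list)     # reader -> [(seq, author, text)]
        self.seq = 0

    def follow(self, reader, author):
        self.followers[author].add(reader)

    def post(self, author, text):
        self.seq += 1
        for reader in self.followers[author]:
            self.inbox[reader].append((self.seq, author, text))
        return len(self.followers[author])  # number of rows written

    def timeline(self, reader):
        # Newest first; a single lookup keyed by reader.
        return [(author, text) for _, author, text in sorted(self.inbox[reader], reverse=True)]
```

With 10,000 followers, `post` in option A writes one row and each timeline read touches every followed author, while `post` in option B writes 10,000 rows and each timeline read is a single lookup by reader. That also ties into the sharding question: option A seems to suggest sharding by sender, option B by follower.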