I am trying to reverse engineer Twitter's live search, and maybe we could discuss it here. I mean the feature where tweets show up in search results almost immediately, some marked as recent as "1 sec ago". I am trying to understand how the following might work:

  1. There must be some layer between the moment a user tweets and the moment the index is updated. Is this layer MySQL or some other store or caching layer (memcached, Cassandra)? Maybe...
  2. Indexing - How might the index updates happen? Surely they can't rebuild the index from scratch for every tweet?
  3. Indexing - The index must be distributed. How do you update all the indexes without serving stale data from one index and fresh data from another?
  4. Indexing - Or does it even matter if that happens? Honestly, I don't think so :) Which user would notice...
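To make point 2 concrete, here is a minimal sketch of an inverted index that is updated one tweet at a time instead of being rebuilt. It is only a toy: the class name `IncrementalIndex`, the whitespace tokenizer, and the in-memory dict are all my assumptions for illustration, and nothing here is known to match what Twitter actually runs.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy in-memory inverted index, updated one document at a time.

    Hypothetical illustration only; not Twitter's actual design.
    """

    def __init__(self):
        # term -> list of tweet ids in insertion order (oldest first)
        self.postings = defaultdict(list)

    def add(self, tweet_id, text):
        # Index only the new tweet; existing postings are left untouched,
        # so the cost depends on the tweet length, not the index size.
        for term in set(text.lower().split()):
            self.postings[term].append(tweet_id)

    def search(self, term, limit=10):
        # The most recently added ids sit at the end of each list,
        # so reversing gives newest-first results.
        return self.postings.get(term.lower(), [])[::-1][:limit]


index = IncrementalIndex()
index.add(1, "hello world")
index.add(2, "Hello again")
print(index.search("hello"))  # [2, 1]
```

The point is just that adding a tweet appends to a few posting lists, so a fresh tweet can become searchable without touching the rest of the index.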

Does anybody have anything interesting to add or discuss? I am just trying to understand...

A: 

Interesting indeed, but I guess it's more of an "architecture" question, and not really a programming question.

But FYI, there's a lot of information on High Scalability: see the posts tagged with twitter.

Do they keep all tweets? My guess is they just throw them away after a while, and surely they don't need ACID properties?

And I wouldn't trust those timestamps if I were you :)

Øyvind Skaar
Precisely! The timestamps Twitter shows (and SO, for that matter) may not be totally precise. But they at least have to synchronize things so that everyone sees a tweet at the same time. Maybe there is a switch of indexes happening (i.e. from an old index to a fresh one)? Or maybe they use some large index + small index combo?
MovieYoda
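A rough sketch of what a "large index + small index combo" could look like, purely as a guess: new tweets go into a small delta index, queries consult both tiers, and the delta is folded into the main index now and then. The names `TwoTierIndex`, `main`, `delta`, and `merge` are hypothetical, and the code assumes tweet ids only ever increase.

```python
class TwoTierIndex:
    """Toy main + delta inverted index (hypothetical, for discussion only)."""

    def __init__(self):
        self.main = {}   # term -> list of ids; large tier, changed only by merge()
        self.delta = {}  # term -> list of ids; small tier holding recent tweets

    def add(self, tweet_id, text):
        # Writes touch only the small delta tier.
        for term in set(text.lower().split()):
            self.delta.setdefault(term, []).append(tweet_id)

    def search(self, term):
        term = term.lower()
        recent = self.delta.get(term, [])
        older = self.main.get(term, [])
        # Assumption: ids increase over time, so every delta id is newer
        # than every main id and the two lists can simply be concatenated.
        return recent[::-1] + older[::-1]

    def merge(self):
        # Fold the delta into the main tier, then start a fresh delta.
        for term, ids in self.delta.items():
            self.main.setdefault(term, []).extend(ids)
        self.delta = {}


two_tier = TwoTierIndex()
two_tier.add(1, "live search")
two_tier.add(2, "search again")
two_tier.merge()
two_tier.add(3, "more search")
print(two_tier.search("search"))  # [3, 2, 1]
```

In a real system the merge would presumably run in the background and swap in the new main index when done, which is more or less the "switch of indexes" idea.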
Don't know what they do index-wise, but I doubt it's one big master index. Why does everyone have to see the same thing all the time? It's probably all "eventually consistent": the fact that you see something doesn't mean others do. E.g. if you post something you might see it right away, but I might not see it for a short while.
Øyvind Skaar