views:

309

answers:

5

I just saw the hilarious "The Rise and Fall of Twitter" which made me think:

if you reimplemented twitter, what would you do differently?

What technologies would you use? What languages?

How do you ensure that the service is scalable?

What else would you change?

+2  A: 

It's already being done: Laconica

EBGreen
A: 

I would design it to be scalable as hell right from the beginning.

My choice would be the Microsoft platform: C#, IIS, SQL Server, and Memcached (or Velocity, if it's final and runs well by the time I start ;-)

JRoppert
The reason Twitter had a hard time scaling was their use of SQL. Using SQL means you will need to shard or stripe your database in order to scale. This doesn't work too well for Twitter's use case; plus, if you use SQL Server, you'll have to pay for a new license on each machine.
0124816
You're right, the problem was their use of SQL, not SQL itself. And what's wrong with paying money for something that helps you run your business? Do you think running an application like Twitter is not possible on the MS platform? It is, definitely.
JRoppert
+1  A: 
  1. It's already being done Part II - The Revenge: identi.ca (which is built on top of Laconica)
  2. It's already being done Part III - From the Dark Side: yammer

VBG! (-:

Rob Wells
A: 

I'm going to start from the premise of going back to do it over again: what would I do differently, were I at Twitter back then?

Not a thing.

Twitter maintained focus on what matters: providing a service which people actually want to use.

I would love to work on a product which became so popular in such a short period of time that its biggest threat became its own scalability. That means you've won. With success come the resources and attention to capitalize on it.

DGentry
Scalability may be a "high class" problem, but it's still a problem. It's debatable, but many argue that Friendster ceded its place as the #1 social network due, at least partially, to its inability to scale.
sanity
+3  A: 

I would have implemented it on GAE, like so:

Each user will have a table containing the tweets of the people they follow. This table will be keyed by (user, timestamp descending).
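
A minimal sketch of that per-follower table, assuming the Python runtime and the google.appengine.ext.db API; the kind name TimelineTweet and its properties are names I'm making up for illustration, not anything the answer specifies:

    from google.appengine.ext import db

    class TimelineTweet(db.Model):
        # One denormalized row per (follower, tweet); user_id is the
        # follower who owns this copy of the tweet, not its author.
        user_id = db.IntegerProperty(required=True)
        author_id = db.IntegerProperty(required=True)
        tweet_id = db.IntegerProperty(required=True)
        text = db.TextProperty()
        created_at = db.DateTimeProperty(auto_now_add=True)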

Each user also has a follower_ranges table, which maps a user to a set of contiguous follower id ranges. For most users, who have only a few thousand followers, this table will have a single entry (-inf..+inf); this will be the implied default. For users with more followers, each range in the table will have a few thousand users. The ranges will be balanced over time to keep the number of users in each within some interval, e.g. larger than 1000 and smaller than 10000. The union of all ranges will include all user ids.
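
The follower_ranges table could be sketched the same way (again with hypothetical names; None stands in for -inf/+inf):

    from google.appengine.ext import db

    class FollowerRange(db.Model):
        # Maps a followee to one contiguous range of follower ids.
        followee_id = db.IntegerProperty(required=True)
        range_start = db.IntegerProperty()  # None means -inf
        range_end = db.IntegerProperty()    # None means +inf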

Whenever a user -> follower operation is created, it is encoded as an action and added to a queue. Each element in the queue is a (sender, action, payload, follower subrange) tuple. Queue workers take an item, find all the followers in the given subrange, and apply the action to each of them. (Note that the action can be "add a tweet," "delete a tweet," "edit a tweet," etc. Basically anything that will need to be applied to all followers.)
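
The enqueue side might look roughly like this, assuming GAE's Task Queue API and the FollowerRange entity sketched above (the /queue/fanout URL and the parameter names are invented for illustration):

    from google.appengine.api import taskqueue

    def enqueue_fanout(sender_id, action, tweet_id, text):
        # One queue item per (sender, action, payload, follower subrange).
        for r in FollowerRange.all().filter('followee_id =', sender_id):
            taskqueue.add(url='/queue/fanout', params={
                'sender_id': sender_id,
                'action': action,  # 'add', 'delete', 'edit', ...
                'tweet_id': tweet_id,
                'text': text,
                'range_start': '' if r.range_start is None else r.range_start,
                'range_end': '' if r.range_end is None else r.range_end,
            })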

Applying the queue action to each follower will involve issuing the corresponding writes and deletes to each user's tweet table. Going through the queue means that writes will not appear instantaneously, but it should be possible to keep the delay below a few seconds.
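
The worker side might look something like this. A Follow entity mapping followees to follower ids is implied by the design but never named, so I'm inventing it here, and only the 'add' action is shown:

    from google.appengine.ext import db

    class Follow(db.Model):
        follower_id = db.IntegerProperty(required=True)
        followee_id = db.IntegerProperty(required=True)

    def apply_fanout(sender_id, action, tweet_id, text, range_start, range_end):
        # Find the followers in the given subrange...
        q = Follow.all().filter('followee_id =', sender_id)
        if range_start is not None:
            q.filter('follower_id >=', range_start)
        if range_end is not None:
            q.filter('follower_id <', range_end)
        # ...and apply the action to each follower's tweet table.
        for follow in q:
            if action == 'add':
                TimelineTweet(user_id=follow.follower_id,
                              author_id=sender_id,
                              tweet_id=tweet_id,
                              text=text).put()
            # 'delete' / 'edit' would delete or rewrite the follower's copy
            # in the same loop.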

Showing the user their tweets will be a cheap operation: "SELECT * FROM tweets WHERE user_id = :user_id ORDER BY (created_at DESC) LIMIT :max_per_page". This will scan a single table, and be a very fast operation. (Keeping user-blocking latency down is good!)
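
In GQL against the TimelineTweet sketch above, that read might be (a composite index on (user_id, created_at DESC) is needed):

    from google.appengine.ext import db

    def recent_tweets(user_id, max_per_page=20):
        return db.GqlQuery(
            "SELECT * FROM TimelineTweet "
            "WHERE user_id = :1 ORDER BY created_at DESC",
            user_id).fetch(max_per_page)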

I think this design would scale rather well initially. Each component of the system can now be scaled up easily:

  • The queue storage can be backed by GAE, and scaled as per any Datastore table
  • The frontends can be scaled naturally, and there is no need for stickiness
  • More queue processors can be added at any time
  • The actual storage tables will grow naturally, and should scale fine on Datastore.

That said, I can think of a couple of future improvements I would look into immediately:

  • Reduce storage of rarely-shown data. This design denormalizes each tweet into a per-follower copy. However, only the most recent tweets are usually accessed. By deleting the per-follower copy of tweets after they are N days old, we can recover a lot of storage. If a user tries to view something from ancient history, we fetch the data from the normalized tables instead. This will be slower, but will not happen too often, and the savings will be significant. Storage savings: (#avg_followers - 1) / #avg_followers
  • The write pattern is non-optimal. Across multiple queue items, each queue worker will be writing to every user's tweets table, so the locality of writes will not be very good. (Worst case, we'll have #processors * #storage servers connections.) This can be fixed by applying multiple updates to each range of users: for example, if two actions A and B are both to be applied to range [0, 10000), have a single queue processor apply both at once (see the sketch below).
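
A sketch of that batching idea, reusing the TimelineTweet entity from above (add-only, and follower_ids is assumed to already hold the ids in one subrange; both are simplifications of mine):

    from google.appengine.ext import db

    def apply_batch(add_actions, follower_ids):
        # add_actions: [(sender_id, tweet_id, text), ...], all destined for
        # the same follower subrange.
        entities = []
        for follower_id in follower_ids:
            for sender_id, tweet_id, text in add_actions:
                entities.append(TimelineTweet(user_id=follower_id,
                                              author_id=sender_id,
                                              tweet_id=tweet_id,
                                              text=text))
        # One batched write per range instead of one put per (action, follower).
        db.put(entities)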
0124816