views:

268

answers:

5

[I'm not asking about the architecture of SO, but it would be helpful to the question.]

On SO, when a user clicks on his/her name and clicks on "responses" they see other users responses to comment threads, questions, and answers in which they have participated. I've had the sneaking suspicion that I've missed certain responses out there, which made me wonder: if you had to build that thing, would you pull everything dynamically from the database every time a user requested it? Or would you modify it when there is new related activity in the application? Or would you build it in a nightly daemon process?

I imagine that the real answer is that it's dynamically constructed every time, but that the tables are denormalized in such a way so as to make the thing less time-consuming. How would you build it?

I'm asking about any platform, of course, not only on .Net.

+5  A: 

I would pull it dynamically from the database every time. I think this gives you the best result from a user experience and then I would apply the principal that premature optimization is evil. Later if there were performance issues I would look into caching.

I think doing it as a daemon/push process would actually result in more overall work being done. That is the updates would happen more frequently than the users are requesting the info.

Brian Fisher
You could do the push only for users that have had activity on their account in the last X minutes.
Yar
Good point. I would just try the simplest way first as I would guess the performance would be adequate.
Brian Fisher
Yeah, I'm just beginning to question all the anti-optimization rhetoric. I think that if you want the responses tab to conain 100% of the responses all the time (it doesn't), you'd have to plan for that complexity in advance... async updates through a message queue are fun, perhaps...
Yar
A database query unioned to a "Pending Transactions" query of the queue could work, if the queue has the proper fields and indexes. The queue would be 100% in-memory with any reasonable design.
le dorfier
I agree, I think trying to design a scalable architecture is not premature optimization.
DrJokepu
+1  A: 

Obviously, when an answer or comment is posted, you'll want to identify the user that should be informed in their responses tab. Then just add a row to a responses table containing the response text, timestamp, and the user to which it belongs. That way you can dynamically generate the tab with a simple

select * from responses where user=<userid> order by time desc limit 30

or something like that.

p.s. Extra credit to anyone that can write a query that will remove old responses - assume that each person should have the last 30 responses in their responses tab.

Kyle Cronin
But you can delete comments, for instance, and eliminate yourself from a thread. I should try that and see if I still get informed.
Yar
Deleted comments do, or at least used to, appear in the responses tab.
Kyle Cronin
Neither of your two comments appeared in my responses tab, so I would really like to know how it works on SO itself.
Yar
In the responses tab you'll find answers to your questions and comments on your posts. You won't find responses to comments; for example, you won't find this comment in your responses thread, as this is really a response to my answer, and would appear in my responses tab were it not posted by me.
Kyle Cronin
Aha! Never noticed that. Thanks, Nobody.
Yar
+1  A: 

I expect that userid would be a natural option for the clustered index. If you have an "Active" boolean field then you don't need to worry much about locks; the table could be write-only except to update the (unindexed) Active column. I bet it already works that way, since it appears that everything is recoverable.

Don't need no stinking extra-credit response remover.

le dorfier
Hahaha re: the extra-credit response... I have to admit that I'm not interested in the actual syntax of the queries, but rather just the general structure.
Yar
What's the table? A RESPONSES tables like @nobody recommends? Or can you do it without mirroring the UI in the Database?
Yar
+1  A: 

I would assume this is denormalized in the database. The Comment table probably has both and answer_id and an answer_uid so the SQL to find comments on you answers just run against the comment table. The same setup would work on the Answer table. Each answer has a question_id and a question_uid.

Having said that, these are probably the same table and you have response_to_id and response_to_uid and that makes lots of code simpler and makes the "recent" tab a single select as well. In fact the difference between the two selects is one uses the uid and one uses the response_to_uid.

jmucchiello
+1  A: 

I'd say that your UI and your database should both be driven by your Application Domain; so they will reflect each other based on their common provenance there.

Some quick notes to illustrate, using simplified Object Role Modeling as discussed by Fowler et al.

Entities

Users
Questions
Answers
Comments

Entity Roles
(Note: In Object Role Modeling, most Roles are reflexive. Some, e.g. booleans here, are monopolar)

Question has User
Question has QuestionVersions
Question as Answers
Question has Comments

Answer has AnswerVersions
Answer has Comments

Question has User
QuestionVersion has Text
QuestionVersion has Timestamp
QuestionVersion has IsDeleted (could be inferred from nonNULL timestamp eg)
QuestionVersion has DeltedByUser
QuestionVersion has DeletedTimestamp

Answer has User
AnswerVersion has Text
AnswerVersion has Timestamp
AnswerVersion has IsDeleted
AnswerVersion has DeltedByUser
AnswerVersion has DeletedTimestamp

Comment has Text
Comment has User
Comment has Timestamp
Comment IsDeleted (boolean)

(note - no versions on comments)

I think that's the basics. These assertions drive ERDs in ORM. Hopefully it's self-evident how they drive the User Stories as well.

I don't think an implementation of a normalized design like this would require denormalization - especially since I think it's clear (from behavior) that queries => UI displays are cached to be refreshed 1X per minute.

le dorfier
You mean cached "naturally" by the Web Server, DB Server, and all other entities in the system?
Yar
No, it looks to me like at least the main question list is explicitly cached and refreshed 1X per minute. Inferred from behavior - if you try to force a refresh, you'll only get anything new about that often.
le dorfier