views:

481

answers:

2

If it matters as of now I'm using MySQL/MyISAM but I'm open to using PostgreSQL. I'm also open to using memcached.

Consider a table that's used to store forum threads:

id forum_name post_date
1  Hey!       2009-01-01 12:00:00
  • What's the best practice of storing thread-related entities such as votes, views, and counters?

Should I..

Create a separate table such as

id thread_id views
1  1         532

Or keep it as a column in the initial threads table?

id forum_name post_date              views
1  Hey!       2009-01-01 12:00:00    532

Another related tidbit, what's the practical solution for displaying and incrementing page views? I read this thread and it seems like I could just cache a value for a certain time, I wasn't exactly clear on the incrementing part - perhaps something like storing the values in flat files somewhere, then periodically with cronjobs update the database forum views every hour or so?

EDIT: To clarify, voting IS like SO with one vote per thread and there can be reversals. Nevermind what I meant about counters.

+3  A: 

Voting

I suggest defining two tables in addition to the thread table - VOTE_CODES and THREAD_VOTES. At a glance, it might appear over normalized but the format will allow you to change vote value without needing major DML changes.

VOTE_CODES table

  • vote_code, primary key, IE: up, down
  • vote_cast_value --value attributed to up/down vote
  • vote_caster_value --optional, if you want to keep SO's style of the negative vote impacting the caster.

THREAD_VOTES table

  • thread_id
  • user_id
  • vote_code

All of the columns in THREAD_VOTES are the primary key - that will ensure that there can only be as many rows for a given user and thread as there are vote codes. Assuming only two codes, this would support the ability to reverse a vote because there could only be two records - one with either code.

Views

I would suggest storing the:

  • thread id
  • ip address
  • user_agent --rough browser capture
  • timestamp

All of the above are the primary key. Your table will populate fast, but it will give you the ability to create a computed column in a view for more accurate reporting.

OMG Ponies
Nevermind the counters bit, I updated the original post so yes it'll be pretty much the same as SO.
meder
I made the thread_votes and thread_vote_codes table. However, if I try to insert a duplicate entry for shoop_votes I obviously get the warning saying "Duplicate entry '1-2-2' for key 1 " - should my server-side code act upon that in a try..catch or should I add an `ON DUPLICATE` option to handle that?
meder
@meder: You need to trap the exception in TRY/CATCH. The primary key for `THREAD_VOTES` should be all three columns - `thread_id`, `user_id`, `vote_code`. If you get a duplicate entry, it's because someone is attempting to upvote (or downvote) more than once (one of each is supported, in order to reverse).
OMG Ponies
What if say, I initially vote up, and since each type is supported, I unclick the upvote ( 0 ), then JOIN all my votes and I get `0` and `1` and the sum of that would be `1` - shouldn't I check if the user_id has already voted on the same thread instead of storing multiple times?
meder
@meder: The design does not allow more than one vote of a given type. That's why you get the duplicate entry error, and it is the correct response from the database.
OMG Ponies
OMG Ponies
+1  A: 

flat files are obviously a bad idea, because you'd need to implement locking (the db already does that, and there's fewer bugs in that code).

relational database design is more of an art rather than a science: you can have

CREATE TABLE threads (
  tid THREADID
, title THREADTITLE
, views COUNTER
, PRIMARY KEY (tid)
);

and it'll be no more and no less "correct" than

CREATE TABLE threads (
  tid THREADID
, title THREADTITLE
, PRIMARY KEY (tid)
);

CREATE TABLE views (
  tid THREADID
, views COUNTER
, PRIMARY KEY (tid)
, FOREIGN KEY (tid)
  REFERENCES threads
);

so it's really up to you.

i'd say: go with the simplest thing first, make it more complicated if you find it's necessary (e. g. for performance reasons). IOW: put the views COUNTER attribute in threads. if it turns out the trafic is hurting performance (too many updates on the threads.views attribute mean the dbms must shuffle around otherwise immutable data in the other attributes), you can always split the table up in two, and replace it with a view that joins them. voila, immutable (or rarely changing) data separated from the volatile data, interface remains the same.

of course, go with PostgreSQL. the above shown code is valid in that dbms, just add these:

CREATE DOMAIN threadid
AS INT NOT NULL;

CREATE DOMAIN threadtitle
AS TEXT NOT NULL
CHECK (LENGTH(VALUE) > 0);

CREATE DOMAIN counter
AS INT NOT NULL
CHECK (VALUE > 0);

edit to refute the comment by OMG Ponies: of course it's safe.

UPDATE threads SET
  views = views + 1
WHERE tid = X

either succeeds or bails out.

edit 2 to add consideration for the voting aspect

let's say the spec is: a user may vote a thread up (+1) or down (-1), the sum of his or her votes on a given thread may not exceed |1|, and the history is irrelevant. iow a user may vote a thread up, then down to reset their vote to "no vote", then down again to "voted down", etc.

CREATE DOMAIN vote
AS INT NOT NULL
CHECK (VALUE BETWEEN -1 AND 1);

CREATE TABLE votes (
  tid THREADID
, uid USERID
, vote VOTE
, PRIMARY KEY (tid, uid)
);

in MySQL, you could

INSERT INTO votes (
  tid
, uid
, vote
) VALUES (
  X
, Y
, Z -- +1 or -1
)
ON DUPLICATE KEY UPDATE
vote = vote + Z

alas, PostgreSQL doesn't (yet) have such functionality built in, so you'd need to use the idiomatic user-level implementation

just somebody
Thank you for that long and comprehensive answer, I'm going to try both your and OMGPonie's suggestions and see how it goes.
meder
So every unique vote would go into the `votes` table if I decide to it separate from the `threads` table.. that means on every page view I would need to SELECT the total number of rows pertaining to `thread_id` from the `votes` table, add them up? And sorry if I missed this, but should I be updating the view on every page view, or do something like use the # of replies and ajax post to the db so it only updates one table at a time?
meder
Use a precomputed aggregate if you find the SUM() on every page view too slow. I couldn't grok your closing question, sorry.
just somebody
Would you mind posting the MySQL schema for the votes table?
meder
just replace all the DOMAIN names with `INT NOT NULL`.
just somebody