views:

174

answers:

4

If we look at the stackoverflow website we have votes. But the question is what is the bestway to store who has voted and who has not. Lets also simplify this even more and say that we can only vote Up, and we can only Remove the Up vote.

I was thinking having the table to be in such form

question - Id(INT) | userId(INT) | title(TEXT) | vote(INT) | ratedBy(TEXT)

Thre rest is self explanitory but ratedBy is a Comma Seperated Id values of the Users.

I was thinking to read the ratedBy and compare it with the userId of the current logged in User. If he dosent exist in the ratedBy he can vote Up, otherwise he can remove his vote. Which in turn will remove the value from ratedBy

+5  A: 

I think to make another table "vote" is better. The relationship between users and votes is n to n, therefore a new table should be created. It should be something like this:

question id (int) | user id (int) | permanent (bool) | timestamp (datetime)

Permanent field can be used to make votes stay after a given time, as SO does.
Other fields may be added according to desired features. As each row will take at least 16B, you can have up to 250M rows in the table before the table uses 4GB (fat32 limit if there is one archive per table, which is the case for MyISAM and InnoDB).
Also, as Matthew Scharley points out in a comment, don't load all votes at once into memory (as fetching all the table in a resultset). You can always use LIMIT clause to narrow your query results.

slipbull
But this leaves the question, thats a very large table over time
Shahmir Javaid
And? I've worked with multi-million record (logging) tables before. As long as your table is *well-designed* (reads: indexed, unlike that logging table was at first... Uggh.) it won't be (too) slow.
Matthew Scharley
cool, just confirming. Il index the userId and questionId, and make them both Primary Keys. This way i think i can avoid the check :D
Shahmir Javaid
The other potential pitfall is, don't try to fetch the whole table at once! Loading millions of rows into memory will probably kill something in the chain somewhere...
Matthew Scharley
A: 

I'd rather create a table votes (id, questionid, userid), with a 1:n relation to the questions table. That way you can easily add and remove votes by userid, without having to do lots of text-processing on the ratedBy field.

x3ro
woudnt text processing be faster when the table size of the votes is increased massivley
Shahmir Javaid
Yes. Processing the text field is infinitely slower comparitively to doing an index lookup (except *maybe* in the case of you don't care who voted what, you just want a count)
Matthew Scharley
A: 
Evangelos
Would you say that creating two tables to link the votes will always be faster than text prcessing?
Shahmir Javaid
My experience so far, primarily with mysql and oracle, is that databases are better at handling tables and joins. In addition, you can add indexes which will allow you to both have faster aggregation (get the sum of the votes) and check weather a user has already voted for a certain question. The database checks the index and goes directly to the appropriate row or rows without taking into account the rest of the data. However, you should keep in mind that in order to see any differences between the two approaches you need large numbers of votes per question ( a few thousands and more).
Evangelos
+2  A: 

A new table:

Article ID | User ID | Rating

Where Article ID and User ID make up the composite key, and rating would be 1, indicating upvote, -1 for a downvote and 0 for a removed vote (or just remove the row).

DanDan