views:

76

answers:

5

Hi guys. Just after some opinions on the best way to achieve the following outcome:

I would like to store in my MySQL database products which can be voted on by users (each vote is worth +1). I also want to be able to see how many times in total a user has voted.

To my simple mind, the following table structure would be ideal:

  table: product          table: user            table: user_product_vote       
+----+-------------+    +----+-------------+    +----+------------+---------+ 
| id |   product   |    | id | username    |    | id | product_id | user_id |
+----+-------------+    +----+-------------+    +----+------------+---------+
| 1  | bananas     |    | 1  | matthew     |    | 1  | 1          | 2       |
| 2  | apples      |    | 2  | mark        |    | 2  | 2          | 2       |
| .. | ..          |    | .. | ..          |    | .. | ..         | ..      |

This way I can do a COUNT of the user_product_vote table for each product or user.

For example, when I want to look up bananas and the number of votes to show on a web page I could perform the following query:

SELECT p.product AS product, COUNT( v.id ) as votes
FROM product p
LEFT JOIN user_product_vote v ON p.id = v.product_id
WHERE p.id =1

If my site became hugely successful (we can all dream) and I had thousands of users voting on thousands of products, I fear that performing such a COUNT with every page view would be highly inefficient in terms of server resources.

A more simple approach would be to have a 'votes' column in the product table that is incremented each time a vote is added.

  table: product               
+----+-------------+-------+
| id |   product   | votes |
+----+-------------+-------+
| 1  | bananas     | 2     |
| 2  | apples      | 5     |
| .. | ..          | ..    |

While this is more resource friendly - I lose data (eg. I can no longer prevent a person from voting twice as there is no record of their voting activity).

My questions are:
i) am I being overly worried about server resources and should just stick with the three table option? (ie. do I need to have more faith in the ability of the database to handle large queries)
ii) is their a more efficient way of achieving the outcome without losing information

A: 

You have to balance the desire for your site to perform quickly (in which the second schema would be best) and the ability to count votes for specific users and prevent double voting (for which I would choose the first schema). Because you are only using integer columns for the user_product_vote table, I don't see how performance could suffer too much. Many-to-many relationships are common, as you have implemented with user_product_vote. If you do want to count votes for specific users and prevent double voting, a user_product_vote is the only clean way I can think of implementing it, as any other could result in sparse records, duplicate records, and all kinds of bad things.

SimpleCoder
+4  A: 

you can never be over worried about resources, when you first start building an application you should always have resources,space,speed etc in mind, if your site's traffic grew dramatically and you never built for resources then you start getting into problems.

As for the vote system, personally i would keep the votes like so

table: product          table: user             table: user_product_vote       
+----+-------------+    +----+-------------+    +----+------------+---------+ 
| id |   product   |    | id | username    |    | id | product_id | user_id |
+----+-------------+    +----+-------------+    +----+------------+---------+
| 1  | bananas     |    | 1  | matthew     |    | 1  | 1          | 2       |
| 2  | apples      |    | 2  | mark        |    | 2  | 2          | 2       |
| .. | ..          |    | .. | ..          |    | .. | ..         | ..      |

Reasons: Firstly user_product_vote does not contain text,blobs etc its purly interger so it takes up less resources anyways secondly, you have more of a doorway to new entities within your application such as Total votes last 24 hr, Highest rated product over the past 24 hour etc...

take this example for instance

table: user_product_vote       
+----+------------+---------------------------------+ 
| id | product_id | user_id   vote_type       time  |
+----+------------+---------------------------------+
| 1  | 1          | 2       |    product  |  224..  |
| 2  | 2          | 2       |    page     |  218..  |
| .. | ..         | ..      |..           |..       |

And a simple query

SELECT COUNT(id) as total FROM user_product_vote WHERE vote_type = 'product' AND time BETWEEN(....) ORDER BY time DESC LIMIT 20

another thing is if a user voted at 1AM and then tried to vote again at 2PM, you can easily check when the last time (s)he voted and if (s)he should be allowed to vote again..

theres so many opportunities that you will be missing if you stuck with your incremental example.


in regards to your count() no matter how much you optimize your quires it would not really make a difference on a large scale;

with an extremely large user-base your resource usage will be looked at from a different prospective such as load balancers, mainly server settings, apache, catching etc, theres only so much you can do with your quires.

RobertPitt
+1  A: 

Why not mix and match both? Simply have the final counts in the product and users tables, so that you don't have to count every time and have the votes table , so that there is no double posting.

Edit: To explain it a bit further, product and user table will have a column called "votes". Every time the insert is successfull in user_product_vote, increment the relevant user and product records. This would avoid dupe votes and you wont have to run the complex count query every time as well.

Edit: Also i am assuming that you have created a unique index on product_id and user_id, in this case any duplication attempt will automatically fail and you wont have to check in the table before inserting. You will just to make sure the insert query ran and you got a valid value for the "id" in the form on insert_id

Sabeen Malik
A: 

You don't want to update the product table directly with an aggregate every time someone votes - this will lock product rows which will then affect other queries which are using products.

Assuming that not all product queries need to include the votes column, you could keep a separate productvotes table which would retain the running totals, and keep your userproductvote table as a means to enforce your user voting per product business rules / and auditing.

nonnb
A: 

If my site became hugely successful (we can all dream) and I had thousands of users voting on thousands of products, I fear that performing such a COUNT with every page view would be highly inefficient in terms of server resources.

Don't waste your time solving imaginary problems. mysql is perfectly able to process thousands of records in fractions of a second - this is what databases are for. Clean and simple database and code structure is far more important than the mythical "optimization" that no one needs.

stereofrog