tags:

views:

124

answers:

4

I am using SQLite and will port to MySQL (5) later.

I wanted to know if I am doing something I shouldn't be doing. I tried purposely to design so I'll compare to 0 instead of 1 (I changed hasApproved to NotApproved to do this, not a big deal and I haven't written any code). I was told I never need to write a subquery but I do here. My Votes table is just id, ip, postid (I don't think I can write that subquery as a join instead?) and that's pretty much all that is on my mind.

Naming conventions I don't really care about since the tables are created via reflection and is all over the place.

select 
    id, 
    name, 
    body, 
    upvotes, 
    downvotes, 
    (select 1 from UpVotes where IPAddr=? AND post=Post.id) as myup, 
    (select 1 from DownVotes where IPAddr=@0 AND post=Post.id) as mydown
from Post 
where 
    flag = '0'
limit ?, ?"
A: 

You can use joins to achieve the same thing, and I would expect joins to work a lot more efficiently than embeded selects.

Tom Gullen
How? i have no column to select
acidzombie24
+1  A: 

I'm guessing that you're trying to ensure that a user only votes once on each post here.

I wouldn't - I don't - use separate tables for up votes and down votes. Add vote type to your votes table and you won't need correlated subqueries.

coolgeek
Actually i am allowing an 'up' an 'down' vote. Its really agree and disagree and you are allowed to do both. Since i can do both should i stick them in the same table? i was thinking its easier to have in 2 tables then 1 table and do a check for IP addr and writing the extra AND voteType <> (or =) myVoteType.
acidzombie24
Yikes. Given the improbability of a need arising for a third (fourth, etc) type of vote field, I would still go with one votes table, with two vote fields, one for up, one for down. One votes table record per user per post. Join votes with post on user id, selecting both votes fields and take the appropriate action in your application. Also, I trust that you're versed in the problems that can arise when using IP addresses for identification...
coolgeek
+1  A: 

Here is my opinions:

  1. It seems that table "UpVotes" and "DownVotes" have same structure and can be merged into one table.

  2. The relation between table "Post" and "Up/DownVotes" can be constrained by foreign key.

  3. Although I am not sure about the performance difference, but I think it would be better to use "join" mechanism rather than nesting two select statement in a select statement.
ZEAXIF
note i'd have two joins because its 2 values and not an or. The votes tables do use post as a FK and unless i have a column i can check (a bool saying up otherwise i know its down) i dont know how to write it as a join. Do i need that to be able to write it as a join?
acidzombie24
Up/DownVotes keep in one table with a bool field is a good idea. It is not necessary to use join, but you may need it for performance boost when your database gone large.
ZEAXIF
What do you mean? i would still have to join my list of votes with the post in both cases.
acidzombie24
Oh, sorry I just mean SQL join statement (select ... join...).
ZEAXIF
+1  A: 

Since you're asking about good practices... the "upvotes" and "downvotes" appearing in your Posts table looks like you're duplicating data in your database. That's a problem, because now you always have to worry whether or not the data is in sync and correct. If you want to know the number of upvotes then count them, don't also store them in the Post table. I'm not positive that is what you're doing, but it's a guess.

Onto your query... You will probably get better performance using a JOINed subquery instead of how you have it. With the scalar subqueries as columns they have to be run once for every row that is returned. That could be a pretty big performance hit if you're returning a bunch of rows. Instead, try:

SELECT
    P.id,
    P.name,
    P.body,
    P.upvotes,
    P.downvotes,
    COALESCE(UV.cnt, 0) AS upvotes2,
    COALESCE(DV.cnt, 0) AS downvotes2
FROM
    dbo.Posts P
LEFT OUTER JOIN (SELECT post_id, COUNT(*) cnt FROM dbo.UpVotes GROUP BY post_id) AS UV ON UV.post_id = P.id
LEFT OUTER JOIN (SELECT post_id, COUNT(*) cnt FROM dbo.DownVotes GROUP BY post_id) AS DV ON DV.post_id = P.id

Compare it to your own query and see if it gives you better performance.

EDIT: A couple of other posters have advocated a single table for up/down votes. They are absolutely correct. That makes the query even easier and also probably much faster:

SELECT
    P.id,
    P.name,
    P.body,
    P.upvotes,
    P.downvotes,
    SUM(CASE WHEN V.vote_type = 'UP' THEN 1 ELSE 0 END) AS upvotes2,
    SUM(CASE WHEN V.vote_type = 'DOWN' THEN 1 ELSE 0 END) AS downvotes2,
FROM
    dbo.Posts P
LEFT OUTER JOIN Votes V ON
    V.post_id = P.id
GROUP BY
    P.id,
    P.name,
    P.body,
    P.upvotes,
    P.downvotes
Tom H.
LEFT OUTER JOIN doesnt do what mine does. I am checking if i upvoted/downvotes the post by looking at my IP (its an fmylife like site which like a wiki doesnt require login). Isnt the () part in the outer join mean its doing a subquery? also when you say 'run once for every row that is returned' isnt that the same as a join which is applied on every row?
acidzombie24
oh and the up/down count is maintained by triggers on insert, update and remove. There should never be a case where it is wrong but it isnt important and i figure i can run checks later on.
acidzombie24
Nope, a JOIN is most definitely not the same - SQL deals with sets and a JOIN is a set operation, so it is much faster. Although there is a subquery, that query is only run once in this case and the results are then kept until SQL is done with them - it's because they're different kinds of subqueries (correlated vs. non-correlated). As for the triggers and the data never getting out of sync, I'll just say, "famous last words" ;)
Tom H.
The first is a correlated subquery, it fires once for every row in the results set. The second is a derived table, it fires only once for the entire set of data. It is almost always significantly faster. Basically you don't want to do row-by-row processing in a database 99% of the time. Databases are optimized for set-based oprations.
HLGEM