views:

118

answers:

5

Given the following schema:

CREATE TABLE players (
  id BIGINT PRIMARY KEY,
  name TEXT UNIQUE
);

CREATE TABLE trials (
  timestamp TIMESTAMP PRIMARY KEY,
  player BIGINT,
  score NUMERIC
);

How would I create a SELECT that first finds the best scores from trials, then joins the name field from users? I've been able to get the scores I'm after using this query:

SELECT * FROM trials GROUP BY player ORDER BY score ASC LIMIT 10;

And my query for returning the top 10 scores looks like:

CREATE VIEW top10place AS
  SELECT player.name AS name, trial.*
    FROM trials AS trial, players AS player
    WHERE trial.player = player.id
      AND trial.score = (
        SELECT MAX(score)
        FROM trials AS tsub
        WHERE tsub.player = trial.player
      )
    ORDER BY trial.score DESC, trial.timestamp ASC
    LIMIT 10;

But when I hit thousands of entries in the tables, the DB performance starts to crawl. I figure the subquery is killing my performance. The first query (returning only the top scores) still performs adequately, so I was wondering if there is a way to force a JOIN operation to occur after the top scores have been selected.

EDIT Note that the query will return the top 10 ranked players, not just the top 10 scores. If the same player has many high scores, he should only show up once in the top 10 list.

I'm using SQLite, so it doesn't have some of the extended features of SQL Server or MySQL.

+1  A: 

This is an instance of you making something harder than it needs to be. The correct code is:

CREATE VIEW top10place AS
  SELECT player.name AS name, trial.*
    FROM trials AS trial, players AS player
    WHERE trial.player = player.id
    ORDER BY trial.score ASC, trial.timestamp ASC
    LIMIT 10;

Basically, let the LIMIT statement do the work :)

Silas
Shouldn't it be "ORDER BY trial.score DESC" ?
Andomar
The problem with this approach is that if the same player has to top-10 scores, they will both show up. I'm looking for the top-ranked players, so only unique player entries in this list.
jheddings
@Andomar yeah, that was my fault... My original question had a typo.
jheddings
+1  A: 

A subquery in a WHERE can be expensive if the optimizer runs it for every row.

(Edit) Here's another way to write the query, now with an exclusive join: it says there's no row for that user with a higher score:

SELECT 
     (select name from user where id = cur.userid) as UserName
,    cur.score as MaxScore
FROM trails cur
LEFT JOIN trials higher
    ON higher.userid = cur.userid
    AND higher.timestamp <> cur.timestamp
    AND higher.score > cur.score
WHERE higher.userid is null
ORDER BY cur.score DESC
LIMIT 10

This would return the 10 highest scoring users. If you'd like the 10 highest scores regardless of user, check Silas' answer.

Andomar
That approach is also very slow on my DB with 10,000 players and 100,000 trials. It takes minutes to complete. Can you offer another possibility?
jheddings
Ahh, I didn't see this one...
hythlodayr
@jheddings: Edited for another way to write the query
Andomar
+1  A: 

Let's start with a question: What are the indexes you have defined for your two tables?

hythlodayr
Indexes are just on `players.id` and `trials.timestamp`.
jheddings
You could get quite a performance boost, if you create one more index based on trials.player, trials.score (in that order). This way, the query can rapidly zoom in on a particular player and then find the max score.
hythlodayr
I should mention, it should be in conjunction with Andomar's query (or a similar variant).
hythlodayr
You could try experimenting w/ trials.score, trials.player as well since you don't want the max score for ALL players.
hythlodayr
Indexing did give me a considerable performance boost... +1
jheddings
+1  A: 

As has been mentioned, since your identifying key between players and trials is the player.id and trials.player, you should have an index on trials.player. Particularly if you relate those two tables a lot.

Also you might try making your query more like.

SELECT p.name as name, t.* FROM players as p
INNER JOIN (SELECT * FROM trials WHERE trials.score = (SELECT MAX(score) FROM trials as tsub WHERE tsub.player = trials.player) LIMIT 10) as t ON t.player = p.id
ORDER BY t.score DESC, t.timestamp ASC

This might even be able to be optimized a little more, but I'm no good at that without some data to throw the query at.

Myles
+2  A: 

Don't have sqlite running, hope the limit is right.

select players.name, trials.player, trials.timestamp, trials.score from
    (select player, score, timestamp from
    trials order by score desc, timestamp asc limit 10) trials, players
where players.id = trials.player

Regards

Khb
That did it... The query executed in < 2s with this approach, and with indexing, I'm < 500ms. Thanks!
jheddings
@jheddings No problem.
Khb