ansaurus

Question

Answer 1

A:

If you're trying to get the latest comment, it should be ORDER BY comments.time DESC LIMIT 1. I doubt that'll solve your problem, though.

Andrew 2010-04-15 02:18:29

Why the downvote? What I said is true. Otherwise he would've gotten all the comments for that topic.

Andrew 2010-04-15 02:34:11

I'm not the downvoter, but the OP is looking for the latest comment per topic id. LIMIT 1 works over the entire result set, not within the groupings.

Larry Lustig 2010-04-15 03:10:30
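A minimal sketch of the point about LIMIT 1, assuming a hypothetical `comments` table and made-up rows (the original schema is not shown in the thread), with SQLite from Python's standard library standing in for MySQL:

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, topic_id INTEGER,
                           message TEXT, time TEXT);
    INSERT INTO comments (topic_id, message, time) VALUES
        (1, 'first on topic 1',  '2010-04-01 10:00:00'),
        (1, 'latest on topic 1', '2010-04-03 10:00:00'),
        (2, 'first on topic 2',  '2010-04-02 10:00:00'),
        (2, 'latest on topic 2', '2010-04-04 10:00:00');
""")

# ORDER BY ... DESC LIMIT 1 keeps one row from the whole result set,
# not one row per topic_id.
limit_rows = conn.execute(
    "SELECT topic_id, message FROM comments ORDER BY time DESC LIMIT 1"
).fetchall()
print(limit_rows)
```

Only the single newest comment overall comes back; topic 1 is not represented at all.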

Answer 2

+2 A:

This is an extension of standard SQL in MySQL which I don't think is helpful at all. In standard SQL your command would not be allowed at all since there's no way to determine which single line should be reported as a result of the GROUP BY. MySQL will execute this command with (as you found out) a random row returned.

You can see a discussion of this issue here: MySQL - Control which row is returned by a group by.

Larry Lustig 2010-04-15 02:33:03

Thanks Larry, I asked the guy below as well, but I might as well ask you since you were the one who sent the article. Your article said there were performance issues with using select queries within select queries. Is this true? If so, when do I have to worry about this?

Scarface 2010-04-15 03:17:22

@Scarface: each query you send to a database is a fairly expensive operation involving preparation, disk reads, sorting, etc. So if your query involves a second query, it becomes twice as expensive. That's not usually a problem if it's two queries instead of one (as in this case). But there are cases, called *correlated queries*, in which the nested query must be executed once for every candidate row in the outer query and that can have unacceptable performance implications.

Larry Lustig 2010-04-15 12:54:01
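To make the distinction concrete, here is a rough sketch of a correlated and an uncorrelated sub-query that return the same rows. The table, column names, and data are hypothetical, and SQLite (via Python's standard library) stands in for MySQL; how often an engine really re-runs the inner query depends on its optimizer.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, topic_id INTEGER,
                           message TEXT, time TEXT);
    INSERT INTO comments (topic_id, message, time) VALUES
        (1, 'first on topic 1',  '2010-04-01 10:00:00'),
        (1, 'latest on topic 1', '2010-04-03 10:00:00'),
        (2, 'first on topic 2',  '2010-04-02 10:00:00'),
        (2, 'latest on topic 2', '2010-04-04 10:00:00');
""")

# Correlated: the inner query refers to the outer row (c.topic_id), so it is
# conceptually evaluated once per candidate row of the outer query.
correlated = conn.execute("""
    SELECT c.topic_id, c.message
    FROM comments AS c
    WHERE c.time = (SELECT MAX(c2.time)
                    FROM comments AS c2
                    WHERE c2.topic_id = c.topic_id)
    ORDER BY c.topic_id
""").fetchall()

# Uncorrelated: the inner query does not depend on the outer row, so it can be
# computed once and joined like any other table.
uncorrelated = conn.execute("""
    SELECT c.topic_id, c.message
    FROM comments AS c
    JOIN (SELECT topic_id, MAX(time) AS latest_time
          FROM comments
          GROUP BY topic_id) AS m
      ON m.topic_id = c.topic_id AND m.latest_time = c.time
    ORDER BY c.topic_id
""").fetchall()

print(correlated == uncorrelated)
```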

thanks larry, appreciate it

Scarface 2010-04-15 21:06:52

Answer 3

+3 A:

You've got a couple of things going on here. First, the reason your current query is returning weird results is that you aren't really using your GROUP BY clause in the way intended; it is intended to be used with aggregated fields (like COUNT(), SUM(), etc). It is a convenient side-effect that on MySQL, the GROUP BY clause also returns the first record that would be in the group--which, in your case, should be the first inserted message for each topic (not a random one). So your query as it is written is essentially returning the oldest message per topic (on MySQL only; note that other RDBMS's will throw an error if you try to use a GROUP BY clause like that!).

But you can actually abuse the GROUP BY clause to get what you want, and you are really close already. What you need to do is use a sub-query to build a derived table first (with your messages ordered by date, descending), then query the derived table using the GROUP BY clause like this:

SELECT * FROM (
  SELECT
    topic.topic_title, comments.id, comments.topic_id, comments.message
  FROM comments
  JOIN topic ON topic.topic_id = comments.topic_id
  WHERE topic.creator = 'admin'
  ORDER BY comments.time DESC) derived_table
GROUP BY topic_id

Ken Taylor 2010-04-15 03:05:44
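For comparison, a sketch of a different technique that does not depend on which row GROUP BY happens to pick: join each comment against the per-topic MAX(time). This is not the query from the answer above. The column names follow that query, but the table definitions and rows are hypothetical, and SQLite (via Python's standard library) stands in for MySQL.

```python
import sqlite3

# Hypothetical schema and rows modeled on the column names used in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic (topic_id INTEGER PRIMARY KEY, topic_title TEXT,
                        creator TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, topic_id INTEGER,
                           message TEXT, time TEXT);
    INSERT INTO topic VALUES (1, 'Cats', 'admin'), (2, 'Dogs', 'admin'),
                             (3, 'Fish', 'guest');
    INSERT INTO comments (id, topic_id, message, time) VALUES
        (1, 1, 'old cat comment', '2010-04-01 09:00:00'),
        (2, 2, 'old dog comment', '2010-04-02 09:00:00'),
        (3, 1, 'new cat comment', '2010-04-03 09:00:00'),
        (4, 2, 'new dog comment', '2010-04-04 09:00:00'),
        (5, 3, 'fish comment',    '2010-04-05 09:00:00');
""")

# The derived table "newest" holds one row per topic with its latest time;
# joining on both topic_id and time keeps only the newest comment per topic.
latest_per_topic = conn.execute("""
    SELECT topic.topic_title, comments.id, comments.topic_id, comments.message
    FROM comments
    JOIN topic ON topic.topic_id = comments.topic_id
    JOIN (SELECT topic_id, MAX(time) AS latest_time
          FROM comments
          GROUP BY topic_id) AS newest
      ON newest.topic_id = comments.topic_id
     AND newest.latest_time = comments.time
    WHERE topic.creator = 'admin'
    ORDER BY comments.topic_id
""").fetchall()
print(latest_per_topic)
```

One caveat with this approach: if two comments on the same topic share exactly the same time, both are returned.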

Hey really nice answer, it worked. If I could donate points I would donate 10 of mine. Very detailed, and I don't feel mystified after reading. I just have one question however, when I read the article that one of the other guys sent me it said there were performance issues with using select queries within select queries. Is this true? If so, when do I have to worry about this?

Scarface 2010-04-15 03:11:18

MySQL does return random values, not the first row, at least according to the docs: "The server is free to return any value from the group, so the results are indeterminate unless all values are the same." http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html

Larry Lustig 2010-04-15 03:12:17

Also when you say derived table, does that just mean you put that bit at the end to label your query, basically as a filler?

Scarface 2010-04-15 03:12:22

Larry, what exactly do you mean? You mean if you attempt to group something with different values, then the returned value is random within the group? I am sorry, I am still kind of a noob, so I always have a lot of questions.

Scarface 2010-04-15 03:15:45

@Scarface (Thanks!) There are performance issues for a nested query (a "select within a select"), because you are running multiple queries and not just the one. It gets worse if you get too fancy; you can get into situations where your sub-query is running multiple times for the parent query, and that is ugly. But this isn't one of those times; your performance hit here shouldn't be too bad as long as your tables aren't huge.

Ken Taylor 2010-04-15 03:23:44

@Larry Ha ha, I know what's in the docs...but nothing in a computer is random. I've run test and re-test on this, and what appears to be happening is that the first physically indexed record (i.e., the first record inserted) is what is being returned--which makes sense if you think about how an RDBMS stores data internally. I think the docs would be better written to say that the result is "unreliable" (which is true from a logic standpoint--you shouldn't code this way if you can avoid it), rather than the result is "random".

Ken Taylor 2010-04-15 03:26:15

My topic table has about 4 more rows I did not list, you think that would be a problem? Also just one more question lol, I noticed that I got the information for the comments table, but I wanted to select rows from the topic table as well. Is this possible or should I just run a separate query?

Scarface 2010-04-15 03:28:52

@Scarface Derived tables are just sub-queries (you can think about them as temporary tables that contain the result sets of the inner queries), but MySQL requires that you give them an alias--hence the "derived_table" identifier in the sample query (not all RDBMS's force you to give derived tables a name, by the way). The name is irrelevant--pick something that matters to you--but you have to have one. You can use derived tables in joins, etc. just like any other table; in that case, having an alias for it is critical.

Ken Taylor 2010-04-15 03:31:17

@Scarface Just add the topic fields you want to select to the inner query like so: topic.topic_title, topic.creator, etc. They will bubble-up to the outer query.

Ken Taylor 2010-04-15 03:32:32

@Larry: Was that changed in 5.1? According to my copy of *MySQL: The Definitive Guide*, using `GROUP BY` will implicitly sort unless an `ORDER BY` is present (so you should specify `ORDER BY NULL` to avoid the overhead if you don't need sorted results). Of course, arbitrary order should be expected since relational sets don't have an order.

Duncan 2010-04-15 03:34:56

Thanks Ken, looks like I ended up giving you like 10 points anyway lol. Appreciate your time and teachings. Thanks everyone else as well for sparking discussion.

Scarface 2010-04-15 03:38:06

No problem--my pleasure!

Ken Taylor 2010-04-15 03:39:37

@Ken: For "random" you may prefer "undefined". You can test all you want, and I'm sure the results of your tests are accurate. But without an explicit guarantee the implementation can change at some future date. You must never rely on the internal storage mechanism of an RDBMS: unless the specification guarantees a certain result (especially in cases of ordering), the results can change from version to version. Many people got bitten when various RDBMSes implemented index-only retrieval, and result set order changed.

Larry Lustig 2010-04-15 12:48:19

@Duncan: You are mixing up two ordering issues. You are correct that GROUP BY will implicitly order the rows in the actual result set once those rows are calculated. But it will *not* order the rows *inside* each group that were used to calculate the single row for that group in the result set. If you GROUP BY customer_id and don't ORDER BY then customer 1 will implicitly be before customer 2 in the result set. But the 100 (or whatever) customer 1 rows that were aggregated to produce the single customer 1 row will not have been internally ordered.

Larry Lustig 2010-04-15 12:51:07

@Larry Absolutely agree! That's what I meant in my previous comment (though I like your term "undefined" better than my "unreliable"). This is problematic use of GROUP BY and works the way it does by accident, so to speak; it is not portable to other RDBMS's and cannot be guaranteed even on future versions of MySQL.

Ken Taylor 2010-04-15 15:33:28

@Larry: Ah, that makes sense. I probably misunderstood what Paul DuBois meant. Perhaps he didn't mention internal ordering because it doesn't happen, and it's like telling people not to think about purple elephants. Fortunately I wasn't abusing `GROUP BY` at the time I read that, I was aggregating so all columns I was interested in had the same value anyway.

Duncan 2010-04-15 23:58:00

ok ken lol I came back to this part of my site, and it turns out that your query does not work. While the right results are selected, they are in seemingly random order. Any more ideas haha?

Scarface 2010-04-20 20:56:17
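One possible reading of the complaint above is that the ORDER BY inside the derived table does not control the order of the final rows; only an ORDER BY on the outermost query does. A minimal sketch under that assumption, with a hypothetical `comments` table, made-up rows, a guessed newest-first requirement, and SQLite (via Python's standard library) standing in for MySQL:

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, topic_id INTEGER,
                           message TEXT, time TEXT);
    INSERT INTO comments (topic_id, message, time) VALUES
        (1, 'old on topic 1', '2010-03-01 09:00:00'),
        (1, 'new on topic 1', '2010-04-03 09:00:00'),
        (2, 'new on topic 2', '2010-04-04 09:00:00'),
        (3, 'new on topic 3', '2010-04-01 09:00:00');
""")

# The ORDER BY on the outermost query decides the order of the final rows:
# here, topics with the most recent activity come first.
ordered = conn.execute("""
    SELECT topic_id, MAX(time) AS latest_time
    FROM comments
    GROUP BY topic_id
    ORDER BY latest_time DESC
""").fetchall()
print(ordered)
```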


group by, order by, with join