ansaurus

Question

Answer 1

+1 A:

Is there any reason why you NEED to do this in SQL? I try to do data collection in SQL and push formatting into the code from which the SQL is called.

I would do an SQL query that simple, retrieving all of the image files but ordered by user_id/date. Then my code would loop through the results, concatenating each image file into the same output line until the user id/date changes, in which case we know we need to start a new line with a new user id/date.

This also makes it easier to do any funky formatting around the text, particularly if you're intending to emit HTML.

Joel Goodwin 2010-08-05 08:44:44

well i need to do this, because if you have limiting the rows by 20 and if one user has 20 images, it will show as 1 row, rather then 20 rows. in sql, it got 20 rows, but coding saw it as one row after grouping it. and it wont be same result as always. with query, it stays same and 20 rows images, after grouping them they will consider 1 row and fetch other rows too and make it complete 20 differnt rows after grouping them.

Basit 2010-08-05 08:58:09

It's a good point, but the pragmatic part of my soul would probably just limit the simpler query to, say, 500 rows and go from there, assuming I'd get enough rows out of the bargain. (The other alternative is to do this in multiple queries so that you will get back precisely the data you need and no more, but seems more complicated to me with little gain.) What if there are 1000 images for one date - will your code/interface be prepared for that?

Joel Goodwin 2010-08-05 09:18:51

Answer 2

+1 A:

This is not so easy to do in SQL, since it is order-dependent, which SQL is not well suited to.

The query is quite unwieldy, so I'll give it in full first, followed by a breakdown showing how it's put together.

SELECT @rownum:=@rownum+1 AS id, t.user_id, type, date, urls FROM
 (SELECT MIN(ID) AS original_id, user_id, type, date, GROUP_CONCAT(url) urls FROM
  (SELECT i1.*, 
   IF(i1.type='image', 
     IFNULL((SELECT MIN(i2.ID)-1 FROM Items i2
             WHERE i2.ID>i1.ID AND 
                  (i2.type!=i1.type OR i1.user_id!=i2.user_id OR i1.date!=i2.date)), 
            (SELECT MAX(id) FROM Items)), 
     i1.ID) AS lastRow,
   IF (i1.type='image', 
     IFNULL(SELECT MAX(i3.ID)+1 FROM Items i3 
             WHERE i3.ID<=i1.ID AND 
                  (i3.type!=i1.type OR i1.user_id!=i3.user_id OR i1.date!=i3.date)), 
            (SELECT MIN(id) FROM Items)), 
      i1.ID) AS firstRow) AS groupItems
  GROUP BY user_id, type, date, firstRow, lastRow) t, (SELECT @rownum:=0) r
  ORDER BY t.original_id;

The query uses a correlated subquery to find the start and end IDs of each image group. The group boundary is an item that is not the same type, user or date.

SELECT i1.ID, 
  IF(i1.type='image', 
     IFNULL((SELECT MIN(i2.ID)-1 FROM Items i2
             WHERE i2.ID>i1.ID AND 
                  (i2.type!=i1.type OR i1.user_id!=i2.user_id OR i1.date!=i2.date)), 
            (SELECT MAX(id) FROM Items)), 
     i1.ID) AS lastRow,
  IF (i1.type='image', 
     IFNULL(SELECT MAX(i3.ID)+1 FROM Items i3 
             WHERE i3.ID<=i1.ID AND 
                  (i3.type!=i1.type OR i1.user_id!=i3.user_id OR i1.date!=i3.date)), 
            (SELECT MIN(id) FROM Items)), 
      i1.ID) AS firstRow

For each item, the firstRow/lastRow columns give the start and end of the group. We can then use GROUP_CONCAT to concatenate all the URLs. To preserve order, MIN(id) is output, giving the first ID of each group.

SELECT MIN(id) AS original_id, user_id, type, date, GROUP_CONCAT(url) urls FROM
 (SELECT i1.*, 
  IF(i1.type='image', 
     IFNULL((SELECT MIN(i2.ID)-1 FROM Items i2
             WHERE i2.ID>i1.ID AND 
                  (i2.type!=i1.type OR i1.user_id!=i2.user_id OR i1.date!=i2.date)), 
            (SELECT MAX(id) FROM Items)), 
     i1.ID) AS lastRow,
  IF (i1.type='image', 
     IFNULL(SELECT MAX(i3.ID)+1 FROM Items i3 
             WHERE i3.ID<=i1.ID AND 
                  (i3.type!=i1.type OR i1.user_id!=i3.user_id OR i1.date!=i3.date)), 
            (SELECT MIN(id) FROM Items)), 
      i1.ID) AS firstRow) AS groupItems
 GROUP BY user_id, type, date, firstRow, lastRow

Finally, to get consecutive IDs for the new table, use variable to compute the rank:

SELECT @rownum:=@rownum+1 AS id, user_id, type, date, urls FROM
 (SELECT MIN(ID) AS original_id, user_id, type, date, GROUP_CONCAT(url) urls FROM
  (SELECT i1.*, 
   IF(i1.type='image', 
     IFNULL((SELECT MIN(i2.ID)-1 FROM Items i2
             WHERE i2.ID>i1.ID AND 
                  (i2.type!=i1.type OR i1.user_id!=i2.user_id OR i1.date!=i2.date)), 
            (SELECT MAX(id) FROM Items)), 
     i1.ID) AS lastRow,
   IF (i1.type='image', 
     IFNULL(SELECT MAX(i3.ID)+1 FROM Items i3 
             WHERE i3.ID<=i1.ID AND 
                  (i3.type!=i1.type OR i1.user_id!=i3.user_id OR i1.date!=i3.date)), 
            (SELECT MIN(id) FROM Items)), 
      i1.ID) AS firstRow) AS groupItems
  GROUP BY user_id, type, date, firstRow, lastRow) t, (SELECT @rownum:=0) r
  ORDER BY t.original_id;

SQL is best suited to working with unordered sets of data, rather than sequences, as here. If you can do this in the presentation code, or possibly better in your application layer, I imagine that will be faster and more flexible. A hand-coded solution would find the start and end of each group in a single pass through the data. I doubt the SQL query will execute as efficiently as that.

mdma 2010-08-11 23:15:52

where does i1 come from, because its giving error unknown table 'i1'

Basit 2010-08-12 02:09:03

takes 17.312 to run the query.

Basit 2010-08-12 02:16:01

i1 = groupItems. I changed the name to make it clearer, but didn't propagate that, but I guess you fixed it. It's no surprise that this is not fast, since for each row the query has to find the start and end group, which is typically going to involve a table scan for each row. Like I said in my answer, SQL is not really suited to this. I give you the SQL because you ask for it, but really a better solution is to not do this in SQL.

mdma 2010-08-12 10:10:31

Alternatively, are you sure the ordering is strictly needed? For example, if we can group simply by user_id, type, date and not concern about when the next/previous text or video is, then the query will be much more efficient. If you are not presenting all rows, then you may be able to make the query more efficient by adding a WHERE clause to the innermost use of the Items table to reduce the number of rows, and also filter the final output query. Adding an index to the items table on ID, and another on (type, user_id, date) may also improve performance.

mdma 2010-08-12 10:15:37

btw i did added the limit 0, 20, to reduce the rumber of rows. basically im trying to create news feed, which are like facebook type and i want to display the data as it comes and only show 3 images and hide other, but if some other text is written after it, then cut the group. if you have some other better idea, which will make news feed like facebook like, then i would love to use that instead. this is my database scheme http://tinyurl.com/sql-dump. you can try the sql by yourself on feed table and see whats the result you get.

Basit 2010-08-12 12:25:05

Can you show some more complete data with the dates also? Are you sure it's possible to have image1, image2, then text, then another image all from the same user and same date? If we know the date will change for different groups of images then the problem is much simpler.

mdma 2010-08-12 14:48:16

the database i pasted the link, it has around 10,000 rows i think. with date. yes date changes for different groups of images for same user. basically what im doing is, if someone was uploading there images and then from that, they posted a tweet, then group breaks, else group stays together for images and videos. if you have better approach or idea, i would love hear that.

Basit 2010-08-12 20:34:30

Answer 3

+2 A:

SELECT `date`, `user_id`, `type`, GROUP_CONCAT(`link`)
FROM `table`
GROUP BY `date`, `user_id`, `type`

zerkms 2010-08-11 23:20:25

this will not give the desired output, since items 1 and 4 will be grouped together, and the OP shows in the example that this is not desired.

mdma 2010-08-12 11:24:05

@mdma: you're wrong. be more attentive, please: 1 and 4 rows belong to different users. Moreover, they are different type rows.

zerkms 2010-08-12 12:20:43

sorry for the typo - I meant rows 1 and 5. They are the same user, same image, but they should be in different groups.

mdma 2010-08-12 14:44:01

@mdma: ah, now i get it :-S sorry. this task is not for RELATIONAL databases. OP is trying to make NAVIGATIONAL database from mysql :-S

zerkms 2010-08-12 20:35:07

Hang on -- rows 1 and 5 have different dates (see the comments on the OP). So no, they wouldn't group together with this query

eeeeaaii 2010-08-18 18:40:12

Answer 4

A:

this query would do the trick:

select id, user_id, `type`, group_concat(link) from images
group by user_id, `type`, date
order by id

ovais.tariq 2010-08-16 19:28:02

Answer 5

A:

I would possibly consider a different datastore such as CouchDB or RavenDB. This would be much better at serving up this type of content and wouldn't have to deal with foreign keys or joins.

Simply use continuous replication to the databases of all the friends.

Ultimately you're going to have to pre-query or abandon the relational model in order to gain back your speed.

Martin Murphy 2010-08-18 00:32:17

ansaurus

tags:

views:

answers:

complex group by mysql query

related questions