tags:

views:

1676

answers:

3

I have the following query which works except that I would like it to behave differently. Now it looks for all the duplicate rows on url and returns it in the order of number of duplicate urls. I use GROUP_ CONCAT to seperate all the different screen_name's.

However there can be multiple rows with the same url and same screen_name. How do I make it so that it only retreives the rows where screen_name is distinct.

SELECT url, title, GROUP_CONCAT( screen_name ) , COUNT( t_uid )
      FROM `twl_links`
      WHERE twl_uid =3
      AND timestamp >= NOW( ) - INTERVAL 24 HOUR
      GROUP BY (
      url
      )
      ORDER BY COUNT( t_uid ) DESC

Thank you very much,

Ice

A: 

Have you tried using the distinct keyword?

http://www.tech-recipes.com/rx/1481/ignore-duplicate-entries-in-mysql-select-using-distinct-keyword/

EDIT:

Here is another article showing where distinct can be used with group_concat:

http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html

AaronS
I have but it makes no difference since I am getting the URLS and using GROUP_CONCAT for the screen_names
I modified my answer to show another article that combines the two. Which I think will give you what you want.
AaronS
+1  A: 

It not clear exactly what you want. Given this data:

t_uid    url                    title           screen_name
1        http://google.com/     Google          bob
2        http://google.com/     Google Search   bob
3        http://google.com/     Google          tom

what results are you expecting? If you want:

http://www.google.com  '???'  'bob,tom'   2

then you could do a SELECT DISTINCT subquery to remove the duplicates (which you'd put in place of FROM twl_links).

If you want

http://www.google.com  '???'  'bob,tom'   3

then you can do GROUP_CONCAT(DISTINCT screen_name) to get that.

Note that in all cases, the title returned is ambiguous (unless there is a N:1 mapping between URL and title; that is unless each URL has only one distinct title)

FYI, it sounds like you have a redundant and hard-to-query data as a result of a lack of normalization.

derobert
Thank you very much. I agree normalization normally would be the better option however this a table where data is only stored for 24 hours and then deleted. Hence I have chosen not to put the urls and titles in a seperate table.
A: 

You could limit it by using a subquery like this:

SELECT t1.url, GROUP_CONCAT( t1.screen_name ) , COUNT( * ) 
FROM (
  SELECT DISTINCT url, screen_name
  FROM mytable
) AS t1
GROUP BY t1.url

It doesn't give you the title, but since the title isn't in the GROUP BY, it wasn't really clear to me what that field would return anyway. If you want all the titles of those URLs, you could JOIN the above query with your table.

cpm