tags:

views:

3936

answers:

6

(Sorry the title isn't more help. I have a database of media-file URLs that came from two sources: RSS feeds and manual entries. I want to find the ten most-recently added URLs, but a maximum of one from any feed. To simplify, table 'urls' has columns 'url, feed_id, timestamp'. feed_id='' for any URL that was entered manually. How would I write the query? Remember, I want the ten most-recent urls, but only one from any single feed_id.

A: 

You probably want a union. Something like this should work:

    (SELECT 
        url, feed_id, timestamp 
    FROM rss_items  
    GROUP BY feed_id 
    ORDER BY timestamp DESC 
    LIMIT 10)
UNION
    (SELECT 
        url, feed_id, timestamp 
    FROM manual_items  
    GROUP BY feed_id 
    ORDER BY timestamp DESC 
    LIMIT 10)
ORDER BY timestamp DESC
LIMIT 10
Aeon
you can't select url, feed_id, timestamp if you group by feed_id, you must use some aggregate function around whatever you do not group by (i.e. url and timestamp)
Laurent
That's correct. The second SELECT returns at most one row since in the case of the manual feed, the feed_id is always ''. And to be clear, there's just one table.
Doug Kaye
A: 

MySQL doesn't have the greatest support for this type of query.

You can do it using a combination of "GROUP-BY" and "HAVING" clauses, but you'll scan the whole table, which can get costly.

There is a more efficient solution published here, assuming you have an index on group ids: http://www.artfulsoftware.com/infotree/queries.php?&bw=1390#104

(Basically, create a temp table, insert into it top K for every group, select from the table, drop the table. This way you get the benefit of the early termination from the LIMIT clause).

SquareCog
A: 

Would it work to group by the field that you want to be distinct?

SELECT url, feedid FROM urls GROUP BY feedid ORDER BY timestamp DESC LIMIT 10;

don
assuming that I'm right in that there's only 1 table, I think this is a workable solution.
cori
this does not work... see my answer
Sam Saffron
I tried something like that, but it only returns a max of one url that doesn't have a feed.
Doug Kaye
+3  A: 

Assuming feed_id = 0 is the manually entered stuff this does the trick:

select p.* from programs p
left join 
(
    select max(id) id1 from programs
    where feed_id <> 0
    group by feed_id
    order by max(id) desc
    limit 10
) t on id1 = id
where id1 is not null or feed_id = 0 
order by id desc
limit 10;

It works cause the id column is constantly increasing, its also pretty speedy. t is a table alias.

This was my original answer:

(
select 
    feed_id, url, dt 
    from feeds  
    where feed_id = ''
    order by dt desc 
    limit 10
)
union
(

select feed_id, min(url), max(dt) 
     from feeds
     where feed_id <> '' 
     group by feed_id
     order by dt desc 
     limit 10
)
order by dt desc
limit 10
Sam Saffron
yes, all rows have unique IDs ('id').
Doug Kaye
in that case you can do something like where id in (select max(id) from table where feed_id <> '' group by feed_id) can you post example sql definitions with expected results?
Sam Saffron
why min(url)? supposing the url is different for each item, how do you know this url will be the one matching the most recent date?
Laurent
this unions the last 10 entries, regardless of dupe feed with the last 10 by feed_id, which does nothing to give your desired results. what does "min(url)" accomplish. i do not think this will work.
Joe Skora
min(url) is kind of tricky, it ensures that if we have 2 items with the same date we break the tie by sorting by url, its kind of rare and you can break the ties using max(id) as well
Sam Saffron
I think the subselect in the latest version may be a bit slow with 100,000+ feed_id's. Can it have a LIMIT, too? And what's the 'p' in the first FROM?
Doug Kaye
Let me know how this latest edit goes, it should be a bit faster, you can not use limits in subqueries in mysql
Sam Saffron
Nope. Latest version returns ten rows from one feed.
Doug Kaye
Sorry Doug, give the latest iteration a shot, i found the bug.
Sam Saffron
Yup, that does it. Seems reasonably fast, too. Thanks for the help. And to Stack Overflow for an awesome system.
Doug Kaye
No worries Doug, thanks for your awesome podcasts!
Sam Saffron
To see the results, visit the new site, still early in development: spokenword.org. This SQL drives 'Recently Added Programs..." on the home page. Ugly for now (no CSS, etc.) but it works.
Doug Kaye
+3  A: 

Assuming this table

  CREATE TABLE feed (
  feed varchar(20) NOT NULL,
  add_date datetime NOT NULL,
  info varchar(45) NOT NULL,
  PRIMARY KEY  (feed,add_date);

this query should do what you want. The inner query selects the last entry by feed and picks the 10 most recent, and then the outer query returns the original records for those entries.

  select f2.*
  from (select feed, max(add_date) max_date
          from feed f1
         group by feed
         order by add_date desc
         limit 10) f1
  left join feed f2 on f1.feed=f2.feed and f1.max_date=f2.add_date;
Joe Skora
This is a good answer, and more of a general solution than the accepted answer which relies on having an auto-increment id.
gregmac
+1  A: 

Here's the (abbreviated) table:

CREATE TABLE programs (
  id int(11) NOT NULL auto_increment,
  feed_id int(11) NOT NULL,
  `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (id)
) ENGINE=InnoDB;

And here's my query based on sambo99's concept:

(SELECT feed_id,id,timestamp 
    FROM programs WHERE feed_id='' 
    ORDER BY timestamp DESC LIMIT 10)
UNION
    (SELECT feed_id,min(id),max(timestamp) 
    FROM programs WHERE feed_id<>'' GROUP BY feed_id 
    ORDER BY timestamp DESC LIMIT 10)
ORDER BY timestamp DESC LIMIT 10;

Seems to work. More testing needed, but at least I understand it. (A good thing!). What's the enhancement using the 'id' column?

Doug Kaye
I expanded my answer to cover that.
Sam Saffron