The database schema is

CREATE TABLE sites
(
    site_id           INTEGER PRIMARY KEY AUTOINCREMENT,
    netloc            TEXT UNIQUE NOT NULL,
    last_visited      REAL DEFAULT 0,
    crawl_rate        REAL DEFAULT 2,
    crawl_frequency   REAL DEFAULT 604800,
    robots_txt        TEXT DEFAULT 0,
    robots_last_fetch REAL DEFAULT 0,
    allow_all         NUMERIC DEFAULT 0,
    disallow_all      NUMERIC DEFAULT 0,
    active            NUMERIC DEFAULT 0                           
 )

CREATE TABLE urls
(
     url_id       INTEGER PRIMARY KEY AUTOINCREMENT,
     site_id      INTEGER REFERENCES sites (site_id) NOT NULL,
     scheme       TEXT NOT NULL,
     path         TEXT NOT NULL,
     last_visited REAL DEFAULT 0,
     UNIQUE (site_id, scheme, path)
 )

As you can probably guess, this is for a web crawler.

I want to get N of the sites that have crawlable urls associated with them, along with all of those urls. A url is crawlable if url.last_visited + site.crawl_frequency < current_time, where current_time comes from Python's time.time() function. What I'm looking for will probably begin with something like:

SELECT s.*, u.* FROM sites s JOIN urls u ON s.site_id = u.site_id ...

Beyond that all I can think is that GROUP BY might have some role to play.
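For reference, this is roughly how I get current_time into the query from Python. The in-memory database, the trimmed-down schema, and the sample row are just for illustration; the important part is binding time.time() as a parameter rather than pasting it into the SQL string:

```python
import sqlite3
import time

# Illustrative setup: a trimmed-down copy of the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (
    site_id INTEGER PRIMARY KEY AUTOINCREMENT,
    netloc TEXT UNIQUE NOT NULL,
    crawl_frequency REAL DEFAULT 604800
);
CREATE TABLE urls (
    url_id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_id INTEGER REFERENCES sites (site_id) NOT NULL,
    scheme TEXT NOT NULL,
    path TEXT NOT NULL,
    last_visited REAL DEFAULT 0,
    UNIQUE (site_id, scheme, path)
);
""")
conn.execute("INSERT INTO sites (netloc) VALUES ('example.com')")
conn.execute("INSERT INTO urls (site_id, scheme, path) VALUES (1, 'http', '/')")

# Bind time.time() as a named parameter instead of string-formatting it in.
rows = conn.execute(
    """SELECT u.url_id
       FROM urls u JOIN sites s ON u.site_id = s.site_id
       WHERE u.last_visited + s.crawl_frequency < :now""",
    {"now": time.time()},
).fetchall()
print(rows)  # [(1,)] -- the sample url is crawlable (last_visited = 0)
```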

A:

Here is a graceless query. There's probably a more clever way to do this.

SELECT s.*, u.*
FROM sites s JOIN urls u ON s.site_id = u.site_id
WHERE s.site_id IN
    (SELECT DISTINCT uu.site_id
     FROM urls uu INNER JOIN sites ss ON uu.site_id = ss.site_id
     WHERE uu.last_visited + ss.crawl_frequency < current_time
     ORDER BY ss.site_id
     LIMIT n);

The subquery is supposed to return up to n distinct site_ids that have at least one crawlable URL. The ORDER BY column needn't be site_id; in fact, ORDER BY isn't necessary at all. I just threw it in because consistent ordering is nice when playing with a new query.

The enclosing query returns all urls associated with those n distinct sites, each of which has at least one crawlable url. Note that not all returned urls are necessarily crawlable; the only guarantee is that each returned site has at least one crawlable url, and it may have non-crawlable ones too.
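To illustrate that behavior, here is a sketch with an invented toy dataset (table contents, the :now parameter, and the limit are all made up for the demo). Site 1 has one crawlable url and one freshly visited url, and both come back from the enclosing query:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (site_id INTEGER PRIMARY KEY, crawl_frequency REAL);
CREATE TABLE urls (url_id INTEGER PRIMARY KEY, site_id INTEGER,
                   path TEXT, last_visited REAL);
""")
now = time.time()
conn.execute("INSERT INTO sites VALUES (1, 604800)")
conn.execute("INSERT INTO urls VALUES (1, 1, '/stale', 0)")          # crawlable
conn.execute("INSERT INTO urls VALUES (2, 1, '/fresh', ?)", (now,))  # just visited

# Same shape as the query above, with current_time and n bound as parameters.
rows = conn.execute(
    """SELECT u.path
       FROM sites s JOIN urls u ON s.site_id = u.site_id
       WHERE s.site_id IN
           (SELECT DISTINCT uu.site_id
            FROM urls uu INNER JOIN sites ss ON uu.site_id = ss.site_id
            WHERE uu.last_visited + ss.crawl_frequency < :now
            LIMIT :n)""",
    {"now": now, "n": 5},
).fetchall()
print(sorted(rows))  # [('/fresh',), ('/stale',)] -- the fresh url rides along
```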

If only crawlable urls should be returned, the timing condition can be repeated in the enclosing query's WHERE clause. I couldn't tell from the question which behavior was required.
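A sketch of that variant, again with invented sample data: repeating the crawlability test in the enclosing WHERE filters out the freshly visited url that the original query would have returned:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (site_id INTEGER PRIMARY KEY, crawl_frequency REAL);
CREATE TABLE urls (url_id INTEGER PRIMARY KEY, site_id INTEGER,
                   path TEXT, last_visited REAL);
""")
now = time.time()
conn.execute("INSERT INTO sites VALUES (1, 604800)")
conn.execute("INSERT INTO urls VALUES (1, 1, '/stale', 0)")          # crawlable
conn.execute("INSERT INTO urls VALUES (2, 1, '/fresh', ?)", (now,))  # just visited

# The timing condition appears twice: once to pick sites, once to
# restrict the returned urls to crawlable ones only.
rows = conn.execute(
    """SELECT u.path
       FROM sites s JOIN urls u ON s.site_id = u.site_id
       WHERE u.last_visited + s.crawl_frequency < :now
         AND s.site_id IN
             (SELECT DISTINCT uu.site_id
              FROM urls uu INNER JOIN sites ss ON uu.site_id = ss.site_id
              WHERE uu.last_visited + ss.crawl_frequency < :now
              LIMIT :n)""",
    {"now": now, "n": 5},
).fetchall()
print(rows)  # [('/stale',)] -- only the crawlable url survives
```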

P.S. I'm indulging in pedantry now, but the way crawl_frequency is used makes me think it could be called crawl_period or crawl_delay instead.

Dan LaRocque
Thanks, that's close enough to what I want that I can get the rest myself. I'll edit my question later to give the full solution.
aaronasterling