I've been looking at fast ways to select a random row from a table and have found the following site: http://74.125.77.132/search?q=cache:http://jan.kneschke.de/projects/mysql/order-by-rand/&hl=en&strip=1

What I want to do is select a random url from my table 'urls' that I DON'T have in my other table 'urlinfo'. The query I am using now selects a random url from 'urls', but I need it modified to return only a random url that is NOT in the 'urlinfo' table.

Here's the query:

SELECT url
FROM urls
JOIN (SELECT CEIL(RAND() * (SELECT MAX(urlid) FROM urls)) AS urlid) AS r2
USING (urlid);

And the two tables:

CREATE TABLE urls (
 urlid INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 url VARCHAR(255) NOT NULL
) ENGINE=INNODB;


CREATE TABLE urlinfo (
 urlid  INT NOT NULL PRIMARY KEY,
 urlinfo VARCHAR(10000),
 FOREIGN KEY (urlid) REFERENCES urls (urlid)
) ENGINE=INNODB;
+3  A: 

How about working from this random solution:

SELECT TOP 1 * FROM urls
WHERE (SELECT COUNT(*) FROM urlinfo WHERE urlid = urls.urlid) = 0
ORDER BY NEWID()
LorenVS
-1 SQL does not have a == operator; MySQL does not have a NEWID() function; and TOP 1 won't work in MySQL either :)
Andomar
My bad on the == operator, original question never explicitly mentioned mysql, I may have missed the references to InnoDB on my first look at the question
LorenVS
+1  A: 

You need to first do a left outer join to get the set of records in 'urls' that are not in 'urlinfo', then pick a random record from that set.

SELECT * FROM urls
LEFT OUTER JOIN urlinfo
ON urls.urlid = urlinfo.urlid
WHERE urlinfo.urlid IS null

Now pick a random row from this set. You can do something like:

SELECT newUrls.url
FROM (
      SELECT urls.urlid, urls.url FROM urls
      LEFT OUTER JOIN urlinfo
      ON urls.urlid = urlinfo.urlid
      WHERE urlinfo.urlid IS NULL
     ) AS newUrls
-- the outer query only sees the derived table, so filter on newUrls.urlid
WHERE newUrls.urlid >= RAND() * (SELECT MAX(urlid) FROM urls) LIMIT 1

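However, this only gives an even spread if the urlids in urlinfo are roughly randomly distributed across the range of possible values. A long run of already-processed ids makes the first unprocessed url after that run much more likely to be picked.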
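
A minimal sketch of how the anti-join could be combined with the single random pick from the question's own query, assuming MySQL. The aliases u, ui and r and the column name pick are made up for illustration. The intent is that the random id is computed once in a derived table and then compared against every candidate row, with ORDER BY making the query return the first unprocessed url at or after that id. It can return no row when the pick lands above the last unprocessed urlid, in which case the query would have to be run again, and it keeps the same bias toward urls that follow a gap.

```sql
-- Sketch only: aliases u, ui, r and the column "pick" are illustrative.
SELECT u.url
FROM urls AS u
LEFT OUTER JOIN urlinfo AS ui
  ON ui.urlid = u.urlid
CROSS JOIN (
    -- one random id between 1 and MAX(urlid)
    SELECT CEIL(RAND() * (SELECT MAX(urlid) FROM urls)) AS pick
) AS r
WHERE ui.urlid IS NULL      -- keep only urls with no urlinfo row
  AND u.urlid >= r.pick     -- candidates at or after the random pick
ORDER BY u.urlid            -- take the closest one
LIMIT 1;
```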

David
+1 Your subquery needs an alias, and `*` gives a duplicate column name error. But otherwise nice answer :)
Andomar
Thanks. Edited to fix both (I hope - I don't have a test mysql db handy at the moment)
David
Get VMWare :) MySQL says `ERROR 1146 (42S02) at line 20: Table 'newUrls' doesn't exist`, and `select urls.url` should be `newUrls.url` :)
Andomar
A: 

You could use WHERE NOT EXISTS to exclude rows that are in the other table. For a random row, one option is an ORDER BY RAND() with a LIMIT 1:

SELECT url
FROM urls
WHERE NOT EXISTS (
    SELECT *
    FROM urlinfo ui
    WHERE ui.urlid = urls.urlid
)
ORDER BY RAND()
LIMIT 1
Andomar
"order by rand() limit 1" is the best way of picking a random row if you have a small table (<100 rows) or you don't care about performance. If your table is big, that query could be generating 100,000 random numbers and scanning them all for the lowest. That's expensive.
David
Well, on my machine, generating 100,000 random numbers and picking the lowest is faster than running `SELECT MAX(urlid) FROM newUrls`.
Andomar
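
Which approach is faster likely depends on table size and on how many urls are still unprocessed, so it is worth measuring on real data. A minimal way to start, assuming MySQL, is to prefix each candidate query with EXPLAIN and compare the plans. For ORDER BY RAND() the Extra column typically reports a temporary table and a filesort, which is the sorting cost discussed in the comments above.

```sql
-- Sketch only: shows the plan for the ORDER BY RAND() answer;
-- repeat with the other candidate queries and compare.
EXPLAIN
SELECT url
FROM urls
WHERE NOT EXISTS (
    SELECT *
    FROM urlinfo ui
    WHERE ui.urlid = urls.urlid
)
ORDER BY RAND()
LIMIT 1;
```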