views:

562

answers:

3

Problem: a table of coordinate lat/lngs. Two rows can potentially have the same coordinate. We want a query that returns a set of rows with unique coordinates (within the returned set). Note that distinct is not usable because I need to return the id column which is, by definition, distinct. This sort of works (@maxcount is the number of rows we need, intid is a unique int id column):

select top (@maxcount) max(intid)
from Documents d
group by d.geoLng, d.geoLat

It will always return the same row for a given coordinate unfortunately, which is bit of a shame for my use. If only we had a rand() aggregate we could use instead of max()... Note that you can't use max() with guids created by newid().

Any ideas? (there's some more background here, if you're interested: http://www.itu.dk/~friism/blog/?p=121)

UPDATE: Full solution here

A: 

Where did you get the idea that DISTINCT only works on one column? Anyway, you could also use a GROUP BY clause.

Joel Coehoorn
I second this, distinct works on ALL columns! ALl columnst must be unique.
Mitchel Sellers
Your absolutely right, I've refrased the question. The problem remains however.
friism
+2  A: 

You might be able to use a CTE for this with the ROW_NUMBER function across lat and long and then use rand() against that. Something like:

WITH cte AS
(
    SELECT
     intID,
     ROW_NUMBER() OVER
      (
       PARTITION BY geoLat, geoLng
       ORDER BY NEWID()
      ) AS row_num,
     COUNT(intID) OVER (PARTITION BY geoLat, geoLng) AS TotalCount
    FROM
     dbo.Documents
)
SELECT TOP (@maxcount)
    intID, RAND(intID)
FROM
    cte
WHERE
    row_num = 1 + FLOOR(RAND() * TotalCount)

This will always return the first sets of lat and lngs and I haven't been able to make the order random. Maybe someone can continue on with this approach. It will give you a random row within the matching lat and lng combinations though.

If I have more time later I'll try to get around that last obstacle.

Tom H.
This actually works if one replaces the `order by intid` with `newid()`. `partition` seems to impose some of order (which makes good sense) so one has to reshuffle the results in the outer query. Without that reshuffle, all the rows chosen where to the extreme south :-).
friism
I made the change as you suggested. I have to admit, I'm still learning with CTEs, so this was a good excuse to play around with them. :) I would have thought that ordering by NEWID() there would only change the ordering within a lat/lng pair.
Tom H.
+1  A: 

this doesn't work for you?

select top (@maxcount) *
from 
(
    select max(intid) as id from Documents d group by d.geoLng, d.geoLat
) t 
order by newid()
Mladen Prajdic
The problem with that approach is that it will always return the same document for a given coordinate, namely the one with the largest id. I'd like that to be random too.
friism
so just do select top 1 * from ... order by newid()
Mladen Prajdic
I can't `select *` when doing `group by`, can you elaborate?
friism
i don't see how this solution doesn't work. It's one i've used numerous times.
rip747