views:

62

answers:

4

Hi folks,

I wish to do the following pseduo code in Sql without the use of a CURSOR, if possible.

for each zipcode
{
    -- What city is this zipcode in? A zipcode can be in multiple cities, 
    -- but one city area is always way greater that the others.
    -- Eg. 90210 is 96% in the city of Beverly Hills. 
    -- The remaining 4% is that it s in some other fringe cties .. 
    --but only just (some mismatch mapping errors).

    ** grab the city largest shape area which this zip is a part of **
}

Now, I have some SQL to help us out.

-- All zipcodes and each boundary/shapefile.
SELECT ZipCodeId, Boundary FROM ZipCodes -- Boundary is a GEOGRAPHY field type.

To determine if a zipcode boundary is in a city....
SELECT CityId, CityName, 
    City.Boundary.Intersection(@someZipCodeBoundary).STArea() AS Area
FROM Cities
WHERE City.Boundary.Intersects(@someZipCodeBoundary) = 1

and to get the area of intersection (because we want the highest area of intersection... ie. TOP(1) ORDER BY Area DESC or a DISTINCT with an ORDER BY .. sorta thing... .. we use the Intersection sql method.

NOTE: Intersects and Intersection are two different Sql methods.

So.. can anyone help me out? I suck at all this SET stuff :(

A: 

Does Max work for that?

drachenstern
Can you please elaborate? `MAX`?
Pure.Krome
A: 

I must say that i have no idea about what Intersection / Intersects are or what they do and i must clarify that i dont understand this syntax below Boundary.Intersection(@someZipCodeBoundary). either

Assuming that this query of yours returns all cities which intersect with the zipcodeBoundary specified and Area is their area of intersection,

SELECT 
    CityId, 
    CityName, 
    City.Boundary.Intersection(@someZipCodeBoundary).STArea() AS Area 
FROM 
    Cities 
WHERE 
    City.Boundary.Intersects(@someZipCodeBoundary) = 1 

Based on my understanding, what drachenstern is suggesting above is about using MAX is

SELECT 
    CityId, 
    MAX(City.Boundary.Intersection(@someZipCodeBoundary).STArea()) AS Area
FROM 
    Cities 
WHERE 
    City.Boundary.Intersects(@someZipCodeBoundary) = 1 
GROUP BY CityId

This should give you the CITYID which has the MAXIMUM intersection area.

InSane
Kewl ... but notice how in your sql example, i'm passing in a _SINGLE_ `@someZipCodeBoundary` ? This is what i'm trying to avoid :( With the above code ... i would have that in a cursor, right? I'm trying to avoid that.
Pure.Krome
@Pure.Krome - Yes, you are right. To that end, will the list of all zipcodes come from ZipCodes table? Also, does Cities table have Boundary field in it?
InSane
A: 

Try another approach. Get somewhere a census statistics and filter out cities with maximum population grouped by the same zip code.

Like this Gazetteer thing, for example.

Denis Valeev
A: 

Ok - got it ;) The trick was to use a PARTITION BY. @In Sane gave me the idea when I realised I've done something similar, before :)

So .. here we go..

SELECT 
    a.CityId, a.Name, b.ZipCodeId, b.Code, a.Boundary.STIntersection(b.Boundary).STArea() AS Area, 
    RANK() OVER (PARTITION BY b.ZipCodeId ORDER BY a.Boundary.STIntersection(b.Boundary).STArea() DESC) AS Area_Rank
FROM 
    Cities a
    INNER JOIN ZipCodes b on a.Boundary.STIntersects(b.Boundary) = 1
    where b.Code = 12010 OR b.Code = 90210


CityId      Name                                                                                                 ZipCodeId   Code                 Area                   Area_Rank
----------- ---------------------------------------------------------------------------------------------------- ----------- -------------------- ---------------------- --------------------
2166        Los Angeles                                                                                          9331        90210                13235413.8430175       1
1686        Beverly Hills                                                                                        9331        90210                10413397.1372613       2
2635        West Hollywood                                                                                       9331        90210                0                      3
14570       Amsterdam                                                                                            29779       12010                15369521.9602067       1
14921       Hagaman                                                                                              29779       12010                1394562.70390667       2
14856       Fort Johnson                                                                                         29779       12010                211058.884834718       3
14651       Broadalbin                                                                                           29779       12010                0                      4

(7 row(s) affected)

So in this filtered example (Filtered by ZipCode 12010 or 90210), we can see that this zipcode exists in 4 different cities/towns. Each zipcode can have 1 to many results, which are then ordered by the Area value .. but the key here is the PARTITION keyword .. which does this ordering by ZipCode groups or partitions. Very funky :) Notice how the zipcode 90210 has it's own rank results? same with 12010 ?

Next, we make that a subquery, and just grab all the Rank == 1 :)

SELECT CityId, Name, ZipCodeId, Code, Area_Rank
FROM (
SELECT 
    a.CityId, a.Name, b.ZipCodeId, b.Code, a.Boundary.STIntersection(b.Boundary).STArea() AS Area, 
    RANK() OVER (PARTITION BY b.ZipCodeId ORDER BY a.Boundary.STIntersection(b.Boundary).STArea() DESC) AS Area_Rank
FROM 
    Cities a
    INNER JOIN ZipCodes b on a.Boundary.STIntersects(b.Boundary) = 1
    where b.Code = 12010 OR b.Code = 90210
) subQuery
WHERE Area_Rank = 1


CityId      Name                                                                                                 ZipCodeId   Code                 Area_Rank
----------- ---------------------------------------------------------------------------------------------------- ----------- -------------------- --------------------
14570       Amsterdam                                                                                            29779       12010                1
2166        Los Angeles                                                                                          9331        90210                1

(2 row(s) affected)

Sweet as candy :)

Side Note: This also shows me that my Los Angeles city shapefile/boundary is corrupted, because it's intersecting the zipcode 90210 far too much (which I visually confirmed :P)

alt text

Pure.Krome
Don't forget that using a WITH on SQL 2008+ gives you faster results ...
drachenstern
the `WITH` keyword ... er ... that one I don't know :) Time to check this bad boy out ... Cheers for the suggestion!
Pure.Krome

related questions