views:

116

answers:

3

How can I optimize the following SQL so that it runs faster and puts less load on my server?

I need to do a radius search around a US ZIP Code, e.g. find everything within 50 miles of a particular ZIP Code (using latitude and longitude to calculate the distance), and then count how many other rows (e.g. other ZIP Codes) from my database fall inside that radius.

Once I have the result (for example, 350 rows of different ZIP Codes within 50 miles of the particular ZIP Code), I need to pass them into another query that counts the total rows and displays it as a single, easy-to-read result. Here is an example of my query:

SELECT COUNT(*)
FROM ( SELECT b.ID,
              ROUND((ACOS(SIN(3.142/180*32.91336) * SIN(3.142/180*z.latitude)
                   + COS(3.142/180*32.91336) * COS(3.142/180*z.latitude)
                   * COS((3.142/180*z.longitude) - (3.142/180*-85.93836))) * 3959), 2) AS distance
       FROM zipcode2business.accountants b
       LEFT JOIN zipcodeworld.storelocator_us z ON b.ZIPCODE = z.ZIP_CODE
       WHERE z.latitude != 32.91336
         AND z.longitude != -85.93836
         AND b.STATE = 'AL'
       HAVING distance BETWEEN 0 AND 50
     ) AS total;

Hopefully I haven't done anything wrong; it displays the correct result (350 rows), but I need an optimized way to run it, because this SQL causes high CPU usage. When I run EXPLAIN on this query, it displays the following:

+----+-------------+-------+--------+------------------+---------+---------+----------------------------+------+------------------------------+
| id | select_type | table | type   | possible_keys    | key     | key_len | ref                        | rows | Extra                        |
+----+-------------+-------+--------+------------------+---------+---------+----------------------------+------+------------------------------+
|  1 | PRIMARY     | NULL  | NULL   | NULL             | NULL    | NULL    | NULL                       | NULL | Select tables optimized away |
|  2 | DERIVED     | b     | ref    | ZIPCODE,STATE    | STATE   | 4       |                            | 3900 | Using where                  |
|  2 | DERIVED     | z     | eq_ref | PRIMARY,LAT_LONG | PRIMARY | 9       | zipcode2business.b.ZIPCODE |    1 | Using where                  |
+----+-------------+-------+--------+------------------+---------+---------+----------------------------+------+------------------------------+
3 rows in set (0.20 sec)

Now, from the output above, is "Select tables optimized away" in the Extra column a good thing? Please show me the best way to optimize this query.

A: 

Do you need to do all those calculations on the SQL server? I generally try to use SQL only for basic CRUD on the data, and do all other computations outside of SQL. You may want to try retrieving just the data your calculation is based on, and then doing the actual calculation in whatever application is retrieving the data.

Corey Sunwold
+1  A: 

The SQL itself seems fine; the bulk of the CPU time must be spent doing math. There are two avenues for optimization:

  • simplify the formula
  • filter out rows early ("prune") on the basis of an even simpler calculation

I don't have time at the moment for full details, but here is the general idea: approximate the distance between the reference ZIP Code location and the other locations with a cheap (CPU-wise) calculation, and only do the full math (with a better formula than the one in the original query) for the locations that come out below 50 miles (plus a small margin, to account for possible underestimation).

Estimating the distance and pruning
We calculate, once, the distance expressed in miles that corresponds to one degree of latitude and to one degree of longitude at the reference ZIP Code location; call these MpDLat and MpDLong. Alternatively, we calculate the fractional number of degrees that corresponds to our target radius from the reference location; call these Dp50Lat and Dp50Long. We then work with the absolute value of the difference between the latitudes, and between the longitudes, relative to the reference location, and filter out the locations for which this distance in either direction (lat or long) exceeds our limit, i.e. something like the following:

WHERE .... (some other conditions....)
   AND (abs(z.latitude - 32.91336) * MpDLat) < 50
   AND (abs(z.longitude + 85.93836) * MpDLong) < 50
-- or, if we go by the Dp50 values
WHERE .... (some other conditions....)
   AND abs(z.latitude - 32.91336)  < Dp50Lat
   AND abs(z.longitude + 85.93836) < Dp50Long
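For the reference latitude used in the question (32.91336), these constants can be precomputed once. A sketch (the 69.0 and 69.17 miles-per-degree figures are common approximations, not values from the original post):

```sql
-- One-off computation of the pruning constants.
-- Miles per degree of latitude is roughly constant; miles per degree of
-- longitude shrinks with the cosine of the latitude.
SELECT 69.0                                   AS MpDLat,   -- ~69.0 mi/degree
       69.17 * COS(RADIANS(32.91336))         AS MpDLong,  -- ~58.1 mi/degree
       50 / 69.0                              AS Dp50Lat,  -- ~0.725 degrees
       50 / (69.17 * COS(RADIANS(32.91336)))  AS Dp50Long; -- ~0.861 degrees
```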

Calculating the distance (for those locations not readily filtered out)
Depending on the level of precision required, it may be acceptable to stick with the MpD factors (I'm guessing errors of less than a mile or so, for distances on the order of 50 miles, within the continental USA). The distance would then be calculated as Sqrt(((z.latitude - 32.91336) * MpDLat)^2 + ((z.longitude + 85.93836) * MpDLong)^2) or, if we are only interested in filtering rows out without needing the distance per se, we can work with the squares, i.e. ... WHERE ((z.latitude - 32.91336) * MpDLat)^2 + ((z.longitude + 85.93836) * MpDLong)^2 < 2500 -- 2500 is 50^2
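Putting the pruning and the squared-distance filter together, the inner query might look something like the following sketch (table and column names are from the question; the 0.73/0.87-degree bounds and the 69.0/58.1 mile-per-degree factors are the precomputed values for this particular reference point, padded slightly to allow for underestimation):

```sql
SELECT b.ID
FROM zipcode2business.accountants b
JOIN zipcodeworld.storelocator_us z ON b.ZIPCODE = z.ZIP_CODE
WHERE b.STATE = 'AL'
  -- cheap bounding-box prune; BETWEEN on the raw columns may also let
  -- MySQL use the LAT_LONG index seen in the EXPLAIN output
  AND z.latitude  BETWEEN 32.91336 - 0.73 AND 32.91336 + 0.73
  AND z.longitude BETWEEN -85.93836 - 0.87 AND -85.93836 + 0.87
  -- planar approximation on the survivors, compared against 50^2 = 2500
  AND POW((z.latitude  - 32.91336) * 69.0, 2)
    + POW((z.longitude + 85.93836) * 58.1, 2) < 2500;
```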

I'm guessing this type of approximation is acceptable, since much bigger errors come from the fact that the distance by road (which is likely the one eventually desired) rarely matches the 'as-the-crow-flies' distance ;-) I could calculate the precise worst-case loss of precision (but again, no time for that now...)

If the exact distance is needed, we should use a slightly better formula than the original one, which seems directly derived from the spherical law of cosines. We can probably do better.
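The haversine formula is one such better-behaved candidate, since it avoids the loss of precision that ACOS suffers for very small angles. A sketch in MySQL syntax (table and column names are taken from the question; 3959 is the Earth's radius in miles, as in the original query):

```sql
SELECT z.ZIP_CODE,
       2 * 3959 * ASIN(SQRT(
           POW(SIN(RADIANS(z.latitude - 32.91336) / 2), 2)
         + COS(RADIANS(32.91336)) * COS(RADIANS(z.latitude))
           * POW(SIN(RADIANS(z.longitude - (-85.93836)) / 2), 2)
       )) AS distance
FROM zipcodeworld.storelocator_us z;
```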

Variations on the above
The ideas discussed above can be implemented in various fashions, for example with the use of temporary SQL tables, with different constructs for the query (or queries), etc.

mjv
A: 

You could select the distance calculation into a temp table and remove the HAVING from your SQL, then do a second SELECT with WHERE dist <= 50.

This helps save memory and avoids possible swapping out to temporary disk segments when there are large numbers of records in your base table.
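A sketch of that two-step approach (the temp table name is made up; RADIANS() is used in place of the 3.142/180 approximation from the question):

```sql
-- Step 1: materialize the distances once, without any HAVING filter.
CREATE TEMPORARY TABLE tmp_distances AS
SELECT b.ID,
       ROUND(ACOS(SIN(RADIANS(32.91336)) * SIN(RADIANS(z.latitude))
             + COS(RADIANS(32.91336)) * COS(RADIANS(z.latitude))
             * COS(RADIANS(z.longitude) - RADIANS(-85.93836))) * 3959, 2) AS dist
FROM zipcode2business.accountants b
JOIN zipcodeworld.storelocator_us z ON b.ZIPCODE = z.ZIP_CODE
WHERE b.STATE = 'AL';

-- Step 2: count the rows within the radius.
SELECT COUNT(*) FROM tmp_distances WHERE dist <= 50;
```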

MikeD