I have a site I'm building; it's an application that creates mail merges (more or less...) based on a couple of user preferences. It can generate Cartesian joins' worth of data without a problem, but in come the needs of enterprise to make life a bit more difficult...
I have to build the application so that, after verifying the zip codes of remote employees, it creates emails to media targets based on how far each media target is from that employee. Say, for instance, employees are well-known volunteers where they work. The enterprise wants to email media within a 5-mile radius of these employees a message about the work the employee is doing. This is where things get messy... I have several choices here, and I will outline the attempts and the failures:
Attempt 1: the largest radius is 20 miles, so I create a database table that holds a record for every zip code in the US joined to every zip code within 20 miles of it. The dataset looks something like this (the names are different; this is for the sake of argument):
[SourceZip] | [City] | [State] | [CloseZip] | [City] | [State] | [Distance]
Fails: as an example, NY alone has 350k records in the above dataset (and other states are worse!). Average load time on that page? 6 minutes... Not happening. I verified this by setting breakpoints; it is during the DataAdapter.Fill() stage that the slowdown occurs.

Attempt 2 (this one was never implemented due to a logistics problem): I make a database connection for each employee zip, pulling the media-target zips within a distance of x or less. Except that the source files and the media targets combined can reach upwards of 34k individualized emails. 34k DB connections? Even if I could devise a way to reuse zip code searches, I did some test checks in the DB and found 500 distinct zip codes in NY where employees worked. 500 DB connections? I doubt that would work, but I could be surprised.
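(For what it's worth, if attempt 2 ever gets revisited: it shouldn't need one physical connection per zip. A single open connection can be reused with a parameterized command, and filtering on the server means Fill() only ever pulls the rows inside the radius. A minimal sketch against a hypothetical ZipPairs table shaped like the one above:

```csharp
using System.Data;
using System.Data.SqlClient;

// One connection, reused for every employee zip; only the rows inside
// the radius ever cross the wire, so DataAdapter.Fill() stays small.
public static DataTable GetMediaZips(SqlConnection conn, string sourceZip, int radiusMiles)
{
    const string sql =
        "SELECT CloseZip, City, State, Distance " +
        "FROM ZipPairs " +   // hypothetical table name
        "WHERE SourceZip = @zip AND Distance <= @radius";

    using (SqlCommand cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@zip", sourceZip);
        cmd.Parameters.AddWithValue("@radius", radiusMiles);

        DataTable result = new DataTable();
        using (SqlDataAdapter adapter = new SqlDataAdapter(cmd))
        {
            adapter.Fill(result);
        }
        return result;
    }
}
```

And since ADO.NET pools connections by default, even a naive open/close per zip would mostly be handing back pooled connections rather than building 500 fresh ones.)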
My latest scheme to get around the problem is hoping the web server runs a better game than the .NET DataSet object, by pulling a much smaller dataset that looks like:
[zip] | [longitude] | [latitude]
Then I'd apply a distance formula in code to figure out which zips fall inside the radius. This relies heavily on the web server's processors. Is this a worthwhile gamble, or will I find the same load-time damage on this attempt as well? Is there a better way?
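For reference, the distance math itself is cheap; the usual trick is a bounding-box prefilter (plain comparisons on lat/long) so the trigonometry only runs on nearby candidates. Here's a rough sketch of the haversine great-circle formula; the class and method names are just placeholders:

```csharp
using System;

public static class GeoDistance
{
    const double EarthRadiusMiles = 3958.8;

    // Great-circle distance between two points, in miles (haversine formula).
    public static double Miles(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);

        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
                   Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2)) *
                   Math.Sin(dLon / 2) * Math.Sin(dLon / 2);

        return EarthRadiusMiles * 2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a));
    }

    // Cheap prefilter: one degree of latitude is ~69 miles, so anything
    // outside this square can be skipped without doing the trig above.
    public static bool MaybeInRadius(double lat1, double lon1,
                                     double lat2, double lon2, double radiusMiles)
    {
        double latDelta = radiusMiles / 69.0;
        double lonDelta = radiusMiles / (69.0 * Math.Cos(ToRadians(lat1)));
        return Math.Abs(lat2 - lat1) <= latDelta && Math.Abs(lon2 - lon1) <= lonDelta;
    }

    static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}
```

A linear pass of this arithmetic over even a few hundred thousand rows should be far quicker than dragging 350k joined rows through a fill. And since SQL Server 2000 has the same trig functions (SIN, COS, ATN2, RADIANS), the same formula could instead be pushed into the query itself, so only the matches ever leave the database.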
I appreciate any input, even if it confirms my fears that this project just might not work.
Additional notes: I don't have control of the server, and I'm running SQL Server 2000 :(. I'm programming the site in Visual Studio 2005 against .NET Framework 2.0. Might get upgraded to SQL 2005 and VS 2008 within the next few months, though.