I have a site I'm building; it's an application that creates mail merges (more or less...) based on a couple of user preferences. It can generate Cartesian joins' worth of data without a problem, but in come the needs of enterprise to make life a bit more difficult...
I have to build the application so that, after verifying the zip codes of remote employees, it creates emails to media targets based on how far each media target is from that employee. Say, for instance, employees are well-known volunteers where they work. The enterprise wants to email media within a 5-mile radius of these employees a message about the work the employee is doing. This is where things get messy... I have several choices here, and I will outline the attempts and the failures:
Attempt 1: the largest radius is 20 miles, so I create a database table that holds a record for every zip code in the US joined to every zip code within 20 miles of it. The dataset looks something like this (the names are different; this is for the sake of argument):
[SourceZip] | [City] | [State] | [CloseZip] | [City] | [State] | [Distance]
Fails: as an example, NY alone has 350k records in the above dataset (and other states are worse!). Average load time on that page? 6 minutes... Not happening. I verified this by setting breakpoints; it is during the DataAdapter.Fill() stage that the slowdown occurs.

Attempt 2 (this one was never implemented due to a logistics problem): I make a database connection for each employee zip, pulling the media-target zips within a distance of x or less. Except that the source files and the media targets combined can reach upwards of 34k individualized emails. 34k DB connections? Even if I could devise a way to reuse zip code searches, I did some test checks in the DB and found 500 distinct zip codes in NY where employees worked. 500 DB connections? I doubt that would work, but I could be surprised.
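(For what it's worth, if attempt 2 ever gets revisited: it shouldn't need one physical connection per zip. A single open connection can be reused with a parameterized command, and filtering on the server means Fill() only ever pulls the rows inside the radius. A minimal sketch against a hypothetical ZipPairs table shaped like the one above:

```csharp
using System.Data;
using System.Data.SqlClient;

// One connection, reused for every employee zip; only the rows inside
// the radius ever cross the wire, so DataAdapter.Fill() stays small.
public static DataTable GetMediaZips(SqlConnection conn, string sourceZip, int radiusMiles)
{
    const string sql =
        "SELECT CloseZip, City, State, Distance " +
        "FROM ZipPairs " +   // hypothetical table name
        "WHERE SourceZip = @zip AND Distance <= @radius";

    using (SqlCommand cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@zip", sourceZip);
        cmd.Parameters.AddWithValue("@radius", radiusMiles);

        DataTable result = new DataTable();
        using (SqlDataAdapter adapter = new SqlDataAdapter(cmd))
        {
            adapter.Fill(result);
        }
        return result;
    }
}
```

And since ADO.NET pools connections by default, even a naive open/close per zip would mostly be handing back pooled connections rather than building 500 fresh ones.)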
My latest scheme to get around the problem is hoping the web server runs a better game than the .NET DataSet object, by pulling a much smaller dataset that looks like:
[zip] | [longitude] | [latitude]
Then I'd apply a distance formula in code to figure out which zips fall inside the radius. This relies heavily on the web server's processors. Is this a worthwhile gamble, or will I find the same load-time damage on this attempt as well? Is there a better way?
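For reference, the distance math itself is cheap; the usual trick is a bounding-box prefilter (plain comparisons on lat/long) so the trigonometry only runs on nearby candidates. Here's a rough sketch of the haversine great-circle formula; the class and method names are just placeholders:

```csharp
using System;

public static class GeoDistance
{
    const double EarthRadiusMiles = 3958.8;

    // Great-circle distance between two points, in miles (haversine formula).
    public static double Miles(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);

        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
                   Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2)) *
                   Math.Sin(dLon / 2) * Math.Sin(dLon / 2);

        return EarthRadiusMiles * 2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a));
    }

    // Cheap prefilter: one degree of latitude is ~69 miles, so anything
    // outside this square can be skipped without doing the trig above.
    public static bool MaybeInRadius(double lat1, double lon1,
                                     double lat2, double lon2, double radiusMiles)
    {
        double latDelta = radiusMiles / 69.0;
        double lonDelta = radiusMiles / (69.0 * Math.Cos(ToRadians(lat1)));
        return Math.Abs(lat2 - lat1) <= latDelta && Math.Abs(lon2 - lon1) <= lonDelta;
    }

    static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}
```

A linear pass of this arithmetic over even a few hundred thousand rows should be far quicker than dragging 350k joined rows through a fill. And since SQL Server 2000 has the same trig functions (SIN, COS, ATN2, RADIANS), the same formula could instead be pushed into the query itself, so only the matches ever leave the database.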
I appreciate any input, even if it confirms my fears that this project just might not work.
Additional notes: I don't have control of the server, and I'm running SQL Server 2000 :(. I'm programming the site in Visual Studio 2005 against .NET Framework 2.0. Might get upgraded to SQL 2005 and VS 2008 within the next few months, though.