I inherited the app. What it does is fetch data from 4 views (each record holding an XML document) in chunks of 1,000 records and write each chunk to an XML file, all of it split up by a type parameter that has 9 possible values. That means in the worst case there are 36 view/type combinations, each hitting the database with its own connection for every chunk of 1,000 records.

The real data will consist of 90,000 lines, and in that case the app will fetch up to 1,000 lines from the database somewhere between 900 and 936 times.

Now I am wondering what advantage it would give to read all the data into the app in one go and have the app work from that to write the 900+ files.

1,000 lines is about 800 MB; 90,000 lines is approximately 81 GB of data being transferred.

The code would have to be rewritten to read it all at once, and although that might make more sense, this is a one-time job: after the 90,000 lines, we will never use this code again. Is it worth spending 2 or 3 hours rewriting code that works just to reduce the number of connections this way?
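To illustrate, here is a rough sketch of the kind of loop I mean. The view names, columns, connection string, and the OFFSET/FETCH paging are just placeholders to show the shape of the current approach, not the actual code:

    // Illustrative sketch only: placeholder view/column names, connection string,
    // and OFFSET/FETCH paging (SQL Server 2012+). One connection per chunk,
    // mirroring the current behaviour described above.
    using System.Data.SqlClient;
    using System.Xml.Linq;

    class ChunkedExport
    {
        static readonly string[] Views = { "ViewA", "ViewB", "ViewC", "ViewD" };      // 4 views (placeholder names)
        static readonly string[] Types = { "T1", "T2", "T3", "T4", "T5",
                                           "T6", "T7", "T8", "T9" };                  // 9 type values (placeholders)
        const int ChunkSize = 1000;

        static void Main()
        {
            const string connStr = "Server=.;Database=MyDb;Integrated Security=true"; // placeholder

            foreach (var view in Views)
            foreach (var type in Types)
            {
                int offset = 0, fileNo = 0;
                while (true)
                {
                    // A fresh connection is opened for every chunk of 1,000 rows.
                    using (var conn = new SqlConnection(connStr))
                    using (var cmd = new SqlCommand(
                        "SELECT Id, Payload FROM " + view + " WHERE TypeCode = @type " +
                        "ORDER BY Id OFFSET @offset ROWS FETCH NEXT @chunk ROWS ONLY", conn))
                    {
                        cmd.Parameters.AddWithValue("@type", type);
                        cmd.Parameters.AddWithValue("@offset", offset);
                        cmd.Parameters.AddWithValue("@chunk", ChunkSize);
                        conn.Open();

                        var rows = new XElement("rows");
                        using (var reader = cmd.ExecuteReader())
                            while (reader.Read())
                                rows.Add(new XElement("row",
                                    new XAttribute("id", reader.GetInt32(0)),
                                    XElement.Parse(reader.GetString(1))));   // Payload holds the row's XML document

                        if (!rows.HasElements) break;                        // no more data for this view/type
                        rows.Save(view + "_" + type + "_" + (fileNo++) + ".xml");
                        offset += ChunkSize;
                    }
                }
            }
        }
    }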

+4  A: 

Does the code already work? If it does, then I wouldn't spend time rewriting it; you run the risk of introducing bugs. Since you will run this once and never use it again, it doesn't seem worth the effort.

Dismissile
+5  A: 

If it's a one-time thing, why spend any effort at all optimizing it? Answer: no, it isn't worth it.

Let me add, though, in answer to your general question of what advantage a big query has over lots of small ones: probably none. If you run a huge query, you are leaving a lot of the magic up to the middleware, which may or may not handle it well.

While having 36 simultaneous connections isn't optimal either, it's probably better than running a query that could return 80 gigabytes of data. The ideal solution (if you had to use this code more than once) would be to rewrite it to get the data in chunks without leaving lots of connections open simultaneously.
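A minimal sketch of that idea, again with made-up view and column names and assuming SQL Server-style OFFSET/FETCH paging: one connection is opened per view/type combination and reused for every chunk, instead of reconnecting each time.

    using System.Data.SqlClient;
    using System.Xml.Linq;

    static class Exporter
    {
        // Sketch: page through one view/type combination over a single open
        // connection. View/column names and the query are placeholders.
        public static void ExportCombination(string connStr, string view, string type)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();                                   // opened once, reused for every chunk
                int offset = 0, fileNo = 0;
                while (true)
                {
                    using (var cmd = new SqlCommand(
                        "SELECT Id, Payload FROM " + view + " WHERE TypeCode = @type " +
                        "ORDER BY Id OFFSET @offset ROWS FETCH NEXT 1000 ROWS ONLY", conn))
                    {
                        cmd.Parameters.AddWithValue("@type", type);
                        cmd.Parameters.AddWithValue("@offset", offset);

                        var chunk = new XElement("rows");
                        using (var reader = cmd.ExecuteReader())
                            while (reader.Read())
                                chunk.Add(XElement.Parse(reader.GetString(1)));   // each row carries an XML document

                        if (!chunk.HasElements) break;                            // this combination is exhausted
                        chunk.Save(view + "_" + type + "_" + (fileNo++) + ".xml");
                        offset += 1000;
                    }
                }
            }
        }
    }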

jamietre
It would actually be only 1 connection at a time, but in the live case, repeated 936 times in a loop. If I understand your reply correctly, the ideal solution is the one I have now.
Andy
Yes, I believe so! Getting data in chunks is the best practice for handling large blocks of data. Otherwise you are depending on any number of intermediate systems (the SQL server, .NET, LINQ, a data provider, and so on) to manage the flow of what's being sent back and forth between you and the server. In the worst case, the application might try to load the entire result synchronously. ASP.NET would probably manage this somehow, and I'm not exactly sure what the default handling would be, but it's much better to have your application decide how much data is reasonable to request at a time.
jamietre
One more observation: 800 megabytes is still a lot of data to load into memory at once. Memory allocation/deallocation could be slow, so it would probably be better to reduce the chunk size. If all the application is doing is reading a row and parsing it into an XML file, and it needs no data other than what's in each row to do its work, then even fetching a single row at a time seems reasonable given the large amount of data in each row.
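A rough sketch of that single-row variant, streaming each row straight to its own file so at most one row's payload is ever held in memory (names are placeholders again):

    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    static class RowStreamer
    {
        // Sketch: stream rows one at a time and write each payload straight to disk.
        // View/column names and the file naming scheme are placeholders.
        public static void ExportRowByRow(string connStr, string view, string type)
        {
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "SELECT Id, Payload FROM " + view + " WHERE TypeCode = @type ORDER BY Id", conn))
            {
                cmd.Parameters.AddWithValue("@type", type);
                conn.Open();

                // SequentialAccess streams large columns instead of buffering whole rows.
                using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
                {
                    while (reader.Read())
                    {
                        int id = reader.GetInt32(0);            // read columns in ordinal order
                        string payload = reader.GetString(1);   // the XML document stored in the row
                        File.WriteAllText(view + "_" + type + "_" + id + ".xml", payload);
                    }
                }
            }
        }
    }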
jamietre
+3  A: 

If we are talking about SQL Server, the biggest disadvantage of one large query (a single batch) over many small ones (note that this is the opposite of the question you are asking) is that there can only be one query plan per batch.

Mitch Wheat
+1  A: 

If it's a one-time job, I'd say no. Many times I have done things that I normally wouldn't (cursors), but ONLY because it was a one-time job.

Ask yourself if it makes sense to spend 2 to 3 hours on something that already works and that you will never use again. There are obviously other factors to take into account, though, such as whether this will lock up your production database for 2-3 hours.

If there are no disastrous side effects, I'd say use what you have.

Abe Miessler