I have a DataTable populated with 250,000 records across 5 columns, and I am iterating over it at least 500,000 times.

The difference in performance compared with a table of 1,000 records is massive, and I can understand and appreciate why - however, is there a way to improve the performance?

+1  A: 

I will assume that you have a good reason to be iterating over the list 500,000 times.

Depending on the work you are doing in each iteration, you might be able to benefit from some parallelization of the workload. Take a look at the TPL; you can use Parallel.ForEach to break the work into tasks that can run concurrently. This way you can take advantage of more powerful hardware with more CPU cores.
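
For example, here is a minimal sketch of that approach, assuming the per-row work is independent and does not mutate shared state; ProcessRow is a placeholder for whatever each iteration actually does:

    using System.Data;
    using System.Threading.Tasks;

    class ParallelScan
    {
        // Placeholder for the real work performed on each row.
        static void ProcessRow(DataRow row)
        {
            var value = row[0]; // e.g. read a column; the real logic goes here
        }

        static void ProcessAll(DataTable table)
        {
            // Materialize the rows once so the parallel loop partitions a plain array.
            DataRow[] rows = table.Select();

            // Each row is handed to a thread-pool worker; the work must be
            // independent, or it must synchronize access to any shared state.
            Parallel.ForEach(rows, ProcessRow);
        }
    }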

Of course, if you can do more of the work in fewer iterations you might also gain some performance; however, without actually knowing what you are doing, the only advice that can be offered is high-level ideas without any basis in the actual problem domain.

Chris Taylor
+1  A: 

Another solution would be to turn this into a list of objects; most likely, just by having the data in this different structure, you would be able to iterate over it much faster.
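
A sketch of that conversion, assuming purely for illustration that the five columns are an int followed by four strings; the Record class and the column positions are invented stand-ins for the real schema:

    using System.Collections.Generic;
    using System.Data;
    using System.Linq;

    // Hypothetical POCO mirroring the five columns; field names and types are
    // invented for illustration.
    class Record
    {
        public int Id;
        public string A, B, C, D;
    }

    static class DataTableConversion
    {
        public static List<Record> ToRecords(DataTable table)
        {
            // One pass over the DataTable up front; afterwards all iteration
            // happens over plain objects with no DataRow indexing overhead.
            return table.AsEnumerable()
                        .Select(r => new Record
                        {
                            Id = r.Field<int>(0),
                            A  = r.Field<string>(1),
                            B  = r.Field<string>(2),
                            C  = r.Field<string>(3),
                            D  = r.Field<string>(4)
                        })
                        .ToList();
        }
    }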

If you are not writing to the data on each iteration, you would definitely benefit from multi-threading (parallelization).

BlackTigerX
+1  A: 

Why do you need all of that data in memory at once? Why in the world are you iterating over it so many times? You need to rethink what you're doing, and find a way to do it, in the database, with set-oriented logic instead of iteration.
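
As an illustration of the set-oriented idea, the snippet below pushes the work into a single SQL statement instead of a C# loop; the table, columns, and connection string are all invented for the example:

    using System.Data.SqlClient;

    class SetBasedExample
    {
        static void Main()
        {
            // Hypothetical: rather than pulling 250,000 rows into a DataTable and
            // iterating in C#, let the database apply the logic to the whole set.
            using (var connection = new SqlConnection(
                "Data Source=.;Initial Catalog=MyDb;Integrated Security=True"))
            using (var command = new SqlCommand(
                "UPDATE Orders SET Total = Quantity * UnitPrice", connection))
            {
                connection.Open();
                command.ExecuteNonQuery();
            }
        }
    }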

John Saunders
A: 

I would agree that you should have a VERY good reason to be processing 250k rows 500k times on the code side. Post some pseudocode and a basic idea of what you're trying to accomplish.

I will assume for now that you really have to go over 250k records 500k times. Maybe it's for a fractal series. If you do two relatively simple things to your algorithm, I think you'll improve performance considerably.

  1. Read each DataRow in the DataTable out into a POCO object you create, and make a List of these. DataRows and DataTables are EXTREMELY expensive to work with, because they're designed to handle ANY row or table, and so they have a lot of overhead that you don't need if you know the data structure. The one-time cost to pull them out, and then put them back in when you're done, will be paid back 499,999 times over.

  2. Parallelize the process. Look for ways to split each iteration among 2 to 5 workers, roughly one more than you have cores in your CPU. You won't divide the time by quite that much, but you'll see significant improvement. DON'T give each step of the iteration its own worker; you'll clog the CPU with the overhead of managing them all. (A sketch combining both steps follows after this list.)
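
A rough sketch of both steps together, assuming the Row class below is an invented stand-in for the real five-column schema and that the per-row work is independent:

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Threading.Tasks;

    // Hypothetical POCO for the five columns; names and types are invented.
    class Row
    {
        public int Id;
        public double Value;
        public string A, B, C;
    }

    class BatchProcessor
    {
        public static void Run(DataTable table)
        {
            // Step 1: pay the DataRow indexing cost exactly once.
            var rows = new List<Row>(table.Rows.Count);
            foreach (DataRow r in table.Rows)
            {
                rows.Add(new Row
                {
                    Id    = r.Field<int>(0),
                    Value = r.Field<double>(1),
                    A     = r.Field<string>(2),
                    B     = r.Field<string>(3),
                    C     = r.Field<string>(4)
                });
            }

            // Step 2: cap the parallelism at roughly one worker more than the core count.
            var options = new ParallelOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount + 1
            };

            Parallel.ForEach(rows, options, row =>
            {
                Process(row); // placeholder for the real per-row work
            });
        }

        static void Process(Row row)
        {
            // the real algorithm goes here
        }
    }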

KeithS