Given a legacy system that is making heavy use of DataSets, with little or no possibility of replacing these with business objects or other, more efficient data structures:

Are there any techniques for reducing the memory footprint of a DataSet?

I am thinking about things like setting the initial capacity (when known), removing constraints, etc., but I have little experience with DataSets and do not know which specific options might be available to me or whether any of them would matter at all.

Update:

I am aware of the long-term refactoring possibilities, but I am looking for quick fixes given a set of DataTable objects stored in a DataSet, i.e. which properties are known to affect memory overhead.

Due to the way data is stored internally, setting the initial capacity could be one such option, as this would prevent the object from allocating an arbitrarily large block of memory when just one more row is added.
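For example, DataTable.MinimumCapacity looks like the relevant property here. A minimal sketch, with illustrative table and column names:

using System.Data;

static DataTable CreateOrdersTable(int expectedRowCount)
{
    DataTable orders = new DataTable("Orders");
    orders.Columns.Add("Id", typeof(int));
    orders.Columns.Add("Town", typeof(string));

    // Reserve space for the expected number of rows up front,
    // instead of letting the table grow as rows are added.
    orders.MinimumCapacity = expectedRowCount;
    return orders;
}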

A: 
  1. If you're using VS2005+, you can instantiate DataTable objects rather than a whole DataSet. In VS2003, an instantiated DataTable came wrapped in a DataSet by default; from 2005 on, you get just the DataTable.

  2. Look at your Data Access layer for filling the DataSets or DataTables. It's most often the case that there is too much data coming through. Make your queries more specific.

  3. Make sure the code you're using doesn't do goofy things like copying DataSets as they're passed around. Use .Select statements or DataViews to filter and sort rather than making copies (see the sketch below).
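A rough sketch of items 1-3 together, assuming SQL Server (the query, connection string, and column names are placeholders):

using System.Data;
using System.Data.SqlClient;

static DataTable LoadActiveCustomers(string connectionString)
{
    // Item 1: fill a standalone DataTable, not a whole DataSet.
    DataTable customers = new DataTable("Customers");

    // Item 2: ask for only the rows and columns you actually need.
    using (SqlDataAdapter adapter = new SqlDataAdapter(
        "SELECT Id, Name, Town FROM Customers WHERE IsActive = 1",
        connectionString))
    {
        adapter.Fill(customers);
    }
    return customers;
}

static DataView FilterByTown(DataTable customers, string town)
{
    // Item 3: a DataView indexes the existing rows; nothing is copied.
    return new DataView(customers, "Town = '" + town + "'",
        "Name", DataViewRowState.CurrentRows);
}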

There aren't a whole lot of quick "optimizations" for DataSets. If you're having trouble with memory, use items 2 and 3. This would be the case regardless of what type of data transport object you'd use.

And get good at DataSets. If you're not familiar with them, you can do silly things, as with anything, and then you'll write articles about how they suck that are really articles about how little you know about them. They're really quite useful and simple to maintain. A couple of tips:

  • Use typed DataSets. They'll save you gobs of coding and they're typed, which helps with simple validation.
  • If you're using typed DSs, make sure you don't modify the generated code file. If you're using VS2005+, you can put any custom business object behavior in the partial class for the DS (not the .designer code file).
  • Use DataView and .Select wherever you find yourself looping through DataRow objects.
  • Look around for a good code generation tool and build a rational data access framework for filling and updating from the DSs. One of the issues is that sometimes, designers tie the design of the DS directly to tables in the db, making the design brittle to data structure changes. If you -must- do that, build or use a code generator to build your data access layer from the db, like CodeSmith. Start by looking at some of the CodeSmith templates for generating stored procs and data access classes.
  • Remember, when talking to someone about "objects" vs. "DataSets", that the object in this case is the DataRow, not the DataSet. And because of the partial classes, you can put behavior on the "object", getting you 95% of the benefits of "objects" for those who love writing code; see the sketch below.
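For example, custom behavior on a typed row might live in the non-designer partial class file like this (NorthwindDataSet, CustomersRow, and the Town column are illustrative names):

// In your own file, alongside the generated .designer file.
public partial class NorthwindDataSet
{
    // The typed row classes are nested inside the DataSet class,
    // so the partial class extending them must be nested too.
    public partial class CustomersRow
    {
        // Business-object behavior on the "object" (the row);
        // regenerating the DataSet won't wipe this out.
        public bool IsLocal(string homeTown)
        {
            return this.Town == homeTown;
        }
    }
}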
Mark A Johnson
A: 

This is unlikely to help you, but it can help greatly in some cases.

If you are storing a lot of strings that are the same in the DataSet, e.g. names of towns, look at using a single string object for each distinct value.

e.g.

using System.Collections.Generic;
using System.Data;

Dictionary<string, string> towns = new Dictionary<string, string>();
foreach (DataRow row in dataTable.Rows)
{
    string town = (string)row["Town"];
    if (towns.ContainsKey(town))
    {
        // Re-point this row at the instance we already have.
        row["Town"] = towns[town];
    }
    else
    {
        // First occurrence: remember this instance.
        towns[town] = town;
    }
}

Then the GC can reclaim most of the duplicate strings; however, this only works if the DataSet lives for a long time.

You may wish to do this in an event handler as rows are populated (e.g. the DataTable.ColumnChanging event), so that the duplicate string objects are never retained in the first place.
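A sketch of that variant, again assuming a string column named "Town" (the name is illustrative); in the ColumnChanging event, the proposed value can be swapped for an instance seen earlier:

Dictionary<string, string> towns = new Dictionary<string, string>();
dataTable.ColumnChanging += delegate(object sender, DataColumnChangeEventArgs e)
{
    string s = e.ProposedValue as string;
    if (s != null && e.Column.ColumnName == "Town")
    {
        string existing;
        if (towns.TryGetValue(s, out existing))
            e.ProposedValue = existing;  // reuse the first instance seen
        else
            towns[s] = s;                // remember this instance
    }
};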

Ian Ringrose
Might be an idea. Have to test this on some real-world data to measure the reduction in memory and the impact on CPU cycles.
Joergen Bech
A: 

You could try making your tables and rows implement interfaces in the code-behind files. Then, over time, change your code to make use of these interfaces rather than the tables/rows directly.

Once most of your code uses just the interfaces, you could use a code generator to create C# classes that implement those interfaces without the overhead of rows/tables.
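A sketch of the idea, with hypothetical names (ICustomer, NorthwindDataSet, CustomersRow); the generated Name property already satisfies the interface, so the partial class needs no extra members:

public interface ICustomer
{
    string Name { get; set; }
}

// Step 1: the typed row picks up the interface via its partial class.
public partial class NorthwindDataSet
{
    public partial class CustomersRow : ICustomer
    {
        // The generated Name property (from the Name column)
        // satisfies ICustomer as-is.
    }
}

// Step 2, later: a plain class can stand in behind the same
// interface, without the row/table overhead.
public class Customer : ICustomer
{
    private string name;
    public string Name
    {
        get { return name; }
        set { name = value; }
    }
}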

However, it may be cheaper just to move to 64-bit and buy more RAM...

Ian Ringrose
Eegads! Baffling!
Mark A Johnson