We have a lot of data layer code that follows this very general pattern:

public DataTable GetSomeData(string filter)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    DataTable result = new DataTable();
    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
        cn.Open();

        result.Load(cmd.ExecuteReader());
    }
    return result;
}

I think we can do a little better. My main complaint is that it forces all the records to be loaded into memory, even for large sets. I'd like to take advantage of a DataReader's ability to keep only one record in RAM at a time, but if I return the DataReader directly, the connection is closed as soon as execution leaves the using block.

How can I improve this to allow returning one row at a time?

+3  A: 

Once again, the act of composing my thoughts for the question reveals the answer. Specifically, the last sentence, where I wrote "one row at a time". I realized I don't really care that it's a DataReader, as long as I can enumerate it. That led me to this:

public IEnumerable<IDataRecord> GetSomeData(string filter)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
        cn.Open();

        using (IDataReader rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
            {
                // Yield the live record; every row shares the same underlying
                // reader, and the connection stays open until the caller
                // finishes enumerating (or disposes the enumerator).
                yield return (IDataRecord)rdr;
            }
        }
    }
}

This will work even better once we move to 3.5 and can start using other LINQ operators on the results, and I like it because it sets us up to start thinking in terms of a "pipeline" between layers for queries that return a lot of results.
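
For instance, once we're on 3.5, a caller could compose the stream with standard LINQ operators. A hypothetical sketch (the column name here is made up):

// Hypothetical 3.5-era caller (requires System.Linq). Rows stream through
// the pipeline one at a time; nothing is materialized until ToList().
List<string> names = GetSomeData("some filter")
    .Select(r => r.GetString(r.GetOrdinal("SomeColumn")))
    .Take(10)
    .ToList();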

The downside is that it will be very awkward for readers holding more than one result set, but that case is exceedingly rare.

Joel Coehoorn
I am using a similar implementation for .NET 3.5. It works fine, but for some reason I feel it's a slight misuse of the iterator pattern, though that's very subjective. It really gets messy when you want to wrap the block in exception handling to keep it from exiting when the cast fails for some reason. ;-)
Theo Lenndorff
@RunningMonkey: since you've run this pattern live, have you seen the (IDataRecord) cast fail often? How worried about that should I be?
Joel Coehoorn
Maybe not much, because my implementation is more similar to http://stackoverflow.com/questions/47521/using-yield-to-iterate-over-a-datareader-might-not-close-the-connection. There I convert to base types with Convert.ChangeType before doing a yield return. That was almost crying out for trouble. :-)
Theo Lenndorff
To be honest: That cast to IDataRecord has much more appeal to me.
Theo Lenndorff
What happens if the calling code is doing something weird with the returned IDataRecord and throws an exception? That iterator would instantly end and exit all using blocks, wouldn't it?
Theo Lenndorff
Exceptions are one thing, but I can also see that the reader loop may now execute more slowly, since it depends on the speed of the enumerating code to move through the reader. Perhaps a bit speculative, but what happens when you pass this enumerator up to the presentation layer, say an aspx page that wants to do some data binding on the collection? Instead of doing a fast scan-and-extract, you're now doing a much slower databind operation.
Peter Lillevold
This solution will hold your connection open until you've iterated the entire resultset, compared to the original code, which fills the DataTable using a firehose cursor and closes the connection as soon as possible. In effect you're trading one resource (memory) for another (database connections); not sure if this is an issue for you or not.
LukeH
A: 

I was never a big fan of having the data layer return a generic data object, since that pretty much defeats the whole point of having the code separated into its own layer (how can you switch out data layers if the interface isn't defined?).

I think your best bet is for all functions like this to return a list of custom objects you create yourself; in your data layer, you run your procedure/query through a DataReader and iterate through it to build the list (sketched below).

This will make things easier to deal with in general (despite the initial time to create the custom classes), makes it easier to handle your connection (since you won't be returning any objects associated with it), and should be quicker. The only downside is that everything will be loaded into memory, like you mentioned, but I wouldn't think that would be a cause for concern (if it were, I'd think the query would need to be adjusted).
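
Concretely, that might look something like this sketch (the Customer class, table, and columns are placeholders, not from the question; it sticks to C# 2.0 syntax):

public List<Customer> GetCustomers(string filter)
{
    string sql = "SELECT Id, Name FROM [Customers] WHERE Name = @Filter";

    List<Customer> results = new List<Customer>();
    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
        cn.Open();

        using (IDataReader rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
            {
                // Map each row to a strongly-typed object; the reader and
                // connection are closed before the list is returned.
                Customer c = new Customer();
                c.Id = rdr.GetInt32(0);
                c.Name = rdr.GetString(1);
                results.Add(c);
            }
        }
    }
    return results;
}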

John
We have a 4-tier architecture. The data layer returns _generic_ data objects (DataTable, DataRow, DataReader) to a translation layer that converts them into strongly-typed business objects. Minimizes the damage.
Joel Coehoorn
I see. I suppose the only downside to that would be doing the looping in both layers to get the list created, but I would think that impact would be minimal.
John
The great thing about IEnumerable is that you still only loop through the data once. So yes, the impact should be non-existent.
Joel Coehoorn
+2  A: 

In times like these I find that lambdas can be of great use. Consider this: instead of the data layer giving us the data, let us give the data layer our data-processing method:

public void GetSomeData(string filter, Action<IDataReader> processor)
{
    // ... same connection/command setup as in the examples above ...

    using (IDataReader rdr = cmd.ExecuteReader())
    {
        processor(rdr);
    }
}

Then the business layer would call it:

GetSomeData("my filter", (IDataReader reader) => 
    {
        while (rdr.Read())
        {
            ...
        }
    });
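
If the results do need to travel up the chain, the processor can materialize them before the reader closes. A rough sketch (the Customer type and column name are made up):

List<Customer> customers = new List<Customer>();
GetSomeData("my filter", (IDataReader reader) =>
    {
        while (reader.Read())
        {
            // Build strongly-typed objects while the reader is still open.
            Customer c = new Customer();
            c.Name = reader.GetString(reader.GetOrdinal("Name"));
            customers.Add(c);
        }
    });
// By this point the connection is closed and 'customers' is safe to hand
// to the layers above.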
Peter Lillevold
.NET 2.0 for now, but I'll keep that in mind for the next refresh.
Joel Coehoorn
Actually, I don't think this will work for me, because I want these results to be passed up the chain towards the presentation tier. I could still do that by repeating this pattern inside the lambda, but that feels ugly.
Joel Coehoorn
I agree. Then again, I tend to want to work through and close the data reader as quickly as possible, and instead build up the required data in the data layer, leaving it to the layers above to dictate what that data should look like. With the Action processor, the business layer gets to dictate the format while keeping the reader loop as tight as possible (assuming the business layer doesn't do something time-consuming in the processor method, that is).
Peter Lillevold