views:

59

answers:

3

I'm having a memory leak issue in a service program that runs SQL scripts and dumps the results out to files. After running queries that produce many result rows, the memory usage of the process goes up by 50+ MB each time and doesn't come down.

Here is the code that opens the connection and retrieves the results:

using (var conn = new SqlConnection(DataSourceInfo.ConnectionString))
{
    conn.Open();

    var scmd = new SqlCommand(query_string, conn);
    scmd.CommandTimeout = 86400;

    var writer = dest.GetStream(); //the writer is disposed of elsewhere

    using (var da = new SqlDataAdapter(scmd))
    using (var ds = new DataSet())
    {
        da.Fill(ds);
        var table = ds.Tables[0];
        var rows = table.Rows;

        if (TaskInfo.IncludeColNames.Value)
        {
            object[] cols = new object[table.Columns.Count];

            for(int i = 0; i < table.Columns.Count; i++)
                cols[i] = table.Columns[i];

            LineFormatter(writer, TaskInfo.FieldDelimiter, null, false, cols);
            writer.WriteLine();
        }

        foreach(System.Data.DataRow r in rows)
        {
            var fields = r.ItemArray;

            LineFormatter(writer, TaskInfo.FieldDelimiter, TaskInfo.TextQualifier, TaskInfo.TrimFields.Value, fields);
            writer.WriteLine();
        }
    }
}

I used WinDbg with sos.dll to list the top objects by type after execution had completed and the process had plenty of time to GC:

79333470      101       166476 System.Byte[]
65245dcc      177      3897420 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][]
0015e680     5560      3968936      Free
79332b9c      342      3997304 System.Int32[]
6524508c   120349      7702336 System.Data.DataRow
793041d0      984     22171736 System.Object[]
7993bec4       70     63341660 System.Decimal[]
79330a00  2203630     74522604 System.String

The second column is the number of objects and the third is the total size.

There shouldn't be any System.Data.DataRow objects outstanding. It looks like they're being leaked somehow, but I'm not sure how.

What am I doing wrong?

Note: a previous version used SqlDataReader to retrieve the row data, but that approach lacked a way to get the column headers (that I know of) and sharing the data set between the DataSet and SqlDatReader would silently fail on some queries. I do not remember that version having the memory leak problem.

+2  A: 

Pick a DataRow and use !gcroot to see who's keeping a reference at the rows. See Tracking down managed memory leaks (how to find a GC leak).

Remus Rusanu
I did check a few of them and got no results. It may be a case as mentioned elsewhere that a collection isn't actually occurring. I'll check it with a manual GC when I get back to the dev machine.
Chris Smith
+1  A: 

Unless LineFormatter is doing something to hold onto references for the life of the program, I see no issue here.

You're making some big assumptions about how the garbage collector works. AFAIK, it works based on memory pressure, not time. If you were feeling really paranoid, you could run a GC.Collect() in the code and see if that brings the memory usage down, but I would never call GC.Collect() in production code -- just do this as a test.

Also make sure you're not relying on the Task Manager to tell you about how much memory is being reserved in the .NET heap. You should instead be looking at the performance counters in PerfMon to examine what's going on in the managed world.

Dave Markle
Running a manual GC.Collect() did free the memory-- from 184MB private bytes to 33MB (near where it started). So, if I'm not supposed to use GC.Collect() in production code, what am I supposed to do? This process only runs a few minutes each day. I don't want it taking up memory that would be better used in SQL Server or other caches. What's the benefit of the CLR keeping all of that unreferenced memory committed all day?
Chris Smith
When GC.Collect() does its work, it actually goes through and defragments the memory. This can be pretty intense from a load standpoint, so the collector only does that when it's under memory pressure. What would I do? If the process truly only ran for a few minutes on a schedule, I'd put it in as a Scheduled Task. As far as SQL Server goes, you should configure it to use a fixed amount of RAM. By doing that, SQL Server will put pressure on your other apps to save memory, not the other way around.
Dave Markle
The service waits for external events to respond to. Even though it only runs a few minutes each day, it really wouldn't work as a scheduled job. It has to be up at all times. Unfortunately, I don't control the SQL Server config. The service is supposed to sit unobtrusively in the background and not impact other services on the same box. What I ended up doing was to have it run a manual GC 1 minute after the number of active internal workers reaches 0. The CPU usage that occurs during the manual GC is negligible.
Chris Smith
That's not too bad of a solution if you are forced into this position. But you should raise the issue up the chain of command if your configuration is such that other apps can put memory pressure on your SQL Server. That's a Bad Thing.
Dave Markle
A: 

The best way to track down a memory leak is with a profiler, such as Nant or .Net Memory Profiler. I think both have at least a 15 day trial period, which is enough to learn what you need and to diagnose the memory leak.

I've used .Net Memory Profiler. It's very good at tracking down exactly what is being held, and what the paths are to get to the leaked memory from AppDomain or from any static objects. It works by running your app and grabbing metadata; you take a snapshot (with the profiler), run one operation that leaks emory, then take a second snapshop and compare. You can isolate what's different between the two snapshots, and sort by size, so you close in very quickly on the problem. Very good tool!

Cylon Cat