I recently looked into some .NET "memory leaks" (i.e. unexpected, lingering GC-rooted objects) in a WinForms app. After loading and then closing a huge report, the memory usage did not drop as expected, even after a couple of gen2 collections. Assuming that the reporting control was being kept alive by a stray event handler, I cracked open WinDbg to see what was happening...

In WinDbg, the !dumpheap -stat command reported that a large amount of memory was consumed by string instances. Narrowing this down with the !dumpheap -type System.String command, I found the culprit: a 90MB string used for the report, at address 03be7930. The last step was to invoke !gcroot 03be7930 to see which object(s) were keeping it alive.

My expectations were incorrect: it was not an unhooked event handler hanging onto the reporting control (and report string). Instead, the string was held by a System.Text.RegularExpressions.RegexInterpreter instance, which itself sits below a System.Text.RegularExpressions.CachedCodeEntry in the root chain. Now, the caching of Regexes is (somewhat) common knowledge, as it reduces the overhead of recompiling the Regex each time it is used. But what does this have to do with keeping my string alive?

Based on analysis using Reflector, it turns out that the input string is stored in the RegexInterpreter whenever a Regex method is called. The RegexInterpreter holds onto this string reference until a new string is fed into it by a subsequent Regex method invocation. I'd expect similar behaviour from holding onto Match instances, and perhaps other types. The call chain is something like this:

  • Regex.Split, Regex.Match, Regex.Replace, etc
    • Regex.Run
      • RegexRunner.Scan (RegexRunner is the base class, RegexInterpreter is the subclass described above).
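
One way to check whether a given runtime shows the retention described above is a small probe along these lines. It is only a sketch: the class and method names are made up for illustration, the input is a stand-in for the report text, and the result depends on the .NET version (newer runtimes may release the input after each call), so no particular output is claimed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text.RegularExpressions;

// Hypothetical probe, not part of the original app.
public static class RegexRetentionProbe
{
    // Kept in its own non-inlined method so that no local variable
    // in the caller keeps the large string alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference SplitLargeInput()
    {
        // Stand-in for the report text (about 2MB of characters).
        string report = new string('x', 1000000) + "\nend";

        // Static call: this goes through the shared Regex cache.
        Regex.Split(report, @"\r?\n");

        // A weak reference does not keep the string alive by itself.
        return new WeakReference(report);
    }

    // Returns true if something still references the input
    // after a full, blocking collection.
    public static bool InputSurvivesCollection()
    {
        WeakReference probe = SplitLargeInput();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return probe.IsAlive;
    }
}
```

If InputSurvivesCollection() returns true, something reachable from a root (here, presumably the cached runner) is still referencing the input string.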

The offending Regex is used only for reporting, and rarely at that, so it is unlikely to run again and clear out the existing report string. And even if the Regex were used at a later point, it would probably be processing another large report. This is a relatively significant problem and just plain feels dirty.

All that said, I found a few options on how to resolve, or at least work around, this scenario. I'll let the community respond first and if no takers come forward I will fill in any gaps in a day or two.

+4  A: 

Are you using instances of Regex or the static Regex methods which take a string pattern? According to this post, Regex instances do not participate in the caching.

Josh Einstein
Yes, the usage of static Regex methods was the culprit. You can verify that caching is used by the static methods via Reflector - all the static calls create a Regex using the private ctor that takes the 'useCache' parameter. The simple solution here is to not use the static methods. Caching isn't critical because the compilation is trivial compared to processing the huge input strings. Other solutions that may be useful, depending on how the Regex is used, are to disable Regex caching by setting Regex.CacheSize to 0, or to run an empty string through the Regex after processing the source.
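
A rough sketch of those three workarounds, for illustration only. The class name, method names and the line-splitting pattern are assumptions and not taken from the original app; only Regex.CacheSize and the standard Regex methods are real API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names and pattern are illustrative.
public static class ReportRegexWorkarounds
{
    // Assumed pattern: split the report into lines.
    private const string LinePattern = @"\r?\n";

    // Workaround 1: use an instance Regex instead of the static methods.
    // The instance (and whatever it references) becomes collectable
    // once it goes out of scope.
    public static string[] SplitWithInstance(string report)
    {
        var regex = new Regex(LinePattern);
        return regex.Split(report);
    }

    // Workaround 2: disable the static cache.
    // Note that Regex.CacheSize is a process-wide setting.
    public static string[] SplitWithCacheDisabled(string report)
    {
        Regex.CacheSize = 0;
        return Regex.Split(report, LinePattern);
    }

    // Workaround 3: keep the static call, then run an empty string
    // through the same pattern so the cached entry no longer sees
    // the large input as its most recent one.
    public static string[] SplitThenFlush(string report)
    {
        string[] lines = Regex.Split(report, LinePattern);
        Regex.IsMatch(string.Empty, LinePattern);
        return lines;
    }
}
```

The first variant is the least invasive, since it changes nothing outside the calling method; the other two either affect every static Regex call in the process or rely on the flush call using exactly the same pattern and options.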
Kevin Pullin
+1  A: 

Try switching to a compiled Regex - instantiation takes longer, but perhaps won't be subject to this odd leak.

See http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions%28v=VS.100%29.aspx for more.

Or, don't hold onto the Regex instance longer than you need to - create a fresh one for each report invocation.

Bevan