I have to create a C# program that deals well with reading in huge files.

For example, I have a 60+ MB file. I read all of it into a Scintilla box; let's call it sci_log. The program uses roughly 200 MB of memory with this and other features. This is still acceptable (and less than the amount of memory Notepad++ uses to open this file).

I have another Scintilla box, sci_splice. The user inputs a search term and the program searches through the file (or through sci_log if the file is small enough; it doesn't matter, because the problem happens both ways) for regex matches. When it finds a match, it concatenates that line onto a string holding the previous matches and increments a temporary count variable. When count reaches 100 (or 150, or 200, any number really), I put the output in sci_splice, call GC.Collect(), and repeat for the next 100 lines (setting count = 0 and nulling the string).

I don't have the code on me right now, as I'm writing this from my home laptop, but the issue is that this uses a LOT of memory. The 200 MB memory usage jumps to well over 1 GB with no end in sight. This only happens on a search with a lot of regex matches, so it's something to do with the string. But wouldn't the GC free up that memory? Also, why does it go up so high? It doesn't make sense that it would more than triple (worst possible case). Even if all of that 200 MB were just the log in memory, all the program is doing is reading each line and storing it (at worst).

After some more testing, it looks like Scintilla itself uses a lot of memory when adding lines. The initial read of the lines causes a memory spike of up to 850 MB for a fraction of a second. I guess I just need to page the output.

A: 

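If you are using System.String to store your matching lines, I suggest you try replacing it with System.Text.StringBuilder and see if this makes any difference. Each concatenation onto a System.String allocates a whole new string, whereas a StringBuilder appends into a buffer it grows as needed.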
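A minimal sketch of that suggestion, assuming the lines are streamed from a reader rather than held in one big string. The names MatchBatcher and MatchingBatches are hypothetical, and the batch size plays the role of the count variable from the question:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class MatchBatcher
{
    // Hypothetical helper: reads lines one at a time and yields the matching
    // lines in batches of batchSize, each batch joined with '\n'.
    public static IEnumerable<string> MatchingBatches(
        TextReader reader, Regex pattern, int batchSize)
    {
        var batch = new StringBuilder();
        int count = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (!pattern.IsMatch(line))
                continue;

            // Append into the builder's buffer instead of creating a new
            // string for every match, as string concatenation would.
            batch.Append(line).Append('\n');
            count++;

            if (count == batchSize)
            {
                yield return batch.ToString();
                batch.Clear();   // reuse the same builder for the next batch
                count = 0;
            }
        }

        // Emit whatever is left over after the last full batch.
        if (count > 0)
            yield return batch.ToString();
    }
}
```

Each yielded batch would then be handed to whatever method your Scintilla wrapper exposes for appending text to sci_splice. Whether this helps depends on where the memory actually goes; it only removes the intermediate strings created by repeated concatenation.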

ShellShock
A: 

Don't call GC.Collect. In this case I don't think it matters, because I think this memory is going to end up on the Large Object Heap (LOH). But the point is that .NET knows a lot more about memory management than you do; leave it alone.
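For context, the runtime places objects of roughly 85,000 bytes or more on the LOH, and a .NET string uses two bytes per character, so a concatenated batch of a few hundred long log lines can cross that threshold. A rough way to check where an allocation lands is GC.GetGeneration, which reports LOH objects as belonging to the highest generation. The LohCheck class below is a hypothetical helper for illustration only:

```csharp
using System;

static class LohCheck
{
    // Allocates a byte array of the given size and returns the GC generation
    // the runtime reports for it right after allocation.
    public static int GenerationOf(int byteCount)
    {
        var buffer = new byte[byteCount];
        return GC.GetGeneration(buffer);
    }

    // True when a fresh allocation of this size is reported in the highest
    // generation, which is how large-object-heap allocations show up.
    public static bool LandsOnLoh(int byteCount)
    {
        return GenerationOf(byteCount) == GC.MaxGeneration;
    }
}
```

Calling LohCheck.LandsOnLoh with a size well above the threshold (say 200,000) and one well below it (say 1,000) should show the difference. LOH memory is reclaimed only during full collections and is not compacted by default, which is one reason large temporary strings can keep the process size high.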

From the way you describe it, I suspect you are looking at this in Task Manager. You need to use at least Perfmon instead. Anticipating that you have not used it before, go here and do pretty much what Tess does, up to where it says Get a Memory Dump. I'm not sure you are ready for WinDbg, but that may be your next step.

Without seeing code there is almost no way to know what is going on. The problem could be inside Scintilla too, but I would check through what you are doing first. By running Perfmon you may at least get more information to figure out what to do next.
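As a cheap first check before setting up counters, the program can log its own numbers. This is only a sketch, and MemorySnapshot is a hypothetical name. It compares the managed heap size with the process-wide figures; since Scintilla is a native control, memory it allocates would show up in the private bytes but not in the managed total:

```csharp
using System;
using System.Diagnostics;

static class MemorySnapshot
{
    // Returns a one-line summary of managed heap size versus process memory.
    public static string Describe()
    {
        // Bytes currently thought to be allocated on the managed heap;
        // passing false avoids forcing a collection first.
        long managed = GC.GetTotalMemory(false);

        using (var proc = Process.GetCurrentProcess())
        {
            long privateBytes = proc.PrivateMemorySize64; // managed + native
            long workingSet = proc.WorkingSet64;          // resident in RAM
            return string.Format(
                "managed={0:N0} private={1:N0} workingSet={2:N0}",
                managed, privateBytes, workingSet);
        }
    }
}
```

Logging MemorySnapshot.Describe() before the search, after each batch, and at the end should show which number grows. If private bytes climb while the managed figure stays roughly flat, that points toward the native side rather than your strings.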

Flory