tags:
views: 58
answers: 6

Hello,

I'm tasked with reading a large text file (around 150 MB), parsing it, and displaying the records in a data grid. The file is delimited by parentheses.

I'm accomplishing this by reading the entire file into memory on a separate thread, storing the information in a DataSet, and then binding the DataSet to the data grid, which sits on the main form in the original thread.

I have two questions/issues:

  1. Is this the best way to do it? Is reading a 150 MB file into memory too much? What is the best practice for this type of work?

  2. The amount of memory that gets allocated for the process is huge, which is understandable given the size of the file. But the problem is that it doesn't get deallocated, so if I process two files, more and more memory gets allocated, until at some point the program just crashes. I'm guessing the DataSet object is being referenced by something that's preventing the memory from being deallocated. Is there any way to determine what that object is? Is there a tool or method I can use for this purpose?

Any help on this will be greatly appreciated. I've never in my coding career had to worry about memory management. Thanks.

A: 
  1. This is acceptable if you're only ever reading a single file and you don't expect it to grow much beyond 150 MB. The important factor here is that your users' machines have enough memory to open the file. 150 MB isn't much; if you get to 150 GB you'll have problems.
  2. This is because you probably still have a reference to the data in memory somewhere, most likely because you're displaying it on screen.

If you need to load the whole thing into memory so users of your application can manipulate the file, your hands are tied. Otherwise, you might try streaming the records in as the user needs them; the TextReader and/or StreamReader classes are probably a good starting point if you want to go down that path. A rough sketch of that approach follows.
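For example, here's a minimal sketch of reading one record at a time with StreamReader. It assumes one record per line for simplicity; the path variable and ProcessRecord are placeholders for your own file location and parsing logic, and you'd adapt the loop to your parenthesis delimiters:

// requires: using System.IO;
using (var reader = new StreamReader(path))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // parse the record and hand it to the grid only when needed,
        // instead of holding the whole 150 MB file in memory
        ProcessRecord(line);
    }
}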

Nate Bross
A: 

You could use the unsafe keyword to manage memory manually, though that's rarely the right tool for file handling.

Alternatively, use the using keyword:

using (StreamReader newFile = File.OpenText(path))
{
    // Do stuff
}

No comment on the DataSet.

Raynos
+1  A: 

As far as the memory not getting deallocated: if you are using something like a StreamReader to read in the text, you need to call .Dispose() on it when you are done (or put it in a using() {} block; see the sketch below). That may have some effect on why the GC isn't collecting.
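For what it's worth, a using block is just shorthand for a try/finally call to Dispose; a minimal sketch (the reader setup is illustrative):

// this:
using (var reader = new StreamReader(path))
{
    // read here
}

// ...expands to roughly this:
StreamReader reader = new StreamReader(path);
try
{
    // read here
}
finally
{
    if (reader != null) reader.Dispose();
}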

For your first question, though, 150 MB isn't really that much these days, assuming you are only doing one text file at a time. I wouldn't worry about it until you need multiple/concurrent processes.

mgroves
A: 

Personally, I would think about paginating it and only loading, say, 100 records at a time, then clearing the memory and loading the next 100 when the user clicks the 'next' button. This is similar to how a search engine handles results: it can't show ALL the results on one page, as that would take forever to load, so it splits them into smaller chunks. Is there any reason you need all the data loaded? A sketch of the idea follows.
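A minimal sketch of that idea, assuming one record per line (GetPage, the file name, and the page size are illustrative; File.ReadLines streams lazily and is available from .NET 4, or you can wrap a StreamReader in an iterator on older versions):

// requires: using System.Collections.Generic; System.IO; System.Linq;
static IEnumerable<string> GetPage(string path, int pageIndex, int pageSize)
{
    // skips earlier pages without loading them into memory
    return File.ReadLines(path)
               .Skip(pageIndex * pageSize)
               .Take(pageSize);
}

// first 100 records for the grid; bump pageIndex on 'next'
var page = GetPage("records.txt", 0, 100).ToList();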

w69rdy
A: 

First, for the memory issue: find a memory profiler (I used to use the one from JetBrains, but almost any will do). That at least will tell you exactly what's consuming the memory.

This might not solve your specific problem (I haven't used the approach for 150 MB of data), but my own first approach is generally to wrap the file in an IEnumerable, reading one entry at a time, lazily.

If problems with memory/performance are still present (which I think they will be in your case), I'd load parts of the file (using the enumerable) and keep each part in memory only for as long as it's displayed. (This, however, creates a new issue if you need to navigate backwards.) A sketch of the lazy wrapper is below.
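A minimal sketch of the lazy wrapper, assuming each record is enclosed in parentheses (ReadRecords and the character-level parsing are illustrative, not the poster's actual code):

// requires: using System.Collections.Generic; System.IO; System.Text;
static IEnumerable<string> ReadRecords(string path)
{
    using (var reader = new StreamReader(path))
    {
        var current = new StringBuilder();
        int ch;
        while ((ch = reader.Read()) != -1)
        {
            char c = (char)ch;
            if (c == '(')
                current.Length = 0;                // start a new record
            else if (c == ')')
                yield return current.ToString();   // emit the finished record
            else
                current.Append(c);
        }
    }
}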

Rune FS
A: 

It's possible that this blog post may be helpful to you:

http://blogs.msdn.com/b/ericwhite/archive/2006/08/31/linq-to-text-files.aspx
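In the same spirit, a minimal sketch of querying a text file with LINQ without loading it all at once (the file name and the Where clause are made up for illustration; File.ReadLines requires .NET 4):

// requires: using System.IO; using System.Linq;
var records = File.ReadLines("records.txt")
                  .Where(line => line.StartsWith("("))
                  .Select(line => line.Trim('(', ')'));

foreach (var record in records)
{
    // parse/bind each record as it streams in
}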

asawyer