While making some final tests of a class library that I'm writing for Windows Mobile (using the .NET Compact Framework 2.0), I ran into an OutOfMemoryException (OOM).

Basically, my library first loads a dictionary file (an ordinary text file with a word list) and then another file derived from the dictionary (I call it the KeyMap), whose size is roughly the same as the dictionary's.

Everything worked fine (on both the emulator and my real device) with the above files until I tried to load a Spanish dictionary, which is approximately 2.7 MB in size. The other language dictionaries I have used so far without any OOM exceptions are approximately 1.8 MB each. With the Spanish dictionary, I can load the first file without any problems, but when I try to read the second file, I get the OOM error.

Below is the code I am using. Basically, I read the files and assign their contents to string variables (DictData and TextKeyMap). Then I call Split on each string variable to pass the contents on to a string array (Dict and KeyMap).

'Loading the dictionary works
Dim ReadDictionary As StreamReader = New StreamReader(DictPath, Encoding.UTF8)
DictData = ReadDictionary.ReadToEnd()
ReadDictionary.Close()
Dict = DictData.ToString.ToUpper.Split(mySplitSep.ToCharArray) 'mySplitSep = Chr(10)
DictData = "" 'perhaps "Nothing" is better

'Loading the KeyMap gives me the error
Dim ReadHashKeyMap As StreamReader = New StreamReader(HashKeyMapPath, Encoding.UTF8)
TextKeyMap = ReadHashKeyMap.ReadToEnd() '<-- OOM error
ReadHashKeyMap.Close()
KeyMap = TextKeyMap.ToString.Split(mySplitSep.ToCharArray) 'mySplitSep = Chr(10)
TextKeyMap = "" 'perhaps "Nothing" is better

I am a hobby programmer with no expert knowledge, so the code shown above can probably be improved. Instead of using ReadToEnd, I tried reading each line in a For loop, but I got the same error (and it was also slower).

I presume the error is due to Windows Mobile's 32 MB per-process memory limit, which makes it hard to allocate large contiguous blocks of memory.

Can anyone help me out, perhaps by suggesting some alternative solutions? Maybe the problem is due to my crappy code shown above? What about loading the second file in another thread; could that work?

Any help will be highly appreciated.

Edit: I asked a similar question some time ago (here), but that one was more about receiving bytes and was resolved by reading in chunks. In this case, I am dealing with strings.

Edit 2: This library is a spellchecking library. It works quite well and implements some fairly advanced techniques, such as the Soundex and Double Metaphone algorithms. The only major problem so far is the one mentioned above with the large Spanish text file; the other dictionaries are fine. For more info, please see this link.

+2  A: 

It seems that you simply don't have enough memory to keep all the text from all the files in memory at the same time. You may need a strategy that caches a limited subset of the files and is intelligent enough to go back to a file when something is requested that isn't in the cache.

If the entire point of the exercise is to avoid going back to the files (e.g., when building up some sort of index), you could also try to get "clever" and come up with an alternative representation of the in-memory text that takes advantage of the highly compressible nature of most Western languages.
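For illustration, here is a minimal sketch of one such alternative representation (an editorial sketch, not code from the answer; DictPath is the question's variable): store each word as a UTF-8 byte array instead of a .NET string. Since .NET strings are UTF-16 internally, this roughly halves the footprint for mostly-Latin text.

Imports System.IO
Imports System.Text
Imports System.Collections.Generic

'Sketch: keep words as UTF-8 byte arrays instead of strings.
'.NET strings use 2 bytes per character; for mostly-Latin text
'UTF-8 needs about 1, roughly halving the in-memory footprint.
Dim packed As New List(Of Byte())
Using reader As New StreamReader(DictPath, Encoding.UTF8)
    Dim line As String = reader.ReadLine()
    While line IsNot Nothing
        packed.Add(Encoding.UTF8.GetBytes(line.ToUpper()))
        line = reader.ReadLine()
    End While
End Using
'Decode a word back to a string only when it is actually needed
'(illustration; assumes the list is non-empty):
Dim firstWord As String = Encoding.UTF8.GetString(packed(0), 0, packed(0).Length)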

Greg D
GregD: the KeyMap file is actually an index file for the dictionary. I could split it into smaller parts and load each part only when needed, but that would slow down the spellchecking process that retrieves suggestions, especially since it runs on Windows Mobile with limited memory and hardware resources.
moster67
+2  A: 

I'd say the problematic line is this one:

Dict = DictData.ToString.ToUpper.Split(mySplitSep.ToCharArray)

The GC isn't able to keep up with the creation of temporary objects behind that simple line. ToUpper creates a copy of the original string, and Split creates a new array out of that copy (and probably uses more memory for the splitting algorithm itself). By the way, the call to ToString is useless; DictData is already a string, right?

Personally, I would read from the stream in chunks and do the splitting piece by piece, into a List(Of String). But if you want to keep your code short, try this, you never know:

DictData = ReadDictionary.ReadToEnd()
ReadDictionary.Close()
DictData = DictData.ToUpper() 'replace the original string instead of chaining calls
GC.Collect() 'give the GC a chance to reclaim the original string
Dict = DictData.Split(mySplitSep.ToCharArray)
DictData = Nothing 'release the uppercased copy
GC.Collect()

I never find calling GC.Collect to be a good solution. Calling it generally means "something better should have been done". But memory management under .NET CF is sometimes painful.
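As an editorial sketch of the "piece by piece into a List(Of String)" idea mentioned above (assuming one word per line, Lf-separated, as implied by the question's mySplitSep = Chr(10)):

Imports System.IO
Imports System.Text
Imports System.Collections.Generic

'Sketch: read line by line and accumulate into a List(Of String),
'so no single huge intermediate string is ever created by ReadToEnd,
'ToUpper, or Split.
Dim words As New List(Of String)
Using reader As New StreamReader(DictPath, Encoding.UTF8)
    Dim line As String = reader.ReadLine()
    While line IsNot Nothing
        words.Add(line.ToUpper())
        line = reader.ReadLine()
    End While
End Using
Dim Dict As String() = words.ToArray()

Note that this still keeps every word in memory; it only avoids the large temporary copies, which may be why the asker's own line-by-line attempt hit the same exception.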

Martin Plante
slimCODE: I told you my code was crappy! You're right, DictData is a string, but to get the conversion to uppercase I had to add ".ToString" in order to get ".ToUpper" (using IntelliSense), or perhaps I'm wrong. I can't verify right now.
moster67
slimCODE: Could you elaborate on your idea about "splitting piece by piece into a List(Of String)"? I don't understand what you mean. As for your code suggestions, I will try them. Thanks!
moster67
slimCODE: I tried your code, applying the same change to the 1st and 2nd file, but unfortunately I still get OOM exceptions.
moster67
+3  A: 

As you haven't said what you're using this file for, I'm assuming that you are just searching for a word for some reason.

First of all, it's probably not a good idea to try to load the complete file into memory. Instead, it might be more productive to search the file for the data (word) you need and, perhaps, keep some sort of indexing information in memory to speed things up a bit.

Since the data you are trying to search is just a list of words, it might be a good idea to scan the file and record, in a Dictionary, where the first letter of a word changes, e.g. A's start at line 0, B's start at line 200, C's start at line 300, and so on. Use these two pieces of information to populate your Dictionary: the letter is the key and the line number is the value. In effect, the Dictionary becomes a high-level index into the word-list file. This Dictionary is also very small.
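Here is an editorial sketch of that index-building step (assuming one word per line, Lf-only line endings as implied by mySplitSep = Chr(10), and UTF-8 without a byte-order mark; it records byte offsets rather than line numbers so the lookup can Seek directly):

Imports System.IO
Imports System.Text
Imports System.Collections.Generic

'Sketch: map each starting letter to the byte offset of its first
'word in the file. Byte offsets stand in for line numbers so the
'lookup below can jump straight to the right section.
Dim index As New Dictionary(Of Char, Long)
Dim offset As Long = 0
Using reader As New StreamReader(DictPath, Encoding.UTF8)
    Dim line As String = reader.ReadLine()
    While line IsNot Nothing
        If line.Length > 0 Then
            Dim first As Char = Char.ToUpper(line.Chars(0))
            If Not index.ContainsKey(first) Then
                index.Add(first, offset)
            End If
        End If
        'Advance by the encoded length of the line plus its Lf byte.
        offset += Encoding.UTF8.GetByteCount(line) + 1
        line = reader.ReadLine()
    End While
End Using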

Then, when you start to search for a word, use its first letter to look up the Dictionary. This gives you the line number where words beginning with that letter start in the file. Armed with the line number, (re)open the file, move the stream pointer straight to that line, and search for the target word from there. You can either search sequentially, a line at a time (not recommended; it will be quite slow, but it is easier to code), or search using a binary chop (much quicker, but harder to code). For the latter you'll also need to know where the words starting with the target letter end in the file, as you'll be searching a section of the file. I'd also recommend doing the word search in the file rather than loading all those words into memory, otherwise you might be back where you started with OOM errors.
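Continuing the sketch above, here is a hedged version of the sequential lookup (the easier-to-code variant described in the answer; a binary chop would additionally need the section's end offset; ContainsWord is a hypothetical name):

'Sketch: check whether a word exists without loading the whole file.
'Seeks to the letter's section and scans lines until the section ends.
Function ContainsWord(ByVal path As String, ByVal index As Dictionary(Of Char, Long), ByVal word As String) As Boolean
    Dim target As String = word.ToUpper()
    Dim first As Char = target.Chars(0)
    Dim offset As Long
    If Not index.TryGetValue(first, offset) Then Return False
    Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
        fs.Seek(offset, SeekOrigin.Begin)
        Using reader As New StreamReader(fs, Encoding.UTF8)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                Dim candidate As String = line.ToUpper()
                'Stop once we leave this letter's section of the sorted list.
                If candidate.Length > 0 AndAlso candidate.Chars(0) <> first Then Return False
                If candidate = target Then Return True
                line = reader.ReadLine()
            End While
        End Using
    End Using
    Return False
End Function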

If you're not sure of anything, stick a comment on here and I'll do my best to answer it.

Good luck

Barry Carr
Good input! BinarySearch is already being used wherever possible. While the 1st file (the word list) must be loaded as shown in my code (it's needed for other algorithms in my library), your idea is still good, especially with regard to the 2nd file: it would mean I could avoid loading the 2nd file and access it only when needed. Of course, this would involve a lot of file access and probably a substantial performance loss, but still... I will give it a shot! Thanks!
moster67
please see my edit2 for some extra information regarding the library
moster67
Thanks for the feedback. Is there any chance you could vote for my answer? ;-) Regarding the second file: you could try searching the file on a background thread; that at least should keep your UI responsive.
Barry Carr