I'm reading in a large text file with 1.4 million lines that is 24 MB in size (average 17 characters a line).
I'm using Delphi 2009 and the file is ANSI but gets converted to Unicode upon reading, so you could fairly say that, once converted, the text is 48 MB in size.
(Edit: I found a much simpler example ...)
I'm loading this text into a simple StringList:
AllLines := TStringList.Create;
AllLines.LoadFromFile(Filename);
I found that the lines of data seem to take much more memory than their 48 MB.
In fact, they use 155 MB of memory.
I don't mind Delphi using 48 MB or even as much as 60 MB allowing for some memory management overhead. But 155 MB seems excessive.
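(For reference, the figure can be measured with something along these lines -- a sketch only, using the TMemoryManagerState record that System.GetMemoryManagerState exposes for FastMM; exact field types may differ slightly between Delphi versions:)

// Sum everything the memory manager currently reports as allocated.
// GetMemoryManagerState and TMemoryManagerState are declared in System.
function AllocatedBytes: Int64;
var
  State: TMemoryManagerState;
  i: Integer;
begin
  GetMemoryManagerState(State);
  Result := Int64(State.TotalAllocatedMediumBlockSize) +
            State.TotalAllocatedLargeBlockSize;
  for i := Low(State.SmallBlockTypeStates) to High(State.SmallBlockTypeStates) do
    with State.SmallBlockTypeStates[i] do
      Inc(Result, Int64(UseableBlockSize) * AllocatedBlockCount);
end;

// Usage: take the difference around the load.
// Before := AllocatedBytes;
// AllLines := TStringList.Create;
// AllLines.LoadFromFile(Filename);
// WriteLn((AllocatedBytes - Before) div (1024 * 1024), ' MB');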
This is not a fault of StringList. I previously tried loading the lines into a record structure, and I got the same result (160 MB).
I don't see or understand what could be causing Delphi or the FastMM memory manager to use 3 times the amount of memory necessary to store the strings. Heap allocation can't be that inefficient, can it?
I've debugged this and researched it as far as I can. Any ideas as to why this might be happening, or ideas that might help me reduce the excess usage would be much appreciated.
Note: I am using this "smaller" file as an example. I am really trying to load a 320 MB file, but Delphi is asking for over 2 GB of RAM and running out of memory because of this excess string requirement.
Addendum: Marco Cantu just came out with a White Paper on Delphi and Unicode. Delphi 2009 has increased the overhead per string from 8 bytes to 12 bytes (plus maybe 4 more for the actual pointer to the string). An extra 16 bytes per 17x2 = 34-byte line adds almost 50%. But I'm seeing over 200% overhead. What could the extra 150% be?
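Putting numbers on that (a back-of-the-envelope sketch; the 12-byte header and the extra pointer are the figures from the white paper, the rest is just arithmetic):

program StringOverheadEstimate;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  LineCount = 1400000;   // lines in the file
  AvgChars  = 17;        // average characters per line
  StrHeader = 12;        // Delphi 2009 per-string header (up from 8)
  PtrSize   = 4;         // the pointer to the string held by the list
var
  PerLine: Integer;
begin
  // header + trailing #0 + UTF-16 payload + pointer
  PerLine := StrHeader + SizeOf(Char) + AvgChars * SizeOf(Char) + PtrSize;
  WriteLn(Format('%d bytes per line, about %.0f MB expected',
    [PerLine, PerLine * LineCount / (1024 * 1024)]));
  // -> 52 bytes per line, about 69 MB expected; nowhere near the 155 MB observed.
end.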
Success!! Thanks to all of you for your suggestions. You all got me thinking. But I'll have to give Jan Goyvaerts credit for the answer, since he asked:
...why are you using TStringList? Must the file really be stored in memory as separate lines?
That led me to the solution: instead of loading the 24 MB file as a 1.4-million-line StringList, I can group my lines into the natural groups my program knows about. That brings it down to 127,000 lines loaded into the string list.
Now each line averages 190 characters instead of 17. The overhead per StringList line is the same but now there are many fewer lines.
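A sketch of what I mean (the boundary test -- here a line starting with '0 ' -- and the #1 separator are just placeholders for whatever natural grouping your own data has; the real code differs):

// TStreamReader is in Classes (Delphi 2009 and later).
procedure LoadGrouped(const Filename: string; Grouped: TStringList);
var
  Reader: TStreamReader;
  Line, Current: string;
begin
  // Read the ANSI file line by line instead of holding 1.4 million
  // separate strings; concatenate each natural group into one entry.
  Reader := TStreamReader.Create(Filename);
  try
    Current := '';
    while not Reader.EndOfStream do
    begin
      Line := Reader.ReadLine;
      if (Current <> '') and (Copy(Line, 1, 2) = '0 ') then
      begin
        Grouped.Add(Current);   // flush the finished group
        Current := '';
      end;
      if Current = '' then
        Current := Line
      else
        Current := Current + #1 + Line;  // keep the original lines separable
    end;
    if Current <> '' then
      Grouped.Add(Current);
  finally
    Reader.Free;
  end;
end;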
When I apply this to the 320 MB file, it no longer runs out of memory and now loads in less than 1 GB of RAM. (And it only takes about 10 seconds to load, which is pretty good!)
There will be a little bit extra processing to parse the grouped lines, but it shouldn't be noticeable in real time processing of each group.
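Splitting a group back into its lines is then a simple scan for the separator, along these lines (again assuming the hypothetical #1 separator from the sketch above):

// Hypothetical helper: recover the original lines from one grouped entry.
procedure SplitGroup(const Group: string; Lines: TStrings);
var
  Start, i: Integer;
begin
  Lines.Clear;
  Start := 1;
  for i := 1 to Length(Group) do
    if Group[i] = #1 then
    begin
      Lines.Add(Copy(Group, Start, i - Start));
      Start := i + 1;
    end;
  Lines.Add(Copy(Group, Start, MaxInt));  // the last (or only) line
end;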
(In case you were wondering, this is a genealogy program, and this may be the last step I needed to allow it to load all the data about one million people in a 32-bit address space in less than 30 seconds. So I've still got a 20-second buffer to play with for adding the indexes into the data that will be required to allow display and editing of the data.)