I'm reading in a large text file with 1.4 million lines that is 24 MB in size (average 17 characters a line).
I'm using Delphi 2009 and the file is ANSI but gets converted to Unicode upon reading, so you could fairly say that, once converted, the text is 48 MB in size.
(Edit: I found a much simpler example ...)
I'm loading this text into a simple StringList:
AllLines := TStringList.Create;
AllLines.LoadFromFile(Filename);
I found that the lines of data seem to take much more memory than their 48 MB.
In fact, they use 155 MB of memory.
I don't mind Delphi using 48 MB or even as much as 60 MB allowing for some memory management overhead. But 155 MB seems excessive.
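(For reference, the figure can be measured with something along these lines -- a sketch only, using the TMemoryManagerState record that System.GetMemoryManagerState exposes for FastMM; exact field types may differ slightly between Delphi versions:)

// Sum everything the memory manager currently reports as allocated.
// GetMemoryManagerState and TMemoryManagerState are declared in System.
function AllocatedBytes: Int64;
var
  State: TMemoryManagerState;
  i: Integer;
begin
  GetMemoryManagerState(State);
  Result := Int64(State.TotalAllocatedMediumBlockSize) +
            State.TotalAllocatedLargeBlockSize;
  for i := Low(State.SmallBlockTypeStates) to High(State.SmallBlockTypeStates) do
    with State.SmallBlockTypeStates[i] do
      Inc(Result, Int64(UseableBlockSize) * AllocatedBlockCount);
end;

// Usage: take the difference around the load.
// Before := AllocatedBytes;
// AllLines := TStringList.Create;
// AllLines.LoadFromFile(Filename);
// WriteLn((AllocatedBytes - Before) div (1024 * 1024), ' MB');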
This is not a fault of StringList. I previously tried loading the lines into a record structure, and I got the same result (160 MB).
I don't see or understand what could be causing Delphi or the FastMM memory manager to use 3 times the amount of memory necessary to store the strings. Heap allocation can't be that inefficient, can it?
I've debugged this and researched it as far as I can. Any ideas as to why this might be happening, or ideas that might help me reduce the excess usage would be much appreciated.
Note: I am using this "smaller" file as an example. I am really trying to load a 320 MB file, but Delphi is asking for over 2 GB of RAM and running out of memory because of this excess string requirement.
Addendum: Marco Cantu just came out with a White Paper on Delphi and Unicode. Delphi 2009 has increased the overhead per string from 8 bytes to 12 bytes (plus maybe 4 more for the actual pointer to the string). An extra 16 bytes per 17x2 = 34-byte line adds almost 50%. But I'm seeing over 200% overhead. What could the extra 150% be?
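Putting numbers on that (a back-of-the-envelope sketch; the 12-byte header and the extra pointer are the figures from the white paper, the rest is just arithmetic):

program StringOverheadEstimate;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  LineCount = 1400000;   // lines in the file
  AvgChars  = 17;        // average characters per line
  StrHeader = 12;        // Delphi 2009 per-string header (up from 8)
  PtrSize   = 4;         // the pointer to the string held by the list
var
  PerLine: Integer;
begin
  // header + trailing #0 + UTF-16 payload + pointer
  PerLine := StrHeader + SizeOf(Char) + AvgChars * SizeOf(Char) + PtrSize;
  WriteLn(Format('%d bytes per line, about %.0f MB expected',
    [PerLine, PerLine * LineCount / (1024 * 1024)]));
  // -> 52 bytes per line, about 69 MB expected; nowhere near the 155 MB observed.
end.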
Success!! Thanks to all of you for your suggestions. You all got me thinking. But I'll have to give Jan Goyvaerts credit for the answer, since he asked:
...why are you using TStringList? Must the file really be stored in memory as separate lines?
That led me to the solution: instead of loading the 24 MB file as a 1.4-million-line StringList, I can group my lines into the natural groups my program knows about. That brings it down to 127,000 lines loaded into the string list.
Now each line averages 190 characters instead of 17. The overhead per StringList line is the same but now there are many fewer lines.
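A sketch of what I mean (the boundary test -- here a line starting with '0 ' -- and the #1 separator are just placeholders for whatever natural grouping your own data has; the real code differs):

// TStreamReader is in Classes (Delphi 2009 and later).
procedure LoadGrouped(const Filename: string; Grouped: TStringList);
var
  Reader: TStreamReader;
  Line, Current: string;
begin
  // Read the ANSI file line by line instead of holding 1.4 million
  // separate strings; concatenate each natural group into one entry.
  Reader := TStreamReader.Create(Filename);
  try
    Current := '';
    while not Reader.EndOfStream do
    begin
      Line := Reader.ReadLine;
      if (Current <> '') and (Copy(Line, 1, 2) = '0 ') then
      begin
        Grouped.Add(Current);   // flush the finished group
        Current := '';
      end;
      if Current = '' then
        Current := Line
      else
        Current := Current + #1 + Line;  // keep the original lines separable
    end;
    if Current <> '' then
      Grouped.Add(Current);
  finally
    Reader.Free;
  end;
end;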
When I apply this to the 320 MB file, it no longer runs out of memory and now loads in less than 1 GB of RAM. (And it only takes about 10 seconds to load, which is pretty good!)
There will be a little bit extra processing to parse the grouped lines, but it shouldn't be noticeable in real time processing of each group.
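Splitting a group back into its lines is then a simple scan for the separator, along these lines (again assuming the hypothetical #1 separator from the sketch above):

// Hypothetical helper: recover the original lines from one grouped entry.
procedure SplitGroup(const Group: string; Lines: TStrings);
var
  Start, i: Integer;
begin
  Lines.Clear;
  Start := 1;
  for i := 1 to Length(Group) do
    if Group[i] = #1 then
    begin
      Lines.Add(Copy(Group, Start, i - Start));
      Start := i + 1;
    end;
  Lines.Add(Copy(Group, Start, MaxInt));  // the last (or only) line
end;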
(In case you were wondering, this is a genealogy program, and this may be the last step I needed to allow it to load all the data about one million people in a 32-bit address space in less than 30 seconds. So I've still got a 20-second buffer to play with for adding the indexes into the data that will be required to allow display and editing of the data.)