This is Delphi 2009, so Unicode applies.

I had some code that was loading strings from a buffer into a StringList as follows:

      var Buffer: TBytes; RecStart, RecEnd: PChar; S: string;

      FileStream.Read(Buffer[0], Size);

      repeat
         ... find next record RecStart and RecEnd that point into the buffer;        

         SetString(S, RecStart, RecEnd - RecStart);
         MyStringList.Add(S);
      until end of buffer

But during some modifications, I changed my logic so that I ended up adding the identical records, but as strings built up separately rather than through SetString, i.e.

      var SRecord: string;

      repeat
        SRecord := '';
        repeat
          SRecord := SRecord + ... processed line from the buffer;
        until end of record in the buffer

        MyStringList.Add(SRecord);
      until end of buffer

What I noticed was the memory use of the StringList went up from 52 MB to about 70 MB. That was an increase of over 30%.

To get back to my lower memory usage, I found I had to use SetString to create the string variable to add to my StringList as follows:

      repeat
        SRecord := '';
        repeat
          SRecord := SRecord + ... processed line from the buffer;
        until end of record in the buffer

        SetString(S, PChar(SRecord), length(SRecord));
        MyStringList.Add(S);
      until end of buffer

Inspecting and comparing S and SRecord, they are in all cases exactly the same. But adding SRecord to MyStringList uses much more memory than adding S.

Does anyone know what's going on and why the SetString saves memory?


Followup. I didn't think it would, but I checked just to make sure.

Neither:

  SetLength(SRecord, length(SRecord));

nor

  Trim(SRecord);

releases the excess space. The SetString seems to be required to do so.

+11  A: 

When you build a string by concatenation, the memory manager allocates more memory than the string currently needs, because it assumes you will keep appending text and reserves room for future concatenations. As a result, the allocated size of the string can be much larger than its used size (how much larger depends on the memory manager in use). If you use SetString, the allocated size of the new string is almost the same as its used size. And when the SRecord string goes out of scope and its ref-count drops to zero, the memory occupied by SRecord is released. So you end up with the smallest allocation your string needs.
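As a rough illustration of the point above, here is a minimal sketch of copying a concatenated string into an exactly sized one before handing it to the list. The helper name `ExactCopy`, the loop counts, and the sample text are made up for the example; only `SetString`, `TStringList`, and the concatenation pattern come from the question.

```pascal
program TrimBeforeAdd;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper (not part of the RTL): returns a freshly allocated
// copy of Source whose allocation is sized for Length(Source) characters.
function ExactCopy(const Source: string): string;
begin
  SetString(Result, PChar(Source), Length(Source));
end;

var
  List: TStringList;
  SRecord: string;
  I, J: Integer;
begin
  List := TStringList.Create;
  try
    for I := 1 to 3 do
    begin
      // Build the record by repeated concatenation; the block behind
      // SRecord may end up larger than the text it holds.
      SRecord := '';
      for J := 1 to 40 do
        SRecord := SRecord + 'line ' + IntToStr(J) + ';';

      // Store the copy, not SRecord itself, so the list does not keep
      // the possibly over-sized block alive.
      List.Add(ExactCopy(SRecord));
    end;
    Writeln(List.Count);
  finally
    List.Free;
  end;
end.
```

The extra copy costs one allocation per record, which is presumably a fair trade when the list holds many long-lived strings.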

Andreas Hausladen
That sounds plausible @Andreas, but 30% more memory!? My strings are long and average 500 characters each and I'm loading 100,000 of them. Maybe 40 concatenations are needed to build one up. Then the "SRecord" string gets reused for the next record, so I would hope the memory manager would reuse the space. I could understand 2 or 3 percent from your explanation, but not 30%.
lkessler
Why should the memory manager reuse the SRecord string if you still reference it from the StringList? It can't; it has to create a new SRecord string for every record.
Andreas Hausladen
Those two lines in all my examples are each in a loop that runs once for each record, or 100,000 times. At the beginning of the loop, I set SRecord := ''; and then loop over the lines in the record and append them to SRecord. So SRecord is only about 500 characters long. For the next record, I would think that setting SRecord to '' will allow the memory manager to clean up. Let me update the example to show the loops.
lkessler
Assigning '' to SRecord doesn't release the string because its ref-count is 2 (@SRecord and @TStringList.FList[].FString). It only decrements the ref-count to 1 (@TStringList.FList[].FString) and sets the SRecord variable to nil. So the over-sized concatenation string is still alive and used by the StringList. Your code allocates a new SRecord string with every iteration. For every concatenation that doesn't fit into the string's allocation size, your string grows by at least 100% + 31 bytes for small blocks, 25% for medium blocks and 25% rounded up to 64KB for large blocks (FastMM).
Andreas Hausladen
@Andreas: Then you're saying that in my second set of code, the StringList uses the over-allocated concatenated string. In the third set of code, a new S is created which is exactly sized, and the over-allocated string is then released by the memory manager, since S and not SRecord is used by the StringList. Very subtle if that's the case.
lkessler
@lkessler: Maybe I was too technical, but that is what I wanted to tell you.
Andreas Hausladen
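The sharing described in the comments above can be observed with a small sketch that compares string data pointers. This is only an illustration under assumptions: the variable names `Stored` and `Copied` and the sample text are invented here, and the program shows which block the list references, not how large that block is.

```pascal
program SharedBlockDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  List: TStringList;
  SRecord, Stored, Copied: string;
  J: Integer;
begin
  List := TStringList.Create;
  try
    // Build a heap-allocated string by concatenation.
    SRecord := '';
    for J := 1 to 40 do
      SRecord := SRecord + 'line ' + IntToStr(J) + ';';

    // Add stores another reference to the same block; no copy is made.
    List.Add(SRecord);
    Stored := List[0];
    Writeln('list shares block with SRecord: ',
      Pointer(Stored) = Pointer(SRecord));

    // SetString allocates a new block and copies the characters into it.
    SetString(Copied, PChar(SRecord), Length(SRecord));
    Writeln('copy shares block with SRecord: ',
      Pointer(Copied) = Pointer(SRecord));

    // Clearing SRecord only drops this variable's reference; the list
    // (and Stored) still keep the original block alive.
    SRecord := '';
    Writeln('list entry still intact: ', List[0] = Copied);
  finally
    List.Free;
  end;
end.
```

If the first comparison is true and the second false, that matches the explanation: the list entry added directly is the concatenation buffer itself, while the SetString result is a separate allocation.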
A: 

Try installing a memory manager filter (Get/SetMemoryManager) that passes all GetMem/FreeMem calls through to the default memory manager but also gathers statistics. You'll probably see that both variants are equal in memory consumption.

It's just memory fragmentation.

Alexander
It's not; Andreas Hausladen gives the correct answer.
himself
@himself Except that he is saying that one variant of the code is over-allocating memory, which is a form of memory fragmentation.
Alexander
Fragmentation is when the memory is available, just in small blocks. It's not overallocation.
himself
@himself One definition of fragmentation is "it's when storage is allocated without intention to use it" and another is "it's when free storage becomes divided into many small pieces" (see wiki for example). That is exactly what happens. From the point of view of the calling code: the memory is available (since I didn't call MemManager.GetMem for it and I have no allocated memory in that place), but it is still inaccessible for re-use. What is it, if not fragmentation?
Alexander
You have a point, but it's not your original point. Memory fragmented like that will be considered acquired by GetMem/FreeMem, and thus memory consumption will not be equal. This is not a case of fragmented allocation, but of pre-allocating more than needed.
himself