Hi, I have quite a lot of data to store, read, and modify in memory while the application runs. The data can be compared to a tree, where each node is described by a limited number of strings and integers and has quite a lot of sub-elements. Currently the data is stored using classes/objects, like:

uses
  Contnrs; // for TObjectList

type
  TRootElement = class
    fName, fDescription: string;
    fPos: Integer;
    // etc.
    fDocs: TObjectList;    // list of TVariable = class(TRootElement)
    fClasses: TObjectList; // list of TClass = class(TRootElement)
  end;

Currently the memory consumed by the program is unacceptable, so I'm looking for a way to limit it.

My question is: will the consumption be significantly reduced if I replace the current OOP, objects-based architecture with one based on records? For example, the general record could contain:

type
  TElemType = (etRoot, etVariable, etClass); // enum: root, variable, class, etc.

  TRootElement = record
    fType: TElemType;
    fName, fDesc: string;
    // all the fields used by the root element and its descendants go here
  end;

Should I replace TList with pointers to the next/previous elements? Since I never access list elements by index (I always loop through the whole list), it shouldn't be really hard to do... however, I'd like to avoid it if it's not necessary.

Thanks! m.

+8  A: 

Changing a class into a record will reduce memory usage, but the significance of the savings decreases as the number of fields in the class or record increases. The size difference between a class and the corresponding record is exactly four bytes, which accounts for the VMT pointer that a class holds but which is absent from a record. That difference is usually negligible when you consider the trade-off: To save four bytes, you give up inheritance, polymorphism, data-hiding, and other object-oriented features. (Some of that might be mitigated with Delphi's new "records with methods," but if you only have Delphi 2005, you don't have that feature yet.)
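
To see the difference for yourself, here is a minimal sketch (the TNodeClass/TNodeRecord names are invented for illustration) comparing InstanceSize, a standard TObject class method, with SizeOf on an equivalent record:

program SizeDemo;
{$APPTYPE CONSOLE}

type
  TNodeClass = class
    fName: string;
    fPos: Integer;
  end;

  TNodeRecord = record
    fName: string;
    fPos: Integer;
  end;

begin
  // InstanceSize includes the hidden VMT pointer; SizeOf(record) does not.
  Writeln('class instance: ', TNodeClass.InstanceSize, ' bytes'); // 12 on Delphi 2005
  Writeln('record:         ', SizeOf(TNodeRecord), ' bytes');     // 8
end.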

In fact, if those four bytes really make the difference for your program, then you probably have a bigger problem to solve. That four-byte savings is wiped away simply by adding another node to your tree. With a large enough dataset, it won't matter how small you make any one node since you won't be able to keep them all in memory anyway. You'd need to investigate some kind of caching scheme, so only some nodes are kept in memory and the rest are kept elsewhere, such as in a file or a database.

If you replace your current lists with doubly linked lists of nodes, you'll probably see your memory use increase because now each of your nodes is keeping track of its next and previous neighbors whereas before the TObjectList was managing all that itself.

Rob Kennedy
I don't get it. How is it possible that an instance of TObjectList is smaller than n_records * 2 * sizeof(pointer)? What about all the fields TObjectList inherits from its parent class?
migajek
TObject is 4 bytes. TList adds 12 bytes, and TObjectList adds 4 more, for a total empty TObjectList size of 20 bytes. Internally, TList uses an array of pointers to its elements (an additional 4 bytes per item), so the formula for total memory usage is *y = 4x + 20*. With a doubly linked list, you use two pointers (8 bytes) for *every* node in the list, so the formula is *y = 8x*. Although it starts out lower, its slope is twice that of the TList version, so it grows twice as fast. The break-even point, where the two lines intersect, is at just **5** items.
Rob Kennedy
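
To make that arithmetic concrete, here is a minimal sketch (hypothetical TNode type) of the doubly linked layout the *y = 8x* formula describes:

type
  PNode = ^TNode;
  TNode = record
    // payload fields (name, position, ...) go here
    Prev, Next: PNode; // 8 bytes of links carried by every node,
                       // vs. 4 bytes per item in TList's internal array
  end;
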
+1  A: 

currently the memory consumed by the program is unacceptable

What does unacceptable mean? Have you measured it? What are the facts (number of objects, size of objects, memory used)?

Have you checked with FastMM whether your program has a memory leak? If not, that is the first thing you should do.

If your lists grow often, then perhaps you have a problem with memory fragmentation. Use the Capacity property of the list (if possible). In this situation a linked list can help, but a linked list needs more memory than a TList (if Capacity is used sensibly). See "How to monitor or visualize memory fragmentation of a Delphi application" for more information on how to check this.
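
For example, a minimal sketch of presetting Capacity so the list's internal array is allocated once instead of being repeatedly grown and reallocated:

uses
  Classes;

procedure BuildBigList;
var
  List: TList;
  I: Integer;
begin
  List := TList.Create;
  try
    List.Capacity := 100000; // one allocation up front
    for I := 1 to 100000 do
      List.Add(Pointer(I));  // Add never has to reallocate now
  finally
    List.Free;
  end;
end;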

For Delphi <= 2005 it can be helpful to replace the Borland memory manager with FastMM; it is easy to do.

Lastly, like Rob, I don't think that a change to records will solve your problems.

Heinz Z.
If the data is fairly static, a common CGI trick is to put it in a service that keeps it in shared memory, shares it with all CGI instances, and accepts mutations back over some IPC mechanism.
Marco van de Voort
It's not a web application ;) and the data is not static, unfortunately...
migajek
10 megabytes is nothing in a machine with 2 gigabytes of RAM.
Warren P
10 megabytes is not a problem, but that is only for PHP5. If the user wants code completion for a framework or for his own code, the usage will grow drastically.
migajek
+1  A: 

If you can set limits on your strings, as Kornel says, it really matters. An AnsiString has some internal (header) overhead, plus heap-allocation overhead on top of that. A ShortString, however, is always allocated at its full declared size, even when mostly unused.

If you are really tight on memory, doing your own allocation for the strings is more sensible, especially if the data is relatively immutable: simply allocate one big block and put all the strings in there, each prefixed with a 16-bit length or so.
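
A minimal sketch of that scheme, with invented helper names, assuming a pre-Unicode Delphi where a string element is one byte:

type
  TStringArena = record
    Data: array of Byte; // one big block holding all strings
    Used: Integer;
  end;

// Appends S prefixed with a 16-bit length; returns its offset as a handle.
function ArenaAdd(var A: TStringArena; const S: string): Integer;
var
  Len: Word;
begin
  Len := Word(Length(S));
  if A.Used + 2 + Len > Length(A.Data) then
    SetLength(A.Data, (A.Used + 2 + Len) * 2); // grow geometrically
  Move(Len, A.Data[A.Used], 2);                // 16-bit length prefix
  if Len > 0 then
    Move(S[1], A.Data[A.Used + 2], Len);       // string bytes
  Result := A.Used;
  Inc(A.Used, 2 + Len);
end;

// Reads the string stored at Offset back out of the arena.
function ArenaGet(const A: TStringArena; Offset: Integer): string;
var
  Len: Word;
begin
  Move(A.Data[Offset], Len, 2);
  SetLength(Result, Len);
  if Len > 0 then
    Move(A.Data[Offset + 2], Result[1], Len);
end;

Nodes would then store plain integer offsets instead of string references, which also shrinks the node records themselves.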

Less low-level tricks, like simply deduplicating (some of) the strings, can save a lot of storage space too.

Note that Rob's record-vs-class comparison only holds as described if you manage to instantiate the class statically in memory you allocate very cheaply, which you probably don't do. Records have a further advantage: you can use an array of record. A class, being a reference type, always incurs heap overhead and slack (FastMM allocates with 16-byte granularity).
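
For instance, a minimal sketch (invented TNode type) of allocating all nodes in one block with array of record:

type
  TNode = record
    fName: string;
    fPos: Integer;
  end;

procedure AllocateNodes;
var
  Nodes: array of TNode;
begin
  SetLength(Nodes, 100000); // one heap allocation for all 100,000 nodes
  Nodes[0].fPos := 42;      // direct value access, no per-node heap object
end;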

I would recommend against using TStringList/TList/TObjectList, because insertion and deletion in very large lists (millions of elements) can be painful: both are O(n), and inserting in the middle means shifting half the data. This starts to hurt somewhere between 20-100k and 1M elements, depending on your access pattern.

Using a TList of TLists, and not letting any single TList get too big, is already a good workaround.

When I did this (for an OLAP cluster, back when 2 GB of server memory still cost $2000), at one point I even used the alignment bits in pointers to store the size class of allocations. I wouldn't recommend that, though :-)

Of course going 64-bit with FPC is also an option. I've got the core server part of the above 32-bit solution working in 64-bit in under an hour.

Marco van de Voort
Well, the memory I'm managing is definitely not that big. It is just *unacceptable* compared to what I'm actually storing.
migajek
I saw that in the comments in the meantime. Just make them records, and allocate the strings manually as described in the post above.
Marco van de Voort
+1  A: 

Compared to the vast majority of IDEs, a memory increase of only 10 megabytes for loading all of the PHP5 metadata is actually pretty good.

If it is really worth the effort for you, I'd start with string literal merging.

Make all the strings go into a global table (or dictionary), and point into that table from everywhere the strings are used.
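
A minimal sketch of such a table (the Intern helper is invented for illustration); it relies on Delphi strings being reference-counted, so every caller ends up sharing one heap copy:

uses
  Classes;

var
  Pool: TStringList;

function Intern(const S: string): string;
var
  Idx: Integer;
begin
  if not Pool.Find(S, Idx) then
    Idx := Pool.Add(S);
  Result := Pool[Idx]; // equal strings end up sharing one heap copy
end;

// at startup:
//   Pool := TStringList.Create;
//   Pool.Sorted := True; // Find requires a sorted list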

You can take this a step further, since the PHP 5 language and libraries are pretty static: convert your whole data structure from dynamic data into static constants (using records), and make all the indexes enumeration types.
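
A minimal sketch of that approach; the enum and record fields are invented examples, not the actual PHP metadata schema:

type
  TPhpFunc = (pfStrLen, pfSubstr);
  TFuncInfo = record
    Name: string;
    ParamCount: Integer;
  end;

const
  // The whole table lives in the constant data segment;
  // nothing is allocated on the heap at load time.
  FuncTable: array[TPhpFunc] of TFuncInfo = (
    (Name: 'strlen'; ParamCount: 1),
    (Name: 'substr'; ParamCount: 3)
  );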

What you might be able to do is make all your strings resourcestrings or string constants, and see if the Delphi compiler can do the literal merging for you.

I just noticed that you also load the documentation for all of the PHP5 stuff. That accounts for quite a bit of memory. You might want to load those into compressed streams.
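
A minimal sketch of that idea, assuming the ZLib unit that ships with Delphi (the helper names are invented):

uses
  Classes, ZLib;

// Compresses S into a new in-memory stream; the caller frees the result.
function CompressText(const S: string): TMemoryStream;
var
  Z: TCompressionStream;
begin
  Result := TMemoryStream.Create;
  Z := TCompressionStream.Create(clMax, Result);
  try
    if S <> '' then
      Z.Write(S[1], Length(S));
  finally
    Z.Free; // flushes the remaining compressed data into Result
  end;
end;

// Decompresses a stream produced by CompressText back into a string.
function DecompressText(Src: TStream): string;
var
  Z: TDecompressionStream;
  Buf: array[0..4095] of Char;
  N: Integer;
  Chunk: string;
begin
  Result := '';
  Src.Position := 0;
  Z := TDecompressionStream.Create(Src);
  try
    repeat
      N := Z.Read(Buf, SizeOf(Buf));
      if N > 0 then
      begin
        SetString(Chunk, Buf, N);
        Result := Result + Chunk;
      end;
    until N = 0;
  finally
    Z.Free;
  end;
end;

Since the data is read on every code-completion call, it would be worth measuring whether the decompression cost is acceptable.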

Jeroen Pluimers
Not the full documentation, in fact; just a brief description plus a description of the parameters. This data is retrieved quite often (on each code-completion invocation), so access time is also very important. However, its in-memory representation is too big compared to the plain file size (a text file with class/function names and descriptions), which is around 1.8 MB.
migajek