views:

441

answers:

5

I have an app that loads records from a binary log file and displays them in a virtual TListView. There are potentially millions of records in a file, and the display can be filtered by the user, so I do not load all of the records in memory at one time, and the ListView item indexes are not a 1-to-1 relation with the file record offsets (List item 1 may be file record 100, for instance). I use the ListView's OnDataHint event to load records for just the items the ListView is actually interested in. As the user scrolls around, the range specified by OnDataHint changes, allowing me to free records that are not in the new range, and allocate new records as needed.

This works fine, speed is tolerable, and the memory footprint is very low.

I am currently evaluating TVirtualStringTree as a replacement for the TListView, mainly because I want to add the ability to expand/collapse records that span multiple lines (I can fudge it with the TListView by incrementing/decrementing the item count dynamically, but this is not as straight forward as using a real tree).

For the most part, I have been able to port the TListView logic and have everything work as I need. I notice that TVirtualStringTree's virtual paradigm is vastly different, though. It does not have the same kind of OnDataHint functionality that TListView does (I can use the OnScroll event to fake it, which allows my memory buffer logic to continue working), and I can use the OnInitializeNode event to associate nodes with records that are allocated.

However, once a tree node is initialized, it sees that it remains initialized for the lifetime of the tree. That is not good for me. As the user scrolls around and I remove records from memory, I need to reset those non-visual nodes without removing them from the tree completely, or losing their expand/collapse states. When the user scrolls them back into view, I can re-allocate the records and re-initialize the nodes. Basically, I want to make TVirtualStringTree act as much like TListView as possible, as far as its virtualization is concerned.

I have seen that TVirtualStringTree has a ResetNode() method, but I encounter various errors whenever I try to use it. I must be using it wrong. I also thought of just storing a data pointer inside each node to my record buffers, and I allocate and free memory, update those pointers accordingly. The end effect does not work so well, either.

Worse, my largest test log file has ~5 million records in it. If I initialize the TVirtualStringTree with that many nodes at one time (when the log display is unfiltered), the tree's internal overhead for its nodes takes up a whopping 260MB of memory (without any records being allocated yet). Whereas with the TListView, loading the same log file and all the memory logic behind it, I can get away with using just a few MBs.

Any ideas?

A: 

You shouldn't use ResetNode because this method invokes InvalidateNode and initializes node again, leading to opposite effect than expected. I don't know if it's possible to induce VST to free memory size specified in NodeDataSize without actually removing node. But why not set NodeDataSize to size of Pointer ( http://stackoverflow.com/questions/2288461/delphi-virtualstringtree-classes-objects-instead-of-records ) and manage data yourself? Just an idea...

That is what I do right now. In fact, I am currently using NodeDataSize=0, but the app still uses up 200+ MB of memory. This is because the bare minimum size of TVirtualNode itself is 44 bytes, so multiplied by 5 million nodes is (44 x 5000000) = 220000000 = 214843.75 Kb = 209.81 MB, and that is just for the TVirtualNode memory alone, not including the overhead that the VCL memory manager uses on top of that.
Remy Lebeau - TeamB
I think it's possible in VST that child nodes looks exactly like their parent node so it's imperceptible to user that they are child nodes. You could add five thousands top-level nodes which every has one thousand child nodes.
A: 

If I understand it correctly, the memory requirement of TVirtualStringTree should be:

nodecount * (SizeOf(TVirtualNode) + YourNodeDataSize + DWORD-align-padding)

To minimize the memory footprint, you could perhaps initialize the nodes with only pointers to offsets to a memory-mapped file. Resetting nodes which have already been initialized doesn't seem necessary in this case - the memory footprint should be nodecount * (44 + 4 + 0) - for 5 million records, about 230 MB.

IMHO you can't get any better with the tree but using a memory-mapped file would allow you to read the data directly from the file without allocating even more memory and copying the data to it.

You could also consider using a tree structure instead of a flat view to present the data. That way you could initialize child nodes of a parent node on demand (when the parent node is expanded) and resetting the parent node when it's collapsed (therefore freeing all its child nodes). In other words, try not to have too many nodes at the same level.

TOndrej
In fact, I already do use a memory mapped file, but due to the possible size of the file (can be up to 1+ GB), and the way the display filtering is implemented, I do not map the entire file into memory at one time. As for implementing a tree structure in code, that will not help very much as most of the log file items will not have child nodes. That is why a ListView is currently being used for the display. Supporting child nodes is to accomodate a feature where multi-line log entries are stored in the log file as separate records that I would merge together in memory for easier management
Remy Lebeau - TeamB
+1  A: 

You probably shouldn't switch to VST unless you have a use for at least some of the nice features of VST that a standard listbox / listview don't have. But there is of course a large memory overhead compared to a flat list of items.

I don't see a real benefit in using TVirtualStringTree only to be able to expand and collapse items that span multiple lines. You write

mainly because I want to add the ability to expand/collapse records that span multiple lines (I can fudge it with the TListView by incrementing/decrementing the item count dynamically, but this is not as straight forward as using a real tree).

but you can implement that easily without changing the item count. If you set the Style of the listbox to lbOwnerDrawVariable and implement the OnMeasureItem event you can adjust the height as required to draw either only the first or all lines. Drawing the expander triangle or the little plus symbol of a tree view manually should be easy. The Windows API functions DrawText() or DrawTextEx() can be used both to measure and draw the (optionally word-wrapped) text.

Edit:

Sorry, I completely missed the fact that you are using a listview right now, not a listbox. Indeed, there is no way to have rows with different heights in a listview, so that's no option. You could still use a listbox with a standard header control on top, but that may not support everything you are using now from listview functionality, and it may itself be as much or even more work to get right than dynamically showing and hiding listview rows to simulate collapsing and expanding.

mghie
I currently use a TListView in vsReport mode, not a TListBox. TListView does not support variable-height items in vsReport.
Remy Lebeau - TeamB
I'm marking this one as the answer for now, but only because it tells me to not switch to VST.
Remy Lebeau - TeamB
A: 

Give "DeleteChildren" a try. Here's what this procedure's comment says:

// Removes all children and their children from memory without changing the vsHasChildren style by default.

Never used it, but as I read it, you can use that in the OnCollapsed event to free the memory allocated to nodes that just became invisible. And then re-generate those nodes in OnExpading so the user never knows the node went away from memory.

But I can't be sure, I never had a need for such behaviour.

Cosmin Prund
DeleteChildren() does not address the concerns I have. Most of the nodes would not have any child nodes to begin with. I was hoping to add support for the few ones that would. What I need is to free memory for the top-level nodes that become invisible while scrolling.
Remy Lebeau - TeamB
All visible nodes in TVirtualTree require an TVirtualNode because all the events used to get stuff related to the nodes (ex: CellText) expect an PVirtualNode as the only way to identify the Node. If that's a problem for you then you might want to look into the TVTNodeMemoryManager (and maybe hack it to be backed by a Memory Mapped File) OR go back to your initial ListView solution as suggested by mghie. And there's always the "100% custom control" solution, considering your very special needs! Making your own might be faster then hacking the alternatives.
Cosmin Prund
Although a custom control would be nice, I don't have that kind of spare time to develop one. I'll look at TVTNodeMemoryManager, but I don't think it will help much since all of the physical TVirtualNode items still have to be allocated in order for the tree to function.
Remy Lebeau - TeamB
A: 
joe snyder
You are adjusting the grid's RowCount based on how many rows need to be added/removed. I stated in my original message that I already knew about that technique: "I can fudge it with the TListView by incrementing/decrementing the item count dynamically". That is the route I am going to have to take.
Remy Lebeau - TeamB
@Remy: but you also said that was "not straight forward". with a drawgrid, expand/collapse is about as simple as you can get, plus you never have any loading/allocating/freeing/nodes/parents/children/etc to worry about. if listview works but is messy, and drawgrid works and is simple, go with the drawgrid. be sure to try the sample code. if you compare it side-by-side to the code necessary for the same functionality in listview, i think most coders would find the drawgrid solution easier to follow/maintain/modify/etc.
joe snyder
Unfortunately, due to the dynamic nature of my data loading and memory management, it is not a simple solution whether I use a ListView or a DrawGrid (also, being a custom control, a DrawGrid has extra internal overhead that a standard ListView does not).
Remy Lebeau - TeamB
@Remy: sorry,i can't guess what data/memory issues prevent using a simple drawgrid. it sounded like all u needed to do was browse a filtered log file. i would just make RecordContents() seek and read the record of interest, possibly via an index, and never load or manage anything. i also don't know what u mean by "custom" or "extra overhead". to me, a listview is like a stringgrid, but a drawgrid lets you avoid overhead + do jit data fetches. i use it for instant browsing of a file of 50 million recs, with no loading/memory headaches at all. what does the sample code not do for you?
joe snyder
When dealing with multi-GB files, doing random seeks and reads on individual records of the file can very slow, especially if the seeks are large. My code has a complex buffering system so only a portion of the file data is accessed and displayed onscreen at a time (I also have to take into account that multi-line text spans multiple records in the log file, so the buffering code takes care of putting them back together)...
Remy Lebeau - TeamB
... This works very well with very little overhead in my app right now, but as the cost that I do not get to expand/collapse multi-line text because I have to display each line in a separate ListView item instead of grouping them together.
Remy Lebeau - TeamB
TDrawGrid is a TCustomControl descendant, which means it is implemented completely from scratch by Borland, so all of its overhead is in the VCL. And I would have the overhead of having to load all of my data at one time, or doing slow seek+read operations. TListView is a wrapper around the standard Win32 API ListView control, so all its overhead is inside the OS, and its virtual mode is optimized for handling large amounts of data. Your sample code does not actually deal with any file I/O. Try adding that and see how well it performs with millions of records.
Remy Lebeau - TeamB
@Remy: you would think "seeks and reads" would be slow, but they're really not for humans browsing logs. try the sample code and you'll see.if you like "complex buffering", use it, but such code should be independent and segregated from whatever control is used."I do not get to expand/collapse". note that the sample code expands/collapses with a double-click. try it.(continued...)
joe snyder
@Remy: your drawgrid vs. listview comparison isn't convincing. drawgrid is open source, with no preload requirement, and simple. your claim seeks and reads would be too slow is a guess, not a measurement. data access is all done from one simple RecordContents() routine, so any buffering or index method can still be bolted on without changing the component.listview is closed source, more complicated, and poses multiline problems.drawgrid sounds better.(continued...)
joe snyder
@Remy: "Your sample...does not actually deal with any...I/O" because i don't know the format of your log file. go ahead, add the 2 lines of code necessary to fetch your records: a seek and a read."see how well it performs with millions of records". like i said, i'm using drawgrid to browse 50 million records, and it works great.i can lead you to the water, but i can't make u drink. try the sample to see the collapsing/exanding on fake lines. then stick a seek and a read in RecordContents() to directly grab your actual log records. i think you'll be surprised how nicely it works.
joe snyder