views:

172

answers:

1

I have an old structure class like this: typedef vector<vector<string>> VARTYPE_T; which works as a single variable. This variable can hold anything from a single value over a list up to table-like data. Most values are long, double, string or double[3] for coordinates (x,y,z). I just convert them as needed. The variables are managed in a map like this: map<string,VARTYPE_T *>, where the string holds the variable name. Sure, they are wrapped in classes. Also I have a tree of nodes, where each node can hold one of these variable maps.
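
For reference, a minimal sketch of the structure described above (everything except VARTYPE_T is a made-up name for illustration):

    #include <map>
    #include <string>
    #include <vector>
    using namespace std;

    // One "variable": a 2D grid of strings that may represent a single
    // value, a list (one column) or a table.
    typedef vector<vector<string> > VARTYPE_T;

    // Variables are looked up by name.
    typedef map<string, VARTYPE_T *> VARMAP_T;

    // Each node of the tree can carry one variable map (sketch only).
    struct Node {
        VARMAP_T        vars;
        vector<Node *>  children;
    };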

Using VS 2008 SP1 for this, I see a lot of memory fragmentation. Comparing against STLport, STLport seemed to be faster (about 20%) and to use less memory (about 30%, for my test cases).

So the question is: what is the best implementation to meet this requirement with fast and properly used memory? Should I write my own allocator, e.g. a pool allocator? How would you do this?

Thanks in advance,

Howie

A: 

Change typedef vector<vector<string>> VARTYPE_T; to typedef deque<deque<string>> VARTYPE_T; and check if there is still memory fragmentation.
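
In other words (a minimal sketch; only the typedef changes, the surrounding code keeps using the same interface):

    #include <deque>
    #include <string>
    using namespace std;

    // deque allocates its storage in fixed-size chunks instead of one
    // contiguous block, which tends to fragment the heap less when the
    // container grows and shrinks a lot. push_back, operator[] and
    // iteration work the same way as with vector.
    typedef deque<deque<string> > VARTYPE_T;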

By the way, how do you measure it with VS 2008 SP1?

Update: I know how HP handles memory fragmentation on HP-UX. After doing some searching I found an interesting link, Low-fragmentation Heap. This is a quote: "The low-fragmentation heap (LFH) helps to reduce heap fragmentation." To me it is interesting since, judging from its description, it resembles HP's approach to memory fragmentation. I think (besides using deque) it is another good idea to try and check.
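
If you want to try the LFH explicitly, a minimal sketch of enabling it for the CRT heap could look like the following (note that the LFH is typically not enabled while the process runs under a debugger):

    #include <windows.h>
    #include <malloc.h>   // _get_heap_handle (MSVC CRT)

    // Ask Windows to switch the CRT heap (the one operator new / malloc
    // allocate from) to the low-fragmentation heap; the value 2 selects
    // the LFH for HeapCompatibilityInformation.
    bool EnableLowFragmentationHeap()
    {
        ULONG lfh = 2;
        HANDLE crtHeap = (HANDLE)_get_heap_handle();
        return HeapSetInformation(crtHeap,
                                  HeapCompatibilityInformation,
                                  &lfh, sizeof(lfh)) != FALSE;
    }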

Update 2: 1) Speed. You have told nothing about speed before, so I don't have any guidance for giving you good advice. I also don't know where your program spends most of its time while executing.

If you think that it mainly spends time converting strings to longs and doubles, then you should do this conversion only once and reuse the result when accessing variable values. It is probably a good idea to store not strings but the real values, for example in a union or a Boost.Variant. If you think that the variable lacks indexes that allow quick access to its values, you should add these indexes.
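
For example, with Boost.Variant a cell could hold the already-converted value directly (a sketch; the cell type and the coordinate struct are made-up names):

    #include <boost/variant.hpp>
    #include <string>

    struct coord_t { double x, y, z; };   // hypothetical (x,y,z) type

    // One cell stores the value in its "real" type; the conversion from
    // the original string happens once, when the cell is filled.
    typedef boost::variant<long, double, std::string, coord_t> cell_t;

    // Access without re-parsing: boost::get throws bad_get on a type mismatch.
    double as_double(const cell_t &c) { return boost::get<double>(c); }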

2) Memory usage. Are you actually writing a server application that runs for days and has to care a lot about its memory consumption? Then, as I already said, use the Low-fragmentation Heap on Windows and try to keep block sizes smaller than 16K. Also get rid of memory leaks. Of course verify it, but as far as I understand the Low-fragmentation Heap should handle your calls to new.

3) If you write a server application, then std::vector is sometimes not a good choice. It allocates its memory in one contiguous block; if that block is larger than 16K, this might result in memory fragmentation.
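
Since the question mentions writing a pool allocator: before writing your own, a ready-made one such as Boost.Pool can be dropped into containers that churn through many small, equally sized allocations. A minimal sketch (the container choice here is just an example):

    #include <boost/pool/pool_alloc.hpp>
    #include <list>
    #include <string>

    // fast_pool_allocator carves node-sized pieces out of larger chunks,
    // so many small allocations of the same size do not scatter across
    // the heap. Note: the strings themselves still use the default allocator.
    typedef std::list<std::string,
                      boost::fast_pool_allocator<std::string> > name_list_t;

    void example()
    {
        name_list_t names;
        names.push_back("x");
        names.push_back("y");
    }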

4) Finally, here is what a good HP-UX report about memory allocations looks like (useful for finding memory fragmentation):

Actual Heap Usage:
    Heap Start  =   0x60000000000fea38
    Heap End    =   0x6000000026580000
    Heap Size   =   642258376 bytes

Outstanding Allocations:
    251948524 bytes allocated in 4239425 blocks
                              Detailed Report

-------------------------------------------------------------------------
65343264 bytes in 1361318 blocks (25.94% of all bytes allocated)
These range in size from 48 to 48 bytes and are allocated
#0  stlp_std::__malloc_alloc::allocate(unsigned long&)   from ./libstlport.so.5.1
#1  boost::multi_index::detail::ordered_index<boost::multi_index::identity<csubs::clnt_tax_hist_t>, stlp_std::less<csubs::clnt_tax_hist_t>, boost::multi_index::detail::nth_layer<1, csubs::clnt_tax_hist_t, boost::multi_index::indexed_by<boost::multi_index::ordered_non_unique<boost::multi_index::identity<csubs::clnt_tax_hist_t>, mpl_::na, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, hrs_allocator::counting_allocator<csubs::clnt_tax_hist_t, (hrs_allocator::allocator_enums)9> >, boost::mpl::vector0<mpl_::na>, boost::multi_index::detail::ordered_non_unique_tag>::insert(csubs::clnt_tax_hist_t const&)   at _alloc.h:381



-------------------------------------------------------------------------
47510512 bytes in 848402 blocks (18.86% of all bytes allocated)
These range in size from 56 to 56 bytes and are allocated
#0  stlp_std::__malloc_alloc::allocate(unsigned long&)   from ./libstlport.so.5.1
#1  csubs::cache_impl<csubs::subs_data_t, csubs::search_policy_range<false>, csubs::erase_policy_range, hrs_allocator::counting_allocator<csubs::subs_data_t, (hrs_allocator::allocator_enums)5>, boost::multi_index::multi_index_container<csubs::subs_data_t, boost::multi_index::indexed_by<boost::multi_index::ordered_non_unique<boost::multi_index::identity<csubs::subs_data_t>, mpl_::na, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, hrs_allocator::counting_allocator<csubs::subs_data_t, (hrs_allocator::allocator_enums)5> > >::put(csubs::subs_data_t const&)   at _alloc.h:381
#2  csubs::db_cache_loader<csubs::cache_impl<csubs::subs_data_t, csubs::search_policy_range<false>, csubs::erase_policy_range, hrs_allocator::counting_allocator<csubs::subs_data_t, (hrs_allocator::allocator_enums)5>, boost::multi_index::multi_index_container<csubs::subs_data_t, boost::multi_index::indexed_by<boost::multi_index::ordered_non_unique<boost::multi_index::identity<csubs::subs_data_t>, mpl_::na, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, hrs_allocator::counting_allocator<csubs::subs_data_t, (hrs_allocator::allocator_enums)5> > >, csubs::ctrl_loader_nocheck>::operation()   at cache_subs_loaders.h:121
#3  csubs::group_subs_cache::load_caches_from_db(db_date const&, csubs::cache_options const&, otl_connect&, csubs::cache_stat*)   at ./caches/cache_subs_caches.cpp:1452

and so on.

skwllsp
This is how I get the memory usage:

    #include <windows.h>
    #include <psapi.h>

    // Returns the current working set size in KB, or -1.0 on failure.
    double MemoryGetWorkingSetSize()
    {
        DWORD curPro = GetCurrentProcessId();
        HANDLE hProcess;
        PROCESS_MEMORY_COUNTERS pmc;
        hProcess = OpenProcess( PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                                FALSE, curPro );
        if (NULL == hProcess)
            return -1.0;
        if ( GetProcessMemoryInfo( hProcess, &pmc, sizeof(pmc) ) )
        {
            double rc = pmc.WorkingSetSize / 1024.0;   // bytes -> KB
            CloseHandle( hProcess );
            return rc;
        }
        CloseHandle( hProcess );
        return -1.0;
    }
howieh
Your function returns the WorkingSetSize in KB. Why do you talk about `memory fragmentation`?
skwllsp
Ok, I am using the process viewer too. Howie
howieh
Sorry, I don't get your point about the process viewer. Is it capable of telling you how exactly memory is allocated by malloc()?
skwllsp
Maybe this image can give a better overview. Forget the old structure. What is the best approach for a redesign? http://img696.imageshack.us/img696/7633/cellstructure.jpg Thanks, Howie
howieh
What is the goal of the redesign? Still getting rid of memory fragmentation? Or something else?
skwllsp
We want to use this stuff more widely in a project and want a good (maybe better than the existing) approach for speed and memory usage. This is old code. Today nobody would implement it the way it is (directly inheriting from STL classes etc.). Also, the implementation should be completely wrapped and hidden from users. So if we do a lot of new, we should think about it.
howieh
You have so many questions here.
skwllsp
@skwllsp Thanks. First: I changed from vector to deque with no really significant difference. But I understand the main difference: vector allocates one block and deque does not. Second: I tried a lot of simple tests and it comes out as you wrote: the conversion between string and long/double takes a lot of time. I also tried a union where, instead of a string, a void pointer is available. So this is the best (tested) union cell structure for me now: long m_long; double m_double[3]; void *m_vPointer;
howieh
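
A minimal sketch of the kind of tagged union cell described in the comment above (the tag enum is an assumption added for illustration; the original comment only lists the union members):

    // Each cell stores one of: a long, a coordinate triple, or a pointer
    // to an attached string / complex structure.
    struct cell_t {
        enum { IS_LONG, IS_COORD, IS_POINTER } tag;   // assumed discriminator
        union {
            long    m_long;
            double  m_double[3];
            void   *m_vPointer;   // e.g. points to a string owned by the wrapper
        };
    };
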
With this pointer I can attach a string or other complex structures. I will think about it. Howie
howieh
I also tested boost::any and boost::variant. The simple structure above is always faster. boost::any has a smaller memory footprint but is 3 times slower. Thanks
howieh
I see that you care a lot about speed. You haven't said anything about your task, but perhaps you might benefit from using threads in your application. Of course it is only worthwhile if you have a multicore processor.
skwllsp
I redesigned that part of the app with my approach. Result: 40% less memory usage, 40-50% more speed. @skwllsp Threads are not easy in this kind of software. But you are right, maybe it is worth testing this.
howieh