views:

110

answers:

5

I've got a multithreaded app that manipulates in-memory data (no database or network access). I tried this on 2 machines, one machine is Xeon dual quad core CPU, the other is twin dial-cores. 5 threads are spawned.

Then this multithreaded process starts it runs very quickly and the CPU usage is at 60% for 5 cores, the physical memory is 50% of the RAM capacity. (Info from task manager). After it's about 1/3 of the way through it starts to slow down and the CPU utilisation drops to just below 20%. By the time it gets to 2/3s of the way it's so slow that it takes 1 day to complete the last third while it takes half an hour to do the first 1/3.

The process creates many SortedLists and Lists, so I am starting to suspect that the Garbage Collector can't cope, although the Task Manager memory usage is not so bad... I want to try to force the GC to free the unused collections immediately, is this a reasonable or even doable? And why does CPU utilitsation drop?

+1  A: 

Forcing the garbage collector to run is almost always a bad idea. (In some instances, forcing it to run early could actually promote the lifetimes of objects)

Download a tool like Memprofiler, Ants or dotTrace (they all have trial versions), to identify whether you are leaking memory. Are you allocating objects larger than 85Kb?

Also, what version of the OS and .NET Framework are you using? (there were differences in how the server and PC versions of the GC worked)

Also, be aware that insertion into a SortedList is O(N) (whereas a SortedDictionary insertion is O(logN):

The SortedList generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this, it is similar to the SortedDictionary generic class. The two classes have similar object models, and both have O(log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal:

  • SortedList uses less memory than SortedDictionary.

  • SortedDictionary has faster insertion and removal operations for unsorted data, O(log n) as opposed to O(n) for SortedList.

  • If the list is populated all at once from sorted data, SortedList is faster than SortedDictionary.

Ref.

How are you managing multithreaded access to these lists? Can you post some cut-down code?

Mitch Wheat
+1 for forcing the gc
Rob
I am using XP Professional SP2, with .NET 3.5 which is teh same as 2.0 core as far as I am aware. The lists I am building are long, but they contain integers and doubles. The lists themselves are bigger then 85K I am sure.
Misha
Yes, this is the right direction to go. I need to download a profiler and figure out how to use it, o/w its just poking around in the dark.
Misha
@Misha: You're right - don't poke in the dark. On the other hand, profilers may not be much help. The method I use is stackshots (http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024). If the problem is lock contention, you'll find that out right away.
Mike Dunlavey
+1  A: 

I guess adding lots of items to a heavily loaded collection isn;t as performant as it could be. I noticed something similar with an old SQL query - 100 records in the recordset was quick, but half a million records slowed things down exponentially.

To check the GC, run up perfmon and view (or log) the performance counters for the garbage collector and memory allocations.

gbjbaanb
This is true for many scenarios, but here the size of the list does not increase across the iterations - it gets discarded and rebuilt everytime, so its' length does not increase... I need to look into perfmon
Misha
Actually, Im lying - there is a results list thats used across the threads, but its only a thosand items long, so I don;t think it's the issue.
Misha
@Misha: it could be, if all threads are trying to access it often.
Mitch Wheat
+1 for perfmon. Add a counter to perfmon to check time % spent in GC for your process.
JulianR
A: 

Sounds like a data structure locking issue. It's difficult to say without knowing exactly what you're doing.

Try using one of the lock-free non-contiguous collections such as ConcurrentDictionary or ConcurrentBag and/or a proper queue like BlockingCollection.

Stephen Cleary
A: 

You are more than likely using all you physical memory up with your data and Windows starts to use virtual memory after that which is much slower. You should try a memmory profiler to see which object are takeing up all your memmory, and consider disposing of some of those objest periodically to keep from using up all your RAM.

JohnEgbert
A: 

60% CPU on 5 cores from 5 threads. I assume that is 60% on each core. This is actually very bad. You cannot drive the CPU to 100% doing memory operation alone (no Database, no network, no file IO) it means your contention on locking is huge. As the program progresses, your structures presumably grow in size (more elements in some list/dictionaries), you are holding locks for longer, and the result is less CPU and even slower performance.

Is hard to tell w/o any real performance data, but this does not look like GC related. It looks more like high contention in the data structures. You should trace your app under profiler and see where the CPU/wait times are spent most. See Pinpoint a performance issue using hotpath in Visual Studio 2008 for a quick introduction to the sampling profiler.

Remus Rusanu
Remus, this is very helpful. I can't see the Call Tree View in VS2008? Do I need to download it from MS first?
Misha
I think is only in VSTS 2008, and Ultimate and Premium 2010. Standard 2008/2010 don't have it. http://msdn.microsoft.com/en-us/library/ms182372.aspx
Remus Rusanu