views:

412

answers:

5

I am running a Perl server with 10 threads. They never get destroyed until the program exits, but this is something I intend to have as much uptime as possible, so that's why this is an issue for me. The threads handle a simple task many times. When I start the server and all the threads are started, I see that I have 288.30 MB free. After a couple iterations through each thread, it reports 285.96 MB free. That isn't so bad..maybe it's just some stack space getting allocated or something during those iterations. But after 15 minutes, the free memory goes down to 248.24 MB! What has happened to my memory?? Now, interestingly, it DOES plateau. It continues to slowly consume but not as quickly as at first. I thought maybe it was my fault so I tried double checking the scope of all my variables and even undefining all of them at the end of the thread loop.

I print out the free space after EVERY iteration of the threads so I can watch it slowly go down. Now what is also interesting is that it doesn't decrease every time. Sometimes the free memory stays the same after iterating.

I am using Perl 5.8.8 built from source on Linux 2.6

Does anybody have any ideas at all or even suggestions as to what may be causing this? I am considering upgrading my Perl to a later version to rule out a memory leak within the Perl core.

UPDATE: Could this be a thread stack size issue? Could I be allocating more memory for the stack than I need. When I create my threads I don't change the setting from the default. Should I? The threads doc says default is generally 16MB depending on system. 16x10 threads=160MB -> which could be the culprit. Thoughts?

UPDATE: I built and installed Perl 5.12.1 and rebuilt the modules and everything. Been running the script for about an hour now and here's what I've noticed. The memory usage is manageable now, but not ideal.

  • At the beginning just after spawning my threads seemed to be a bit lower. Down from ~60-66MB allocated to my 10 threads to ~45-50MB.
  • After some iterations, their usage increased by 3MB total (about the same as before).
  • Up to this point is what I expected. All that memory upfront for the spawning, and then just a little bit for the variables I use in my threads. This is the part I don't like. After running for about 10 minutes, I loose an additional 65MB! Why does it do this? If it already iterated a few times fine with just 3MB, why keep allocating?
  • It's been running for an hour and a half at this point and they aren't using an additional 65MB anymore, it's an additional 84MB!
  • It slowly takes more memory BUT strange thing is that the amount of free memory is not decreasing each iteration. I print out the free memory before and after each iteration and it will stay the same for a while or hover +- around a certain number for a while and then change by 5-10MB all of a sudden. I can't leave this running more than one, two days tops because it starts getting close to 80/90% of my available memory.

Are there any other ideas? Any at all I could try? I am already undef'ing all my variables.

UPDATE: I really want to keep recompiling perl with glibc as a last resort since I have found some reports that on some flavors of linux it will segfault. So since I last posted I explored further the possibility of cycles in my hashes. Found nothing. So I spent the last few days analyzing my subroutine and caching anything that gets used in another iteration. A lot of new stuff gets recreated each time and perl isn't cleaning up all of it even if I explicitly undef it all. So if it won't cooperate, I just won't destroy it. Will see if caching my objects help at all. Will post memory usage stats later.

UPDATE: Hm, very strange. Even after caching my data to be reused later the memory rises at about the same rate. It starts higher now because I'm caching, but then keeps rising even though it's mostly using my cached objects. That's puzzling. Guess it's time to give glibc a try...or else this is just a drawback of choosing perl and will have to live with rebooting the server every couple days.

UPDATE: Tried without the caching, no glibc, again. Works fine for a while, a few hours, then it starts growing. Just wanted you to see a graph.
http://tinypic.com/r/311nc08/3
http://i32.tinypic.com/311nc08.jpg

UPDATE: Here's an excerpt of a log documenting the free memory before and after each thread over about a minute. Maybe this can help someone to understand the problem better. It seems to be stable for a bit and then every so often will eat more memory like this. Here I loose almost 40 MB!

[9:8:30, Fri Jul 23, 2010] [0] Memory usage at end thread 1: 253.812736MB (obj cache: 136)
[9:8:30, Fri Jul 23, 2010] [0] Memory usage at idle thread 1: 253.812736MB (obj cache: 136)
[9:8:34, Fri Jul 23, 2010] [204] Sending data to thread
[9:8:34, Fri Jul 23, 2010] [0] 3 - Creating a new obj
[9:8:34, Fri Jul 23, 2010] [206] Sending data to thread
[9:8:34, Fri Jul 23, 2010] [0] 4 - Creating a new obj
[9:8:35, Fri Jul 23, 2010] [0] Memory usage at end thread 3: 253.812736MB (obj cache: 136)
[9:8:35, Fri Jul 23, 2010] [0] Memory usage at idle thread 3: 253.812736MB (obj cache: 136)
[9:8:35, Fri Jul 23, 2010] [0] Memory usage at end thread 4: 253.812736MB (obj cache: 136)
[9:8:35, Fri Jul 23, 2010] [0] Memory usage at idle thread 4: 253.812736MB (obj cache: 136)
[9:8:41, Fri Jul 23, 2010] [225] Sending data to thread
[9:8:41, Fri Jul 23, 2010] [0] 2 - Creating a new obj
[9:8:42, Fri Jul 23, 2010] [0] Memory usage at end thread 2: 253.681664MB (obj cache: 136)
[9:8:42, Fri Jul 23, 2010] [0] Memory usage at idle thread 2: 253.681664MB (obj cache: 136)
[9:8:47, Fri Jul 23, 2010] [243] Sending data to thread
[9:8:47, Fri Jul 23, 2010] [0] 1 - Creating a new obj
[9:8:48, Fri Jul 23, 2010] [0] Memory usage at end thread 1: 253.935616MB (obj cache: 136)
[9:8:48, Fri Jul 23, 2010] [0] Memory usage at idle thread 1: 253.935616MB (obj cache: 136)
[9:9:1, Fri Jul 23, 2010] [277] Sending data to thread
[9:9:1, Fri Jul 23, 2010] [0] 3 - Creating a new obj
[9:9:2, Fri Jul 23, 2010] [280] Sending data to thread
[9:9:2, Fri Jul 23, 2010] [0] 4 - Creating a new obj
[9:9:2, Fri Jul 23, 2010] [0] Memory usage at end thread 3: 253.935616MB (obj cache: 136)
[9:9:2, Fri Jul 23, 2010] [0] Memory usage at idle thread 3: 253.935616MB (obj cache: 136)
[9:9:3, Fri Jul 23, 2010] [283] Sending data to thread
[9:9:3, Fri Jul 23, 2010] [0] 2 - Creating a new obj
[9:9:4, Fri Jul 23, 2010] [284] Sending data to thread
[9:9:4, Fri Jul 23, 2010] [0] 1 - Creating a new obj
[9:9:4, Fri Jul 23, 2010] [0] Memory usage at end thread 2: 253.935616MB (obj cache: 136)
[9:9:4, Fri Jul 23, 2010] [0] Memory usage at idle thread 2: 253.935616MB (obj cache: 136)
[9:9:5, Fri Jul 23, 2010] [287] Sending data to thread
[9:9:5, Fri Jul 23, 2010] [0] 3 - Creating a new obj
[9:9:5, Fri Jul 23, 2010] [0] Memory usage at end thread 4: 253.93152MB (obj cache: 136)
[9:9:5, Fri Jul 23, 2010] [0] Memory usage at idle thread 4: 253.93152MB (obj cache: 136)
[9:9:6, Fri Jul 23, 2010] [290] Sending data to thread
[9:9:6, Fri Jul 23, 2010] [0] 2 - Creating a new obj
[9:9:7, Fri Jul 23, 2010] [0] Memory usage at end thread 3: 253.804544MB (obj cache: 136)
[9:9:7, Fri Jul 23, 2010] [0] Memory usage at idle thread 3: 253.804544MB (obj cache: 136)
[9:9:7, Fri Jul 23, 2010] [0] Memory usage at end thread 1: 253.804544MB (obj cache: 136)
[9:9:7, Fri Jul 23, 2010] [0] Memory usage at idle thread 1: 253.804544MB (obj cache: 136)
[9:9:9, Fri Jul 23, 2010] [0] 4 - Creating a new obj
[9:9:9, Fri Jul 23, 2010] [301] Sending data to thread
[9:9:9, Fri Jul 23, 2010] [0] 3 - Creating a new obj
[9:9:9, Fri Jul 23, 2010] [302] Sending data to thread
[9:9:9, Fri Jul 23, 2010] [0] 1 - Creating a new obj
[9:9:10, Fri Jul 23, 2010] [0] 3 - Creating a new obj
[9:9:11, Fri Jul 23, 2010] [0] 3 - Creating a new obj
[9:9:11, Fri Jul 23, 2010] [0] Memory usage at end thread 4: 253.93152MB (obj cache: 136)
[9:9:11, Fri Jul 23, 2010] [0] Memory usage at idle thread 4: 253.93152MB (obj cache: 136)
[9:9:12, Fri Jul 23, 2010] [308] Sending data to thread
[9:9:12, Fri Jul 23, 2010] [0] 4 - Creating a new obj
[9:9:13, Fri Jul 23, 2010] [0] Memory usage at end thread 1: 253.804544MB (obj cache: 136)
[9:9:13, Fri Jul 23, 2010] [0] Memory usage at idle thread 1: 253.804544MB (obj cache: 136)
[9:9:14, Fri Jul 23, 2010] [0] Memory usage at end thread 4: 253.804544MB (obj cache: 136)
[9:9:14, Fri Jul 23, 2010] [0] Memory usage at idle thread 4: 253.804544MB (obj cache: 136)
[9:9:14, Fri Jul 23, 2010] [0] Memory usage at end thread 3: 253.93152MB (obj cache: 136)
[9:9:14, Fri Jul 23, 2010] [0] Memory usage at idle thread 3: 253.93152MB (obj cache: 136)
[9:9:15, Fri Jul 23, 2010] [313] Sending data to thread
[9:9:15, Fri Jul 23, 2010] [0] 1 - Creating a new obj
[9:9:16, Fri Jul 23, 2010] [0] Memory usage at end thread 2: 214.482944MB (obj cache: 136)
[9:9:16, Fri Jul 23, 2010] [0] Memory usage at idle thread 2: 214.482944MB (obj cache: 136)
[9:9:16, Fri Jul 23, 2010] [315] Sending data to thread
[9:9:16, Fri Jul 23, 2010] [0] 4 - Creating a new obj
[9:9:17, Fri Jul 23, 2010] [0] Memory usage at end thread 1: 214.355968MB (obj cache: 136)
[9:9:17, Fri Jul 23, 2010] [0] Memory usage at idle thread 1: 214.355968MB (obj cache: 136)
[9:9:18, Fri Jul 23, 2010] [316] Sending data to thread
[9:9:18, Fri Jul 23, 2010] [0] 3 - Creating a new obj
[9:9:18, Fri Jul 23, 2010] [317] Sending data to thread
[9:9:18, Fri Jul 23, 2010] [0] 2 - Creating a new obj
[9:9:18, Fri Jul 23, 2010] [318] Sending data to thread
[9:9:18, Fri Jul 23, 2010] [0] 1 - Creating a new obj
[9:9:19, Fri Jul 23, 2010] [0] Memory usage at end thread 4: 214.355968MB (obj cache: 136)
[9:9:19, Fri Jul 23, 2010] [0] Memory usage at idle thread 4: 214.355968MB (obj cache: 136)
[9:9:19, Fri Jul 23, 2010] [0] Memory usage at end thread 1: 214.355968MB (obj cache: 136)
[9:9:19, Fri Jul 23, 2010] [0] Memory usage at idle thread 1: 214.355968MB (obj cache: 136)
[9:9:20, Fri Jul 23, 2010] [0] Memory usage at end thread 3: 214.482944MB (obj cache: 136)
[9:9:20, Fri Jul 23, 2010] [0] Memory usage at idle thread 3: 214.482944MB (obj cache: 136)
[9:9:20, Fri Jul 23, 2010] [0] Memory usage at end thread 2: 214.482944MB (obj cache: 136)
[9:9:20, Fri Jul 23, 2010] [0] Memory usage at idle thread 2: 214.482944MB (obj cache: 136)

UPDATE (8/12/2010): Just ran it for a day with a new compiled version of perl 5.12 with threads and system malloc. Strangely I get the same behavior. Loose chunks of MB's at a time, slowly. Might try Valgrind to see why I'm loosing it. While I was playing around with something else though I thought of something else. My script creates and destroys (allegedly) many many SSL sockets. Is it possible that a widely used module such as IO::Socket::SSL leaks a little bit? Or maybe OpenSSL? (Using v0.9.8o). Going to try synchronizing access to the SSL module to see if it has any effect, might have problems with threads accessing it.

UPDATE: Tried loading the modules separately within each thread, faster memory usage. Tried locking the areas using the socket functions so only one thread at a time used them, still lost memory the same as before. Increased the number of worker threads from 4 to 10 with the exact same amount of work. Memory didn't last 30min. Leads me to believe it is either a Perl problem internally with its thread implementation, or a stack problem (no pun intended). I tried changing the stack size using the built in thread methods but same result. Going to look for another way. Maybe a lower level way. Increasing the number of threads makes the memory go faster...appears to be something with the threads' stack implementation or the size of the stack

UPDATE (9/15/2010): Found this interesting tidbit in the IO::Socket::SSL doc...

This is due to the fact that a circular reference is required to make IO::Socket::SSL sockets act simultaneously like objects and glob references.

"Circular reference" huh? Another possible explanation is that these sockets are all sticking around for a while even though I explicitly undef'd them. Going to look into Weaken to see if that does anything with the sockets. Will let you know if I find anything interesting.

SOLVED (9/16/2010): See my answer that I posted containing the solution

+2  A: 

There is not enough information here to solve your problem. But for general tips on managing your memory footprint see perlfaq3: How can I make my Perl program take less memory?

mobrule
+14  A: 

Your Perl is 4 and half years old. Upgrade to 5.12. Just search the 5.12 build notes and look at the shit ton of major threads improvements it has that might just magically fix your nebulous problem:

  • 5.12:: @_ and $_ no longer leak under threads (RT #34342 and #41138, also #70602, #70974)'
  • 5.10:: Under ithreads, the regex in PL_reg_curpm is now reference counted. This eliminates a lot of hackish workarounds to cope with it not being reference counted.
  • 5.9:: threads: Several fixes, for example for join() problems and memory leaks. In some platforms (like Linux) that use glibc the minimum memory footprint of one ithread has been reduced by several hundred kilobytes.
  • 5.9:: threads::shared Many memory leaks have been fixed.

I mean the list just goes on when you speak of four years of thead development and the wide range of stuff that can cause this problem, check out the modern changelog of threads::shared

I commented on your post, this would be my next set of suggestions: If you're not using glibc and you're using perl malloc (default), you will never release memory to the OS. The process size will represent the maximum size perl has every taken up. Try rebuilding with glibc malloc (requires recompilation) see if that provides for a different memory profile. Other than that, it will be time to show code.

Evan Carroll
Why was this downvoted? With a question like that, this answer might just be helpful.
Konerak
That is helpful. I downloaded 5.10 on to the server but haven't built it yet. I think I may just get 5.12 and build that one instead now. Still wondering if it could be a thread default stack size issue?
casey
Been running my script now for about an hour on v5.12.1. It was a big step in the right direction thanks. Posted my results onto question
casey
+1  A: 

Try upgrading threads.pm and threads::shared. Upgrading to perl 5.12.1 is also a good idea.

Alexandr Ciornii
yea, I upgraded those modules first. Didn't help much. Just rebuilt perl 5.12.1 and installed all the modules again. Will wait and see what the memory usage looks like in a few hours
casey
+1  A: 

I ran into the same problem with 5.10 and despite numerous protestations that newer Perl didn't leak memory when threads were used, Perl leaked memory when threads were used.

My solution was to use Thread::Pool::Simple to create a thread pool and use that instead of creating new threads. In my case, I wasn't expecting to have a lot of simultaneous thread, nor for them to last for a long while (maybe 30 seconds at most). I don't know if this would be an option for you, but if you really are just having your threads handle simple tasks, it might be.

Phil
Problem with that is I am having ten simultaneous threads always running. I'm not creating or destroying any after the initial 10. So I have no need for a pool manager object like Thread::Pool. But, if what you're saying is that switching to Thread::Pool saved you from leaking memory, then I'll give it a go. My second hesitation is that I must create all my threads at the start and not sure if this allows me to. Maybe you have that kind of info. The memory of the parent process becomes too large and I want to keep the threads as workers only.
casey
My situation had a lot of threads coming and going, rather than lingering, so what worked for me might not work for you. The problem that I had noticed that that creating threads, even if they did no work and exited immediately, would leak a little memory. After a while it starts to add up to real usage.I'm not too familiar with the implementation of Thread::Pool::Simple but it does have 'min' and 'max' options which suggests to me that it creates 'min' threads when you start the pool, but they don't do any work until you call the pool's 'add' method.
Phil
+2  A: 

Finally nailed the leak. First, I'd like to show you the improvement. You shouldn't look at the actual numbers because the user base has increased since the first graph was taken, just look at the difference in slope.

The before memory usage graph: http://i32.tinypic.com/311nc08.jpg
The after memory usage graph: http://i51.tinypic.com/29goill.jpg

For many months I was stuck restarting the server every couple days, but over the last 14 hours, there has been no increase in memory usage.

Every tutorial, example, presentation, and book that I used to help develop the server, all omitted one very very important fact regarding IO::Socket::SSL. And everyone using that module within a threaded app better listen up. Nobody ever emphasized one of the last lines in IO::Socket::SSL's documentation, which caused me to very foolishly assume that any socket I create, like almost any other data structure, will be freed once it goes out of scope (And yes, I do know the exceptions to that rule). I thought I would do everyone a favor and call out the line I am referring to.

...IO::Socket::SSL sockets will stay open until the program ends or you explicitly close them. This is due to the fact that a circular reference is required to make IO::Socket::SSL sockets act simultaneously like objects and glob references.
http://search.cpan.org/dist/IO-Socket-SSL/SSL.pm#LIMITATIONS

I was never made aware of the fact that these sockets had a circular reference in them, and if I didn't know about them, then every one's blogs and books I read don't know either (hence the call-out).

So as you can imagine, this had a very simple fix. Within my thread's work loop where a socket gets created every iteration, I simply placed a eval { close $socket; };undef $socket; at the bottom to ensure it gets closed before processing the next client. I started up my server and waited and watched as the memory usage was solid as you can see in my second graph. So after two months of troubleshooting this issue on and off, I finally came to a solution. I hope this provides some insight to any other amateurs programming with sockets. Thanks to everyone who provided answers/comments/suggestions, every bit helped, and it helped to have a place to bounce ideas off of.

casey
Congrats! Sounds to me like the dev of `IO::Socket::SSL` should `sub CLONE { warn 'read the docs LIMITATION section' }` and require you to opt-out to turn it off. I can't see any reason why a circular ref is ever required. I'd bet this was written by a C hacker gone perl, rather than a perl wiz. File a bug report and push to get it fixed or more feedback added.
Evan Carroll
I would like to add, this sounds misquoted, it says if you have `Scalar::Util` this won't happen. And, you have `Scalar::Util` if you're using `5.8.8` -- and especially if you upgraded. Maybe you should bitch out the jerk bags at Red Hat or whatever distro you use that decided to ship without core modules... Either, that or per your numbers the pod is wrong. I could still see the module being more verbose though.
Evan Carroll
@Evan I installed the standard distro from source, upgraded to 5.12.1. It does indeed come with Scalar::Util, but for some reason it was not helping. I tried upgrading Scalar::Util to the latest version and also installed (as the doc says will also help) WeakenRef. Still had the same problem. Very strange, but finally just manually closing them all before going out of scope did the trick. That's why I left out that bit about those modules, because they didn't seem to help me, but yes, they're supposed to take care of it for you.
casey
Then you didn't just stumble across a small bug, you stumbled across a huge bug. You should supply a test cast and get in contact with the author. You've done a lot of work, I'd hate to see someone else have to do it again.
Evan Carroll