Basically, I have a list of 30,000 URLs. The script goes through the list, downloads each page (with a 3-second delay in between), and stores the HTML in a database.

And it loops and loops...

Why does it randomly get "Killed."? I didn't touch anything.

Edit: this happens on 3 of my Linux machines. The machines are on a Rackspace cloud with 256 MB of memory. Nothing else is running on them.
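
A minimal sketch of the kind of loop in question (the names and structure here are illustrative, not the actual script):

    import time
    import urllib.request

    def fetch_all(urls, store_html):
        # Download each URL, hand the HTML to the database layer, then wait 3 seconds.
        for url in urls:
            with urllib.request.urlopen(url) as response:
                html = response.read()
            store_html(url, html)  # hypothetical hook: INSERT the page into the database
            time.sleep(3)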

+1  A: 

Is it possible that it's hitting an uncaught exception? Are you running this from a shell, or is it being run from cron or in some other automated way? If it's automated, the output may not be displayed anywhere.
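
One way to rule that out is to make sure an uncaught exception leaves a trace even when there is no terminal attached -- a sketch, assuming a single entry point named main():

    import logging
    import traceback

    logging.basicConfig(filename="fetcher.log", level=logging.ERROR)

    try:
        main()  # hypothetical entry point of the download loop
    except Exception:
        # Under cron there may be no visible stderr; log the traceback somewhere durable.
        logging.error("uncaught exception:\n%s", traceback.format_exc())
        raise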

Graeme Perrow
+8  A: 

Looks like you might be running out of memory -- that can easily happen in a long-running program if you have a "leak" (e.g., due to accumulating circular references). Does Rackspace offer any easily usable tools to track a process's memory, so you can confirm whether this is the case? Otherwise, this kind of thing is not hard to monitor with normal Linux tools from outside the process. Once you have determined that "out of memory" is the likely cause of death, Python-specific tools such as pympler can help you track exactly where the problem is coming from, and thus how to avoid those references -- be it by changing them to weak references or by other, simpler approaches -- or otherwise remove the leaks.
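
For example, a minimal sketch with pympler's SummaryTracker, which prints a diff of the objects allocated since the previous call (the leak here is simulated):

    from pympler import tracker

    memory_tracker = tracker.SummaryTracker()

    leaky = []
    for _ in range(3):
        leaky.append("x" * 10**6)    # simulate a leak: strings that are never released
        memory_tracker.print_diff()  # objects created since the last call, by type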

Alex Martelli
I think it is running out of memory, right?

    Mem:  262364k total, 258264k used,   4100k free,    884k buffers
    Swap: 524280k total, 285204k used, 239076k free,  14568k cached
TIMEX
The swap usage keeps going up and up.
TIMEX
@alex, so it definitely looks like a "leak". Besides pympler, which I already suggested, try guppy -- http://guppy-pe.sourceforge.net/ -- they can help you pinpoint **where** all that memory's going (looking at your code, which you posted as another question, without knowing about all the third-party libraries you're using, is nowhere near as helpful!).
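
A guppy sketch, to be called from inside the suspect loop:

    from guppy import hpy

    heap_inspector = hpy()
    print(heap_inspector.heap())  # live objects grouped by type, biggest consumers first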
Alex Martelli
+1  A: 

In cases like this, you should check the log files.

I use Debian and Ubuntu, so the main log file for me is: /var/log/syslog

If you use Red Hat, I think that log is: /var/log/messages

If something happens that is as exceptional as the kernel killing your process, there will be a log event explaining it.

I suspect you are being hit by the Out Of Memory Killer.
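
If you want a quick check, something like this scans the log for the usual OOM-killer phrasing (adjust LOG_PATH for your distro; the patterns are typical, not exhaustive):

    import re

    LOG_PATH = "/var/log/syslog"  # /var/log/messages on Red Hat-style systems

    oom_pattern = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)
    with open(LOG_PATH, errors="replace") as log:
        for line in log:
            if oom_pattern.search(line):
                print(line.rstrip())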

steveha
+1  A: 

Are you using some sort of queue manager or process manager? I got apparently random "Killed" messages when the batch queue manager I was using sent SIGUSR2 when the time limit was up.
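
If a manager is signalling your process, installing handlers for the catchable signals will at least leave a trace; a sketch (note that SIGKILL, which the OOM killer sends, cannot be caught):

    import signal
    import sys

    def log_signal(signum, frame):
        # Record which signal arrived before exiting, so the death is no longer silent.
        sys.stderr.write("received signal %d, exiting\n" % signum)
        sys.exit(1)

    # SIGKILL cannot be trapped; these cover what a queue manager would typically send.
    for sig in (signal.SIGTERM, signal.SIGUSR1, signal.SIGUSR2):
        signal.signal(sig, log_signal)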

Otherwise I strongly favor the out-of-memory explanation.

Stefano Borini