I am writing a Python program that seems to be leaking memory.

The program takes a list of URLs and makes sure each one returns status code 200. If the status code is not 200, the script alerts me via email. The script is threaded so that the URLs can be checked in parallel.

I have set the program up as a scheduled task on one of our servers, running every 5 minutes. Since then, the server's physical memory has been fully consumed. The server is running Windows Server 2008 and Python 2.6.

Where is the memory leak?

The following code uses the thread class from UrlChecker.py (also included below):

    from ConfigParser import ConfigParser
    import re

    from UrlChecker import UrlCheckerThread
    from Logger import Logger
    from classes.EmailAlert import EmailAlert

    ... {More Code is here} ...

    urls = cfg.items('urls')

    defaulttimeout = int(cfg.get('timeout', 'default', 0))

    threadList = []

    for name, url in urls:
        m = re.search("\([0-9]*\)", name)          
        s = m.start() + 1
        e = m.end() - 1
        name = name[s:e]

        checker = UrlCheckerThread(url, name)
        threadList.append(checker)
        checker.start()

    for threads in threadList:
        threads.join()

    for x in threadList:
        status = x.status
        url = x.url
        name = x.name
        runtime = x.runtime

        """
        If there is an error, put information in a dict for further
        processing. 
        """

        if(status != None and status != 200 or runtime >= defaulttimeout):
            self.logDict[name]= (name, url, status, runtime)

UrlChecker.py

    import socket
    from threading import Thread, Lock
    from urllib2 import Request, urlopen
    from ConfigParser import ConfigParser
    from TimeoutController import TimeoutController
    from classes.StopWatch import StopWatch

    class UrlCheckerThread(Thread):
        lock = Lock()
        threadId = 0

        def __init__(self, url, name):
            Thread.__init__(self)
            self.url = url
            self.name = name
            self.cfg = ConfigParser()
            self.cfg.read('c:\Websites\ServerManager\V100\webroot\Admin\SiteMonitor\config.cfg')
            self.thisId = UrlCheckerThread.threadId
            self.extendedTimeout = int(self.cfg.get('timeout', 'extended', 0))
            self.tc = TimeoutController()
            self.tc.setTimeout(self.extendedTimeout)
            UrlCheckerThread.threadId += 1

        def run(self):
            """
            getHeader uses urlopen to check whether a website is online or not
            """
            self.sw = StopWatch()
            self.sw.start()
            self.checker = UrlChecker()
            UrlCheckerThread.lock.acquire()
            self.status = self.checker.getStatus(self.url)
            self.sw.stop()
            self.runtime = self.sw.time()
            """
            if(isinstance(self.status, socket.timeout)):
                self.tc.setTimeout(self.extendedTimeout)
                self.status = self.checker.getStatus(self.url)
                if(self.status == 200):
                    self.status = 'short time out'
                self.tc.setTimeout(self.defaultTimeout)
            """
            UrlCheckerThread.lock.release()

    class UrlChecker:

        def getStatus(self, url):
            """
            getHeader uses urlopen to check whether a website is online or not
            """
            request = Request(url, None)
            try:
                urlReq = urlopen(request)

                """
                getcode() returns the HTTP status header, which should be 200
                in most cases.
                """
                return urlReq.getcode()
            except IOError, e:
                if hasattr(e, 'reason'):
                    """
                    e.reason returns an IOError object, which cannot be just
                    inserted in the database. The IOError object is basically
                    a 2-tuple with an error number and an error string.
                    Since an error number is less readable than a string,
                    we use e.reason.strerror to just return IOError's string
                    """
                    return e.reason.strerror
                elif hasattr(e, 'code'):
                    """
                    e.code is an int object, which is perfectly fine to insert in
                    the database. So no further modification needed.
                    """
                    return e.code

Thank You!

A: 

You're opening the config file in every thread, which takes some memory.
How many URLs are you checking?
Which ConfigParser implementation are you using?
Are you sure that each thread joins?
Do the batch runs finish before the next scheduled run?
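
On the first point, one option is to read the config once in the main script and hand the parsed value to each thread, so no thread keeps its own parser. The following is only a minimal sketch: `CheckerThread`, `load_extended_timeout` and `build_checkers` are hypothetical names that do not appear in the question, and the actual URL check is left out.

```python
import threading

try:
    from configparser import ConfigParser   # Python 3
except ImportError:
    from ConfigParser import ConfigParser   # Python 2, as in the question


class CheckerThread(threading.Thread):
    """Hypothetical stand-in for UrlCheckerThread; takes its timeout as an argument."""

    def __init__(self, url, name, timeout):
        threading.Thread.__init__(self)
        self.url = url
        self.name = name
        self.timeout = timeout
        self.used_timeout = None

    def run(self):
        # The real check (urlopen in the question) would go here and use
        # self.timeout; this sketch only records the value it was given.
        self.used_timeout = self.timeout


def load_extended_timeout(cfg_path):
    """Parse the config file once and return the [timeout] 'extended' value."""
    cfg = ConfigParser()
    cfg.read(cfg_path)
    return cfg.getint('timeout', 'extended')


def build_checkers(urls, timeout):
    """Create one thread per (name, url) pair, all sharing one parsed value."""
    return [CheckerThread(url, name, timeout) for name, url in urls]
```

With this layout the file is read once per run instead of once per URL, and the threads only hold an integer.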

bua
"You're trying to open config file for each thread, that takes some memory." You're right, but doesn't that memory get released by automatic garbage collection?
"How much URL's are you checking?" I'm currently checking 19 URLs.
"What is ConfigParser implementation?" I'm using Python's built-in parser: from ConfigParser import ConfigParser
"Are you sure that each thread joins?" No, I'm not. Do I have to end the thread after the process is complete?
"Are batch programs finished before next scheduled run?" Yes, the Windows scheduled task is configured not to run multiple instances at a time.
Mitchell Guimont
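
On whether each thread joins: a thread ends on its own when `run()` returns, so there is nothing to end by hand, but `join()` without a timeout waits for as long as `run()` takes. A minimal sketch of how one might check for threads that never finish follows; `join_all` is a hypothetical helper, not part of the code in the question.

```python
import threading


def join_all(threads, timeout_per_thread):
    """Join each thread with a timeout and return the ones still running."""
    stuck = []
    for t in threads:
        t.join(timeout_per_thread)
        # join() returns None whether or not the thread finished,
        # so is_alive() is the only way to tell a timeout occurred.
        if t.is_alive():
            stuck.append(t)
    return stuck
```

If the returned list is ever non-empty, the script has workers that outlive the join, which would be worth logging before the process exits.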