tags:
views: 144
answers: 3

Hi, I'm developing a logger daemon for Squid that stores the logs in a MongoDB database, but it's using too much CPU. How can I optimize this code?


from sys import stdin
from pymongo import Connection  # pymongo < 3.x API

connection = Connection()
db = connection.squid
logs = db.logs

# One key per whitespace-separated column of the squid log line
FIELDS = ('timestamp', 'resp_time', 'src_ip', 'cache_status',
          'reply_size', 'req_method', 'req_url', 'username',
          'dst_ip', 'mime_type')

buffer = []

while True:
    l = stdin.readline()
    if not l:                    # readline() returns '' at EOF
        break
    if l[0] == 'L':              # squid prefixes log lines with 'L'
        doc = dict(zip(FIELDS, l[1:].split()))
        doc['timestamp'] = float(doc['timestamp'])
        doc['resp_time'] = int(doc['resp_time'])
        doc['reply_size'] = int(doc['reply_size'])
        buffer.append(doc)
    if len(buffer) == 1000:      # bulk-insert in batches of 1000
        logs.insert(buffer)
        buffer = []

if buffer:                       # flush whatever is left at shutdown
    logs.insert(buffer)
connection.disconnect()
A: 

I'd suspect it might actually be readline() causing the CPU utilization. Try running the same code with readline() replaced by a constant buffer that you provide, and try running it with the database inserts commented out. Establish which one is the culprit.
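For example, a rough harness for that test might look like this (the sample line and its field values are made up; substitute a real line from your feed):

buffer = []
SAMPLE = ('L1286743302.557 843 192.168.1.50 TCP_MISS/200 4512 GET '
          'http://example.com/index.html bob 10.0.0.1 text/html\n')

for _ in range(1000000):
    l = SAMPLE                    # stands in for stdin.readline()
    if l[0] == 'L':
        l = l[1:].split()
        buffer.append({'timestamp': float(l[0]), 'resp_time': int(l[1])})
    if len(buffer) == 1000:
        buffer = []               # logs.insert(buffer) left out on purpose

If this runs cool, parsing isn't the problem; re-enable the inserts next and compare.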

MK
+1  A: 

This might be a job for a Python profiler. There are a few built-in Python profiling modules, such as cProfile; you can read more about it in the Python documentation.
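A minimal sketch of how that might look, assuming you first wrap the daemon's loop in a function (main() and daemon.prof are hypothetical names):

import cProfile
import pstats

def main():
    pass  # the read/parse/insert loop from the question would go here

# Profile a run, then print the ten most expensive calls by
# cumulative time.
cProfile.run('main()', 'daemon.prof')
pstats.Stats('daemon.prof').sort_stats('cumulative').print_stats(10)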

saramah
A: 

The CPU usage comes from that busy while True loop. How many lines per minute do you get? Put the

if len(buffer) == 1000:    
    logs.insert(buffer)
    buffer = []

check right after the buffer.append, so the flush only happens when a document was actually added; see the sketch below.
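
A sketch of that restructuring, assuming logs, buffer and FIELDS are set up as in the question:

from sys import stdin

while True:
    l = stdin.readline()
    if not l:
        break
    if l[0] == 'L':
        buffer.append(dict(zip(FIELDS, l[1:].split())))
        # The length check now runs only when a document was
        # appended, not once per line read.
        if len(buffer) == 1000:
            logs.insert(buffer)
            buffer = []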

I can tell you more once you say how many insertions you're getting so far.

fabrizioM