I run 2 Python scripts from crontab at the same time, every 30 minutes, e.g.:

00,30 6-19 * * 0-5 /.../x.py site1
*/3 6-19 * * 0-5 /.../y.py site2

At the beginning, both scripts import a module that prints some data to a log, e.g.:

import os
import sys

name = os.path.basename(sys.argv[0])
site = sys.argv[1]
pid = os.getpid()

Occasionally (!) the second script, y, prints to the log the input arguments of script x: name = x and site = site1, although the printed PID of the two processes is not the same. Why is this happening, and how can I avoid it?

P.S. I suspect the problem is related to the logger I use. Can a script use a logger created in another script? In that case it would print, on each line, data related to the first script. Each script executes the same code:

log = logging.getLogger('MyLog')
log.setLevel(logging.INFO)
dh = RotatingSqliteHandler(os.path.join(progDirs['log'], 'sqlitelog'), processMeta, 5000000)
log.addHandler(dh)

The logger handler is defined as follows:

class RotatingSqliteHandler(logging.Handler):
    def __init__(self, filename, progData, maxBytes=0):
        logging.Handler.__init__(self)
        self.user = progData['user']
        self.host = progData['host']
        self.progName = progData['name']
        self.site = progData['site']
        self.pid = random.getrandbits(50)  # a random 50-bit id, not os.getpid()
        ...

In the log I see that the process ID which the logger generates (the last line above) is the same for both scripts.

I'll try to use a logger name unique to each script run instead of 'MyLog', although it is strange that a logger instance could be obtained from another process.
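
Roughly what I mean by a per-run logger name, as a sketch (the exact format below is just an illustration, not my real code):

import logging
import os
import sys

# Build a logger name unique to this invocation, e.g. "x.py.site1.12345",
# instead of the shared name 'MyLog'.
name = os.path.basename(sys.argv[0])
site = sys.argv[1]
log = logging.getLogger('%s.%s.%d' % (name, site, os.getpid()))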

+2  A: 

When two scripts "run at the same time", the lines that they print can be interleaved, depending on how the operating system schedules the two processes.

You can thus obtain, in your logs, something like:

x.py: /tmp/x.py
…
… # Other processes logging information
…
y.py: /tmp/y.py
x.py: site1  # Not printed by y!!
x.py: PID = 123
…
… # Other processes logging information
…
y.py: site2
y.py: PID = 124

Do you still observe the problem if you prefix each line with the program's base name?
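
For instance, a minimal sketch of such a prefix with a stock FileHandler and Formatter (the path and format string are only placeholders, not your actual setup):

import logging
import os
import sys

# Tag every line with this script's base name and OS pid so that interleaved
# lines from the two processes can be told apart.
name = os.path.basename(sys.argv[0])
handler = logging.FileHandler('/tmp/combined.log')  # placeholder path
handler.setFormatter(logging.Formatter(
    '%(asctime)s ' + name + '[%(process)d] %(levelname)s: %(message)s'))

log = logging.getLogger('MyLog')
log.setLevel(logging.INFO)
log.addHandler(handler)
log.info('started with site=%s', sys.argv[1])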

EOL
I suspect the problem is related to the logger. See update.
tuzik
The problem is not in line order. At 18:30, I see e.g. "x.py started" twice, while "y.py started" is missing.
tuzik
@tuzik: I see… Puzzling…
EOL
A: 

Could this be related in any way to the following point?

getLogger() returns a reference to a logger instance with the specified name if it is provided, or root if not. The names are period-separated hierarchical structures. Multiple calls to getLogger() with the same name will return a reference to the same logger object.

Could both of your scripts be "connected" enough to each other that this plays a role? For instance, if y.py imports x.py, then you get the same logger object in both x.py and y.py when you call logging.getLogger('MyLog') in each of them.
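
Within a single process this is easy to check, as a quick sketch:

import logging

# getLogger() with the same name always returns the very same object inside
# one process; this sharing does not extend across separate processes.
a = logging.getLogger('MyLog')
b = logging.getLogger('MyLog')
print(a is b)  # True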

EOL
No, both scripts are "main" scripts called from the same crontab, and both import the same module containing the line "log = logging.getLogger('MyLog')". The problem occurs once every week or two, while the scripts run simultaneously every 30 minutes. I'll see how it works when the logger name is combined from the script name, site and PID.
tuzik
A: 

This question puzzles me: here is yet another idea! The random generator can be seeded with "the current system time" (if no source of random numbers exists on the computer). In Python 2.7, this is done through a call to time.time(). The point is that "not all systems provide time with a better precision than 1 second." More generally, could it be that your x.py and y.py sometimes run close enough to each other that time.time() is the same for both processes, so that random.getrandbits(50) yields the same value for both of them? This would be compatible with the problem only appearing exceptionally, as you observed.

What is the "resolution" of time.time() on your machine (the smallest interval between different times)? Maybe it is large enough for the two random generators to be, exceptionally, seeded in the same way.
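
A minimal sketch of the scenario, with explicit seeds standing in for two processes that happen to read the same clock value:

import random

# Two independent generators seeded identically (as would happen if both
# processes fell back to the same time.time() reading) produce the same
# "random" 50-bit value.
r1 = random.Random(1234567890)
r2 = random.Random(1234567890)
print(r1.getrandbits(50) == r2.getrandbits(50))  # True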

EOL
The logger handler is initialized with the parent process metadata (name, site, host). Even if the random numbers were the same, where would it get the parameters of another process? No, it is the logger instance somehow crossing the process boundary.
tuzik
You are right: getrandbits is not random enough to serve as a primary key. I have 2 log tables, one for messages and another for processes. When I tried to insert a row into the processes table where this ID already existed, it was silently ignored. I did it this way because the log DB can be split into multiple SQLite files of limited size. Thank you.
tuzik
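
A collision-resistant alternative for such a per-run key, sketched with the standard uuid module (not the code actually used here):

import uuid

# uuid4() is drawn from os.urandom, so two processes started within the same
# clock tick still get distinct keys, unlike a time-seeded getrandbits(50).
run_id = uuid.uuid4().hex  # hypothetical replacement for the random 50-bit pid
print(run_id)
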
+1  A: 

It's not possible for one Python process to access an object from another Python process, unless specific provision is made for this using e.g. the multiprocessing module. So I don't believe that's what's happening, no matter how much it looks like it on the surface.

To confirm this, use an alternative handler (such as a FileHandler or RotatingFileHandler) to see if the problem still occurs. If it doesn't, then you should examine the RotatingSqliteHandler logic.
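
For example, a sketch of such a temporary swap (the path and sizes are placeholders):

import logging
from logging.handlers import RotatingFileHandler

# Stand-in for RotatingSqliteHandler while diagnosing: if the cross-talk
# disappears with a stock handler, the custom handler is the place to look.
log = logging.getLogger('MyLog')
log.setLevel(logging.INFO)
fh = RotatingFileHandler('/tmp/mylog.txt', maxBytes=5000000, backupCount=3)
fh.setFormatter(logging.Formatter('%(asctime)s %(process)d %(name)s %(message)s'))
log.addHandler(fh)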

If it does, and if you can come up with a small standalone script which demonstrates the problem repeatably, please post an issue to bugs.python.org and I'll certainly take a look. (I maintain the Python logging package.)

Vinay Sajip
+1: That's pretty much my feeling too…
EOL