Suppose I have a program A. I run it, and it performs some operation using a file foo.txt as input. Then A terminates.

On the next run, A checks whether foo.txt has changed. If it has, A runs its operation again; otherwise, it quits.

Does a library function or external library for this exist?

Of course it can be implemented with an md5 checksum plus a file or db that stores the checksum. I want to avoid reinventing the wheel.

A: 

Can't we just check the last modified date? That is, after the first operation we store the last modified date in the db, and before running again we compare the last modified date of foo.txt with the value stored in the db. If they differ, we perform the operation again.
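
A minimal sketch of this idea, assuming a small JSON file stands in for the db. The function name `mtime_changed` and the state file name are hypothetical, chosen only for illustration:

```python
import json
import os


def mtime_changed(path, state_file="mtime_state.json"):
    """Return True if path's modification time differs from the stored one."""
    try:
        with open(state_file) as f:
            state = json.load(f)
    except (OSError, ValueError):
        state = {}  # no usable state yet: treat every file as changed

    current = os.stat(path).st_mtime_ns  # integer nanoseconds, survives JSON exactly
    changed = state.get(path) != current
    if changed:
        # Remember the new timestamp for the next run.
        state[path] = current
        with open(state_file, "w") as f:
            json.dump(state, f)
    return changed
```

The first call for a given file reports a change because nothing is stored yet; later calls report a change only when the timestamp moves, whether or not the content did.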

NM
That's what make does, and I frankly prefer not to.
Stefano Borini
What is the problem with using the modification time?
Lars Wirzenius
Suppose the file is downloaded every hour from a remote website, or is generated by some source beyond my control that recreates it each time. The modification time will change, but if the actual content is the same, there's no point in re-executing the task.
Stefano Borini
Of course you can work around it (for example, write to a temporary file, and then overwrite the original only if the md5 comparison of the two shows a change). I agree there are other solutions.
Stefano Borini
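
The temporary-file workaround mentioned above could look roughly like this. It is only a sketch: `install_if_changed` and `file_md5` are hypothetical helper names, and it assumes both files live on the same filesystem so that `os.replace` can swap them in one step:

```python
import hashlib
import os


def file_md5(path):
    """Return the hex md5 digest of a file's content."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def install_if_changed(tmp_path, dest_path):
    """Move tmp_path over dest_path only when the content differs.

    Returns True if dest_path was replaced.
    """
    if os.path.exists(dest_path) and file_md5(tmp_path) == file_md5(dest_path):
        os.remove(tmp_path)  # same content: keep the old file and its mtime
        return False
    os.replace(tmp_path, dest_path)
    return True
```

With this in place, the modification time of the destination file only moves when the content really changes, so a plain timestamp check becomes reliable again.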
+2  A: 

This is one of those things that is both so trivial to implement and so app-specific that there really wouldn't be any point in a library. Any library intended for this purpose would grow so unwieldy trying to adapt to the many variations required that learning and using it would take as much time as implementing it yourself.

Nicholas Knight
+2  A: 

It's unlikely that someone made a library for something so simple. Solution in 13 lines:

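import pickle
import hashlib  # the old md5 module is deprecated; hashlib provides the same digest
try:
    l = pickle.load(open("db", "rb"))  # list of (path, checksum) pairs from the last run
except IOError:
    l = []  # no db yet: first run
db = dict(l)
path = "/etc/hosts"
checksum = hashlib.md5(open(path, "rb").read()).hexdigest()  # compare hex digests, not hash objects
if db.get(path, None) != checksum:
    print("file changed")
    db[path] = checksum
pickle.dump(list(db.items()), open("db", "wb"))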
Sufian
It would probably be worthwhile first checking st_mtime and st_size: if they've changed, you don't need to checksum, saving time.
Lars Wirzenius
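
One possible reading of that suggestion, as a sketch: hash the file only when its stat data differs from what was recorded on the last run. The function name `check_file` and the record layout are hypothetical, and the shortcut assumes that an identical `st_mtime` and `st_size` imply identical content, which is not guaranteed (a same-size rewrite within the timestamp granularity would be missed):

```python
import hashlib
import os


def check_file(path, record):
    """Compare a file against the record saved on the previous run.

    record is None or a dict with keys 'mtime', 'size', 'md5'.
    Returns (changed, new_record).
    """
    st = os.stat(path)
    if record and record["mtime"] == st.st_mtime_ns and record["size"] == st.st_size:
        return False, record  # stat data identical: assume unchanged, skip hashing

    # Stat data differs (or there is no record): fall back to the checksum.
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    new_record = {"mtime": st.st_mtime_ns, "size": st.st_size, "md5": digest}
    changed = record is None or record["md5"] != digest
    return changed, new_record
```

A file that was merely re-downloaded with the same content is hashed once, reported as unchanged, and its record is refreshed so that later runs skip the hash again.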
A number of things could be done to make this as configurable/one-size-fits-all of a solution as you'd like. My point is simply that it's an easy problem, and it will take longer to look for and configure a general case library than to roll your own.
Sufian
There are many simple pieces of functionality in the standard library that could be written in a few lines of code, yet there they are :) Thanks for the code!
Stefano Borini