In my app I need to watch a directory for new files. Traffic is heavy: at least hundreds of new files will appear per second. Currently I'm using a busy loop along these lines:
    import os
    import time

    while True:
        time.sleep(0.2)
        if len(os.listdir('.')) > 0:
            pass  # do stuff
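To make the idea concrete, here is a minimal sketch of what one pass of that loop could look like, assuming "do stuff" means handling only the names that were not present on the previous pass. The helper name new_entries and the temporary directory are hypothetical, used only for illustration, and are not part of the actual app.

```python
import os
import tempfile

def new_entries(path, seen):
    """Return names in `path` that are not yet in `seen`, and record them.

    `new_entries` is a hypothetical helper, not part of the real app.
    """
    current = set(os.listdir(path))
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh

# Small demonstration in a throwaway directory (an assumption for the sketch).
with tempfile.TemporaryDirectory() as watch_dir:
    seen = set()
    open(os.path.join(watch_dir, "a.txt"), "w").close()
    first = new_entries(watch_dir, seen)   # only a.txt exists so far
    open(os.path.join(watch_dir, "b.txt"), "w").close()
    second = new_entries(watch_dir, seen)  # a.txt was already recorded
```

Each pass still lists the whole directory, so the cost grows with the number of files in it, which is part of why I'm looking for something event-driven.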
After profiling, I see a lot of time spent in the sleep, and I'm wondering whether I should switch to polling instead.
I'm trying to use one of the available classes in select to poll my directory, but I'm not sure whether it actually works or whether I'm just doing it wrong.
I get an fd for my directory with:
    fd = os.open('.', os.O_DIRECT)
I then tried several methods to detect when the directory changes. As an example, one of the things I tried was:
    poll = select.poll()
    poll.register(fd, select.POLLIN)
    poll.poll()         # returns [(fd, 1)], meaning 'ready to read'
    os.read(fd, 4096)   # prints largely gibberish, but I can see that I'm pulling the files/folders contained in the directory at least
    poll.poll()         # returns [(fd, 1)] again
    os.read(fd, 4096)   # empty string - no more data
Why is poll() acting like there is more information to read? I assumed that it would only do that if something had changed in the directory.
Is what I'm trying to do here even possible?
If not, is there a better alternative to a "while True: look for changes" loop?