I'm writing a Python backup script and I need to find the oldest file in a directory (and its sub-directories). I also need to filter it down to *.avi files only.

The script will always be running on a Linux machine. Is there some way to do it in Python or would running some shell commands be better?

At the moment I'm running df to get the free space on a particular partition, and if there is less than 5 gigabytes free, I want to start deleting the oldest *.avi files until at least 5 gigabytes are free again.

+1  A: 

Check out the Linux command find.

Alternatively, this post pipes together ls and tail to do it. That could be done in a loop while there isn't enough free space.

Michael Haren
+9  A: 

To do it in Python, you can use os.walk(path) to iterate recursively over the files, and the st_size and st_mtime attributes of os.stat(filename) to get the file sizes and modification times.
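
As a rough sketch of that idea (the function name list_files_with_stats and its parameters are made up for illustration, not part of any library), something like this would collect the modification time and size of every matching file under a directory:

```python
import os

def list_files_with_stats(root, extension=".avi"):
    """Yield (mtime, size, path) for each matching file under root."""
    # os.walk visits root and every sub-directory below it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extension):
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                yield st.st_mtime, st.st_size, path
```

Because the modification time comes first in each tuple, sorted(list_files_with_stats(path)) would give the oldest file first.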

dF
+6  A: 

You can use the stat and fnmatch modules together to find the files.

ST_MTIME refers to the last modification time. You can choose another field, such as ST_CTIME or ST_ATIME, if you want.

import os, stat, fnmatch
file_list = []
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, '*.avi'):
        file_list.append((os.stat(filename)[stat.ST_MTIME], filename))

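Then you can sort the list by time and delete files starting from the oldest. Note that os.listdir only lists a single directory; to include sub-directories, combine this with os.walk.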

file_list.sort(key=lambda a: a[0])
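
One possible way to "delete according to it", sketched under some assumptions: the helper name oldest_first_until and the bytes_needed parameter are hypothetical, and file_list is taken to hold (mtime, filename) tuples as built above. It only picks the files; the caller would still pass each one to os.remove.

```python
import os

def oldest_first_until(file_list, bytes_needed):
    """Return the oldest filenames whose combined size reaches bytes_needed.

    file_list holds (mtime, filename) tuples.
    """
    chosen, total = [], 0
    # Tuples sort by their first element, so the oldest mtime comes first.
    for mtime, filename in sorted(file_list):
        if total >= bytes_needed:
            break
        chosen.append(filename)
        total += os.stat(filename).st_size
    return chosen
```

The summed file sizes are only an estimate of the space that would be freed, so it is probably still worth re-checking the free space after deleting.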
Nadia Alramli
A: 

The os module provides the functions that you need to get directory listings and file info in Python. I've found os.walk to be especially useful for walking directories recursively, and os.stat will give you detailed info (including modification time) on each entry.

You may be able to do this more easily with a simple shell command. Whether that works better for you depends on what you want to do with the results.

Parappa
+7  A: 

I think the easiest way to do this would be to use find along with ls -t (sort files by time).

Something along these lines should do the trick (it deletes the oldest avi file under the specified directory):

find / -name "*.avi" | xargs ls -t | tail -n 1 | xargs rm

Step by step:

find / -name "*.avi" - find all avi files recursively starting at the root directory

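xargs ls -t - sort all files found by modification time, from newest to oldest (this assumes file names without whitespace, and a list short enough for xargs to pass to a single ls invocation).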

tail -n 1 - grab the last file in the list (oldest)

xargs rm - and remove it

John T
He mentions running this in a loop. As `find` tends to be an expensive operation, it's probably a better idea to keep the results of `xargs ls` around (perhaps in an array variable) and pull filenames from it one at a time.
Ben Blank
Perhaps replace find with locate and grep?
John Fouhy
+5  A: 

Hm. Nadia's answer is closer to what you meant to ask; however, for finding the (single) oldest file in a tree, try this:

import os
def oldest_file_in_tree(rootfolder, extension=".avi"):
    return min(
        (os.path.join(dirname, filename)
        for dirname, dirnames, filenames in os.walk(rootfolder)
        for filename in filenames
        if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime)

With a little modification, you can get the n oldest files (similar to Nadia's answer):

import os, heapq
def oldest_files_in_tree(rootfolder, count=1, extension=".avi"):
    return heapq.nsmallest(count,
        (os.path.join(dirname, filename)
        for dirname, dirnames, filenames in os.walk(rootfolder)
        for filename in filenames
        if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime)

Note that using the .endswith method allows calls such as:

oldest_files_in_tree("/home/user", 20, (".avi", ".mov"))

to select more than one extension.

Finally, should you want the complete list of files, ordered by modification time, in order to delete as many as required to free space, here's some code:

import os
def files_to_delete(rootfolder, extension=".avi"):
    return sorted(
        (os.path.join(dirname, filename)
         for dirname, dirnames, filenames in os.walk(rootfolder)
         for filename in filenames
         if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime,
        reverse=True)

Note that reverse=True puts the oldest files at the end of the list, so to get the next file to delete you just call file_list.pop().

By the way, for a complete solution to your issue, since you are running on Linux, where os.statvfs is available, you can do:

import os
def free_space_up_to(free_bytes_required, rootfolder, extension=".avi"):
    file_list= files_to_delete(rootfolder, extension)
    while file_list:
        statv= os.statvfs(rootfolder)
        if statv.f_bfree*statv.f_bsize >= free_bytes_required:
            break
        os.remove(file_list.pop())

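statvfs.f_bfree is the number of free blocks on the device and statvfs.f_bsize is the block size. Note that f_bfree includes blocks reserved for root; f_bavail counts only the blocks available to unprivileged users. We take the statvfs of rootfolder, so mind any symbolic links pointing to other devices, where we could delete many files without actually freeing up space on this device.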
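
A minimal sketch of one way to guard against that, assuming you simply want to skip such files: the helper name frees_space_on_root_device is hypothetical, and it could be called on each candidate before os.remove. It compares the st_dev field of the file with that of rootfolder and treats symbolic links as never freeing space.

```python
import os

def frees_space_on_root_device(path, rootfolder):
    """Guess whether removing path can free space on rootfolder's device."""
    if os.path.islink(path):
        # os.remove on a symbolic link deletes the link, not its target.
        return False
    # Files under a mount point inside the tree report a different st_dev.
    return os.lstat(path).st_dev == os.stat(rootfolder).st_dev
```

This does not account for hard links, where removing one name leaves the data in place as long as another name still refers to it.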

ΤΖΩΤΖΙΟΥ
+1  A: 

Here's another Python formulation, which is a bit old-school compared to some of the others, but is easy to modify and handles the case of no matching files without raising an exception.

import os

def find_oldest_file(dirname="..", extension=".avi"):
    oldest_file, oldest_time = None, None
    for dirpath, dirs, files in os.walk(dirname):
        for filename in files:
            file_path = os.path.join(dirpath, filename)
            file_time = os.stat(file_path).st_mtime
            # Check for None first so the comparison never involves None.
            if file_path.endswith(extension) and (oldest_time is None or file_time < oldest_time):
                oldest_file, oldest_time = file_path, file_time
    return oldest_file, oldest_time

print(find_oldest_file())
tom10