I'm looking for a solution to the following:

Given today's date, figure out which month came before it. For today this should return 2, since it is currently March, the third month of the year. For January it should return 12.

Then based on that, I need to be able to iterate through a directory and find all files that were created that month.

Bonus points for finding the most recently created file in the previous month.

+3  A: 

Simplest, where adate is an instance of datetime.date:

def previousmonth(adate):
    m = adate.month - 1
    return m if m else 12

There's no real way in most Unix filesystems to determine when a file was created, as they just don't keep that information around. Maybe you want the "latest inode change time" (could be creation, could be some other inode change):

import os, datetime
def cmonth(filename):
    ts = os.stat(filename).st_ctime
    return datetime.date.fromtimestamp(ts).month

Of course, this could mean that month in any year -- you sure, in both questions, you don't want the year as well as the month? That would be the .year attribute.

Anyway, sticking with month only, as per your question, for a single directory (which is the letter of your question), to get all files you can use os.listdir (for a tree rooted in the directory you'd use os.walk instead). Then to keep only those with latest-inode-change in a given month:

def fileswithcmonth(dirname, whatmonth):
    results = []
    for f in os.listdir(dirname):
        fullname = os.path.join(dirname, f)
        if whatmonth == cmonth(fullname):
            results.append(fullname)
    return results

You could code this as a list comprehension, but there's just too much code there for a listcomp to be elegant and concise.
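
For comparison, here is roughly what the list-comprehension form could look like. This is only a sketch: the name fileswithcmonth_lc is made up for illustration, and cmonth is repeated so the snippet runs on its own.

```python
import datetime
import os

def cmonth(filename):
    # Month (1-12) of the file's latest inode change time.
    ts = os.stat(filename).st_ctime
    return datetime.date.fromtimestamp(ts).month

def fileswithcmonth_lc(dirname, whatmonth):
    # Hypothetical name: list-comprehension variant of fileswithcmonth.
    # Building the full names in a separate generator keeps the listcomp short.
    fullnames = (os.path.join(dirname, f) for f in os.listdir(dirname))
    return [name for name in fullnames if cmonth(name) == whatmonth]
```

Splitting the path building from the filtering is what keeps this readable; cramming both into one expression is where it stops being concise.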

To get the "latest" time, you can either repeat the os.stat call (slower but probably simpler), or change cmonth to return the timestamp as well. Taking the simple route:

def filetimestamp(fullname):
    return os.stat(fullname).st_ctime

Now, the "most recent file", given a list named files of the files' full names (i.e. including the path), is

max(files, key=filetimestamp)

Of course there are many degrees of freedom in how you put this all together, depending on your exact specs -- given that the specs don't appear to be necessarily precise or complete I've chosen to show the building blocks that you can easily tweak and put together towards your exact needs, rather than a full-blown solution that would likely solve a problem somewhat different from your actual one;-).

Edit: since the OP clarified that they need both year and month, let's see what changes would be needed, using tuples ym for (year, month) in lieu of the bare month:

def previousym(adate):
    y = adate.year
    m = adate.month - 1
    return (y, m) if m else (y - 1, 12)

import os, datetime
def cym(filename):
    ts = os.stat(filename).st_ctime
    dt = datetime.date.fromtimestamp(ts)
    return dt.year, dt.month

def fileswithcym(dirname, whatym):
    results = []
    for f in os.listdir(dirname):
        fullname = os.path.join(dirname, f)
        # if you need to avoid subdirs, uncomment the following line
        # if not os.path.isfile(fullname): continue
        if whatym == cym(fullname):
            results.append(fullname)
    return results

Nothing deep or difficult, as you can see (I also added comments to show how to skip subdirectories if you're worried about those). And btw, if what you actually need is to walk a subtree, rather than just a directory, that change, too, is pretty localized:

def fileswithcymintree(treeroot_dirname, whatym):
    results = []
    for dp, dirs, files in os.walk(treeroot_dirname):
        for f in files:
            fullname = os.path.join(dp, f)
            if whatym == cym(fullname):
                results.append(fullname)
    return results
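
Putting the pieces together for the "bonus" part of the question might look something like the sketch below. It assumes you want regular files only, in a single directory, and that "latest" means the greatest st_ctime; latest_in_ym is a hypothetical helper name, and previousym and cym are repeated so the snippet is self-contained.

```python
import datetime
import os

def previousym(adate):
    # (year, month) of the month before adate, rolling back over January.
    y, m = adate.year, adate.month - 1
    return (y, m) if m else (y - 1, 12)

def cym(filename):
    # (year, month) of the file's latest inode change time.
    dt = datetime.date.fromtimestamp(os.stat(filename).st_ctime)
    return dt.year, dt.month

def latest_in_ym(dirname, whatym):
    # Hypothetical helper: the regular file in dirname whose ctime falls in
    # whatym and is the most recent, or None if no file matches.
    candidates = []
    for f in os.listdir(dirname):
        fullname = os.path.join(dirname, f)
        if os.path.isfile(fullname) and cym(fullname) == whatym:
            candidates.append(fullname)
    return max(candidates, key=lambda p: os.stat(p).st_ctime, default=None)
```

A call such as latest_in_ym(somedir, previousym(datetime.date.today())) would then give the newest match for last month, or None when there is none.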
Alex Martelli
Thanks for the excellent explanation Alex. You're right, I should take year into consideration and didn't make that clear.
randombits
@Alex Martelli Don't you have to check os.path.isfile(os.path.join(dirname, f)) with listdir? :P From what I know, listdir returns files and folders.
systempuntoout
@systempuntoout, if you need to avoid subdirectories, yes.
Alex Martelli
@Alex Martelli It was in the OP's request: "find all files that were created". Fantastic explanation anyway :).
systempuntoout
A: 

http://docs.python.org/library/datetime.html

To get the first day of the previous month from a date:

import datetime

def first_of(today):
    # Count months since year 0, then step back one month.
    yr_mn = today.year*12 + (today.month-1) - 1
    return datetime.date(year=yr_mn//12, month=yr_mn%12+1, day=1)

You can then use this with os.walk to locate the files in question.
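
One possible way to combine the two is sketched below, assuming the file's st_ctime is the timestamp you care about. files_changed_between is a hypothetical helper, not something from the standard library; it keeps files whose change date falls in the half-open range [start, end).

```python
import datetime
import os

def first_of(today):
    # First day of the month before `today`.
    yr_mn = today.year * 12 + (today.month - 1) - 1
    return datetime.date(year=yr_mn // 12, month=yr_mn % 12 + 1, day=1)

def files_changed_between(root, start, end):
    # Hypothetical helper: walk the tree under root and collect files whose
    # st_ctime date satisfies start <= date < end.
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            fullname = os.path.join(dirpath, name)
            changed = datetime.date.fromtimestamp(os.stat(fullname).st_ctime)
            if start <= changed < end:
                matches.append(fullname)
    return matches
```

For the previous month, start would be first_of(today) and end would be today.replace(day=1), so the year is handled without any extra work.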

S.Lott
+1  A: 

There are several parts being asked about here...

T. Stone
Though note that stat() returns the `ctime` of the file, which is only the creation time in Windows, not Unix systems. I'm not sure if there's a way to actually get the creation time on Unix...
Daniel G
+2  A: 

It's fairly easy to find the previous month - see, for example, Alex Martelli's answer - but to find the most recently created file in that month is a bit harder:

from datetime import date
import os

def filesLastMonth(directory):
    """
    Given a directory, returns a tuple containing
    1. A list with all files made in the previous month (disregards year...)
    2. The file most recently created in the previous month
    """
    def fileCreationTime(filePath):
        return os.path.getctime(filePath)
    today = date.today()
    lastMonth = today.month-1 if today.month != 1 else 12
    # full paths of the items in the directory that were created last month
    paths = [os.path.join(directory, item) for item in os.listdir(directory)]
    createdLastMonth = [path for path in paths if
                        date.fromtimestamp(os.path.getctime(path)).month == lastMonth]
    # and this is the most recent of the above (None if nothing matched)
    mostRecentlyLastMonth = max(createdLastMonth, key=fileCreationTime, default=None)
    return (createdLastMonth, mostRecentlyLastMonth)

You can use os.path.getctime on Windows to get the creation time of a path, but this doesn't work on Unix - the creation time is not stored in that case (you'll just get the time of the last change).
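
Some platforms do expose a real creation time: os.stat results carry an st_birthtime attribute on macOS and FreeBSD (and on Windows in recent Python versions), while on Linux the attribute is normally absent. A hedged sketch that prefers it when present is below; creation_or_change_time is a hypothetical name, and the second element of the result says whether the timestamp is a true creation time.

```python
import os

def creation_or_change_time(path):
    # Hypothetical helper. Returns (timestamp, is_birth_time).
    st = os.stat(path)
    birth = getattr(st, "st_birthtime", None)
    if birth is not None:
        # The platform reports a real creation ("birth") time.
        return birth, True
    if os.name == "nt":
        # On Windows, st_ctime has traditionally been the creation time.
        return st.st_ctime, True
    # Elsewhere, st_ctime is only the latest inode change time.
    return st.st_ctime, False
```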

Daniel G
@Daniel, creation time is not saved in most Unix filesystems. But you're making it look harder than it should be to find "the latest file" in a group, since the built-in `max` function can do that just fine -- you only need to factor your code correctly (see my edited answer for a suggestion of such a code factorization, for example).
Alex Martelli
That's true - unfortunately I was being lazy. *Spring cleaning!*
Daniel G
A: 

Date manipulation is made trivial with a library like Labix's python-dateutil.

You want to do something like this:

In [8]: from dateutil.relativedelta import relativedelta

In [9]: from datetime import date

In [10]: d = date(2010,2,12)

In [11]: print((d - relativedelta(months=1)).month)
1

In [12]: print((date(2010, 1, 4) - relativedelta(months=1)).month)
12
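
If adding a dependency is not an option, a similar result can be had with the standard library alone by going to the first day of the month and stepping back one day. This is a different technique from relativedelta, shown only as a sketch; previous_year_month is a made-up name.

```python
import datetime

def previous_year_month(d):
    # The day before the first of this month always lies in the previous month.
    last_of_prev = d.replace(day=1) - datetime.timedelta(days=1)
    return last_of_prev.year, last_of_prev.month
```

Returning the year along with the month avoids confusing, say, December 2009 with December 2010.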
Victor Ng