I am trying to match file names in the format filename-isodate.txt

>>> DATE_NAME_PATTERN = re.compile("((.*)(-[0-9]{8})?)\\.txt")
>>> DATE_NAME_PATTERN.match("myfile-20101019.txt").groups()
('myfile-20101019', 'myfile-20101019', None)

However, I need to get the filename and -isodate parts in separate groups.

Any suggestions and/or explanations would be much appreciated.

+1  A: 

You need: DATE_NAME_PATTERN = re.compile("((.*?)(-[0-9]{8})?)\\.txt")

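.* performs a greedy match: it consumes the date as well, and because the third group is optional, the regex never has to backtrack far enough for that group to match.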
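To see the difference side by side, here is a minimal sketch that runs the pattern from the question and the non-greedy variant against the same name. The names GREEDY, LAZY, greedy_groups and lazy_groups are only illustrative.

```python
import re

# Pattern from the question: greedy .* swallows the date.
GREEDY = re.compile(r"((.*)(-[0-9]{8})?)\.txt")
# Same pattern with a non-greedy .*? so the optional date group gets tried first.
LAZY = re.compile(r"((.*?)(-[0-9]{8})?)\.txt")

name = "myfile-20101019.txt"
greedy_groups = GREEDY.match(name).groups()
lazy_groups = LAZY.match(name).groups()

print(greedy_groups)  # ('myfile-20101019', 'myfile-20101019', None)
print(lazy_groups)    # ('myfile-20101019', 'myfile', '-20101019')
```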

FYI, in my opinion you shouldn't use regular expressions where normal string manipulation is enough (a simple split() will do).

Piotr Duda
Thanks, changing .* to the non-greedy .*? works. Should have spotted that!
+1  A: 

Remove the outermost group and put the - between the groups:

>>> DATE_NAME_PATTERN = re.compile(r'(.*)-([0-9]{8})?\.txt')
>>> DATE_NAME_PATTERN.match("myfile-20101019.txt").groups()
('myfile', '20101019')
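Note that in the pattern above the hyphen is mandatory while the digits are optional, so a name without a date, such as myfile.txt, would not match. If dateless names should be accepted too, one possible variant (a sketch, not the pattern from this answer) wraps the hyphen and the date in a single optional non-capturing group. OPTIONAL_DATE is a hypothetical name, and fullmatch is used here on the assumption that the whole name must fit the format.

```python
import re

# Hypothetical variant: "-YYYYMMDD" is optional as a unit, and the name part
# is non-greedy so the date is not absorbed into it.
OPTIONAL_DATE = re.compile(r"(.*?)(?:-([0-9]{8}))?\.txt")

with_date = OPTIONAL_DATE.fullmatch("myfile-20101019.txt").groups()
without_date = OPTIONAL_DATE.fullmatch("myfile.txt").groups()

print(with_date)     # ('myfile', '20101019')
print(without_date)  # ('myfile', None)
```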
larsmans
+2  A: 

If you know the filename format will not change, you don't need re:

filename = 'myfile-20101019.txt'
basename, extension = filename.rsplit('.', 1)
firstpart, date = basename.rsplit('-', 1)


In : firstpart, date, extension
Out: ('myfile', '20101019', 'txt')

or just without extension:

firstpart, date = filename.rsplit('.', 1)[0].rsplit('-', 1)
# ['myfile', '20101019']

Works with more complicated filenames too:

filename = 'more.complicated-filename-20101004.txt'
firstpart, date = filename.rsplit('.', 1)[0].rsplit('-', 1)
# ['more.complicated-filename', '20101004']

Or, just to split the extension even more nicely:

import os

filename = 'more.complicated-filename-20101004.txt'
firstpart, date = os.path.splitext(filename)[0].rsplit('-', 1)
# ['more.complicated-filename', '20101004']
eumiro
In our situation re fits better, as we are using lots of regular expressions to match different file name formats. Thanks anyway.
A: 

Don't use regular expressions for this:

import os

filename = 'myfile-20101019.txt'  # example name from the question
basename, extension = os.path.splitext(filename)
namepart, _, isodate = basename.rpartition('-')

I'm suggesting rpartition since the isodate (as defined in your question) won't contain dashes.
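One thing to keep in mind: when the name contains no dash at all, rpartition returns two empty strings followed by the whole string, so the name would end up in the date slot. A minimal sketch that guards against this is shown here. split_dated_name is a hypothetical helper, and returning None for a missing date is an assumption about what the caller wants.

```python
import os

def split_dated_name(filename):
    """Hypothetical helper: return (namepart, isodate, extension).

    isodate is None when the name has no trailing -YYYYMMDD part.
    """
    basename, extension = os.path.splitext(filename)
    namepart, sep, isodate = basename.rpartition('-')
    # No dash found, or the tail does not look like eight digits:
    # treat the whole basename as the name.
    if not sep or len(isodate) != 8 or not isodate.isdigit():
        return basename, None, extension
    return namepart, isodate, extension

print(split_dated_name('myfile-20101019.txt'))  # ('myfile', '20101019', '.txt')
print(split_dated_name('myfile.txt'))           # ('myfile', None, '.txt')
```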

ΤΖΩΤΖΙΟΥ