Hi. I'm working on a Python script that finds files by comparing their names against a regular expression pattern. Now I need to find files whose names end with a "~" (tilde), so I built the following regex:

if re.match("~$", string_test):
    print "ok!"

Python doesn't seem to recognize the regex, and I don't know why. I tried the same regex in other languages and it works perfectly. Any ideas?

PS: I read on a website that I have to insert

# -*- coding: utf-8 -*-

but that doesn't help :( .

Thanks a lot. Meanwhile, I'm going to keep reading to see if I find something.

+10  A: 

re.match() is only successful if the regular expression matches at the beginning of the input string. To search for any substring, use re.search() instead:

if re.search("~$", string_test):
    print "ok!"
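
To make the difference concrete, here is a minimal sketch comparing the two calls on the same input. The file name is a made-up example, not something from the question.

```python
import re

name = "notes.txt~"  # hypothetical backup file name, for illustration only

# re.match only tries the pattern at the start of the string,
# so "~$" fails unless the whole name is just "~".
matched = re.match("~$", name)

# re.search tries every position, so it finds the trailing tilde.
searched = re.search("~$", name)

print(matched is None)       # True
print(searched is not None)  # True
```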
sth
-1 He wants to match strings that END in '~'. search/match is irrelevant.
John Machin
@John Machin: The `$` makes sure it matches only at the end (assuming there are no newlines in the file names). The difference to other languages mentioned in the question is surely just due to match/search.
sth
@sth: sorry I misread your message; please do a minimal edit to your answer so that I can upvote it.
John Machin
@John: No problem
sth
@sth: upvoted ...
John Machin
+10  A: 

Your regex will only match strings "~" and (believe it or not) "~\n".

You need re.match(r".*~$", whatever) ... that means zero or more of (anything except a newline) followed by a tilde followed by (end-of-string or a newline preceding the end of string).

In the unlikely event that a filename can include a newline, use the re.DOTALL flag and use \Z instead of $.
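
A small sketch of how the two variants behave, assuming made-up file names (LOOSE and STRICT are just illustrative names):

```python
import re

# "$" also matches just before a single trailing newline.
LOOSE = re.compile(r".*~$")

# "\Z" matches only at the very end of the string; DOTALL lets "."
# match newlines, so names containing a newline are handled too.
STRICT = re.compile(r".*~\Z", re.DOTALL)

print(LOOSE.match("a.txt~") is not None)     # True
print(LOOSE.match("a.txt~\n") is not None)   # True: "$" tolerates the final newline
print(STRICT.match("a.txt~\n") is not None)  # False: the string ends in "\n", not "~"
print(STRICT.match("a\nb~") is not None)     # True: DOTALL lets ".*" cross the newline
print(LOOSE.match("a\nb~") is not None)      # False: "." stops at the newline
```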

"worked" in other languages: you must have used a search function.

An r at the beginning of a string constant means raw escapes, e.g. '\n' is a newline but r'\n' is two characters, a backslash followed by n -- which can also be represented by '\\n'. Raw strings save a lot of \\ in regexes; one should use r"regex" automatically.
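
A quick way to check this yourself (the variable names are only for illustration):

```python
newline = '\n'    # one character: a newline
raw = r'\n'       # two characters: a backslash followed by n
escaped = '\\n'   # the same two characters, written with an escaped backslash

print(len(newline))    # 1
print(len(raw))        # 2
print(raw == escaped)  # True
```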

BTW: in this case avoid the regex confusion ... use whatever.endswith('~')
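
For example, a minimal sketch that filters a list of names. Both backup_files and the sample names are hypothetical.

```python
def backup_files(names):
    """Return the names that end with a tilde."""
    return [name for name in names if name.endswith('~')]

sample = ["a.txt", "a.txt~", "~b", "c~"]  # made-up names
result = backup_files(sample)
print(result)  # ['a.txt~', 'c~']
```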

John Machin
Thanks, it works. One more question: what is the meaning of the initial r that you put in re.match(r".*~$", whatever), I mean the r in r".*~$"? Thanks for your great help!
fkn_man
@fkn_man, read about string prefixes here: http://docs.python.org/release/2.5.2/ref/strings.html
Nick D
+1 for the simpler `endswith()`.
Johnsyweb
+1 for the `.endswith()` from me too
ΤΖΩΤΖΙΟΥ
+8  A: 

For finding files, use glob instead:

import os
import glob

path = '/path/to/files'
os.chdir(path)
files = glob.glob('./*~')

print files
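
If you would rather not change the working directory, a variation is to join the directory and the pattern. This is only a sketch: find_backups is a hypothetical helper, and the demo files are created in a temporary directory just so the example runs anywhere. Note that a pattern starting with * does not match hidden files (names starting with a dot).

```python
import glob
import os
import tempfile

def find_backups(directory):
    """Return the sorted paths in `directory` whose names end with '~'."""
    # Build the pattern from the directory instead of calling os.chdir().
    return sorted(glob.glob(os.path.join(directory, '*~')))

# Demo with throwaway files in a temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ("a.txt", "a.txt~", "b~"):
    open(os.path.join(demo_dir, name), "w").close()

found = [os.path.basename(path) for path in find_backups(demo_dir)]
print(found)  # ['a.txt~', 'b~']
```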
Jesse Dhillon
+1  A: 

The correct regex and the glob solution have already been posted. Another option is to use the fnmatch module:

import fnmatch
if fnmatch.fnmatch(string_test, "*~"):
    print "ok!"

This is a tiny bit easier than using a regex. Note that all methods posted here are essentially equivalent: fnmatch is implemented using regular expressions, and glob in turn uses fnmatch.
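
You can see the relationship with regular expressions through fnmatch.translate, which converts a shell-style pattern into a regex string. A small sketch follows; is_backup is a hypothetical helper, and the exact regex text produced differs between Python versions, so it is not shown here.

```python
import fnmatch
import re

# Convert the shell-style pattern into a regular expression string.
pattern = fnmatch.translate("*~")
regex = re.compile(pattern)

def is_backup(name):
    """True if `name` matches the translated '*~' pattern."""
    return regex.match(name) is not None

print(is_backup("a.txt~"))  # True
print(is_backup("a.txt"))   # False
```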

Note that it was only in 2009 (after six years!) that a patch adding support for file names with newlines was applied to fnmatch.

Philipp
The correct `str.endswith('~')` solution has also been posted. This is *much* easier to use than `fnmatch`.
John Machin