I need a regex to match against a path like this: "C:\Documents and Settings\User\My Documents\ScanSnap\382893.pd~". The regex should match all paths except those ending in '~' or '.dat'. My problem is that I don't understand how to negate the exact string '.dat', and only at the end of the path; i.e., I don't want to exclude the characters d, a, t when they appear elsewhere in the path.

I have built most of the regex, but I still need it to reject .dat:

[\w\s:\.\\]*[^~]$[^\.dat]

[\w\s:\.\\]* matches word characters, whitespace, the colon, periods, and backslashes.

[^~]$[^\.dat]$ causes matches ending in '~' to fail. It seems that I should be able to follow this with a negated match for '.dat', but the match fails in my regex tester.

Judging from what I've read, I think the answer lies in grouping; could someone point me in the right direction? I should add that I am using a file-watching program that allows regex matching, and I have only one line in which to specify the regex.

This entry seems similar: http://stackoverflow.com/questions/698596/regex-to-match-multiple-strings

+5  A: 

You want to use a negative look-ahead:

^((?!\.dat$)[\w\s:\.\\])*$

By the way, your character class ([\w\s:\.\\]) doesn't allow a tilde (~) in it. Did you intend to allow a tilde in the filename if it isn't at the end? If so:

^((?!~$|\.dat$)[\w\s:\.\\~])*$
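To try the lookahead patterns outside a regex tester, here is a minimal sketch using Python's standard `re` module. The helper `accepts` and the sample paths are hypothetical, and the file-watching program in the question may use a different regex flavor, so treat this only as a way to check the logic.

```python
import re

# First pattern above: a character is consumed only if ".dat" does not
# start at that position and run to the end of the string.
NO_DAT = re.compile(r"^((?!\.dat$)[\w\s:\.\\])*$")

# Tilde variant: "~" is allowed in the class, but not as the final character.
NO_DAT_OR_TILDE = re.compile(r"^((?!~$|\.dat$)[\w\s:\.\\~])*$")


def accepts(pattern, path):
    """Hypothetical helper: True if the whole path satisfies the pattern."""
    return pattern.match(path) is not None


print(accepts(NO_DAT, r"C:\Scans\update.dat.pdf"))  # ".dat" mid-path is fine
print(accepts(NO_DAT_OR_TILDE, r"C:\Scans\a~b.pdf"))  # "~" mid-path is fine here
```

Because every alternative inside the lookahead ends in `$`, the check only fires at the position where the forbidden suffix would run to the end of the path.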
Jeremy Stein
No, I didn't realize that, but I do not wish to include a tilde. I wish to exclude both the file suffixes ".pd~" and ".dat", which are created as temp files.
Radix
Then you don't have to worry about the tilde at all. Since tilde can't occur in the filename at all, you don't have to explicitly check that the filename doesn't end in a tilde. You can use the first, simpler regex.
Jeremy Stein
Okay, I see what you mean now about including the tilde in the character class. I don't expect a tilde to appear in any of the file strings we will be using, though I will include it just in case. THANK YOU so much! It took a couple of tries before I understood how the program I am using works, or I would have replied sooner. Do I understand correctly that both the ~ and the \.dat alternatives are followed by '$' (match at the end)? Thus the negative lookahead checks that neither one is there before continuing. If so, I appreciate the reference; it's better than what Google was teaching me.
Radix
Both proposed solutions will also reject file names containing the character '~' anywhere, not only those ending with it. This was not the OP's intention, AFAIK. Personally, I find my suggestion to be clearer (and correct!). :)
Bart Kiers
@Bart: People generally find their own solutions to be clearer. I tried to start with what he was using and fix it.
Jeremy Stein
@Jeremy: I wouldn't have said so if the difference was small (IMO, of course). But, looking at the response you posted under my suggestion, you seem to agree with me! :)
Bart Kiers
A: 

I believe you are looking for this:

[\w\s:\.\\]*([^~]|[^\.dat])$

This matches, as before, all word characters, whitespace, periods (.), and backslashes, then matches either a tilde (~) or '.dat' at the end of the string. You may also want to add a caret (^) at the very beginning if you know that the string should be at the beginning of a new line.

^[\w\s:\.\\]*([^~]|[^\.dat])$
sanscore
This is not what is being asked for: [^...] matches any single character that is not in the list.
Dave
Thanks, that is what I was getting at, but I was wrong. That pattern accepts paths ending in both '.dat' and '~'. I don't understand why yet.
Radix
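One way to see why it accepts both is to run sanscore's pattern in Python's `re` and look at what the final group captured. The helper `last_char_matched` and the sample paths are hypothetical, chosen only for illustration.

```python
import re

# sanscore's pattern, copied unchanged from the answer above.
FLAWED = re.compile(r"^[\w\s:\.\\]*([^~]|[^\.dat])$")


def last_char_matched(path):
    """Hypothetical helper: the character the final group captured, or None."""
    m = FLAWED.match(path)
    return m.group(1) if m else None


# '~' is not one of '.', 'd', 'a', 't', so the [^\.dat] branch accepts it.
print(last_char_matched(r"C:\Scans\382893.pd~"))
# The star gives back the final 't', and the [^~] branch accepts it.
print(last_char_matched(r"C:\Scans\382893.dat"))
```

Each branch of the group is a single-character class, so the pattern only ever inspects the last character and never "sees" the suffix as a whole.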
+3  A: 

The following regex:

^.*(?<!\.dat|~)$

matches any string that does NOT end with a '~' or with '.dat'.

^             # the start of the string
.*            # gobble up the entire string (without line terminators!)
(?<!\.dat|~)  # looking back, there should not be '.dat' or '~'
$             # the end of the string

In plain English: match a string only when looking behind from the end of the string, there is no sub-string '.dat' or '~'.
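A caveat if you test this in Python: the built-in `re` module requires every lookbehind to have a fixed width, so it refuses `(?<!\.dat|~)`, whose branches are 4 and 1 characters long (flavors such as PCRE and .NET accept it as written). A minimal sketch, assuming Python, that splits the alternation into two consecutive lookbehinds; `is_wanted` and the sample paths are hypothetical.

```python
import re

# In CPython's re, the combined lookbehind is rejected as variable-width.
try:
    re.compile(r"^.*(?<!\.dat|~)$")
except re.error as exc:
    print("re rejects the combined lookbehind:", exc)

# Equivalent form: each lookbehind is fixed-width, and both must pass.
NOT_TEMP = re.compile(r"^.*(?<!\.dat)(?<!~)$")


def is_wanted(path):
    """Hypothetical helper: True unless the path ends in '.dat' or '~'."""
    return NOT_TEMP.match(path) is not None
```

Two adjacent negative lookbehinds at the same position are logically "not A and not B", which is exactly what the single `(?<!A|B)` group expresses.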

Edit: the reason your attempt failed is that a negated character class, [^...], only negates a single character. A character class always matches exactly one character. So when you write [^.dat], you're not negating the string ".dat"; you're matching a single character other than '.', 'd', 'a' or 't'.
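The single-character behavior is easy to confirm with Python's `re.fullmatch`; the helper `one_char` is hypothetical, used only for this demonstration.

```python
import re

# [^.dat] is a character class: exactly ONE character that is not
# '.', 'd', 'a' or 't'. It can never match the four-character ".dat".
CLASS = re.compile(r"[^.dat]")


def one_char(s):
    """Hypothetical helper: does the whole string match the class?"""
    return CLASS.fullmatch(s) is not None


print(one_char("x"))     # True: one character outside the set
print(one_char("d"))     # False: 'd' is in the set
print(one_char("xy"))    # False: a class never spans two characters
print(one_char(".dat"))  # False
```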

Bart Kiers
Oh, you're right. I've learned more this way though. Can I add an arbitrary number of extensions to ignore in this negative look ahead grouping?
Radix
Yes, simply OR it. The regex `^.*(?<!\.dat|~|\.txt)$` would now also reject '.txt' files.
Bart Kiers
Great, that's what I meant to ask. That is, by using '|' (pipe, OR) it would work. Thank you.
Radix
Someone give this guy points.
Radix
;) no problem Radix.
Bart Kiers
Hey, that was clever to use a negative look-behind instead of a negative look-ahead. It's much clearer that way.
Jeremy Stein
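If the list of ignored suffixes lives in code rather than in a single config line, the pattern can be generated from it. A sketch, assuming Python; `build_pattern` is a hypothetical helper. It emits one lookbehind per suffix because Python's `re` needs each lookbehind to be fixed-width, and `re.escape` makes the '.' in each suffix literal.

```python
import re


def build_pattern(suffixes):
    """Hypothetical helper: match paths that end in none of the suffixes."""
    # One fixed-width negative lookbehind per suffix, all checked at the end.
    lookbehinds = "".join(f"(?<!{re.escape(s)})" for s in suffixes)
    return re.compile(rf"^.*{lookbehinds}$")


IGNORE = build_pattern([".dat", "~", ".txt"])
print(IGNORE.pattern)
```

In a flavor that accepts alternatives of different lengths inside one lookbehind, joining the escaped suffixes with '|' into a single `(?<!...)` group, as in the comment above, expresses the same rule.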
+2  A: 
^((?!\.dat$)[\w\s:\.\\])*$

This is just a comment on an earlier answer suggestion:

A . inside a character class, [], is a literal period and does not need escaping:

^((?!\.dat$)[\w\s:.\\])*$

I'm sorry to post this as a new solution, but I apparently don't have enough credibility to simply comment on an answer yet.

genio
I don't either, and unfortunately I don't have enough credibility to give you more either. Thanks for making explicit what I had guessed from the other answers.
Radix
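To check genio's point, here is a small Python sketch comparing the escaped and unescaped spellings on a few hypothetical paths. Outside a class, a bare '.' is still a wildcard, which is why the lookahead keeps `\.dat`.

```python
import re

# Inside [...], '.' is already a literal period, so both spellings agree.
ESCAPED = re.compile(r"^((?!\.dat$)[\w\s:\.\\])*$")
UNESCAPED = re.compile(r"^((?!\.dat$)[\w\s:.\\])*$")

samples = [
    r"C:\Scans\382893.pdf",
    r"C:\Scans\382893.dat",
    r"C:\Scans\382893.pd~",
    r"C:\Scans\update.dat.pdf",
]
same = all(bool(ESCAPED.match(s)) == bool(UNESCAPED.match(s)) for s in samples)
print(same)  # True

print(bool(re.fullmatch(r"[.]", "x")))  # False: [.] only matches a period
print(bool(re.fullmatch(r".", "x")))    # True: a bare . matches any character
```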