I'm trying to get a regex that will match:

somefile_1.txt
somefile_2.txt
somefile_{anything}.txt

but not match:

somefile_16.txt

I tried

somefile_[^(16)].txt

with no luck (it includes even the "16" record)

+5  A: 

Some regex libraries allow lookahead:

somefile_(?!16\.txt$).*?\.txt

Otherwise, you can still use multiple character classes:

somefile_([^1].|1[^6]|.|.{3,})\.txt

or, to achieve maximum portability:

somefile_([^1].|1[^6]|.|....*)\.txt
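
To sanity-check the two lookahead-free variants, here is a small Python sketch. It assumes the underscore from the question's file names belongs after "somefile", and the sample stems are made up for illustration.

```python
import re

# Both patterns come from the answer, written with the underscore that the
# question's file names contain after "somefile".
with_quantifier = re.compile(r"somefile_([^1].|1[^6]|.|.{3,})\.txt")
without_quantifier = re.compile(r"somefile_([^1].|1[^6]|.|....*)\.txt")

# Illustrative stems only; they are not taken from the question.
stems = ["1", "2", "16", "17", "61", "166", "1654"]
for stem in stems:
    name = f"somefile_{stem}.txt"
    a = bool(with_quantifier.fullmatch(name))
    b = bool(without_quantifier.fullmatch(name))
    print(name, a, b)
```

On these samples both variants agree, and only somefile_16.txt is rejected.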

[^(16)] means: Match any single character except parentheses, 1, and 6.
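
A minimal Python sketch of that point, using the standard re module (other regex dialects may behave differently): the bracket expression consumes exactly one character, so it cannot express "not the string 16".

```python
import re

# The bracket expression from the question: one character that is
# not '(', '1', '6' or ')'.
char_class = re.compile(r"[^(16)]")

for ch in "(16)2a_":
    verdict = "accepted" if char_class.fullmatch(ch) else "rejected"
    print(repr(ch), verdict)

# The full pattern from the question therefore also rejects somefile_1.txt,
# because '1' is one of the excluded characters.
attempt = re.compile(r"somefile_[^(16)].txt")
print(bool(attempt.fullmatch("somefile_2.txt")))  # True
print(bool(attempt.fullmatch("somefile_1.txt")))  # False
```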

Edit: Corrected error pointed out by Piotr Lesnicki.
Edit2: Added greediness qualifier, thanks Martin Brown
Edit3: Corrected second and third regexp as pointed out by Mattias Andersson. Sorry for the previous, completely wrong ones.

phihag
You probably want a ? in there like this: somefile(?!16).*?\.txt
Martin Brown
@Martin Brown: Why? .*? is afaik invalid in most dialects. . means any character, * zero or more occurrences. What should the question mark do?
phihag
@phihag: .*? means to make the .* non-greedy. It's a special use of the question mark.
Ben Doom
@Ben Doom, @Martin Brown: Thanks for the clarification, added
phihag
Your "multiple character classes" ("somefile([^1][^6]|.|.{3,})\.txt") and "maximum portability" ("somefile([^1][^6]|.|....*)\.txt") regexes are very wrong. Try matching strings like "somefile19.txt", "somefile46.txt", and "somefile1654.txt".
Mattias Andersson
@Mattias Andersson: Thanks for pointing out. You're right, they were plain wrong.
phihag
+2  A: 
somefile_(?!16).*\.txt

(?!16) means: Assert that it is impossible to match the regex "16" starting at that position.
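
The scope of the lookahead matters. As a rough illustration in Python (the sample names are invented), (?!16) rejects every name whose variable part starts with 16, while (?!16\.txt$) rejects only the exact name somefile_16.txt:

```python
import re

# Rejects any name whose variable part starts with "16".
broad = re.compile(r"somefile_(?!16).*\.txt")
# Rejects only the exact name somefile_16.txt.
narrow = re.compile(r"somefile_(?!16\.txt$).*\.txt")

names = [
    "somefile_1.txt",
    "somefile_16.txt",
    "somefile_1666.txt",
    "somefile_16_stuff.txt",
]
for name in names:
    print(name, bool(broad.fullmatch(name)), bool(narrow.fullmatch(name)))
```

Here somefile_1666.txt and somefile_16_stuff.txt are rejected by the broad pattern but accepted by the narrow one.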

madgnome
This will break if {anything} includes a dot: somefile_19700101.archive.txt will not match.
phihag
+3  A: 

The best solution has already been mentioned:

somefile_(?!16\.txt$).*\.txt

This works, and is greedy enough to take anything coming at it on the same line. If you know, however, that you want a valid file name, I'd suggest also limiting invalid characters:

somefile_(?!16)[^?%*:|"<>]*\.txt

If you're working with a regex engine that does not support lookahead, you'll have to work out how to emulate the (?!16). You can split the names into two groups: those that start with 1 but are not followed by 6, and those that start with anything other than 1:

somefile_(1[^6]|[^1]).*\.txt

If you want to allow somefile_16_stuff.txt but NOT somefile_16.txt, the regexes above are not enough. You'll need to draw the line differently:

somefile_(16.|1[^6]|[^1]).*\.txt

Combine all of this, and you end up with two possibilities: one which blocks out the single instance (somefile_16.txt), and one which blocks out the whole family (somefile_16*.txt). I suspect you want the first one:

somefile_((16[^?%*:|"<>]|1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt
somefile_((1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt

Here are the same two patterns without the special-character exclusions, so they are easier to read:

somefile_((16.|1[^6]|[^1]).*|1)\.txt
somefile_((1[^6]|[^1]).*|1)\.txt
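
As a quick check of the two readable, lookahead-free patterns, here is a Python sketch. The names single_only and whole_family and the sample file names are mine, chosen for illustration.

```python
import re

# Blocks only the single name somefile_16.txt.
single_only = re.compile(r"somefile_((16.|1[^6]|[^1]).*|1)\.txt")
# Blocks every name whose variable part starts with "16".
whole_family = re.compile(r"somefile_((1[^6]|[^1]).*|1)\.txt")

names = [
    "somefile_1.txt",
    "somefile_2.txt",
    "somefile_16.txt",
    "somefile_17.txt",
    "somefile_1666.txt",
    "somefile_16_stuff.txt",
]
for name in names:
    print(name, bool(single_only.fullmatch(name)), bool(whole_family.fullmatch(name)))
```

The trailing |1 alternative is what lets somefile_1.txt through; without it, 1[^6] would consume the dot and the match would fail.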
Douglas Mayle
+2  A: 

To follow your specification strictly and be picky, you should instead use:

^somefile_(?!16\.txt$).*\.txt$

so that somefile_1666.txt, which also counts as {anything}, can still be matched ;)

But sometimes it is just more readable to use:

ls | grep -e 'somefile_.*\.txt' | grep -v -e 'somefile_16\.txt'
Piotr Lesnicki
+1  A: 

Sometimes it's just easier to use two regular expressions. First look for everything you want, then ignore everything you don't. I do this all the time on the command line where I pipe a regex that gets a superset into another regex that ignores stuff I don't want.

If the goal is to get the job done rather than find the perfect regex, consider that approach. It's often much easier to write and understand than a regex that makes use of exotic features.
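
A minimal sketch of that two-step idea in Python, for cases where the filtering happens in a script rather than on the command line. The helper name select and the sample names are hypothetical.

```python
import re

# Step 1: the superset we are interested in.
wanted = re.compile(r"somefile_.*\.txt")
# Step 2: the one name we want to drop from that superset.
unwanted = re.compile(r"somefile_16\.txt")


def select(names):
    """Keep names that match `wanted` and do not match `unwanted`."""
    return [n for n in names if wanted.fullmatch(n) and not unwanted.fullmatch(n)]


print(select(["somefile_1.txt", "somefile_16.txt", "somefile_1666.txt", "notes.txt"]))
# ['somefile_1.txt', 'somefile_1666.txt']
```

Each pattern stays trivial to read, and the exclusion list can grow without touching the first pattern.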

Bryan Oakley
A: 

Without using lookahead

somefile_(|.|[^1].+|10|11|12|13|14|15|17|18|19|.{3,})\.txt

Read it like: somefile_ followed by either:

  1. nothing.
  2. one character.
  3. any one character except 1, followed by one or more further characters.
  4. three or more characters.
  5. one of 10 through 19; note that 16 has been left out.

and finally followed by .txt.
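
A rough way to check an alternation like this is to compare it against a lookahead pattern on a handful of names. The sketch below does that in Python, with the final dot escaped; the reference pattern and the sample stems are my own choices. It suggests that 16 is rejected as intended, but also that a two-character stem such as 1a slips through unmatched, since only digits are listed after a leading 1. Replacing the enumeration with 1[^6] would be one way to close that gap (my assumption, not part of the answer).

```python
import re

# The alternation from the answer, with the final dot escaped.
no_lookahead = re.compile(
    r"somefile_(|.|[^1].+|10|11|12|13|14|15|17|18|19|.{3,})\.txt"
)
# Lookahead pattern used here as the reference for "anything except exactly 16".
reference = re.compile(r"somefile_(?!16\.txt$).*\.txt")

# Illustrative stems only.
stems = ["", "1", "7", "10", "16", "19", "42", "166", "1a"]
for stem in stems:
    name = f"somefile_{stem}.txt"
    print(name, bool(no_lookahead.fullmatch(name)), bool(reference.fullmatch(name)))
```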

Pierre