I am trying to figure out a regular expression that matches any string that doesn't start with mpeg. More generally, I want to match any string that doesn't start with a given regular expression.

I tried something like as follows:

[^m][^p][^e][^g].*

The problem with this is that it requires the string to have at least 4 characters. I couldn't figure out a good way to handle that, or how to generalize the approach to any prefix.
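Here is a small Python check with made-up file names that shows the two failure modes concretely. Besides needing at least four characters, the per-position classes reject a string as soon as any single position matches, so "npeg.avi" and "movie.mp4" are rejected even though neither starts with mpeg.

```python
import re

# The attempt from the question: one negated class per character position.
attempt = re.compile(r"[^m][^p][^e][^g].*")

# Illustrative file names; re.match only tries position 0, like a leading ^.
samples = ["video.avi", "mpeg4.avi", "avi", "npeg.avi", "movie.mp4"]
results = {s: attempt.match(s) is not None for s in samples}
print(results)
```

Only "video.avi" is accepted. "avi" fails for being too short, and "npeg.avi" and "movie.mp4" fail even though they should pass.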

I will be using this in Python.

Thanks in advance.

+2  A: 

Try a look-ahead assertion:

(?!mpeg)^.*
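Here is a minimal sketch of this in Python (the sample strings are illustrative). Because the lookahead and ^ are both zero-width, putting the lookahead before ^ has the same effect as putting it after:

```python
import re

# Lookahead first, then the ^ anchor, exactly as written above.
pattern = re.compile(r"(?!mpeg)^.*")

print(pattern.match("mpeg2.ts"))         # None: the lookahead fails at position 0
print(pattern.match("mp3") is not None)  # True
print(pattern.match("") is not None)     # True: short strings are fine

# re.search also tries later positions, but without re.MULTILINE the ^
# can only match at position 0, so strings starting with mpeg still fail.
print(pattern.search("mpeg2.ts"))        # None
```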

Or if you want to use negated classes only:

^(.{0,3}$|[^m]|m([^p]|p([^e]|e([^g])))).*$
Gumbo
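One way to gain confidence in Gumbo's negated-class pattern is to compare it against str.startswith on a handful of edge cases, including the "mpe" case raised in the comments. The sample strings are illustrative. Note that `.` does not match a newline, so this comparison assumes single-line input such as file names.

```python
import re

# The negated-class version from the answer above; no lookahead involved.
negated = re.compile(r"^(.{0,3}$|[^m]|m([^p]|p([^e]|e([^g])))).*$")

# Illustrative single-line inputs, including the "mpe" case from the comments.
samples = ["", "m", "mp", "mpe", "mpeg", "mpeg4", "mpex", "npeg", "movie.mp4"]

# Compare against the plain-Python answer to "doesn't start with mpeg".
mismatches = [
    s for s in samples
    if (negated.match(s) is not None) != (not s.startswith("mpeg"))
]
print(mismatches)  # []
```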
your "negated class" regex won't work. check your syntax.
J-16 SDiZ
@J-16 SDiZ: Why do you think that?
Gumbo
Probably because he thinks you're trying to "not-match" mpeg before the start of the string. Even though it's perfectly legal, since ^ is a zero-width anchor, he's right inasmuch as it looks confusing.
Tim Pietzcker
This won't match "mpe" since as you've written it, 'mpe' must have a letter following it which isn't a 'g', and you don't allow the end-of-string possibility.
Andrew Dalke
@dalke: the `.{0,3}` portion would match "mpe".
Amber
@Tim Pietzcker: That expression is called look-ahead because it looks ahead without stepping forward.
Gumbo
+8  A: 
^(?!mpeg).*

This uses a negative lookahead to match only strings whose beginning doesn't match mpeg. Essentially, it requires that "the position at the beginning of the string cannot be a position where, if we started matching the regex mpeg, we could match successfully". Thus it matches anything that doesn't start with mpeg, and nothing that does.
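The same construction covers the general question of excluding any prefix regex: wrap the prefix pattern in the negative lookahead. Below is a sketch in which not_starting_with is a hypothetical name. It assumes the prefix is a complete, well-formed regex on its own, because it is pasted into the larger pattern as text.

```python
import re

def not_starting_with(prefix_regex):
    """Hypothetical helper: compile a pattern that matches any string
    whose beginning does not match prefix_regex."""
    # The lookahead's own parentheses scope any alternation in prefix_regex,
    # so "mpeg|avi" means "starts with mpeg or with avi".
    return re.compile("^(?!" + prefix_regex + ").*")

no_mpeg = not_starting_with("mpeg")
print(no_mpeg.match("mpeg2.ts") is None)         # True
print(no_mpeg.match("mp3") is not None)          # True

no_video = not_starting_with("mpeg|avi")
print(no_video.match("avi_clip.avi") is None)    # True
print(no_video.match("notes.txt") is not None)   # True
```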

However, I'd be curious about the context in which you're using this - there might be other options aside from regex which would be either more efficient or more readable, such as...

if inputstring[:4] != "mpeg":
    pass  # handle strings that don't start with "mpeg" here
Amber
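Here is a quick sketch of the non-regex route, using made-up sample names. The slice test and str.startswith agree, even on strings shorter than four characters.

```python
# Illustrative names, including ones shorter than four characters.
names = ["mpeg4.avi", "mp3", "", "video.avi"]

for name in names:
    # Slicing past the end just returns a shorter string; no IndexError.
    by_slice = name[:4] != "mpeg"
    by_method = not name.startswith("mpeg")
    print(repr(name), by_slice, by_method)

# startswith() also accepts a tuple, which covers several prefixes at once.
print("avi_clip.avi".startswith(("mpeg", "avi")))  # True
```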
+1 for both answering the question, and providing a (probably) better alternative.
Edan Maor
The regex is entered by a user through a web interface, so I am not writing it myself in the Python program. The regex is a kind of filter setting for a watch folder from which my software picks up files: the user fills in the regex through the user interface, and my Python code uses it as the filtering criterion to pick up the appropriate files from the watch folder. Thanks a lot for the answer.
Shailesh Kumar
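Given that setup, the filtering step might look like the sketch below. It assumes the user's pattern is applied with re.match to each file name. The thread doesn't say whether the real code uses match, search, or fullmatch. filter_watch_folder is a hypothetical name, and in practice the names would come from something like os.listdir() on the watch folder.

```python
import re

def filter_watch_folder(filenames, user_regex):
    """Hypothetical helper: keep the names that user_regex matches
    at the start (re.match semantics)."""
    # Compile once per filter; an invalid user pattern raises re.error here.
    pattern = re.compile(user_regex)
    return [name for name in filenames if pattern.match(name)]

# Illustrative folder listing; matching is case-sensitive by default.
files = ["mpeg_001.ts", "clip.mov", "mp3_list.txt", "MPEG_002.ts"]
print(filter_watch_folder(files, r"^(?!mpeg).*"))
# ['clip.mov', 'mp3_list.txt', 'MPEG_002.ts']
```

Matching is case-sensitive, so "MPEG_002.ts" passes. A user who wants to exclude it as well could start the pattern with the inline flag (?i).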
+5  A: 

Don't lose your mind with regex.

if len(mystring) >= 4 and mystring[:4] == "mpeg":
    print("do something")

Or use startswith() with the "not" keyword:

if len(mystring) >= 4 and not mystring.startswith("mpeg"):
    print("do something")
ghostdog74
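As the comments discuss, whether the len() check belongs depends on how the requirement is read. This small sketch, using hypothetical function names, shows the practical difference: with the check, short strings such as "mp3" are rejected even though they don't start with mpeg.

```python
def not_mpeg(s):
    # Plain prefix test: any string not starting with "mpeg" passes.
    return not s.startswith("mpeg")

def not_mpeg_min_len(s):
    # The variant from the answer above: also requires at least 4 characters.
    return len(s) >= 4 and not s.startswith("mpeg")

for s in ["mp3", "ab", "mpeg.ts", "video.avi"]:
    print(repr(s), not_mpeg(s), not_mpeg_min_len(s))
```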
Note that you don't actually need the `len()` check - you can slice strings beyond their boundaries, you'll just get fewer characters back.
Amber
Yes, I know that. It's just that maybe I misread the OP's requirement. He said "it requires at least 4 characters to be present in the string". The keyword is "in the string". It may be a long string, and he may have that requirement as well. Anyway, it's up to the OP now to get it done right.
ghostdog74
I think that bit was saying that his original attempt at a regex required 4 characters in the string, when he actually wanted to match anything not beginning with "mpeg", even if it was less than 4 characters.
Amber
Please see my comment on the post above. The regexes are provided by the user through a UI and used as-is by the Python code.
Shailesh Kumar
Well, I think you should have mentioned that in the question.
ghostdog74
A: 

Your regexp wouldn't match "npeg". I think you would need to come up with ^($|[^m]|m($|[^p]|p($|[^e]|e($|[^g])))), which is quite horrible. Another alternative would be ^(.{0,3}$|[^m]|.[^p]|..[^e]|...[^g]), which is only slightly better.

So I think you should really use a look-ahead assertion as suggested by Dav and Gumbo :-)

Jérémie Koenig
Your alternative is not an alternative since it’s not correct. It wouldn’t match *npeg*.
Gumbo
Did you try? re.match(r"^(.{0,3}$|[^m]|.[^p]|..[^e]|...[^g])", "npeg") returns a Match object. It works because [^m] passes.
Andrew Dalke
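Andrew Dalke tested "npeg" against one of Jérémie Koenig's lookahead-free patterns. Extending that check to both patterns and a few more edge cases shows that each agrees with str.startswith for these inputs. The sample strings are illustrative, and single-line input is assumed.

```python
import re

# Jérémie Koenig's two lookahead-free alternatives.
nested = re.compile(r"^($|[^m]|m($|[^p]|p($|[^e]|e($|[^g]))))")
flat = re.compile(r"^(.{0,3}$|[^m]|.[^p]|..[^e]|...[^g])")

samples = ["", "m", "mpe", "mpeg", "mpeg.ts", "npeg", "mpex", "xpeg.ts"]
for s in samples:
    expected = not s.startswith("mpeg")
    print(repr(s), expected, nested.match(s) is not None, flat.match(s) is not None)
```

Every row prints the same value three times, and "npeg" is accepted by both patterns.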