Hello.

I have a Python template engine that relies heavily on regular expressions. It builds them by concatenation, like this:

re.compile( regexp1 + "|" + regexp2 + "*|" + regexp3 + "+" )

I can modify the individual substrings (regexp1, regexp2, etc.). Is there a small, lightweight expression that matches nothing, so that I can use it in a template where I don't want any matches? Unfortunately, a regexp atom is sometimes followed by '+' or '*', so I can't use an empty string: a "nothing to repeat" error will be raised.
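The failure described above is easy to reproduce. This is a minimal sketch: the fragment values ("foo", "x", "bar") and the helpers `compiles` and `build` are made up for illustration and are not part of the original engine.

```python
import re

def compiles(pattern):
    """Return True if the re module accepts `pattern`, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

def build(regexp1, regexp2, regexp3):
    # Same concatenation scheme as in the question.
    return regexp1 + "|" + regexp2 + "*|" + regexp3 + "+"

# An empty placeholder leaves '*' with nothing in front of it.
broken = build("foo", "", "bar")      # 'foo|*|bar+'
working = build("foo", "x", "bar")    # 'foo|x*|bar+'

print(broken, compiles(broken))
print(working, compiles(working))
```

Any usable placeholder therefore has to be a non-empty atom that a trailing '*' or '+' can legally attach to.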

A: 

A null will find nothing, so "re.compile()" should theoretically work.

The equivalent in Perl or Vim is //, since "/" is the regex delimiter.

Autocracy
None + "+|" + "abc" will be painful in any language: a null will not concatenate with the rest of the strings. I can't replace the ENTIRE string inside re.compile(), only the parts that I have marked as 'regexp1', 'regexp2', etc.
Eye of Hell
+7  A: 

This shouldn't match anything:

re.compile('$^')

So if you replace regexp1, regexp2 and regexp3 with '$^', it will be impossible to find a match, unless you are using multiline mode.


After some tests I found a better solution

re.compile('a^')

It is impossible to match and will fail earlier than the previous solution. You can replace a with any other character and it will still be impossible to match.
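A quick check of both candidates with Python's re module; the sample strings and the helper `hits` are made up for illustration. It suggests that '$^' is less safe than it looks, because the end and the start of an empty string coincide, while 'a^' stays unmatched with or without re.MULTILINE.

```python
import re

samples = ["", "abc", "a", "a\n\nb", "a\na"]

def hits(pattern, flags=0):
    """Return the sample strings in which `pattern` is found."""
    return [s for s in samples if re.search(pattern, s, flags)]

# '$^': end of string/line immediately followed by start of string/line.
dollar_caret_default = hits(r"$^")
dollar_caret_multiline = hits(r"$^", re.MULTILINE)

# 'a^': a literal character that would have to sit before a line start.
a_caret_default = hits(r"a^")
a_caret_multiline = hits(r"a^", re.MULTILINE)

print(dollar_caret_default, dollar_caret_multiline)
print(a_caret_default, a_caret_multiline)
```

Under re.MULTILINE the blank line in "a\n\nb" is enough for '$^' to match, whereas 'a^' would need a start of line right after a non-newline character.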

Nadia Alramli
Is it certain that this will not match anything, and is it lightweight for the regexp engine to process? (I don't want my stub regexps to eat a lot of CPU.)
Eye of Hell
@Eye of Hell: It should be lightweight. It tries to match a line end followed by a line start, which is impossible within one line.
Nadia Alramli
But possible with multiple lines of course (depending on if the flag is enabled) - for a solution that works whether the flag is enabled or not, see my answer.
Peter Boughton
+1  A: 

Maybe '.{0}'?

Steef
That will return a match object.
Eye of Hell
+2  A: 
"()"

matches nothing and nothing only.

balpha
This will match an empty string. It depends on what @Eye of Hell is requesting. If he wants no match at all then it will not work.
Nadia Alramli
Nope - this matches anything but is considered a bad pattern in many regex implementations (dependent on flags sometimes)
ShuggyCoUk
I do not want to match anything. I will check how Python interprets "()".
Eye of Hell
Python's regexp engine returns a match object for an empty match. For $^ it always returns None (nothing found).
Eye of Hell
I'm sorry, I misunderstood the question.
balpha
+7  A: 

To match an empty string - even in multiline mode - you can use \A\Z, so:

re.compile('\A\Z|\A\Z*|\A\Z+')

The difference is that \A and \Z are the start and end of the string, whilst ^ and $ can match the start and end of lines, so $^|$^*|$^+ could potentially match a string containing newlines (if the flag is enabled).

And to fail to match anything (even an empty string), simply attempt to find content before the start of the string, e.g:

re.compile('.\A|.\A*|.\A+')

Since no characters can come before \A (by definition), this will always fail to match.
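The two atoms can be checked on their own with Python's re module. This is a small sketch with made-up sample strings; it tests only the bare atoms, not the versions with '*' or '+' appended.

```python
import re

EMPTY_ONLY = re.compile(r"\A\Z")  # start of string immediately followed by end of string
NEVER = r".\A"                     # a character that would have to precede the start

samples = ["", "abc", "\n", "a\nb\n"]
flags = re.MULTILINE | re.DOTALL   # the flags most likely to change anchor behaviour

empty_hits = [s for s in samples if EMPTY_ONLY.search(s)]
never_hits = [s for s in samples if re.search(NEVER, s, flags)]

print(empty_hits)
print(never_hits)
```

In Python, \Z matches only at the very end of the string, so "\n" is not accepted by \A\Z.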

Peter Boughton
Yours looks nicer than mine since I assume it would exit out faster than using end of line.
ShuggyCoUk
Peter, you use \z (lower-case) while my Python pocket guide tells me the end-of-string assertion is \Z (upper-case)?!
ThomasH
ThomasH, they both are end of string, but the uppercase version allows a trailing newline whilst the lowercase one does not.
Peter Boughton
Hm, interesting, I can't find this documented anywhere. Also, _re.search("boo\z", "fooboo")_ doesn't return a match object, while _re.search("boo\Z", "fooboo")_ does. Rather, _re.search("boo\z", "foobooz")_ matches, which suggests that '\z' is simply interpreted as 'z', right?! (This is in Python 2.6.)
ThomasH
Ah sorry, I thought Python was PCRE, but it turns out there are a few differences, and this is one of them. (See 'Anchors' at http://www.regular-expressions.info/refflavors.html )
Peter Boughton
Great. - As this question was somewhat Python-oriented, maybe you want to update your otherwise excellent answer.
ThomasH
oops, thanks for the reminder on that - just fixed it.
Peter Boughton
+1  A: 

You could use
\z..
This is the absolute end of the string, followed by two of any character.

If + or * is tacked on the end, this still works and refuses to match anything.
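In Python the absolute end-of-string anchor is spelled \Z, so a Python version of this placeholder would be \Z.. rather than \z.. . The sketch below, with made-up sample strings and a hypothetical helper `ever_matches`, checks that it stays unmatched when the template appends '*' or '+'; the quantifier then binds to the final dot rather than to the anchor.

```python
import re

PLACEHOLDER = r"\Z.."   # end of string, then two more characters
samples = ["", "a", "ab", "abc", "a\nb\n"]
flags = re.MULTILINE | re.DOTALL

def ever_matches(pattern):
    """True if `pattern` is found in any of the sample strings."""
    return any(re.search(pattern, s, flags) for s in samples)

# Simulate the template appending nothing, '*' or '+' to the placeholder.
results = {suffix: ever_matches(PLACEHOLDER + suffix) for suffix in ("", "*", "+")}
print(results)
```

Two dots are needed because \Z.* could still match the empty string at the end of the input. A placeholder that ends in an anchor, such as a^, puts the quantifier directly on the anchor instead, which Python's re may well reject with the same "nothing to repeat" error, so that case is worth testing in your Python version.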

ShuggyCoUk
A: 

Or use a list comprehension to remove the unused regexp entries, and join to put the rest together. Something like:

re.compile('|'.join([x for x in [regexp1, regexp2, ...] if x is not None]))

Be sure to add some comments next to that line of code though :-)
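One way to apply this idea while keeping the '*' and '+' suffixes from the question is to store each fragment together with its suffix and skip disabled pairs before joining. This is a minimal sketch; the function name `build_pattern` and the sample fragments are hypothetical, not taken from the original engine.

```python
import re

def build_pattern(parts):
    """Join (fragment, suffix) pairs with '|', skipping fragments set to None.

    Returns None when every fragment is disabled, so the caller can skip
    matching altogether instead of compiling an empty pattern.
    """
    active = [frag + suffix for frag, suffix in parts if frag is not None]
    if not active:
        return None
    return re.compile("|".join(active))

# The second fragment is disabled, so its '*' suffix is dropped along with it.
pattern = build_pattern([("foo", ""), (None, "*"), ("ba", "+")])
print(pattern.pattern)
```

This only helps if the code that builds the alternation can be changed; when only the fragments themselves are editable, a never-matching placeholder is still needed.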

Mike Miller
+4  A: 

(?!) should always fail to match. It is an empty zero-width negative look-ahead. If what is in the parentheses matches, then the whole match fails. Given that it has nothing in it, the empty pattern inside always matches, so (?!) fails for any input (including the empty string).
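A short check with Python's re module, using made-up sample strings. It also tries the positive counterpart (?=) and a variant with a trailing literal, (?!)x, which is an assumption of this sketch: the literal gives an appended '*' or '+' something to attach to, while the look-ahead in front still fails first.

```python
import re

NEVER = re.compile(r"(?!)")    # empty negative look-ahead
ALWAYS = re.compile(r"(?=)")   # empty positive look-ahead

samples = ["", "abc", "x", "\n", "a\nb"]

never_hits = [s for s in samples if NEVER.search(s)]
always_hits = [s for s in samples if ALWAYS.search(s)]

# Placeholder with a trailing literal, as the template might extend it.
guarded_hits = [
    (suffix, s)
    for suffix in ("", "*", "+")
    for s in samples
    if re.search(r"(?!)x" + suffix, s)
]

print(never_hits, always_hits, guarded_hits)
```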

Chas. Owens
Right, I was just going to post this too. This is the best way, if your language supports lookaheads. Likewise (?=) matches every string.
Brian Carper