views:

527

answers:

10

This might sound like a stupid question, but I had a long talk with some of my fellow developers and it sounded like a fun thing to think about.

So, what do you think: what does a regex look like that will never be matched by any string, ever?

Edit: Why do I want this? Well, first because I find it interesting to think of such an expression, and second because I need it for a script.

In that script I define a dictionary as `Dictionary<string, Regex>`. As you can see, it maps a string to an expression.

Based on that dictionary I create methods that all use it as their only reference for how to do their work; one of them matches the regexes against a parsed logfile.

If an expression matches, a value returned by the expression is added to another `Dictionary<string, long>`. To catch any log messages that are not matched by any expression in the dictionary, I created a new group called "unknown".

Everything that didn't match anything else is added to this group. But to prevent the "unknown" expression from accidentally matching a log message, I had to create an expression that is guaranteed never to match, no matter what string I give it.

Thus, there you have my reason for this "not a real question"...
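A minimal sketch of the setup described in the question, using Python's `re` as a stand-in for .NET's `Dictionary<string, Regex>`. The category names, the patterns, and the assumption that each pattern's first capture group holds the value to add are all hypothetical. The `unknown` entry uses `(?!)`, which Python accepts and which can never match, so unmatched lines fall through to the explicit fallback.

```python
import re

# Hypothetical category -> pattern map, standing in for Dictionary<string, Regex>.
# Each real pattern is assumed to capture the value to add in group 1.
PATTERNS = {
    "error": re.compile(r"ERROR (\d+)"),
    "warning": re.compile(r"WARN (\d+)"),
    # An empty negative lookahead always fails, so "unknown" can never
    # claim a line by accident.
    "unknown": re.compile(r"(?!)"),
}


def tally(lines):
    """Sum captured values per category; count unmatched lines as unknown."""
    totals = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                totals[name] += int(match.group(1))
                break
        else:
            # No pattern matched this line.
            totals["unknown"] += 1
    return totals


print(tally(["ERROR 3", "WARN 2", "disk full", "ERROR 4"]))
# {'error': 7, 'warning': 2, 'unknown': 1}
```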

A: 

How about `$^`, or maybe `?!`?

Bob
`?!` is syntactically incorrect, since the `?` quantifier doesn't quantify anything.
Joey
A line break will be matched by this expression in the mode where `^` matches at the beginning and `$` at the end of a line.
Gumbo
Maybe he meant `(?!)` - a negative lookahead for an empty string. But some regex flavors will treat that as a syntax error, too.
Alan Moore
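A quick check of these candidates with Python's `re` (other flavors may differ in details). It shows that `$^` still matches the empty string, because both anchors hold at position 0 of `""`, and that with `re.MULTILINE` it matches on an empty line. The bracketed form `(?!)` compiles in Python and never matches.

```python
import re

# $^ fails on ordinary text...
print(re.search(r"$^", "abc"))                                # None
# ...but both anchors hold at position 0 of the empty string.
print(re.search(r"$^", "") is not None)                       # True
# With MULTILINE it also matches between two consecutive newlines.
print(re.search(r"$^", "a\n\nb", re.MULTILINE) is not None)   # True

# (?!) asserts that the empty pattern cannot match here, which is never true.
never = re.compile(r"(?!)")
print(any(never.search(s) for s in ["", "abc", "a\n\nb"]))    # False
```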
+1  A: 
'[^0-9a-zA-Z...]*'

and replace ... with all printable symbols ;). That's for a text file.

Drakosha
I think there has to be a shorter way for that, but that was my first thought too^^
ApoY2k
This will match the empty string. To catch every possible character, use `[^\x00-\xFF]+` (for byte-based implementations).
Ferdinand Beyer
A better expression would be `[^\s\S]`. But as Ferdinand Beyer already said, with the `*` it would still match an empty string.
Gumbo
Drakosha's regex can match an empty string because of the `*`; leave that off, or replace it with `+`, and it has to match at least one character. If the class excludes all possible characters, it can't match anything.
Alan Moore
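A small Python check of the character-class approach discussed in these comments: a class such as `[^\s\S]` excludes every character, so on its own or with `+` it can never match, but with `*` it matches zero characters, i.e. the empty string at the first position tried.

```python
import re

# [^\s\S]: one character that is neither whitespace nor non-whitespace.
empty_class = re.compile(r"[^\s\S]")

print(empty_class.search("any text at all"))      # None
print(re.search(r"[^\s\S]+", "any text at all"))  # None

# With "*", zero repetitions are allowed, so it matches the empty string
# at the very first position it tries.
print(re.search(r"[^\s\S]*", "any text at all").span())  # (0, 0)
```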
+6  A: 

This seems to work:

$.
Jerry Fernholz
That’s similar to Ferdinand Beyer’s example.
Gumbo
And it will match in dot-matches-newlines mode.
Tim Pietzcker
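Checking `$.` with Python's `re` bears out the comment above: normally `.` cannot match the newline that `$` stops in front of, but with `re.DOTALL` it can, so a string ending in a newline produces a match.

```python
import re

# Without DOTALL, "." never matches "\n", and "$" only matches at the end
# or just before a trailing newline, so "$." finds nothing.
print(re.search(r"$.", "line one\nline two\n"))   # None

# With DOTALL, "." can consume the trailing newline that "$" sits before.
m = re.search(r"$.", "ends with newline\n", re.DOTALL)
print(repr(m.group()))                            # '\n'
```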
+14  A: 

This is actually quite simple, although it depends on the implementation / flags*:

$a

Will match a character a after the end of the string. Good luck.


*) Originally I did not give much thought to multiline-mode regexps, where $ also matches the end of a line. Even then, it matches the empty string right before the newline, so an ordinary character like a can never appear after $.

Ferdinand Beyer
`\Za`: `\Z` matches the end of the string.
Amarghosh
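A quick Python check of `$a` and `\Za`. In Python `\Z` means the very end of the string; in .NET, Perl and PCRE it can also match just before a final newline (their absolute end is `\z`), but the character after it is then a newline, so `\Za` still cannot match. The sample strings are arbitrary.

```python
import re

samples = ["", "a", "banana", "a\na", "line\n"]

# "$a": "$" only matches at the end of the string, before a trailing
# newline, or (with MULTILINE) before any newline, so the next character
# is never "a".
for flags in (0, re.MULTILINE):
    print(any(re.search(r"$a", s, flags) for s in samples))  # False (twice)

# "\Za": in Python, \Z is the absolute end of the string, so nothing follows it.
print(any(re.search(r"\Za", s) for s in samples))            # False
```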
+10  A: 

a\bc, where \b is a zero-width assertion that matches a word boundary.

It can't match in the middle of a word, which is exactly where this pattern forces it to be.

Pavel Shved
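A Python check of `a\bc`. It relies on `a` and `c` both being word characters, so the position between them is never a boundary. Note the raw string: in a plain Python string literal, `\b` would be a backspace character rather than the regex word-boundary escape.

```python
import re

pattern = re.compile(r"a\bc")  # raw string so \b reaches the regex engine

# "c" must directly follow "a", but the position between two word
# characters is never a word boundary, so \b always fails there.
for text in ["abc", "ac", "a c", "a-c", "a\bc"]:
    print(repr(text), pattern.search(text))  # None every time
```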
+2  A: 

[^.]+

One or more of something not in the set of all elements.


Okay, you learn something new every day. At least this works: [^\w\W]+

Will
`.` loses its special meaning inside character classes, so your pattern matches any sequence of characters other than the period.
Ferdinand Beyer
That won't actually work, a dot in a character class is a literal dot.
Daniel Vandersluis
-1 because I was wrong, fine, but + to you for teaching me something today.
Will
@Will: I did not vote on your answer.
Ferdinand Beyer
NP, answer was wrong; lucky I didn't end up with more!
Will
+15  A: 

Lookaround:

(?=a)b

For regex newbies: the positive lookahead (?=a) makes sure that the next character is a, but doesn't change the search position (or include the 'a' in the matched string). Now that the next character is confirmed to be a, the remaining part of the regex (b) matches only if that same character is b. Thus, this regex can match only if a character is both a and b at the same time.

Amarghosh
That one looks familiar! :)
Bart Kiers
You bet it does :)
Amarghosh
/me creates collation where `*` is equal to every character, and thus is both `a` and `b` at the same time.
Roger Pate
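A short Python check of `(?=a)b` against a few arbitrary strings. The lookahead and the literal `b` both inspect the same character, so no input can satisfy both.

```python
import re

pattern = re.compile(r"(?=a)b")

# The lookahead requires the next character to be "a"; without consuming it,
# "b" then requires that same character to be "b". No character is both.
samples = ["", "a", "b", "ab", "ba", "aabb"]
print([s for s in samples if pattern.search(s)])  # []
```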
+3  A: 

Maximal matching

a++a

Match one or more a's without backtracking, then try to match one more a.

Or, as an independent subexpression:

This is equivalent to putting a+ in an independent subexpression, followed by another a.

(?>a+)a
Brad Gilbert
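Python's `re` only gained possessive quantifiers (`a++`) and atomic groups (`(?>...)`) in Python 3.11. The sketch below therefore also uses a common emulation that works on older versions: capture the run inside a lookahead, which Python's engine (like Perl's and PCRE's) does not backtrack into, then consume it with a backreference. Under that assumption `(?=(a+))\1a` behaves like `(?>a+)a`. The sample strings are arbitrary.

```python
import re
import sys

# Emulated atomic group: the lookahead grabs the whole run of a's into
# group 1 and is not re-entered on backtracking; \1 consumes that run,
# so the final "a" would have to come after the last "a" of the run.
emulated = re.compile(r"(?=(a+))\1a")

samples = ["", "a", "aa", "aaaa", "baaab"]
print([s for s in samples if emulated.search(s)])  # []

if sys.version_info >= (3, 11):
    # The native forms from the answer, supported from Python 3.11 on.
    for native in (r"a++a", r"(?>a+)a"):
        print(native, [s for s in samples if re.search(native, s)])
```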
A: 

A portable solution that does not depend on the regexp implementation is to just use a constant string that you are sure will never appear in the log messages. For instance, make a string based on the output of the following:

cat /dev/urandom | hexdump | head -20
0000000 5d5d 3607 40d8 d7ab ce72 aae1 4eb3 ae47
0000010 c5e2 b9e8 910d a2d9 2eb3 fdff 6301 c85f
0000020 35d4 c282 e439 33d8 1c73 ca78 1e4d a569
0000030 8aca eb3c cbe4 aff7 d079 ca38 8831 15a5
0000040 818b 323f 0b02 caec f17f 387b 3995 88da
0000050 7b02 c80b 2d42 8087 9758 f56f b71f 0053
0000060 1501 35c9 0965 2c6e 03fe 7c6d f0ca e547
0000070 aba0 d5b6 c1d9 9bb2 fcd1 5ec7 ee9d 9963
0000080 6f0a 2c91 39c2 3587 c060 faa7 4ea4 1efd
0000090 6738 1a4c 3037 ed28 f62f 20fa 3d57 3cc0
00000a0 34f0 4bc2 3067 a1f7 9a87 086b 2876 1072
00000b0 d9e1 6b8f 5432 a60e f0f5 00b5 d9ef ed6f
00000c0 4a85 70ee 5ec4 a378 7786 927f f126 2ec2
00000d0 18c5 46fe b167 1ae6 c87c 1497 48c9 3c09
00000e0 8d09 e945 13ce 7da2 08af 1a96 c24c c022
00000f0 b051 98b3 2bf5 4d7d 5ec4 e016 a50d 355b
0000100 0e89 d9dd b153 9f0e 9a42 a51f 2d46 2435
0000110 ef35 17c2 d2aa 3cc7 e2c3 e711 d229 f108
0000120 324e 5d6a 650a d151 bc55 963f 41d3 66ee
0000130 1d8c 1fb1 1137 29b2 abf7 3af7 51fe 3cf4

Sure, this is not an intellectual challenge, but more like duct tape programming.

hlovdal
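A Python sketch of the same duct-tape idea: generate random hex once, paste it in as a constant, and compile it as a literal. The constant below is simply the first two rows of the hexdump in this answer with offsets and spaces removed; the log line is a made-up example. The guarantee is only probabilistic, since a line containing the constant would match.

```python
import os
import re

# Generate once (e.g. while writing the script) and paste the result in
# as a constant; 32 random bytes give 64 hex digits.
print(os.urandom(32).hex())

# First two rows of the hexdump above, joined into one literal.
NEVER_IN_LOGS = "5d5d360740d8d7abce72aae14eb3ae47c5e2b9e8910da2d92eb3fdff6301c85f"
unknown = re.compile(re.escape(NEVER_IN_LOGS))

print(unknown.search("2009-11-12 ERROR disk full"))  # None
```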
A: 
new Regex(Guid.NewGuid().ToString())

This creates a pattern containing only hexadecimal digits and '-' (none of which are special regex characters outside a character class), and it is statistically all but impossible for the same string to have appeared anywhere before (that's the whole point of a GUID).

finnw
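A Python counterpart of `new Regex(Guid.NewGuid().ToString())`, using `uuid.uuid4()` in place of .NET's `Guid.NewGuid()`. The escaping step is a defensive assumption rather than a necessity, and the second check illustrates that the "never matches" property is statistical, not absolute.

```python
import re
import uuid

# str(uuid.uuid4()) has the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
# (hex digits and hyphens); re.escape is only a safeguard.
guid = str(uuid.uuid4())
unknown = re.compile(re.escape(guid))

print(unknown.search("ERROR 42: disk full"))     # None
# A line that happens to contain the GUID does match.
print(unknown.search("id=" + guid) is not None)  # True
```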