+3  Q: 

Combine Regexp

I collect user input for several kinds of conditions, for example:

  1. Starts with : /(^@)/
  2. Ends with : /(@$)/
  3. Contains : /@/
  4. Doesn't contain

To build a single regex when the user enters multiple conditions, I combine them with "|", so if 1 and 2 are given it becomes /(^@)|(@$)/

This method works so far, but I can't work out what the regex for the fourth condition should be. And will combining regexes this way still work?


Update: @ stands for the user input. It won't be the same for any two conditions, and not all four conditions will always be present, though they can be. In the future I might need more conditions such as "is exactly" and "is exactly not", so I'm mostly curious whether this approach will scale.

There may also be issues with cleaning up the user input so that it is escaped properly for the regex, but I'm ignoring that for now.

+2  A: 

Doesn't contain @: /(^[^@]*$)/

Combining with "|" works if the intended result is that the whole regexp matches whenever any one of the alternatives matches.

Ants Aasma
+2  A: 

If a string must not contain @, every character in it must be something other than @:

/^[^@]*$/

This will match any string of any length that does not contain @.

Another possible solution would be to invert the boolean result of /@/.
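
To illustrate the inversion idea, here is a minimal Python sketch (the thread does not say which language is in use, so Python is an assumption, and `does_not_contain` is a hypothetical helper name). It negates an ordinary "contains" search and keeps the character-class pattern alongside for comparison:

```python
import re

def does_not_contain(text: str, needle: str) -> bool:
    # Run a plain "contains" search and invert the result,
    # instead of writing a negative pattern.
    return re.search(re.escape(needle), text) is None

# The character-class form, which only works for a single character.
NO_AT = re.compile(r"^[^@]*$")
```

The inverted search works for needles of any length, while the character class is limited to excluding single characters.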

Gumbo
+1  A: 

In my experience with regex, you really need to focus on EXACTLY what you are trying to match, rather than on what NOT to match.

For example, compare these two expressions:

\d{2}

[1-9][0-9]

The first expression matches any two digits. The second matches one digit from 1 to 9 followed by any one digit. So if you type 07, the first expression will accept it, but the second will not.
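
A quick way to check this, sketched in Python (an assumption, since the thread names no language; `accepts` is a hypothetical helper):

```python
import re

def accepts(pattern: str, text: str) -> bool:
    # fullmatch requires the whole input to match the pattern,
    # which is how a validator would normally apply it.
    return re.fullmatch(pattern, text) is not None

two_digits = r"\d{2}"
no_leading_zero = r"[1-9][0-9]"
```

Calling `accepts(two_digits, "07")` and `accepts(no_leading_zero, "07")` shows the difference described above.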

See this for an advanced reference:

http://www.regular-expressions.info/refadv.html

EDITED:

^((?!my string).)*$ is the regular expression for: does not contain "my string".
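
As a rough sketch of how that pattern could be generated from user input, here is a Python version (Python and the function name `not_containing` are assumptions, not part of the original answer). It escapes the input first and uses a non-capturing group, which behaves the same for a yes/no test:

```python
import re

def not_containing(literal: str) -> re.Pattern:
    # At every position, assert that the literal does not start here,
    # then consume one character; repeat until the end of the input.
    escaped = re.escape(literal)  # treat user input as plain text
    return re.compile(r"^(?:(?!" + escaped + r").)*$")

no_my_string = not_containing("my string")
```

Escaping matters here: without it, an input such as "a.b" would be read as a pattern and would also reject "axb".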

gnomixa
Can you give more detail on how the "doesn't contain" condition can be matched using the suggestion above?
nexneo
I assume you want an expression that "doesn't contain" something (it's not clear what you want the expression not to contain). My suggestion shows how you would do it if you don't want a digit 0 in the first position, in which case you would limit the first digit to 1-9. It's not very clear what you mean by "doesn't contain". Doesn't contain what? Please clarify so we can help you. My answer was more of a general one. Sorry if it didn't help.
gnomixa
gnomixa, a little testing shows your version works well.
nexneo
glad I could help
gnomixa
+1  A: 

Combining the regex for the fourth option with any of the others doesn't work within one regex. 4 + 1 would mean either the string starts with @ or doesn't contain @ at all. You're going to need two separate comparisons to do that.
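
A minimal sketch of the separate-comparisons approach, in Python (an assumption, since no language is named in the thread; the condition names and the helpers `check` and `matches_any` are hypothetical). Each condition is tested on its own and the boolean results are combined afterwards:

```python
import re

def check(kind: str, value: str, text: str) -> bool:
    """Evaluate one condition on its own; the kind names are hypothetical."""
    v = re.escape(value)  # treat user input as plain text
    if kind == "starts_with":
        return re.search("^" + v, text) is not None
    if kind == "ends_with":
        return re.search(v + "$", text) is not None
    if kind == "contains":
        return re.search(v, text) is not None
    if kind == "not_contains":
        return re.search(v, text) is None
    raise ValueError("unknown condition: " + kind)

def matches_any(conditions, text: str) -> bool:
    # OR semantics: a single passing condition is enough.
    return any(check(kind, value, text) for kind, value in conditions)

# The "4 + 1" case: starts with @, or doesn't contain @ at all.
four_plus_one = [("starts_with", "@"), ("not_contains", "@")]
```

Swapping `any` for `all` would give AND semantics, and a new condition type only needs one more branch in `check`.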

Sugerman
@ won't be the same for any two conditions, and not all four conditions will always be present, though they can be. In the future I might need more conditions such as "is exactly" and "is exactly not", so I'm mostly curious whether this approach will scale.
nexneo
+4  A: 

Will the conditions be ORed or ANDed together?

Starts with: abc
Ends with: xyz
Contains: 123
Doesn't contain: 456

The OR version is fairly simple; as you said, it's mostly a matter of inserting pipes between individual conditions. The regex simply stops looking for a match as soon as one of the alternatives matches.

/^abc|xyz$|123|^(?:(?!456).)*$/

That fourth alternative may look bizarre, but that's how you express "doesn't contain" in a regex. By the way, the order of the alternatives doesn't matter; this is effectively the same regex:

/xyz$|^(?:(?!456).)*$|123|^abc/
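
For what it's worth, here is one way the OR pattern could be assembled from a list of user conditions. This is only a sketch in Python (the language, the template table, and the name `build_or` are all assumptions); it escapes each input and joins the pieces with pipes:

```python
import re

# Hypothetical templates; "{}" is replaced by the escaped user input.
OR_TEMPLATES = {
    "starts_with": "^{}",
    "ends_with": "{}$",
    "contains": "{}",
    "not_contains": "^(?:(?!{}).)*$",
}

def build_or(conditions) -> re.Pattern:
    # One alternative per condition, joined with "|".
    parts = [OR_TEMPLATES[kind].format(re.escape(value))
             for kind, value in conditions]
    return re.compile("|".join(parts))

or_regex = build_or([
    ("starts_with", "abc"),
    ("ends_with", "xyz"),
    ("contains", "123"),
    ("not_contains", "456"),
])
```

With these four inputs, `or_regex.pattern` comes out as the first regex shown above.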

The AND version is more complicated. After each individual regex matches, the match position has to be reset to zero so the next regex has access to the whole input. That means all of the conditions have to be expressed as lookaheads (technically, one of them doesn't have to be a lookahead, but I think it expresses the intent more clearly this way). A final .*$ consummates the match.

/^(?=^abc)(?=.*xyz$)(?=.*123)(?=^(?:(?!456).)*$).*$/
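
The same assembly can be sketched for the AND case. The following Python is an assumption-laden illustration (the language, the template table, and `build_and` are not from the original answer): each condition becomes a lookahead anchored at position zero, and a trailing `.*$` consumes the input:

```python
import re

# Hypothetical templates; each one is a lookahead evaluated from position 0.
AND_TEMPLATES = {
    "starts_with": "(?=^{})",
    "ends_with": "(?=.*{}$)",
    "contains": "(?=.*{})",
    "not_contains": "(?=^(?:(?!{}).)*$)",
}

def build_and(conditions) -> re.Pattern:
    # Every lookahead must succeed before ".*$" consumes the text.
    parts = [AND_TEMPLATES[kind].format(re.escape(value))
             for kind, value in conditions]
    return re.compile("^" + "".join(parts) + ".*$")

and_regex = build_and([
    ("starts_with", "abc"),
    ("ends_with", "xyz"),
    ("contains", "123"),
    ("not_contains", "456"),
])
```

Because lookaheads consume nothing, their order does not affect whether the pattern matches.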

And then there's the possibility of combined AND and OR conditions--that's where the real fun starts. :D

Alan Moore
Yes, I'm fine with OR now. But thanks for adding the AND version. Combining AND and OR is not for me. :)
nexneo
I tried the last AND regex and noticed it has a syntax error: an extra ")" at the end. I removed that character, but the regex didn't appear to work as intended. I'm not sure what I did wrong. I'm using .NET to test.
Ralph Willgoss
It's actually the second-to-last `)` that doesn't belong there. Once that's fixed, the reason it doesn't work is that there's nothing in it that consumes characters; it's all lookaheads. I could make the last part not a lookahead, but for clarity's sake I'd rather add a `.*` to the end. I'm fixing it now; thanks for bringing it to my attention.
Alan Moore
A: 

1 + 2 + 4 conditions: starts|ends, but not in the middle

  /^@[^@]*@?$|^@?[^@]*@$/

is almost the same as:

  /^@?[^@]*@?$/

but this one also matches any string without @, for example 'my name is hal9000'
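
The difference between the two patterns can be checked side by side. A small Python sketch (the language and the names `STRICT`, `LOOSE`, and `classify` are assumptions made for illustration):

```python
import re

# Starts and/or ends with @, and no @ anywhere in the middle.
STRICT = re.compile(r"^@[^@]*@?$|^@?[^@]*@$")
# Shorter form: both the leading and the trailing @ are optional.
LOOSE = re.compile(r"^@?[^@]*@?$")

def classify(text: str):
    # Returns (matches STRICT, matches LOOSE) for one input.
    return (STRICT.search(text) is not None,
            LOOSE.search(text) is not None)
```

Only inputs with no @ at all, such as 'my name is hal9000', should give different results for the two patterns.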

javier fuentes