I don't know whether this is really easy and I'm just out of my mind...

In Ruby's regular expressions, how do I match strings that do not contain two consecutive underscores, i.e., "__"?

Ex:

Matches: "abcd", "ab_cd", "a_b_cd", "%*##_@+"
Does not match: "ab__cd", "a_b__cd"

-thanks

EDIT: I can't use reverse logic, i.e., checking for "__" and excluding those strings, since I need to use this with Ruby on Rails' "validates_format_of()", which expects a regular expression that the value must match.

A: 

Negative lookahead

\b(?!\w*__\w*)\w+\b

Starting at the beginning of each word, this looks ahead for two consecutive underscores anywhere in that word, and matches the word only if none are found.

Edit: To accept any non-whitespace characters in the match:
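
A minimal Ruby sketch of how the word-based pattern behaves when scanning a longer string. The constant name, the helper method and the sample text are assumptions made up for illustration:

```ruby
# Hypothetical constant name; the pattern itself is the one given above.
WORD_WITHOUT_DOUBLE_UNDERSCORE = /\b(?!\w*__\w*)\w+\b/

# Hypothetical helper: returns every word that has no "__" in it.
# String#scan returns all non-overlapping matches as an array of strings.
def words_without_double_underscore(text)
  text.scan(WORD_WITHOUT_DOUBLE_UNDERSCORE)
end

# Sample text assembled from the question's word-like examples.
p words_without_double_underscore("abcd ab_cd a_b_cd ab__cd a_b__cd")
# => ["abcd", "ab_cd", "a_b_cd"]
```

Since `\b` and `\w` only deal with word characters, a string such as "%*##_@+" is not handled as a whole by this pattern.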

(?!\S*__\S*)\S+

If you wish to accommodate a subset of symbols, you can write something like the following, but then it will match _cd from a_b__cd among other things.

(?![a-zA-Z0-9_%*#@+]*__[a-zA-Z0-9_%*#@+]*)[a-zA-Z0-9_%*#@+]+
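
A rough sketch of the partial-match caveat mentioned above, and of one possible way around it when the whole string is being tested. The `\A`/`\z` anchored variant, the constant names and the helper are my assumptions, not part of the original pattern:

```ruby
# Character class copied from the pattern above. Single quotes are used so
# that nothing in it can be mistaken for string interpolation.
CHARS = '[a-zA-Z0-9_%*#@+]'

# Unanchored pattern, as given above.
UNANCHORED = Regexp.new("(?!#{CHARS}*__#{CHARS}*)#{CHARS}+")

# Assumed variant: the same pattern pinned to the whole string.
ANCHORED = Regexp.new("\\A(?!#{CHARS}*__#{CHARS}*)#{CHARS}+\\z")

# Hypothetical helper: returns the first matched substring, or nil.
def first_match(pattern, text)
  text[pattern]
end

p first_match(UNANCHORED, "a_b_cd")   # => "a_b_cd"
p first_match(UNANCHORED, "a_b__cd")  # => "_cd"
p first_match(ANCHORED, "a_b__cd")    # => nil
```

The unanchored pattern skips forward until it finds a position with no "__" ahead of it, which is why it picks up "_cd". Anchoring removes that freedom.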
Amarghosh
`\w` does not contain `_`.
Gumbo
@Gumbo: \w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits
Andomar
@Gumbo http://www.regular-expressions.info/charclass.html says `\w` includes `_`
Amarghosh
Fixed to search for `__` anywhere in the next word
Amarghosh
Ah, sorry. Mixed it up with `-`.
Gumbo
Sorry, matches every string.
Vikrant Chaudhary
And after your update, fails to match "+".
Vikrant Chaudhary
What are the possible characters? Replace `\w` with something like `[a-zA-Z0-9%*#_@+]` to match the required characters.
Amarghosh
Got the concept. But @Andomar has a better pattern for negative lookahead (in terms of length). Still, I will give a +1. Many thanks for stopping by and answering.
Vikrant Chaudhary
@Andomar's regex assumes that you are passing only the test string. I was trying to find matches in a given string. Test the following string with both regexes (with global flag on) and see the outputs: ` "abcd", "ab_cd", "a_b_cd", "%*##_@+" "ab__cd", "a_b__cd"`
Amarghosh
I'm really sorry, but I fail to see the point. Both work well for me. The only difference is that for the string "%*##_@+", @Andomar's pattern matches at index 0 and yours at index 4. Also, I didn't get the global flag thing.
Vikrant Chaudhary
+8  A: 

You could use negative lookahead:

^((?!__).)*$

The beginning-of-string ^ and end-of-string $ are important: they force the "not followed by a double underscore" check at every position.
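
A small Ruby sketch of this pattern run against the examples from the question, plus what happens when the anchors are dropped. The constant and method names are hypothetical:

```ruby
# The pattern from this answer; the constant name is made up.
NO_DOUBLE_UNDERSCORE = /^((?!__).)*$/

# Hypothetical helper: true when the pattern matches the string.
def free_of_double_underscore?(text)
  !(text =~ NO_DOUBLE_UNDERSCORE).nil?
end

['abcd', 'ab_cd', 'a_b_cd', '%*##_@+'].each do |s|
  puts "#{s}: #{free_of_double_underscore?(s)}"   # true for each
end

['ab__cd', 'a_b__cd'].each do |s|
  puts "#{s}: #{free_of_double_underscore?(s)}"   # false for each
end

# Without the anchors the pattern is satisfied by the prefix before "__",
# so it "matches" even the strings that should be rejected.
p('ab__cd' =~ /((?!__).)*/)   # => 0
p('ab__cd'[/((?!__).)*/])     # => "ab"
```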

Andomar
In Ruby `^` and `$` are beginning-of-line and end-of-line respectively (for strings that contain multiple lines). Strictly, use `\A` and `\Z` for beginning-of-string and end-of-string
glenn jackman
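To illustrate the anchor comment above, here is a minimal sketch comparing line anchors with string anchors on a multi-line value. The constant names and sample strings are assumptions. Note that `\Z` still matches just before a trailing newline, so the sketch also shows `\z`, which does not:

```ruby
# Hypothetical names; the first pattern is the one from the answer.
LINE_ANCHORED   = /^((?!__).)*$/
STRING_ANCHORED = /\A((?!__).)*\z/

# A value whose first line contains "__" but whose second line is clean.
MULTI_LINE = "ab__cd\nok"

# With ^ and $, the clean second line is enough for a match.
p(MULTI_LINE =~ LINE_ANCHORED)     # => 7
# With \A and \z, the whole string has to pass, so there is no match.
p(MULTI_LINE =~ STRING_ANCHORED)   # => nil

# \Z tolerates one trailing newline; \z does not.
p("abcd\n" =~ /\A((?!__).)*\Z/)    # => 0
p("abcd\n" =~ /\A((?!__).)*\z/)    # => nil
```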
Thanks, works great. But unfortunately I can select only one answer, so I chose Mark Byers's, as his solution is, well, simple. Nonetheless, +1 for sure.
Vikrant Chaudhary
Yesterday I didn't select your answer as the accepted one, because at that point I didn't know about the "negative lookahead" thing, but now I do. And so now I find your answer, like, "wow". I tried to modify it, but couldn't make it any better. Thanks for answering.
Vikrant Chaudhary
+2  A: 

Would altering your logic still be valid?

You could check if the string contains two underscores with the regular expression [_]{2} and then just ignore it?
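
In plain Ruby (that is, outside the "validates_format_of()" constraint described in the question), the reversed check could look roughly like this. The method name is hypothetical:

```ruby
# Hypothetical helper: true when the string has no two consecutive underscores.
# =~ returns nil when the pattern is not found anywhere in the string.
def no_double_underscore?(text)
  (text =~ /[_]{2}/).nil?
end

p no_double_underscore?("a_b_cd")    # => true
p no_double_underscore?("a_b__cd")   # => false
```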

StevieJ
+3  A: 
/^([^_]*(_[^_])?)*_?$/

Tests:

regex=/^([^_]*(_[^_])?)*_?$/

# Matches (=~ returns 0, the index where the match starts)
puts "abcd" =~ regex
puts "ab_cd" =~ regex
puts "a_b_cd" =~ regex
puts "%*##_@+" =~ regex
puts "_" =~ regex
puts "_a_" =~ regex

# Non-matches (=~ returns nil)
puts "__" =~ regex
puts "ab__cd" =~ regex
puts "a_b__cd" =~ regex

But regex is overkill for this task. A simple string test is much easier:

puts ('a_b'['__'])
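
A slightly expanded sketch of the same string test. `String#[]` with a string argument returns that substring if it occurs and nil otherwise; `String#include?` gives a plain true/false. The method name is made up for illustration:

```ruby
# Hypothetical helper built on String#include?.
def contains_double_underscore?(text)
  text.include?('__')
end

p 'a_b'['__']     # => nil   (no "__" in the string)
p 'a__b'['__']    # => "__"  (the substring that was found)

p contains_double_underscore?('a_b')    # => false
p contains_double_underscore?('a__b')   # => true
```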
Mark Byers
Works great, and looks so "not-complex" (comparatively). Also, thanks for taking the time to write those examples. I just had to copy-paste them into my `irb` session to validate. Also, another right answer: http://stackoverflow.com/questions/1873436/regular-expression-for-not-matching-two-underscores/1873474#1873474
Vikrant Chaudhary