
Hello.

I need help with a RegEx problem:

I want to find occurrences of two known words ("foo" and "bar", for example) that are separated by any whitespace other than EXACTLY ONE SPACE CHARACTER.

In the text that I have to grep, there may be spaces, tabs, CRs, LFs or any combination of them between the two words.

In RegEx words: I need one regular expression that matches "foo[ \t\n\r]+bar" but does NOT match "foo bar".

Everything I've tried so far either missed some combinations or also matched the single-space-case which is the only one that should NOT match.

Thanks in advance for any solutions.

EDIT: To clarify, I'm using Perl compatible RegEx here.
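A small harness makes the requirement concrete. This is a sketch in Python, whose `re` module accepts the Perl-style syntax used in this thread. `check_pattern` and `WS` are hypothetical names, and the harness only enumerates separators up to three characters long, so it is a quick sanity check rather than a proof.

```python
import itertools
import re

# The four separator characters named in the question (hypothetical name).
WS = [" ", "\t", "\n", "\r"]

def check_pattern(pattern, max_len=3):
    """Return every separator for which `pattern` gives the wrong answer.

    Rule: "foo<sep>bar" should match unless <sep> is exactly one space.
    Hypothetical helper, not part of the original thread.
    """
    regex = re.compile(pattern)
    failures = []
    for n in range(1, max_len + 1):
        for combo in itertools.product(WS, repeat=n):
            sep = "".join(combo)
            should_match = sep != " "
            did_match = regex.search("foo" + sep + "bar") is not None
            if did_match != should_match:
                failures.append(sep)
    return failures

# The naive pattern from the question fails only on the single-space case.
print(check_pattern(r"foo[ \t\n\r]+bar"))  # -> [' ']
```

Any candidate pattern that returns an empty list satisfies the requirement for these short separators.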

+1  A: 

You could use (assuming ERE, i.e. grep -E):

foo[[:space:]]{2,}bar

The syntax x{min,} means the pattern x must appear at least min times.
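A quick way to see what `{2,}` does, sketched in Python. Python's `re` has no POSIX `[[:space:]]` class, so the explicit class `[ \t\n\r]` stands in for it here (an assumption of this sketch). The last case shows that a single tab is not matched, because `{2,}` demands at least two separator characters.

```python
import re

# Python's re has no POSIX [[:space:]]; [ \t\n\r] stands in for it here.
pattern = re.compile(r"foo[ \t\n\r]{2,}bar")

for text in ["foo bar", "foo  bar", "foo\t\nbar", "foo\tbar"]:
    # {2,} requires two or more separator characters
    print(repr(text), bool(pattern.search(text)))
```

Only `"foo  bar"` and `"foo\t\nbar"` print `True`.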


If by "other than EXACTLY ONE SPACE CHARACTER" you mean except the 0x20 space character, you need an alternation:

foo([\t\n\r]|[ \t\n\r]{2,})bar
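A sketch of the alternation in Python, whose `re` module (like PCRE) reads `\t`, `\n` and `\r` inside a character class as tab, LF and CR. Note that in POSIX bracket expressions, as used by grep -E, a backslash is literal, so there `[\t]` matches a backslash or a `t` rather than a tab; results may therefore differ between tools.

```python
import re

# First branch: exactly one non-space whitespace character.
# Second branch: two or more whitespace characters of any kind.
pattern = re.compile(r"foo([\t\n\r]|[ \t\n\r]{2,})bar")

cases = ["foo bar", "foo\tbar", "foo\nbar", "foo  bar", "foo \r\nbar"]
for text in cases:
    print(repr(text), bool(pattern.search(text)))
```

Here only `"foo bar"` prints `False`; the single-tab case matches through the first branch.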
KennyTM
Both of these miss the cases where there is a single tab, CR or LF between the words.
Techpriester
@Techpriester: The alternation solution should work, and is quite readable.
polygenelubricants
No, like some other posted solutions here, it misses the case when there is one single "\t" between the words.
Techpriester
+4  A: 

You could also use a negative lookahead:

foo(?! \b)\s+bar
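The lookahead can be tried out in Python, which supports the same `(?!...)` syntax. The lookahead fails the match only when the text right after "foo" is one space followed by a word boundary, i.e. a single space directly before a word character.

```python
import re

# (?! \b): reject the position after "foo" if it is a space followed by a
# word boundary (the next character after the space is a word character).
pattern = re.compile(r"foo(?! \b)\s+bar")

for text in ["foo bar", "foo\tbar", "foo  bar", "foo \tbar"]:
    print(repr(text), bool(pattern.search(text)))
```

Only `"foo bar"` prints `False`. One design consequence: the `\b` trick relies on the second word starting with a word character, as "bar" does; for a second word such as "-bar" the boundary would not be there and a single space would slip through.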

If lookaheads are not supported you can write it explicitly:

foo(?:[^\S ]| \s)\s*bar

The expression [^\S ] contains a double negative, and it might not be immediately obvious how it works. If you work out the logic, it means any whitespace character except a space.
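The double negative can be checked directly in Python: `[^\S ]` excludes everything that is non-whitespace (`\S`) and also the space, which leaves the other whitespace characters. The sketch then runs the lookahead-free pattern on a few inputs.

```python
import re

# [^\S ] = "not (non-whitespace or space)" = whitespace other than a space
other_ws = re.compile(r"[^\S ]")
print([bool(other_ws.fullmatch(c)) for c in [" ", "\t", "\n", "\r", "x"]])
# -> [False, True, True, True, False]

# Lookahead-free version: one non-space whitespace character, or a space
# followed by more whitespace; then any further whitespace before "bar".
pattern = re.compile(r"foo(?:[^\S ]| \s)\s*bar")
for text in ["foo bar", "foo\tbar", "foo  bar", "foo \nbar"]:
    print(repr(text), bool(pattern.search(text)))
```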

Mark Byers
This seems to work. Interesting twist with the not-non-whitespace-thing. The lookahead is nice, too.
Techpriester
I think I'll go with the lookahead. It's easier to remember and to read.
Techpriester
A: 

Use [[:space:]]{2,}

{2,} means two or more times.

explodus
Nope. This misses "foo\tbar" for example.
Techpriester