Hi,

I have a text file with text like:

"Lorem ipsum text. Second lorem ipsum. How are You. It's 
ok. Done. Something else now.

New line. Halo. Text. Are You ok."

I need a regex to find all sentences (text delimited by a ".") except those containing the word "else". I've tried many regex patterns, but nothing works.

Can I do this with regex?

A: 

Yes, you can easily use a regex to match strings containing "else". The expression is simple:

\belse\b

The \b at either end of the expression indicates a "word boundary", which means that the expression will only match the whole word else and will not match when else is part of another word. Note, however, that punctuation characters are not word characters, so a word boundary also falls between else and a following full stop. That is useful if you're parsing sentences, as you are here.

Hence the expression \belse\b will match these sentences:

  • Blah blah else blah
  • else blah blah blah
  • blah blah blah else
  • blah blah blah else. // note the full stop

...but not this one...

  • blah blahelse blah

You don't say which language you're coding in, but here's a quick example in C#, using the System.Text.RegularExpressions.Regex class and written as an NUnit test:

        [Test]
        public void regexTest()
        {
            // This test passes

            String test1 = "This is a sentence which contains the word else";
            String test2 = "This is a sentence which does not";
            String test3 = "Blah blah else blah blah";
            String test4 = "This is a sentence which contains the word else.";

            Regex regex = new Regex("\\belse\\b");
            Assert.True(regex.IsMatch(test1));
            Assert.False(regex.IsMatch(test2));
            Assert.True(regex.IsMatch(test3));
            Assert.True(regex.IsMatch(test4));
        }
sgreeve
+1  A: 
Chris Smith
That will match 'else' regardless of whether it's a word in its own right or not (e.g., it will also exclude a sentence containing abcelse123). You can replace `else` with `\belse\b` in the regex to constrain it to full words.
Chris Smith
+1 for disclaimer: it's not quite a task for regex.
incarnate
A: 

If you are on Unix, you can use awk.

$ awk -vRS="." '!/else/' file
"Lorem ipsum text
 Second lorem ipsum
 How are You
 It's
ok
 Done


New line
 Halo
 Text
 Are You ok
"
ghostdog74
A: 
sed 's/[^.]*else[^.]*\.//g' textfile
A: 

This is easier if you invert your approach: instead of constructing a regexp matching lines that do not contain "else", make one matching lines that do contain "else" (as sgreeve suggested), then select the lines that don't match.
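
As a rough sketch of that inverted approach, here it is in Python (you didn't say which language you're using, so this is only an illustration). It assumes a "sentence" is any run of characters ending in a full stop, which is naive and will break on abbreviations or decimals. The names `sentences_without_else` and `ELSE_WORD` are made up for this example.

```python
import re

# Sample text from the question, including the line break inside "It's ok."
text = (
    "Lorem ipsum text. Second lorem ipsum. How are You. It's \n"
    "ok. Done. Something else now.\n"
    "\n"
    "New line. Halo. Text. Are You ok."
)

# Whole-word match for "else", the same pattern as in sgreeve's answer.
ELSE_WORD = re.compile(r"\belse\b")


def sentences_without_else(text):
    # Assumption: a sentence is a run of non-period characters plus its period.
    sentences = re.findall(r"[^.]+\.", text)
    # Keep only sentences that do NOT match, collapsing whitespace and newlines.
    return [" ".join(s.split()) for s in sentences if not ELSE_WORD.search(s)]


result = sentences_without_else(text)
for sentence in result:
    print(sentence)
```

Splitting first and filtering second keeps both patterns trivial, and because of the word boundaries a sentence containing "elsewhere" is still kept.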

markusk