I have something like the following in a string:

blah blah

BEGINIGNORE
   this stuff should get stripped out
ENDIGNORE

more stuff here

I would like to do this (Perl syntax): s/BEGINIGNORE.*ENDIGNORE//s -- that is, strip out everything between BEGINIGNORE and ENDIGNORE, inclusive. You would think the following would do that in Mathematica:

StringReplace[str, re["BEGINIGNORE[.\\s]*ENDIGNORE"]->""]

But it doesn't. How do I do this in Mathematica?

PS: I define the following alias: re = RegularExpression;

+3  A: 

It turns out that for some reason "[.\\s]" and "[.\\n]" don't work but "(.|\\n)" does. So the following works:

strip[s_String] := StringReplace[s, re@"BEGINIGNORE(.|\\n)*ENDIGNORE" -> ""]
dreeves
The reason is that "[.\s]" (inside square brackets) is a character class: it matches only a literal dot or a whitespace character, whereas "(.|\n)" matches either any character except a newline (the dot) or a newline (the "\n"), which is what you want.
MarkusQ
I see. Thanks Markus! (I did confirm that in Perl this works the way I had expected, which I think is the more reasonable behavior.)
dreeves
+1  A: 

Try:

StringReplace[str, re["BEGINIGNORE(.|\\n)*ENDIGNORE"]->""]
MarkusQ
A: 

As you found in your follow-up, you need parentheses rather than square brackets around the expression that you want the * to apply to.

The square brackets define a character class here, as in most regular expression languages. That's why [.\\s] isn't working as you expected: it stands for a set of characters rather than a parenthesized expression. Maybe Mathematica's use of [] for expressions got you thinking in that direction?
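
To make the difference concrete, here is a small sketch. It reuses the re alias and the sample text from the question; the variable names classMatches, str and stripped are made up for illustration.

```mathematica
re = RegularExpression;  (* alias from the question *)

(* Inside [...], "." is a literal dot and \s is whitespace,
   so the class matches only those characters, never letters. *)
classMatches = StringCases["a.b c", re["[.\\s]"]]
(* {".", " "} *)

(* Sample text from the question, written with explicit newlines. *)
str = "blah blah\nBEGINIGNORE\n   this stuff should get stripped out\nENDIGNORE\nmore stuff here";

(* The group (.|\n) matches any character or a newline, so * can span lines. *)
stripped = StringReplace[str, re["BEGINIGNORE(.|\\n)*ENDIGNORE"] -> ""]
(* "blah blah\n\nmore stuff here" *)
```

The class version fails on the sample text because the ignored block contains letters, which are neither dots nor whitespace.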

jfklein
Actually, in Perl it works as I expected so I do blame Mathematica here.
dreeves
+1  A: 

Insert the (?s) modifier in the regex. That's equivalent to Perl's /s modifier and is part of standard PCRE syntax.

StringReplace[str, re["BEGINIGNORE(?s).*ENDIGNORE"]->""]

More details in this answer to a related question: http://stackoverflow.com/questions/2257884/bug-in-mathematica-regular-expression-applied-to-very-long-string/2261807#2261807
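
One thing to watch, which the question does not cover: .* is greedy, so if a string ever held more than one BEGINIGNORE/ENDIGNORE pair, everything between the first BEGINIGNORE and the last ENDIGNORE would go. A rough sketch of the difference, assuming a made-up two-block input (the names two, greedy and lazy are only for illustration):

```mathematica
re = RegularExpression;  (* alias from the question *)

(* Hypothetical input with two ignore blocks. *)
two = "a\nBEGINIGNORE\nx\nENDIGNORE\nb\nBEGINIGNORE\ny\nENDIGNORE\nc";

(* Greedy .* runs from the first BEGINIGNORE to the last ENDIGNORE, so "b" is lost. *)
greedy = StringReplace[two, re["(?s)BEGINIGNORE.*ENDIGNORE"] -> ""]
(* "a\n\nc" *)

(* Lazy .*? stops at the nearest ENDIGNORE, so each block is removed on its own. *)
lazy = StringReplace[two, re["(?s)BEGINIGNORE.*?ENDIGNORE"] -> ""]
(* "a\n\nb\n\nc" *)
```

If there is only ever one block per string, the greedy and lazy forms give the same result.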

dreeves