I think I'm burnt out, and that's why I can't see an obvious mistake. Anyway, I want the following regex:

#BIZ[.\s]*#ENDBIZ

to grab me the #BIZ tag, #ENDBIZ tag and all the text in between the tags. For example, if given some text, I want the expression to match:

#BIZ
some text some test
more text
maybe some code
#ENDBIZ

At the moment, the regex matches nothing. What did I do wrong?

ADDITIONAL DETAILS

I'm doing the following in PHP:

preg_replace('/#BIZ[.\s]*#ENDBIZ/', 'my new text', $strMultiplelines);

+1  A: 

Depending on the environment you're using your regex in, it may need special care to match across multiple lines, e.g. re.DOTALL in Python. So what environment is that?
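A minimal Python sketch of the DOTALL point, assuming input shaped like the question's sample (the `text` string is illustrative). It also shows that MULTILINE alone does not help, because it only changes what `^` and `$` anchor to:

```python
import re

# Sample input modeled on the question's text (illustrative only).
text = "#BIZ\nsome text some test\nmore text\nmaybe some code\n#ENDBIZ"

# Without DOTALL, '.' stops at newlines, so '.*?' cannot span lines.
no_flag = re.search(r"#BIZ.*?#ENDBIZ", text)

# MULTILINE only changes what '^' and '$' anchor to; '.' still skips '\n'.
multiline = re.search(r"#BIZ.*?#ENDBIZ", text, re.MULTILINE)

# DOTALL (Python's counterpart of PCRE's /s) lets '.' match '\n' as well.
dotall = re.search(r"#BIZ.*?#ENDBIZ", text, re.DOTALL)

print(no_flag)                  # None
print(multiline)                # None
print(dotall.group(0) == text)  # True
```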

Alex Martelli
It requires the single-line option (`.` also matches newlines), not the multiline option (`^` and `$` also anchor at newlines).
Daniel Brückner
You're right: DOTALL, not MULTILINE! Thanks for spotting that; editing now.
Alex Martelli
A: 

It looks like you're writing a JavaScript regex; you'll need to enable multiline mode by specifying the m flag at the end of the expression:

var re = /^deal$/mg
Soviut
+10  A: 

The dot loses its special meaning inside a character class. In other words, [.\s] means "match a period or whitespace". I believe what you want is [\s\S], "match whitespace or non-whitespace".

preg_replace('/#BIZ[\s\S]*#ENDBIZ/', 'my new text', $strMultiplelines);
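The difference can be reproduced with a short Python sketch (Python's `re` stands in for PHP's PCRE here; for these constructs they behave alike). The sample strings are illustrative, and the last call mirrors the "foo bar" test mentioned in the comments:

```python
import re

text = "#BIZ\nsome text some test\nmore text\nmaybe some code\n#ENDBIZ"

# Inside [...], '.' is a literal period: [.\s] means "period or whitespace",
# so the class stops at the 's' of "some" and #ENDBIZ is never reached.
broken = re.search(r"#BIZ[.\s]*#ENDBIZ", text)

# [\s\S] is "whitespace or non-whitespace", i.e. any character including '\n'.
fixed = re.sub(r"#BIZ[\s\S]*#ENDBIZ", "my new text", text)

# Only the space in "foo bar" matches [.\s]; the letters do not.
literal_dot = re.findall(r"[.\s]", "foo bar")

print(broken)       # None
print(fixed)        # my new text
print(literal_dot)  # [' ']
```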

Edit: A bit about the dot and character classes:

By default, the dot does not match newlines. Most (all?) regex implementations have a way to make it match newlines as well, but the mechanism differs by implementation. The only portable way to match (really) any character is to pair a shorthand class with its negation: [\s\S], [\w\W], or [\d\D]. In my personal experience, the first seems to be the most common, probably because this construct is used when you need to match newlines, and including \s makes that intent clear.
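A quick Python check of that claim, assuming a two-line sample string: each paired class matches the whole string including the newline, while a bare dot without a DOTALL-style flag does not:

```python
import re

sample = "line one\nline two"

# Each class pairs a shorthand with its negation, so together they cover
# every character, newline included, without needing a DOTALL flag.
for any_char in (r"[\s\S]", r"[\w\W]", r"[\d\D]"):
    match = re.fullmatch(any_char + "*", sample)
    print(any_char, match is not None)  # each line ends with True

# The bare dot, without DOTALL, fails on the embedded newline.
print(re.fullmatch(r".*", sample))  # None
```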

Also, the dot isn't the only special character which loses its meaning in character classes. In fact, the only characters which are special in character classes are ^, -, \, and ]. Check out the "Metacharacters Inside Character Classes" section of the character classes page on Regular-Expressions.info.
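A small Python illustration of which characters keep a special meaning inside a class (the test strings are made up). Ordinary metacharacters become literals, while ^, -, \ and ] need escaping (or careful placement) to be matched literally:

```python
import re

# Outside a class, . * + ? ( ) are metacharacters; inside [...] they are literals.
print(re.findall(r"[.*+?()]", "a.b*c+d?e(f)"))  # ['.', '*', '+', '?', '(', ')']

# The characters that stay special inside a class, escaped to match literally:
print(re.findall(r"[\^]", "a^b"))    # ['^']  ('^' is special only when first)
print(re.findall(r"[a\-z]", "a-m"))  # ['a', '-']  (escaped '-' is no range, so 'm' fails)
print(re.findall(r"[\]]", "x]y"))    # [']']
print(re.findall(r"[\\]", "x\\y"))   # ['\\']
```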

Ben Blank
No, it doesn't, at least not in PCRE, which the preg_ functions should be using. The dot means the same thing inside and outside character classes when I use it.
Chris Lutz
that worked!!!! thanks!
@Chris Lutz: I just tested this. Using PCRE, "[.\s]" only matches the space in "foo bar". Try it yourself.
Ben Blank
Ben's right: inside a character class, the dot just matches a dot.
Alan Moore
Damn. I believe you and took off my -1, but I can't get this to work. PHP is being asinine on me. Anyway, I keep getting thrown off when PCRE languages work differently than Perl does. It makes no sense that _Perl_-Compatible Regular Expressions aren't Perl-compatible.
Chris Lutz
Actually, Perl works the same, which is weird, because I swear I've used the dot successfully inside brackets before. I'm having a bad day and need to stop procrastinating on my final.
Chris Lutz
Be careful: the solution above is greedy. You need to use [\s\S]*? (with the question mark) so that each match stops at the first #ENDBIZ.
動靜能量
+2  A: 
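// Replaces each #BIZ ... #ENDBIZ block, tags included, with "my new text".
// I don't think this is actually what you want based on your description.
// preg_replace returns the new string, so keep the result.
$newContents = preg_replace('/#BIZ(.+?)#ENDBIZ/s', 'my new text', $contents);

// Actually "gets" the text, which is what I think you might be looking for.
// The /s modifier lets . match newlines; only unpack $matches if a match was found.
if (preg_match('/(#BIZ)(.+?)(#ENDBIZ)/s', $contents, $matches)) {
    list($dummy, $startTag, $data, $endTag) = $matches;
}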
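For comparison, a rough Python equivalent of the preg_match call, assuming a sample `contents` string with text around the block. `re.DOTALL` plays the role of PHP's /s modifier, and the three groups map to the start tag, body, and end tag:

```python
import re

contents = "intro\n#BIZ\nsome text some test\nmore text\n#ENDBIZ\noutro"

# Three capture groups, as in the PHP pattern; DOTALL lets (.+?) span lines.
m = re.search(r"(#BIZ)(.+?)(#ENDBIZ)", contents, re.DOTALL)
if m:
    start_tag, data, end_tag = m.groups()
    print(repr(data))  # '\nsome text some test\nmore text\n'
```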
Beau Simensen
Clearly you have not actually tried it because it works just fine for me. Matching newlines isn't an issue since I'm not using ^ or $ anywhere in the expression.
Beau Simensen
^ and $ are irrelevant. This regex works because the /s modifier allows the dot to match newlines.
Alan Moore
I haven't tried it because PHP hates me (and I hate it back, but that's another story). I missed the /s modifier. Otherwise, what Alan M said.
Chris Lutz
+1  A: 

This should work

#BIZ[\s\S]*#ENDBIZ

You can try this online Regular Expression Testing Tool

Robert Kozak
+1 for the tool
sharkin
+1  A: 

The mistake is the character class [.\s], which matches a dot (not any character) or whitespace. You probably wanted .* with . matching newline characters, too. You achieve this by enabling the single-line option ((?s:) does this in .NET regex).

(?s:#BIZ.*?#ENDBIZ)
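The same scoped-flag syntax is accepted by Python 3.6+ (and, to my knowledge, by PCRE), so a sketch of the pattern above, assuming a short illustrative input, looks like this. A global `(?s)` at the very start of the pattern is shown as an alternative:

```python
import re

text = "#BIZ\nsome text\n#ENDBIZ"

# (?s:...) turns on "dot matches newline" only inside the group.
scoped = re.sub(r"(?s:#BIZ.*?#ENDBIZ)", "my new text", text)

# A global inline flag at the start of the pattern has the same effect here.
global_flag = re.sub(r"(?s)#BIZ.*?#ENDBIZ", "my new text", text)

print(scoped)       # my new text
print(global_flag)  # my new text
```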
Daniel Brückner
A: 

Unless I am missing something, you handle this the same way you would in Perl, with either the /m or /s modifier at the end? Oddly enough, the other answers that correctly pointed this out got downvoted?!

D.Shawley
I was tired and I missed the /s part. I've since corrected the downvote.
Chris Lutz
A: 

You can use

preg_replace('/#BIZ.*?#ENDBIZ/s', 'my new text', $strMultiplelines);

The 's' modifier says "let the dot match anything, even the newline character". The '?' says "don't be greedy", which matters in a case like:

foo

#BIZ
some text some test
more text
maybe some code
#ENDBIZ

bar

#BIZ
some text some test
more text
maybe some code
#ENDBIZ

hello world

With the non-greedy quantifier, the "bar" in the middle is not removed, because each match stops at the first #ENDBIZ after its #BIZ.
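A Python sketch of the greedy vs. lazy difference, assuming a condensed version of the example above (block bodies shortened for brevity). The greedy `.*` runs to the last #ENDBIZ and swallows "bar"; the lazy `.*?` replaces each block separately:

```python
import re

doc = (
    "foo\n#BIZ\nblock one\n#ENDBIZ\n"
    "bar\n#BIZ\nblock two\n#ENDBIZ\nhello world"
)

# Greedy: '.*' runs to the LAST #ENDBIZ, swallowing "bar" too.
greedy = re.sub(r"#BIZ.*#ENDBIZ", "my new text", doc, flags=re.DOTALL)

# Lazy: '.*?' stops at the FIRST #ENDBIZ after each #BIZ, so "bar" survives.
lazy = re.sub(r"#BIZ.*?#ENDBIZ", "my new text", doc, flags=re.DOTALL)

print(greedy)  # foo / my new text / hello world
print(lazy)    # foo / my new text / bar / my new text / hello world
```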

動靜能量