ansaurus

Question

Answer 1

+1 A:

Depending on the environment you're using your regex in, it may need a special option to match across multiple lines, e.g. re.DOTALL in Python. Which environment are you using?

Alex Martelli 2009-05-27 23:47:42

It requires the single-line option (`.` also matches newlines), not the multiline option (`^` and `$` also anchor at line breaks).

Daniel Brückner 2009-05-28 00:10:43
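A minimal sketch of the difference between the two options, assuming Python's re module (the one environment the answer names). The sample text is made up for illustration; only the flag names come from the thread.

```python
import re

# Hypothetical sample input: one #BIZ ... #ENDBIZ block spanning several lines.
text = "#BIZ\nline one\nline two\n#ENDBIZ"

# No flags: "." stops at the first newline, so the block is not matched.
no_flag = re.search(r"#BIZ.*#ENDBIZ", text)

# MULTILINE only changes where ^ and $ anchor; "." still cannot cross a newline.
multiline = re.search(r"#BIZ.*#ENDBIZ", text, re.MULTILINE)

# DOTALL (the "single line" option) lets "." match newlines as well.
dotall = re.search(r"#BIZ.*#ENDBIZ", text, re.DOTALL)

print(no_flag, multiline, dotall is not None)
```

Only the DOTALL search finds the block, which is why the answer was corrected from MULTILINE to DOTALL.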

You're right -- DOTALL, not MULTILINE! Thanks for spotting that -- editing now.

Alex Martelli 2009-05-28 00:43:51

Answer 2

A:

It looks like you're writing a JavaScript regex. You'll need to enable multiline mode by adding the m flag at the end of the expression:

var re = /^deal$/mg

Soviut 2009-05-27 23:51:37

Answer 3

+10 A:

The dot loses its special meaning inside a character class — in other words, [.\s] means "match period or whitespace". I believe what you want is [\s\S], "match whitespace or non-whitespace".

preg_replace('/#BIZ[\s\S]*#ENDBIZ/', 'my new text', $strMultiplelines);

Edit: A bit about the dot and character classes:

By default, the dot does not match newlines. Most (all?) regex implementations have a way to specify that it match newlines as well, but it differs by implementation. The only way to match (really) any character in a compatible way is to pair a shorthand class with its negation — [\s\S], [\w\W], or [\d\D]. In my personal experience, the first seems to be most common, probably because this is used when you need to match newlines, and including \s makes it clear that you're doing so.
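As a rough illustration of the paragraph above, the following sketch uses Python's re module (an assumption; the answer itself targets PHP's PCRE) to compare the plain dot with the three paired classes on a made-up two-line string.

```python
import re

# Hypothetical input: two letters separated by a newline.
text = "a\nb"

# Plain dot with no flags: matches "a" and "b" but skips the newline.
dot_matches = re.findall(r".", text)

# A shorthand class paired with its negation matches every character,
# including the newline, without needing any flag.
paired = {p: re.findall(p, text) for p in (r"[\s\S]", r"[\w\W]", r"[\d\D]")}

print(dot_matches)
print(paired)
```

All three paired classes return the same three characters, so the choice between them is a matter of readability rather than behaviour.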

Also, the dot isn't the only special character which loses its meaning in character classes. In fact, the only characters which are special in character classes are ^, -, \, and ]. Check out the "Metacharacters Inside Character Classes" section of the character classes page on Regular-Expressions.info.

Ben Blank 2009-05-27 23:59:56

No it doesn't - at least, not in PCRE, which preg_ should be using. The dot means the same thing inside and outside character classes when I use them.

Chris Lutz 2009-05-28 00:02:25

that worked!!!! thanks!

2009-05-28 00:04:17

@Chris Lutz — I just tested this. Using PCRE, "[.\s]" only matches the space in "foo bar". Try it yourself.

Ben Blank 2009-05-28 00:04:37
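The test described in the comment above can be approximated as follows. This is a sketch using Python's re module rather than PCRE (an assumption; both treat the dot as a literal inside a character class), and the second input string is invented for contrast.

```python
import re

# The input from the comment: no literal dot, one space.
space_only = re.findall(r"[.\s]", "foo bar")

# Hypothetical second input containing a literal dot as well.
dot_and_space = re.findall(r"[.\s]", "foo.bar baz")

# For comparison, [\s\S] matches every character of the first input.
everything = re.findall(r"[\s\S]", "foo bar")

print(space_only, dot_and_space, len(everything))
```

Inside the brackets the dot matches only a period, so the letters in "foo bar" are never matched by `[.\s]`.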

Ben's right: inside a character class, the dot just matches a dot.

Alan Moore 2009-05-28 00:14:42

Damn. I believe you, and took off my -1, but I can't get this to work. PHP is being asinine on me. Anyway, I keep getting thrown off when PCRE languages work differently than Perl does. It makes no sense that _Perl_-Compatible Regular Expressions aren't Perl-compatible.

Chris Lutz 2009-05-28 00:17:45

Actually, Perl works the same. Which is weird, because I swear I've used the dot successfully inside brackets before. I'm having a bad day and need to stop procrastinating doing my final.

Chris Lutz 2009-05-28 00:20:06

Be careful: the solution above is greedy. You need to use [\s\S]*? (with the question mark).

動靜能量 2009-05-28 01:25:32

Answer 4

+2 A:

// Replaces all of your code with "my new text", but I do not think
// this is actually what you want based on your description.
preg_replace('/#BIZ(.+?)#ENDBIZ/s', 'my new text', $contents);

// Actually "gets" the text, which is what I think you might be looking for.
preg_match('/(#BIZ)(.+?)(#ENDBIZ)/s', $contents, $matches);
list($dummy, $startTag, $data, $endTag) = $matches;

Beau Simensen 2009-05-28 00:02:07
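For readers outside PHP, here is a hedged Python port of the preg_match call above. The function name extract_block and the sample contents are hypothetical; re.DOTALL stands in for the /s modifier.

```python
import re

def extract_block(contents):
    """Return (start_tag, data, end_tag) for the first block, or None."""
    # Same three capture groups as the preg_match pattern; DOTALL lets
    # ".+?" cross newlines, and the "?" keeps the match non-greedy.
    match = re.search(r"(#BIZ)(.+?)(#ENDBIZ)", contents, re.DOTALL)
    return match.groups() if match else None

# Hypothetical sample input.
contents = "intro\n#BIZ\nsome text\nmore text\n#ENDBIZ\noutro"
result = extract_block(contents)
print(result)
```

Unlike the PHP list() assignment, match.groups() leaves out the full match, so no dummy variable is needed.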

Clearly you have not actually tried it because it works just fine for me. Matching newlines isn't an issue since I'm not using ^ or $ anywhere in the expression.

Beau Simensen 2009-05-28 00:12:06

^ and $ are irrelevant. This regex works because the /s modifier allows the dot to match newlines.

Alan Moore 2009-05-28 00:27:50

I haven't tried it because PHP hates me (and I hate it back, but that's another story). I missed the /s modifier. Otherwise, what Alan M said.

Chris Lutz 2009-05-28 00:29:27

Answer 5

+1 A:

This should work:

#BIZ[\s\S]*#ENDBIZ

You can try this online Regular Expression Testing Tool

Robert Kozak 2009-05-28 00:05:09

+1 for the tool

sharkin 2009-10-01 16:45:14

Answer 6

+1 A:

The mistake is the character class [.\s], which matches a literal dot (not any character) or whitespace. You probably wanted .* with . matching newline characters, too. You achieve this by enabling the single-line option (the (?s:...) group does this in .NET regex).

(?s:#BIZ.*?#ENDBIZ)

Daniel Brückner 2009-05-28 00:06:42
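The same scoped-option syntax is also accepted by Python's re module (version 3.6 and later), so the behaviour can be sketched there. This is an illustration under that assumption, not a .NET example, and the sample text is made up.

```python
import re

# Hypothetical input with one block between two unrelated lines.
text = "x\n#BIZ\nfirst\n#ENDBIZ\ny"

# (?s:...) enables dot-matches-newline only inside the group.
scoped = re.search(r"(?s:#BIZ.*?#ENDBIZ)", text)

# Outside the scoped group the dot keeps its default behaviour,
# so this pattern cannot cross the newline after "#BIZ".
outside = re.search(r"(?s:#BIZ).*#ENDBIZ", text)

print(scoped.group(0) if scoped else None, outside)
```

Scoping the option to a group avoids changing how the dot behaves in the rest of a larger pattern.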

Answer 7

A:

Unless I am missing something, you handle this the same way you would in Perl, with either the /m or /s modifier at the end. Oddly enough, the other answers that correctly pointed this out got downvoted?!

D.Shawley 2009-05-28 00:27:50

I was tired and I missed the /s part. I've since corrected the downvote.

Chris Lutz 2009-05-28 00:32:45

Answer 8

A:

You can use:

preg_replace('/#BIZ.*?#ENDBIZ/s', 'my new text', $strMultiplelines);

The 's' modifier makes the dot match anything, even the newline character. The '?' makes the quantifier non-greedy, which matters in a case such as:

foo

#BIZ
some text some test
more text
maybe some code
#ENDBIZ

bar

#BIZ
some text some test
more text
maybe some code
#ENDBIZ

hello world

Because the match is non-greedy, each #BIZ ... #ENDBIZ block is replaced separately, so the "bar" in the middle is kept.

動靜能量 2009-05-28 01:23:27
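The two-block case above can be checked with a short sketch. It assumes Python's re.sub with re.DOTALL as a stand-in for preg_replace with the /s modifier, and uses a shortened version of the sample text.

```python
import re

# Shortened version of the sample: two blocks with "bar" between them.
text = "foo\n#BIZ\nsome text\n#ENDBIZ\nbar\n#BIZ\nmore text\n#ENDBIZ\nhello world"

# Greedy: ".*" runs to the last #ENDBIZ, swallowing "bar" and both blocks.
greedy = re.sub(r"#BIZ.*#ENDBIZ", "my new text", text, flags=re.DOTALL)

# Non-greedy: ".*?" stops at the nearest #ENDBIZ, so each block is
# replaced on its own and "bar" survives.
lazy = re.sub(r"#BIZ.*?#ENDBIZ", "my new text", text, flags=re.DOTALL)

print(greedy)
print(lazy)
```

The greedy version produces a single replacement, while the non-greedy version produces two with "bar" still between them.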

Regex - Multiline Problem
