i often forget about the regular expression modifiers "m" and "s" and their differences. what is a good way to remember them?

as i understand them, they are:

'm' is for multiline, so that ^ and $ will match at the beginning and end of every line (as divided by "\n"), not only at the beginning and end of the whole string.

's' is so that the dot will match even the newline character

often, i just use

/some_pattern/ism

but it probably is better to use them accordingly (usually "s" in my cases).

what do you think can be a good way to remember them, instead of forgetting which is which every time?

+1  A: 

Maybe add your own question as a favorite? So next time you forget you will head on over to StackOverflow to ask (instead of just googling it) and you will see it under your account as a favorite question! What you said above is correct. If you don't use the m modifier, the ^ and $ only match at the beginning and the end of the whole string (as opposed to beginning and ending of lines).

Hopefully you ALWAYS use the m modifier so you don't ACCIDENTALLY THE WHOLE STRING!!!!

GreenieMeanie
+3  A: 

I like the explanation in 'man perlre':

m Treat string as multiple lines.
s Treat string as single line.

With multiple lines, ^ and $ apply to individual lines (i.e. just before and after newlines).
With a single line, ^ and $ apply to the whole, and \n just becomes another character you can match.

[Wrong]By using both m and s as you described, I would expect the second one to take precedence, so you would always be in multiline mode with /ism.[/Wrong]

I didn't read far enough:
The "/s" and "/m" modifiers both override the $* setting. That is, no matter what $* contains, "/s" without "/m" will force "^" to match only at the beginning of the string and "$" to match only at the end (or just before a newline at the end) of the string. Together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string.
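To make the quoted passage concrete, here is a minimal sketch in Python rather than Perl, assuming Python's re.S and re.M behave like /s and /m for these cases. The sample string and variable names are made up for illustration.

```python
import re

text = "one\ntwo\n"

# /s alone: the dot crosses newlines, but ^ and $ still anchor
# to the whole string, so the greedy match swallows everything.
s_only = re.findall(r"^.+$", text, re.S)
print(s_only)

# /m and /s together: the dot can still cross newlines, and
# ^ and $ now also match at each line boundary.
ms_lazy = re.findall(r"^.+?$", text, re.M | re.S)
print(ms_lazy)

# Neither flag: $ still matches just before a trailing newline
# at the very end of the string.
plain_end = re.search(r"two$", text)
print(plain_end.end())
```

The lazy quantifier in the second pattern is only there so each line is reported separately; with a greedy `.+` and both flags, the first match would again run to the end of the string.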

JimG
hm... is it true that if we don't use 'm' or 's', then it is neither multiple line nor single line? you would think it has to be either one.
動靜能量
by the way, this way of describing it makes 'm' and 's' look like they conflict over ^ and $ if we use both... i am using the definition from PHP... so maybe the definition is a bit different on other platforms.
動靜能量
A: 

maybe this way, i will never forget:

when i want to match across lines (usually using .*? to match something where it doesn't matter if it spans multiple lines), i will naturally think of multiline, and therefore, 'm'. well, 'm' is actually not the one, so it is 's'.

(since i already remember 'ism' so well... so i can always remember it is not 'm', then it must be 's').

other lame attempts include:

s is for DOTALL, it is for DOT to match ALL.
m is multiline, it is for ^ and $ to match multiple times (once per line).
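
For what it's worth, Python spells the flags out with exactly these names, which may help the mnemonic stick. A small sketch, assuming Python's re module (the sample string is made up):

```python
import re

text = "a\nb"

# re.S is just the short name for re.DOTALL: DOT matches ALL.
same_s = re.S == re.DOTALL
dot_default = re.fullmatch(r"a.b", text)          # dot stops at "\n"
dot_all = re.fullmatch(r"a.b", text, re.DOTALL)   # dot matches "\n" too

# re.M is the short name for re.MULTILINE: ^ and $ match at every line.
same_m = re.M == re.MULTILINE
starts_default = re.findall(r"^\w", text)
starts_multi = re.findall(r"^\w", text, re.MULTILINE)

print(dot_default is None, dot_all is not None)
print(starts_default, starts_multi)
```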

動靜能量
s is for "super match", so you can even match invisible characters ;)
JimG
+1  A: 

It's not uncommon to find someone who's been using regexes for years who still doesn't understand how those two modifiers work. As you observed, the names "multiline" and "singleline" are not very helpful. They sound like they must be mutually exclusive, but they're completely independent. Ignore the names and concentrate on what they do: /m changes the behavior of the ^ and $ anchors, and /s changes the behavior of the dot.
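
One way to see the independence is to try all four combinations. This is only a sketch in Python (where re.M and re.S play the roles of /m and /s); the `probe` helper and the sample string are hypothetical names for illustration.

```python
import re

text = "foo\nbar"

def probe(flags):
    """Return (anchor_hit, dot_hit) for the given flag combination."""
    # Succeeds only if ^ and $ can match at line boundaries (needs m).
    anchor_hit = re.search(r"^bar$", text, flags) is not None
    # Succeeds only if the dot can match the newline (needs s).
    dot_hit = re.search(r"foo.bar", text, flags) is not None
    return anchor_hit, dot_hit

results = {
    "none": probe(0),
    "m": probe(re.M),
    "s": probe(re.S),
    "ms": probe(re.M | re.S),
}

for name, outcome in results.items():
    print(name, outcome)
```

Each flag flips exactly one of the two results and leaves the other alone, which is why combining them is never a conflict.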

Another thing to keep in mind is that Ruby does things differently. In Ruby, there's no /s modifier; instead, /m does what /s does in most other regex flavors: it lets the dot match newlines. They even call that multiline mode, not singleline or DOTALL mode like everyone else does. Actually, Ruby doesn't need a multiline modifier (as everyone else defines it) because ^ and $ always match at line boundaries.

Alan Moore
what if in Ruby, I want it to match only at the beginning and end of the string, ignoring the \n?
動靜能量
Then you use \A and \z. Those are available in most other flavors, too; you just don't see them used very much.
Alan Moore
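
A rough illustration of the whole-string anchors, sketched in Python rather than Ruby. Note the assumption: Python spells the strict end-of-string anchor \Z (it matches only at the very end, like the \z mentioned above). The sample string is made up.

```python
import re

text = "foo\nbar\n"

# With re.M, ^ and $ match at every line...
line_hits = re.findall(r"^\w+$", text, re.M)

# ...but \A and \Z ignore the flag and anchor only to the whole string.
start_hit = re.search(r"\Abar", text, re.M)     # "bar" is not at the start
end_hit = re.search(r"bar\Z", text, re.M)       # a "\n" still follows "bar"
strict_end = re.search(r"bar\n\Z", text, re.M)  # the true end of the string

print(line_hits)
print(start_hit, end_hit, strict_end is not None)
```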