ansaurus

Question

Regular expression does not match what I would expect it to match

Answer 1

A:

Because once it has matched " MsoClass2\t", the matcher is positioned at the m in msoclass3, and there is no whitespace left in front of it to satisfy the leading \s.
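A minimal sketch of that behaviour. The question's pattern is not quoted in this thread, so the regex below is an assumption inferred from the (^|\s) and (\s|$) groups the other answers mention:

```javascript
// Sample string used throughout this thread.
const input = "class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5";

// ASSUMPTION: the original pattern is not shown here; this form is inferred.
const assumedPattern = /(^|\s)mso.*?(\s|$)/ig;

// Each match swallows the whitespace on both sides, so the word that
// follows has no leading whitespace left for (^|\s) to match.
const matches = input.match(assumedPattern);

console.log(JSON.stringify(matches));
// prints [" MsoClass2\t"," MSOclass4 "]
```

Only two of the four words come back: msoclass3 and msoc5 each lose their leading whitespace to the match before them.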

Simon Nickerson 2009-08-26 11:05:43

Answer 2

A:

This is because you are requiring ^ or \s (whitespace) at the start of each match, while the whitespace before msoclass3 has already been consumed by the previous match. To get the results you want, use the following inside match():

/mso.*?(\s|$)/ig
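As a quick check, here is a sketch of what this pattern returns on the sample string used elsewhere in this thread. The variable names and the second test word are made up for illustration:

```javascript
const input = "class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5";

// The trailing whitespace is still part of each match...
const matches = input.match(/mso.*?(\s|$)/ig);
console.log(JSON.stringify(matches));
// prints ["MsoClass2\t","msoclass3\t","MSOclass4 ","msoc5"]

// ...so trim it if only the names are wanted.
const names = matches.map(s => s.trim());
console.log(JSON.stringify(names));
// prints ["MsoClass2","msoclass3","MSOclass4","msoc5"]

// Caveat: with no anchor in front, "mso" is also found in the middle of a word.
// "notmsoclass" is a hypothetical name, not taken from the question.
const midWord = "notmsoclass end".match(/mso.*?(\s|$)/ig);
console.log(JSON.stringify(midWord));
// prints ["msoclass "]
```

Whether the mid-word match matters depends on the input; if it does, the look-ahead or word-boundary variants in the other answers may be a better fit.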

Crimson 2009-08-26 11:07:47

Answer 3

A:

Hi,

At first glance I am not sure you can use something like (^|\s) and (\s|$). Maybe you can, but I have to think to understand the regex, and it's never good when someone has to think to understand a regex: those are often too complicated :-(

If you want to match words that begin with "mso", be it upper or lowercase, I'd probably use something like this :

"class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5".match(/\s?(mso[^\s]*)\s?/ig);

Which gets you :

[" MsoClass2\t", "msoclass3\t", " MSOclass4 ", "msoc5"]

Which is almost what you asked for (there are a couple of whitespace differences).

Or, even simpler :

"class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5".match(/(mso[^\s]*)/ig);

Which gets you :

["MsoClass2", "msoclass3", "MSOclass4", "msoc5"]

Without any whitespace.

Easier to read and understand, too ;-)

Pascal MARTIN 2009-08-26 11:07:57

(^|\s) and (\s|$) are legit

Nerdling 2009-08-26 11:12:21

@Nerdling : thanks. (That's what I meant by "having to think" ^^ )

Pascal MARTIN 2009-08-26 11:21:45

Answer 4

+2 A:

The tab character before msoclass3 is already consumed by the first match " MsoClass2\t". Maybe you want to use a non-consuming look-ahead assertion instead:

/(^|\s)mso[^\s]*(?=\s|$)/
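A sketch of how this could be used with match(). The i and g flags are an assumption added here (the pattern above is shown without flags), and the variable names are illustrative:

```javascript
const input = "class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5";

// ASSUMPTION: flags i (ignore case) and g (all matches) added for match().
// The look-ahead (?=\s|$) checks for trailing whitespace without consuming it,
// so that whitespace is still available as the leading \s of the next match.
const matches = input.match(/(^|\s)mso[^\s]*(?=\s|$)/ig);
console.log(JSON.stringify(matches));
// prints [" MsoClass2","\tmsoclass3"," MSOclass4"," msoc5"]

// The leading whitespace is still consumed, so trim it off.
const names = matches.map(s => s.trim());
console.log(JSON.stringify(names));
// prints ["MsoClass2","msoclass3","MSOclass4","msoc5"]
```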

Gumbo 2009-08-26 11:07:58

Answer 5

+2 A:

Because the first match consumes the tab character, so there is no white space character left before the second MSO string. Same with the space after the second match.

Perhaps you want to match word boundaries instead of the separating characters. This code:

"class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5".match(/\bmso.*?\b/ig)

will give you this result:

["MsoClass2","msoclass3","MSOclass4","msoc5"]
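One thing worth checking before relying on \b: it marks any transition between a word character (letters, digits, underscore) and a non-word character, so a hyphen counts as a boundary too. A small sketch, using made-up names that are not from the question:

```javascript
// Hypothetical input containing hyphenated names (not from the question).
const hyphenated = "MsoNormal mso-list x-msofoo";

const result = hyphenated.match(/\bmso.*?\b/ig);
console.log(JSON.stringify(result));
// prints ["MsoNormal","mso","msofoo"]
// "mso-list" is cut at the hyphen, and "x-msofoo" matches after its hyphen.
```

If the names can contain hyphens, a whitespace-based pattern such as the look-ahead version in the other answer may be the safer choice.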

Guffa 2009-08-26 11:08:22

Didn't know about the \b wildcard; very elegant!

Tim Molendijk 2009-08-26 11:21:35
