I wrote a regex to fetch a string from HTML, but it seems the multiline flag doesn't work.

This is my pattern; I want to get the text inside the h1 tag.

var pattern = /<div class="box-content-5">.*<h1>([^<]+?)<\/h1>/mi;
var m = html.match(pattern); // match() returns the capture groups (or null); search() would only return an index
return m[1];

I created a string to test it. When the string contains "\n", the result is always null. If I remove all the "\n" characters, it gives me the right result, with or without the /m flag.
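A minimal reproduction of the behaviour described above, assuming a made-up HTML snippet (the sample string is only an illustration, not the real page). It runs the question's pattern once on text with newlines and once with them stripped.

```javascript
// The pattern from the question, unchanged.
const pattern = /<div class="box-content-5">.*<h1>([^<]+?)<\/h1>/mi;

// Hypothetical sample input with a line break between the div and the h1.
const withNewlines = '<div class="box-content-5">\n  <h1>Title</h1>\n</div>';
const withoutNewlines = withNewlines.replace(/\n/g, "");

// "." never matches "\n" (the m flag does not change that),
// so ".*" cannot get from the div to the <h1> across the line break.
const resultWith = withNewlines.match(pattern);       // null
const resultWithout = withoutNewlines.match(pattern); // capture group 1 is "Title"

console.log(resultWith);
console.log(resultWithout[1]);
```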

What's wrong with my regex?

+2  A: 

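You want the s (dotall) modifier, which doesn't exist in older JavaScript engines (it was only added in ES2018); there you can replace . with [\s\S], as suggested by @molf. The m (multiline) modifier only makes ^ and $ match at the start and end of each line rather than of the whole string; it does not affect what . matches.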
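To make the distinction concrete, here is a small sketch using an arbitrary two-line string (the text is just an illustration). It shows that m changes the anchors ^ and $ but leaves . unable to cross a newline.

```javascript
const text = "first\nsecond";

// Without m, ^ and $ anchor to the start and end of the whole string.
const noM = /^second$/.test(text);   // false
// With m, ^ and $ also match right after / right before a "\n".
const withM = /^second$/m.test(text); // true

// Neither variant lets "." match the "\n" between the two words.
const dotNoM = /first.second/.test(text);    // false
const dotWithM = /first.second/m.test(text); // false

console.log(noM, withM, dotNoM, dotWithM);
```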

Greg
You might add that the /s modifier turns on "single-line" mode, as opposed to multiline mode. +1
Cerebrus
+6  A: 

You are looking for the /.../s modifier, also known as the dotall modifier. It makes the dot (.) match newlines too, which it does not do by default.

The bad news is that it does not exist in older JavaScript engines (it was only added in ES2018). The good news is that you can work around it by using a character class (e.g. \s) and its negation (\S) together, like this:

[\s\S]
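A quick sketch of why the workaround works, using a throwaway sample string: \s covers every whitespace character (including "\n") and \S covers everything else, so the class [\s\S] matches any single character. The last line assumes an ES2018-capable engine (for example a current Node.js), where the s flag is available.

```javascript
const s = "line one\nline two";

const dot = /one.line/.test(s);           // false: "." skips "\n"
const anyChar = /one[\s\S]line/.test(s);  // true: the class includes "\n"
// ES2018+ only: the s (dotAll) flag makes "." match "\n" as well.
const dotAll = /one.line/s.test(s);       // true

console.log(dot, anyChar, dotAll);
```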

So in your case the regex would become:

/<div class="box-content-5">[\s\S]*<h1>([^<]+?)<\/h1>/i
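One thing worth checking before relying on this pattern: [\s\S]* is greedy, so it runs to the last <h1> that follows the div, not the first. The sketch below assumes a hypothetical page with two <h1> elements and a hypothetical helper, extractTitle, which also guards against match() returning null (calling m[1] on null would throw). The lazy [\s\S]*? variant stops at the first <h1>.

```javascript
// Hypothetical sample input; real page HTML will differ.
const html = [
  '<div class="box-content-5">',
  '  <h1>First title</h1>',
  '</div>',
  '<div class="other">',
  '  <h1>Second title</h1>',
  '</div>',
].join("\n");

// The pattern from the answer: crosses newlines, but greedy.
const greedy = /<div class="box-content-5">[\s\S]*<h1>([^<]+?)<\/h1>/i;
// Lazy variant: [\s\S]*? stops at the first <h1> after the div.
const lazy = /<div class="box-content-5">[\s\S]*?<h1>([^<]+?)<\/h1>/i;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function extractTitle(source, pattern) {
  const m = source.match(pattern);
  return m ? m[1] : null;
}

console.log(extractTitle(html, greedy)); // → Second title
console.log(extractTitle(html, lazy));   // → First title
```

If the page can contain more than one <h1>, the lazy form is usually the one you want; for anything beyond simple scraping, a real HTML parser is more robust than a regex.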
molf