var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre.*?<\/pre>/gm );
alert(arr);     // null

I want the PRE block to be picked up, even though it spans newline characters. I thought the 'm' flag would do that. It does not.
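For reference, here is a minimal sketch of what the 'm' flag does change. The sample string and variable names are made up for illustration. The flag only alters where ^ and $ may match; the dot still stops at a newline.

```javascript
var text = "first\nsecond";

// Without 'm', ^ and $ anchor only at the start and end of the whole string.
var withoutM = text.match(/^second$/);             // null

// With 'm', ^ and $ also match right after and right before a line break.
var withM = text.match(/^second$/m);               // ["second"]

// The dot is unaffected by 'm': it still refuses to match \n.
var dotAcrossLines = /first.second/m.test(text);   // false

console.log(withoutM, withM, dotAcrossLines);
```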

I found the answer here before posting. Since I thought I knew JavaScript (read three books, worked hours) and there wasn't an existing solution on SO, I'll dare to post anyway. Throw stones here.

So the solution is:

var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre[\s\S]*?<\/pre>/gm );
alert(arr);     // <pre>...</pre> :)

Does anyone have a less cryptic way?

Edit: this is a duplicate, but since the other question is harder to find than mine, I won't remove this one.

It proposes [^] as a "multiline dot". What I still don't understand is why [.\n] does not work. I guess this is one of the sad parts of JavaScript.

+1  A: 

[.\n] doesn't work because a dot inside [] means a literal dot character (this is standard regex behavior, not specific to JavaScript). You can use (.|\n) (or (.|[\n\r])) instead.
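A small sketch comparing the patterns mentioned in this thread, reusing the string from the question. The variable names are illustrative only. The last pattern uses the `s` (dotAll) flag, which is not mentioned in the thread and is an assumption about the engine: it needs ES2018 support.

```javascript
var ss = "<pre>aaaa\nbbb\nccc</pre>ddd";

// Inside [], the dot is literal: this class matches only "." or a newline.
var literalDot = /^[.\n]+$/.test(".\n.");   // true
var notLetters = /[.\n]/.test("abc");       // false

// Alternation keeps the dot's special meaning and adds \n as an option.
var viaAlternation = ss.match(/<pre(.|\n)*?<\/pre>/g);

// [\s\S] and [^] are character classes that match any character at all.
var viaClass = ss.match(/<pre[\s\S]*?<\/pre>/g);
var viaEmptyNegation = ss.match(/<pre[^]*?<\/pre>/g);

// Assumption: the engine supports the ES2018 's' flag, which lets . match \n.
var viaDotAll = ss.match(/<pre.*?<\/pre>/gs);

console.log(viaAlternation, viaClass, viaEmptyNegation, viaDotAll);
```

Each of the four match calls returns a one-element array holding the whole PRE block.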

Y. Shoham
`[\s\S]` is the most common JavaScript idiom for matching everything including newlines. It's easier on the eyes and much more efficient than an alternation-based approach like `(.|\n)`. (It literally means "any character that *is* whitespace or any character that *isn't* whitespace.)
Alan Moore
You're right, but the question was about `.` and `\n`, and why `[.\n]` doesn't work. As mentioned in the question, `[^]` is also a nice approach.
Y. Shoham
+1  A: 

[.\n] does not work because . has no special meaning inside [], where it just means a literal dot. (.|\n) is probably the clearest way of expressing "any character, including a newline".

In general, you shouldn't try to use a regexp to match the actual HTML tags. See, for instance, these questions for more information on why.

Instead, try actually searching the DOM for the tag you need (using jQuery makes this easier, but you can always do document.getElementsByTagName("pre") with the standard DOM), and then search the text content of those results with a regexp if you need to match against the contents.
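A rough sketch of that DOM-based approach, assuming a parsed document is available. The helper name findPreText is hypothetical, and `doc` can be any object offering getElementsByTagName, such as the browser's document.

```javascript
// Hypothetical helper: return the text of every <pre> element whose
// content matches the given pattern.
function findPreText(doc, pattern) {
  var results = [];
  var pres = doc.getElementsByTagName("pre");
  for (var i = 0; i < pres.length; i++) {
    var text = pres[i].textContent;
    // Reset lastIndex so a pattern with the 'g' flag behaves the same
    // on every element.
    pattern.lastIndex = 0;
    if (pattern.test(text)) {
      results.push(text);
    }
  }
  return results;
}
```

In a browser this would be called as findPreText(document, /bbb/). Because the text comes from the DOM, the regexp never has to deal with the tags themselves.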

Brian Campbell
What I'm doing is converting .wiki to HTML on the fly, using JavaScript. Therefore, I don't have the DOM available yet. The wiki file mostly uses its own syntax, but I allow HTML tags where needed. Your advice would be *very* valid if I were working with the DOM here. Thanks. :)
akauppi
Fair enough. I suppose that is a valid reason to want to use regexes on HTML, though wiki syntaxes mixed with HTML can have all kinds of fun corner cases themselves.
Brian Campbell