
I want to capture all text, including multi-line blocks, between `<%` and `%>`.

For example:

<html>
<head>
<title>Title Here</title>
</head>
<body>
<% include("/path/to/include") %>
<h1>Test Template</h1>
<p>Variable: <% print(second_var) %></p>
<%

variable = value;

foreach(params here)
{
    code here
}

%>
<p><a href="/" title="Home">Home</a></p>
</body>
</html>

I have tried `\<\%(.*)\%\>`, but that captures everything, including the `<h1>Test Template</h1>` block.
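Here is a minimal sketch of the problem. It assumes Python's `re` module as the engine (the question doesn't say which one is in use) and uses a trimmed copy of the template. With "dot matches newline" switched on (`re.DOTALL` in Python), the greedy `.*` runs on to the last `%>` in the text, so it returns a single match that swallows the `<h1>` line. The lazy `.*?` stops at the nearest `%>` instead.

```python
import re

# A trimmed copy of the template from the question.
TEMPLATE = """<body>
<% include("/path/to/include") %>
<h1>Test Template</h1>
<p>Variable: <% print(second_var) %></p>
<%
variable = value;
%>
</body>"""

# Greedy: with DOTALL, .* runs to the LAST %> in the whole text.
greedy = re.findall(r"<%(.*)%>", TEMPLATE, re.DOTALL)

# Lazy: .*? stops at the FIRST %> after each <%.
lazy = re.findall(r"<%(.*?)%>", TEMPLATE, re.DOTALL)

print(len(greedy))  # 1 (one giant match)
print(len(lazy))    # 3 (one match per tag)
```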

+1  A: 

I've been using Microsoft's regex engine (provided by JScript in IE). It has a 'multi-line' switch that affects the behaviour of `.`, but I still had problems, which I had to resolve by using `[\u0000-\uFFFF]`: it matches everything, including EOLs and any control characters.

So have a go with `<%([\u0000-\uFFFF]*?)%>`.

Stijn Sanders
The multiline (`m`) modifier does not affect the behavior of `.`. It's the single-line (DOTALL, `s`) modifier that does that, but JavaScript doesn't support it. The most common idiom for matching anything-including-newlines in JavaScript is `[\s\S]`, as @Tim demonstrated in his answer.
Alan Moore
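The `[\u0000-\uFFFF]` trick from the answer above can be sketched in Python's `re` as well. This is an assumption, since the answer itself used JScript. The class spans every code point from U+0000 to U+FFFF, so it matches line breaks with no flag at all. One Python-specific caveat: Python strings can contain code points above U+FFFF, such as emoji, and this class will not match them. In JScript's UTF-16 strings, by contrast, every code unit falls inside the range. Also note that modern JavaScript (ES2018 and later) does support a dotAll `s` flag.

```python
import re

# The class [\u0000-\uFFFF] covers every code point from U+0000 to U+FFFF,
# including "\n" and "\r", so it can cross line breaks without any flag.
ANY_BMP = r"<%([\u0000-\uFFFF]*?)%>"

text = "a <% one %> b <%\nline1\nline2\n%> c"
blocks = re.findall(ANY_BMP, text)
print(blocks)  # [' one ', '\nline1\nline2\n']

# Python-specific caveat: a code point above U+FFFF (here an emoji)
# is outside the class, so this block is not matched at all.
print(re.findall(ANY_BMP, "<% \U0001F600 %>"))  # []
```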
+1  A: 

`\<\%(.*?)\%\>`. You need to use `.*?` rather than `.*` to get non-greedy (lazy) matching.

EDIT: To handle the multi-line block, you can't rely on the `.` wildcard, because by default it matches everything except a newline. How to change that depends on your regex engine, so if you tell me which engine you're using, I can tell you what to do.

Rafe Kettler
Some regex engines treat `\<` as _beginning of word_ and `\>` as _end of word_.
Stijn Sanders
@Stijn: in that case, you can just drop the backslashes and use plain `<` and `>`.
Rafe Kettler
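The following sketch illustrates the point about `.` and newlines, using Python's `re` module as an example engine. Without a flag, `.` will not cross a line break, so the lazy pattern finds the two one-line tags but skips the multi-line block. Passing `re.DOTALL`, which is Python's name for the option, picks up all three.

```python
import re

TEMPLATE = '<% include("/path/to/include") %>\n<p><% print(second_var) %></p>\n<%\nvariable = value;\n%>\n'

# Without a flag, "." stops at newlines, so the multi-line block is skipped.
no_flag = re.findall(r"<%(.*?)%>", TEMPLATE)

# re.DOTALL is Python's spelling of the "dot matches newline" option.
dotall = re.findall(r"<%(.*?)%>", TEMPLATE, re.DOTALL)

print(len(no_flag), len(dotall))  # 2 3
```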
+2  A: 

Which regex engine are you using?

<%(.*?)%>

should work with the "dot matches newline" option enabled. If you don't know how to set that, try

<%([\s\S]*?)%>

or

(?s)<%(.*?)%>

No need to escape <, %, or > by the way.

Tim Pietzcker
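Here is a quick check of the three variants, again using Python's `re` only as an example engine. Not every engine accepts the inline `(?s)` form. `re.DOTALL`, the `[\s\S]` class, and a leading `(?s)` should all capture the same blocks from this trimmed template.

```python
import re

TEMPLATE = "<p><% print(second_var) %></p>\n<%\nforeach(params here)\n{\n    code here\n}\n%>"

# Each entry: (pattern, flags) for one way of letting the match cross newlines.
patterns = {
    "re.DOTALL": (r"<%(.*?)%>", re.DOTALL),
    "[\\s\\S]": (r"<%([\s\S]*?)%>", 0),
    "(?s)": (r"(?s)<%(.*?)%>", 0),
}

results = {name: re.findall(p, TEMPLATE, flags) for name, (p, flags) in patterns.items()}

# All three spellings should capture the same two blocks.
for name, found in results.items():
    print(name, found)
```

Among Perl-style engines, `[\s\S]` tends to be the most portable of the three, because it relies on nothing but a character class.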