Given the text:
```
[*] test1
[list]
[*] test2
[*] test3
[*] test4
[/list]
[*] test5
```
the regex:

```
\[\*]\s*([^\r\n]+)(?=((?!\[list])[\s\S])*\[/list])
```

matches only `[*] test2`, `[*] test3` and `[*] test4`. But if the `[list]`s can be nested, or a broader BB-like language needs to be parsed, I'd opt for a proper parser.
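With nested lists the lookahead trick breaks down: an outer `[*]` item that comes before an inner `[list]` sees that `[list]` first and is skipped. A minimal sketch of the parser route, assuming every `[list]`, `[/list]` and `[*]` tag starts its own line, could look like this. `bbListsToHtml` is a hypothetical name, and this is not a general BB-code parser. Because only one kind of block is involved, a depth counter stands in for the parser's stack.

```javascript
// Hypothetical helper: converts (possibly nested) [list] ... [/list] blocks
// line by line. Assumes each [list], [/list] and [*] tag starts its own line;
// [*] lines outside any list are left untouched.
function bbListsToHtml(text) {
  const out = [];
  let depth = 0; // number of [list] blocks currently open
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "[list]") {
      depth++;
      out.push("<ul>");
    } else if (trimmed === "[/list]") {
      if (depth === 0) throw new Error("unbalanced [/list]");
      depth--;
      out.push("</ul>");
    } else if (depth > 0 && trimmed.startsWith("[*]")) {
      // drop the "[*]" marker and surrounding whitespace
      out.push("<li>" + trimmed.slice(3).trim() + "</li>");
    } else {
      out.push(line);
    }
  }
  if (depth !== 0) throw new Error("unclosed [list]");
  return out.join("\n");
}

console.log(bbListsToHtml("[list]\n[*] outer\n[list]\n[*] inner\n[/list]\n[/list]"));
// <ul>
// <li>outer</li>
// <ul>
// <li>inner</li>
// </ul>
// </ul>
```

Note that the inner `<ul>` ends up directly inside the outer `<ul>`, mirroring the BB-code layout; strictly valid HTML would want it inside an `<li>`, which this sketch does not attempt.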
To do the replacements, replace the matches of the regex I suggested with `<li>$1</li>`, and then replace `[list]` with `<ul>` and `[/list]` with `</ul>` (assuming `[list]` and `[/list]` are only used for lists and do not appear inside comments, string literals or the like).
When running the following snippet:
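```javascript
var text = "[*] test1\n"+
"\n"+
"[list]\n"+
"[*] test2\n"+
"[*] test3\n"+
"[*] test4\n"+
"[/list]\n"+
"\n"+
"[*] test5\n"+
"\n"+
"[list]\n"+
"[*] test6\n"+
"[*] test7\n"+
"[/list]\n"+
"\n"+
"[*] test8";
// console.log works in Node and browsers (print() only exists in shells like Rhino)
console.log(text + "\n============================");
// 1. turn every [*] line inside a [list] ... [/list] block into an <li>
text = text.replace(/\[\*]\s*([^\r\n]+)(?=((?!\[list])[\s\S])*\[\/list])/g, "<li>$1</li>");
// 2. turn the list delimiters themselves into <ul> and </ul>
text = text.replace(/\[list]/g, "<ul>");
text = text.replace(/\[\/list]/g, "</ul>");
console.log(text);
```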
the following is printed:
```
[*] test1

[list]
[*] test2
[*] test3
[*] test4
[/list]

[*] test5

[list]
[*] test6
[*] test7
[/list]

[*] test8
============================
[*] test1

<ul>
<li>test2</li>
<li>test3</li>
<li>test4</li>
</ul>

[*] test5

<ul>
<li>test6</li>
<li>test7</li>
</ul>

[*] test8
```
A small explanation might be in order:

- `\[\*]\s*` matches the substring `[*]` followed by zero or more whitespace characters;
- `([^\r\n]+)` gobbles up the rest of the line and saves it in match group 1;
- `(?=((?!\[list])[\s\S])*\[/list])` ensures that every match group 1 has a substring `[/list]` ahead of it without encountering a `[list]` first.
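To see the pieces in action, here is a small sketch using `String.prototype.matchAll` (ES2020) on the sample text from the top of this answer. It also tests the lookahead part on its own; the `^` anchor is added there so that `test()` only checks from the start of the string instead of searching anywhere in it.

```javascript
// Sample text from the top of this answer
const sample = "[*] test1\n[list]\n[*] test2\n[*] test3\n[*] test4\n[/list]\n[*] test5";
const itemInList = /\[\*]\s*([^\r\n]+)(?=((?!\[list])[\s\S])*\[\/list])/g;

// Collect match group 1 of every match
const captured = [...sample.matchAll(itemInList)].map(m => m[1]);
console.log(captured.join(", ")); // test2, test3, test4

// The lookahead body alone: a [/list] ahead, with no [list] in between
const closesBeforeOpening = /^((?!\[list])[\s\S])*\[\/list]/;
console.log(closesBeforeOpening.test(" test2\n[/list]"));         // true
console.log(closesBeforeOpening.test(" test1\n[list]\n[/list]")); // false
```

`test1` is rejected because a `[list]` comes before any `[/list]`, and `test5` because no `[/list]` follows it at all.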
EDIT
Or better yet, do as Gumbo suggests in the comment to this answer: match all `[list] ... [/list]` blocks, and then replace all `[*] ...` lines inside those.
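Gumbo's two-step approach could look roughly like the sketch below. The function name `listsToHtml` is my own, and it assumes lists are not nested, because the lazy `[\s\S]*?` stops at the first `[/list]` it finds. It uses `[ \t]*` after `[*]` instead of `\s*` so that an empty `[*]` cannot swallow the following line.

```javascript
// Hypothetical helper: first find each [list] ... [/list] block,
// then rewrite only the [*] lines inside that block.
// Assumes lists are NOT nested.
function listsToHtml(text) {
  return text.replace(/\[list]([\s\S]*?)\[\/list]/g, (_block, body) =>
    "<ul>" +
    // [ \t]* keeps the item pattern on a single line
    body.replace(/\[\*][ \t]*([^\r\n]+)/g, "<li>$1</li>") +
    "</ul>"
  );
}

console.log(listsToHtml("[*] test1\n[list]\n[*] test2\n[*] test3\n[/list]\n[*] test5"));
```

No lookahead is needed here: whether a `[*]` line belongs to a list is decided by which block it was found in, which also keeps the item regex simple.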