I am working on a simple token replacement feature for our product. I have resolved almost all of the issues, but I missed one thing: a token must support attributes, and an attribute value can itself contain a token. This is part of a bigger project. I hope you can help.

The beginning tag is "**#[**" and the ending tag is "**]**". For example: #[FirstName], #[LastName], #[Age, WhenZero="Undisclosed"].

Right now I am using the expression "\#\[[^\]]+\]". It works in general, but it fails on this input:

blah blah text here...
**#[IsFreeShipping, WhenTrue="<img src='/images/fw_freeshipping.gif'/>
<a href='http://www.hellowebsite.net/freeshipping.aspx'>$[FreeShipping]</a>"]**
blah blah text here also...

It fails because the match stops at the first ] it encounters. It returns:

*#[IsFreeShipping, WhenTrue="<img src='/images/fw_freeshipping.gif'/>
<a href='http://www.hellowebsite.net/freeshipping.aspx'>$[FreeShipping]*

My desired result is:

*#[IsFreeShipping, WhenTrue="<img src='/images/fw_freeshipping.gif'/>
<a href='http://www.hellowebsite.net/freeshipping.aspx'>$[FreeShipping]</a>"]*
+1  A: 

This is a borderline case for a regexp, since the match depends on context, but still...

#\[(\](?=")|[^\]])+\]

should do it.

The idea is to state that a closing square bracket can be part of the parsed content if it is followed by a double quote, i.e. when it sits at the end of an attribute value.

If that same square bracket were anywhere within the attribute, that would be a lot harder...


The advantage of a lookahead expression is that it can hold a regexp with a non-fixed match length.
So if the attribute's closing square bracket is not followed by a double quote, but rather by another known expression, you just update the lookahead part:

#\[(\](?=</a>")|[^\]])+\]

will end the match only at the second closing square bracket, since the first is followed by </a>".

Of course, any kind of greedy expression (.*]) would not work, since it would stop not at the second closing square bracket but at the last one in the input. (Meaning that if there is more than one intermediate ], they will all be included in the match.)
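To make the difference concrete, here is a minimal Python sketch comparing the question's original expression with the lookahead variant. The `SAMPLE` string is the question's input written with a literal `>` and `</a>` (an assumption about the intended text), and the names `ORIGINAL`, `LOOKAHEAD`, and `first_match` are made up for illustration. The grouping is written as non-capturing, which does not change what is matched.

```python
import re

# Question's sample input, assuming literal '>' and '</a>' in the attribute value.
SAMPLE = (
    "blah blah text here...\n"
    "#[IsFreeShipping, WhenTrue=\"<img src='/images/fw_freeshipping.gif'/>\n"
    "<a href='http://www.hellowebsite.net/freeshipping.aspx'>$[FreeShipping]</a>\"]\n"
    "blah blah text here also..."
)

# The expression from the question: stops at the first ']'.
ORIGINAL = re.compile(r'#\[[^\]]+\]')

# Lookahead variant: a ']' followed by '</a>"' is treated as content.
LOOKAHEAD = re.compile(r'#\[(?:\](?=</a>")|[^\]])+\]')


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(first_match(ORIGINAL, SAMPLE))
    print(first_match(LOOKAHEAD, SAMPLE))
```

The first call returns the truncated token ending in `$[FreeShipping]`; the second runs through to the final `]`. The lookahead is tied to this exact markup, so any other text after an inner `]` would need its own alternative.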

VonC
This won't work. There is a '</a>' before '"', not ']'.
J.F. Sebastian
Thanks J.F., I did not see the revisions of the question. I just updated my answer to reflect the changes.
VonC
A: 

When I've done things like this before, I've evaluated from the innermost matchable expression first, before stepping out to larger strings.

In this case your regex should probably try to replace $[FreeShipping] with its value before evaluating the larger token containing the if clause.

Perhaps you can figure out a way to replace the value tokens like $[FreeShipping] before the ones without the $ prefix.

This is roughly, but not exactly, the multi-pass versus one-pass compiler distinction:

http://en.wikipedia.org/wiki/Multi-pass_compiler versus http://en.wikipedia.org/wiki/One-pass_compiler

Writing this in one regex won't necessarily be any faster than looping over a few simple regexes. All regexes do is abstract string parsing.
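A rough Python sketch of that inner-first, two-pass idea follows. Everything here is an assumption for illustration: the helper names, the `values` dictionary, and the rule that value tokens look like `$[Name]` with a word-character name. The sample is the question's input with a literal `>` and `</a>`.

```python
import re

# Question's sample input, assuming literal '>' and '</a>' in the attribute value.
SAMPLE = (
    "blah blah text here...\n"
    "#[IsFreeShipping, WhenTrue=\"<img src='/images/fw_freeshipping.gif'/>\n"
    "<a href='http://www.hellowebsite.net/freeshipping.aspx'>$[FreeShipping]</a>\"]\n"
    "blah blah text here also..."
)

VALUE_TOKEN = re.compile(r'\$\[(\w+)\]')   # inner tokens such as $[FreeShipping]
OUTER_TOKEN = re.compile(r'#\[[^\]]+\]')   # the simple expression from the question


def expand_values(text, values):
    """Pass 1: replace each $[Name] with its value; unknown names are left as-is."""
    return VALUE_TOKEN.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), text
    )


def find_outer_tokens(text):
    """Pass 2: with the inner ']' gone, the simple expression is enough."""
    return OUTER_TOKEN.findall(text)


if __name__ == "__main__":
    expanded = expand_values(SAMPLE, {"FreeShipping": "Free shipping"})
    for token in find_outer_tokens(expanded):
        print(token)
```

This only helps if value tokens may be expanded first, and it breaks again if a substituted value itself contains a `]`.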

Stewart Robinson
A: 

If you're only expecting a single match in any given input you could simply allow for a greedy match:

/#\[.*\]/

If you're expecting multiples you have a problem because you no longer have regular text. You'll need to escape the inner brackets in some way.

(Regex is a deep subject - it's quite possible that someone has a better solution)
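A small Python sketch of both cases, for what it's worth. The `re.DOTALL` flag is my addition, because the sample token spans two lines and `.` does not match a newline by default in Python; the sample string assumes a literal `>` and `</a>`.

```python
import re

# Question's sample input, assuming literal '>' and '</a>' in the attribute value.
SAMPLE = (
    "blah blah text here...\n"
    "#[IsFreeShipping, WhenTrue=\"<img src='/images/fw_freeshipping.gif'/>\n"
    "<a href='http://www.hellowebsite.net/freeshipping.aspx'>$[FreeShipping]</a>\"]\n"
    "blah blah text here also..."
)

# Greedy match from the first '#[' to the LAST ']' in the input.
GREEDY = re.compile(r'#\[.*\]', re.DOTALL)

if __name__ == "__main__":
    # Single token in the input: the greedy match covers it fully.
    print(GREEDY.search(SAMPLE).group(0))

    # Two tokens in the input: they are swallowed as one match.
    print(GREEDY.findall("#[FirstName] and #[LastName]"))
```

With one token the greedy version returns the whole thing; with two it returns a single match spanning both, which is the problem described above.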

annakata
A: 

I'd be interested to learn if I'm wrong, but if I recall correctly, you cannot do this using regular expressions. This looks like a Dyck language to me, and you would need a pushdown automaton to accept the expressions. But I must admit I'm not quite sure whether this holds true for the extended forms of regexps like those provided by Perl.
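For comparison, the pushdown-style approach is short to write by hand. Below is a minimal Python sketch that uses a depth counter in place of an explicit stack (enough here, since there is only one kind of bracket). The function name `find_tokens` and its `open_marker` parameter are hypothetical, and the scanner assumes brackets inside a token are balanced.

```python
def find_tokens(text, open_marker="#["):
    """Return every top-level token that starts with open_marker.

    Counts '[' and ']' to find the matching close; gives up at the
    first token whose brackets never balance.
    """
    tokens = []
    pos = 0
    while True:
        start = text.find(open_marker, pos)
        if start == -1:
            break
        depth = 0
        end = None
        for i in range(start, len(text)):
            ch = text[i]
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
                if depth == 0:
                    end = i + 1  # one past the matching ']'
                    break
        if end is None:
            break  # unbalanced token: stop scanning
        tokens.append(text[start:end])
        pos = end
    return tokens


if __name__ == "__main__":
    print(find_tokens('#[A] x #[B, C="$[D]"]'))
```

Unlike the regex variants, this handles any nesting depth, but a stray unbalanced bracket inside an attribute value would throw the count off.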

cg
+1  A: 

Your regex matches exactly what your stated condition indicates: start with an opening square bracket and match everything up to the first closing square bracket.

If you want to match nested square brackets, you need to specify exactly what is valid when nested. For instance, you could say that square brackets can be nested when enclosed within quotes.
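As one possible reading of that rule, here is a Python sketch where anything inside double quotes, brackets included, counts as content. The pattern and the name `QUOTE_AWARE` are my own illustration, and it assumes attribute values never contain a double quote themselves (the sample uses single quotes inside the value).

```python
import re

# Question's sample input, assuming literal '>' and '</a>' in the attribute value.
SAMPLE = (
    "blah blah text here...\n"
    "#[IsFreeShipping, WhenTrue=\"<img src='/images/fw_freeshipping.gif'/>\n"
    "<a href='http://www.hellowebsite.net/freeshipping.aspx'>$[FreeShipping]</a>\"]\n"
    "blah blah text here also..."
)

# Inside a token: either a whole double-quoted string (may contain ']'),
# or any single character that is neither ']' nor '"'.
QUOTE_AWARE = re.compile(r'#\[(?:"[^"]*"|[^\]"])*\]')

if __name__ == "__main__":
    print(QUOTE_AWARE.findall(SAMPLE))
    print(QUOTE_AWARE.findall('#[FirstName], #[LastName], #[Age, WhenZero="Undisclosed"]'))
```

Because the rule is about quotes rather than about what follows the bracket, it also copes with several tokens in the same input.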

Cerebrus
A: 

I really need to escape the inner closing brackets, because the tokens enclosed in $[] are replaced in the last part of the process. Maybe this is the reason why ASP.NET uses <% %> for inline code in pages. I could consider #[ ]# as delimiters; I am still trying to find a possible solution here. I hope you can help.

Thanks. rodel

A: 

It is possible to write a regex for the example you gave, but in general it fails. A single regex can't handle arbitrarily nested expressions.

Your example shows that your DSL already has 'if' conditions. Before long it could evolve into a Turing-complete language.

Why don't you use an existing template language, such as the Django template language?

Your example:

blah blah text here... #[IsFreeShipping, 
WhenTrue="<img src='/images/fw_freeshipping.gif'/>
<a href='http://www.hellowebsite.net/freeshipping.aspx'>$[FreeShipping]</a>"]
blah blah text here also...

Using Django template language:

blah blah text here... {% if IsFreeShipping %}
<img src='/images/fw_freeshipping.gif'/>
<a href='http://www.hellowebsite.net/freeshipping.aspx'>{{ FreeShipping }}</a>
{% endif %} blah blah text here also...
J.F. Sebastian
A: 

This is exactly what we're trying to avoid in our product :). Users shouldn't have to know programming or syntax; it should be as simple as knowing which tokens to use.

I'll just tag this as pending, or for discussion with our team. Should you have ideas, just PM me. Many thanks.

A: 

This works for your sample:

#\[(?:[^\]$]+|\$(?!\[)|\$\[[^\[\]]*\])*\]

It assumes that the inner square brackets can't themselves contain square brackets. If the inner tokens can also contain tokens, you're probably out of luck. Some regex flavors can handle recursive structures, but the resulting regexes are hideous even by regex standards. :D

This regex also treats the '$' as special only if it's followed by an opening square bracket. If you want to disallow its use otherwise, remove the second alternative: |\$(?!\[)
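A quick way to try the expression is the Python sketch below. The pattern is copied unchanged; the sample string assumes a literal `>` and `</a>`, and the second input (`#[Price, Prefix="$ "]`) is a made-up token to exercise the lone `$` alternative.

```python
import re

# Question's sample input, assuming literal '>' and '</a>' in the attribute value.
SAMPLE = (
    "blah blah text here...\n"
    "#[IsFreeShipping, WhenTrue=\"<img src='/images/fw_freeshipping.gif'/>\n"
    "<a href='http://www.hellowebsite.net/freeshipping.aspx'>$[FreeShipping]</a>\"]\n"
    "blah blah text here also..."
)

TOKEN = re.compile(
    r'#\[(?:'
    r'[^\]$]+'            # plain content: anything except ']' and '$'
    r'|\$(?!\[)'          # a lone '$' that does not start an inner token
    r'|\$\[[^\[\]]*\]'    # one inner $[...] token without nested brackets
    r')*\]'
)

if __name__ == "__main__":
    print(TOKEN.findall(SAMPLE))
    print(TOKEN.findall('#[FirstName] costs #[Price, Prefix="$ "]'))
```

All groups are non-capturing, so `findall` returns the full token text for each match.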

Alan Moore