I am given a string that has placeholders in the format {{some_text}}. I would like to extract them into a collection using C#, and I believe a regular expression is the best way to do this. Regex is a little over my head, but it seems powerful enough for this case. Here is my example:

<a title="{{element='title'}}" href="{{url}}">
<img border="0" alt="{{element='title'}}" src="{{element='photo' property='src' maxwidth='135'}}" width="135" height="135" /></a>
<span>{{element='h1'}}</span>
<span><strong>{{element='price'}}<br /></strong></span>

I would like to end up with something like this:

collection[0] = "element='title'";

collection[1] = "url";

collection[2] = "element='photo' property='src' maxwidth='135'";

collection[3] = "element='h1'";

collection[4] = "element='price'";

Note that the result contains no duplicates, but I do not want to complicate things if that is difficult to do.

I saw this post that does something similar but within brackets: http://stackoverflow.com/questions/1811183/how-to-extract-the-contents-of-square-brackets-in-a-string-of-text-in-c-using-re

My problem is that my delimiters are double braces rather than a single character. How can I do this?

A: 

RegEx is more than powerful enough for what you need.

Try this regular expression:

\{\{.*?\}\}

That will match everything between double braces, lazily: each match ends at the first }} that follows its opening {{.
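To see why the lazy quantifier matters, here is a minimal sketch comparing `.*?` with the greedy `.*` on a single line. The input string is a shortened, hypothetical version of the question's markup, and the snippet assumes C# 9 or later for top-level statements (wrap it in a `Main` method on older compilers).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical one-line input with two placeholders.
string input = "<a title=\"{{element='title'}}\" href=\"{{url}}\">";

// Lazy: each match stops at the first "}}" after its "{{".
string[] lazy = Regex.Matches(input, @"\{\{.*?\}\}")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Greedy: a single match runs from the first "{{" to the last "}}" on the line.
string[] greedy = Regex.Matches(input, @"\{\{.*\}\}")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" | ", lazy));   // {{element='title'}} | {{url}}
Console.WriteLine(string.Join(" | ", greedy)); // {{element='title'}}" href="{{url}}
```

Note that both variants still include the braces in each match.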

Edit: that will give you the strings, including the double braces. You can strip the braces manually, but if the regex engine supports lookahead and lookbehind (the .NET engine does), you can get what's inside directly with something like:

(?<=\{\{).*?(?=\}\})
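As a rough illustration of the lookaround version in C#, the sketch below runs it against a short, hypothetical input. The lookbehind and lookahead only assert that the braces are present without consuming them, so `m.Value` is already the inner text. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input with two placeholders.
string input = "<span>{{element='h1'}}</span> <a href=\"{{url}}\">";

// (?<=\{\{) requires "{{" just before the match; (?=\}\}) requires "}}" just after it.
string[] inner = Regex.Matches(input, @"(?<=\{\{).*?(?=\}\})")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string value in inner)
    Console.WriteLine(value);
// element='h1'
// url
```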
Santiago Lezica
+1  A: 

Taking exactly from the question you linked:

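using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// s is the input string containing the {{...}} placeholders.
// Group 1 captures the text between the braces.
ICollection<string> matches =
    Regex.Matches(s.Replace(Environment.NewLine, ""), @"\{\{([^}]*)\}\}")
        .Cast<Match>()
        .Select(x => x.Groups[1].Value)
        .ToList();

foreach (string match in matches)
    Console.WriteLine(match);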

I've changed the [ and ] to {{ and }} (escaped). This should produce the collection you need. Be sure to read the first answer to the other question for the regex breakdown. It's important to understand the pattern if you use it.
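For readers who want the breakdown in place, here is a sketch of the same pattern written out with inline comments using `RegexOptions.IgnorePatternWhitespace`. The input string is a hypothetical sample, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as above; in this mode whitespace is ignored and '#' starts a comment.
var pattern = new Regex(@"
    \{\{       # two literal opening braces
    ([^}]*)    # group 1: any run of characters other than a closing brace
    \}\}       # two literal closing braces
", RegexOptions.IgnorePatternWhitespace);

// Hypothetical sample input.
string s = "<span>{{element='h1'}}</span><span>{{element='price'}}</span>";

var values = pattern.Matches(s)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

foreach (string value in values)
    Console.WriteLine(value);
// element='h1'
// element='price'
```

Because `[^}]` is a negated character class, it also matches newline characters, and it will not match a placeholder whose text contains a single `}`.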

JoshD
A: 

You will need to remove the duplicates yourself after you have the matches. This pattern captures the placeholder text in group 1:

\{\{(.*?)}}

Result 1

  1. element='title'

Result 2

  1. url

Result 3

  1. element='title'

Result 4

  1. element='photo' property='src' maxwidth='135'

Result 5

  1. element='h1'

Result 6

  1. element='price'
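One way to drop the duplicates is LINQ's `Distinct()`. The sketch below applies it to the template from the question; the variable names are illustrative, and it assumes C# 9 or later for top-level statements. `Distinct()` does not formally document its ordering, although the LINQ to Objects implementation yields items in first-seen order in practice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// The template from the question (quotes are doubled inside the verbatim string).
string template = @"<a title=""{{element='title'}}"" href=""{{url}}"">
<img border=""0"" alt=""{{element='title'}}"" src=""{{element='photo' property='src' maxwidth='135'}}"" width=""135"" height=""135"" /></a>
<span>{{element='h1'}}</span>
<span><strong>{{element='price'}}<br /></strong></span>";

List<string> collection = Regex.Matches(template, @"\{\{(.*?)}}")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value) // text inside the braces
    .Distinct()                     // removes the repeated element='title'
    .ToList();

for (int i = 0; i < collection.Count; i++)
    Console.WriteLine($"collection[{i}] = \"{collection[i]}\";");
```

With the question's input this leaves five entries, starting with `element='title'` and `url`.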
tinifni