Hi,

I'm writing a class called StringTemplate, which formats objects the way String.Format does, but with names instead of indexes for placeholders. Here's an example:

string s = StringTemplate.Format("Hello {Name}. Today is {Date:D}, and it is {Date:T}.",
                                 new { Name = "World", Date = DateTime.Now });

To achieve this result, I look for placeholders and replace them with indexes. I then pass the resulting format string to String.Format.

This works fine, except when there are doubled braces, which are an escape sequence. The desired behavior (which is the same as String.Format) is described below :

  • "Hello {Name}" should be formatted as "Hello World"
  • "Hello {{Name}}" should be formatted as "Hello {Name}"
  • "Hello {{{Name}}}" should be formatted as "Hello {World}"
  • "Hello {{{{Name}}}}" should be formatted as "Hello {{Name}}"

And so on...
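For reference, here is a small sketch of how String.Format itself treats these escape sequences when the placeholder is an index. The class name EscapeDemo is only for illustration.

```csharp
using System;

public static class EscapeDemo
{
    public static void Main()
    {
        // A doubled brace is a literal brace; a single pair delimits a placeholder.
        Console.WriteLine(string.Format("Hello {0}", "World"));       // Hello World
        Console.WriteLine(string.Format("Hello {{0}}", "World"));     // Hello {0}
        Console.WriteLine(string.Format("Hello {{{0}}}", "World"));   // Hello {World}
        Console.WriteLine(string.Format("Hello {{{{0}}}}", "World")); // Hello {{0}}
    }
}
```

So a placeholder is only a placeholder when the run of braces next to it has an odd length.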

But my current regular expression doesn't detect the escape sequence and always treats the substring between braces as a placeholder, so "Hello {{Name}}" comes out as "Hello {0}".

Here's my current regular expression :

private static Regex _regex = new Regex(@"{(?<key>\w+)(?<format>:[^}]+)?}", RegexOptions.Compiled);

How can I modify this regular expression to ignore escaped braces? What seems really hard is that whether something is a placeholder depends on whether the number of braces is odd or even. I can't think of a simple way to do that with a regular expression. Is it even possible?


For completeness, here's the full code of the StringTemplate class :

public class StringTemplate
{
    private string _template;
    private static Regex _regex = new Regex(@"{(?<key>\w+)(?<format>:[^}]+)?}", RegexOptions.Compiled);

    public StringTemplate(string template)
    {
        if (template == null)
            throw new ArgumentNullException("template");
        this._template = template;
    }

    public static implicit operator StringTemplate(string s)
    {
        return new StringTemplate(s);
    }

    public override string ToString()
    {
        return _template;
    }

    public string Format(IDictionary<string, object> values)
    {
        if (values == null)
        {
            throw new ArgumentNullException("values");
        }

        Dictionary<string, int> indexes = new Dictionary<string, int>();
        object[] array = new object[values.Count];
        int i = 0;
        foreach (string key in values.Keys)
        {
            array[i] = values[key];
            indexes.Add(key, i++);
        }

        MatchEvaluator evaluator = (m) =>
        {
            if (m.Success)
            {
                string key = m.Groups["key"].Value;
                string format = m.Groups["format"].Value;
                int index = -1;
                if (indexes.TryGetValue(key, out index))
                {
                    return string.Format("{{{0}{1}}}", index, format);
                }
            }
            return string.Format("{{{0}}}", m.Value);
        };

        string templateWithIndexes = _regex.Replace(_template, evaluator);
        return string.Format(templateWithIndexes, array);
    }

    private static IDictionary<string, object> MakeDictionary(object obj)
    {
        Dictionary<string, object> dict = new Dictionary<string, object>();
        foreach (var prop in obj.GetType().GetProperties())
        {
            dict.Add(prop.Name, prop.GetValue(obj, null));
        }
        return dict;
    }

    public string Format(object values)
    {
        return Format(MakeDictionary(values));
    }

    public static string Format(string template, IDictionary<string, object> values)
    {
        return new StringTemplate(template).Format(values);
    }


    public static string Format(string template, object values)
    {
        return new StringTemplate(template).Format(values);
    }
}
+3  A: 

It may well be possible with regular expressions - but I'm not at all convinced that it will be the easiest solution to maintain. Given that you're really only interested in braces and colons here (I think), I would personally avoid using regular expressions.

I would construct a sequence of tokens, each one either a literal or a format string. Construct this just by walking along the string and noticing the opening and closing braces. Then evaluating the sequence is just a matter of concatenating the tokens, formatting each one where appropriate.
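A rough sketch of that idea, assuming the values arrive as a dictionary. The names TokenFormatter and Format are made up for this example, and for brevity it appends each token as soon as it is recognized instead of building an explicit token list first.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TokenFormatter
{
    public static string Format(string template, IDictionary<string, object> values)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < template.Length)
        {
            char c = template[i];
            bool doubled = i + 1 < template.Length && template[i + 1] == c;
            if ((c == '{' || c == '}') && doubled)
            {
                // "{{" or "}}" is an escaped literal brace.
                result.Append(c);
                i += 2;
            }
            else if (c == '{')
            {
                // A single "{" opens a placeholder that runs to the next "}".
                int end = template.IndexOf('}', i + 1);
                if (end < 0) throw new FormatException("Unclosed placeholder.");
                string token = template.Substring(i + 1, end - i - 1);
                int colon = token.IndexOf(':');
                string key = colon < 0 ? token : token.Substring(0, colon);
                string format = colon < 0 ? "" : token.Substring(colon);
                // Let AppendFormat apply the format specifier, if any.
                // An unknown key throws KeyNotFoundException here.
                result.AppendFormat("{0" + format + "}", values[key]);
                i = end + 1;
            }
            else if (c == '}')
            {
                // A lone "}" outside a placeholder is malformed.
                throw new FormatException("Unexpected '}'.");
            }
            else
            {
                result.Append(c);
                i++;
            }
        }
        return result.ToString();
    }
}
```

With { "Name", "World" } in the dictionary, "Hello {{{Name}}}" should come out as "Hello {World}", and a malformed template such as "{{Name} foo" throws a FormatException.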

Then again I've never been much of a fan of regular expressions - just occasionally they're wonderful, but a lot of the time they feel like overkill. Maybe there's some clever way to get them to do what you want in this case...

Btw, you're going to need to define what you want to happen in cases where the braces aren't matched properly, e.g.

{{Name} foo
Jon Skeet
Thanks for your answer Jon ! I'd like to stick to regular expressions if it's possible, but I might have to do what you suggest eventually... Regarding unmatched braces, I want the same behavior as String.Format : the number of braces at the start and end of the placeholder doesn't need to be the same, but the parity has to be the same
Thomas Levesque
+3  A: 

Parity is generally very easy to decide using regular expressions. For example, this is an expression that matches any string with an even number of As, but not an odd number:

(AA)*

So all you need to do is find the expression that matches only an odd number of {s and }s.

{({{)*
}(}})*

(escaping the characters notwithstanding). So adding this idea to your current expression will yield something like

{({{)*(?<key>\w+)(?<format>:[^}]+)?}(}})*

However, this doesn't match the cardinality of braces on both sides. In other words, {{{ will match }, because they're both odd. Regular expressions can't count things, so you're not going to be able to find an expression that matches cardinality like you want.
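A quick way to try the combined expression is sketched below, with the braces escaped. The names ParityDemo and FirstMatch are invented for this example. One thing it shows: because the pattern is not anchored to the start of a run of braces, the search is free to begin in the middle of the run, so an even number of braces can still produce a match on an odd-length tail.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParityDemo
{
    // The combined expression from above, with the braces escaped.
    private static readonly Regex OddBraces = new Regex(
        @"\{(\{\{)*(?<key>\w+)(?<format>:[^}]+)?\}(\}\})*");

    // Returns the text of the first match, or "" if there is none.
    public static string FirstMatch(string input)
    {
        return OddBraces.Match(input).Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatch("Hello {Name}"));       // {Name}
        Console.WriteLine(FirstMatch("Hello {{{Name}}}"));   // {{{Name}}}
        // Even runs still match, starting one brace into the run:
        Console.WriteLine(FirstMatch("Hello {{Name}}"));     // {Name}
        Console.WriteLine(FirstMatch("Hello {{{{Name}}}}")); // {{{Name}}}
    }
}
```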

Really, what you should be doing is parsing the strings with a custom parser that reads the string and counts instances of { but not instances of {{ in order to match them against instances of } but not }} on the other side. I think you'll find this is how String formatters in .NET work behind the scenes anyway, as regular expressions aren't suited for parsing nested structures of any kind.

Or you can use both ideas in concert: match potential tokens with a regular expression, then validate their braces balance using a quick check on the resulting match. That would probably end up being confusing and indirect, though. You're usually better off writing your own parser for this kind of scenario.

Welbog
Thanks Welbog, that looks interesting... The regex you suggest actually doesn't work (when it sees "Hello {{Name}}" it just matches "{Name}", ignoring the extra braces), but I think you're on the right track... I need to have a look at the regex documentation, I might find an option to make it work
Thomas Levesque
Oh, BTW, regular expressions *can* count... there is a construct called "balancing groups" that makes it possible to match nested patterns. It's rather poorly documented, but there is an example in MSDN : http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#BalancingGroupDefinitionExample. Anyway it doesn't really matter in my case, I just need the parity to be the same on both sides
Thomas Levesque
+1  A: 

You can use a regex to match a balanced pair, then figure out what to do with the braces. Remember that .NET regexes aren't "regular".

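using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

class Program {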
    static void Main(string[] args) {
        var d = new Dictionary<string, string> { { "Name", "World" } };
        var t = new Test();
        Console.WriteLine(t.Replace("Hello {Name}", d));
        Console.WriteLine(t.Replace("Hello {{Name}}", d));
        Console.WriteLine(t.Replace("Hello {{{Name}}}", d));
        Console.WriteLine(t.Replace("Hello {{{{Name}}}}", d));
        Console.ReadKey();
    }
}

class Test {

    private Regex MatchNested = new Regex(
        @"\{ (?>
                ([^{}]+)
              | \{ (?<D>)
              | \} (?<-D>)
              )*
              (?(D)(?!))
           \}",
             RegexOptions.IgnorePatternWhitespace
           | RegexOptions.Compiled 
           | RegexOptions.Singleline);

    public string Replace(string input, Dictionary<string, string> vars) {
        Matcher matcher = new Matcher(vars);
        return MatchNested.Replace(input, matcher.Replace);
    }

    private class Matcher {

        private Dictionary<string, string> Vars;

        public Matcher(Dictionary<string, string> vars) {
            Vars = vars;
        }

        public string Replace(Match m) {
            string name = m.Groups[1].Value;
            int length = (m.Groups[0].Length - name.Length) / 2;
            string inner = (length % 2) == 0 ? name : Vars[name];
            return MakeString(inner, length / 2);
        }

        private string MakeString(string inner, int braceCount) {
            StringBuilder sb = new StringBuilder(inner.Length + (braceCount * 2));
            sb.Append('{', braceCount);
            sb.Append(inner);
            sb.Append('}', braceCount);
            return sb.ToString();
        }

    }

}
Gavin
Thanks Gavin ! I knew about balancing groups, but they're not necessary here, because the number of opening and closing braces doesn't have to be the same. This is a legal format string : "Hello {{{Name}" ; it would be formatted as "Hello {World". However, your idea of checking the number of braces in the replacement code is pretty good, I will look into it.
Thomas Levesque
I eventually used a technique similar to yours, so I accept your answer. Thanks !
Thomas Levesque
A: 

I eventually used a technique similar to what Gavin suggested.

I changed the regular expression so that it matches all braces around the placeholder :

private static Regex _regex = new Regex(@"(?<open>{+)(?<key>\w+)(?<format>:[^}]+)?(?<close>}+)", RegexOptions.Compiled);

And I changed the logic of the MatchEvaluator so that it handles escaped braces properly :

        MatchEvaluator evaluator = (m) =>
        {
            if (m.Success)
            {
                string open = m.Groups["open"].Value;
                string close = m.Groups["close"].Value;
                string key = m.Groups["key"].Value;
                string format = m.Groups["format"].Value;

                if (open.Length % 2 == 0)
                    return m.Value;

                open = RemoveLastChar(open);
                close = RemoveLastChar(close);

                int index = -1;
                if (indexes.TryGetValue(key, out index))
                {
                    return string.Format("{0}{{{1}{2}}}{3}", open, index, format, close);
                }
                else
                {
                    return string.Format("{0}{{{{{1}}}{2}}}{3}", open, key, format, close);
                }
            }
            return m.Value;
        };

I rely on String.Format to throw a FormatException if necessary. I made a few unit tests, and so far it seems to work fine...
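For anyone who wants to try this outside the full class, here is a condensed, self-contained sketch of the same approach. RemoveLastChar isn't shown above, so the version here is an assumption (it just drops the final character), and the names TemplateSketch and Placeholder are only for this example. The braces in the pattern are escaped, which should be equivalent to the expression above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateSketch
{
    private static readonly Regex Placeholder = new Regex(
        @"(?<open>\{+)(?<key>\w+)(?<format>:[^}]+)?(?<close>\}+)", RegexOptions.Compiled);

    // Assumed helper: drops the final character of a string.
    private static string RemoveLastChar(string s)
    {
        return s.Length == 0 ? s : s.Substring(0, s.Length - 1);
    }

    public static string Format(string template, IDictionary<string, object> values)
    {
        // Map each key to a position in the argument array.
        var indexes = new Dictionary<string, int>();
        var args = new object[values.Count];
        int i = 0;
        foreach (var pair in values)
        {
            args[i] = pair.Value;
            indexes.Add(pair.Key, i++);
        }

        string withIndexes = Placeholder.Replace(template, m =>
        {
            string open = m.Groups["open"].Value;
            string close = m.Groups["close"].Value;
            string key = m.Groups["key"].Value;
            string format = m.Groups["format"].Value;

            // An even run of opening braces is fully escaped: leave it alone.
            if (open.Length % 2 == 0)
                return m.Value;

            // Otherwise the innermost brace pair belongs to the placeholder.
            open = RemoveLastChar(open);
            close = RemoveLastChar(close);

            int index;
            if (indexes.TryGetValue(key, out index))
                return string.Format("{0}{{{1}{2}}}{3}", open, index, format, close);
            // Unknown key: same replacement as in the evaluator above.
            return string.Format("{0}{{{{{1}}}{2}}}{3}", open, key, format, close);
        });

        // String.Format unescapes the remaining braces and throws on malformed input.
        return string.Format(withIndexes, args);
    }

    public static void Main()
    {
        var values = new Dictionary<string, object> { { "Name", "World" } };
        Console.WriteLine(Format("Hello {Name}", values));       // Hello World
        Console.WriteLine(Format("Hello {{Name}}", values));     // Hello {Name}
        Console.WriteLine(Format("Hello {{{Name}}}", values));   // Hello {World}
        Console.WriteLine(Format("Hello {{{{Name}}}}", values)); // Hello {{Name}}
    }
}
```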

Thanks everyone for your help !

Thomas Levesque