I have a template string and an array of parameters that come from different sources but need to be matched up to create a new "filled-in" string:

string templateString = GetTemplate();   // e.g. "Mr {0} has a {1}"
string[] dataItems = GetDataItems();     // e.g. ["Jones", "ceiling cat"]

string resultingString = String.Format(templateString, dataItems);
// e.g. "Mr Jones has a ceiling cat"

With this code, I'm assuming that the number of string format placeholders in the template will equal the number of data items. It's generally a fair assumption in my case, but I want to be able to produce a resultingString without failing even if the assumption is wrong. I don't mind if there are empty spaces for missing data.

If there are too many items in dataItems, the String.Format method handles it fine. If there aren't enough, I get a FormatException.

To overcome this, I'm counting the number of placeholders and adding new items to the dataItems array if there aren't enough.

To count the placeholders, the code I'm working with at the moment is:

private static int CountOccurrences(string haystack)
{
    // Loop through all instances of the string "}".
    int count = 0;
    int i = 0;
    while ((i = haystack.IndexOf("}", i)) != -1)
    {
        i++;
        count++;
    }
    return count;
}

Obviously this makes the assumption that there aren't any closing curly braces that aren't being used for format placeholders. It also just feels wrong. :)

Is there a better way to count the string format placeholders in a string?


A number of people have correctly pointed out that the answer I marked as correct won't work in many circumstances. The main reasons are:

  • Regexes that count the number of placeholders don't account for literal braces ( {{0}} )
  • Counting placeholders doesn't account for repeated or skipped placeholders (e.g. "{0} has a {1} which also has a {1}")
+8  A: 

You can always use Regex:

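using System;
using System.Linq;
using System.Text.RegularExpressions;
// ... more code
string templateString = "{0} {2} .{{99}}. {3}";
Match match = Regex.Matches(templateString,
             @"(?<!\{)\{(?<number>[0-9]+).*?\}(?!\})")
            .Cast<Match>()
            // Order numerically; ordering by the string value would sort "10" before "9".
            .OrderBy(m => int.Parse(m.Groups["number"].Value))
            .LastOrDefault();
// match is null when the template contains no placeholders.
Console.WriteLine(match.Groups["number"].Value); // Displays 3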
Paulo Santos
Thanks for that, I'll give it a try.
Damovisa
For reference, the code that worked was: int len = new System.Text.RegularExpressions.Regex("{[0-9]+.*?}").Matches(template).Count;
Damovisa
The problem is that the characters { and } are special in a regular expression, as per the documentation: http://msdn.microsoft.com/en-us/library/3206d374.aspx
Paulo Santos
Yes, sorry, that's correct. The string was @"\{[0-9]+.*?\}"
Damovisa
This won't work - it does not take account of literal braces - {{ or }} and in any case counting the number of format specifiers isn't much use - see my answer.
Joe
Ok Joe. A solution would then be to change RegEx to get captures and loop through all of them and put them into a List while checking their previous existence.
Robert Koritnik
This answer should not have been marked as the correct answer, as it is wrong. You actually need the highest numbered tag, so if {4} is the highest, 5 parameters are needed.
Philippe Leybaert
I agree: this should not be the accepted answer, even if the author of the original question thinks it works for him. (Or... maybe it should, because of that? At least I don't think it should be.)
peSHIr
Caved to peer pressure - un-correct-marked :)
Damovisa
But @activa and peSHlr - this doesn't look for the highest numbered tag, it counts the number of tags. Having said that, it doesn't account for literal braces as Joe mentioned. So yes, it is wrong.
Damovisa
Disclaimer: my sentiments are with Jamie Zawinski as far as regular expressions are concerned. I couldn't tell you if the above code works without (a) comments stating exactly what it is trying to do, and (b) a full set of unit tests for all edge cases.
Joe
A: 

You could use a regular expression to count the {} pairs that have only the formatting you'll use between them. @"\{\d+\}" is good enough, unless you use formatting options.
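A minimal sketch of that count, assuming templates that only ever contain plain {n} tokens. The class and method names here are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimplePlaceholderCounter
{
    // Counts every plain "{n}" token in the template.
    // Repeated indexes are counted each time they appear, and the "{0}"
    // inside a literal "{{0}}" is counted too, so this is only a rough count.
    public static int Count(string template)
    {
        return Regex.Matches(template, @"\{\d+\}").Count;
    }
}
```

For "Mr {0} has a {1}" this returns 2. For "{0} has a {1} which also has a {1}" it returns 3, even though only two values are needed.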

John Fisher
Thanks, yeah, I'm not formatting anything - everything comes through as a string.
Damovisa
+3  A: 

Not actually an answer to your question, but a possible solution to your problem (albeit not a perfectly elegant one); you could pad your dataItems collection with a number of string.Empty instances, since string.Format does not care about redundant items.
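Something like the following would do the padding. It is only a sketch: PadWithEmpty is a hypothetical helper, and minLength is an assumed upper bound on the number of placeholders a template can contain.

```csharp
using System;

public static class FormatPadding
{
    // Returns the original array if it is already long enough; otherwise
    // returns a new array of minLength whose extra slots hold string.Empty.
    // minLength is an assumption about the largest template you expect.
    public static string[] PadWithEmpty(string[] items, int minLength)
    {
        if (items.Length >= minLength)
        {
            return items;
        }

        var padded = new string[minLength];
        Array.Copy(items, padded, items.Length);
        for (int i = items.Length; i < minLength; i++)
        {
            padded[i] = string.Empty;
        }
        return padded;
    }
}
```

With this, String.Format("Mr {0} has a {1}", FormatPadding.PadWithEmpty(new[] { "Jones" }, 10)) produces "Mr Jones has a " instead of throwing.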

jerryjvl
True, and something I thought of. I would be making an assumption about the maximum number of placeholders though. That and if the count matches (which it usually does), it's a bit of a waste of time and space...
Damovisa
How much waste it is depends a bit on how you create the 'dataItems' array... if you are constructing it in a 'new' already then the waste of time will be really negligible, and the waste of space is limited by the fact that you use a reference to 'string.Empty', which is a single instance no matter how often you refer to it; as long as the array does not stay around very long the scope of the space waste is really also fairly minimal... all of this obviously depends strongly on how and how often these arrays are created.
jerryjvl
+9  A: 

Counting the placeholders doesn't help - consider the following cases:

"{0} ... {1} ... {0}" - needs 2 values

"{1} {3}" - needs 4 values of which two are ignored

The second example isn't farfetched.

For example, you may have something like this in US English:

String.Format("{0} {1} {2} has a {3}", firstName, middleName, lastName, animal);

In some cultures, the middle name may not be used and you may have:

String.Format("{0} {2} ... {3}", firstName, middleName, lastName, animal);

If you want to do this, you need to look for the format specifiers {index[,length][:formatString]} with the maximum index, ignoring repeated braces (e.g. {{n}}). Repeated braces are used to insert braces as literals in the output string. I'll leave the coding as an exercise :) - but I don't think it can or should be done with Regex in the most general case (i.e. with length and/or formatString).

And even if you aren't using length or formatString today, a future developer may think it's an innocuous change to add one - it would be a shame for this to break your code.

I would try to mimic the code in StringBuilder.AppendFormat (which is called by String.Format) even though it's a bit ugly - use Lutz Reflector to get this code. Basically iterate through the string looking for format specifiers, and get the value of the index for each specifier.
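A rough sketch of that kind of scan is below. It is not the framework's code, just a simplified loop written under a few assumptions: FormatInspector and RequiredArgumentCount are made-up names, anything between the index and the closing brace is skipped without validation, escaped braces inside a formatString are not handled, and very large indexes are not checked for overflow.

```csharp
using System;

public static class FormatInspector
{
    // Returns the number of arguments the template needs (highest index + 1),
    // or 0 if it contains no placeholders. Throws FormatException when the
    // braces are malformed. Simplified sketch, not the framework's parser.
    public static int RequiredArgumentCount(string template)
    {
        int maxIndex = -1;
        int i = 0;
        while (i < template.Length)
        {
            char c = template[i];
            bool doubled = i + 1 < template.Length && template[i + 1] == c;
            if (c == '{' && doubled)
            {
                i += 2; // "{{" is a literal opening brace
            }
            else if (c == '{')
            {
                i++;
                int start = i;
                int index = 0;
                while (i < template.Length && template[i] >= '0' && template[i] <= '9')
                {
                    index = index * 10 + (template[i] - '0');
                    i++;
                }
                if (i == start)
                {
                    throw new FormatException("Expected an index after '{'.");
                }
                // Skip any ",length" and ":formatString" up to the closing brace.
                while (i < template.Length && template[i] != '}')
                {
                    i++;
                }
                if (i == template.Length)
                {
                    throw new FormatException("Placeholder is never closed.");
                }
                i++; // consume '}'
                maxIndex = Math.Max(maxIndex, index);
            }
            else if (c == '}' && doubled)
            {
                i += 2; // "}}" is a literal closing brace
            }
            else if (c == '}')
            {
                throw new FormatException("Unescaped '}'.");
            }
            else
            {
                i++;
            }
        }
        return maxIndex + 1;
    }
}
```

For the two cases above, "{0} ... {1} ... {0}" gives 2 and "{1} {3}" gives 4. A template with literal braces such as "{0} {3} , {{7}}" gives 4.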

Joe
Excellent elaboration. +1
peSHIr
Yeah, great point. Teaches me I should definitely wait a bit longer before marking a correct answer.
Damovisa
+6  A: 

Merging Damovisa's and Joe's answers. I've updated the answer after Aydsman's and activa's comments.

int count = Regex.Matches(templateString, @"(?<!\{)\{([0-9]+).*?\}(?!})")  // select all placeholders - placeholder ID as a separate group
                 .Cast<Match>() // cast MatchCollection to IEnumerable<Match>, so we can use LINQ
                 .Max(m => int.Parse(m.Groups[1].Value)) + 1; // select the maximum value of the first group (the placeholder ID) converted to int

This approach will work for templates like:

"{0} aa {2} bb {1}" => count = 3

"{4} aa {0} bb {0}, {0}" => count = 5

"{0} {3} , {{7}}" => count = 4

Marqus
To correctly handle literal curly braces, change the regex to ignore them: @"(?<!\{)\{([0-9]+).*?\}(?!})" This way the (valid) "{4} aa {{0}} bb {0}, {0}" string also matches correctly, ignoring the second zero.
Aydsman
+1  A: 

Since I don't have the authority to edit posts, I'll propose my shorter (and correct) version of Marqus' answer:

int num = Regex.Matches(templateString,@"(?<!\{)\{([0-9]+).*?\}(?!})")
             .Cast<Match>()
             .Max(m => int.Parse(m.Groups[1].Value)) + 1; // Groups[1] is the captured index; Groups[0] is the whole match

I'm using the regex proposed by Aydsman, but haven't tested it.

Philippe Leybaert
+1  A: 

Perhaps you are trying to crack a nut with a sledgehammer?

Why not just put a try/catch around your call to String.Format?

It's a bit ugly, but it solves your problem in a way that requires minimal effort and minimal testing, and it is guaranteed to work even if there is something else about formatting strings that you didn't consider (like {{ literals, or more complex format strings with non-numeric characters inside them: {0:$#,##0.00;($#,##0.00);Zero}).

(And yes, this means you won't detect more data items than format specifiers, but is this a problem? Presumably the user of your software will notice that they have truncated their output and rectify their format string?)
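As a sketch, the wrapper could look like this. SafeFormatter and FormatOrFallback are made-up names, and returning the raw template on failure is just one possible fallback, chosen here for illustration.

```csharp
using System;

public static class SafeFormatter
{
    // Tries String.Format and, if the template and arguments don't match up,
    // returns the unformatted template instead of letting the exception escape.
    // The choice of fallback value is an assumption; pick what suits the caller.
    public static string FormatOrFallback(string template, params object[] args)
    {
        try
        {
            return string.Format(template, args);
        }
        catch (FormatException)
        {
            return template;
        }
    }
}
```

SafeFormatter.FormatOrFallback("Mr {0} has a {1}", "Jones", "ceiling cat") returns "Mr Jones has a ceiling cat", while SafeFormatter.FormatOrFallback("Mr {0} has a {1}", "Jones") returns the template unchanged.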

Jason Williams
Valid suggestion, but I need it to accept the input and generate a message regardless of how "wrong" it is.
Damovisa