I'm wondering if there is a library like Boost Format, but which supports named parameters rather than positional ones. This is a common idiom in e.g. Python, where you have a context to format strings with that may or may not use all available arguments, e.g.

mouse_state = dict(button=0, x=50, y=30)
"You clicked %(button)s at %(x)d,%(y)d." % mouse_state
"Targeting %(x)d, %(y)d." % mouse_state

In Googling I have found many libraries offering variations of positional parameters, but none that support named ones. Ideally the library has few dependencies so I can drop it easily into my code. C++ won't be quite as idiomatic for collecting named arguments, but probably someone out there has thought more about it than me.

Performance is important; in particular, I'd like to keep memory allocations down (always tricky in C++), since this may be run on devices without virtual memory. But having even a slow one to start from will probably be faster than writing it from scratch myself.

A: 

The answer appears to be, no, there is not a C++ library that does this, and C++ programmers apparently do not even see the need for one, based on the comments I have received. I will have to write my own yet again.

Joe
Actually, I voted the question up because I find it interesting. I have written some kind of formatter taking a context (map) as argument, but the need was vastly different: I wanted to choose between different possible generated outputs rather than precisely controlling the formatting of numbers, padding, length, etc. I don't think it would be too big a jump from the Boost.Format library... but the question is: do you want to try and read Boost files ;) ?
Matthieu M.
+1  A: 

Well, I'll add my own answer as well, not that I know of (or have coded) such a library, but to address the "keep the memory allocation down" bit.

As always I can envision some kind of speed / memory trade-off.

On the one hand, you can parse "Just In Time":

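class Formatter:
  def __init__(self, format):
    self._string = format

  def compute(self, context):
    # Nothing is pre-parsed: the text is rescanned once per context key.
    # Assumes every placeholder looks like %(name)X, X being a one-letter conversion.
    text = self._string
    for k, v in context.items():
      marker = "%(" + k + ")"
      pos = text.find(marker)
      while pos != -1:
        end = pos + len(marker) + 1  # one past the conversion letter
        value = ("%" + text[end - 1]) % v
        text = text[:pos] + value + text[end:]
        pos = text.find(marker, pos + len(value))
    return text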

This way you don't keep a "parsed" structure at hand, and hopefully most of the time you'll just insert the new data in place (unlike Python, C++ strings are not immutable).

However it's far from being efficient...

On the other hand, you can build a fully constructed tree representing the parsed format. You will have several classes like: Constant, String, Integer, Real, etc... and probably some subclasses / decorators as well for the formatting itself.
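
To make the "One Parse" idea concrete, here is a minimal sketch in Python, not a C++ implementation. The class names follow the ones listed above; the Variable base class, the parse and render functions and the set of supported conversions (s, d, f) are assumptions made for illustration. The point it shows is that the format is validated once, when it is parsed, and the resulting node list can then be reused with any number of contexts.

```python
import re

class Constant:
    """Literal text between two placeholders."""
    def __init__(self, text):
        self.text = text
    def render(self, context):
        return self.text

class Variable:
    """Hypothetical base class for a named placeholder."""
    conversion = "s"
    def __init__(self, name):
        self.name = name
    def render(self, context):
        return ("%" + self.conversion) % context[self.name]

class String(Variable):
    conversion = "s"

class Integer(Variable):
    conversion = "d"

class Real(Variable):
    conversion = "f"

# Assumption: only these three one-letter conversions are supported.
NODE_TYPES = {"s": String, "d": Integer, "f": Real}
PLACEHOLDER = re.compile(r"%\((\w+)\)(.?)")

def parse(fmt):
    """Parse the format once; an unsupported conversion is rejected here."""
    nodes, pos = [], 0
    for match in PLACEHOLDER.finditer(fmt):
        name, letter = match.groups()
        if letter not in NODE_TYPES:
            raise ValueError("unsupported conversion %r for %r" % (letter, name))
        if match.start() > pos:
            nodes.append(Constant(fmt[pos:match.start()]))
        nodes.append(NODE_TYPES[letter](name))
        pos = match.end()
    if pos < len(fmt):
        nodes.append(Constant(fmt[pos:]))
    return nodes

def render(nodes, context):
    """Evaluate an already parsed format; extra context keys are ignored."""
    return "".join(node.render(context) for node in nodes)
```

With the mouse_state dictionary from the question, render(parse("Targeting %(x)d, %(y)d."), mouse_state) gives "Targeting 50, 30.", and the parsed list can be kept around and rendered again later. A C++ version would have to decide how these nodes are stored, which is where the allocation cost discussed here comes from.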

I think, however, that the most efficient approach would be some kind of mix of the two.

  • explode the format string into a list of Constant, Variable
  • index the variables in another structure (a hash table with open-addressing would do nicely, or something akin to Loki::AssocVector).

There you are: you're done with only 2 dynamically allocated arrays (basically). If you want to allow the same key to be repeated multiple times, simply use a small vector of size_t as the value of the index. Note that std::vector with the default allocator always goes to the heap for its elements; the inline small buffer (16 bytes in VC++ 2010) is a std::string feature, so avoiding allocations here means using a vector type with inline storage.

When evaluating the context itself, look up the instances. You then parse the formatter "just in time", check it against the current type of the value with which to replace it, and process the format.
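
The mixed scheme described above can be sketched as follows, again in Python and only as an illustration of the data layout. The function names explode and evaluate, the EXPECTED table and the %(name)X placeholder syntax with a one-letter conversion are all assumptions. The format is split once into a flat list of segments, a dictionary maps each variable name to the positions where it occurs, and the conversion letter is only interpreted when a value is actually supplied.

```python
# Assumption: which Python types each one-letter conversion accepts.
EXPECTED = {"d": int, "s": object, "f": (int, float)}

def explode(fmt):
    """Split fmt into segments and index variable names to their positions.

    A segment is either a literal str or a (name, spec) tuple; the spec is
    stored unparsed and only looked at during evaluation.
    """
    segments, index, pos = [], {}, 0
    while True:
        start = fmt.find("%(", pos)
        close = fmt.find(")", start + 2) if start != -1 else -1
        if start == -1 or close == -1 or close + 1 >= len(fmt):
            break
        if start > pos:
            segments.append(fmt[pos:start])
        name, spec = fmt[start + 2:close], fmt[close + 1]
        index.setdefault(name, []).append(len(segments))
        segments.append((name, spec))
        pos = close + 2
    if pos < len(fmt):
        segments.append(fmt[pos:])
    return segments, index

def evaluate(segments, index, context):
    """Fill in the slots named by context; errors only surface here."""
    out = list(segments)
    for name, value in context.items():
        for slot in index.get(name, ()):
            spec = segments[slot][1]
            expected = EXPECTED.get(spec)
            if expected is None:
                raise ValueError("unsupported conversion %r" % spec)
            if not isinstance(value, expected):
                raise TypeError("%r does not fit %%%s" % (value, spec))
            out[slot] = ("%" + spec) % value
    # Slots with no value supplied are written back as their original text.
    return "".join(s if isinstance(s, str) else "%%(%s)%s" % s for s in out)
```

Calling explode("Targeting %(x)d, %(y)d.") once and then evaluate(segments, index, mouse_state) yields "Targeting 50, 30."; keys that the format does not use, such as button, cost one failed lookup. A malformed conversion goes unnoticed by explode and is only reported when a value for that name arrives.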

Pros and cons:

  • Just In Time: you scan the string again and again.
  • One Parse: requires a lot of dedicated classes, possibly many allocations, but the format is validated on input. Like Boost it may be reused.
  • Mix: more efficient, especially if you don't replace some values (allow some kind of "null" value), but delaying the parsing of the format delays the reporting of errors.

Personally, I would go for the One Parse scheme, trying to keep the allocations down using boost::variant and the Strategy Pattern as much as I could.

Matthieu M.
In practice, on many platforms, you will find that the cost of the heap allocation for any kind of vector will be much larger than the "far from efficient" solution, which has major cache advantages. And on platforms without virtual memory, even fast allocations cause slow death by fragmentation.
Joe
@Joe: yes, the "far from efficient" was a bit much. But it depends heavily on the kind of format. If there are 1 or 2 replacements in a 25-character string, it will be efficient; if there are a few dozen occurrences of each variable in a few kilobytes of text, it'll slow down. That's always the issue with efficiency: small inputs are affected by constants while large inputs are affected by the big O :/
Matthieu M.