I'm looking for a clean C++ way to parse a string containing expressions wrapped in ${} and build a result string from the programmatically evaluated expressions.

Example: "Hi ${user} from ${host}" will be evaluated to "Hi foo from bar" if I implement the program to let "user" evaluate to "foo", etc.

The current approach I'm thinking of consists of a state machine that eats one character at a time from the string and evaluates the expression after reaching '}'. Any hints or other suggestions?

Note: boost:: is most welcome! :-)

Update: Thanks for the first three suggestions! Unfortunately, I made the example too simple. I need to be able to examine the contents within ${}, so it's not a simple search and replace. For example, it might say ${uppercase:foo}, in which case I have to use "foo" as a key in a hash map and then convert the value to uppercase. I tried to avoid the inner details of ${} when writing the original question above... :-)

A: 

How many evaluation expressions do you intend to have? If the number is small enough, you might just want to use brute force.

For instance, if you have a std::map<string, string> that maps each key to its value (say, user to Matt Cruikshank), you could iterate over the entire map and replace every occurrence of "${" + key + "}" in your string with the corresponding value.
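A minimal sketch of that brute-force idea, assuming the placeholders are plain keys. The function name replace_all_keys is made up for illustration. Note that a value which itself contains "${...}" may be expanded again when a later key is processed.

```cpp
#include <map>
#include <string>

// Hypothetical helper: replaces every "${key}" in text with the mapped value.
std::string replace_all_keys(std::string text,
                             const std::map<std::string, std::string>& dict)
{
    for (std::map<std::string, std::string>::const_iterator it = dict.begin();
         it != dict.end(); ++it)
    {
        const std::string placeholder = "${" + it->first + "}";
        std::string::size_type pos = 0;
        while ((pos = text.find(placeholder, pos)) != std::string::npos)
        {
            text.replace(pos, placeholder.size(), it->second);
            pos += it->second.size();   // continue after the inserted value
        }
    }
    return text;
}
```

With user mapped to foo and host mapped to bar, calling replace_all_keys("Hi ${user} from ${host}", dict) should give "Hi foo from bar". The cost grows with the number of keys times the length of the string, which is why this only makes sense for a small map.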

Matt Cruikshank
A: 

Boost.Regex would be the route I'd suggest. The regex_replace algorithm should do most of your heavy lifting.
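A rough sketch of the regex route. It swaps in std::regex from C++11 rather than Boost.Regex, and iterates over the matches instead of calling regex_replace, so that each captured name can be looked up in code. The name expand_with_regex and the choice to leave unknown placeholders untouched are assumptions made for illustration.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: expands each ${name} using dict.
// Unknown names are left in the output unchanged (an assumption).
std::string expand_with_regex(const std::string& text,
                              const std::map<std::string, std::string>& dict)
{
    static const std::regex placeholder(R"(\$\{([^}]*)\})");
    std::string result;
    std::string::const_iterator last = text.begin();

    for (std::sregex_iterator it(text.begin(), text.end(), placeholder), end;
         it != end; ++it)
    {
        const std::smatch& m = *it;
        result.append(last, m[0].first);   // literal text before the match
        std::map<std::string, std::string>::const_iterator found =
            dict.find(m[1].str());         // m[1] is the text inside ${...}
        result += (found != dict.end()) ? found->second : m[0].str();
        last = m[0].second;
    }
    result.append(last, text.end());       // trailing literal text
    return result;
}
```

Because the lookup happens in ordinary code, the body of the loop is also the place where the contents of ${} could be inspected further before deciding what to substitute.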

Harper Shelby
A: 

If you don't like my first answer, then dig into Boost.Regex, probably boost::regex_replace.

Matt Cruikshank
A: 

How complex can the expressions get? Are they just identifiers, or can they be actual expressions like "${numBad/(double)total*100.0}%"?

Martin C. Martin
A: 

Do you have to use the ${ and } delimiters or can you use other delimiters?

You don't really care about parsing. You just want to generate and format strings with placeholder data in it. Right?

For a platform-neutral approach, consider the humble sprintf function. It is available practically everywhere and does what I am assuming you need. It works on "char stars", so you are going to have to get into some memory management.

Are you using the STL? Then consider the basic_string::replace member function. It doesn't do exactly what you want, but you could make it work.
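One way to "make it work" with find and replace, sketched under the assumption that an expression is either a plain key or has the form uppercase:key, as in the question's update. Both function names are hypothetical, and treating unknown keys as empty strings is an arbitrary choice.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical evaluator: "name" looks up name; "uppercase:name" looks it up
// and upper-cases the value. Unknown names evaluate to an empty string.
std::string evaluate(const std::string& expr,
                     const std::map<std::string, std::string>& dict)
{
    const std::string prefix = "uppercase:";
    const bool upper = expr.compare(0, prefix.size(), prefix) == 0;
    const std::string key = upper ? expr.substr(prefix.size()) : expr;

    std::map<std::string, std::string>::const_iterator it = dict.find(key);
    std::string value = (it != dict.end()) ? it->second : std::string();
    if (upper)
    {
        for (std::string::size_type i = 0; i < value.size(); ++i)
            value[i] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(value[i])));
    }
    return value;
}

// Finds each "${...}" span and replaces it in place with its evaluated value.
std::string expand_in_place(std::string text,
                            const std::map<std::string, std::string>& dict)
{
    std::string::size_type pos = 0;
    while ((pos = text.find("${", pos)) != std::string::npos)
    {
        const std::string::size_type close = text.find('}', pos + 2);
        if (close == std::string::npos)
            break;                       // unterminated "${": leave it as is
        const std::string value =
            evaluate(text.substr(pos + 2, close - pos - 2), dict);
        text.replace(pos, close - pos + 1, value);
        pos += value.size();             // do not rescan the inserted value
    }
    return text;
}
```

With foo mapped to bar, expand_in_place("a ${foo} b ${uppercase:foo}", dict) should give "a bar b BAR". Nested braces inside an expression are not handled by this sketch.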

If you are using ATL/MFC, then consider the CStringT::Format method.

Glenn
Sorry, I need that format. As for sprintf, I would not be in control of the format strings, and calling a printf()-style function with a format string that is not a literal is potentially dangerous.
divideandconquer.se
+4  A: 
#include <iostream>
#include <string>
#include <map>

using namespace std;

struct Token
{
    enum E
    {
        Replace,    // the name inside ${...}
        Literal,    // plain text between placeholders
        Eos         // end of string
    };
};

// Splits a string into literal runs and ${...} replacement names.
class ParseExp
{
private:
    enum State
    {
        State_Begin,
        State_Literal,
        State_StartRep,
        State_RepWord
    };

    string            m_str;
    string::size_type m_char;     // index of the next character to read
    string::size_type m_length;
    string            m_lexme;    // text of the current token
    Token::E          m_token;
    State             m_state;

public:
    void Parse(const string& str)
    {
        m_char = 0;
        m_str = str;
        m_length = str.size();
    }

    Token::E NextToken()
    {
        m_lexme = "";
        m_state = State_Begin;

        // Nothing left to read: report end of string right away.
        if (m_char >= m_length)
        {
            m_token = Token::Eos;
            return m_token;
        }

        bool stop = false;
        // Strictly less than: with <= the loop would read the terminating
        // null and append it to the lexeme.
        while (m_char < m_length && !stop)
        {
            char ch = m_str[m_char++];
            switch (m_state)
            {
            case State_Begin:
                if (ch == '$')
                {
                    m_state = State_StartRep;
                    m_token = Token::Replace;
                    continue;
                }
                else
                {
                    m_state = State_Literal;
                    m_token = Token::Literal;
                }
                break;

            case State_StartRep:
                // Skip everything up to and including '{'.
                if (ch == '{')
                    m_state = State_RepWord;
                continue;

            case State_RepWord:
                if (ch == '}')
                {
                    stop = true;
                    continue;
                }
                break;

            case State_Literal:
                if (ch == '$')
                {
                    stop = true;
                    m_char--;   // leave '$' for the next token
                    continue;
                }
                break;
            }

            m_lexme += ch;
        }

        return m_token;
    }

    const string& Lexme() const
    {
        return m_lexme;
    }

    // Not named Token(): a member with that name would change the meaning
    // of the type Token inside this class, which some compilers reject.
    Token::E CurrentToken() const
    {
        return m_token;
    }
};

string DoReplace(const string& str, const map<string, string>& dict)
{
    ParseExp exp;
    exp.Parse(str);
    string ret = "";
    while (exp.NextToken() != Token::Eos)
    {
        if (exp.CurrentToken() == Token::Literal)
            ret += exp.Lexme();
        else
        {
            map<string, string>::const_iterator iter = dict.find(exp.Lexme());
            if (iter != dict.end())
                ret += iter->second;
            else
                ret += "undefined(" + exp.Lexme() + ")";
        }
    }
    return ret;
}

int main()
{
    map<string, string> words;
    words["hello"] = "hey";
    words["test"] = "bla";
    cout << DoReplace("${hello} world ${test} ${undef}", words) << endl;
    return 0;
}

I will be happy to explain anything about this code :)

nlaq
Why `m_char <= m_length` and not `m_char < m_length`? If the string ends with a literal, the "lexme" will end with a null byte.
divideandconquer.se
A: 

If you are managing the variables separately, why not go the route of an embeddable interpreter? I have used Tcl in the past, but you might try Lua, which is designed for embedding. Ruby and Python are two other interpreters that are easy to embed, but they aren't quite as lightweight. The strategy is to instantiate an interpreter (a context), add variables to it, then evaluate strings within that context. An interpreter will properly handle malformed input that could otherwise lead to security or stability problems for your application.

David Nehme