I've built an interpreter in C++ for a language of my own design.

One of the main problems in the design is that my language has two different types, number and string, so I have to pass around a struct like this:

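#include <string>
using std::string;

enum myInterpreterType { NumberType, StringType };  // assumed definition

class myInterpreterValue
{
 myInterpreterType type;
 int intValue;
 string strValue;
};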

Objects of this class are passed around millions of times a second, e.g. during a countdown loop in my language.

Profiling showed that 85% of the run time is spent in the allocation function used by std::string.

This much is clear to me: my interpreter is badly designed and doesn't use pointers enough. But I don't see an alternative: in most cases I can't use pointers, because I really do need copies.

How can I fix this? Would a class like this be a better idea?

#include <string>
#include <vector>
using std::string;
using std::vector;

vector<string> strTable;
vector<int> intTable;
class myInterpreterValue
{
 myInterpreterType type;
 int locationInTable;
};

That way the class only knows which type it represents and its position in the corresponding table.

However, this has its own disadvantages: I'd have to add temporary values to the string/int tables and remove them again afterwards, which would again cost a lot of performance.

How do interpreters for languages like Python or Ruby handle this? They must need some struct that represents a value in the language, something that can be either an int or a string.
A:

I think some dynamic languages intern strings at runtime: equal strings are looked up in a hash table so that only one copy exists, and values store only pointers to it. In each loop iteration where the string stays the same, there would then be just a pointer assignment, or at most a string hash computation. I believe some languages (Smalltalk, I think?) do this not only with strings but also with small numbers. See the Flyweight pattern.
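A minimal sketch of the interning idea, assuming C++11's std::unordered_set as the hash table. InternPool and Value are hypothetical names, not from any library. Interning still hashes the string once per lookup, but copying a Value afterwards only copies a pointer.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical intern pool: equal strings share one stored copy,
// and values keep only a pointer to it.
class InternPool {
public:
    // Returns a stable pointer to the single stored copy of s. Pointers to
    // std::unordered_set elements stay valid when the set rehashes.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }
    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// An interpreter value holding a pointer instead of a string object:
// copying it is a plain struct copy, with no heap allocation.
struct Value {
    bool isString;
    int intValue;
    const std::string* strValue;  // owned by the InternPool, null for ints
};
```

Ideally the interpreter interns string literals once at parse time, so the hot loop only ever copies Value structs.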

I am not an expert on this one. If that doesn't help, you should post the loop code and walk us through how it's being interpreted.

Jesse Millikan
A:

I suspect many values aren't strings, so the first thing you can do is get rid of the string object when you don't need it by putting it into a union. Another point is that many of your strings are probably short, so you can avoid heap allocation by storing small strings inside the object itself; LLVM has the SmallString template for that. Finally, you can use string interning, as another answer also suggests. LLVM has the StringPool class for that: call intern("foo") and you get a smart pointer referring to a shared string that may also be used by other myInterpreterValue objects.
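To make the small-string idea concrete, here is a hypothetical, much simplified class (it is not LLVM's SmallString, which is a template with a different API): strings of up to 15 characters are stored inside the object, and longer ones fall back to a heap-allocated std::string. The inline capacity of 15 is an arbitrary assumption. Most current standard library implementations already apply a similar small-string optimization inside std::string itself.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Sketch of a small-string optimization: short strings live inside the
// object, long ones go to the heap.
class SmallString {
public:
    static constexpr std::size_t kInline = 15;  // assumed inline capacity

    explicit SmallString(const char* s)
        : len_(std::strlen(s)), heap_(nullptr), buf_() {
        if (len_ <= kInline) std::memcpy(buf_, s, len_ + 1);  // no allocation
        else heap_ = new std::string(s, len_);                  // long string
    }
    SmallString(const SmallString& o) : len_(o.len_), heap_(nullptr), buf_() {
        if (o.heap_) heap_ = new std::string(*o.heap_);
        else std::memcpy(buf_, o.buf_, len_ + 1);  // copy inline chars only
    }
    SmallString& operator=(SmallString o) {  // copy-and-swap
        std::swap(len_, o.len_);
        std::swap(heap_, o.heap_);
        std::swap(buf_, o.buf_);
        return *this;
    }
    ~SmallString() { delete heap_; }  // deleting nullptr is a no-op

    bool isInline() const { return heap_ == nullptr; }
    std::size_t size() const { return len_; }
    const char* c_str() const { return heap_ ? heap_->c_str() : buf_; }

private:
    std::size_t len_;
    std::string* heap_;      // null when the characters are stored inline
    char buf_[kInline + 1];  // inline storage, including the terminating '\0'
};
```

With short identifiers such as "i" or "j", copying a SmallString is a small memcpy rather than a heap allocation.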

The union can be written like this:

#include <boost/variant.hpp>
#include <string>

class myInterpreterValue {
 boost::variant<int, std::string> value;
};

boost::variant does the type tagging for you. If you don't have Boost, you can implement it like this. Alignment can't be obtained portably in C++ yet (C++11 later added alignof and alignas for this), so we put some types that may require large alignment into the storage union.

#include <new>     // placement new
#include <string>
using std::string;

class myInterpreterValue {
 union Storage {
   // for getting alignment
   long double ld_;
   long long ll_;

   // for getting size
   int i1;
   char s1[sizeof(string)];

   // for access
   char c;
 };
 enum type { IntValue, StringValue } m_type;

 Storage m_store;
 int *getIntP() { return reinterpret_cast<int*>(&m_store.c); }
 string *getStringP() { return reinterpret_cast<string*>(&m_store.c); }


public:
  myInterpreterValue(string const& str) {
    m_type = StringValue;
    new (static_cast<void*>(&m_store.c)) string(str);
  }

  myInterpreterValue(int i) {
    m_type = IntValue;
    new (static_cast<void*>(&m_store.c)) int(i);
  }
  ~myInterpreterValue() {
    if(m_type == StringValue) {
      getStringP()->~string(); // call destructor
    }
  }
  string &asString() { return *getStringP(); }
  int &asInt() { return *getIntP(); }
};

You get the idea.
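For readers on a newer compiler: since C++17 the standard library provides std::variant, which does the same type tagging as boost::variant without needing Boost. A minimal sketch with hypothetical accessor names; note that copying a string value still copies (and may allocate for) the string, so this improves type safety rather than the allocation cost.

```cpp
#include <string>
#include <utility>
#include <variant>

// Same tagging as boost::variant<int, string>, using the C++17 standard library.
class myInterpreterValue {
public:
    myInterpreterValue(int i) : value(i) {}
    myInterpreterValue(std::string s) : value(std::move(s)) {}

    bool isInt() const { return std::holds_alternative<int>(value); }
    bool isString() const { return std::holds_alternative<std::string>(value); }

    // std::get throws std::bad_variant_access if the other type is stored.
    int asInt() const { return std::get<int>(value); }
    const std::string& asString() const { return std::get<std::string>(value); }

private:
    std::variant<int, std::string> value;
};
```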

Johannes Schaub - litb
It should provide a way to check the type of the stored value, and asX should throw an exception if the type is wrong. (@OP: I imagine the latter was left out for clarity in the example, but you need to do it.)
Roger Pate
Well, the best way would be to use `Boost.Variant` anyway, litb was merely demonstrating :)
Matthieu M.
I wouldn't suggest going this way: it uses unsafe tricks that are entirely avoidable, and it also makes debugging harder.
Smilediver
@Smilediver, personally, I would use the StringPool approach. But if most strings are small, like "i", "j" and so on, I think a small-string approach together with that union can work fine too. Notice that this union implementation isn't complete: it also needs a copy constructor and a copy assignment operator, and, as @Roger says, it needs checks/asserts in the `asX` functions.
Johannes Schaub - litb
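The comments above list what the union version still lacks: a copy constructor, copy assignment, and type-checked accessors. Below is a minimal sketch of those pieces. It assumes C++11, which allows a std::string member directly inside a union (so no char buffer or reinterpret_cast is needed), and it throws std::logic_error on a type mismatch; both are assumptions for the sketch, not part of the original answer.

```cpp
#include <new>
#include <stdexcept>
#include <string>
#include <utility>

class myInterpreterValue {
public:
    enum Type { IntValue, StringValue };

    myInterpreterValue(int v) : m_type(IntValue), m_int(v) {}
    myInterpreterValue(const std::string& v) : m_type(StringValue) {
        new (&m_str) std::string(v);  // explicitly start the string member's lifetime
    }

    // Copy constructor: construct whichever member the source holds.
    myInterpreterValue(const myInterpreterValue& other) : m_type(other.m_type) {
        if (m_type == StringValue) new (&m_str) std::string(other.m_str);
        else m_int = other.m_int;
    }

    myInterpreterValue& operator=(const myInterpreterValue& other) {
        if (this == &other) return *this;
        if (other.m_type == StringValue) {
            if (m_type == StringValue) {
                m_str = other.m_str;                        // reuse the existing string
            } else {
                std::string copy(other.m_str);              // may throw; *this unchanged
                new (&m_str) std::string(std::move(copy));  // noexcept move
                m_type = StringValue;
            }
        } else {
            destroy();  // end the string's lifetime if one is active
            m_int = other.m_int;
            m_type = IntValue;
        }
        return *this;
    }

    ~myInterpreterValue() { destroy(); }

    Type type() const { return m_type; }
    int asInt() const {
        if (m_type != IntValue) throw std::logic_error("value is not an int");
        return m_int;
    }
    const std::string& asString() const {
        if (m_type != StringValue) throw std::logic_error("value is not a string");
        return m_str;
    }

private:
    void destroy() {
        if (m_type == StringValue) m_str.~basic_string();
    }

    Type m_type;
    union {            // anonymous union: only one member is alive at a time
        int m_int;
        std::string m_str;
    };
};
```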
A: 

The easiest way to solve this would be to make it a pointer to string, and allocate the string only when you create a string value. You can also use a union to save memory:

#include <string>
using std::string;

class myInterpreterValue
{
 myInterpreterType type;
 union {
  int asInt;
  string* asString;
 } value;
};
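One way to handle ownership of that string pointer while keeping copies cheap is to share the string through std::shared_ptr<const std::string>: an int value never touches the heap, and copying a string value only increments a reference count. This is a swapped-in variant of the idea above, not the answer's own code; it drops the union to avoid the manual lifetime handling a non-trivial union member would need, and it assumes strings are immutable once created.

```cpp
#include <memory>
#include <string>

// Sketch: int stored inline, string shared and immutable.
class myInterpreterValue {
public:
    enum Type { IntValue, StringValue };

    myInterpreterValue(int i) : m_type(IntValue), m_int(i) {}
    // The only allocation happens here, when a string value is created.
    explicit myInterpreterValue(const std::string& s)
        : m_type(StringValue), m_int(0), m_str(std::make_shared<std::string>(s)) {}

    Type type() const { return m_type; }
    int asInt() const { return m_int; }
    // Caller must check type() first: m_str is null for int values.
    const std::string& asString() const { return *m_str; }
    // Number of values currently sharing this string (0 for ints).
    long shareCount() const { return m_str.use_count(); }

private:
    Type m_type;
    int m_int;
    std::shared_ptr<const std::string> m_str;  // null for int values
};
```

The implicitly generated copy constructor and assignment copy the shared_ptr, so copying a string value never copies its characters.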
Smilediver
A:

In both Python and Ruby, integers are objects. So it's not a question of a "value" being either an integer or a string: it can be anything at all. Furthermore, everything in both of those languages is garbage collected. There's no need to copy objects; pointers can be used internally as long as they are safely stored somewhere the garbage collector will see them.

So, one solution to your problem would be:

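#include <cstdio>
#include <string>
using std::string;

class myInterpreterValue {
public:
    virtual ~myInterpreterValue() {}
    // example of a possible member function
    virtual string toString() const = 0;
};

class myInterpreterStringValue : public myInterpreterValue {
public:
    explicit myInterpreterStringValue(const string& v) : value(v) {}
    virtual string toString() const { return value; }
private:
    string value;
};

class myInterpreterIntValue : public myInterpreterValue {
public:
    explicit myInterpreterIntValue(int v) : value(v) {}
    virtual string toString() const {
        char buf[12]; // enough for a 32-bit int; snprintf won't overflow if int is wider
        std::snprintf(buf, sizeof buf, "%d", value);
        return buf;
    }
private:
    int value;
};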

Then use virtual calls and dynamic_cast to switch on or check types, instead of comparing against values of myInterpreterType.
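A minimal sketch of checking types with dynamic_cast, assuming C++11 and trimmed-down versions of the classes above (constructors added, std::to_string in place of snprintf). The add function is a hypothetical interpreter operation: it adds two ints and otherwise falls back to string concatenation through the virtual toString().

```cpp
#include <memory>
#include <string>
#include <utility>

class myInterpreterValue {
public:
    virtual ~myInterpreterValue() {}
    virtual std::string toString() const = 0;
};

class myInterpreterStringValue : public myInterpreterValue {
public:
    explicit myInterpreterStringValue(std::string v) : value(std::move(v)) {}
    std::string toString() const override { return value; }
    std::string value;
};

class myInterpreterIntValue : public myInterpreterValue {
public:
    explicit myInterpreterIntValue(int v) : value(v) {}
    std::string toString() const override { return std::to_string(value); }
    int value;
};

// Hypothetical "+" operator: int + int adds, anything else concatenates.
std::unique_ptr<myInterpreterValue> add(const myInterpreterValue& a,
                                        const myInterpreterValue& b) {
    const auto* x = dynamic_cast<const myInterpreterIntValue*>(&a);
    const auto* y = dynamic_cast<const myInterpreterIntValue*>(&b);
    if (x && y)  // both operands are ints
        return std::unique_ptr<myInterpreterValue>(
            new myInterpreterIntValue(x->value + y->value));
    // Otherwise fall back to string concatenation via the virtual toString().
    return std::unique_ptr<myInterpreterValue>(
        new myInterpreterStringValue(a.toString() + b.toString()));
}
```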

The usual reaction at this point is to worry that virtual function calls and dynamic_cast might be slow. Both Ruby and Python use virtual function calls all over the place, albeit not C++ virtual calls: the "standard" implementation of each language is written in C, with custom mechanisms for polymorphism. But there's no reason in principle to assume that "virtual" means "performance out the window".

That said, I expect they both have some clever optimisations for certain uses of integers, including as loop counters. But if you're currently seeing that most of your time is spent copying empty strings, then virtual function calls are near-instantaneous by comparison.
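One such optimisation, used for example by CRuby's fixnums, is to avoid allocating small integers at all: a value is a single machine word, and if its lowest bit is 1 the remaining bits hold the integer; otherwise the word is a pointer to a heap object, whose alignment guarantees a 0 low bit. Below is a minimal sketch of that tagging idea. TaggedValue is a hypothetical name, and the sketch assumes the integer fits in one bit less than a pointer and that std::string objects are at least 2-byte aligned.

```cpp
#include <cstdint>
#include <string>

// A tagged value word: low bit 1 = small integer, low bit 0 = pointer.
class TaggedValue {
public:
    static TaggedValue fromInt(std::intptr_t i) {
        // Shift left and set the tag bit; done on unsigned to avoid signed overflow.
        return TaggedValue((static_cast<std::uintptr_t>(i) << 1) | 1u);
    }
    static TaggedValue fromString(const std::string* s) {
        // Relies on std::string objects being at least 2-byte aligned.
        return TaggedValue(reinterpret_cast<std::uintptr_t>(s));
    }

    bool isInt() const { return (bits_ & 1u) != 0; }
    std::intptr_t asInt() const {
        // Arithmetic right shift restores the sign: guaranteed since C++20,
        // and what mainstream compilers do for earlier standards.
        return static_cast<std::intptr_t>(bits_) >> 1;
    }
    const std::string* asString() const {
        return reinterpret_cast<const std::string*>(bits_);
    }

private:
    explicit TaggedValue(std::uintptr_t bits) : bits_(bits) {}
    std::uintptr_t bits_;  // the whole value fits in one machine word
};
```

Copying a TaggedValue is a single word copy, and integer values such as loop counters never allocate.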

The real worry is how you're going to handle resource management: depending on your plans for your interpreted language, garbage collection might be more trouble than you want to take on.

Steve Jessop