views:

215

answers:

5

"Introduction"

I'm relatively new to C++. I went through all the basic stuff and managed to build 2-3 simple interpreters for my programming languages.

The first thing that gave and still gives me a headache: Implementing the type system of my language in C++

Think of that: Ruby, Python, PHP and Co. have a lot of built-in types which obviously are implemented in C. So what I first tried was to make it possible to give a value in my language three possible types: Int, String and Nil.

I came up with this:

enum ValueType
{
     Int, String, Nil
};

class Value
{
 public:
  ValueType type;
  int intVal;
  string stringVal;
};

Yeah, wow, I know. It was extremely slow to pass this class around as the string allocator had to be called all the time.

Next time I've tried something similar to this:

enum ValueType
{
     Int, String, Nil
};

extern string stringTable[255];
class Value
{
 public:
  ValueType type;
  int index;
};

I would store all strings in stringTable and write their position to index. If the type of Value was Int, I just stored the integer in index, it wouldn't make sense at all using an int index to access another int, or?

Anyways, the above gave me a headache too. After some time, accessing the string from the table here, referencing it there and copying it over there grew over my head - I lost control. I had to put the interpreter draft down.

Now: Okay, so C and C++ are statically typed.

  • How do the main implementations of the languages mentioned above handle the different types in their programs (fixnums, bignums, nums, strings, arrays, resources,...)?

  • What should I do to get maximum speed with many different available types?

  • How do the solutions compare to my simplified versions above?

+1  A: 

Regarding speed, you say:

It was extremely slow to pass this class around as the string allocator had to be called all the time.

You do know that you should be passing objects by reference the vast majority of the time? Your solution looks workable for a simple interpreter.

anon
That wasn't so easy most of the time. I had to make several copies as many functions modify the values temporarily.
sub
@sub Well that sounds dubious. I can't see why you would ever change a value, once it was created, unless the user is assigning to it.
anon
+3  A: 

One obvious solution is to define a type hierarchy:

class Type
{
};

class Int : public Type
{
};

class String : public Type
{
};

and so on. As a complete example, let us write an interpreter for a tiny language. The language allows declaring variables like this:

var a 10

That will create an Int object, assign it the value 10 and store it in a variable's table under the name a. Operations can be invoked on variables. For instance the addition operation on two Int values looks like:

+ a b

Here is the complete code for the interpreter:

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <cstdlib>
#include <map>

// The base Type object from which all data types are derived.
class Type
{
public:
  typedef std::vector<Type*> TypeVector;
  virtual ~Type () { }

  // Some functions that you may want all types of objects to support:

  // Returns the string representation of the object.
  virtual const std::string toString () const = 0;
  // Returns true if other_obj is the same as this.
  virtual bool equals (const Type &other_obj) = 0;
  // Invokes an operation on this object with the objects in args
  // as arguments.
  virtual Type* invoke (const std::string &opr, const TypeVector &args) = 0;
};

// An implementation of Type to represent an integer. The C++ int is
// used to actually store the value.  As a consequence this type is
// machine dependent, which might not be what you want for a real
// high-level language.
class Int : public Type
{
public:
  Int () : value_ (0), ret_ (NULL) { }
  Int (int v) : value_ (v), ret_ (NULL) { }
  Int (const std::string &v) : value_ (atoi (v.c_str ())), ret_ (NULL) { }
  virtual ~Int ()
  {
    delete ret_;
  }
  virtual const std::string toString () const
  {
    std::ostringstream out;
    out << value_;
    return out.str ();
  }
  virtual bool equals (const Type &other_obj)
  {    
    if (&other_obj == this) 
      return true;
    try
      {
        const Int &i = dynamic_cast<const Int&> (other_obj);
        return value_ == i.value_;
      }
    catch (std::bad_cast ex)
      {
        return false;
      }
  }
  // As of now, Int supports only addition, represented by '+'.
  virtual Type* invoke (const std::string &opr, const TypeVector &args)    
  {
    if (opr == "+")
      {
        return add (args);
      }
    return NULL;
  }
private:
  Type* add (const TypeVector &args)
  {
    if (ret_ == NULL) ret_ = new Int;
    Int *i = dynamic_cast<Int*> (ret_);
    Int *arg = dynamic_cast<Int*> (args[0]);
    i->value_ = value_ + arg->value_;
    return ret_;
  }
  int value_;
  Type *ret_;
};

// We use std::map as a symbol (or variable) table.
typedef std::map<std::string, Type*> VarsTable;
typedef std::vector<std::string> Tokens;

// A simple tokenizer for our language. Takes a line and
// tokenizes it based on whitespaces.  
static void
tokenize (const std::string &line, Tokens &tokens)
{
  std::istringstream in (line, std::istringstream::in);
  while (!in.eof ())
    {
      std::string token;
      in >> token;
      tokens.push_back (token);
    }
}

// Maps varName to an Int object in the symbol table.  To support
// other Types, we need a more complex interpreter that actually infers
// the type of object by looking at the format of value.
static void
setVar (const std::string &varName, const std::string &value,
        VarsTable &vars)
{
  Type *t = new Int (value);
  vars[varName] = t;
}

// Returns a previously mapped value from the symbol table.
static Type *
getVar (const std::string &varName, const VarsTable &vars)
{
  VarsTable::const_iterator iter = vars.find (varName);
  if (iter == vars.end ())
    {
      std::cout << "Variable " << varName 
                << " not found." << std::endl;
      return NULL;
    }
  return const_cast<Type*> (iter->second);
}

// Invokes opr on the object mapped to the name var01.
// opr should represent a binary operation. var02 will
// be pushed to the args vector. The string represenation of
// the result is printed to the console.
static void
invoke (const std::string &opr, const std::string &var01,
        const std::string &var02, const VarsTable &vars)
{
  Type::TypeVector args;
  Type *arg01 = getVar (var01, vars);
  if (arg01 == NULL) return;
  Type *arg02 = getVar (var02, vars);
  if (arg02 == NULL) return;
  args.push_back (arg02);
  Type *ret = NULL;
  if ((ret = arg01->invoke (opr, args)) != NULL)
    std::cout << "=> " << ret->toString () << std::endl;
  else
    std::cout << "Failed to invoke " << opr << " on " 
              << var01 << std::endl;
}

// A simple REPL for our language. Type 'quit' to exit
// the loop.
int 
main (int argc, char **argv)
{
  VarsTable vars;
  std::string line;
  while (std::getline (std::cin, line))
    {
      if (line == "quit")
        break;
      else
        {
          Tokens tokens;
          tokenize (line, tokens);
          if (tokens.size () != 3)
            {
              std::cout << "Invalid expression." << std::endl;
              continue;
            }
          if (tokens[0] == "var")
            setVar (tokens[1], tokens[2], vars);
          else
            invoke (tokens[0], tokens[1], tokens[2], vars);
        }
    }  
  return 0;
}

A sample interaction with the interpreter:

/home/me $ ./mylang

var a 10
var b 20
+ a b
30
+ a c
Variable c not found.
quit
Vijay Mathew
Can I store all of them in a single array? Or are they completely different types like string vs int? I'm relatively new to this.
sub
+1 @Vijay Mathew: My thoughts exactly
hhafez
@Vijay: Could you please explain how and where to store the instances of the classes?
sub
This is a common solution in many systems. It must be combined with dynamic allocation it is the direct evolution of the C solution I provided, where you add the hierarchy and substitute the C style cast with a dynamic cast. The actual type information is usually stored in the base class (as an enum) or obtained with the typeid() operator
David Rodríguez - dribeas
@sub The most C++ way is to use std::map as a symbol table. I will modify my answer with a more complete example.
Vijay Mathew
+1  A: 

C++ is a strongly typed language. I can see that you are coming from a non typed language and still think in those terms.

If you really need to store several types in a variable then take a look at boost::any.

However, if you are implementing an interpreter you should be using inheritance and classes that represent a specific type.

Fozi
That's not my problem. You didn't tell me *how* and *where* to store the instances of those classes.
sub
I have upvoted, as the answer is good: use boost::any. I fail to see how it is not your problem.
David Rodríguez - dribeas
+1, agree. boost::any is a solution to the question written, even if it's not a (complete) answer to the question in sub's head.
MSalters
+4  A: 

There are a couple of different things that you can do here. Different solutions have come up in time, and most of them require dynamic allocation of the actual datum (boost::variant can avoid using dynamically allocated memory for small objects --thanks @MSalters).

Pure C approach:

Store type information and a void pointer to memory that has to be interpreted according to the type information (usually an enum):

enum type_t {
   integer,
   string,
   null
};
typedef struct variable {
   type_t type;
   void * datum;
} variable_t;
void init_int_variable( variable_t * var, int value )
{
   var->type = integer;
   var->datum = malloc( sizeof(int) );
   *((int)var->datum) = value;
}
void fini_variable( variable_t var ) // optionally by pointer
{
   free( var.datum );
}

In C++ you can improve this approach by using classes to simplify the usage, but more importantly you can go for more complex solutions and use existing libraries as boost::any or boost::variant that offer different solutions to the same problem.

Both boost::any and boost::variant store the values in dynamically allocated memory, usually through a pointer to a virtual class in a hierarchy, and with operators that reinterpret (down casts) to the concrete types.

David Rodríguez - dribeas
I don't think that boost::variant (or even boost::any) _require_ dynamic allocation. They've got special tricks to sidestep that when possible. Roughly speaking, even though they have a `void*` for big objects on the heap, that pointer can be part of a union whose other members are used to hold small data types.
MSalters
@MSalters: Right. I hadn't looked into the variant implementation and it does use a buffer that gets reinterpreted depending on the particular type being used, as you explained. Boost any, on the other hand, is much simpler and uses dynamically allocated memory unconditionally.
David Rodríguez - dribeas
A: 

According to Vijay's solution the implementation will be:

Type* array;
// to initialize the array
array = new Type(size_of_array);
// when you want to add values
array[0] = new Int(42);
// to add another string value
array[1] = new String("fourty two");

The bit missing from his code is HOW to extract those values... Here's my version (actually I learned this from Ogre and modified it to my liking).

Usage is something like:

Any array[4];
// Automatically understands it's an integer
array[0] = Any(1);
// But let's say you want the number to be thought of as float
array[1] = Any<float>(2);
// What about string?
array[2] = Any<std::string>("fourty two");
// Note that this gets the compiler thinking it's a char*
// instead of std::string
array[3] = Any("Sometimes it just turns out to be what you don't want!");

Ok, now to see if a particular element is a string:

if(array[2].isType<std::string>()
{
   // Extract the string value.
   std::string val = array[2].cast<std::string>();
   // Make the string do your bidding!!!... /evilgrin
   // WAIT! But what if you want to directly manipulate
   // the value in the array?
   std::string& val1 = array[2].cast<std::string>();
   // HOHOHO... now any changes to val1 affects the value
   // in the array ;)
}

The code for the Any class is given below. Feel free to use it however you like :). Hope this helps!

In the header file... say Any.h

    #include <typeinfo>
    #include <exception>

    /*
     *    \class Any
     *    \brief A variant type to hold any type of value.
     *    \detail This class can be used to store values whose types are not 
     *        known before hand, like to store user-data.
     */
    class Any
    {
    public:
        /*!
         *    \brief Default constructor. 
         */

    Any(void);

    /*!
     *    \brief Constructor that accepts a default user-defined value.
     *    \detail This constructor copies that user-defined value into a 
     *        place holder. This constructor is explicit to avoid the compiler
     *        to call this constructor implicitly when the user didn't want
     *        the conversion to happen.
     *    \param val const reference to the value to be stored.
     */
    template <typename ValueType>
    explicit Any(const ValueType& val);

    /*!
     *    \brief Copy constructor.
     *    \param other The \c Any variable to be copied into this.
     */
    Any(const Any& other);

    /*!
     *    \brief Destructor, does nothing other than destroying the place holder.
     */
    ~Any(void);

    /*!
     *    \brief Gets the type of the value stored by this class.
     *    \detail This function uses typeid operator to determine the type
     *        of the value it stores.
     *    \remarks If the place holder is empty it will return Touchscape::VOID_TYPE.
     *        It is wise to check if this is empty by using the function Any::isEmpty().
     */
    const std::type_info& getType() const;

    /*!
     *    \brief Function to verify type of the stored value.
     *    \detail This function can be used to verify the type of the stored value.
     *    Usage:
     *    \code
     *    int i;
     *    Touchscape::Any int_any(i);
     *    // Later in your code...
     *    if (int_any.isType<int>())
     *    {
     *        // Do something with int_any.
     *    }
     *    \endcode
     *    \return \c true if the type matches, false otherwise.
     */
    template <typename T>
    bool isType() const;

    /*!
     *    \brief Checks if the type stored can be converted 'dynamically'
     *        to the requested type.
     *    \detail This would useful when the type stored is a base class
     *        and you would like to verify if it can be converted to type
     *        the user wants.
     *    Example:
     *    \code
     *    class Base
     *    {
     *        // class implementation.
     *    };
     *    class Derived : public Base
     *    {
     *        // class implementation.
     *    };
     *
     *    // In your implementation function.
     *    {
     *        //...
     *        // Somewhere in your code.
     *        Base* a = new Derived();
     *        Touchscape::Any user_data(a);
     *        my_object.setUserData(user_data);
     *        // Then when you need to know the user-data type
     *        if(my_object.getUserData().isDynamicType<Derived>())
     *        {
     *            // Do something with the user data
     *        }
     *    }
     *    \endcode
     *    \return \c true if the value stored can be dynamically casted to the target type.
     *    \deprecated This function will be removed and/or changed in the future.
     */
    template <typename T>
    bool isDynamicType() const;

    /*!
     *    \brief Convert the value stored to the required type.
     *    \detail This function is used just like a static-cast to retrieve
     *        the stored value.
     *    \return A reference to the stored value.
     *    \warning This function will throw std::bad_cast exception if it
     *        finds the target type to be incorrect.
     */
    template <typename T>
    T& cast();

    /*!
     *    \brief Convert the value stored to the required type (const version).
     *    \detail This function is used just like static_cast to retrieve
     *        the stored value.
     *    \return A \c const reference to the stored value.
     *    \warning This function will throw std::bad_cast exception if it
     *        finds the target type to be incorrect.
     */
    template <typename T>
    const T& cast() const;

    /*!
     *    \brief Dynamically converts the stored value to the target type
     *    \detail This function is just like dynamic_cast to retrieve
     *        the stored value to the target type.
     *    \return A reference to the stored value.
     *    \warning This function will throw std::bad_cast exception if it
     *        finds that the value cannot be dynamically converted to the target type.
     *    \deprecated This function will be removed and/or changed in the future.
     */
    template <typename T>
    T& dynamicCast();

    /*!
     *    \brief Dynamically converts the stored value to the target type (const version)
     *    \detail This function is just like dynamic_cast to retrieve
     *        the stored value to the target type.
     *    \return A const reference to the stored value.
     *    \warning This function will throw std::bad_cast exception if it
     *        finds that the value cannot be dynamically converted to the target type.
     *    \deprecated This function will be removed and/or changed in the future.
     */
    template <typename T>
    const T& dynamicCast() const;

    /*!
     *    \brief Swaps the contents with another \c Any variable.
     *    \return reference to this instance.
     */
    Any& swap(Any& other);

    /*!
     *    \brief Checks if the place holder is empty.
     *    \return \c true if the the place holder is empty, \c false otherwise.
     */
    bool isEmpty() const;

    /*!
     *    \brief Checks if the place holder is \b not empty.
     *    \return \c true if the the place holder is not empty, \c false otherwise.
     *    \remarks This is just a lazy programmer's attempt to make the code look elegant.
     */
    bool isNotEmpty() const;

    /*!
     *    \brief Assignment operator
     *    \detail Assigns a 'raw' value to this instance.
     *    \return Reference to this instance after assignment.
     */
    template <typename ValueType>
    Any& operator = (const ValueType& rhs);

    /*!
     *    \brief Default assignment operator
     *    \detail Assigns another \c Any type to this one.
     *    \return Reference to this instance after assignment.
     */
    Any& operator = (const Any& rhs);

    /*!
     *    \brief Boolean equality operator
     */
    bool operator == (const Any& other) const;

    /*!
     *    \brief Boolean equality operator that accepts a 'raw' type.
     */
    template<typename ValueType>
    bool operator == (const ValueType& other) const;

    /*!
     *    \brief Boolean inequality operator
     */
    bool operator != (const Any& other) const;

    /*!
     *    \brief Boolean inequality operator that accepts a 'raw' type.
     */


      template<typename ValueType>
        bool operator != (const ValueType& other) const;
    protected:
        /*!
         *    \class PlaceHolder
         *    \brief The place holder base class
         *    \detail The base class for the actual 'type'd class that stores
         *        the value for T

ouchscape::Any.
     */
    class PlaceHolder
    {
    public:

        /*!
         *    \brief Virtual destructor.
         */
        virtual ~PlaceHolder(){}

        /*!
         *    \brief Gets the \c type_info of the value stored.
         *    \return (const std::type_info&) The typeid of the value stored.
         */
        virtual const std::type_info& getType() const = 0;
        /*!
         *    \brief Clones this instance.
         *    \return (PlaceHolder*) Cloned instance.
         */
        virtual PlaceHolder* clone() const = 0;
    };

    /*!
     *    \class PlaceHolderImpl
     *    \brief The class that ultimately keeps hold of the value stored
     *        in Touchscape::Any.
     */
    template <typename ValueType>
    class PlaceHolderImpl : public PlaceHolder
    {
    public:
        /*!
         *    \brief The only constructor allowed.
         *    \param val The value to store.
         */
        PlaceHolderImpl(const ValueType& val)
            :m_value(val){}
        /*!
         *    \brief The destructor.
         *    \detail Does nothing
         */
        ~PlaceHolderImpl(){}

        /*!
         *    \copydoc Touchscape::PlaceHolder::getType()
         */
        const std::type_info& getType() const
        {
            return typeid(ValueType);
        }

        /*!
         *    \copydoc Touchscape::PlaceHolder::clone()
         */
        PlaceHolder* clone() const
        {
            return new PlaceHolderImpl<ValueType>(m_value);
        }

        ValueType m_value;
    };

    PlaceHolder* m_content;
};

/************************************************************************/
/* Template code implementation section                                 */
/************************************************************************/
template <typename ValueType>
Any::Any(const ValueType& val)
    :m_content(new PlaceHolderImpl<ValueType>(val))
{
}
//---------------------------------------------------------------------
template <typename T>
bool Any::isType() const
{
    bool result = m_content?m_content->getType() == typeid(T):false;
    return result;
}
//---------------------------------------------------------------------
template <typename T>
bool Any::isDynamicType() const
{
    bool result = m_content
        ?dynamic_cast<T>(static_cast<PlaceHolderImpl<T>*>(m_content)->m_value)!=NULL
        :false;
    return result;
}
//---------------------------------------------------------------------
template <typename T>
T& Any::cast()
{
    if (getType() != VOID_TYPE && isType<T>())
    {
        T& result = static_cast<PlaceHolderImpl<T>*>(m_content)->m_value;
        return result;
    }
    StringStream ss;
    ss<<"Cannot convert '"<<getType().name()<<"' to '"<<typeid(T).name()<<"'. Did you mean to use dynamicCast() to cast to a different type?";
    throw std::bad_cast(ss.str().c_str());
}
//---------------------------------------------------------------------
template <typename T>
const T& Any::cast() const
{
    Any& _this = const_cast<Any&>(*this);
    return _this.cast<T>();
}
//---------------------------------------------------------------------
template <typename T>
T& Any::dynamicCast()
{
    T* result = dynamic_cast<T>(static_cast<PlaceHolderImpl<T>*>(m_content)->m_value);
    if (result == NULL)
    {
        StringStream ss;
        ss<<"Cannot convert '"<<getType().name()<<"' to '"<<typeid(T)<<"'.";
        throw std::bad_cast(ss.str().c_str());
    }
    return *result;
}
//---------------------------------------------------------------------
template <typename T>
const T& Any::dynamicCast() const
{
    Any& _this = const_cast<Any&>(*this);
    return _this.dynamicCast<T>();
}
//---------------------------------------------------------------------
template <typename ValueType>
Any& Any::operator = (const ValueType& rhs)
{
    Any(rhs).swap(*this);
    return *this;
}
//---------------------------------------------------------------------
template <typename ValueType>
bool Any::operator == (const ValueType& rhs) const
{
    bool result = m_content == rhs;
    return result;
}
//---------------------------------------------------------------------
template <typename ValueType>
bool Any::operator != (const ValueType& rhs) const
{
    bool result = m_content != rhs;
    return result;
}

Now in the CPP file... Any.cpp

#include "Any.h"

static const std::type_info& VOID_TYPE(typeid(void));

Any::Any( void )
    :m_content(NULL)
{
}
//---------------------------------------------------------------------
Any::Any( const Any& other )
    :m_content(other.m_content?other.m_content->clone():NULL)
{
}
//---------------------------------------------------------------------
Any::~Any( void )
{
    SafeDelete(m_content);
}
//---------------------------------------------------------------------
const std::type_info& Any::getType() const
{
    return m_content?m_content->getType():VOID_TYPE;
}
//---------------------------------------------------------------------
Any& Any::swap( Any& other )
{
    std::swap(m_content, other.m_content);
    return *this;
}
//---------------------------------------------------------------------
Any& Any::operator=( const Any& rhs )
{
    Any(rhs).swap(*this);
    return *this;
}
//---------------------------------------------------------------------
bool Any::isEmpty() const
{
    bool is_empty = m_content == NULL;
    return is_empty;
}
//---------------------------------------------------------------------
bool Any::isNotEmpty() const
{
    bool is_not_empty = m_content != NULL;
    return is_not_empty;
}
//---------------------------------------------------------------------
bool Any::operator==( const Any& other ) const
{
    bool result = m_content == other.m_content;
    return result;
}
//---------------------------------------------------------------------
bool Any::operator!=( const Any& other ) const
{
    bool result = m_content != other.m_content;
    return result;
}
Vite Falcon
Forgot to mention. Speed-wise, it's effectively a pointer to the PlaceHolder clas instance, which stores the pointer to the value being passed around. Not TOO much overhead I guess. And it's type-safe and doesn't need complex libraries like Boost.
Vite Falcon