
In my application I have lots of different data types, e.g. Car, Bicycle, Person, ... (they're actually other data types, but this is just for the example).

Since I also have a fair amount of 'generic' code in my application, and the application was originally written in C, pointers to Car, Bicycle, Person, ... are often passed as void-pointers to these generic modules, together with a type identifier, like this:

Car myCar;
ShowNiceDialog ((void *)&myCar, DATATYPE_CAR);

The 'ShowNiceDialog' function then uses meta-information (functions that map DATATYPE_CAR to interfaces that get the actual data out of a Car) to retrieve information about the car, based on the given data type. That way, the generic logic only has to be written once, not again for every new data type.
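To make the mechanism concrete, here is a minimal C++ sketch of a void-pointer plus type-id dispatch. The `TypeInterface` struct of function pointers, the `gTypeTable` registry and the string-returning `ShowNiceDialog` are illustrative assumptions, not the application's real code.

```cpp
#include <map>
#include <string>

struct Car { std::string model; };
struct Bicycle { int gears; };

enum DataType { DATATYPE_CAR, DATATYPE_BICYCLE };

// Hypothetical "meta-information": per-type function pointers that know
// how to extract data from an untyped pointer.
struct TypeInterface {
    std::string (*getName)(const void* data);
};

static std::string carName(const void* data) {
    return "Car " + static_cast<const Car*>(data)->model;
}

static std::string bicycleName(const void* data) {
    return "Bicycle with " +
           std::to_string(static_cast<const Bicycle*>(data)->gears) + " gears";
}

// Registry mapping each type id to its interface.
static std::map<int, TypeInterface> gTypeTable = {
    {DATATYPE_CAR, {&carName}},
    {DATATYPE_BICYCLE, {&bicycleName}},
};

// Generic code: written once, works for every registered type.
// Nothing stops a caller from pairing a pointer with the wrong id.
std::string ShowNiceDialog(void* data, int type) {
    const TypeInterface& iface = gTypeTable.at(type);
    return "Dialog: " + iface.getName(data);
}
```

Because the table is an ordinary map, new entries could be registered at run time; `at()` throws `std::out_of_range` for an unknown id, but a pointer paired with the wrong id still goes undetected.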

Of course, in C++ you could make this much easier by using a common root class, like this:

class RootClass
   {
   public:
      virtual ~RootClass() {}
      virtual std::string getName() const = 0;
   };

class Car : public RootClass
   {
   ...
   };

void ShowNiceDialog (RootClass *root);
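A compilable version of the root-class idea, as a sketch: the `getName` overrides and returning a string instead of showing a dialog are assumptions for illustration, and `ShowNiceDialog` takes a const reference rather than the pointer used above.

```cpp
#include <string>

class RootClass {
public:
    virtual ~RootClass() {}
    virtual std::string getName() const = 0;
};

class Car : public RootClass {
public:
    std::string getName() const override { return "Car"; }
};

class Bicycle : public RootClass {
public:
    std::string getName() const override { return "Bicycle"; }
};

// Generic code takes the common base class; the compiler checks the type,
// and the virtual call picks the right implementation.
std::string ShowNiceDialog(const RootClass& root) {
    return "Dialog: " + root.getName();
}
```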

The problem is that in some cases, we don't want to store the data type in a class, but in a totally different format to save memory. In some cases we have hundreds of millions of instances that we need to manage in the application, and we don't want to make a full class for every instance. Suppose we have a data type with 2 characteristics:

  • A quantity (double, 8 bytes)
  • A boolean (1 byte)

Although we only need 9 bytes to store this information, putting it in a class means that we need at least 16 bytes (because of the padding), and with the v-pointer we possibly even need 24 bytes. For hundreds of millions of instances, every byte counts (I have a 64-bit variant of the application and in some cases it needs 6 GB of memory).
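The padding arithmetic can be checked with `sizeof`. This sketch only asserts relations that hold on mainstream compilers; the exact values (16 and 24 on typical 64-bit targets) are implementation-defined.

```cpp
#include <cstddef>

// The two characteristics from the example, as a plain struct.
struct Plain {
    double quantity;  // 8 bytes
    bool flag;        // 1 byte
};

// Same data, but polymorphic: the compiler adds a hidden v-pointer.
struct WithVptr {
    double quantity;
    bool flag;
    virtual ~WithVptr() {}
};

// Payload bytes vs. what the compiler actually allocates per instance.
constexpr std::size_t kPayload = sizeof(double) + sizeof(bool);
constexpr std::size_t kPlain = sizeof(Plain);        // typically 16 on 64-bit
constexpr std::size_t kWithVptr = sizeof(WithVptr);  // typically 24 on 64-bit
```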

The void-pointer approach has the advantage that we can encode almost anything in a void-pointer and decide how to interpret it when we need information from it (as a real pointer, as an index, ...), but at the cost of type safety.
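As an illustration of the 'index instead of pointer' encoding, here is a sketch with hypothetical helper names. Converting an integer to a pointer and back is implementation-defined, although it round-trips on mainstream platforms; note that the handle itself carries no hint about which interpretation is the right one.

```cpp
#include <cstdint>

// Hypothetical helpers: smuggle a small integer index through a void*
// "handle", as generic C code often does.
inline void* indexToHandle(std::uintptr_t index) {
    return reinterpret_cast<void*>(index);
}

// The receiver must already know the handle holds an index, not a pointer;
// nothing in the type system records that decision.
inline std::uintptr_t handleToIndex(void* handle) {
    return reinterpret_cast<std::uintptr_t>(handle);
}
```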

Templated solutions don't help since the generic logic forms quite a big part of the application, and we don't want to templatize all this. Additionally, the data model can be extended at run time, which also means that templates won't help.

Are there better (and more type-safe) ways to handle this than a void-pointer? Any references to frameworks, whitepapers or research material on this?

A:

In this case, it sounds like you should simply use overloading. For example:

#ifdef __cplusplus // Only enable this awesome thing for C++:
#   define PROVIDE_OVERLOAD(CLASS,TYPE) \
    inline void ShowNiceDialog(const CLASS& obj){ \
         /* const_cast: the C function takes a non-const void* */ \
         ShowNiceDialog(static_cast<void*>(const_cast<CLASS*>(&obj)),TYPE); \
    }

    PROVIDE_OVERLOAD(Car,DATATYPE_CAR)
    PROVIDE_OVERLOAD(Bicycle,DATATYPE_BICYCLE)
    // ...

#undef PROVIDE_OVERLOAD // undefine it so that we don't pollute with macros
#endif // end C++ only

If you create overloads for your various types, you will be able to invoke ShowNiceDialog in a simple and type-safe manner while still leveraging your optimized C variant of it.

With the code above, you could, in C++, write something like the following:

 Car c;
 // ...
 ShowNiceDialog(c);

If you changed the type of c, it would still use the appropriate overload (or give a compile error if there was no overload). It doesn't prevent anyone from using the existing type-unsafe C variant, but since the type-safe version is easier to invoke, I would expect other developers to prefer it anyway.
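A self-contained sketch of the overload approach. The C function here is a stand-in (an assumption) that only records the type id it received, so the dispatch can be observed; the `const_cast` is safe only as long as the C code does not modify the object.

```cpp
struct Car {};
struct Bicycle {};
enum { DATATYPE_CAR = 1, DATATYPE_BICYCLE = 2 };

// Stand-in for the existing C function: records the type id it received.
static int gLastType = 0;
void ShowNiceDialog(void* data, int type) {
    (void)data;  // a real implementation would use the metadata table here
    gLastType = type;
}

// One type-safe overload per known type, generated by a macro.
#define PROVIDE_OVERLOAD(CLASS, TYPE)                          \
    inline void ShowNiceDialog(const CLASS& obj) {              \
        ShowNiceDialog(const_cast<CLASS*>(&obj), TYPE);         \
    }

PROVIDE_OVERLOAD(Car, DATATYPE_CAR)
PROVIDE_OVERLOAD(Bicycle, DATATYPE_BICYCLE)
#undef PROVIDE_OVERLOAD
```

Calling `ShowNiceDialog(someCar)` now picks the `Car` overload at compile time, and passing a type without an overload is a compile error rather than a silent mismatch.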

Edit
I should add that the above answers the question of how to make the API type-safe, not how to make the implementation type-safe. This will help those using your system avoid unsafe invocations. Also note that these wrappers only provide a type-safe means of using types that are already known at compile time; for dynamic types, it really would be necessary to use the unsafe versions. However, another possibility is to provide a wrapper class like the following:

class DynamicObject
{
    public:
         DynamicObject(void* data, int id) : _datatype_id(id), _datatype_data(data) {}
         // ...
         void showNiceDialog()const{ ShowNiceDialog(_datatype_data,_datatype_id); }
         // ...
    private:
         int _datatype_id;
         void* _datatype_data;
};

For those dynamic types, you would still not have much safety when it comes to constructing the object, but once the object is constructed, you have a much safer mechanism. It would be reasonable to combine this with a type-safe factory so that users of your API never construct DynamicObject instances themselves, and so never need to invoke the unsafe constructor.
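One way such a type-safe factory could look, sketched under assumptions: `TypeId` is a hypothetical traits template mapping each compile-time type to its id, and the unsafe constructor is made private so the id is always derived from the static type. Types added at run time would still need a separate, id-checked factory.

```cpp
struct Car {};
struct Bicycle {};
enum { DATATYPE_CAR = 1, DATATYPE_BICYCLE = 2 };

// Hypothetical traits: compile-time type -> type id.
template <class T> struct TypeId;
template <> struct TypeId<Car>     { static const int value = DATATYPE_CAR; };
template <> struct TypeId<Bicycle> { static const int value = DATATYPE_BICYCLE; };

class DynamicObject {
public:
    // Type-safe factory: the id comes from T, so pointer and id always match.
    template <class T>
    static DynamicObject create(T& obj) {
        return DynamicObject(&obj, TypeId<T>::value);
    }
    int typeId() const { return _datatype_id; }
    void* data() const { return _datatype_data; }

private:
    // The unsafe constructor is private: users must go through create().
    DynamicObject(void* data, int id) : _datatype_id(id), _datatype_data(data) {}
    int _datatype_id;
    void* _datatype_data;
};
```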

Michael Aaron Safyan
That is very unmaintainable, at least for a large codebase.
the_drow
How do you handle types added at runtime with this solution?
jopa
@jopa, I've updated the answer to explain that bit. My answer is about providing type safety to the users of the code, not about making the implementation itself type-safe. It seems pretty clear from the question that Patrick has already determined that classes are not viable for the implementation, and hence I do not offer any suggestion for it.
Michael Aaron Safyan
@the_drow, I agree, but the OP's existing codebase is such that this seems like the most practical way to make it more type-safe.
Michael Aaron Safyan
A: 

I would use traits:

template <class T>
struct DataTypeTraits
{
};

template <>
struct DataTypeTraits<Car>
{
   // put things that describe Car here
   // Example: Give the type a name
   static std::string getTypeName()
   {
      return "Car";
   }
};
template <>
struct DataTypeTraits<Bicycle>
{
   // the same for bicycles
   static std::string getTypeName()
   {
      return "Bicycle";
   }
};

template <class T>
void ShowNiceDialog(const T& t)
{
   // Extract details of given object
   std::string typeName(DataTypeTraits<T>::getTypeName());
   // more stuff
}

This way you don't need to change ShowNiceDialog() whenever you add a new type you want to apply it to. All you need is a specialization of DataTypeTraits for the new type.

jopa
This requires that all the generic code is templatized (like you did here with ShowNiceDialog), which I don't want, since data types can be added at run time. Also about 25% of my application is written in this generic way, and making that all templated will probably blow up the size of the application.
Patrick
@Patrick I should have read the question to the end :-). With types added at run time, my approach doesn't work. In that case I would use an adapter factory that creates an object adapter based on the given type hint, something like the adapters used in Eclipse (http://www.eclipse.org/articles/article.php?file=Article-Adapters/index.html).
jopa
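A rough sketch of such a run-time adapter factory in C++ (this is not Eclipse's API; all names are hypothetical): generic code asks a registry for an adapter implementing a known interface, keyed by the type hint, and registrations can happen at run time.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Interface the generic code works against.
struct INameAdapter {
    virtual ~INameAdapter() {}
    virtual std::string getName() const = 0;
};

struct Car { std::string model; };

// Adapter wrapping a Car; it must not outlive the Car it refers to.
struct CarNameAdapter : INameAdapter {
    explicit CarNameAdapter(const Car& c) : car(c) {}
    std::string getName() const override { return "Car " + car.model; }
    const Car& car;
};

// Registry: type hint -> factory that wraps raw data in an adapter.
using AdapterFactory = std::function<std::unique_ptr<INameAdapter>(void*)>;

std::map<int, AdapterFactory>& adapterRegistry() {
    static std::map<int, AdapterFactory> registry;
    return registry;
}

void registerAdapter(int typeHint, AdapterFactory factory) {
    adapterRegistry()[typeHint] = std::move(factory);
}

// Returns nullptr when no adapter is registered for the hint.
std::unique_ptr<INameAdapter> getAdapter(void* data, int typeHint) {
    auto it = adapterRegistry().find(typeHint);
    if (it == adapterRegistry().end()) return nullptr;
    return it->second(data);
}

const int DATATYPE_CAR = 1;

// Built-in types register at startup; user-defined types could do the same later.
void registerBuiltinAdapters() {
    registerAdapter(DATATYPE_CAR, [](void* p) {
        return std::unique_ptr<INameAdapter>(
            new CarNameAdapter(*static_cast<Car*>(p)));
    });
}
```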
A:

If you don't want a full class, you should read up on the Flyweight pattern. It's designed to save memory.

EDIT: sorry, lunch-time pause ;)

The typical Flyweight approach is to separate properties that are common to a great number of objects from properties that are specific to a given instance.

Generally, it means:

struct Light
{
  kind_type mKind;
  specific1 m1;
  specific2 m2;
};

The kind_type is often a pointer, but it doesn't have to be. In your case a pointer would be a real waste: on a 64-bit platform it alone takes 8 bytes, almost as much as the 9 bytes of "useful" information.

Here I think we could exploit padding to store the id. After all, as you said, the struct is going to be expanded to 16 bytes even though we only use 9 of them, so let's not waste the other 7!

struct Object
{
  double quantity;
  bool flag;
  unsigned char const id;
};

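Note that the order of elements is important (byte offsets, assuming a double aligned on 8 bytes, as on typical 64-bit platforms):

0x00              0x08  0x09  0x0A              0x10
[    quantity    ][flag][ id ][    padding     ]   -> 16 bytes

0x00  0x01  0x02              0x08              0x10
[ id ][flag][    padding     ][    quantity    ]   -> 16 bytes

0x00  0x01              0x08              0x10  0x11              0x18
[ id ][    padding     ][    quantity    ][flag][    padding     ]   -> 24 bytes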
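The three orderings can be compared directly. This sketch asserts only relations, since the exact sizes depend on the alignment of `double` on the target; `NoId` (an addition for comparison) shows that the extra `id` byte fits in what would otherwise be padding.

```cpp
#include <cstddef>

// Baseline: just the two characteristics.
struct NoId { double quantity; bool flag; };

// The three member orders from the layout diagrams.
struct QuantityFlagId { double quantity; bool flag; unsigned char id; };
struct IdFlagQuantity { unsigned char id; bool flag; double quantity; };
struct IdQuantityFlag { unsigned char id; double quantity; bool flag; };

// In the first layout the id lands right after flag, inside former padding.
constexpr std::size_t kIdOffset = offsetof(QuantityFlagId, id);
```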

I don't understand the "extended at runtime" bit. Seems scary. Is this some sort of self-modifying code?

Templates allow a very interesting form of Flyweight: Boost.Variant.

typedef boost::variant<Car,Dog,Cycle, ...> types_t;

The variant can hold any of the types cited here. It can be manipulated by "normal" functions:

void doSomething(types_t const& t);

It can be stored in containers:

typedef std::vector<types_t> vector_t;

And finally, the way to operate over it:

struct DoSomething: boost::static_visitor<>
{
  void operator()(Dog const& dog) const;

  void operator()(Car const& car) const;
  void operator()(Cycle const& cycle) const;
  void operator()(GenericVehicle const& vehicle) const;

  template <class T>
  void operator()(T const&) const {}
};

It's very interesting to note the behavior here. Normal function overload resolution occurs, therefore:

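  • If you have a Car or a Cycle, those overloads are used. Other children of GenericVehicle would use the 4th version only if there were no catch-all template: the template is an exact match and therefore wins over the derived-to-base conversion, so it has to be constrained (e.g. with enable_if) if such types should go to the GenericVehicle overload.
  • It's possible to provide a template version as a catch-all, and to specialize it as appropriate.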
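To see which overload wins, here is a sketch using C++17 `std::variant` and `std::visit` in place of Boost.Variant (swapped in so the example needs no Boost; overload resolution on the visitor works the same way). `Truck` is a hypothetical extra child of `GenericVehicle`, and each overload returns a label instead of doing real work.

```cpp
#include <string>
#include <variant>

struct GenericVehicle {};
struct Car : GenericVehicle {};
struct Cycle : GenericVehicle {};
struct Truck : GenericVehicle {};  // hypothetical "other child"
struct Dog {};

using types_t = std::variant<Car, Cycle, Truck, Dog>;

// Each overload returns a label so the chosen overload can be observed.
struct DoSomething {
    std::string operator()(const Dog&) const { return "Dog"; }
    std::string operator()(const Car&) const { return "Car"; }
    std::string operator()(const Cycle&) const { return "Cycle"; }
    std::string operator()(const GenericVehicle&) const { return "GenericVehicle"; }

    // Catch-all: an exact match for Truck, so it beats the
    // derived-to-base conversion to the GenericVehicle overload.
    template <class T>
    std::string operator()(const T&) const { return "catch-all"; }
};
```

For `Car`, the non-template overload and the template are both exact matches, and the non-template wins the tie.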

Note that the non-template overloads can perfectly well be defined in a .cpp file.

To apply this visitor, you use the boost::apply_visitor function:

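types_t t;
boost::apply_visitor(DoSomething(), t);

// or

DoSomething visitor;
boost::apply_visitor(visitor)(t);

The second way seems odd, but it means you can use the visitor in a most interesting fashion, as a function object (this one-argument form takes the visitor by non-const reference, so it must be a named object that outlives the call):

vector_t vec = /**/;
std::for_each(vec.begin(), vec.end(), boost::apply_visitor(visitor));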
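With `std::variant` (used here instead of Boost.Variant), `std::visit` has no one-argument delayed form, so a lambda takes its place when the visitor is handed to an algorithm. `CountCars` and `countCars` are hypothetical names for this sketch.

```cpp
#include <algorithm>
#include <variant>
#include <vector>

struct Car {};
struct Dog {};

using types_t = std::variant<Car, Dog>;
using vector_t = std::vector<types_t>;

// Visitor that counts the Cars it sees through a pointer to a counter.
struct CountCars {
    int* cars;
    void operator()(const Car&) const { ++*cars; }
    void operator()(const Dog&) const {}
};

// The lambda plays the role of boost::apply_visitor(visitor).
int countCars(const vector_t& vec) {
    int cars = 0;
    CountCars visitor{&cars};
    std::for_each(vec.begin(), vec.end(),
                  [&visitor](const types_t& t) { std::visit(visitor, t); });
    return cars;
}
```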

Read up on variant, it's most interesting.

  • Compile-time check: if you miss an operator() for one of the types, the compiler complains (unless a catch-all template covers it)
  • No need for RTTI: no virtual pointer, no dynamic type lookup, so it is as fast as a hand-written tagged union, but safer

You can of course segment your code by defining multiple variants. If some sections of the code only deal with four or five types, then use a specific variant for them :)

Matthieu M.
I know the 'FlyWeight' pattern by name, but I never found a good reason to use it. Now may be a good time. Thanks for the tip.
Patrick
Same question as for the macro solution: How do you handle types added at runtime?
jopa
Well, I had the same question for Patrick actually: how are types added at run time? Either it's something scary I have no idea about, or we just did not understand each other.
Matthieu M.
@Matthieu, luckily it's not self-modifying code. The principle is that my customers can add data types depending on how they want to model their environment. Suppose my application only needs Cars and Bicycles, then I only have logic in my application that works on Cars and Bicycles. But I allow the user to add data types like e.g. Engine, and they can indicate that a Car should refer to an Engine, and that e.g. the speed of a car is determined by the Engine. In my application I can still use car->getSpeed(), but where it gets the speed is completely up to the user to parametrize.
Patrick
Well, not completely, I'd guess. I suppose only a handful of types can be considered to get a speed from. You can use *Variant* to this effect: `typedef boost::variant<Engine,Sail> propulsion_type` and then have `car->getSpeed()` refer to it using the visitation explained above. It does tie the code to the logic, of course, but that's what type safety is about too: you can't ask for compile-time checks without having something to check. To avoid surprises, draw the menu by iterating over the `propulsion_type` collection, so that if someone adds a type, they'll have to edit it :)
Matthieu M.
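A sketch of that suggestion, using C++17 `std::variant` instead of `boost::variant`; the speed formulas and field names are placeholders, not a real model. The set of propulsion types is fixed at compile time, which is exactly the trade-off discussed above.

```cpp
#include <variant>

// Hypothetical user-selectable propulsion types.
struct Engine { double rpm; double metresPerRev; };
struct Sail { double windSpeed; double efficiency; };

using propulsion_type = std::variant<Engine, Sail>;

// Placeholder formulas, one per propulsion type.
struct SpeedVisitor {
    double operator()(const Engine& e) const { return e.rpm * e.metresPerRev; }
    double operator()(const Sail& s) const { return s.windSpeed * s.efficiency; }
};

class Car {
public:
    explicit Car(propulsion_type p) : propulsion(p) {}
    // The application still calls getSpeed(); which formula applies
    // depends on the propulsion the user configured.
    double getSpeed() const { return std::visit(SpeedVisitor(), propulsion); }

private:
    propulsion_type propulsion;
};
```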
A:

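It's perfectly possible to change the packing of a class in, say, Visual Studio: you can use #pragma pack(x), or the /Zp compiler option, which is also available in the project property pages. (__declspec(align(x)) can only increase alignment, not reduce it.) Be aware that packing a double below its natural alignment can make accesses slower on some CPUs.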
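A sketch of packing the 9-byte example. `#pragma pack(push, 1)` / `#pragma pack(pop)` is accepted by MSVC, GCC and Clang; `readQuantity` is a hypothetical helper that reads the member by value, since pointers or references to misaligned members are a common source of trouble.

```cpp
// Default layout: the compiler pads after the bool for alignment.
struct Unpacked {
    double quantity;
    bool flag;
};

// Packed to 1-byte alignment: no padding, so 9 bytes per instance
// (with the usual 8-byte double and 1-byte bool).
#pragma pack(push, 1)
struct Packed {
    double quantity;
    bool flag;
};
#pragma pack(pop)

// Read members by value rather than through pointers into the packed struct.
inline double readQuantity(const Packed& p) { return p.quantity; }
```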

I would suggest storing your data in, say, one vector per data member (a "structure of arrays"); each object then holds just a reference to the master class and an index into these vectors. If the master class is a singleton, even that reference can be dropped, leaving only the index.

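class VehicleBase {
public:
    virtual std::string GetCarOwnerFirstName() const = 0;
    virtual ~VehicleBase() {}
};
class Car : public VehicleBase {
    int index;
public:
    explicit Car(int i) : index(i) {}
    // GetSingleton() returns the master object holding one vector per data member
    std::string GetCarOwnerFirstName() const { return GetSingleton().carownerfirstnames[index]; }
};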

Of course, this leaves some implementation details to be desired, such as the memory management of Car's data members. However, Car itself is trivial and can be created/destroyed at any time, and the vectors held by the singleton will pack data members quite efficiently.
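A fuller sketch of the structure-of-arrays idea, with hypothetical names (`VehicleStore`, `add`): each data member lives in its own vector, and `Car` shrinks to a bare index. `std::vector<bool>` is typically bit-packed, so the flag costs about one bit instead of a padded byte.

```cpp
#include <cstddef>
#include <vector>

// Singleton holding one vector per data member.
class VehicleStore {
public:
    static VehicleStore& instance() {
        static VehicleStore store;
        return store;
    }
    // Appends a new record and returns its index.
    std::size_t add(double quantity, bool flag) {
        quantities.push_back(quantity);
        flags.push_back(flag);
        return quantities.size() - 1;
    }
    std::vector<double> quantities;
    std::vector<bool> flags;
};

// Lightweight handle: just an index, cheap to create and destroy.
class Car {
public:
    explicit Car(std::size_t i) : index(i) {}
    double quantity() const { return VehicleStore::instance().quantities[index]; }
    bool flag() const { return VehicleStore::instance().flags[index]; }

private:
    std::size_t index;
};
```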

DeadMG
Currently I use the trick that you mention; however, since the generic part of my application works with void-pointers and assumes that they point to stable instances, I also need to keep the Car permanently in memory (which also takes 8 bytes), so there is no gain in memory consumption. I am currently investigating replacing the void-pointers with reference-counting smart pointers, since that would indeed allow me to create the Cars on the fly and destroy them when nothing points to them anymore. Thanks anyway for the tip.
Patrick
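A sketch of the smart-pointer direction mentioned in the comment above, under assumptions: the compact records live in a hypothetical global vector, and full `CarView` objects are created on demand through `std::shared_ptr`, so each one is freed as soon as generic code stops referencing it.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Compact storage: one record per car, no per-car object kept alive.
struct CarRecord { double quantity; bool flag; };
static std::vector<CarRecord> gCarRecords;

// Full-featured view object, created only when generic code needs one.
class CarView {
public:
    explicit CarView(std::size_t i) : index(i) {}
    double quantity() const { return gCarRecords[index].quantity; }

private:
    std::size_t index;
};

// Freed automatically when the last shared_ptr to it goes away.
std::shared_ptr<CarView> makeCarView(std::size_t index) {
    return std::make_shared<CarView>(index);
}
```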
Oh. You should never, ever take dumb pointers and assume they point to stable instances. That's a recipe for failure.
DeadMG
I agree, but 20 years ago, in plain C, that was the only alternative if you wanted to do that kind of generic programming. If I had to do it again, I wouldn't use void-pointers. On the other hand, if I had to choose between a dumb-pointer solution that results in a working application and a clean class-based solution that results in a non-working application (because of performance or memory impact), I would, alas, choose the dumb-pointer solution.
Patrick
I guess that re-writing this function is not an option?
DeadMG
This function was just an example. In practice about 25% of the application is written like that.
Patrick