As discussed in The C++ Programming Language, 3rd Edition, section 12.2.5, type fields tend to create code that is less versatile, more error-prone, less intuitive, and less maintainable than the equivalent code that uses virtual functions and polymorphism.

As a short example, here is how a type field would be used:

void print(const Shape &s)
{
  switch(s.type)
  {
  case Shape::TRIANGLE:
    cout << "Triangle" << endl;
    break;  // without this, control falls through to the next case
  case Shape::SQUARE:
    cout << "Square" << endl;
    break;
  default:
    cout << "None" << endl;
    break;
  }
}

Clearly, this is a nightmare as adding a new type of shape to this and a dozen similar functions would be error-prone and taxing.
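For comparison, here is a minimal sketch of the virtual-function version that the type field is being weighed against. The Triangle and Square classes and the name() member are assumptions made for illustration; they do not appear in the question.

```cpp
#include <iostream>
#include <string>

// Hypothetical hierarchy standing in for the question's single Shape class.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;  // each type supplies its own name
};

struct Triangle : Shape {
    std::string name() const override { return "Triangle"; }
};

struct Square : Shape {
    std::string name() const override { return "Square"; }
};

// No switch: the call dispatches on the dynamic type of s.
void print(const Shape& s) {
    std::cout << s.name() << std::endl;
}
```

Adding a new shape here means adding one class; print() and any similar functions stay unchanged.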

Despite these shortcomings and those described in TC++PL, are there any examples where such an implementation (using a type field) is a better solution than using the language's virtual functions? Or should this practice be blacklisted as pure evil?

Realistic examples would be preferred over contrived ones, but I'd still be interested in contrived examples. Also, have you ever seen this in production code (even though virtual functions would have been easier)?

A: 

I don't know of any realistic examples. Contrived ones would depend on there being some good reason that virtual methods cannot be used.

Steven Sudit
For the record, I'm not suggesting it's pure evil, just that it is pre-OO.
Steven Sudit
I'd be thrilled to see an explanation for the downvote, but I'm not holding my breath. I believe I've just called your bluff and you've got nothing.
Steven Sudit
I have no idea who downvoted you, but your answer seems to say that there are no real examples (your second sentence implies that any example would necessarily be contrived) and other answers definitely show otherwise (e.g. structures passed through any form of IPC), so I can see how someone would consider it a bad answer.
Ben Voigt
@Ben: Such a person would have deficiencies in their literacy, as I explicitly stated that any counterexamples would have to depend on the existence of circumstances, such as shared memory, that would prevent virtual methods from being used.
Steven Sudit
And, to be clear, I do see "structures passed through any form of IPC" as contrived because IPC/RPC requires some form of serialization. The contrivance is in demanding that the serialized form be executable as is; a requirement I've never seen, except self-generated by people trying to over-optimize.
Steven Sudit
@Steven: How many programs use a producer/consumer queue to communicate between threads? And how many of them, rather than re-inventing the wheel, re-use an existing API such as Windows message queues or BSD datagram sockets which also accept data from other sources? It really doesn't seem so contrived to me, maybe because I've done it, and the point is that if there's any possibility of an address collision where some other software sends you a message, you had better not interpret that message content as a function pointer or a pointer to an object with a v-table.
Ben Voigt
Especially when interoperating with a framework that controls the main dispatch loop of your thread, you don't have the luxury of defining your own queue which cannot accept messages from outside; you have to use whatever messages the framework listens to. With .NET that's pretty much only the Windows message queue: while the Win32 API defines functions like MsgWaitForMultipleObjectsEx with rich wakeup capabilities, .NET doesn't use them.
Ben Voigt
@Ben: I don't want to drag this out pointlessly, but I'm not sure I understand your example. For inter-thread communication within a process, you can just pass a reference (or pointer) to the instance, and it's fine to call through its vtable. If you mean inter-process communication, particularly where memory is not at all shared, then you're right that this won't work, but as I said before, the right answer would be to convert your instance to a portable representation, as by serialization. In essence, you're selling Discriminated Unions as a serialization format, but I'm not buying.
Steven Sudit
As for .NET, I don't think you want to go there, because that platform features multiple serialization techniques and (as of 4.0) support for producer/consumer queues.
Steven Sudit
.NET queues aren't usable from native C++. For a native C++ component to be compatible with both .NET and native C++ hosts, it had better use window messages for synchronization because .NET threads don't check any other objects in the main message loop. Once you're using window messages, you might receive a message from some other software that decided to broadcast using that particular message number, and of course that message isn't going to follow your rules. Is this getting clearer or more confusing?
Ben Voigt
Also, for .NET producer/consumer queues, is there any way to put a thread to sleep such that it wakes when either the queue becomes non-empty or a UI event occurs? With the Win32 API this is easy, but I couldn't figure out a way to do it with the .NET message loop (and .NET modal dialogs don't provide any way, that I know of, to use a custom message loop implemented around MsgWaitForMultipleObjectsEx).
Ben Voigt
My advice about .NET was that you shouldn't go there, but you did anyhow. The producer/consumer queues I spoke of are not related to the Windows message loop. They're not associated with an HWND or even a thread. They're just, you know, queues, except with appropriate synchronization. The consumer blocks until the producer either inserts a work item or shuts the queue down. In short, they have very little to do with what we're talking about here. Let's please not discuss them further.
Steven Sudit
+6  A: 

When you "know" you have a very specific, small, constant set of types, it can be easier to hardcode them like this. Of course, constants aren't and variables don't, so at some point you might have to rewrite the whole thing anyway.

This is, more or less, the technique used for discriminated unions in several of Alexandrescu's articles.

For example, if I was implementing a JSON library, I'd know each Value can only be an Object, Array, String, Integer, Boolean, or Null—the spec doesn't allow any others.
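A minimal sketch of such a Value, assuming a simplified design of my own: Array and Object are left out for brevity, the members are stored side by side instead of in a real union, and the names set_bool, set_int, set_string and to_json are hypothetical.

```cpp
#include <string>

// Hypothetical, simplified JSON value. A real discriminated union would
// overlay the members to save space; here they sit side by side for clarity.
class Value {
public:
    enum Type { Null, Boolean, Integer, String };

    Value() : type_(Null), int_(0), bool_(false) {}

    // Storing a different kind of data changes the value's type in place.
    void set_bool(bool b) { type_ = Boolean; bool_ = b; }
    void set_int(long i) { type_ = Integer; int_ = i; }
    void set_string(const std::string& s) { type_ = String; str_ = s; }
    void clear() { type_ = Null; }

    Type type() const { return type_; }

    // The switch on the type field; the set of cases is fixed by the spec.
    std::string to_json() const {
        switch (type_) {
        case Null:    return "null";
        case Boolean: return bool_ ? "true" : "false";
        case Integer: return std::to_string(int_);
        case String:  return "\"" + str_ + "\"";  // no escaping in this sketch
        }
        return "null";  // not reached for valid tags
    }

private:
    Type type_;
    long int_;
    bool bool_;
    std::string str_;
};
```

A Value built this way can start as Null, become an Integer via set_int(42), and later become a String via set_string("hi"), all without reallocating or replacing the object.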

Roger Pate
+1: The obvious instance of a small, constant set of types being an external standard that's unlikely to change quickly (e.g., ISO standards take quite a while to change, even at best).
Jerry Coffin
Ok, I looked at an article, but I didn't see anything that would convince me to use Pascal-style discriminated unions in C++, as opposed to just leveraging polymorphism through virtual methods. What am I missing?
Steven Sudit
I wrote a JSON library where, despite the limited number of objects at the JSON level, it was convenient for me to implement more than one type of Object, which meant subclassing. (One type used a dictionary, the other used reflection on a DTO.) Again, maybe I'm missing something, but the argument for this technique seems underwhelming.
Steven Sudit
@Steven: You can, of course, use polymorphism instead. The above would give you tighter control over allocation (e.g. you can save space with a small-object optimization) and could be faster in some scenarios. If it's not clear above, my Value type acts like any of the other types in a dynamic-language type of way, and a discriminated union allows a Value object to *change its own type* on the fly. But yes, an Array would still model a vector<Value> and an Object model map<String,Value>.
Roger Pate
@Steven: It's been some time since I read those articles (I still have the deadtree magazines somewhere with them ;), so I can't recall if he covers *motivation* or just assumes that and covers *implementation.*
Roger Pate
The first link, at http://www.oonumerics.org/tmpw01/alexandrescu.pdf, talks about the *features*, but not really the *motivation*.
Steven Sudit
A: 

Aren't there costs associated with virtual functions and polymorphism? For example: maintaining a vtable per class, an increase in each object's size by 4 bytes, and runtime slowness (I have never measured it, though) from resolving the virtual function call. So, for simple situations, using a type field appears acceptable.

ArunSaha
The cost of a pointer to a vtable is countered by the cost of holding a type enum. Likewise, instead of checking for each type in a switch/case, we do a straight jump into a table offset. I don't believe there's much speed to be gained here with the type trick.
Steven Sudit
There are costs, yes, but the type field has some similar costs. The type member takes up some space; the switch statement to choose the right code to execute takes some time much as the vtable lookup would. I've also never measured, but I'd be surprised if there was a significant difference in favor of type fields.
JoshD
The difference between check against the type enum vs dynamic call generally is insignificant. However, the static check makes static calls which can be inlined with the potential for big savings.
Ben Voigt
+2  A: 

A type enum can be serialized via memcpy; a v-table can't. Similarly, corruption of a type enum value is easy to handle, whereas corruption of the v-table pointer means instant badness. There's no portable way to even test a v-table pointer for validity; using dynamic_cast or typeid to do RTTI checks on an invalid object is undefined behavior.

For example, one instance where I choose to use a type hierarchy with static dispatch controlled by a discriminator and not dynamic dispatch is when passing a pointer to a structure through a Windows message queue. This gives me some protection against other software that may have allocated broadcast messages from a range I'm using (it's supposed to be reserved for app-local messages, do not pass GO if you think that rule is actually respected).
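As a rough, portable illustration of the idea (not the Windows message queue code itself), the sketch below copies a tagged structure through a raw byte buffer and rejects bytes that do not carry a known tag. The Message layout, the kMagic marker, and the encode/decode names are all assumptions invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical message kinds; only these tag values are accepted.
enum class MsgKind : std::uint32_t { Resize = 1, Move = 2 };

// Hypothetical wire format: trivially copyable, no pointers, no v-table.
struct Message {
    std::uint32_t magic;  // arbitrary marker chosen for this sketch
    std::uint32_t kind;   // kept as a raw integer so any bit pattern can be inspected
    std::int32_t  a;
    std::int32_t  b;
};

const std::uint32_t kMagic = 0x4D534731u;

// Copy the message into a raw buffer, as a queue or shared memory would carry it.
void encode(const Message& m, unsigned char* buf) {
    std::memcpy(buf, &m, sizeof m);
}

// Returns false, leaving out untouched, if the bytes do not look like our message.
bool decode(const unsigned char* buf, std::size_t len, Message& out) {
    if (len != sizeof(Message)) return false;
    Message tmp;
    std::memcpy(&tmp, buf, sizeof tmp);
    if (tmp.magic != kMagic) return false;
    if (tmp.kind != static_cast<std::uint32_t>(MsgKind::Resize) &&
        tmp.kind != static_cast<std::uint32_t>(MsgKind::Move)) {
        return false;
    }
    out = tmp;
    return true;
}
```

The worst outcome of feeding decode() foreign bytes is a rejected message; nothing in the buffer is ever interpreted as a code address.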

Ben Voigt
If something is randomly corrupting memory, I think you're hosed no matter what. A v-table can be serialized, actually, because you just add another method (or piggy back on RTTI) that returns an enum indicating the type or how the object needs to be deserialized. Or build it into the virtual serialize method.
Roger Pate
@Roger: You haven't fully grasped my message-passing example if you are thinking in terms of random corruption. The difference between `memcpy` and calling a virtual `serialize` method seems to have eluded you as well. Let me tie both of them together with a single example: an shm file. There's no guarantee that v-tables are stored at the same address in all processes sharing the file, so v-table pointers just won't work. You'd need to serialize and deserialize continually. And if a user accidentally overwrites the file, with v-table pointers you cannot detect and report the problem.
Ben Voigt
Uhm, so long as we're within the same process, a memcpy should work fine, since the instance merely contains a pointer to the vtable, not the table itself. The pointer is just as valid after being copied. If you mean full serialization, as to a file, then there are techniques for that as well.
Steven Sudit
I get the idea that people are reading the first sentence of my answer then stopping... Yes there are techniques for reconstructing the dynamic type (meaning vfptr) of an object during serialization, but they are very expensive compared to just storing the bits as a block, and way too expensive for use with shared memory. (They also usually subvert the type system as well, creating objects without first calling the constructor.)
Ben Voigt
@Steven: You can't memcpy a non-POD type; that's UB.
Roger Pate
@Ben: True, I only briefly mentioned it in a comment on my answer, but control over allocation (such as for shared memory) is the trade-off that *didn't* elude me. Perhaps I reacted too strongly to the memory corruption issue: I still say if that's a valid concern, you have a much bigger problem.
Roger Pate
@Roger: I still suggest that data loss, as big a problem as that might be, is nowhere near as bad as jumping to and executing code from an unintended address. Any time you have data coming in from an untrusted source (which could be malicious or could, in my original example of broadcast windows messages, simply not be designed to be interpreted in this way) you don't want to interpret it as code pointers, which means that such structures need to contain discriminant data and not v-table pointers.
Ben Voigt
@Ben: I'm definitely not saying to memcpy a non-POD object (anything with a virtual member is automatically non-POD), as per my comment above. Have I missed something, or is there another way you see to execute an unintended address?
Roger Pate
Using a value stored in any shared structure (including posted messages, RPC, datagrams, shared memory, etc) as a pointer to an object with virtual functions and making a virtual call. If the object uses type tags, the worst case from using a bad data pointer is a segmentation fault / access violation, which can be trapped, but with dynamic polymorphism unintended code could be executed.
Ben Voigt
@Roger: Yes, it's definitely UB, but it's quite likely to work, and that's often enough for optimization fetishists. The alternative is to replace the enum with a pointer to an array of function pointers; a do-it-yourself vtable. Copying this would be well-defined. Of course, the fact that you'd be reinventing the vtable is a not-so-subtle hint that maybe you're over-optimizing. :)
Steven Sudit
A: 

I think that if the type corresponds precisely to the implied classes, then a type field is wrong. Where it gets complicated is where the type does not quite match, or it's not so cut and dried.

Taking your example, what if the type was Red, Green, or Blue? Those are types of shapes. You could even have a color class as a mixin, but it's probably too much.

pm100
A: 

I am thinking of using a type field to solve the problem of vector slicing. That is, I want a vector of hierarchical objects. For example I want my vector to be a vector of shapes, but I want to store circles, rectangles, triangles etc.

You can't do that in the most obvious, simple way because of slicing. So the normal solution is to have a vector of pointers or smart pointers instead. But I think there are cases where using a type field will be a simpler solution, since it avoids new/delete or alternative lifecycle techniques.
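A minimal sketch of what that could look like, assuming a single Shape struct whose w and h fields are reused per type (the field names and the area/total_area helpers are hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical value type: one struct covers every kind of shape,
// so a std::vector<Shape> stores them by value with nothing to slice.
struct Shape {
    enum Type { CIRCLE, RECTANGLE };
    Type type;
    double w;  // radius for CIRCLE, width for RECTANGLE
    double h;  // unused for CIRCLE, height for RECTANGLE
};

double area(const Shape& s) {
    switch (s.type) {
    case Shape::CIRCLE:    return 3.141592653589793 * s.w * s.w;
    case Shape::RECTANGLE: return s.w * s.h;
    }
    return 0.0;  // not reached for valid tags
}

double total_area(const std::vector<Shape>& shapes) {
    double sum = 0.0;
    for (std::size_t i = 0; i < shapes.size(); ++i) {
        sum += area(shapes[i]);
    }
    return sum;
}
```

The elements live contiguously inside the vector and are copied and destroyed with it, so no new/delete or smart pointers are involved. The cost is that every element is as large as the largest shape.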

Bill Forster
+1  A: 

The following guideline is from Clean Code by Robert C. Martin. "My general rule for switch statements is that they can be tolerated if they appear only once, are used to create polymorphic objects, and are hidden behind an inheritance relationship so that the rest of the system can't see them".

The rationale is this: if you expose type fields to the rest of your code, you'll get multiple instances of the above switch statement. That is a clear violation of DRY. When you add a type, all these switches need to change (or, even worse, they become inconsistent without breaking your build).

Alex Emelianov
A: 

The best example I can think of (and one I've run into before) is when your set of types is fixed and the set of functions you want to apply to those types is fluid. That way, when you add a new function, you modify a single place (adding a single switch) rather than adding a new base virtual function with the real implementation scattered all across the classes in your type hierarchy.
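To make that concrete, here is a small sketch using a tiny expression tree as the fixed set of types. The Expr struct and the eval and show functions are hypothetical names chosen for illustration.

```cpp
#include <string>

// Hypothetical fixed set of node kinds for a tiny expression tree.
struct Expr {
    enum Kind { NUM, ADD, MUL };
    Kind kind;
    int value;        // used by NUM
    const Expr* lhs;  // used by ADD and MUL
    const Expr* rhs;  // used by ADD and MUL
};

// First operation: one switch, in one place.
int eval(const Expr& e) {
    switch (e.kind) {
    case Expr::NUM: return e.value;
    case Expr::ADD: return eval(*e.lhs) + eval(*e.rhs);
    case Expr::MUL: return eval(*e.lhs) * eval(*e.rhs);
    }
    return 0;  // not reached for valid tags
}

// A later operation is added without touching Expr or eval.
std::string show(const Expr& e) {
    switch (e.kind) {
    case Expr::NUM: return std::to_string(e.value);
    case Expr::ADD: return "(" + show(*e.lhs) + " + " + show(*e.rhs) + ")";
    case Expr::MUL: return "(" + show(*e.lhs) + " * " + show(*e.rhs) + ")";
    }
    return "";  // not reached for valid tags
}
```

With virtual functions, adding show() would have meant a new pure virtual in the base class plus an override in every node class. The trade-off runs the other way when a new node kind is added: every switch must then be revisited.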

Chris Dodd
+1  A: 

My take is: It depends.

A parameterized Factory Method design pattern relies on this technique.

class Creator {
    public:
        virtual Product* Create(ProductId);
};

Product* Creator::Create (ProductId id) {
        if (id == MINE)  return new MyProduct;
        if (id == YOURS) return new YourProduct;
        // repeat for remaining products...

        return 0;
}

So, is this bad? I don't think so, as we do not have any other alternative at this stage. This is a place where it is absolutely necessary, as it involves the creation of an object whose type is not yet known.

The example in OP is however an example which sure needs refactoring. Here we are already dealing with an existing object/type (passed as argument to function).

As Herb Sutter mentions -

"Switch off: Avoid switching on the type of an object to customize behavior. Use templates and virtual functions to let types (not their calling code) decide their behavior."

Chubsdad
This isn't exactly what I meant. Here, you're still creating either a MyProduct or a YourProduct: two different classes. My question would involve always creating an 'OurProduct', but that product has a type _in_ the class that determines whether it behaves like a MyProduct or a YourProduct. The actual classes MyProduct and YourProduct wouldn't exist. This is the case with the Shape class in my example. There is no triangle class. There is only a Shape with type == TRIANGLE.
JoshD
"place where it is absolutely necessary"... not so... objects can register their ids and creation functions, or you can successively invoke registered construct-if-you-can functions (read up on creation patterns in GoF), so the kind of centralised switching shown above isn't necessary. Not saying that it isn't still a simpler and perfectly acceptable solution for more localised code bases.
Tony
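The registration approach mentioned in the last comment could be sketched roughly as follows. The Registry class, the CreateFn typedef, and the make() helpers are hypothetical; only the Product, MyProduct, YourProduct and ProductId names come from the answer's example, and ProductId is assumed here to be an int.

```cpp
#include <map>
#include <string>

struct Product {
    virtual ~Product() = default;
    virtual std::string name() const = 0;
};

typedef int ProductId;            // assumption: ids are plain integers
typedef Product* (*CreateFn)();   // hypothetical creation-function signature

// Hypothetical registry mapping ids to creation functions,
// replacing the chain of if statements in Creator::Create.
class Registry {
public:
    void add(ProductId id, CreateFn fn) { creators_[id] = fn; }

    // Caller owns the result; returns nullptr for an unregistered id.
    Product* create(ProductId id) const {
        auto it = creators_.find(id);
        return it == creators_.end() ? nullptr : it->second();
    }

private:
    std::map<ProductId, CreateFn> creators_;
};

struct MyProduct : Product {
    std::string name() const override { return "MyProduct"; }
    static Product* make() { return new MyProduct; }
};

struct YourProduct : Product {
    std::string name() const override { return "YourProduct"; }
    static Product* make() { return new YourProduct; }
};
```

Each product registers itself with a call such as registry.add(1, &MyProduct::make), so supporting a new product does not require editing a central function. As the comment says, the simple centralized version can still be perfectly acceptable for a small, local code base.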