Is there a compile-time way to detect / prevent duplicate values within a C/C++ enumeration?

The catch is that there are multiple items which are initialized to explicit values.

Background:

I've inherited some C code such as the following:

#define BASE1_VAL    (5)
#define BASE2_VAL    (7)

typedef enum
{
  MsgFoo1A = BASE1_VAL,       // 5
  MsgFoo1B,                   // 6
  MsgFoo1C,                   // 7
  MsgFoo1D,                   // 8
  MsgFoo1E,                   // 9
  MsgFoo2A = BASE2_VAL,       // Uh oh!  7 again...
  MsgFoo2B                    // Uh oh!  8 again...
} FOO;

The problem is that as the code grows & as developers add more messages to the MsgFoo1x group, eventually it overruns BASE2_VAL.

This code will eventually be migrated to C++, so if there is a C++-only solution (template magic?), that's OK -- but a solution that works with C and C++ is better.

+2  A: 

I don't believe there's a way to detect this with the language itself, considering there are conceivable cases where you'd want two enumeration values to be the same. You can, however, always ensure all explicitly set items are at the top of the list:

typedef enum
{
  MsgFoo1A = BASE1_VAL,       // 5
  MsgFoo2A = BASE2_VAL,       // 7
  MsgFoo1B,                   // 8
  MsgFoo1C,                   // 9
  MsgFoo1D,                   // 10
  MsgFoo1E,                   // 11
  MsgFoo2B                    // 12
} FOO;

So long as assigned values are at the top, no collision is possible, unless for some reason the macros expand to values which are the same.

Usually this problem is overcome by giving a fixed number of bits to each MsgFooX group, and ensuring each group does not overflow its allotted number of bits. The "number of bits" solution is nice because it allows a bitwise test to determine which message group something belongs to. But there's no built-in language feature to do this, because there are legitimate cases for an enum having two of the same value:

typedef enum
{
    gray = 4, //Gr[ae]y should be the same
    grey = 4,
    color = 5, //Also makes sense in some cases
    couleur = 5
} FOO;
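
For illustration, the fixed-bits-per-group idea might look something like the sketch below. The 8-bit split, the MSG_GROUP macros and the msg_group() helper are assumptions made up for this example, not part of the original code, and the static_assert needs C11 or C++11:

```c
#include <assert.h>  /* static_assert macro in C11; a keyword in C++11 */

/* Hypothetical layout: the low 8 bits number a message within its group,
   the bits above them select the group. */
#define MSG_GROUP_SHIFT  8
#define MSG_GROUP(n)     ((n) << MSG_GROUP_SHIFT)

typedef enum
{
  MsgFoo1A = MSG_GROUP(1),    /* 0x100 */
  MsgFoo1B,                   /* 0x101 */
  MsgFoo1C,                   /* 0x102 */
  MsgFoo1D,                   /* 0x103 */
  MsgFoo1E,                   /* 0x104 */
  MsgFoo2A = MSG_GROUP(2),    /* 0x200 */
  MsgFoo2B                    /* 0x201 */
} FOO;

/* Guard for the last member of group 1; it has to be updated by hand
   whenever a new last member is appended to the group. */
static_assert(MsgFoo1E < MSG_GROUP(2), "MsgFoo1x group overflowed its bits");

/* Bitwise test: which group does a message belong to? */
static int msg_group(FOO msg)
{
  return (int)msg >> MSG_GROUP_SHIFT;
}
```

Within a group the values stay consecutive, so MsgFoo1B is still MsgFoo1A + 1; only the starting points of the groups are spread apart.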
Billy ONeal
Sorry, I should have mentioned that the enums should stay ordered consecutively, i.e. MsgFoo1B should be identical to (MsgFoo1A + 1) - my bad.
Dan
+7  A: 

I can think of one bizarre trick you can do to catch this at compile time, but it might not always work for you. Start by inserting a "marker" enum value right before MsgFoo2A.

typedef enum
{
    MsgFoo1A = BASE1_VAL,
    MsgFoo1B,
    MsgFoo1C,
    MsgFoo1D,
    MsgFoo1E,
    MARKER_1_DONT_USE, /* Don't use this value, but leave it here.  */
    MsgFoo2A = BASE2_VAL,
    MsgFoo2B
} FOO;

Then add a file to check that MARKER_1_DONT_USE is not greater than BASE2_VAL (or MsgFoo2A if you like).

extern int IGNORE_ENUM_CHECK[MARKER_1_DONT_USE > BASE2_VAL ? -1 : 1];

Almost every compiler ever written will generate an error if MARKER_1_DONT_USE is greater than BASE2_VAL. GCC spits out:

test.c:16: error: size of array ‘IGNORE_ENUM_CHECK’ is negative
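
If a C11 or C++11 compiler is available (an assumption; the question does not say which standard is in use), the same marker check could probably be written with static_assert, which gives a readable message instead of a negative array size. A minimal sketch, with the enum shortened to two messages in the first group so that it compiles cleanly:

```c
#include <assert.h>  /* static_assert macro in C11; a keyword in C++11 */

#define BASE1_VAL    (5)
#define BASE2_VAL    (7)

typedef enum
{
    MsgFoo1A = BASE1_VAL,   /* 5 */
    MsgFoo1B,               /* 6 */
    MARKER_1_DONT_USE,      /* 7: one past the last MsgFoo1x value */
    MsgFoo2A = BASE2_VAL,   /* 7 */
    MsgFoo2B                /* 8 */
} FOO;

/* Fails to compile as soon as the MsgFoo1x group grows past BASE2_VAL. */
static_assert(MARKER_1_DONT_USE <= BASE2_VAL,
              "MsgFoo1x group overruns BASE2_VAL");
```

Note that the marker is allowed to equal BASE2_VAL: it sits one past the last real MsgFoo1x value, so equality means the first group exactly fills its range.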
Dietrich Epp
+1 for evil preprocessor hacks :)
Billy ONeal
It doesn't use the preprocessor, but it is an evil hack nonetheless.
Dietrich Epp
Oops -- misread. Yes, very fun evil hack :)
Billy ONeal
Dan
+1  A: 

You could roll a more robust solution by defining enums using Boost.Preprocessor - whether it's worth the time is a different matter.

If you are moving to C++ anyway, maybe the (proposed) Boost.Enum suits you (available via the Boost Vault).

Another approach might be to use something like gccxml (or more comfortably pygccxml) to identify candidates for manual inspection.

Georg Fritzsche
I wish I could upvote this (because I upvote almost anything mentioning boost) but I've not used this specific library before and can't :( Do you have an example which demonstrates how it might solve the OP's problem?
Billy ONeal
@Billy: I mostly only know what Boost.PP can do - I haven't had time to really look into it. The proposed Boost.Enum should be a good start though; using its example for generating the enum definition and also a suitable compile-time check shouldn't be too hard.
Georg Fritzsche
Makes me realize I'm not using Boost nearly enough, or at least nowhere near its functionality - thanks.
Dan
+3  A: 

I don't know of anything that will automatically check all enum members, but if you want to check that future changes to the initializers (or the macros they rely on) don't cause collisions:

switch (0) {
    case MsgFoo1A: break;
    case MsgFoo1B: break;
    case MsgFoo1C: break;
    case MsgFoo1D: break;
    case MsgFoo1E: break;
    case MsgFoo2A: break;
    case MsgFoo2B: break;
}

will cause a compiler error if any of the integral values is reused, and most compilers will even tell you what value (the numeric value) was a problem.
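
Since a switch has to live inside a function, one way to package the check is a small function that is never expected to do real work. This is only a sketch: the function name foo_enum_uniqueness_check is made up for this example, and BASE2_VAL is set to 10 here (instead of the question's 7) so that the example compiles.

```c
#define BASE1_VAL    (5)
#define BASE2_VAL    (10)   /* assumed value; 7 would trigger the error */

typedef enum
{
  MsgFoo1A = BASE1_VAL,       /* 5 */
  MsgFoo1B,                   /* 6 */
  MsgFoo1C,                   /* 7 */
  MsgFoo1D,                   /* 8 */
  MsgFoo1E,                   /* 9 */
  MsgFoo2A = BASE2_VAL,       /* 10 */
  MsgFoo2B                    /* 11 */
} FOO;

/* Exists only so the compiler sees every enumerator as a case label;
   two enumerators with the same value produce a duplicate-case error. */
static void foo_enum_uniqueness_check(void)
{
    switch (0) {
        case MsgFoo1A: break;
        case MsgFoo1B: break;
        case MsgFoo1C: break;
        case MsgFoo1D: break;
        case MsgFoo1E: break;
        case MsgFoo2A: break;
        case MsgFoo2B: break;
    }
}
```

Changing BASE2_VAL back to 7 should make MsgFoo2A collide with MsgFoo1C and stop the build. The list of case labels still has to be kept in sync with the enum by hand.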

Ben Voigt
Clever -- never occurred to me - kudos.
Dan
+5  A: 

I didn't see "pretty" in your requirements, so I submit this solution implemented using the Boost Preprocessor library.

As an up-front disclaimer, I haven't used Boost.Preprocessor a whole lot and I've only tested this with the test cases presented here, so there could be bugs, and there may be an easier, cleaner way to do this. I certainly welcome comments, corrections, suggestions, insults, etc.

Here we go:

#include <boost/preprocessor.hpp>

#define EXPAND_ENUM_VALUE(r, data, i, elem)                          \
    BOOST_PP_SEQ_ELEM(0, elem)                                       \
    BOOST_PP_IIF(                                                    \
        BOOST_PP_EQUAL(BOOST_PP_SEQ_SIZE(elem), 2),                  \
        = BOOST_PP_SEQ_ELEM(1, elem),                                \
        BOOST_PP_EMPTY())                                            \
    BOOST_PP_COMMA_IF(BOOST_PP_NOT_EQUAL(data, BOOST_PP_ADD(i, 1)))

#define ADD_CASE_FOR_ENUM_VALUE(r, data, elem) \
    case BOOST_PP_SEQ_ELEM(0, elem) : break;

#define DEFINE_UNIQUE_ENUM(name, values)                                  \
enum name                                                                 \
{                                                                         \
    BOOST_PP_SEQ_FOR_EACH_I(EXPAND_ENUM_VALUE,                            \
                            BOOST_PP_SEQ_SIZE(values), values)            \
};                                                                        \
                                                                          \
namespace detail                                                          \
{                                                                         \
    void UniqueEnumSanityCheck##name()                                    \
    {                                                                     \
        switch (name())                                                   \
        {                                                                 \
            BOOST_PP_SEQ_FOR_EACH(ADD_CASE_FOR_ENUM_VALUE, name, values)  \
        }                                                                 \
    }                                                                     \
}

We can then use it like so:

DEFINE_UNIQUE_ENUM(DayOfWeek, ((Monday)    (1))
                              ((Tuesday)   (2))
                              ((Wednesday)    )
                              ((Thursday)  (4)))

The enumerator value is optional; this code generates an enumeration equivalent to:

enum DayOfWeek
{
    Monday = 1,
    Tuesday = 2,
    Wednesday,
    Thursday = 4
};

It also generates a sanity-check function that contains a switch statement as described in Ben Voigt's answer. If we change the enumeration declaration such that we have non-unique enumerator values, e.g.,

DEFINE_UNIQUE_ENUM(DayOfWeek, ((Monday)    (1))
                              ((Tuesday)   (2))
                              ((Wednesday)    )
                              ((Thursday)  (1)))

it will not compile (Visual C++ reports the expected error C2196: case value '1' already used).

Thanks also to Matthieu M., whose answer to another question got me interested in the Boost Preprocessor library.

James McNellis
Looks great. Maybe move the check into a `detail` namespace to keep it out of the way. EDIT: <3.
GMan
@GMan: Good idea. Done.
James McNellis
@GMan: Now that I did that, though... doesn't the Boost.Preprocessor library work with C? (I'd presume that it would.) If so, maybe not using `namespace` would be better so it works for both languages. Eh... I'll think about that in the morning.
James McNellis
@James: Hm, good point. I'm pretty sure Boost.PP (is the only Boost library that) works in C. In C, a prefix of `detail_` would work well enough. You could even use `__cplusplus` to choose, so the same header works on both. (Generating different sources obviously.)
GMan
Woo great :) A pity you could not find a better way to define the value rather than embedding a sequence in a sequence, I'm sure there exists a way to do it with a simple comma, but I'm afraid that would require doing without the help of Boost PP and I haven't had the courage to launch myself into it yet :) As for the switch, may I suggest you write something like a `toString` method ? Instead of being a pure compiler guard we could after all use the method for real work :) */me feels very proud that my response got cited*
Matthieu M.
@Matthieu, the [ENUM macro](http://stackoverflow.com/questions/300592/enum-in-c-like-enum-in-ada/1436615#1436615) uses `=` to define an initial value.
Johannes Schaub - litb
Absolutely right -- "pretty" definitely takes a backseat to robust, correct, and easy to use. And ultimately, the DEFINE_UNIQUE_ENUM() macro hides the complexity. Kudos.
Dan
@litb: the problem with the use of `=` is that you cannot use the enumerated values for anything else afterward. In that case for example, you could not use the `switch` trick.
Matthieu M.