I've seen several questions/discussions here about the best way to handle and persist enum-like values (e.g. http://stackoverflow.com/questions/492096/persisting-data-suited-for-enums , http://stackoverflow.com/questions/256978/how-to-persist-an-enum-using-nhibernate ), and I'd like to ask what the general consensus is.
I've tried to summarize my understanding. I'm marking this community wiki in the hope of getting a sort of expert consensus :-). So here it goes:
In the code
In the code, enums should be handled using either the language's native enum type (at least in Java and C#), or using something like the "typesafe enum pattern". Using plain constants (Integer or similar) is discouraged, because you lose type safety (and make it hard to understand which values are legal input for e.g. a method).
The choice between these two depends on how much additional functionality is to be attached to the enum:
- If you want to put loads of functionality into the enum (which is good, because you avoid switch()ing on it all the time), a class is usually more appropriate.
- On the other hand, for simple enum-like values, the language's enum is usually clearer.
In particular, at least in Java an enum cannot extend another class (it can only implement interfaces), so if you have several enums with similar behavior which you'd like to put into a superclass, you cannot use Java's enums.
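To make the two options concrete, here is a minimal Java sketch. All names (`Operation`, `CodedValue`, `OrderStatus`) are made up for illustration. The first type is a native enum that carries its own behavior, so callers don't need to switch() on it. The second is a hand-written "typesafe enum" class, which is the route you need once shared behavior has to live in a superclass.

```java
// Option 1: native Java enum with per-constant behavior (no switch needed).
// "Operation" is a hypothetical example type.
enum Operation {
    ADD("+") {
        @Override int apply(int a, int b) { return a + b; }
    },
    MULTIPLY("*") {
        @Override int apply(int a, int b) { return a * b; }
    };

    private final String symbol;

    Operation(String symbol) { this.symbol = symbol; }

    String symbol() { return symbol; }

    // Each constant supplies its own implementation.
    abstract int apply(int a, int b);
}

// Option 2: typesafe enum pattern. A plain class can extend a shared
// superclass, which a Java enum cannot do.
// "CodedValue" and "OrderStatus" are hypothetical example types.
abstract class CodedValue {
    private final String code;

    protected CodedValue(String code) { this.code = code; }

    public String code() { return code; }

    @Override public String toString() {
        return getClass().getSimpleName() + "(" + code + ")";
    }
}

final class OrderStatus extends CodedValue {
    static final OrderStatus OPEN = new OrderStatus("OPEN");
    static final OrderStatus SHIPPED = new OrderStatus("SHIP");

    // Private constructor: the constants above are the only instances.
    private OrderStatus(String code) { super(code); }
}
```

With the class-based variant you give up things the native enum provides for free (`values()`, use in `switch`, safe serialization), so it is probably only worth it when the shared superclass really pays off.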
Persisting enums
To persist enums, each enum value should be assigned a unique ID. This can be either an integer, or a short string. A short string is preferred, since it can be mnemonic (makes it easier for DBAs etc. to understand the raw data in the db).
- In the software, every enum should then have mapping functions to convert between the enum (for use inside the software) and the ID value (for persisting). Some frameworks (e.g. (N)Hibernate) have limited support for doing this automatically. Otherwise, you have to put it into the enum type/class.
- The database should (ideally) contain a table for each enum listing the legal values. One column would be the ID (see above), which is the PK. Additional columns might make sense for e.g. a description. All table columns that will contain values from that enum can then reference this "enum table" via a FK. This guarantees that incorrect enum values can never be persisted, and allows the DB to "stand on its own".
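As a rough sketch of the mapping functions mentioned above, assuming Java and a short mnemonic string as the ID: the enum `PaymentMethod`, its IDs and the method names `toId`/`fromId` are invented for this example and not taken from any framework.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical enum whose constants each carry a short, stable ID.
// The ID (not the constant name or ordinal) is what gets persisted, so
// constants can be renamed or reordered without touching stored data.
enum PaymentMethod {
    CREDIT_CARD("CC"),
    BANK_TRANSFER("BT"),
    CASH("CA");

    // Reverse lookup table, built once when the enum class is initialized.
    private static final Map<String, PaymentMethod> BY_ID = new HashMap<>();
    static {
        for (PaymentMethod m : values()) {
            BY_ID.put(m.id, m);
        }
    }

    private final String id;

    PaymentMethod(String id) { this.id = id; }

    // enum -> ID, used when writing to the database.
    String toId() { return id; }

    // ID -> enum, used when reading from the database.
    static PaymentMethod fromId(String id) {
        PaymentMethod m = BY_ID.get(id);
        if (m == null) {
            throw new IllegalArgumentException("Unknown PaymentMethod id: " + id);
        }
        return m;
    }
}
```

The matching "enum table" would then hold one row per ID ("CC", "BT", "CA"), and any column storing a payment method would reference it via a FK.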
One problem with this approach is that the list of legal enum values exists in two places (code and database). This is hard to avoid and therefore often considered acceptable, but there are two alternatives:
- Only keep the list of values in the DB, generate the enum type at build time. Elegant, but means that a DB connection is required for a build to run, which seems problematic.
- Define the list of values in the code to be authoritative. Check against the values in the DB at runtime (usually at startup), complain/abort on mismatch.
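The second alternative (code is authoritative, DB is checked at startup) could look roughly like the following Java sketch. `ShippingMode`, `EnumTableCheck` and `verify` are hypothetical names, and how the IDs are actually read from the enum table (JDBC, an ORM, ...) is deliberately left out; the check simply receives them as a set.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical enum with persisted IDs.
enum ShippingMode {
    STANDARD("STD"),
    EXPRESS("EXP");

    private final String id;

    ShippingMode(String id) { this.id = id; }

    String toId() { return id; }
}

// Hypothetical startup check comparing the IDs defined in code with the
// IDs found in the corresponding enum table.
final class EnumTableCheck {
    private EnumTableCheck() {}

    // Collects the IDs the code considers legal.
    static Set<String> shippingModeIds() {
        Set<String> ids = new TreeSet<>();
        for (ShippingMode m : ShippingMode.values()) {
            ids.add(m.toId());
        }
        return ids;
    }

    // Throws if the two sets differ in either direction.
    static void verify(String tableName, Set<String> idsInCode, Set<String> idsInDb) {
        Set<String> missingInDb = new TreeSet<>(idsInCode);
        missingInDb.removeAll(idsInDb);

        Set<String> missingInCode = new TreeSet<>(idsInDb);
        missingInCode.removeAll(idsInCode);

        if (!missingInDb.isEmpty() || !missingInCode.isEmpty()) {
            throw new IllegalStateException(
                "Enum table " + tableName + " does not match code:"
                + " missing in DB " + missingInDb
                + ", missing in code " + missingInCode);
        }
    }
}
```

At startup you would call something like `EnumTableCheck.verify("shipping_mode", EnumTableCheck.shippingModeIds(), idsLoadedFromDb)` once per enum and abort on the exception. Whether extra rows in the DB should be fatal or only logged is a judgment call; this sketch treats both directions as errors.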