I have a form where users submit different fields to create events. The number and type of fields requested are different on each form, depending on the category of event. What is the best way of going about designing this database - should the events contain every possible field and simply null the unused fields? Thanks!
That would work, but if you want to do it well, you can create separate tables for the different types of events, or simply serialize the extra data into one field and save it that way. Serializing would make searching harder, but it is an option.
As a rule of thumb, I would say: if there will be many NULLs, make the separate tables; if not, put the many fields in one table.
It depends on how drastically different your forms are. I say have a different field for each entry item... having 1 field with multiple items packed into it will just make queries much more difficult. If your forms aren't too different, then 1 table with each possible field would be OK, but if your table has 20+ fields in it, then I'd suggest splitting those tables up. I'd also recommend a header table with a "form type" field to help searches as well, something like the sketch below.
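A minimal sketch of that header-table idea (all table and column names here are mine, not from the question):

    -- Common header row for every submission, tagged with its form type.
    CREATE TABLE event_header (
        event_id   INT PRIMARY KEY,
        form_type  VARCHAR(20)  NOT NULL,  -- e.g. 'DINNER', 'CONCERT'
        event_name VARCHAR(100) NOT NULL
    );

    -- One detail table per sufficiently different form type, 1:1 with the header.
    CREATE TABLE dinner_detail (
        event_id INT PRIMARY KEY REFERENCES event_header (event_id),
        plates   INT
    );

    CREATE TABLE concert_detail (
        event_id INT PRIMARY KEY REFERENCES event_header (event_id),
        seats    INT
    );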
You should normalize your table as far as you can to reduce the number of nulls in the database. Records should be meaningful if they're stored. One method might be to have a table of categories that relates 1->m to a table of events. Then you could have a table of expected fields in forms (giving each an int id). Then an intermediate table would store the actual submitted data.
    Category:  catID | category
    Event:     eventID | event | catID           (each event belongs to one category, 1->m)
    Field:     fldID | fldName | eventID         (the expected form fields, 1->m from Event)
    Data:      dataID | fldID | eventID | data   (the actual submitted values)
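As a rough SQL sketch of that shape (table and column names taken from the diagram above; the types are my assumptions):

    CREATE TABLE category (
        catID    INT PRIMARY KEY,
        category VARCHAR(50) NOT NULL
    );

    CREATE TABLE event (
        eventID INT PRIMARY KEY,
        event   VARCHAR(100) NOT NULL,
        catID   INT NOT NULL REFERENCES category (catID)
    );

    -- The fields expected on an event's form.
    CREATE TABLE field (
        fldID   INT PRIMARY KEY,
        fldName VARCHAR(50) NOT NULL,
        eventID INT NOT NULL REFERENCES event (eventID)
    );

    -- The actual submitted data, one row per field value.
    CREATE TABLE data (
        dataID  INT PRIMARY KEY,
        fldID   INT NOT NULL REFERENCES field (fldID),
        eventID INT NOT NULL REFERENCES event (eventID),
        data    VARCHAR(255)
    );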
I'd think carefully about this abstraction, but you can also have a linked table that contains the event details:
Table Event: id, Name
Table EventDetail: id, EventID, DetailFieldName, DetailText
One Event record can have many EventDetail records.
This is flexible, but again there are tradeoffs to consider. Your queries will get more complicated, and presenting the results involves an additional layer of indirection: you have to loop through all of the EventDetail records for a particular Event record to present the whole thing.
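For example, pulling one event and its details together might look something like this (a sketch against the two tables above; the literal 42 is an arbitrary example id):

    -- Fetch an Event and all of its EventDetail rows in one pass;
    -- the application then loops over the detail rows to render the whole thing.
    SELECT e.id,
           e.Name,
           d.DetailFieldName,
           d.DetailText
    FROM   Event e
           LEFT JOIN EventDetail d ON d.EventID = e.id
    WHERE  e.id = 42;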
You can go all out and also normalize the DetailFieldName out into an EventDetailField table if you want.
However, you do end up with only a couple of tables, you can get rid of NULLs completely if you want, and you don't have to design a new table for each particular event type.
Choose your poison. ;) Normalizing has its place, but I've also found that it makes certain tasks very difficult if you normalize too much.
If you begin to consider Joel's advice, please go here, or here, or here.
And if you don't believe any of them, build the 4 tables he mentions. There are only 4; it doesn't take long. Then load some data into them... then try to write the queries you want to write...
Changing Column meaning:
This can really screw with cardinality estimates. Your dinner plates might be in the 4-20 range, the concert seats between 1000 and 2000. Some cardinality calculations look at the spread from min to max and assume an equal distribution (when lacking other statistics)...
From 4 to 2000, that means for any WHERE GENERIC_COLUMN = n, the optimizer estimates you'll hit 1/1996th of the total rows... but really, if you said WHERE EVNT_TYPE = 'Dinner' AND GENERIC_COLUMN = n, the value would REALLY be between 4 and 20, or about 1/16th of those rows... so a huge swing in the cardinality estimate. (This can be fixed with histograms, but the point of showing the automation issue is just to highlight that if it's a problem for a machine, the design is probably not as clean as it could be.)
So if you were to do this (MUCH BETTER than an EAV but...)
I would recommend creating a view for each object.
    Table EVENT   (common fields, Generic_Count)
    View  DINNER  (common fields, Generic_Count AS Plates)  WHERE type = 'Dinner'
    View  CONCERT (common fields, Generic_Count AS Seats)   WHERE type = 'Concert'
Then grant NO ONE SELECT against EVENT itself.
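A sketch of that arrangement in SQL (column names beyond EVNT_TYPE and Generic_Count are my own illustration):

    -- One physical table, one view per event type.
    CREATE TABLE EVENT (
        event_id      INT PRIMARY KEY,
        event_name    VARCHAR(100) NOT NULL,
        EVNT_TYPE     VARCHAR(20)  NOT NULL,
        Generic_Count INT
    );

    CREATE VIEW DINNER AS
        SELECT event_id, event_name, Generic_Count AS Plates
        FROM   EVENT
        WHERE  EVNT_TYPE = 'Dinner';

    CREATE VIEW CONCERT AS
        SELECT event_id, event_name, Generic_Count AS Seats
        FROM   EVENT
        WHERE  EVNT_TYPE = 'Concert';

    -- Then grant SELECT on the views only, never on EVENT directly.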
But this is where you get into trouble by NOT starting with a conceptual data model first.
You'd have an ENTITY for EVENT, another for DINNER which inherits completely from EVENT, and another for CONCERT which inherits completely from EVENT. Then you could set a differentiating column in the inheritance object, which lets you set the "TYPE" column, and then you could even decide how many tables to build with the flick of a switch: 1 table, 2 tables, or 3 tables.
At least you can do that in PowerDesigner.
Why is DDL considered so 'bad'?
The creation of EAV models, and questions like this one, are organized around the idea that DDL is to be avoided. Why ALTER TABLE when you can INSERT a new attribute row? People make poor data model design decisions based on the wrong utility function, things like 'no nullable columns', 'the fewer the tables the better', 'no DDL just to add a new attribute; insert into an Attribute table instead'.
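To make the contrast concrete, here is the same change expressed both ways (the EVENT_ATTRIBUTE table and its columns are hypothetical):

    -- DDL approach: the schema itself records that events can have plates.
    ALTER TABLE EVENT ADD plates INT;

    -- EAV approach: the schema stays generic and the data carries the meaning.
    INSERT INTO EVENT_ATTRIBUTE (event_id, attr_name, attr_value)
    VALUES (42, 'plates', '12');  -- every value squeezed through a string column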
Think of data modeling like this: sculptors will say that the wood or stone already has the figure inside of the block, they are just removing pieces of it to reveal it.
Your problem space already has a data model, it's just your job to discover it... it will have as many tables and columns as it needs. Trying to force it to conform to one of the above utility functions is where things go horribly wrong.
In your case, would you ever like to know all the events you've added in the past 2 weeks? Now think of the possible models. One table per event type would mean summing over n tables to find that answer, and with each new event type a new table gets added, so every "all events" query would keep changing. You could build a UNION ALL view of those tables, but you'd have to remember to add each new table to the view, and debugging through views like that is a pain.
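A sketch of that UNION ALL view, assuming one table per event type (all names here are hypothetical):

    CREATE VIEW all_events AS
        SELECT event_id, event_name, created_at FROM dinner_event
        UNION ALL
        SELECT event_id, event_name, created_at FROM concert_event;
        -- ...plus another SELECT to remember for every future event type.

    -- "Events added in the past 2 weeks" then has to go through the view:
    -- SELECT * FROM all_events WHERE created_at >= CURRENT_DATE - 14;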
Assuming that you might want a lot of metrics about ALL events, one table makes more sense, at least for some common portion of your event data, like event name, sponsor ID, venue ID, event start time, event end time, venue-available-for-setup time, etc. Those fields are (let's stipulate) common to every event.
So now what to do with the other columns? Two options: nullable fields, or vertically partitioning the table. The latter is an optimization of the former. And if you read any database optimization books or blogs, the major thing I take from them is that premature optimization kills. I see people implementing lots of strategies for problems before they even know if they will have that problem. A coworker had a slow query he wanted me to help with; it was loaded with optimizer hints. I removed them and the SQL screamed. I don't know WHY he hinted it, but he was not doing it effectively, and I'm pretty sure he never actually saw an issue, so this was all just premature optimization.
Vertical partitioning is something you do when you have large data volumes, with some frequently accessed data and other data that is not so useful. You can pack a table into a lot fewer blocks if you only pack it with the common columns. More rows per block = faster table scans... it doesn't really affect the speed of finding a single row via an index. As you can see, vertical partitioning has specific problems it can solve (others too, like row chaining), so if you're sure that's GOING to be an issue, then by all means begin that way.
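For what that split might look like, here is a minimal sketch (names are hypothetical; the two tables share a 1:1 key):

    -- Hot table: the common columns most queries touch, packed densely.
    CREATE TABLE event_common (
        event_id   INT PRIMARY KEY,
        event_name VARCHAR(100) NOT NULL,
        start_time TIMESTAMP,
        end_time   TIMESTAMP
    );

    -- Cold table: the rarely used columns, 1:1 with event_common.
    CREATE TABLE event_extra (
        event_id INT PRIMARY KEY REFERENCES event_common (event_id),
        plates   INT,  -- NULL unless it's a dinner
        seats    INT   -- NULL unless it's a concert
    );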