I am trying to teach myself how to use SQL, specifically MySQL.

What I am trying to understand is how to deal with many different types of data within the same table. Say I am building a web application, and I have many different content types (blog item, comment item, files, pages, forms), each of which needs its own set of data fields. Would I create a new table for each content type, since each one has its own unique field requirements, or is there a better way to do this? Creating a new table for every content type seems excessive: if I had 30 types of content in my web app, that would be 30 tables just for the types. And if I added a new content type, I would have to create a new table containing all the fields that type requires.

Is there a better way to do something like this, when I have many different types of content that each require different fields of data to go into the database? Can I somehow check what type the content is, then select from another table that holds the fields for that type?

A little confused about what to do.

A: 

Just to give an example:

Stack Overflow itself uses the same database table (called Posts) for questions and answers. Even though these two types of data are not identical, the site creators considered them similar enough to put them into one table. There's a PostTypeId field that says whether a post is a question or an answer. On answers, the Title field would be NULL; on questions, other columns might be ignored.
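
A minimal sketch of that single-table idea, assuming SQLite (through Python's built-in sqlite3 module) in place of MySQL so that it runs on its own. Only Posts, PostTypeId and Title come from the description above; the other columns, the type codes 1 and 2, and the sample rows are illustrative assumptions, not Stack Overflow's actual schema.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Posts (
        Id         INTEGER PRIMARY KEY,
        PostTypeId INTEGER NOT NULL,  -- hypothetical codes: 1 = question, 2 = answer
        ParentId   INTEGER,           -- hypothetical: set only on answers
        Title      TEXT,              -- set only on questions, NULL on answers
        Body       TEXT NOT NULL
    )
""")

rows = [
    (1, 1, None, "One table or many?", "How should I store content types?"),
    (2, 2, 1, None, "It depends."),
    (3, 2, 1, None, "Start with separate tables."),
]
conn.executemany("INSERT INTO Posts VALUES (?, ?, ?, ?, ?)", rows)

# The discriminator column selects one "type" out of the shared table.
questions = conn.execute(
    "SELECT Id, Title FROM Posts WHERE PostTypeId = 1"
).fetchall()
answers = conn.execute(
    "SELECT Id, Title FROM Posts WHERE PostTypeId = 2 ORDER BY Id"
).fetchall()
```

Both types live in one table, and each query filters on the type column; the unused columns simply stay NULL.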

Comments, on the other hand, are in a different table. Of course you could theoretically put them into the same Posts table and have a PostTypeId for comments. But the overhead this would create (because of the lightweightness of comments) justifies creating a new table.

I know this isn't really an answer, and other developers might even have decided to put questions and answers into different tables; but it gives some perspective. Long story short: It depends :)

balpha
Yes, this is along the lines of what I am looking for. Only, on a larger scale with many many content types.
Nic Hubbard
A: 

Sketch interactions

First, try to think not about database design but about how the entities should interact with each other. Think of each entity as having its own class, which represents the data it requires.

It's always a good start to take pencil and paper and sketch the interactions (or relations) between these entities that you are trying to accomplish. See also: Learning the Database design process.

Extendability and reuse

For example, you want to have a User who can post BlogPosts. Each BlogPost can have a set of Tags and a set of related Comments. Attachments can be added to a BlogPost and also to a Comment.

Reusability and extendability are the key. When sketching your interactions, try to isolate dependencies. Think of it in an OO manner. Let's explore the Attachment a little more. You can create an Attachment table and then extend Attachment by creating BlogPostAttachment and CommentAttachment, where you can easily create relations between these dependent entities. This creates an easily extendable content type which you can further reuse in, e.g., UserDetailsAttachment.
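
One possible reading of the Attachment idea as tables, sketched with SQLite through Python's sqlite3 module rather than MySQL so that it runs standalone. The table names follow the text above; every column name and the sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE BlogPost (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE Comment (
        id           INTEGER PRIMARY KEY,
        blog_post_id INTEGER NOT NULL REFERENCES BlogPost(id),
        body         TEXT NOT NULL
    );
    -- Shared base table: fields common to every attachment.
    CREATE TABLE Attachment (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
    -- "Extensions" of Attachment: one row links an attachment to its owner.
    CREATE TABLE BlogPostAttachment (
        attachment_id INTEGER PRIMARY KEY REFERENCES Attachment(id),
        blog_post_id  INTEGER NOT NULL REFERENCES BlogPost(id)
    );
    CREATE TABLE CommentAttachment (
        attachment_id INTEGER PRIMARY KEY REFERENCES Attachment(id),
        comment_id    INTEGER NOT NULL REFERENCES Comment(id)
    );
""")

conn.execute("INSERT INTO BlogPost VALUES (1, 'Hello')")
conn.execute("INSERT INTO Comment VALUES (1, 1, 'Nice post')")
conn.execute("INSERT INTO Attachment VALUES (1, 'diagram.png')")
conn.execute("INSERT INTO Attachment VALUES (2, 'log.txt')")
conn.execute("INSERT INTO BlogPostAttachment VALUES (1, 1)")
conn.execute("INSERT INTO CommentAttachment VALUES (2, 1)")

# Files attached to blog post 1, found through the extension table.
post_files = conn.execute("""
    SELECT a.filename
    FROM Attachment a
    JOIN BlogPostAttachment bpa ON bpa.attachment_id = a.id
    WHERE bpa.blog_post_id = 1
""").fetchall()

# Files attached to comment 1.
comment_files = conn.execute("""
    SELECT a.filename
    FROM Attachment a
    JOIN CommentAttachment ca ON ca.attachment_id = a.id
    WHERE ca.comment_id = 1
""").fetchall()
```

Under this layout, supporting something like UserDetailsAttachment would mean adding one more small linking table while Attachment itself stays unchanged.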

ORMs to the rescue

By studying example code for object-relational mappers like Doctrine or Propel you can pick up some ideas for table extendability. Practical examples are always the best ones.


I know it's a long way to go, but considering what is involved in creating large-scale DB applications with many relations and entity types, it's best to use the help of an ORM in the long run.

Juraj Blahunka
I think what I am looking for here is an EAV model. Is this a good idea?
Nic Hubbard
Not exactly. I am offering you a simple way to extend your code nicely by building a flexible database model. ORMs make your coding easier and less painful. An EAV model would suffer from losing DB integrity; you would have to do all the checks yourself.
Juraj Blahunka
A: 

You needn't be afraid of using many many tables - the database will happily deal with lots of them without complaining. If you let each content type have its own table, you get certain advantages:

  1. Simplicity: Each table can be fairly simple, and the constraints are straightforward. For example if ContentType1 has a field with a relation to another table, you can make that a foreign key in the database design and the RDBMS will take care of data integrity for you.
  2. Indexing efficiency: if ContentType2 needs to be indexed by date but ContentType3 needs to be indexed by name (to take a simple example), having them in two separate tables means each index is there for exactly the data it needs and nothing else. Combining them in one table means you need both indexes covering the combined dataset, which is messier and uses up more disk space.
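
To illustrate both points, here is a small sketch using SQLite through Python's sqlite3 module as a stand-in for MySQL. The table names (Author, BlogItem, FileItem), the columns and the index names are all hypothetical. Note that in MySQL itself, foreign keys are enforced by the InnoDB storage engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- One table per content type, each with only the fields it needs.
    CREATE TABLE BlogItem (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES Author(id),
        posted_on TEXT NOT NULL,
        title     TEXT NOT NULL
    );
    CREATE TABLE FileItem (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        size_bytes INTEGER NOT NULL
    );

    -- Each index covers exactly the data it is needed for.
    CREATE INDEX idx_blogitem_posted_on ON BlogItem(posted_on);
    CREATE INDEX idx_fileitem_name ON FileItem(name);
""")

conn.execute("INSERT INTO Author VALUES (1, 'alice')")
conn.execute("INSERT INTO BlogItem VALUES (1, 1, '2010-01-15', 'First post')")

# A blog item pointing at a non-existent author is rejected by the database.
try:
    conn.execute("INSERT INTO BlogItem VALUES (2, 99, '2010-01-16', 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

blog_count = conn.execute("SELECT COUNT(*) FROM BlogItem").fetchone()[0]
```

The application code never has to check that the author exists; the constraint does it.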

If you need to output a list combining two content types, a UNION of the two tables is easy; and if you need to do that often with large amounts of data, an indexed view can make it cheap (where the RDBMS offers one; MySQL itself has no indexed views).
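
A combined listing might look roughly like this. The sketch uses SQLite through Python's sqlite3 module instead of MySQL, the BlogItem and Page tables and their columns are made up for the example, and it uses UNION ALL rather than plain UNION because rows from the two tables never need de-duplicating against each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BlogItem (
        id INTEGER PRIMARY KEY, title TEXT NOT NULL, created TEXT NOT NULL
    );
    CREATE TABLE Page (
        id INTEGER PRIMARY KEY, title TEXT NOT NULL, created TEXT NOT NULL,
        slug TEXT NOT NULL  -- a field only pages have
    );
    INSERT INTO BlogItem VALUES (1, 'Launch notes', '2010-03-01');
    INSERT INTO BlogItem VALUES (2, 'Follow-up',    '2010-03-05');
    INSERT INTO Page     VALUES (1, 'About us',     '2010-03-03', 'about');
""")

# Select only the columns both types share, tag each row with its type,
# and sort the combined result by date.
combined = conn.execute("""
    SELECT 'blog' AS kind, id, title, created FROM BlogItem
    UNION ALL
    SELECT 'page' AS kind, id, title, created FROM Page
    ORDER BY created DESC
""").fetchall()
```

The literal kind column tells the application which table each row came from, so it can fetch the type-specific fields afterwards if needed.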

On the other hand, if you have two content types which are very similar (as in the StackOverflow case above for example), you can get some advantages from combining them into one table:

  1. Simplicity: You only need to code the table once - if done right (i.e. the two content types are really very similar), this can make your codebase smaller and simpler.
  2. Extensibility: if a third content type crops up which is again similar to the first two, and similar in the same way that the first two match each other, the table can straightforwardly be extended to store all three content types.
  3. Indexing for performance. If the most common way of getting at the data is to combine the two content types and order them by date (say), a field which is common to both content types, then it can be inefficient to have two separate tables which must repeatedly be UNIONed and then sorted. Combining the two content types in one table lets you put a single index on the date field, allowing faster querying (though remember you can get a similar benefit from indexed views).
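
As a rough sketch of the combined-table case, again with SQLite through Python's sqlite3 module standing in for MySQL: the Content table, its kind column and the index name are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Content (
        id      INTEGER PRIMARY KEY,
        kind    TEXT NOT NULL,  -- 'blog' or 'page' (hypothetical discriminator)
        title   TEXT NOT NULL,
        created TEXT NOT NULL,  -- the field common to both content types
        slug    TEXT            -- only used by pages, NULL for blog items
    );
    -- A single index on the shared date column serves both content types.
    CREATE INDEX idx_content_created ON Content(created);

    INSERT INTO Content VALUES (1, 'blog', 'Launch notes', '2010-03-01', NULL);
    INSERT INTO Content VALUES (2, 'page', 'About us',     '2010-03-03', 'about');
    INSERT INTO Content VALUES (3, 'blog', 'Follow-up',    '2010-03-05', NULL);
""")

# The most recent items across both types, with no UNION required.
query = "SELECT kind, title, created FROM Content ORDER BY created DESC LIMIT 2"
latest = conn.execute(query).fetchall()

# Inspect the plan to see whether the index is used; the exact wording of
# the plan rows varies between SQLite versions (MySQL has its own EXPLAIN).
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
```

The price is the nullable slug column: fields that belong to only one type have to tolerate NULL for all the others.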

If you normalize rigorously, you will have a database where every entity type has its own table. However, denormalization in various ways (such as combining two entity types in one table) can have benefits which might (depending on the size and shape of your data) outweigh the costs. I'd advise a strategy of keeping all content types separate, at least at first, and considering combining them as a tactical denormalization if it turns out to be necessary.

vincebowdren