ansaurus

Question

Should I use a string table to make database more efficient?

Answer 1

+4 A:

What are you measuring efficiency by? Assuming there is no other data associated with each FavoriteFood (in which case obviously you want two tables), a one-table approach is probably more time efficient, as the unnecessary join would incur an extra processing cost. On the other hand, a two-table approach may be more space-efficient, since it takes less space to store an index than a string, but that depends on how the particular database that you're using optimizes storage of repeated strings.

Tyler McHenry 2009-06-21 14:44:51

Actually you are right, join process would consume more time and performance and will be worse than repeated string records.

Braveyard 2009-10-21 07:11:27

Answer 2

+2 A:

In case you have another table to store the strings, it will be easier when you want to update the descriptions, for example, if u need to update all Pizzas to Italian Pizza, then u can do with one row update if u use a separate table. Another advantage would be translations, u can use the other table to store translations of the string in different languages and select the one based on the current language.

But the problem with that approach would be for inserts. U need to insert in both tables and also need to maintain the foreign key constraints, so it adds a bit of complexity to a simple table.

Arvind 2009-06-21 14:48:19

Answer 3

+3 A:

You should be thinking in terms of what makes a good design in terms of your problem domain rather than efficiency (unless you expect to have tens of millions+ rows).

A well designed database should be in 3NF (third normal form). Only denormalise when you have identified a performance problem by measuring.

Mitch Wheat 2009-06-21 14:49:58

Assuming there is no auxiliary data associated with the food items (e.g. nutrition info), then his design is already in 3NF. Not every bit of repeated data must be an integer in order to have a properly normalized database.

Tyler McHenry 2009-06-21 14:59:25

@Tyler McHenry: I didn't say it did, did I? I was making the point that design is more important than optimising a performance problem you don't have.

Mitch Wheat 2009-06-21 15:16:01

I would say that the best reason to go with non-normalized data in this case would be simply because it's user entered data. So getting them to find their already entered string in the 1000 strings that are already entered is going to be difficult. You're going to end up with 6 different variations of each thing anyway, because users will want to be original, and say they like pepperoni pizza, or pizza with just cheese, or pizza with 6 kinds of meat on it. Even though Pizza would just suffice, you're going to get 6 different kinds of pizza, so there isn't much of a point in normalizing it.

Kibbee 2009-06-21 17:14:10

Ugh, "well-designed" != 3NF in a lot of situations.

jfar 2009-09-16 15:56:53

@jfar: thx for the downvote. Would you mind explaining your 'Ugh!' Please give an example where 3NF is bad (apart from cubes).

Mitch Wheat 2009-09-16 16:00:52

3NF is a bad choice if you read performance is critical or for some application development scenarios where your ORM restricted to 1:1 mapping. Sometimes the extra bit of writes/updates and data integrity checks are a small price to pay for faster development. Also, I tend to vote down any "this way is the only way answers".

jfar 2009-09-16 18:21:48

@jfar: that's hasn't been my experience. 3NF is what you should design for, and then maybe (just maybe) in extreme circumstances, denormalise as little as possible. If you have performance problems with a 3NF schema, your indexes are not right.

Mitch Wheat 2009-09-17 00:21:57

Answer 4

+1 A:

Pros for having a separate "Strings" table:

Likely, less space, if strings repeat really frequently
Likely, faster typical queries - because of less I\O

Cons:

You'll write more complex queries to achieve the same result
If the repetition factor is rather small, you'll get higher query execution time. To resolve each ID to string (or back), database server will perform a single lookup (seek operation) per each ID. So you get additional log(Strings.Count()) factor ~ for each query doing this.

But actually this is really effecient. E.g. most of full-text search engines use nearly this approach to store document-word maps.

Alex Yakunin 2009-06-21 17:07:10

ansaurus

tags:

views:

answers:

Should I use a string table to make database more efficient?

related questions