Tables in a dataset. Quantity and redundancy

views:

answers:

+1 Q:

Tables in a dataset. Quantity and redundancy

Are there any best practices pertaining to the number of tables used in a dataset? In a relational database, all tables eventually relate, so how specific or broad should I be when creating my datasets?

Ex: If I have a dataset for product orders, should I also include the products tables, as well as the customers tables that the orders belong to, the shipping information tables, and so on; or should I simply have the base productorders table and related lookup tables?

+3 A:

I would suggest that you only make your datasets as large as you need them to be. If you are including a lot of extraneous information that will rarely be used, you are going to be taxing your resources a lot more and a lot more often than you should.

TheTXI 2009-06-09 12:58:46

+1 A:

"should I also include ... or should I simply"

What are your use cases? What will people DO with your data? This defines the problem domain. It defines precisely what data must be present.

Read this: http://www.ibm.com/developerworks/web/library/wa-dbdsgn1.html

Additional Notes.

First, a table must map to each real-world thing in the problem domain. Ask your users what they do, what decisions they make, what actions they take. This will define what real-world things require tables.

An order is a thing. A product which is ordered is a thing. A customer is a thing.

Second -- to make the relational model work -- you need to normalize this so that repeating elements are rows in a table. Also, you need to normalize this so that each row has an identifier ("key") for the one distinct thing in that row. Also, you need to normalize this to remove derivable data.

A line on an order is a thing, related to an order.

A feature of a product is a thing, related to the product as a whole.

This "Normalization" process gives precise and complete rules for defining a relational database. It tells you exactly how many tables you'll need to represent the real-world things in your problem domain.

S.Lott 2009-06-09 13:02:03

I regret that I have but 1 vote to give for this answer.

Rob Allen 2009-06-09 13:36:16

Split it based on usage / logical areas If you use some tables together, have it together. If some data is only a reference data, then it should probably go into another dataset... If you're working with orders - load only what's necessary.

Anyway, maybe you'd like to look at more convenient data access methods, like ORMs for example? Try to look at NHibernate and see if it fits your scenarios...

Michał Chaniewski 2009-06-09 13:02:17

+1 A:

I can't say if it is a best practice or not, but where I work we have datasets with over 75 tables that perform just fine. Some of the tables are only a few records, but several have many thousands of records. We are using binary remoting to transport these tables. Not sure if xml serialization would give us the same (or even close to the same) performance. The last time I checked the size of our largest dataset serialized to disk it was near 3mb.

Does anyone else have experience with large datasets? When our project started I never imagined that we would need to pack so much into one dataset, so I have been very pleased with our results.

Kleinux 2009-06-09 13:04:40

I actually work in this environment and I can vouch for this statement. Our datasets perform quite well. From the my perspective, I work on the website, I'm using one very large cached dataset for almost all of my data reads. The speed is pretty phenomenal. It's a bit of an extreme scenario, however.As for what's "best practice". Best practice is whatever works well and is easy to maintain. You can listen to people blow hard about this all day and night, but in the end it only matters that it works and is maintainable.

blesh 2009-07-27 14:48:38

I used to split the model in different datasets.
However, this has bitten me quite a few times.
When there was a relation/reference between tables in the different datasets I had to write a lot of "fixup" code.

The only 2 problems with 1 big dataset imo are a slower design experience and slower object instantiation. (a few ms)

The second is no issue for me since I use 1 dataset per unit of work.

Julian de Wit 2009-06-10 06:27:57

ansaurus

tags:

views:

answers:

Tables in a dataset. Quantity and redundancy

related questions